Archiving Gmail Chat Logs with Ruby

by J.R. Gutierrez on April 5, 2011

I recently migrated from one Google Account to another and found it fairly easy to transfer my Gmail archives via IMAP. I quickly found out, though, that the chat logs were not accessible through the IMAP interface. A quick Google search brought up Collin’s Python script that uses libgmail. Unfortunately, libgmail is no longer maintained and doesn’t work with the latest Gmail interface. In the comments, however, it was suggested that you can use Google Gears to back up your chat logs. That also turned out to be a dead end, mainly because long chats are truncated, just as they are in the regular interface, and the raw files aren’t available.

I decided to write my own solution in Ruby. Download gmail_export.rb here. You’ll need the Mechanize and DataMapper gems for it to work. Basically, you edit gmail_export.rb to set the correct username/password, the label to export, and the delay between page requests. I searched for “is:chat” and labeled the results “chatlogs”, filled in the variables, and let it run. Exporting 2,800 chat logs took about two hours.

Next I used Thunderbird with the ImportExportTools add-on to upload them into the new Gmail account. Unfortunately, you lose the timestamps within the chats and Gmail refuses to recognize the messages as chats, but you can search them just like any other item in your mailbox.

Interesting design notes: I use DataMapper and sqlite3 to record the URLs already visited, in case you lose your connection for some reason and have to restart the script. I specifically wanted to protect myself from account lockouts and network timeouts. You can also use the script to download any other messages besides chats, but I’d rather use IMAP for those.
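
The resume behaviour described above can be sketched without any gems. This is a minimal illustration, not the code in gmail_export.rb: it swaps DataMapper and sqlite3 for a plain text file, and the VisitedLog class, its method names, and the file path are all hypothetical.

```ruby
require "set"

# Hypothetical stand-in for the script's DataMapper/sqlite3 history table.
# Each visited URL is appended to a text file, one per line, so a restarted
# run can skip everything that was already fetched.
class VisitedLog
  def initialize(path)
    @path = path
    @seen = Set.new
    # Reload history left behind by a previous (possibly interrupted) run.
    File.readlines(path).each { |line| @seen << line.chomp } if File.exist?(path)
  end

  def visited?(url)
    @seen.include?(url)
  end

  def mark(url)
    return if visited?(url)
    # Append immediately so a crash right after this call loses nothing.
    File.open(@path, "a") { |f| f.puts(url) }
    @seen << url
  end
end
```

A scraping loop would call visited? before each request and mark after each successful download, so rerunning the script after a timeout or lockout only fetches what is still missing.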

Needless to say, you are probably breaking Google’s ToS, so I’m not responsible for any damage caused. Enjoy and happy scraping.

Installation instructions:

wget http://9seats.com/wp-content/uploads/gmail_export.rb
gem install datamapper dm-sqlite-adapter mechanize
ruby gmail_export.rb

Update:

  • 0.1 – Initial upload.
  • 0.2 – Two-factor authentication support added.
  • 0.3 – ruby19 check and ruby18 workaround.
  • 0.4 – Status reports are more informative. Error checking for nil objects. Code cleanup.
  • 0.5 – Adds username and password input if the default values don’t work.
  • 0.5.1 – Fix URI being changed by Google.

Google is now allowing IMAP export of chatlogs! http://dataliberation.blogspot.com/2011/09/gmail-liberates-recorded-chat-logs-via.html

Daniel Drucker April 6, 2011 at 7:44 am

Can you think of a way to have this work with two-factor? E.g., maybe it prompts you at runtime for the code?

J.R. Gutierrez April 6, 2011 at 7:57 am

Yeah, I’ll look into it. I ran into this problem also and couldn’t use the Application Specific Passwords. Just disabled two-factor for a couple of hours, and turned it back on when I was done.

J.R. Gutierrez April 6, 2011 at 1:44 pm

OK, two-factor support is complete and uploaded. Go ahead and re-download the script.

Daniel Drucker April 6, 2011 at 11:27 am

When you disabled/reenabled two-factor, did it kill all your existing app-specific passwords, or did they remain valid?

J.R. Gutierrez April 6, 2011 at 11:30 am

They remained valid. Only your one-time emergency passwords get reset to new values.

Daniel Drucker April 6, 2011 at 5:34 pm

Please enter verification pin: 844132
Starting.
gmail_export.rb:94:in `strip_invalid_utf8_chars’: undefined method `valid_encoding?’ for # (NoMethodError)
from gmail_export.rb:128:in `run’
from gmail_export.rb:124:in `each’
from gmail_export.rb:124:in `run’
from gmail_export.rb:119:in `each’
from gmail_export.rb:119:in `run’
from gmail_export.rb:165

J.R. Gutierrez April 6, 2011 at 6:07 pm

Ugh. I uploaded version 0.3, which checks the Ruby version. Ruby 1.8 doesn’t have that method for strings.
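
For readers wondering what such a guard might look like, here is a rough sketch. It is not the code from gmail_export.rb: the helper name strip_invalid_chars is hypothetical, and it checks for the method itself rather than comparing version numbers, which is an assumption about one reasonable way to do it.

```ruby
# Hypothetical helper: drops bytes that are not valid in the string's encoding.
def strip_invalid_chars(text)
  # Ruby 1.8 strings carry no encoding, so there is nothing to validate.
  return text unless text.respond_to?(:valid_encoding?)
  return text if text.valid_encoding?

  if text.respond_to?(:scrub)
    # Ruby 2.1 and later: replace every invalid byte sequence with "".
    text.scrub("")
  else
    # Ruby 1.9 / 2.0: invalid bytes show up as single invalid "characters".
    text.chars.select { |c| c.valid_encoding? }.join
  end
end
```

Testing with respond_to? avoids the NoMethodError shown above on 1.8 while still cleaning the text on newer interpreters.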

Daniel Drucker April 7, 2011 at 7:42 am

It ran for about two hours, and then died without writing any output:

…………………….X…………………….X…………………….X…..Error, Retrying…execution expired
.gmail_export.rb:130:in `run’: undefined method `links’ for # (NoMethodError)
from gmail_export.rb:126:in `each’
from gmail_export.rb:126:in `run’
from gmail_export.rb:170

Maybe it would be nice if it could checkpoint its state somehow to disk?

J.R. Gutierrez April 7, 2011 at 8:46 am

The history is there. It should pick up where you left off when you run it again. It looks like it wasn’t able to parse the page properly, or you got temporarily banned or something. I’ll start outputting the body of the last page and go ahead with the dump of the captured eml files.

version 0.4 is up.
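
The "Error, Retrying" output above suggests the script wraps its page fetches in some kind of retry helper. A minimal sketch of that pattern follows. The name with_retries does appear in stack traces further down this page, but the signature, the attempt limit, and the wait time here are assumptions, not the script's actual implementation.

```ruby
# Hypothetical retry wrapper: runs the block, and on failure tries again
# up to max_attempts times before letting the last error propagate.
def with_retries(max_attempts = 3, wait = 0)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    raise if attempts >= max_attempts
    warn "Error, retrying... #{e.message}"
    sleep wait
    retry
  end
end
```

A call such as with_retries(3, 5) { agent.get(url) } (with agent and url standing in for whatever the caller uses) would survive a couple of transient timeouts but still fail loudly if the page never loads, instead of continuing with a nil page.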

Daniel Drucker April 7, 2011 at 11:03 am

…. and now Google has locked me out of my account. Fuck.

Daniel Drucker April 7, 2011 at 12:46 pm

I’m back in now, but, yeah, this script is probably a bad idea to use.

J.R. Gutierrez April 7, 2011 at 12:52 pm

That’s why I have a delay variable set up. I’ve been consistently using 1 second while making modifications, and 0.5 when I’m logged out in my normal browser. 5.0 seconds between reads is ultra-safe, but will take 5x as long.

I’m guessing Google tracks the average number of clicks per time interval and disables you if you pass a threshold. I might’ve accidentally left the delay at 0.5 in the last upload, so sorry.
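
As an illustration of the delay idea, here is a small sketch of pausing between requests. The each_with_delay helper and its injectable sleeper argument are hypothetical and exist only to make the example easy to try; the script's real delay handling may differ.

```ruby
# Hypothetical helper: yields each item, pausing `delay` seconds between
# items (but not before the first one). The sleeper argument defaults to
# Kernel#sleep and can be replaced to observe the pauses without waiting.
def each_with_delay(items, delay, sleeper = method(:sleep))
  items.each_with_index do |item, index|
    sleeper.call(delay) if index > 0
    yield item
  end
end
```

With a delay of 2.0, a list of N pages spends roughly 2 × (N − 1) seconds just waiting, which is the trade-off between speed and staying under whatever threshold Google applies.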

Ronnie May 7, 2011 at 12:31 am

I basically installed Ruby for the first time in my life to run this script. But now I’m getting this error:
—————————————————————————————–
DataObjects::URI.new with arguments is deprecated, use a Hash of URI components (C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-do-adapter-1.1.0/lib/dm-do-adapter/adapter.rb:231:in `new’)
—————————————————————————————–
Seems like something is wrong with the DataMapper gem. Any suggestions/ideas on what I should do to make this work?

Also, do you have any reasoning for why you chose 2 seconds as the delay?

J.R. Gutierrez May 7, 2011 at 10:59 am

That’s just a warning. According to this (http://groups.google.com/group/datamapper/browse_thread/thread/51d6c6414b177273/2b49b7fbe6f2a588), it will be silenced in the next release. You can safely ignore it; the script should still continue. Run the following line to make sure you have all the gems needed to run this program:

gem install datamapper dm-sqlite-adapter mechanize

As for the 2-second delay, I’ve found that anything under 2 seconds will most likely get you temporarily banned by Google. You can go with a shorter delay, but you can’t be on your Gmail web interface at the same time. I believe Google measures the number of requests in a certain time interval and you get auto-banned once a threshold is passed.

Ronnie May 8, 2011 at 10:49 am

Er, well, if that is just a warning then what is this ugly error stack after it? (I really hate to put this stuff here and have you do mundane debugging for me, but googling didn’t really help me and, well, let’s just say Ruby doesn’t seem to be my cup of tea.) I would really appreciate it if you could help me a little with it.

I:\>ruby GetGmailChats.rb
DataObjects::URI.new with arguments is deprecated, use a Hash of URI components (C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-do-adapter-1.1.0/lib/dm-do-adapter/adapter.rb:231:in `new’)
C:/Ruby192/lib/ruby/gems/1.9.1/gems/data_objects-0.10.5/lib/data_objects/connection.rb:79:in `initialize’: unable to open database file (DataObjects::ConnectionError)
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/data_objects-0.10.5/lib/data_objects/connection.rb:79:in `new’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/data_objects-0.10.5/lib/data_objects/pooling.rb:177:in `block in new’
from :10:in `synchronize’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/data_objects-0.10.5/lib/data_objects/pooling.rb:172:in `new’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/data_objects-0.10.5/lib/data_objects/pooling.rb:119:in `new’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/data_objects-0.10.5/lib/data_objects/connection.rb:68:in `new’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-do-adapter-1.1.0/lib/dm-do-adapter/adapter.rb:251:in `open_connection’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-do-adapter-1.1.0/lib/dm-do-adapter/adapter.rb:276:in `with_connection’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-do-adapter-1.1.0/lib/dm-do-adapter/adapter.rb:33:in `select’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/adapters/dm-sqlite-adapter.rb:43:in `table_info’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/adapters/dm-sqlite-adapter.rb:18:in `storage_exists?’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/adapters/dm-do-adapter.rb:90:in `create_model_storage’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/adapters/dm-do-adapter.rb:57:in `upgrade_model_storage’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/auto_migration.rb:71:in `upgrade_model_storage’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/auto_migration.rb:143:in `auto_upgrade!’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/auto_migration.rb:45:in `block in repository_execute’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/descendant_set.rb:66:in `block in each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/subject_set.rb:212:in `block in each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/ordered_set.rb:321:in `block in each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/ordered_set.rb:321:in `each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/ordered_set.rb:321:in `each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/subject_set.rb:212:in `each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/descendant_set.rb:65:in `each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/auto_migration.rb:44:in `repository_execute’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-migrations-1.1.0/lib/dm-migrations/auto_migration.rb:27:in `auto_upgrade!’
from GetGmailChats.rb:29:in `’

J.R. Gutierrez May 8, 2011 at 3:14 pm

This is the real error: unable to open database file (DataObjects::ConnectionError)

It looks like you’re using Windows, so you need to change the following line:
DataMapper.setup(:default, 'sqlite:///tmp/datamapper.db') to
DataMapper.setup(:default, 'sqlite:///C:/temp/datamapper.db') or whatever works.
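
One way to avoid hand-editing that line per platform is to build the connection string from a filesystem path. The sketch below only reproduces the two URI shapes quoted in this reply; the sqlite_uri helper is hypothetical, and using Dir.tmpdir for the location is an assumption rather than something the script does.

```ruby
require "tmpdir"

# Hypothetical helper: turns a filesystem path into the "sqlite://" form
# used in the DataMapper.setup lines above. Backslashes are normalised and
# a leading slash is added so drive-letter paths get three slashes too.
def sqlite_uri(path)
  normalized = path.gsub("\\", "/")
  normalized = "/#{normalized}" unless normalized.start_with?("/")
  "sqlite://#{normalized}"
end

# Place the history database in the platform's temp directory.
db_path = File.join(Dir.tmpdir, "datamapper.db")
puts sqlite_uri(db_path)
```

The resulting string would then be passed as the second argument to DataMapper.setup(:default, ...) in place of the hard-coded path.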

Ronnie May 8, 2011 at 7:57 pm

Awesome stuff man! It worked, I just changed it to use RAM to make it work. And with 2 secs, it didn’t have any adverse effects on my Gmail account. Great stuff!
And thanks again for your help.

Ronnie May 9, 2011 at 6:11 am

One more thing: now I just want a record of my chats on my disk. The eml files appear to have some sort of XML structure. Should I find a way to convert those eml files to txt? (I tried that with a program called Total Mail Converter. It failed spectacularly; I got almost the same results by renaming the .eml files to .txt.)
Or should I look into changing the script to create txt files instead of eml? Does Gmail give us the messages in the eml format?

And by the way, opening the eml files with Thunderbird also didn’t help. Thunderbird retained the XML structure.

Jonathan June 4, 2011 at 6:15 pm

Hi Ronnie and JR,

I have the same problem that Ronnie listed above. Could you please tell me how you fixed your error?

I am running Ruby 1.9.1 on Win7 x64

Thanks!

-Jonathan

Mohit May 16, 2011 at 11:28 pm

I don’t understand any of it… but I really need to download my chat history in text form. 😐

Ronnie May 18, 2011 at 4:59 pm

Sent you a mail! I guess the file I sent and the one you would be getting should be the same. I’m inclined to use the XML part to recreate the chat text since it has all the info, but I’m not sure whether any XML parser will be able to accept it. Also, the added = at the end of each line doesn’t ring a bell for me. I’m thinking of removing all those =’s, using some XML parser in Java, and seeing if I can get it to work.
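
The trailing = signs described here look like quoted-printable soft line breaks, which would match the "Content-Transfer-Encoding: quoted-printable" header that shows up in an exported file quoted later in these comments. Assuming that is the case, Ruby can decode such text with String#unpack and the "M" directive. The helper name and the sample string below are made up for illustration.

```ruby
# Hypothetical helper: decodes quoted-printable text. "=3D" becomes "=",
# and a "=" at the very end of a line (a soft line break) is removed so the
# wrapped line is joined back together.
def decode_quoted_printable(text)
  text.unpack("M").first.force_encoding("UTF-8")
end

# Made-up sample in the same style as the exported XML part.
sample = "stamp=3D\"20100105T07:07:49\" some long te=\nxt"
puts decode_quoted_printable(sample)
```

Decoding first, rather than deleting every = by hand, matters because some = characters are real data (for example in attribute assignments) and only the encoded ones should change.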

Gordon May 24, 2011 at 10:58 am

hi,

tried running this, have installed all the gems as suggested, my error is;

DataObjects::URI.new with arguments is deprecated, use a Hash of URI components (/var/lib/gems/1.8/gems/dm-do-adapter-1.1.0/lib/dm-do-adapter/adapter.rb:231:in `new’)
Agent initializing…
gmail_export.rb:37:in `initialize’: undefined method `com’ for nil:NilClass (NoMethodError)
from gmail_export.rb:178:in `new’
from gmail_export.rb:178

Gordon May 24, 2011 at 11:26 am

ignore last comment, i’m a dope!

Frieder May 24, 2011 at 2:04 pm

Does this still work for you these days? I get a bunch of HTML/Javascript after a message saying

“*** Something bad happened with the scraper. Dumping last page body. ***”

Could it be a problem with Gmail’s GUI having changed recently?

J.R. Gutierrez June 4, 2011 at 6:31 pm

I have it hard-coded to use the normal/US version of Gmail. It looks like it’s trying to forward to google.fr via JavaScript, but it can’t because Ruby Mechanize doesn’t execute JavaScript. Try changing your locale to US and see how it goes.

Frieder May 28, 2011 at 2:39 am

The error message I get is…

*** Something bad happened with the scraper. Dumping last page body. ***
Redirecting

// Accessing window.external members can cause IE to throw exceptions.
// Any code that acesses window.external members must be try/catch wrapped
/** @preserveTry */

[…]

Frieder May 28, 2011 at 2:47 am

at the end of the error message I get:

[…]
#
[“./gmail_export.rb:80:in `login'”,
“./gmail_export.rb:119:in `run'”,
“./gmail_export.rb:181”]
Exporting 0 items to eml format.

Jonathan June 4, 2011 at 6:38 pm

Hi JR,

Where do the logs download to? Can I specify where they are downloaded to?

Thank you!

-Jonathan

Graham June 12, 2011 at 7:59 pm

I’m getting the same error as Frieder above. I’m in Canada, and changing my Google account’s location to the US doesn’t seem to help. I’m wondering if Google is automatically trying to forward me to google.ca based on my IP address. Here’s part of the page dump that was returned from gmail_export:

location.replace(“http://www.google.ca/accounts/SetSID?ssdc\x3d1\x26sidt\x3d[snipped for brevity]3D1”)
#

I appreciate your work into exporting Gmail chats! Hope I can get it working soon.

Daniel Drucker June 24, 2011 at 2:48 pm

There’s now official API access … http://code.google.com/googleapps/appsscript/class_gmailapp.html

maybe you’re a good candidate to write a backup utility?

Pramod Ghuge June 29, 2011 at 4:52 am

Google Apps script to export Google Talk chat logs to a Google Docs spreadsheet

https://gist.github.com/1051628

Hugo Osvaldo Barrera July 9, 2011 at 8:36 am

Even though I don’t like the “EML” format too much, this is a great piece of software – even if it’s just a script, it’s still software 🙂
Congrats on this, I managed to download everything off a really old account, and finally get rid of it, having my logs in my own PC. I’ve been searching for a way to achieve this for years!

Oh, and I use Thunderbird, and dates seem fine when opening a log (contrary to what you said about uploading/opening them in Gmail).

Since I no longer use Gmail, I’ll see if I have any time to create a script (probably in Python) to rename the files according to their date/contact or something similar 🙂

Cheers!

x7o July 31, 2011 at 7:35 am

Hey, folks. I came up with a solution to this which is the simplest I’ve seen yet. Check it out here:

http://freshhorse.wordpress.com/2011/07/29/leaving-gmail-and-bringing-your-chats-with-you/

Hope it’s useful to someone!

Yizhou Liu August 20, 2011 at 10:44 am

Thanks for your work!

It’s better than Google Apps Script since it has timestamps for each chat log. This script works perfectly, except that it only downloads the last message in a multi-message thread. To download them all, I added:
if expand_all = @page.links.find { |l| l.text =~ /expand all/i }
  @page = with_retries { @agent.click expand_all }
end
before:
original_links = @page.links.find_all { |l| l.text =~ /show original/i }

J.R. Gutierrez August 20, 2011 at 7:37 pm

I added this fix to the main file. Thanks!

Vinod August 31, 2011 at 8:32 pm

Hi,

I have around 1500 chats that I want to archive. I tried to use this tool, but I get this error:

#<NameError: undefined local variable or method `expand_all' for #>
[“gmail_export.rb:141:in `block (2 levels) in run'”,
“gmail_export.rb:58:in `with_retries'”,
“gmail_export.rb:141:in `block in run'”,
“gmail_export.rb:137:in `each'”,
“gmail_export.rb:137:in `run'”,
“gmail_export.rb:182:in `'”]
Exporting 6 items to eml format.
gmail_export.rb:165:in `initialize’: Invalid argument – ?&v=om&th=132114b73670f263.eml (Errno::EINVAL)
from gmail_export.rb:165:in `open’
from gmail_export.rb:165:in `block in to_eml’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/collection.rb:507:in `block in each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `block in each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/support/lazy_array.rb:411:in `each’
from C:/Ruby192/lib/ruby/gems/1.9.1/gems/dm-core-1.1.0/lib/dm-core/collection.rb:504:in `each’
from gmail_export.rb:164:in `to_eml’
from gmail_export.rb:190:in `’

Can you please help me by fixing this issue?

Thanks in advance.

J.R. Gutierrez September 3, 2011 at 3:33 pm

I just uploaded the fix. Download the new script and try again.
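
The Errno::EINVAL in the trace above involves a file name that starts with "?", a character Windows does not allow in file names. The following is only a guess at one way to sidestep that, not a description of the uploaded fix: the eml_filename helper is hypothetical, and treating the th= parameter as a usable message id is an assumption based on the URL fragment in the error.

```ruby
# Hypothetical helper: derives a file name that is safe on Windows from the
# message URL fragment. It prefers the hex value of the "th" parameter and
# otherwise replaces characters that are invalid or awkward in file names.
def eml_filename(message_url)
  thread_id = message_url[/[?&]th=([0-9a-f]+)/i, 1]
  base = thread_id || message_url.gsub(/[<>:"\/\\|?*&=]/, "_")
  "#{base}.eml"
end

puts eml_filename("?&v=om&th=132114b73670f263")
```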

Vinod September 5, 2011 at 7:33 pm

Hi,

It looks like the script downloads ALL the mails in the specified label and converts them to .eml files. Can I get ONLY chats? I could get the other mails using IMAP.

Justin September 6, 2011 at 1:23 am

Just wanted to say thanks, this worked great for me. The only thing is that chats that are also part of an email exchange cause the script to fail. I just went through and weeded all these out of my “chatlogs” label and it seems to be going through fine. Thanks!

Vinod September 6, 2011 at 10:05 am

I’ve got a problem here. Can someone help me out with this?
After running your script, I’m able to get the eml files. If I open the eml files locally using Outlook, the contents are displayed properly. However, when I import these files into Gmail and view them, the contents are really weird:

MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_Part_29182_22590956.1315328604889"

------=_Part_29182_22590956.1315328604889
Content-Type: text/xml; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi Maplavaaya vaa…/cli:body>xstamp=3D"20100105T07:07:49" xmlns=3D"jabbe=………………………….

Am i missing something ??

J.R. Gutierrez September 6, 2011 at 6:53 pm

Use Thunderbird. Outlook and Outlook Express do not import properly.

Vinod September 6, 2011 at 8:48 pm

I imported using Thunderbird only…

Vinod September 6, 2011 at 9:02 pm

For viewing the eml files and for importing, I use Thunderbird. When I view the eml files, the contents, sender name and mail subject are displayed properly, but after importing to Gmail, when I view the mail from the Gmail interface, the contents appear weird (viewing from some other web-based email agents like livego, etc., displays the contents properly, so it looks like some MIME encoding issue)… huhh… 🙁

However, one main issue is that all the imported mails lack the old subject and sender name: they show (Unknown sender) as the sender and (no subject) as the subject.

J.R. Gutierrez May 9, 2011 at 6:18 pm

I just ran it, and it looks like it’s in EML format, with the body in two parts: one in Google’s XML format, and another in a regular format that doesn’t have timestamps. Thunderbird and the new Gmail only display the second. If you want to help me troubleshoot, shoot me an email at jr [at] 9seats [dot] com with a few of your eml files from the Ruby script attached.
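
For anyone who wants to pull those two parts apart by hand, here is a rough standard-library sketch. It assumes a simple, non-nested multipart message like the one quoted elsewhere in these comments, ignores folded header lines, and is not a substitute for a real MIME parser. The split_multipart helper and the hash layout it returns are hypothetical.

```ruby
# Hypothetical helper: splits a simple multipart message into its parts.
# Returns an array of hashes, each with lower-cased header names and the
# still-encoded body text of that part.
def split_multipart(raw)
  raw = raw.gsub("\r\n", "\n")
  boundary = raw[/boundary="?([^";\n]+)"?/i, 1]
  return [] unless boundary

  chunks = raw.split("--#{boundary}")
  # chunks[0] holds the top-level headers; the chunk after the closing
  # delimiter starts with "--" and is skipped.
  chunks[1..-1].reject { |chunk| chunk.start_with?("--") }.map do |chunk|
    head, body = chunk.sub(/\A\n/, "").split("\n\n", 2)
    headers = {}
    head.to_s.each_line do |line|
      name, value = line.chomp.split(/:\s*/, 2)
      headers[name.downcase] = value if name && value
    end
    { :headers => headers, :body => body.to_s }
  end
end
```

Given the text of one exported file, split_multipart(File.read(path)) should return one entry per part; the entry whose "content-type" header starts with "text/xml" would be the Google XML part with the timestamps, and its body would still need quoted-printable decoding.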

J.R. Gutierrez June 4, 2011 at 6:27 pm

The eml files that are output are the raw files that Gmail produces, which is exactly what I wanted. If you want to convert them to some other chat log format, you’re going to have to do it yourself. I wanted to keep the raw format. They should open fine in any email program, without timestamps.
