Archiving Gmail Chat Logs with Ruby

by J.R. Gutierrez on April 5, 2011

I recently migrated from one Google Account to another. Transferring my Gmail archives via IMAP was fairly easy, but I quickly found out that the chat logs are not accessible through the IMAP interface. A quick Google search brought up Collin’s Python script, which uses libgmail. Unfortunately, libgmail is no longer maintained and doesn’t work with the latest Gmail interface. The comments on that script, however, suggested using Google Gears to back up your chat logs. That also turned out to be a dead end, mainly because long chats are truncated, just as they are in the regular interface, and raw files aren’t available.

I decided to write my own solution in Ruby. Download gmail_export.rb here. You’ll need the Mechanize and DataMapper gems before it will run. To configure it, edit gmail_export.rb and set your username and password, the label to export, and the delay between page requests. I searched for “is:chat”, applied the label “chatlogs” to the results, set the variables accordingly, and let it run. Exporting 2800 chat logs took about 2 hours.
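If you want a rough idea of how long your own export might take, the figures above (2800 chat logs in about 2 hours) work out to a bit over 2.5 seconds per chat. Here is a small sketch of that arithmetic. The method names are made up for illustration and are not part of gmail_export.rb, and the real time will depend on the delay you configure and on your connection.

```ruby
# Figures reported in the post: 2800 chat logs in roughly 2 hours.
OBSERVED_CHATS = 2800
OBSERVED_SECONDS = 2 * 60 * 60

# Average wall-clock time per chat log, including the delay between pages.
def seconds_per_chat
  OBSERVED_SECONDS.to_f / OBSERVED_CHATS
end

# Hypothetical helper: scale the observed rate to another mailbox size.
def estimated_minutes(chat_count)
  (chat_count * seconds_per_chat / 60).round
end
```

For example, estimated_minutes(1000) comes out to about 43 minutes at the same pace.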

Next I used Thunderbird with the ImportExportTools Add-on to upload the exported logs into the new Gmail account. Unfortunately, you lose the timestamps within the chats and Gmail refuses to recognize the messages as chats, but you can search them just like any other item in your mailbox.

Interesting design notes: I use DataMapper and sqlite3 to record the URLs already visited, so the script can pick up where it left off if you lose your connection and have to restart it. I specifically wanted to protect myself from account lockouts and network timeouts. The script can also download messages other than chats, but I’d rather use IMAP for those.
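To show the restart-safety idea without any gems, here is a minimal sketch. It is not the code from gmail_export.rb: it swaps DataMapper and sqlite3 for a plain text file with one URL per line, and VisitedLog and crawl are hypothetical names. The block passed to crawl stands in for the real page fetch (a Mechanize get in the actual script).

```ruby
require 'set'

# Hypothetical helper: remembers fetched URLs in a plain text file,
# one per line, so a restarted run can skip them.
# (The real script uses DataMapper with sqlite3 instead.)
class VisitedLog
  def initialize(path)
    @path = path
    @seen = File.exist?(path) ? Set.new(File.readlines(path, chomp: true)) : Set.new
  end

  def visited?(url)
    @seen.include?(url)
  end

  # Append the URL to the file right away so it survives a crash.
  def mark(url)
    return if visited?(url)
    File.open(@path, 'a') { |f| f.puts(url) }
    @seen << url
  end
end

# Fetch every URL not yet recorded, pausing between requests.
# A URL is only marked after the block returns, so a page that
# fails with an exception is retried on the next run.
def crawl(urls, log, delay: 0)
  urls.each do |url|
    next if log.visited?(url)
    yield url
    log.mark(url)
    sleep(delay) if delay > 0
  end
end
```

Calling crawl a second time with the same log file only yields the URLs that were not finished the first time, which is the behavior you want after a timeout or a lockout.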

Needless to say, you are probably breaking Google’s ToS, so I’m not responsible for any damage caused. Enjoy and happy scraping.

Installation instructions:

wget http://9seats.com/wp-content/uploads/gmail_export.rb
gem install datamapper dm-sqlite-adapter mechanize
ruby gmail_export.rb

Update:

  • 0.1 – Initial upload.
  • 0.2 – 2-way authentication support added.
  • 0.3 – ruby19 check and ruby18 workaround.
  • 0.4 – Status reports are more informative. Error checking for nil objects. Code cleanup.
  • 0.5 – Adds username and password input if the default values don’t work.
  • 0.5.1 – Fix URI being changed by Google.

Google is now allowing IMAP export of chatlogs! http://dataliberation.blogspot.com/2011/09/gmail-liberates-recorded-chat-logs-via.html
