
Gmail Backup and Archival

This post is archived because, in the end, I didn't follow the approach described below. I used Gmail itself to retrieve my POP messages instead of using getmail. I had problems sending almost 20 thousand emails through either my ISP or Gmail (I was locked out for sending too many emails), so I relied entirely on the tools built into Gmail and never ended up using getmail or procmail.

~spig

With the addition of Gmail's ability to pull POP messages into Gmail, I decided to use my Gmail account exclusively instead of my various Gmail-hosted accounts for receiving and sending email. I'm not sure how new the feature is, but Gmail recently added the ability to send via SMTP, so all my email can appear to come from the domains that I have forwarding to my Gmail account. I like doing this for several reasons, one of which is having a single Google Docs account, with all my Google account information primarily in one place. So, here's how I did it.

First, move all email back to the primary Gmail account

Since I've been using other accounts as my primary accounts, I wanted to move all my email back to Gmail. This is fairly trivial, but since I've been forwarding my Gmail to various accounts at different times, I wanted to make sure not to send back the emails that had already been forwarded from Gmail. So I ran getmail, delivering to a local procmail instance, which then emails each message to my Gmail account.

Because large messages were being moved, I had to patch Python's imaplib.py, as described at the property maps blog, like so:

data = self.sslobj.read(size-read)

is changed to

data = self.sslobj.read(min(size-read, 16384))
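For context, the patched line lives inside a loop that keeps reading until the requested number of bytes has arrived. The following is a minimal sketch of that loop outside of imaplib, assuming an object with a `read(n)` method; `read_exact` and `FakeSSL` are hypothetical names used only for illustration and are not part of the standard library.

```python
def read_exact(sslobj, size, max_chunk=16384):
    """Read exactly `size` bytes, asking for at most `max_chunk` per call."""
    chunks = []
    read = 0
    while read < size:
        # Cap each request; the unpatched code asked for everything
        # remaining (size - read) in a single call.
        data = sslobj.read(min(size - read, max_chunk))
        if not data:
            break  # source ran dry early; return what was received
        read += len(data)
        chunks.append(data)
    return b"".join(chunks)


class FakeSSL:
    """Stand-in for the SSL object: serves bytes from memory and
    records the size of every read request."""

    def __init__(self, payload):
        self.payload = payload
        self.pos = 0
        self.requests = []

    def read(self, n):
        self.requests.append(n)
        data = self.payload[self.pos:self.pos + n]
        self.pos += len(data)
        return data
```

With a 40000-byte payload, the fake object sees three requests, none larger than 16384 bytes, and the caller still gets the whole payload back.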

My getmailrc file looks like this:

[options]
verbose = 1
delete = false
read_all = false
message_log = ~spig/.getmail/log

[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = user@example.net
password = password
mailboxes = ("INBOX","[Gmail]/All Mail",)

[destination]
type = MDA_external
path = /usr/bin/procmail
user = spig
arguments = ("-f", "%(sender)", "-m", "/Users/spig/.procmailrc")

And my procmailrc looks like this:

VERBOSE=YES
LOGFILE=procmail.log

SUBJECT=`formail -xSubject:`

:0 fhw
|formail -i "Subject: $SUBJECT (getmail-archive)"

:0
* ! ^X-Forwarded-For:.*
!test@gmail.com

:0
/dev/null
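To make the flow of those three recipes easier to follow, here is a rough Python equivalent. It is only a sketch: `route` is a hypothetical helper that reports what would happen to a message rather than sending anything, and it assumes `formail -i` behaves as documented (the new Subject is added and the existing one is renamed to Old-Subject).

```python
from email import message_from_string


def route(raw_message, tag="(getmail-archive)"):
    """Return (action, message), mirroring the three procmail recipes."""
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "").strip()

    # Recipe 1: tag the subject. formail -i keeps the old header
    # under an "Old-" prefix and adds the new one.
    if "Subject" in msg:
        msg["Old-Subject"] = subject
        del msg["Subject"]
    msg["Subject"] = f"{subject} {tag}"

    # Recipe 2: forward only messages that were NOT already forwarded.
    if msg.get("X-Forwarded-For") is None:
        return "forward", msg

    # Recipe 3: everything else is dropped (/dev/null).
    return "discard", msg
```

A message without an X-Forwarded-For header comes back as `"forward"` with its subject tagged; one that carries the header comes back as `"discard"`.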

Getmail only handles the last retriever section, so if you have more than one, make certain to comment out the ones you don't want to use.
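A quick way to check for leftover retriever sections before running getmail is to count the uncommented `[retriever]` headers in the rc file. The sketch below is a plain text scan and does not use getmail's own parser; `active_sections` and `retriever_count` are hypothetical helper names.

```python
import re

# Matches a section header such as "[retriever]" on a line of its own.
SECTION = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")


def active_sections(rc_text):
    """List the names of all section headers that are not commented out."""
    names = []
    for line in rc_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out line
        match = SECTION.match(line)
        if match:
            names.append(match.group("name").strip())
    return names


def retriever_count(rc_text):
    """Number of active [retriever] sections in the rc text."""
    return active_sections(rc_text).count("retriever")
```

Anything other than a count of 1 suggests the file needs another look.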

Second, archive all Gmail email

I don't think I need to archive very large messages, so I put in a rule to keep out most attachments and other giant messages. I don't think they add enough value to what I want to archive. Once this is set up, I think I'll add a crontab entry to keep it going while my machine is running, and never give it another thought.

I also wanted my email separated by year and month, so I bzip all email by month, with a folder per month inside each year. This should make for a decently small archive.
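The year/month layout can be derived from each message's Date header. Here is a minimal Python sketch of that step, assuming the header is a well-formed RFC 2822 date; `archive_path` is a hypothetical helper and the `mail_archives` root is a placeholder. The month is left unpadded (2, not 02).

```python
from email.utils import parsedate_to_datetime
from pathlib import PurePosixPath


def archive_path(date_header, root="mail_archives"):
    """Map a Date header to <root>/<year>/<month>/archive.bz2."""
    received = parsedate_to_datetime(date_header.strip())
    return PurePosixPath(root) / str(received.year) / str(received.month) / "archive.bz2"
```

For example, a message dated "Tue, 3 Feb 2009 10:15:00 -0700" lands in `mail_archives/2009/2/archive.bz2`.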

Here is my procmailrc to accomplish my archiving goals:

VERBOSE=YES
LOGFILE=procmail.log
DATE_RECEIVED=`formail -xDate:`
MONTHFOLDER=`ruby -e "require 'date'; d=Date.parse('$DATE_RECEIVED');print d.month.to_s"`
YEARFOLDER=`ruby -e "require 'date'; d=Date.parse('$DATE_RECEIVED');print d.year.to_s"`
DUMMY=`test -d /Users/spig/Documents/mail_archives/$YEARFOLDER/$MONTHFOLDER || mkdir -p /Users/spig/Documents/mail_archives/$YEARFOLDER/$MONTHFOLDER`

# filter out anything greater than 256 K
# (a ">" condition matches messages larger than the given number of bytes)
# I might consider writing headers to a file to verify that I don't want these emails backed up
:0
* > 256000
/dev/null

:0
| /usr/bin/bzip2 >>$HOME/Documents/mail_archives/${YEARFOLDER}/${MONTHFOLDER}/archive.bz2
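Appending the output of a separate bzip2 run for every message works because a file made of several concatenated bzip2 streams still decompresses as one. The sketch below illustrates the size filter and the append step in Python, using only the standard library; `archive_message`, `read_archive`, and `demo` are hypothetical names, and the 256000-byte limit simply mirrors the recipe above.

```python
import bz2
import os
import tempfile

MAX_BYTES = 256000  # same threshold as the procmail size condition


def archive_message(raw, archive_file):
    """Append one message as its own bzip2 stream; skip oversized ones."""
    if len(raw) > MAX_BYTES:
        return False  # plays the role of the /dev/null recipe
    parent = os.path.dirname(archive_file)
    if parent:
        os.makedirs(parent, exist_ok=True)  # like mkdir -p
    with open(archive_file, "ab") as fh:
        fh.write(bz2.compress(raw))
    return True


def read_archive(archive_file):
    """Decompress every stream in the archive, in order."""
    with bz2.open(archive_file, "rb") as fh:
        return fh.read()


def demo():
    """Archive two small messages and one oversized one, then read back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "2009", "2", "archive.bz2")
        archive_message(b"first\n", path)
        archive_message(b"second\n", path)
        archive_message(b"x" * (MAX_BYTES + 1), path)  # skipped
        return read_archive(path)
```

On the command line, `bzcat archive.bz2` should read such a multi-stream file back in the same way.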

© Steve Spigarelli.