IMAP migration and backup

I recently had to migrate two decades' worth of email from one mail server to another (as a mere user at both ends), so herewith a few notes on the experience gained. Note that none of the tools described currently (late 2021) supports any form of multi-factor authentication (2FA/MFA). Note too that I accept no responsibility for any errors in what follows, particularly as the versions of the various tools you use are unlikely to be identical to the versions I used. Test carefully!

The Basics

IMAP is a common protocol for accessing email on a server (but not for sending it). It says nothing about how the server actually stores the email.

Two common formats for storing email are the very old mbox format and the slightly more recent maildir format. There are many others. These may impose restrictions which the IMAP protocol does not impose. For instance, folder names differing only in case may not be allowed, or a dot may not be allowed in a folder name (as it is a common IMAP hierarchy separator). Folder names might be restricted to 7-bit ASCII, or UTF-8 might be allowed. Characters special to the underlying OS, such as / or :, might also be disallowed.
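Before migrating, it can help to list the folders on the source server and check them against the likely restrictions of the destination store. The Python sketch below assumes LIST response lines of the kind returned by imaplib's list() method, and uses a simple regular expression that does not handle IMAP literals or escaped quotes in folder names. The helper names parse_list_line and local_store_warnings are hypothetical, and the checks only illustrate the restrictions mentioned above.

```python
import re
from collections import defaultdict

# Matches a typical LIST response line, e.g.  (\HasNoChildren) "." "INBOX.Sent"
# Assumption: no IMAP literals and no escaped quotes inside folder names.
LIST_RE = re.compile(r'\((?P<attrs>[^)]*)\) (?P<delim>"[^"]*"|NIL) (?P<name>.+)')


def parse_list_line(line):
    """Return (hierarchy delimiter, folder name) from one LIST response line."""
    if isinstance(line, bytes):
        line = line.decode("utf-8", errors="replace")
    m = LIST_RE.match(line)
    if m is None:
        raise ValueError(f"unrecognised LIST line: {line!r}")
    delim = m.group("delim")
    delim = None if delim == "NIL" else delim.strip('"')
    name = m.group("name")
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        name = name[1:-1]
    return delim, name


def local_store_warnings(names, delim):
    """List (name, reason) pairs for folder names some local stores may reject."""
    warnings = []
    by_lower = defaultdict(list)
    for name in names:
        by_lower[name.lower()].append(name)
        # Check each hierarchy component, so the IMAP separator itself is not reported.
        parts = name.split(delim) if delim else [name]
        if any("." in p for p in parts):
            warnings.append((name, "contains '.'"))
        if any("/" in p or ":" in p for p in parts):
            warnings.append((name, "contains '/' or ':'"))
        # Non-ASCII names arrive in modified UTF-7, where '&' starts an escape ('&-' is a literal '&').
        if not name.isascii() or re.search(r"&[^-]", name):
            warnings.append((name, "non-ASCII"))
    for group in by_lower.values():
        if len(group) > 1:
            warnings.extend((name, "differs from another folder only in case") for name in group)
    return warnings
```

With a live imaplib connection, the byte strings in the second element of imap.list() could be fed through parse_list_line, bearing in mind that folder names sent as literals come back as tuples and are not handled here.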

The mbox format uses a single file for all emails in a folder. Appending is fast. Deletion, except near the end of the file, is slow, as everything after the deleted email has to be rewritten. Incremental backups don't really work, as any change modifies the one large file. The file is simply a concatenation of the emails, so even counting the number of emails in it, let alone extracting their subjects, requires reading the whole thing. Locking is essential if two processes might try to modify the same folder simultaneously.
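A minimal Python sketch of why counting needs a full read: the first function scans every line for the "From " separator, and the second uses the standard library's mailbox.mbox class, which likewise scans the whole file to build its index before it can return any subjects. This assumes the common mbox variant in which body lines beginning "From " have been escaped (for example as ">From "); the function names are illustrative.

```python
import mailbox


def count_mbox_messages(path):
    """Count messages by looking for 'From ' separator lines; the whole file is read."""
    count = 0
    with open(path, "rb") as f:
        for line in f:
            if line.startswith(b"From "):
                count += 1
    return count


def mbox_subjects(path):
    """Return every Subject: header, via the standard library's mbox reader."""
    box = mailbox.mbox(path, create=False)
    try:
        # Building the table of contents also scans the entire file.
        return [msg["subject"] for msg in box]
    finally:
        box.close()
```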

The maildir format uses a single file per email, and one directory per email folder. It relies on the filesystem being good at acting as a database. Some filesystems become rather slow once a directory has a large number of files in it. The old Linux ext3 filesystem, if HTrees for directories are not enabled, starts to struggle at about 10,000 files per directory.
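By contrast, in maildir each email is a file in a folder's cur/ or new/ subdirectory, so counting is just a directory listing. The Python sketch below walks a maildir tree and reports folders large enough that the filesystem might start to struggle. The helper names are hypothetical, and the default threshold is simply the ext3 figure quoted above.

```python
import os


def count_maildir_files(folder):
    """Number of messages in one maildir folder: one file per email in cur/ and new/."""
    return sum(len(os.listdir(os.path.join(folder, sub))) for sub in ("cur", "new"))


def crowded_folders(root, threshold=10_000):
    """Yield (folder, count) for every maildir folder under root above threshold."""
    for dirpath, dirnames, _files in os.walk(root):
        # A directory containing cur/, new/ and tmp/ is treated as a maildir folder.
        if {"cur", "new", "tmp"} <= set(dirnames):
            count = count_maildir_files(dirpath)
            if count > threshold:
                yield dirpath, count
```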

Flags

An email consists of a body, some headers, and some flags. The flags are relevant to the server, and are not intended to be sent when the email is forwarded or even downloaded. They include things such as whether the email has been read (or, conversely, whether it is new/unread), whether the email has been answered, forwarded, etc.

One immediate issue is that different formats and protocols define different ranges of flags. IMAP defines six system flags: Seen (read), Answered, Flagged, Deleted, Draft, and Recent. Many IMAP implementations also support the $Forwarded keyword. In the mbox format, flags are stored as extra header lines called "Status:", "X-Status:" and "X-Keywords:". The forwarded flag ends up in the X-Keywords: header, and this will not be recognised unless the first email in each mbox file contains an "X-IMAP" header which lists the keywords in use.
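Python's standard mailbox module reads the Status: and X-Status: headers through mboxMessage.get_flags(), which returns letters such as R (read), O (old), D (deleted), F (flagged) and A (answered). The sketch below translates those letters into IMAP flag names and adds any keywords found in X-Keywords:. It assumes keywords are separated by spaces or commas, and it does not check for the X-IMAP header mentioned above; the function names are hypothetical.

```python
import mailbox

# Letters returned by mailbox.mboxMessage.get_flags(); 'O' (old, i.e. not recent) is ignored.
MBOX_TO_IMAP = {"R": "\\Seen", "A": "\\Answered", "F": "\\Flagged", "D": "\\Deleted"}


def imap_flags_from_mbox(msg):
    """Translate an mboxMessage's Status/X-Status letters and X-Keywords into IMAP flags."""
    flags = {MBOX_TO_IMAP[c] for c in msg.get_flags() if c in MBOX_TO_IMAP}
    # Assumption: keywords are separated by spaces and/or commas.
    keywords = msg.get("X-Keywords", "")
    flags.update(keywords.replace(",", " ").split())
    return flags


def mbox_flag_report(path):
    """Yield (subject, IMAP flags) for each message in an mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            yield msg["subject"], imap_flags_from_mbox(msg)
    finally:
        box.close()
```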

The maildir format stores the flags in the filename. It has a flag called Passed, which almost, but not quite, maps to Forwarded, but is also set if the message has been bounced. There are extensions to maildir, notably the Courier extension, which do add a genuine Forwarded flag.
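The maildir flags are the letters after ":2," at the end of a message's filename, for example 1634567890.M1P2.host:2,FRS. The following Python sketch maps the standard letters (D, F, P, R, S, T) to IMAP names, treating P as $Forwarded even though, as noted, that mapping is only approximate. The sep parameter is there because some implementations use a character other than ":" on systems where a colon is not allowed in filenames. The helper name is hypothetical, and lower-case keyword letters and extension flags are ignored.

```python
import os

# Standard maildir info letters; P (Passed) is only an approximation of $Forwarded,
# since it is also set for bounced messages.
MAILDIR_TO_IMAP = {
    "D": "\\Draft",
    "F": "\\Flagged",
    "P": "$Forwarded",
    "R": "\\Answered",
    "S": "\\Seen",
    "T": "\\Deleted",
}


def imap_flags_from_maildir_name(filename, sep=":"):
    """Return the IMAP flags encoded after '<sep>2,' in a maildir filename."""
    base = os.path.basename(filename)
    _, marker, letters = base.rpartition(sep + "2,")
    if not marker:
        return set()  # e.g. a message still in new/ with no info suffix
    return {MAILDIR_TO_IMAP[c] for c in letters if c in MAILDIR_TO_IMAP}
```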

IMAP Downloaders / Migrators

A brief summary of some of the tools I tried.

OfflineIMAP

GPL v2, and Python 2.x, with a Python 3 port still in development. Versions after 7.2.4 also require the rfc6555 Python module. The configuration file syntax is a little awkward as well, so I gave up on this.

imapbackup3

MIT, and Python 3.x. Very easy to install with pip. Will download from IMAP to mbox or maildir, but will not synchronise two IMAP accounts. Makes no attempt to preserve any flags either, so information about which emails you have answered is lost.

Beware that Python's own mail-handling libraries (which this calls) may have bugs. I was able to trigger a "KeyError: 'content-transfer-encoding'" error from line 558 of python3.8/email/message.py, but I cannot reduce this to an example which can reasonably be filed as a bug report.

If you want to try preserving flags with this, a modified version of imapbackup.py might be a good starting point.
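A minimal sketch of the idea, written in Python with only the standard imaplib and mailbox modules rather than imapbackup's own code: fetch each message together with its FLAGS, translate those into maildir letters, and store the message with them. The host, credentials, folder and destination are placeholders, backup_folder and maildir_flags are hypothetical names, and error handling, incremental runs and folder-name quoting are handled only crudely.

```python
import imaplib
import mailbox

# IMAP flag -> maildir letter ($Forwarded -> P is only approximate).
IMAP_TO_MAILDIR = {
    b"\\Draft": "D",
    b"\\Flagged": "F",
    b"$Forwarded": "P",
    b"\\Answered": "R",
    b"\\Seen": "S",
    b"\\Deleted": "T",
}


def maildir_flags(fetch_response):
    """Translate the FLAGS (...) part of a FETCH response into maildir letters."""
    flags = imaplib.ParseFlags(fetch_response)
    return "".join(sorted(IMAP_TO_MAILDIR[f] for f in flags if f in IMAP_TO_MAILDIR))


def backup_folder(host, user, password, folder, dest):
    """Copy one IMAP folder into a new local maildir, keeping flags (hypothetical helper)."""
    # dest should not exist yet (its parent must); Maildir then creates cur/, new/ and tmp/.
    box = mailbox.Maildir(dest, create=True)
    with imaplib.IMAP4_SSL(host) as imap:
        imap.login(user, password)
        # imaplib does not quote names; this assumes no '"' inside the folder name.
        typ, _ = imap.select('"' + folder + '"', readonly=True)
        if typ != "OK":
            raise RuntimeError(f"cannot select {folder!r}")
        typ, data = imap.uid("SEARCH", None, "ALL")
        for uid in data[0].split():
            # PEEK, like the read-only select, leaves \Seen untouched on the server.
            typ, resp = imap.uid("FETCH", uid, "(FLAGS BODY.PEEK[])")
            header, raw = next(p for p in resp if isinstance(p, tuple))
            # Some servers send FLAGS after the body, so look in the trailing parts too.
            trailer = b" ".join(p for p in resp if isinstance(p, bytes))
            msg = mailbox.MaildirMessage(raw)
            letters = maildir_flags(header + b" " + trailer)
            if letters:
                msg.set_flags(letters)
                msg.set_subdir("cur")
            box.add(msg)
```

Something like backup_folder("imap.hermes.cam.ac.uk", "spqr1", password, "INBOX", "Hermes_INBOX") would then write a maildir called Hermes_INBOX in the current directory.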

isync / mbsync

GPL v2 and C. Will synchronise two IMAP accounts, or download to maildir. Version 1.4.0 is needed if one wishes to preserve the $Forwarded flag (standard IMAP flags are preserved in earlier versions). Ubuntu currently (21.04) ships with 1.3.x. Version 1.4.0 deprecates master / slave terminology, which was required in previous configuration files.

The naming is confusing. It used to be called isync, but a significant change to the command-line syntax caused the executable to be renamed mbsync. The name of the project remains isync.

The configuration file syntax is not ideal, so below I show an example of copying from one IMAP server to another, using the now-deprecated syntax.

# If writing to a local maildir, one may wish to suppress fsync()s
FSync no

# Define a store
IMAPStore Hermes
Host imap.hermes.cam.ac.uk
User spqr1
SSLType IMAPS

# Define another store
IMAPStore MB
Host imap.example.com
# Some servers like @s in usernames
User fred@mydomain.com
SSLType STARTTLS
# Prefix to be used to find/create folders
Path INBOX.

# Define a channel which links two stores
Channel test
Master :Hermes:
Slave :MB:
# Do not delete at either end
Sync Pull
# When should non-existent folders be created?
Create Slave
CopyArrivalDate yes
# When both ends are IMAP, need somewhere local to store state
SyncState ~/.mail/imap-transfer
# What to synchronise. Omit for everything. If one just specifies
# exclusions (with !), one must also specify everything (%).
Patterns % !INBOX !spam

Then mbsync can be invoked as mbsync -V test, where -V makes it more verbose and test specifies the channel to run. If passwords are not given in the configuration file, it will prompt for them. It will run once, then exit.
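Moving a configuration like the one above to 1.4.0 means renaming the deprecated keywords. The Python sketch below does that textually, assuming the renames are Master to Far and Slave to Near, both as channel options and as values of Create, Remove and Expunge; check mbsync(1) for the version you actually have. Comments are left alone, and the function name is hypothetical.

```python
import re

# Assumed renames for isync 1.4.0: Master -> Far, Slave -> Near.
RENAMES = {"Master": "Far", "Slave": "Near"}
# Options whose values may name one side of the channel.
VALUE_OPTIONS = {"Create", "Remove", "Expunge"}


def modernise_mbsync_config(text):
    """Rewrite deprecated Master/Slave keywords in an mbsync configuration."""
    out = []
    for line in text.splitlines():
        m = re.match(r"(\s*)(\S+)(.*)", line)
        if m is None or m.group(2).startswith("#"):
            out.append(line)  # blank lines and comments are left untouched
            continue
        indent, key, rest = m.groups()
        if key in RENAMES:
            key = RENAMES[key]
        elif key in VALUE_OPTIONS:
            rest = re.sub(r"\b(Master|Slave)\b", lambda v: RENAMES[v.group(1)], rest)
        out.append(indent + key + rest)
    return "\n".join(out)
```

For the channel above, Master :Hermes: would become Far :Hermes:, and Create Slave would become Create Near.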

uw-mailutils

The University of Washington's mailutils package is based on c-client, the library which pine/alpine uses for IMAP and local file store support. It is available as a package (uw-mailutils) for Debian and Ubuntu, and presumably other Linux distributions.

Being based on c-client, it is highly compatible with alpine's way of doing things: it supports local mbox files, but not maildir. It offers both copying of individual folders and bulk copies (transfers) of many folders.

The syntax for copying a remote folder to the local computer is

$ mailutil -kw copy '{imap.hermes.cam.ac.uk/user=spqr1/ssl}remote_folder' dir/local_mbox

Note that the local destination will be interpreted with respect to one's home directory, not one's current directory.
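The remote name uses c-client's {server/flags}folder syntax. Here is a small Python sketch that assembles such names and shows where a relative local name ends up. The helper names are hypothetical, and only the /user= and /ssl flags used above, plus an optional :port, are covered.

```python
import os


def cclient_remote(host, user, folder="", ssl=True, port=None):
    """Build a c-client remote name such as {imap.example.com/user=fred/ssl}INBOX."""
    server = host if port is None else f"{host}:{port}"
    flags = "/user=" + user + ("/ssl" if ssl else "")
    return "{" + server + flags + "}" + folder


def cclient_local_path(name, home=None):
    """Where mailutil puts a local mailbox name: relative to HOME, not the cwd."""
    if home is None:
        home = os.environ["HOME"]
    return name if os.path.isabs(name) else os.path.join(home, name)
```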

Bulk transfers to perform uploads are tricky, as the full name of the local folder is replicated at the remote end. So supposing one has

$ ls Hermes_test/*
Hermes_test/Test18  Hermes_test/Test19

then

$ mailutil -kw transfer Hermes_test '{imap.hermes.cam.ac.uk/user=spqr1/ssl}'

will prompt for a password, and create a folder called Hermes_test on the remote server. That folder will then contain two subfolders. Probably not what was wanted. So the following, undocumented, and therefore surely unreliable, incantation may be more helpful.

$ HOME=${HOME}/Hermes_test mailutil -kw transfer -m prompt '%' '{imap.hermes.cam.ac.uk/user=spqr1/ssl}'

It appears that mailutil determines one's home directory from the value of the environment variable HOME, and that a percent sign is a useful placeholder wildcard character. (I don't understand why a period fails here, as does a forward slash.)
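The same incantation can be driven from Python. Passing the arguments as a list avoids the shell, so the % needs no quoting, and HOME is overridden only for the child process. This is a sketch under the same caveats as the shell version: transfer_command and run_transfer are hypothetical names, and mailutil will still prompt on the terminal for the password.

```python
import os
import subprocess


def transfer_command(local_root, remote, merge="prompt", home=None):
    """Build argv and environment for the HOME-override bulk upload."""
    if home is None:
        home = os.environ["HOME"]
    # Equivalent to HOME=${HOME}/Hermes_test in the shell version (an absolute local_root is used as-is).
    env = dict(os.environ, HOME=os.path.join(home, local_root))
    # '%' is passed as a literal argument, so no shell quoting is needed.
    argv = ["mailutil", "-kw", "transfer", "-m", merge, "%", remote]
    return argv, env


def run_transfer(local_root, remote, merge="prompt"):
    """Run mailutil with the modified environment; raises if it exits non-zero."""
    argv, env = transfer_command(local_root, remote, merge)
    return subprocess.run(argv, env=env, check=True)
```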

Unless the destination is completely empty, some form of -m option will be necessary. Otherwise the transfer will fail as soon as it encounters a folder which exists on both source and destination. That folder is probably called "Sent", or maybe "Drafts".

imapsync.pl

The imapsync project (see also imapsync at GitHub) gets many recommendations. It is written in Perl, and offers Windows and macOS binaries too. It copies from IMAP to IMAP only; it does not support local files. I have heard from people who have used it for large projects and are very satisfied with it. (By the time I came across it, I was interested in copying to and from local files, so I have not tested it.)

Although imapsync is free, paid-for support is also offered.