Archiveopteryx: database schema

This page is an introduction to the Archiveopteryx data model, and describes how it is used to store a typical message. There is also a much more detailed schema description.

As an example, we'll consider the delivery of the following message to nirmala@example.com.

Date: Wed, 29 Dec 2004 15:21:27 +0530
From: Abhijit Menon-Sen <ams@example.com>
To: Nirmala Candida <nirmala@example.com>
Subject: GPS map of my house
Message-ID: <20041229095127.GA20211@example.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="mYCpI+"

--mYCpI+
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

I've attached a PNG version of the map I plotted.

-- ams
--mYCpI+
Content-Type: image/png
Content-Disposition: attachment; filename="map.png"
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAAboAAAFoCAYAAAAhAT6h
AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJ
EwAAIABJREFUeNrtnUuzJMd5nt+vus9tLsAABEHwJpEQ

--mYCpI+--

We've truncated the image data and removed some bulky header fields (such as Received), for the sake of brevity. We'll examine every component of this message in detail below. (Each table name below is linked to its definition.)

Message delivery

When the server receives this message, it must first decide what to do with it. It looks up the recipient address in the aliases table to ensure that it is valid. From this it discovers both the mailbox (in mailboxes) to which the alias refers, and the user (in users) to which it belongs. The final decision is taken by the user's active Sieve script (defined in the scripts table), but the default action is to store the message in the mailbox corresponding to the alias (which is usually the user's inbox).

The storage of every message in Archiveopteryx begins with an entry in the messages table. This table contains little more than a "serial" primary key, and the tables that store components of the message (header fields, bodyparts, etc.) all refer to the message using its unique messages.id.

IMAP assigns a unique part number to every MIME body part in a message (our example has two: one text/plain, one image/png). Each body part is individually stored in bodyparts along with a hash of its contents, and the part_numbers table links messages with their constituent bodyparts. This allows us to store each unique bodypart only once, no matter how many messages it is attached to.

Depending on the content-type of the body part, the data may be stored in either the text (for text/plain) or data (for anything else) columns of bodyparts. For some types (e.g. HTML), a plain-text version of the data may also be stored as text to facilitate searching.

The message has a top-level header, and every body part may have its own MIME header. The position, field, and value of each header field is stored in header_fields, with references to the message and the relevant part number (which is empty for the RFC 822 header fields). The value is stored in this table as an unstructured string.

Header fields that contain email addresses (From/To/Cc etc.) are stored in address_fields instead. This table is like header_fields, but the value of each address in the field is stored as a reference to the addresses table, into which every unique address used in a message is first inserted. (So no address is stored more than once, no matter how often it is used.) There is also a date_fields table, where the value is stored as a timestamp in addition to (and not instead of, as is the case for address_fields) the string in header_fields.

Once the message itself is stored, it must be associated with the mailbox (or mailboxes) that it is being delivered into. This is done by inserting rows into mailbox_messages. In IMAP, a message is identified by a unique id within a mailbox. The uid is assigned from the uidnext value for the mailbox in mailboxes. Each row in mailbox_messages associates a message id with an IMAP (mailbox,uid). (So a message is stored only once, no matter how many mailboxes it is copied into.)

Within a mailbox, a message may have IMAP flags and annotations. Both tables associate a flag or annotation value with a (mailbox,uid) row. (Flags refer to the mailbox_messages row because they are a property of the message only as it exists within a mailbox. A message has the same headers in every mailbox, but it may be marked as \Flagged in the inbox, while being \Seen and \Answered in a different mailbox.)

When a user wants to deletes a message from a mailbox, the server first checks whether the retention_policies allows that. If it does, (mailbox,uid,message) row is moved from mailbox_messages to deleted_messages. Once there are no more references to a message id in mailbox_messages, the message (i.e. the row in messages, and the rows in bodyparts/header_fields/etc. that refer to it) may be permanently deleted. (This is what the "aox vacuum" command does.)

Schema upgrades

Newer releases of Archiveopteryx sometimes change the database schema, either to add new functionality or to fix bugs.

The current schema revision is stored in the mailstore table, and examined at startup by the server. If the revision is not current, the server won't start. The "aox upgrade schema" command executes the SQL commands necessary to upgrade an existing database to the latest version of the schema.

Very rarely, more extensive changes to the database are required, and these are done by running "aox update database" after installing and starting the new version.

Relevant links

In case you have any questions, please write to info@aox.org.

About this page

Last modified: 2010-11-19
Location: aox.org/schema