Archiveopteryx architecture

This page describes the architecture of Archiveopteryx. There is also a brief overview, if you don't need this level of detail.

Archiveopteryx consists of a set of server programs that provide access to mail stored in a relational database. The following diagram illustrates the externally-visible components of the system, and how a message from Alice to Bob flows through the system.

[Diagram: Architecture overview]

The life cycle of messages

In typical configurations, a standard MTA (such as Postfix) accepts incoming mail, performs any filtering necessary, and delivers the mail into Archiveopteryx via LMTP.

In the illustration, the MTA accepts a message from user Alice and delivers it to the LMTP server. (It may send the message via a virus scanner.)
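To make the hand-off concrete, here is a minimal sketch of what an LMTP delivery looks like from the client side, using Python's standard smtplib.LMTP class in place of a real MTA. The host, the port number and the function names are assumptions made for illustration; check your own configuration for the address the LMTP server actually listens on.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a simple message of the kind an MTA would hand over for delivery."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def deliver_via_lmtp(msg: EmailMessage, host: str = "127.0.0.1", port: int = 2026) -> dict:
    """Deliver msg over LMTP.

    host and port are illustrative assumptions, not documented defaults.
    """
    with smtplib.LMTP(host, port) as lmtp:
        # send_message() returns a dict of refused recipients,
        # which is empty when every recipient was accepted.
        return lmtp.send_message(msg)
```

In a real installation the MTA does this itself; for Postfix that is normally a transport setting rather than custom code.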

The LMTP server parses the message and stores its canonical representation in the central database. Invalid messages are corrected and accepted whenever possible, or rejected promptly. Once the message has been stored, all of the other server processes learn about the delivery.

At the time Alice's message is stored, Bob is reading mail using IMAP.

The IMAP server that's serving Bob learns about the mail when it's stored, and it tells Bob's MUA (mail reader) as soon as possible. When Bob's MUA asks to see a message, it is retrieved from the database. (We support all of IMAP's bandwidth-saving features, including partial fetches and compression, so good clients can access a large mailbox using very little bandwidth.)
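As an illustration of a partial fetch, the sketch below builds the IMAP FETCH data item a client would send to retrieve only part of a message, and passes it to Python's standard imaplib. The helper names and the 2048-octet default are assumptions chosen for the example; the `<offset.length>` syntax is the one defined by IMAP4rev1.

```python
import imaplib


def partial_fetch_item(section: str, offset: int, length: int) -> str:
    """Return a FETCH data item asking for `length` octets of `section` from `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    # BODY.PEEK reads the data without marking the message as seen.
    return f"(BODY.PEEK[{section}]<{offset}.{length}>)"


def fetch_first_octets(conn: imaplib.IMAP4, msg_num: str, length: int = 2048):
    """Fetch only the first `length` octets of the message text.

    `conn` must already be logged in and have a mailbox selected.
    """
    return conn.fetch(msg_num, partial_fetch_item("TEXT", 0, length))
```

A client showing a preview pane can request a couple of kilobytes this way instead of downloading a message with large attachments in full.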

If Bob deletes the message, it disappears from the mailbox, but it can be recovered until a background process removes it permanently a week later. (Depending on site policy, Bob may not be permitted to delete mail at all.)
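The delete-then-purge behaviour can be modelled as follows. This is a toy in-memory sketch, not the way Archiveopteryx stores deleted mail: the Mailbox class, its method names and the dictionaries are all hypothetical, and only the one-week retention period comes from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RETENTION = timedelta(days=7)  # "a week later", per the description above


@dataclass
class Mailbox:
    messages: dict = field(default_factory=dict)  # uid -> message text
    deleted: dict = field(default_factory=dict)   # uid -> (message text, deletion time)

    def delete(self, uid, now):
        """Hide the message from the mailbox but keep it recoverable."""
        self.deleted[uid] = (self.messages.pop(uid), now)

    def undelete(self, uid):
        """Restore a message that has not been purged yet."""
        text, _deleted_at = self.deleted.pop(uid)
        self.messages[uid] = text

    def purge(self, now):
        """Background step: permanently drop messages deleted at least a week ago."""
        expired = [uid for uid, (_text, when) in self.deleted.items()
                   if now - when >= RETENTION]
        for uid in expired:
            del self.deleted[uid]
        return expired
```

Passing the current time in explicitly keeps the sketch easy to test; a real background process would read the clock itself.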

Frontend servers

Mail is delivered into the system through an SMTP/LMTP server. Mail is retrieved from the system through an IMAP server. (A POP server is also available.)

These services are all handled by one or more archiveopteryx(8) daemon processes, each of which handles multiple client connections and manages a pool of persistent database connections to store or retrieve mail.
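The idea of a pool of persistent database connections can be sketched like this. It is a generic illustration rather than the daemon's actual implementation (which is written in C++): the ConnectionPool class is hypothetical, and SQLite stands in for the real database so that the example is self-contained.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed number of connections open and lend them out one at a time."""

    def __init__(self, size, connect):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # connections are opened once, up front

    def idle_count(self):
        return self._idle.qsize()

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every connection is in use (raises queue.Empty on timeout).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back to the pool instead of closing it


# Usage: many client requests share two long-lived connections.
pool = ConnectionPool(2, lambda: sqlite3.connect(":memory:"))
with pool.connection() as db:
    db.execute("SELECT 1").fetchone()
```

Reusing connections avoids paying the connection setup cost for every client request, which matters when one process serves many clients.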

These servers may all run on the same machine, or may be distributed over a cluster for increased capacity. Each service may be handled by an arbitrary number of hosts.

Administration tools

Archiveopteryx provides the aox command-line program for administration.

Database

Mail is stored in a PostgreSQL database, to which all other servers are connected. In the future, other SQL servers may also be supported if there's sufficient demand.

The database stores a normalized representation of each message. This makes retrieval much more efficient, at the cost of extra work during message delivery. Here's a detailed overview of the database schema, and the actual schema definition.
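The trade-off can be illustrated with a deliberately tiny example. The tables and function names below are hypothetical and far simpler than the real schema, and SQLite is used instead of PostgreSQL only so that the sketch runs without a server. Delivery has to parse the message and insert several rows; in return, each address is stored once and a search by sender becomes a simple join.

```python
import sqlite3
from email import message_from_string
from email.utils import getaddresses

# Hypothetical, simplified tables; not the actual Archiveopteryx schema.
SCHEMA = """
CREATE TABLE addresses (id INTEGER PRIMARY KEY, address TEXT UNIQUE NOT NULL);
CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT, body TEXT);
CREATE TABLE address_fields (
    message INTEGER NOT NULL REFERENCES messages(id),
    field   TEXT NOT NULL,
    address INTEGER NOT NULL REFERENCES addresses(id)
);
"""


def store(db, raw):
    """Parse a raw message and store it in normalized form (the expensive step)."""
    msg = message_from_string(raw)
    cur = db.execute("INSERT INTO messages (subject, body) VALUES (?, ?)",
                     (msg["Subject"], msg.get_payload()))
    message_id = cur.lastrowid
    for field in ("From", "To", "Cc"):
        for _name, addr in getaddresses(msg.get_all(field, [])):
            # Each distinct address is stored only once.
            db.execute("INSERT OR IGNORE INTO addresses (address) VALUES (?)", (addr,))
            (address_id,) = db.execute(
                "SELECT id FROM addresses WHERE address = ?", (addr,)).fetchone()
            db.execute("INSERT INTO address_fields VALUES (?, ?, ?)",
                       (message_id, field, address_id))
    return message_id


def subjects_from(db, address):
    """Retrieval needs no message parsing, only a join on the normalized tables."""
    rows = db.execute(
        "SELECT m.subject FROM messages m "
        "JOIN address_fields f ON f.message = m.id "
        "JOIN addresses a ON a.id = f.address "
        "WHERE f.field = 'From' AND a.address = ? ORDER BY m.id", (address,))
    return [subject for (subject,) in rows]
```

To try it, open a database with sqlite3.connect(":memory:"), run db.executescript(SCHEMA), store a few messages and call subjects_from() with a sender address.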

Source code architecture

Archiveopteryx is written in C++, using strict coding standards and Smalltalk naming conventions. As a matter of principle, the source code is kept simple in its C++ syntax, its choice of algorithms and the language features it uses.

The source code is fully documented, to make life simpler as the team changes and expands. At the time of writing, we have over 300,000 test cases of different types, and are always extending the number and variety of tests.

Very few libraries are used, for several reasons. Most importantly, libraries are difficult for us to test. Few come with any kind of test suite.

As a platform, only a small part of Unix is used. Archiveopteryx is probably easy to compile on almost all versions of Unix, since it depends on so few features.

Relevant links

If you have any questions, please write to info@aox.org.

About this page

Last modified: 2010-11-19
Location: aox.org/architecture