wissel.net

Usability - Productivity - Business - The web - Singapore & Twins

Mail archive with Apache CouchDB / IBM Cloudant - Part 1


Like it or not, your eMail has turned into the archive of your (working) past. One of the challenges with this archive is the tendency to switch eMail systems from time to time. IBM Notes won't open your Outlook PST file, nor will Outlook open your Notes NSF database.
So a vendor- and format-neutral solution is required. The obvious choice here is MIME: for one, it is the format any message crossing the internet is encoded in; secondly, all eMail applications support MIME - to some extent. Just storing each message in a directory structure isn't a good solution either, since navigation and search leave much to be desired, so some more work is needed.
Of course, open standards tend to be ambiguous enough to allow different interpretations or the implementation of proprietary extensions. MIME is no exception. You can send any type of attachment, including malicious payloads: MIME only defines how an attachment is encoded, while the attachment's own format is outside the MIME standard.
So, looking at an archival solution, here is my list of requirements:
  • Needs to be able to store MIME messages
  • MIME headers and other ID fields need to be captured in database fields
  • Needs to be able to sync to different locations for backup/availability
  • Needs to be able to provide navigation access via sorted, filtered lists
  • Interface to do some analytics
  • Full text search
  • HTML and text content should be displayed directly; all other types should be listed as attachments
  • Inline images (href / src in the HTML content pointing to other MIME parts) need to be dealt with
  • Import capabilities
  • Source code available for inspection, Open Source if possible
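To make the requirements above a bit more concrete, here is a minimal sketch of how one MIME message could be turned into a JSON-ready document: headers become fields, text and HTML parts become the displayable body, everything else is listed as an attachment, and the Content-ID is kept so that cid: references in the HTML can be resolved later. It uses Python's standard email package purely for brevity and is not the Java importer discussed in this post. All field names (type, message_id, body, attachments, ...) and the sample message are assumptions for illustration.

```python
import json
from email import message_from_string, policy

# Hypothetical sample: an HTML body that references an inline image by Content-ID.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly numbers
Date: Thu, 24 Dec 2015 10:00:00 +0800
Message-ID: <1234@example.com>
MIME-Version: 1.0
Content-Type: multipart/related; boundary="XYZ"

--XYZ
Content-Type: text/html; charset="utf-8"

<p>See chart: <img src="cid:chart1"></p>
--XYZ
Content-Type: image/png; name="chart.png"
Content-Transfer-Encoding: base64
Content-ID: <chart1>
Content-Disposition: inline; filename="chart.png"

iVBORw0KGgo=
--XYZ--
"""


def mime_to_doc(raw: str) -> dict:
    """Map a raw MIME message to a JSON-serialisable dict (assumed layout)."""
    msg = message_from_string(raw, policy=policy.default)
    doc = {
        "type": "mail",
        "message_id": str(msg["Message-ID"]),
        "subject": str(msg["Subject"]),
        "from": str(msg["From"]),
        "to": str(msg["To"]),
        # ISO 8601 keeps the date sortable as a plain string.
        "date": msg["Date"].datetime.isoformat(),
        "body": {},
        "attachments": [],
    }
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        ctype = part.get_content_type()
        is_text = ctype in ("text/plain", "text/html")
        if is_text and part.get_content_disposition() != "attachment":
            # Keep the first text/plain and the first text/html part for display.
            doc["body"].setdefault(ctype, part.get_content())
        else:
            payload = part.get_payload(decode=True) or b""
            cid = part["Content-ID"]
            doc["attachments"].append({
                "filename": part.get_filename(),
                "content_type": ctype,
                # Lets a viewer map <img src="cid:..."> back to this part.
                "content_id": str(cid).strip("<>") if cid else None,
                "length": len(payload),
            })
    return doc


doc = mime_to_doc(RAW)
print(json.dumps(doc, indent=2))
```

In a real importer the attachment bytes would be stored alongside the document rather than just measured; the sketch only records their length to stay short.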
Looking at the requirements, I concluded: I have a clear idea of what I want, but I haven't found it. The logical next step: let's build it.
The moving parts:
  • Apache CouchDB to store eMails and attachments. Also available in IBM Bluemix as the Cloudant database. CouchDB provides JavaScript-based Map-Reduce access to lists
  • Java to write the import capabilities for IBM Notes databases (obviously requires a running Notes client or server)
  • Ektorp to write to CouchDB from Java - it has all the nice abstractions for documents and attachments
  • The usual suspects for UI and logic: node.js, angular.js and Twitter bootstrap
  • Elasticsearch with the Logstash adapter for CouchDB
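The sorted and filtered lists would come from CouchDB views. As a rough sketch, a design document with two views might look like the following. The Python wrapper only assembles the JSON; the map functions themselves are JavaScript, as CouchDB expects. The design document name mail, the view names, the database name mailarchive and the document fields (type, date, subject, from) are assumptions, not something fixed by this post.

```python
import json

# Hypothetical design document: all names and fields are assumptions.
DESIGN_DOC = {
    "_id": "_design/mail",
    "language": "javascript",
    "views": {
        # Sorted list: view rows are ordered by the emitted key (the date).
        "by_date": {
            "map": "function (doc) {"
                   " if (doc.type === 'mail') { emit(doc.date, doc.subject); }"
                   " }"
        },
        # Simple analytics: number of mails per sender, using the
        # built-in _count reduce function.
        "by_sender": {
            "map": "function (doc) {"
                   " if (doc.type === 'mail') { emit(doc.from, null); }"
                   " }",
            "reduce": "_count",
        },
    },
}


def view_path(db: str, view: str) -> str:
    """Return the HTTP path under which CouchDB serves the given view."""
    return "/{}/{}/_view/{}".format(db, DESIGN_DOC["_id"], view)


print(json.dumps(DESIGN_DOC, indent=2))
print(view_path("mailarchive", "by_date"))
```

The design document would be saved like any other document (PUT to /mailarchive/_design/mail), and querying by_sender with group=true would return one count per sender.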
The sequence of export - import is:
[Figure: Exporting from different mail systems to JSON]
There are a number of design decisions to be considered for such a system:
  • Should the proprietary adapters stop with the extraction of the MIME document, or should they go all the way to the JSON format?
  • How many vendor-specific additional fields should be in the JSON?
  • How shall folders be handled? Outlook stays close to the idea of physical folders: an eMail can be in one and only one folder. GMail and IBM Notes follow a tagging approach: an eMail gets tagged with 0..n folders. Notes additionally knows categories
  • How shall embedded information be handled (e.g. OLE)?
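For the folder question, one possible answer (not a decision made here) is to always store folders as a list, so that Outlook's single physical folder and the 0..n tags of GMail or IBM Notes end up in the same shape. The function name normalize_folders and the input conventions are hypothetical.

```python
from typing import Iterable, List, Optional, Union


def normalize_folders(folders: Union[str, Iterable[str], None]) -> List[str]:
    """Return folders as a de-duplicated list, keeping the original order.

    Assumed inputs: a single folder name (Outlook style), an iterable of
    0..n names (GMail / IBM Notes style), or None for an unfiled message.
    """
    if folders is None:
        return []
    if isinstance(folders, str):
        folders = [folders]
    result: List[str] = []
    for name in folders:
        name = name.strip()
        if name and name not in result:
            result.append(name)
    return result


print(normalize_folders("Inbox/Projects"))
print(normalize_folders(["Inbox", "Customers", "Inbox"]))
print(normalize_folders(None))
```

With a list in every document, a single view keyed by folder name could serve all source systems alike.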
Stay tuned for updates

Posted on 24 December 2015 | categories: Bluemix IBM Notes
