eMail archival in PDF and electronic record keeping
The question pops up quite regularly: "Our compliance department has decided to use PDF/A for long-term record storage, how can I save my eMail to it?" (The question applies to ALL eMail systems.) The short answer: it is not as easy as you think. The biggest obstacle is legal need vs. user expectation. To make that clear: I'm not a lawyer, this is not legal advice, just my opinion; talk to your legal counsel before taking action. User expectation (and thus problem awareness): "Storing as PDF is like storing on paper, so what's the big deal?" In reality, electronic record keeping has a few different requirements (and NO, printing an eMail as seen on screen is NOT record keeping; more on this in a second). Every jurisdiction has its own regulations, but they are strikingly similar (for the usual devil in the details ask your lawyer), so I will take Singapore's Electronic Transactions Act as a sample:
Retention of electronic records
9. —(1) Where a rule of law requires any document, record or information to be retained, or provides for certain consequences if it is not, that requirement is satisfied by retaining the document, record or information in the form of an electronic record if the following conditions are satisfied:
(a) the information contained therein remains accessible so as to be usable for subsequent reference;
(b) the electronic record is retained in the format in which it was originally generated, sent or received, or in a format which can be demonstrated to represent accurately the information originally generated, sent or received;
(c) such information, if any, as enables the identification of the origin and destination of an electronic record and the date and time when it was sent or received, is retained; and
(d) any additional requirements relating to the retention of such electronic records specified by the public agency which has supervision over the requirement for the retention of such records are complied with.
(colour emphasis mine)
So there is "more than meets the eye". An eMail record is only completely kept if you keep the header information. Now you have 2 possibilities: change the way you "print" to PDF to include all header / hidden fields (probably at the end of the message), or use PDF capabilities to keep them accessible as PDF properties. The latter case is more interesting since it resembles the user experience in your mail client: users don't see the "techie stuff", but it is a click away to have a peek. There are a number of ways to create the PDF:
- Use a commercial package like DominoPDF, AGE Exporter or IntelliPrint that is capable of generating PDF directly. They have limitations in what they can output around signatures and encryption. Big advantage: vendor support
- Use a PDF printer driver that can be programmed to automatically assign a known file name. Advantage: works like printing. Disadvantage: depends on the OS printing system
- Export your eMail as MIME or DXL and use an XSLT transformation to generate XSL-FO that can be rendered as PDF. There is Apache FOP and a series of commercial tools. Advantage of this approach: you could have more than one pipeline (e.g. email, Notes apps, client apps, web apps) that ends at the XSL-FO processor. Disadvantage: XSL-FO is simply a beast
- Generate the PDF programmatically using a Java library. Two open source libraries are quite popular: iText and PDFBox. iText is Affero GPL licensed, so it might not be suitable for your project. PDFBox is licensed under the Apache License
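To give the XSLT route from the list above a face, here is a minimal sketch that uses only the XSLT processor shipped with the JDK. The input format (a `<mail>` element with `<subject>`, `<body>` and `<header name="...">` children) is made up for illustration; real DXL or MIME looks different. The class name `MailToFo` and the stylesheet are hypothetical as well. The resulting XSL-FO would then be handed to a processor such as Apache FOP for rendering, which is not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Sketch: turn a (hypothetical) XML export of a mail into XSL-FO. */
final class MailToFo {

    // Hypothetical stylesheet: subject and body first, all headers in small
    // print at the end. Real DXL uses its own namespace and element names.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/mail'>"
      + "<fo:root>"
      + "<fo:layout-master-set>"
      + "<fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm'>"
      + "<fo:region-body/>"
      + "</fo:simple-page-master>"
      + "</fo:layout-master-set>"
      + "<fo:page-sequence master-reference='A4'>"
      + "<fo:flow flow-name='xsl-region-body'>"
      + "<fo:block font-weight='bold'><xsl:value-of select='subject'/></fo:block>"
      + "<fo:block><xsl:value-of select='body'/></fo:block>"
      + "<xsl:for-each select='header'>"
      + "<fo:block font-size='8pt'><xsl:value-of select='@name'/>: <xsl:value-of select='.'/></fo:block>"
      + "</xsl:for-each>"
      + "</fo:flow>"
      + "</fo:page-sequence>"
      + "</fo:root>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Applies the stylesheet to the mail XML and returns the XSL-FO as a String. */
    static String toFo(String mailXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(mailXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String mail = "<mail><subject>Hello</subject><body>Some text</body>"
                    + "<header name='From'>alice@example.com</header></mail>";
        System.out.println(toFo(mail));
    }
}
```

The point of the design is that only the stylesheet knows about the source format, so a second pipeline (say, for a web app) would need its own stylesheet but could share everything from the XSL-FO step onwards.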
Sample 1: Store all regular Notes fields as custom properties in PDF
import java.util.Vector;

import lotus.domino.Document;
import lotus.domino.Item;
import lotus.domino.NotesException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

/**
 * Copies all regular Notes items (no rich text, attachments or embedded
 * objects) into the PDF document information as custom properties.
 *
 * @param pDoc The PDF Document that will receive the meta data
 * @param nDoc The Notes Document that holds the data
 * @throws NotesException
 */
@SuppressWarnings("unchecked")
public void saveNoteToMeta(PDDocument pDoc, Document nDoc) throws NotesException {
    Vector<Item> allItems = nDoc.getItems();
    PDDocumentInformation info = pDoc.getDocumentInformation();
    for (Item curItem : allItems) {
        // TODO: exclude more items
        int type = curItem.getType();
        if (type != Item.RICHTEXT && type != Item.ATTACHMENT && type != Item.EMBEDDEDOBJECT) {
            // Store the item as a name/value pair in the PDF properties
            String itemName = curItem.getName();
            String itemValue = curItem.getValueString();
            info.setCustomMetadataValue(itemName, itemValue);
        }
    }
}
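Since the question applies to all eMail systems, the same idea works without Notes: collect the raw message headers and store them as name/value pairs. The following is a minimal sketch, not a full RFC 5322 parser. The class `MailHeaders` and its behaviour (unfolding continuation lines, joining repeated headers like Received with a newline, treating header names as case sensitive) are my assumptions for illustration. Each entry of the resulting map could then be passed to `setCustomMetadataValue` the way Sample 1 does for Notes items.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: collect raw mail headers so they can be stored as PDF properties. */
final class MailHeaders {

    /**
     * Parses the header block of a raw message, i.e. everything before the
     * first empty line. Folded (continuation) lines are unfolded; repeated
     * headers are joined with a newline so that none of them gets lost.
     */
    static Map<String, String> parse(String rawMessage) {
        Map<String, String> headers = new LinkedHashMap<>();
        String currentName = null;
        for (String line : rawMessage.split("\r?\n")) {
            if (line.isEmpty()) {
                break; // the first blank line separates headers from the body
            }
            char first = line.charAt(0);
            if ((first == ' ' || first == '\t') && currentName != null) {
                // Continuation of the previous header: append to its value
                headers.merge(currentName, line.trim(), (a, b) -> a + " " + b);
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" line, skip it in this sketch
            }
            currentName = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            // Keep every occurrence of a repeated header
            headers.merge(currentName, value, (a, b) -> a + "\n" + b);
        }
        return headers;
    }

    public static void main(String[] args) {
        String raw = "From: alice@example.com\r\n"
                   + "To: bob@example.com\r\n"
                   + "Date: Wed, 26 Jan 2011 10:00:00 +0800\r\n"
                   + "\r\n"
                   + "Body text";
        parse(raw).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

From, To and Date are the fields that clause (c) of the act quoted above is after; keeping all the other headers as well costs little and saves a discussion later.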
Time permitting, I will publish more samples for meta data storage: Sample 2: Store all regular Notes fields as XML in PDF, Sample 3: Store the entire Note as XML in PDF and Sample 4: Store the Note as MIME entries.
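Until then, here is a rough idea of what the XML for the "fields as XML" variant could look like. This is only a sketch and not one of the promised samples: the element names `note` and `item`, the class `FieldsToXml` and the use of a plain `Map` instead of Notes items are all assumptions for illustration. Attaching the resulting XML to the PDF is deliberately left out.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Sketch: serialize field name/value pairs into a small XML document. */
final class FieldsToXml {

    /** Writes one <item name="..."> element per field; values get XML-escaped. */
    static String toXml(Map<String, String> fields) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("note");
            for (Map.Entry<String, String> field : fields.entrySet()) {
                writer.writeStartElement("item");
                writer.writeAttribute("name", field.getKey());
                writer.writeCharacters(field.getValue());
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("Subject", "Hello & goodbye");
        fields.put("From", "alice@example.com");
        System.out.println(toXml(fields));
    }
}
```

Compared to Sample 1, one XML block keeps all fields together in a single place, at the price that they no longer show up individually in the PDF properties dialog.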
As usual YMMV
Posted by Stephan H Wissel on 26 January 2011 | Comments (3) | categories: Show-N-Tell Thursday Software