wissel.net

Usability - Productivity - Business - The web - Singapore & Twins

Loading HTML or XML Content in LotusScript over HTTP


Your application needs data that is stored on a web server. If that data is available through a web service, you are lucky: web service clients have been supported in LotusScript since R8. If you want to load data from a plain URL, you are out of luck. Typically you would resort to ActiveX and use the IE component to do the retrieval, which introduces three evils: a Windows dependency, an IE dependency and an ActiveX dependency. The other way is to use Java, which turns a lot of LotusScript developers off. The solution is a ready-made library that wraps all the Java you need into a convenient LotusScript class. My use case was to read HTML from a remote site and return a specific table for further processing, so my class has an XPath parameter that lets you slice out part of the returned HTML. This is how you would use it in LotusScript:
%REM
    Agent UpdateHTMLOnChange
    Created May 28, 2010 by Stephan H Wissel
    Description: Reads all documents that have been flagged
    as changed and retrieves the update HTML
%END REM

Option Public
Option Declare


Use "HTTPUpdatesLS"

   
Sub Initialize
    Dim updateClass As HTTPUpdates
    Set updateClass = New HTTPUpdates
    Call updateClass.UpdatePendingDocuments()
    Set updateClass = Nothing
End Sub
 
And this is the complete LotusScript class:
%REM
    Library HTTPUpdatesLS
    Created May 29, 2010 by Stephan H Wissel
    Description: Wrapper Class around the LS2J Classes
    for HTTP driven updates
%END REM

Option Public
Option Declare

Use "HTTPUpdates"
UseLSX "*javacon"
Use "OpenLogFunctions"
%REM
    Class HTTPUpdates
    Description: LotusScript Wrapper around
    Java Class for HTTPUpdates
%END REM


Public Class HTTPUpdates
    'For the LS2JReader
    Private jSession   As JavaSession
    Private httpReaderClass As JavaClass
    Private httpReader As JavaObject
    Private s As NotesSession
    Private db As NotesDatabase
    Private viewName As String
    Private serverName As String
    Private dbName As String
   
    %REM
        Sub new
        Description: Initialize the class
    %END REM

    Public Sub New
        Call populateDefaults()
    End Sub
   
   
    %REM
        Sub UpdatePendingDocuments
        Description: Walks the pendingHTMLUpdates view
        and updates the HTML of every document in it
    %END REM

    Public Sub UpdatePendingDocuments

        Dim fullURL As String
        Dim v As NotesView
        Dim doc As NotesDocument
        Dim nextDoc As NotesDocument
       
        Set db = s.Currentdatabase
        Set v = db.Getview("pendingHTMLUpdates")

        'Now through the view

        Set doc = v.Getfirstdocument()

        Do Until doc Is Nothing
            Set nextDoc = v.Getnextdocument(doc)

            Call updateDocHTML(doc)

            Set doc = nextDoc
        Loop

        'And close it down
        Call httpReader.recycle()
       
    End Sub
   
   
    %REM
    Sub updateDocHTML
    Description: Here goes the update of one individual document
    %END REM

    Public Sub updateDocHTML(doc As NotesDocument)

        'We don't let one error derail us
        On Error GoTo Err_updateDocHTML

        Dim unid As String
        Dim result As String
        Dim htmlItem As NotesItem

        unid = doc.universalid
        'Here all the magic happens
        result = httpReader.getDocument(unid)

        If result <> "" Then 'We only save if we got something
            If doc.hasItem("FinalHTML") Then
                Call doc.removeItem("FinalHTML")
            End If
            Call doc.replaceItemValue("FinalHTML", result)
            Call doc.replaceItemValue("HTMLStatus", "1")
            Call doc.save(True, True)
        End If

Exit_updateDocHTML:
        Exit Sub

Err_updateDocHTML:
        Call logErrorEx(Error$, SEVERITY_HIGH, doc)
        Resume Exit_updateDocHTML

    End Sub
   
    %REM
    Function getRemoteHTML
    Description: get arbitrary HTML
    %END REM

    Public Function getRemoteHTML(url As String) As String

        'We don't let one error derail us
        On Error GoTo Err_getRemoteHTML

        getRemoteHTML = httpReader.getURL(url)

Exit_getRemoteHTML:
        Exit Function

Err_getRemoteHTML:
        Call logErrorEx(Error$, SEVERITY_HIGH, Nothing)
        getRemoteHTML = "<h3>Error:" + Error$ + "</h3>"
        Resume Exit_getRemoteHTML

    End Function
   
    %REM
        Sub populateDefaults
        Description: Populate default setting
        presuming the Notes server is known with
        its Common name in the DNS
    %END REM

    Private Sub populateDefaults
        Set s = New NotesSession
        Set db = s.Currentdatabase

        If db.Server = "" Then ' it runs local
            servername = "http://localhost"
        Else
            servername = Me.GetServerURL(db.Server)
        End If

        'We use the replicaid to be safe from
        'moving of databases and the peril of local nsf names
        dbName = "__" + db.Replicaid + ".nsf"
        viewname = "0"

        'Create an HTTPReader Instance
        Set jSession = New Javasession
        Set httpReaderClass = jSession.getClass("org.lotususers.tools.HTTPReader")
        Set httpReader = httpReaderClass.CreateObject()

        Call httpReader.setServerURL(servername)
        Call httpReader.setDatabaseURL(dbName)
        Call httpReader.setViewName(viewname)
        Call httpReader.setXPath("//body/*")
        Call httpReader.setUseSSO(False)
        'ToDo: Username & Password from a profile - don't hardcode here!
        If Not db.Server = "" Then
            'Call httpReader.setUserName("user")
            'Call httpReader.setPassWord("password")
        End If
       
    End Sub
   
    ' Cleanup
    Public Sub Delete
        On Error Resume Next
        If Not Me.httpReader Is Nothing Then
            Call httpReader.recycle()
        End If
       
    End Sub
   
    %REM
        Function GetServerURL
        Description: Gets the server name from the NAB
    %END REM

    Private Function GetServerURL(serverName As String) As String
        Dim domDir As NotesDatabase
        Dim sView As NotesView
        Dim doc As NotesDocument
        Dim n As NotesName
        Set domDir = New NotesDatabase(serverName, "names.nsf")
        If Not domDir.Isopen Then
            Call domDir.Open("", "")
        End If
        If Not domDir.isOpen Then
            Set n = s.Createname(serverName)
            GetServerURL = "http://" + n.Common
            Exit Function
        End If

        Set sView = domDir.Getview("($Servers)")
        Set doc = sView.Getdocumentbykey(serverName, True)
        If doc Is Nothing Then
            'No server document found: fall back to the common name
            Set n = s.Createname(serverName)
            GetServerURL = "http://" + n.Common
            Exit Function
        End If

        GetServerURL = "http://" + doc.Getitemvalue("NetAddresses")(0)

    End Function
   
   
End Class
Of course that class is useless without the Java class. Here you go: HTTPReader.java
You need some libraries as dependencies:
  1. Apache HTTP Client
  2. Apache Commons Logging (an HTTP Client dependency)
  3. HTML Cleaner, which transforms the HTML you read into well-formed markup so it can be parsed
As usual YMMV.
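
Since the Java source is only linked here, the following is a minimal sketch of the two ideas the LotusScript wrapper relies on, not the actual HTTPReader class: composing the document URL from server, replica ID, view and UNID, and clipping a fragment out of the markup with XPath. The class name ClipSketch, the methods documentUrl and clip, and the sample values are made up for illustration. It assumes the HTML is already well-formed (the job HTML Cleaner does in the real class) and uses only JDK classes, so the HTTP retrieval itself is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; the real HTTPReader may work differently.
public class ClipSketch {

    // Assumption: the reader joins the values set from LotusScript into
    // server/__REPLICAID.nsf/view/UNID, the usual Domino document URL shape.
    public static String documentUrl(String serverUrl, String databaseUrl,
                                     String viewName, String unid) {
        return serverUrl + "/" + databaseUrl + "/" + viewName + "/" + unid;
    }

    // Returns the markup of every node matched by the XPath expression,
    // concatenated in document order. Returns "" when nothing matches.
    // The input must be well-formed; raw web HTML usually needs cleaning first.
    public static String clip(String wellFormedHtml, String xpath) {
        try {
            Document dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedHtml)));

            NodeList hits = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, dom, XPathConstants.NODESET);

            // An identity transformer serializes each matched node back to text
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.METHOD, "xml");
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            for (int i = 0; i < hits.getLength(); i++) {
                serializer.transform(new DOMSource(hits.item(i)), new StreamResult(out));
            }
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not clip with " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Sample values are invented; a real replica ID and UNID come from Notes
        System.out.println(documentUrl("http://localhost",
                "__4825772100128C5E.nsf", "0", "0123456789ABCDEF0123456789ABCDEF"));

        String html = "<html><body><h1>News</h1>"
                + "<table><tr><td>1</td></tr></table></body></html>";
        // "//body/*" is the default the LotusScript class sets; "//table" narrows it
        System.out.println(clip(html, "//table"));
    }
}
```

Changing the expression passed to setXPath in the LotusScript class is then all it takes to return a different part of the page, for example a single table instead of the whole body.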

Posted on 01 December 2010 | Comments (6) | categories: Show-N-Tell Thursday

Comments

  1. posted by Peter on Thursday 02 December 2010 AD:
    Nice work, but don't forget that you can get a web page through pure LS too: { Link }

    /Peter

  2. posted by Peter on Thursday 02 December 2010 AD:
    Do you have a link to some technote or something saying that the web retriever has been discontinued? I'm still using it in 8.5.2...

    I definitely agree that there are several disadvantages in using it, but it also has some advantages.

  3. posted by Stephan H. Wissel on Thursday 02 December 2010 AD:
    @Peter
    the web retriever task has been discontinued in R7, the code breaks on clients easily, you need to modify your database design, doesn't work for XML and doesn't allow clipping.

    Too many disadvantages for pure LS. Even IBM suggested Java. I like the HTTP Client over bare bone URLConnection since it handles cookies, signon and redirections.
  4. posted by Stephan H. Wissel on Friday 03 December 2010 AD:
    The task is still there (fiercely backward compatible) but hasn't got any upgrade for a long long time. That's what I meant with "discontinued"
  5. posted by Hinshaw on Sunday 05 December 2010 AD:
    Awesome work,i think that there is nothing can be perfectible.
  6. posted by Mark Haller on Wednesday 07 November 2012 AD:
    Hi Stephan

    Love this post. Is this still your preferred method for retrieving a webpage in LS? Reason I ask - I'm fed up trying to fix msxml3 errors on my servers and need an alternate after using MSXML for a loooong time!

    Just followed you on Twitter (@LogicSpot)

    Would love some help soonest! :)

    Mark