wissel.net

Usability - Productivity - Business - The web - Singapore & Twins

A filtering proxy server with vert.x


Scenario

You have this nice application running in your (cloud or on premises) environment and then a big scare hits. Suddenly you need to remove or mask different streams of data depending on all sorts of conditions your legal department is advising (read: torturing) you about.

Until your application can do that natively, you might resort to a content filter that sits as a proxy between you and the application (technically it is a reverse proxy, but that's fine print).

To explore the feasibility of such an approach I created SampleProxy, based on the work of Julien Viet, using vert.x as my runtime environment.

Requirements

  • Needs to be a content filter, not a URL blocker
  • Needs to provide functionality for practical use out-of-the-box, but needs to be extensible (configuration over code)
  • Needs to be able to filter HTML, JSON, XML and text. No need to filter binary formats. Still contemplating JavaScript (you could use the text filter for that)
  • Filter based on mime-type and URL as standard, but extensible to use anything in the request or reply to decide what to filter
  • Configurable FilterChain: a filter decides what to filter (with the mime-type as minimum condition) and hands the actual filter operation to a chain of subfilters that do the stream manipulation
  • Configurable subfilters. E.g. a filter that can remove JSON nodes from JSON data should read the qualifier from a configuration, so the same filter class can be reused for different filter purposes
  • CSS isn't on the radar yet, but contributions would be happily accepted
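
To make the filter and subfilter split more concrete, here is a minimal sketch in plain Java. It is not the SampleProxy code: the names FilterChainSketch, Filter, SubFilter, maskAll and process are all hypothetical, and it assumes the body has already been collected into a String. The filter checks mime-type and URL prefix, then hands the body to its chain of configured subfilters.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of a filter that delegates to configurable subfilters. */
final class FilterChainSketch {

    /** A subfilter transforms the collected body (illustrative name, not from SampleProxy). */
    interface SubFilter extends UnaryOperator<String> {}

    /** A filter decides by mime-type and URL prefix whether its chain applies. */
    static final class Filter {
        private final String mimeType;
        private final String urlPrefix;
        private final List<SubFilter> chain;

        Filter(String mimeType, String urlPrefix, List<SubFilter> chain) {
            this.mimeType = mimeType;
            this.urlPrefix = urlPrefix;
            this.chain = chain;
        }

        boolean matches(String contentType, String url) {
            // Content-Type may carry parameters such as "; charset=utf-8"
            String bare = contentType.split(";", 2)[0].trim();
            return mimeType.equalsIgnoreCase(bare) && url.startsWith(urlPrefix);
        }

        String apply(String body) {
            String result = body;
            for (SubFilter subFilter : chain) {
                result = subFilter.apply(result);
            }
            return result;
        }
    }

    /** One subfilter implementation, reusable for different purposes via its parameters. */
    static SubFilter maskAll(String regex, String replacement) {
        return body -> body.replaceAll(regex, replacement);
    }

    /** Applies the first matching filter; unmatched content passes through untouched. */
    static String process(List<Filter> filters, String contentType, String url, String body) {
        for (Filter filter : filters) {
            if (filter.matches(contentType, url)) {
                return filter.apply(body);
            }
        }
        return body;
    }

    public static void main(String[] args) {
        // Assumed sample configuration: mask eMail addresses in text served under /api/
        SubFilter maskMail = maskAll("[\\w.+-]+@[\\w.-]+\\.[a-zA-Z]{2,}", "***");
        List<Filter> filters = List.of(new Filter("text/plain", "/api/", List.of(maskMail)));
        System.out.println(process(filters, "text/plain; charset=utf-8", "/api/posts",
                "Contact jane@example.com now")); // prints "Contact *** now"
    }
}
```

In a real setup the regex and replacement would come from a configuration file rather than from code, which is the point of "configuration over code".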

The flow

[Figure: flow from browser to proxy to application and back]

Things I learned along the way

There are always a few lessons to be had, here are some from this project:

  • HTTP is a chunked beast. When you send larger amounts of content, the probability approaches 1 that your server uses chunked transfer encoding, until HTTP/2 relieves us of it. A hard choice needs to be made: either process each chunk as a stream (think SAX), or collect the chunks first so you can process a DOM. To be fully flexible I opted for a DOM/Object based approach, but you are free to create whatever you deem necessary
  • Jsoup is a reliable HTML parser. It supports CSS selectors that make addressing HTML elements a breeze. Solves one of the hardest problems: targeting
  • Targeting JSON data is much harder than it needs to be, the very moment arrays appear in your JSON structure. There is RFC 6901 JSON Pointer, but it targets exactly one element, while a typical use case would be: "from the list (array) of discussion posts, pick the list of comments and mask those that have an eMail". So I implemented 2 variations: a simple path style address /discussion/posts/comments/email, which automatically traverses arrays, and an XPath based approach where I convert JSON to a strict XML syntax and back. More detail here, examples in a future post
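
The path style addressing from the last point can be sketched in a few lines. This is an illustration only, not the SampleProxy implementation: it assumes the JSON has already been parsed into plain java.util Maps and Lists (as most JSON libraries can do), and the class name PathMaskSketch and method mask are made up for this example. Arrays are treated as transparent, so the same path segment is applied to every element.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: mask values by a simple path that walks through arrays. */
final class PathMaskSketch {

    /** Masks every value reachable via a path such as /discussion/posts/comments/email. */
    static void mask(Object root, String path, String replacement) {
        String[] segments = path.replaceFirst("^/", "").split("/");
        mask(root, segments, 0, replacement);
    }

    @SuppressWarnings("unchecked")
    private static void mask(Object node, String[] segments, int depth, String replacement) {
        if (node instanceof List) {
            // Arrays do not consume a path segment: visit each element with the same segment
            for (Object item : (List<Object>) node) {
                mask(item, segments, depth, replacement);
            }
        } else if (node instanceof Map) {
            Map<String, Object> map = (Map<String, Object>) node;
            String key = segments[depth];
            if (!map.containsKey(key)) {
                return; // e.g. a comment without an eMail: nothing to mask
            }
            if (depth == segments.length - 1) {
                map.put(key, replacement); // replaces the whole value, even if it is an array
            } else {
                mask(map.get(key), segments, depth + 1, replacement);
            }
        }
        // Scalars in the middle of a path are ignored
    }

    public static void main(String[] args) {
        // Assumed sample data: one post with two comments, only one of them has an eMail
        Map<String, Object> withMail = new LinkedHashMap<>();
        withMail.put("text", "Nice post");
        withMail.put("email", "jane@example.com");
        Map<String, Object> withoutMail = new LinkedHashMap<>();
        withoutMail.put("text", "Anonymous remark");

        List<Object> comments = new ArrayList<>();
        comments.add(withMail);
        comments.add(withoutMail);
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("comments", comments);
        List<Object> posts = new ArrayList<>();
        posts.add(post);
        Map<String, Object> discussion = new LinkedHashMap<>();
        discussion.put("posts", posts);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("discussion", discussion);

        mask(root, "/discussion/posts/comments/email", "***");
        System.out.println(root);
    }
}
```

The trade-off compared to RFC 6901 is that the path no longer identifies exactly one element, which is what a masking use case usually wants.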

Items on the ToDo list

  • Better documentation
  • Code cleanup
  • Tests
  • Deploy to Heroku button
  • More filters

Your turn

Go check it out and let me know what you think! (Yeah - documentation needs some work).

Caveat (a.k.a. disclaimer): this is a prototype and a work in progress, YMMV!


Posted on 04 March 2018 | categories: Heroku, Java, vert.x
