18 Apr 2016
Many text mining libraries, packages and tools are available, a good number of them as freeware. Yet when it comes to putting it all together in an enterprise environment, there is surprisingly little information on the web. This article describes how I would design a general-purpose text mining engine that fits today's standard Java-stack enterprise environment and the typical problems one encounters there. Much of what I write below comes from hands-on experience with existing tools and the difficulties I ran into.
10 Jan 2016
I had assumed that reading from and writing to files in Apache Camel v2.16.1 would be straightforward. It turns out I was wrong. It took me quite a while to figure out the correct syntax of the
04 Oct 2015
Skill cartridges built with Luxid 7 usually contain a mix of customized and standard software artefacts. These can be data artefacts such as tailored vocabularies or taxonomies, syntactic or similar rules that extract certain types of entities, or a set of configuration files that parameterize the skill cartridge at hand. For this reason, skill cartridges must be treated as production code: they need a build and deployment process and must be checked into a version control system. The good news is that Temis has made it really easy to set up your own version of this process. The bad news is that, at least in Luxid 7.0.1, there does not seem to be any documentation on the corresponding tools.
21 Sep 2015
I wanted to know whether, and how, it is possible to embed R in a website. Looking around the internet, I found a few interesting initiatives, each dedicated to a slightly different purpose: RStudio, Shiny, Jupyter Notebook, RApache, OpenCPU and RAppArmor.
02 Sep 2015
As I was not able to find any tutorials on the web on how to use Temis Luxid 7.0.1 Webstudio, I decided to write my own. Luxid Webstudio is a tool intended for several use cases. One thing it does very well is to assist a taxonomy expert in building a new taxonomy or enriching an existing one with new terms. Furthermore, once a taxonomy is created it can be "plugged in" to the STF skill cartridge, which is then able to extract all the taxonomy terms from documents. By exporting this customized skill cartridge from Webstudio, you can deploy it to a dedicated annotation server running in your production environment. Some of Webstudio's functionality overlaps with the Eclipse-RCP based Luxid 7.0.1 Annotation Workbench; however, Webstudio is more comfortable to use. Only in some cases is it necessary to switch to Annotation Workbench, because it exposes even more functionality to the user than Webstudio does.