Yes, but I'm a data scientist...
Being a data scientist is no excuse for writing sloppy code. Yeah, I know that Java is not your first coding language, but you should really not write spaghetti code.
There are various text mining libraries, packages and tools available, many of them as freeware. Yet, when it comes to putting it all together in an enterprise environment, there is actually not too much information available on the web. This article is about how I would design a general-purpose text mining engine that is fit for today’s standard Java-stack enterprise environment and the typical problems one encounters in these environments. A lot of what I write below is from hands-on experience with existing tools and the typical difficulties I had.
I had assumed that reading from and writing to files in Apache Camel v2.16.1 would be straightforward. It turns out I was wrong. It took me quite a while to figure out the correct syntax of the from and to commands.
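For orientation, endpoints of Camel's file component are addressed by URIs of the form file:directoryName?option=value. The following is only a rough sketch of how such URIs can be put together, not the syntax the post eventually arrives at. The class FileEndpoints, the helper fileUri, the directories data/inbox and data/outbox, and the file name result.txt are hypothetical names chosen for illustration; noop and fileName are options of the file component.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that assembles URIs for Camel's file component. */
final class FileEndpoints {

    private FileEndpoints() {}

    /** Returns file:directory, followed by ?key=value&... when options are given. */
    static String fileUri(String directory, Map<String, String> options) {
        StringJoiner query = new StringJoiner("&");
        options.forEach((key, value) -> query.add(key + "=" + value));
        return options.isEmpty()
                ? "file:" + directory
                : "file:" + directory + "?" + query;
    }

    public static void main(String[] args) {
        // Consumer side: noop=true leaves the source files in place after reading.
        Map<String, String> readOptions = new LinkedHashMap<>();
        readOptions.put("noop", "true");

        // Producer side: fileName sets the name of the file that is written.
        Map<String, String> writeOptions = new LinkedHashMap<>();
        writeOptions.put("fileName", "result.txt");

        System.out.println(fileUri("data/inbox", readOptions));   // file:data/inbox?noop=true
        System.out.println(fileUri("data/outbox", writeOptions)); // file:data/outbox?fileName=result.txt
    }
}
```

Inside the configure() method of a Camel RouteBuilder, the two strings would then be passed to from(...) and to(...) respectively, which requires camel-core on the classpath and is therefore left out of the sketch.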
Skill cartridges built with Luxid 7 usually contain a mix of customized and standard software artefacts. These can be data artefacts such as tailored vocabularies or taxonomies, syntactic or similar rules that extract certain types of entities, or a set of configuration files that parameterize the skill cartridge at hand. For this reason, skill cartridges must be treated as production code: they need a build and deployment process and must be checked into a version control system. The good news is that Temis has made it really easy to set up your own version of this process. The bad news is that, at least in Luxid 7.0.1, there does not seem to be any documentation on the corresponding tools.
I wanted to know whether/how it is possible to embed R in a website. Looking around the internet I found a few interesting initiatives, each one dedicated to a slightly different purpose: RStudio, Shiny, Jupyter Notebook, RApache, OpenCPU and RAppArmor.