Fabian Kostadinov

How I Created a Simple DBMS using GitHub, Jekyll, Prose and Heroku

Recently, I created a simple database management system including user rights management relying on GitHub, Jekyll, Prose, Heroku and a few other open source products. The basic idea is to store all data inside a _data directory in a GitHub repository. A user can access this data through a website (Prose) and manipulate it through a HTML form (JSONForm), but she needs to be authorized to do so (Jekyll-Auth). The solution also offers a simple, yet functional search functionality. Here is a list of products I relied upon:

First, a general understanding of Jekyll's data files feature is needed. Current versions of Jekyll server allow to create a directory named /_data. All valid CSV-, JSON- or Yaml-files added to this directory are queried by Jekyll. This allows one to create Liquid-templates that display aggregate information from all these files. These templates can easily be included in a standard HTML page. This is the basic functionality we rely upon.

I further installed Jekyll-DB as part of my website. According to its documentation it can loop through all posts in a _post directory, but Jekyll-DB works just as fine with data files kept in _data. Jekyll-DB is really easy to install and works well with a few hundred file entries. It's not very well suited for full text search in free text forms though that contain longer text entries. But for my purposes it was sufficient. Jekyll-DB is shown in the next picture. In the example, all entries containing the string "tomat" are filtered and displayed.

Jekyll-DB Screenshot

The generated HTML pages are part of a whole website that must be hosted by a web server. Of course we could simply put all our files in a GitHub repository's gh-pages branch. Jekyll running on GitHub would then host and serve the full website. This indeed works. The only problem is that I had to handle more sensitive data. Remember: Even if the repository itself is private, whatever is published into the gh-pages branch is available to the whole world. Thus, I could not rely on GitHub's internal Jekyll server. Of course I could have installed and run a Jekyll server somewhere on my own machine, but this would imply that I needed to run my own (virtual) server, secure the server environment, maintain it etc. Furthermore, for every update pushed to the GitHub repository I'd manually have to update the website's content on my server.
Fortunately, there's just another handy tool for exactly this situation: Jekyll-Auth. I have written an extensive article on how Jekyll-Auth works. Basically, Jekyll-Auth is a Gem file plus a collection of Ruby files that are added to your website which can be installed using Bundler. By calling bundle install a Jekyll server is installed on a Heroku account acting as a web server for your website. The deal is that whenever a person tries to access the website the user has to provide GitHub username and password. Jekyll running on Heroku then tries to authenticate and authorize the user by sending an authorization request to GitHub. GitHub checks whether the user is actually a member of a GitHub organization and team that is allowed to view and modify content stored in the GitHub repository. GitHub then returns an authorization token to Jekyll running on Heroku that either allows or denies access to the website.

Jekyll-Auth by itself is not updated when a user pushes changes to the GitHub repository. In earlier days we could have used something like Heroku Deployer relying on a webhook to achieve this, but nowadays there's built-in support to integrate Heroku with GitHub. With a few clicks in my Heroku's dashboard I can easily configure Heroku to update whenever a (push) event is triggered at my GitHub repository.

Let's just quickly recall what we have now. First, we can store CSV-/JSON-/Yaml- data files inside a _data directory and use Liquid templates to dynamically generate HTML output files aggregating data from all these data files. Second, we have a rudimentary search capability over all data files by using Jekyll-DB. Third, using a combination of private repository with Jekyll-Auth running on Heroku we can actually grant access to out website to only those users who are also members of our GitHub organization and team. Not bad.

What is still missing is a capability of easily manipulating data files. Of course we could simply do this either in our local clone of the GitHub repository and push our changes upstream. Or we could do this directly online through GitHub's website. For my users I wanted to have something more user-friendly. Usually users are presented with an input form containing text and number fields where they can manipulate the data and then simply press a Submit button to validate the entered data and send the changes to the server.

Because my data files were all in a JSON format, I decided to use a JSON schema aware input form generator tool like JSONForm or Alpaca Forms. I chose JSONForm but since I ran into a few bugs and shortcomings (e.g. missing multi-selection lists, some CSS problems when trying to integrate JSONForm into an existing HTML page with its own CSS) and because it is no longer actively developed I'd probably recommend something else now. The next picture shows a JSONForm-based input form embedded in the Prose editor. Currently, a file 101-tomaten-prod.json is opened.

JSONForm Prose Screenshot

One must be aware that these client-side input form frameworks can of course only provide client-side form validation, which is naturally limited by functionality and insecure. However, server-side validation would require a significantly more complicated solution when using Jekyll running on Heroku. For my purposes, client-side validation was enough.

JSONForm helped me to create input forms, but I still needed a basic framework where to embed these forms. The framework needs to accept user input, call JSONForm's validation routine, as well as add, commit and push data to the GitHub repository. This can be done either directly through the GitHub API or with the help of a Javascript library such as js-git or maybe the (outdated?) Git.js. However, I would probably have ended up still writing quite an amount of code myself. So, instead of just using a Javascript library, I downloaded Prose and tweaked it a little. Prose is basically an online file editor backed by GitHub. You can get the source code yourself and run it on your own server, or you can use the free online version running at Prose.io. My idea was that whenever a user would open a file that matches a certain file name pattern - for example *-blah.json - prose would not open the standard text editor but instead open an input form fitting on this file type backed by JSONForm. I must admit that my solution is quite an ugly hack, but it was not too easy for me to see through Prose's design as the documentation is, uhm, somewhat lacking details that would make your life a little easier. I ended up basically modifying only a few files:

As I said my code is quite a hack, but if anyone is interested in the source code I can provide it.

One should be aware that because Prose requires a running gatekeeper server that handles authorization and authentication with GitHub, I had to also install the gatekeeper server on Heroku. Thus, I ended up running two different server applications - Jekyll and Prose Gatekeeper.

Finally, I had everything in place. With Prose and JSONForm I provided a framework for data manipulation through user input forms. In combination with all the other things I finally had created a simple, yet effective database management system based on GitHub, Jekyll, Prose and Heroku.

If you want to try out yourself, we made the code publicly available under https://github.com/Eaternity/eaternity-database-public.

comments powered by Disqus