How simple is simple? (from MongoDB to Elasticsearch)

IT must be easy, oh yeah. After years of courses and university and everything else, all put together by the brightest educational minds, you should be able to program this new web application in a second, integrate external web services, add a new GUI and adapt your friend’s broken Excel import, all while cleaning up grandma’s computer for the hundredth time. Even kids are to be taught programming, so there can’t be much to it, right? How MANY times have I heard this, told in various ways: “it can’t be so difficult”… Everybody has a challenging job, but as soon as computers are involved everything should somehow become a breeze. How did we get into this???

Trouble

But I digress. So, first a bit of context: I thought it was high time I followed my own thinking from a month ago (see “Proof of concept, with a concept”) and actually migrated the proof of concept code from “Ractive.js and vert.x integration over the event bus” to Elasticsearch. Why Elasticsearch, you might ask? Well, there’s this schema-less requirement, as the data I need to store can look like anything, and my experience (and everybody’s) is that a SQL database is not exactly optimal when you can’t decide on your schema. I first took a shot at Cassandra (longer story), but when I realized I would need to implement all my flexible searching on top of a database which indexes only a few things, I got scared and the alternative became obvious: Elasticsearch.

My starting point was based on:

  • ractive.js in browser, as frontend
  • vert.x as platform, bus and whatever
  • MongoDB vert.x module for data storage backend

…and the hoped-for end state was:

  • ractive.js in browser, as frontend
  • vert.x as platform, bus and whatever
  • Elasticsearch vert.x module for data storage backend

Backend module swap… plug out, plug in, should be a breeze, right? If you have ever developed software with your own hands you already know the answer: heck no.

First of all, after simply including the vert.x module the whole thing wouldn’t start anymore. It also logged a funny exception in the Elasticsearch module: “NoSuchMethodException” for init(). Whaat?? Going back to RTFM for the module, I found nothing about that particular error, only a note that it needs *drumroll* dependency injection. How could I overlook that, such a huge hammer to fix a tiny vert.x module… so I felt compelled to invest some googling time, hoping to understand why DI was needed at all. Reading mailing lists revealed that the module creators had also worked on a project integrating Jersey with vert.x, and I think this dependency somehow came along as collateral damage. I may be wrong, as I neither checked the code itself nor asked the developers; that was already beyond my actual scope. So, HK2 in, error out. Good.

At this point I thought I’d install Gradle IDE in Eclipse to see what life is like without the “gradlew something” command line. Should I accept that “Do what the fuck you want public license”? What?! o_O After I recovered from the initial shock I clicked “accept”, installed the studio, imported my project… not worky. Eclipse Gradle won’t find mod.json, so it can’t handle vert.x modules properly. Hm, I don’t know whether Gradle for Eclipse is just as broken as M2E Maven for Eclipse, or whether this time it’s just the convoluted vert.x module system… at least from what I understand, the upcoming vert.x 3 will give up modules. Note to self: check Gradle IDE again later this year, and until then back to the command line.

After a few more hiccups (the HK2 dependency version was actually unpublished, Gradle kept grinding for minutes on end while successfully doing nothing even with correct proxy settings, my Git repository somehow got corrupted, and the documentation had some discrepancies), I finally managed to get the application stack to start. This means I got to the point of actually migrating the loading and initializing code, phew.

Changing the test data was straightforward, so I did that and wanted to see how Elasticsearch gets filled. Whoops, errors upon errors with MissingIndexException. It turns out that the vert.x ES module has no “put” support, so there’s no way to add the index programmatically if it doesn’t exist yet. It’s actually not so bad: indexing a test document will create the index if it is missing (which means just a bit of new code). Or I could make use of the great Marvel and Sense web interface of ES, perfect for such small additions.
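For the curious, creating the index by hand boils down to a single HTTP PUT on the index name, which is also what one would type into Sense. Here is a minimal sketch in Python that only builds such a request without sending it. The node URL, the index name “poc-data” and the helper name are my own assumptions for illustration; none of this is part of the vert.x ES module.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default HTTP port.
ES_URL = "http://localhost:9200"


def create_index_request(index, settings=None):
    """Build (but do not send) the HTTP request that creates an index.

    `index` is a hypothetical index name; `settings` is an optional dict
    of index settings and defaults to an empty one.
    """
    body = json.dumps({"settings": settings or {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = create_index_request("poc-data")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen() against a running node would actually send it; I left that out so the sketch runs without a server.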

It turns out that the vert.x ES module is also missing delete and bulk operations. Paginated search wasn’t working properly either, because there was no way to tell ES which result page to fetch. As I’m not using ES sharding (my use case will likely not need it), I planned to use the simple search, so… I quickly added the missing “from” pagination parameter and, while I was at it, the “delete” operation as well, tested and documented them, and sent a pull request to the vert.x ES module developers on GitHub.
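To show what that “from” parameter does on the Elasticsearch side: a search body takes “from” (the offset of the first hit to return) and “size” (the number of hits per page). Below is a small sketch of the arithmetic in Python. The helper name and the zero-based page numbering are my own choices for illustration, and this is not the code from the pull request.

```python
import json


def page_query(query, page, page_size=10):
    """Build an Elasticsearch search body for a zero-based result page."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": query,
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits on this page
    }


# Third page (zero-based page 2) with ten hits per page.
body = page_query({"match_all": {}}, page=2, page_size=10)
print(json.dumps(body, sort_keys=True))
```

Without “from”, every request starts at offset 0, which is exactly why only the first page could be fetched before.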

With everything working as expected, it was time to go home happy, though not without a few lessons learned:

  • “Easy” actually means searching, thinking, trying, fixing (but I already knew that)
  • If you use open source and are able to fix something, just do it! Folks will be thankful.
  • All this investment, for no real use so far… I need some search frontend to this application! (to be continued…)