Fabian Kostadinov

Yes, but I'm a data scientist...

Being a data scientist is no excuse for writing sloppy code. Yeah, I know that Java is not your first coding language, but you should really not write spaghetti code.

Being a data scientist does not mean you don’t have to check your code in to Git. Oh, and this definitely also includes your IPython notebooks. Yes, that’s considered code too.

There is no need to demonstrate your superior Python skills to everyone else in the company by writing low-level code because you supposedly cannot make do with the standard libraries. Use the libraries, you fool.
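In case that sounds abstract, here is a minimal sketch of what I mean. The sample numbers and the `hand_rolled_stdev` helper are made up for illustration; `statistics.stdev` is part of Python's standard library.

```python
import math
import statistics


def hand_rolled_stdev(values):
    """The show-off way: recompute the sample standard deviation by hand."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance, dividing by n - 1 (Bessel's correction).
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance)


# Made-up sample data, purely for illustration.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

by_hand = hand_rolled_stdev(samples)
by_library = statistics.stdev(samples)  # one line, already tested by someone else

print(by_hand, by_library)
```

Both give the same number. Only one of them has to be reviewed, tested and maintained by your colleagues.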

Yet, of course, being a data scientist does not mean you should use your favourite open source library just because it’s so much cooler/faster/better than anyone else’s. Have you ever considered the legal implications of using a GNU GPL-licensed library in your enterprise environment? So you’re certain your code will never leave the enterprise?

Just because you love R, it’s still not a good idea to call an R function from within production Java code. Really, it isn’t.

Learn your language. R is not per se fast just because it contains lots of fancy, advanced statistical functions that other languages don’t have. There are actually very few people who really understand what’s going on in a typical piece of R code.

Being a data scientist, you should actually care about your software engineering colleagues. They’re not dumb asses, they just studied a different subject at university. Just because you know the maths behind SVMs better than they do does not mean you should build them in Python from low-level functions. There are alternatives around that will make your colleagues’ lives much easier, because they won’t have to rewrite your entire code in a different language like Java.

Creating a model does not mean it won’t have to be maintained.

There’s actually a reason why such things as application servers exist, even if you might not know them.

HDFS and Spark are not a solution architecture.

You should learn Scala.

Like it or not, being a data scientist does not mean you will never have to sell, promote and explain your work to a bunch of ignorant managers. Just keep in mind, they’re the ones who decide on your salary and promotion. Not you.
