I would say if a startup overhead of < 10 seconds bothers you, you're not working with "data". Of course sed and grep have less overhead, but I wouldn't even think of trying out a new tool for files/datasets larger than, say, a gigabyte. (Rough guess, I know you can use grep and sed in under 10 seconds for larger files; the point is about perspective and complexity.)
Clojure is sadly a really bad choice for fire-and-forget CLI scripts, but "large scale data processing" doesn't fit that criterion for me.
I'm mostly going to use this for parsing XML into some other formats and getting it into SQLite databases, I think. The reason I'd prefer Drake over 'raw' Python scripts is that it handles a lot of the mundane work that goes around the actual processing of the data, and I want to automate those processes.
I typically deal with sub-100MB XML documents, so processing them takes very little time, but having the quick iteration of changing the format and re-outputting is a key part of the development cycle for me, and I think very useful when you are experimenting with new data and seeing how it could be used. Doing quick transforms is awesome.
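For what it's worth, the XML-into-SQLite step described above is straightforward to sketch in Python's standard library. This is a minimal, hypothetical example (the `record`/`id`/`name` element and column names are invented, not from any real dataset), just to show the shape of the transform:

```python
# Hypothetical sketch: pull <record> elements out of an XML document
# and load them into a SQLite table. Element and column names are
# invented for illustration.
import sqlite3
import xml.etree.ElementTree as ET

def xml_to_sqlite(xml_path, db_path):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS records (id TEXT, name TEXT)")
    root = ET.parse(xml_path).getroot()
    # Collect one row per <record> element.
    rows = [(r.findtext("id"), r.findtext("name"))
            for r in root.iter("record")]
    conn.executemany("INSERT INTO records VALUES (?, ?)", rows)
    conn.commit()
    conn.close()
```

For sub-100MB documents this kind of script runs in seconds, which is exactly why a few seconds of tool startup overhead becomes noticeable in the edit-rerun loop.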
Drip now works with Drake! Yes, it's still less than ideal if you're calling Drake hundreds of times from an automated script which you need to run quickly, but for interactive development, it should work just fine:
It's a good point, and I agree it might not be the top priority, but I also understand the frustration. I, too, find the 5s start-up time rather irritating, especially when I make errors in the workflow file or didn't specify targets correctly. So we are in search of ideas on how to fix it.
To be honest with you, no, we didn't seriously consider it. Maybe we should have. I do not know if ClojureScript would be able to work with all the dependencies we have (for example, Hadoop client library to talk to HDFS). But it's a good point nevertheless. I'll mention it in https://github.com/Factual/drake/issues/1.
I didn't realise originally that Drake integrated with HDFS. That's a really awesome feature, and I can see why the JVM made sense in development because of existing HDFS libraries.
Thanks for the response! I ask because I have an idea for a CLI program, and I want to write it in Clojure, but I'm worried about the startup time of the JVM. As I understand it, this issue is mitigated in Drake by the fact that a typical job will crunch lots of data and therefore take lots of time. That's not the case for my program; it needs to be quick.
Yes, startup times are a pain. As of this morning, Drake works with Drip, which is a nifty tool to bring down start-up times. It spins up "backup" JVMs, so the next time you run the command, a JVM is already waiting. It works great for interactive environments where at least several seconds pass between runs, but won't do much if you need to run Drake several times per second from an automated script.
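The idea behind Drip, pre-paying the expensive startup cost in a background process so the next invocation finds a warm instance waiting, can be illustrated with a small Python sketch. Everything here (the `WarmWorker` name, the fake 0.2s "startup", the uppercase "work") is invented to show the pattern, not how Drip itself is implemented:

```python
# Conceptual sketch of the pre-warmed-worker idea: start the slow
# process in advance, then hand it tasks with no startup cost per call.
import multiprocessing as mp
import time

def worker(inbox, outbox):
    time.sleep(0.2)  # stand-in for slow JVM/classloader startup
    while True:
        task = inbox.get()
        if task is None:  # sentinel: shut down
            break
        outbox.put(task.upper())  # stand-in for the actual work

class WarmWorker:
    def __init__(self):
        self.inbox = mp.Queue()
        self.outbox = mp.Queue()
        self.proc = mp.Process(target=worker, args=(self.inbox, self.outbox))
        self.proc.start()  # startup cost is paid now, in the background

    def run(self, task):
        self.inbox.put(task)
        return self.outbox.get()

    def stop(self):
        self.inbox.put(None)
        self.proc.join()
```

As with Drip, this helps interactive use (the worker is warm by the time you need it) but buys nothing if you spawn a fresh worker for every call.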
Another option is Nailgun, but it has its limitations, too.
None of this is ideal. If you want to write a very simple CLI program, keep this in mind: you may want to stay away from the JVM.