Model serving in production is a persistent pain point for many ML backends, and is usually done quite poorly, so this is great to see.
I'm expecting large leaps and bounds for TensorFlow itself.
This improvement to the surrounding infrastructure is a nice surprise, much as TensorBoard was one of the nicest "value-adds" of the original library[4].
Google have ensured many high-quality people have been active as evangelists[3], helping build a strong community and answer base.
While there are still gaps between what the whitepaper[1] promises and what has made it to the open source world[2], it's coming along steadily.
My largest interests continue to be single machine performance (a profiler for performance analysis + speedier RNN implementations) and multi-device / distributed execution.
Single machine performance had a huge bump from v0.5 to v0.6 for CNNs, eliminating one of the pain points there, so they're on their way.
I'd have expected this to lead to an integration with Google Compute Engine (TensorFlow training / prediction as a service) except for the conspicuous lack of GPU instances on GCE.
While GPUs are usually essential for training (and could in theory be abstracted away behind a magical GCE TF layer), there are still many situations in which you'd want access to the GPU itself, particularly as performance can be unpredictable even across similar hardware and model architectures.
[1]: http://download.tensorflow.org/paper/whitepaper2015.pdf
[2]: Extricating TensorFlow from "Google internal" must be a real challenge given TF distributed training interacts with various internal infra tools and there are gaps with open source equivalents.
[3]: Shout out to @mrry who seems to have his fingers permanently poised above the keyboard - http://stackoverflow.com/users/3574081/mrry?tab=answers&sort...
[4]: I've been working on a dynamic memory network (http://arxiv.org/abs/1506.07285) implementation recently and it's just lovely to see a near perfect visualization of the model architecture by default - http://imgur.com/a/PbIMI
For profiling of models, almost everything needed is already there. You only need to pass in a StepStatsCollector through the Session::Run() method (I called it RunWithStats() ) and hook it up to the Executor Args by filling in this variable: https://github.com/tensorflow/tensorflow/blob/master/tensorf... You then get a very usable set of profiling statistics out by aggregating the StepStats object. For profiling individual ops on the CPU, perf is very useful.
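For anyone who'd rather not patch the C++ Session, here's a minimal sketch of the same idea from the Python side, using the RunOptions / RunMetadata path that later public builds expose (the StepStats land on run_metadata.step_stats; treat exact availability as an assumption for the version being discussed):

    import tensorflow as tf
    from tensorflow.python.client import timeline

    # A stand-in op to profile; swap in your own training or inference op.
    a = tf.random_normal([1000, 1000])
    product = tf.matmul(a, a)

    # FULL_TRACE asks Session.run() to collect per-op StepStats for this step.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()

    with tf.Session() as sess:
        sess.run(product, options=run_options, run_metadata=run_metadata)

    # run_metadata.step_stats is the StepStats proto described above; the
    # timeline helper renders it as a Chrome trace viewable at chrome://tracing.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open("timeline.json", "w") as f:
        f.write(tl.generate_chrome_trace_format())

Aggregating step_stats over a few runs gives you much the same per-op breakdown as the RunWithStats() approach described above.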
Derek is also extremely available to his colleagues at Google. He's always friendly when I ask questions, and very thoughtful. I feel lucky to work with him, however distantly! :)
The diagram was generated using a DMN I've implemented in TensorFlow, part of my work at MetaMind.
Those diagrams are useful not just for visualizing the architecture but also for spot-checking certain issues.
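For anyone wondering how those diagrams come about, the rough recipe (a sketch with made-up scope names - the actual DMN code isn't public) is just to group ops under name scopes and write the graph out for TensorBoard:

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        # Name scopes become the collapsible boxes in TensorBoard's graph view;
        # "input_module" is purely an illustrative name.
        with tf.name_scope("input_module"):
            story = tf.placeholder(tf.float32, [None, 100], name="story")
            encoded = tf.nn.relu(story)

    # Writing out the graph definition alone is enough for the visualization;
    # no summaries or training steps needed. Then: tensorboard --logdir=/tmp/dmn_logs
    writer = tf.train.SummaryWriter("/tmp/dmn_logs", graph_def=graph.as_graph_def())
    writer.close()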
I was planning on writing up a blog post on TensorFlow in the near future but am undecided on the topic.
It could be about implementing something nice and simple[1], maybe an attention-based model / language model using RNNs over PTB / etc, or a broader discussion of the good and bad bits of TensorFlow.
I really like TensorFlow, so improving the tutorials seems to be an important step.
Whilst the existing tutorials are a good starting point, more in-depth exploration is trial by fire, made more difficult by construction of the graph being separate from execution of the graph[2].
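To make the construct-then-execute split concrete, a tiny sketch of where the confusion tends to start:

    import tensorflow as tf

    # Construction phase: these calls only add nodes to a graph.
    # No computation happens and no values exist yet - y is a symbolic Tensor.
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.Variable(tf.ones([3, 1]), name="w")
    y = tf.matmul(x, w)

    # Execution phase: actual values only appear when the graph is run in a Session.
    with tf.Session() as sess:
        sess.run(tf.initialize_all_variables())
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))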
If people have particular topics they'd like to see covered, I'd love to hear about them!
Ping me at smerity@smerity.com or @Smerity.
There is a whole other world of non-stochastic-gradient-descent-based algorithms out there; IMO TensorFlow is sensible to stick to one class of algorithms and do it well.
(Disclaimer: I work on mldb, one of the tools on that list).
mldb looks great.
But I was referring to distributed model building in a horizontal, scale-out way, which SparkML does and TensorFlow says it does.
If they can implement distributed gradient boosted trees across nodes, maybe even with GPU support (although I'm not sure if it's applicable), that could be huge.
Once the open source version of TensorFlow releases multi-node support, this would be one way to make it work. There are potential gains from using a GPU for RF training. As for distributing, in my experience it doesn't make much difference for small models, and for larger models the cost of distributing the dataset dominates the benefit of having multiple nodes. But an implementation carefully designed for a given node topology could be made more performant.
OTOH, I have a strong prejudice against JavaScript on the backend... And it's not due to it being dynamic - the same doesn't happen with Python codebases. It is completely irrational.
Which data are you after? The ImageNet data is public and they released the pretrained model.
They've promised to release (or already have released) the models for Exploring the Limits of Language Modeling[1], trained on the One Billion Word Benchmark corpus[2], which is also public data.
Note that for these, the trained models are often more immediately useful: the language model was trained for 3 weeks on 32 Tesla K40s, which is not something many can replicate casually.