An appropriate quote: "If you can't intelligently argue for both sides of an issue, you don't understand the issue well enough to argue for either."
There are many people for whom the declarative paradigm is a huge plus. I would say there are at least two major approaches to running fast neural networks: 1. Figure out the common big components and make fast versions of those. 2. Figure out the common small components and how to make them run fast together.
Different libraries have different strengths and weaknesses that match the abstraction level they work at. For example, Caffe is the canonical example of approach 1: it makes writing new kinds of layers much harder than other libraries do, but it makes connecting those layers quite easy and enables new techniques that work layer-wise (such as new kinds of initialization). Approach 2 (TensorFlow's approach) introduces a lot of complexity, but it allows for different kinds of research. For example, because how you combine the low-level operations is decoupled from how those operations are optimized together, you can more easily create efficient versions of new layers without resorting to native code.
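As a rough illustration of approach 2, here's a sketch in TF 1.x style (the layer itself is made up for illustration): a new kind of layer assembled entirely from low-level ops, leaving the runtime to place and optimize them.

```python
import tensorflow as tf

def gated_linear_layer(x, units):
    # A made-up layer built purely from existing low-level ops;
    # no native/CUDA code is needed to get an efficient version.
    in_dim = x.get_shape().as_list()[-1]
    w = tf.get_variable("w", [in_dim, units])
    v = tf.get_variable("v", [in_dim, units])
    # The graph runtime decides how these primitive ops are scheduled and optimized together.
    return tf.matmul(x, w) * tf.sigmoid(tf.matmul(x, v))
```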
After being exposed to several declarative tools during my career, I must say they age poorly: make, autoconf, Tensorflow, and so on. They may start out elegant, but every successful library eventually gets (ab)used for something the original authors didn't envision, and with declarative syntax it descends into the madness of "So if I change A to B here, does it apply before or after C becomes D?"
At least Tensorflow isn't at that level, because its "declarative" syntax is just yet another imperative language living on top of Python. But it still makes performance debugging really hard.
With PyTorch, I can just sprinkle torch.cuda.synchronize() liberally and the code will tell me exactly which CUDA kernel calls are consuming how many milliseconds. With Tensorflow, I have no idea why it is slow, or whether it can be any faster at all.
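For instance, a minimal sketch of that kind of timing in PyTorch (the shapes and the op being measured are just placeholders):

```python
import time
import torch

x = torch.randn(256, 1024, device="cuda")
w = torch.randn(1024, 1024, device="cuda")

torch.cuda.synchronize()              # flush any pending kernels first
start = time.time()
y = x @ w                             # the kernel call being measured
torch.cuda.synchronize()              # wait until it has actually finished
print("matmul took %.3f ms" % ((time.time() - start) * 1000))
```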
I believe that make's declarative nature is not the cause of its problems at all - its poor syntax and lack of support for programming abstractions are what make it clunky to use.
Something like rake, which operates on the same fundamental principles (i.e. declarative dependency description) but with Ruby syntax, has aged better.
I took the article to be a counterpoint to the uninhibited praise of TF. In that light, I don't think it was meant as a balanced assessment of the whole product; it had the narrow scope of pointing out a handful of flaws that he thinks aren't discussed enough.
It's the same feeling when you hate a movie that everyone gives five stars: you might agree with some aspects of the praise (or even most of it), but that's not what you're going to be talking about. You'll talk about how and why it sucks compared to better movies.
I'd guess he could make a strong pro-TF argument if desired, but that just wasn't the point of this post.
The assumption that there are always two intelligent sides to an issue is a pretty big assumption. If you understand both sides of an issue really deeply and you choose side B over side A, you should be able to argue intelligently for side A; otherwise your choice of side B was not made intelligently. But this falls down on further examination.
If you believe that side B is correct and side A is incorrect, given your deep understanding of the issue, then an argument for side A is in some way not intelligent: you must keep your most potent arguments for side B out of your argument for side A - you must deny their existence in your head and thus argue from a less intelligent position than you normally would.
The ability to argue both sides is only really possible when all sides are considered trivial in their differences.
Despite the article's shortcomings, I share its sentiment. Here are my reasons:
- Tensorflow has way too large an API surface area: command-line argument parsing, unit test runners, logging, help-string formatting... most of these are not as good as the counterparts already available in Python.
- The C++ and Go versions are radically different from the Python version. Limited code reuse, different APIs, not maintained or documented with the same attention.
- The technical debt in the source code is huge. For instance, there are three redundant implementations of a safe division (_safe_div), with slightly different interfaces (sometimes with default params, sometimes not).
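To illustrate the kind of duplication I mean, a schematic sketch (not the actual TensorFlow code; the second name is made up) of two near-identical helpers with slightly different interfaces:

```python
# Schematic only: the real _safe_div variants live in different TF modules.
def _safe_div(numerator, denominator, name="safe_div"):
    return numerator / denominator if denominator != 0 else 0.0

def _safe_div_with_default(numerator, denominator, default=0.0, name=None):
    return numerator / denominator if denominator != 0 else default
```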
In every way, it reminds me of the Angular.io project: a failed promise to be truly multi-language, a failure to use the expressiveness of Python, a super large API that tries to do things we didn't ask it to do, and a lack of a generally sound architecture.
I think the author raises a good point about Google envy. TensorFlow is not the most intuitive or flexible library out there, and it is very over-engineered if you're not doing large-scale distributed training. The main reason why everyone talks it up so much is because Google heavily marketed it from the outset, and everyone automatically assumes Google == Virtuoso Software Design because they couldn't make it through the interview. Really it's just modern enterprise software which has five different ways to implement batch norm that they push on the community so they don't have to train new hires on how to use it.
Or maybe it is built by a company that is doing large-scale distributed training, and they open sourced it not to cater to every need, but to help others trying to do the same thing they are. Companies are under no obligation to make sure their open source is well suited to others' use cases.
That was kinda my point: it's not the be-all deep learning library, because they made it for their own use case, but its towering popularity (as in 10x the number of stars of other popular libraries) is not genuine.
Also I highly doubt that the main reason Google open sourced it was to be charitable.
There is little that is analytical or "detailed" about this post. The most complex model is y = 3*x; the author provides no evidence to back up any claims about adoption, difficulty of use, etc.; and most of the author's complaints boil down to a lack of syntactic sugar.
I'm open to a discussion about the downsides of tensorflow, which is why I read the article in the first place, but this post doesn't provide that.
I'm probably being overly cynical, but this is (indistinguishable from) a "growth-hack" submarine article by the author to promote their tool. There is hardly any substantiation to support the assertions. Tucked right at the end:
> If you want a beautiful monitoring solution for your machine learning project that includes advanced model comparison features, check out Losswise. I developed it to allow machine learning developers such as myself to decouple tracking their model’s performance from whatever machine learning library they use
I'm actually pretty ok with these types of articles. They are generally well-researched and well-written—giving technical introductions to important concepts.
As always, it is important to be wary of the reasons an author writes an article. If there is an advertisement at the end, then the author's motivations (at least in part) are clear. But I often find that promoters of new systems and tools are able to present excellent critiques of established tools and practices. New things are USUALLY made to address the shortcomings of existing things. You as a reader have to parse whether their arguments are sound, and maybe do some more research, before you can make a sound judgement on the matter.
There are a few categories that I think TensorFlow is notably strong in. Namely:
1. Deployment.
2. Coverage of the library / built-in functionality.
3. Device management.
For more details, I wrote a comparison of PyTorch and TensorFlow (mostly from a programmability perspective) a couple months back. Interested readers may find it helpful. https://awni.github.io/pytorch-tensorflow/
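On point 3 (device management), a small sketch of the kind of thing TF makes straightforward: explicit device pinning plus soft placement (the graph contents here are placeholders).

```python
import tensorflow as tf

with tf.device("/device:GPU:0"):       # pin these ops to a specific device
    a = tf.random_normal([1024, 1024])
    b = tf.matmul(a, a)

# Fall back gracefully if the device isn't available, and log where ops ended up.
config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    sess.run(b)
```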
This article is not that detailed, but it's a sentiment I agree with, so I'll add one major shortcoming of Tensorflow: its memory usage is really bad.
The default behavior of TF is to allocate as much GPU memory as possible for itself from the outset. There is an option (allow_growth) to only incrementally allocate memory but when I tried it recently it was broken. This means there aren't easy ways to figure out exactly how much memory TF is using (e.g. if you want to increase the batch size). I believe you can use their undocumented profiler, but I ended up just tweaking batch sizes until TF stopped crashing (yikes).
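For reference, the option in question is set roughly like this in TF 1.x (whether it actually behaves is another matter, as noted above):

```python
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # allocate GPU memory as needed, not all upfront
sess = tf.Session(config=config)
```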
TF does not have in-place operation support for some common operations that could use it, like dropout (other operations do have this support, I believe). Even Caffe, which I used for my research in college, had this. This can double your GPU RAM usage depending on your model, and GPU RAM is absolutely a precious resource.
Finally, I've had issues where TF runs out of GPU RAM halfway through training, which should never happen - if there's enough memory for the first epoch, there should be enough memory for every epoch. The last thing I want to do is debug a memory leak / bad memory allocation ordering in TF.
> The default behavior of TF is to allocate as much GPU memory as possible for itself from the outset. There is an option (allow_growth) to only incrementally allocate memory but when I tried it recently it was broken. This means there aren't easy ways to figure out exactly how much memory TF is using (e.g. if you want to increase the batch size).
There is also per_process_gpu_memory_fraction, which limits Tensorflow to allocating only that fraction of each visible GPU's memory. It's still not great, but it has been helpful in keeping resources free for models that do not need all of a GPU's memory.
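A minimal sketch of that option:

```python
import tensorflow as tf

config = tf.ConfigProto()
# Cap TF at roughly 40% of each visible GPU's memory, leaving the rest for other processes.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.Session(config=config)
```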
It seems to me the reason for insisting on a verbose declarativization of everything is obvious: it guarantees you can build run-time/train-time environments which scale your model automatically.
Google’s mindset isn’t “train this model to multiply by three”. It’s “train this model on a 1% sample of search traffic over the last year.” That’s reflected in the design choices of tensorflow.
> The ability to argue both sides is only really possible when all sides are considered trivial in their differences.
please argue the opposite of this before continuing
> There is hardly any substantiation to support the assertions.
How about a side-by-side TensorFlow and PyTorch comparison?
...though, for experiment tracking (and experiment running) I recommend https://neptune.ml/.
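For what it's worth, here's a minimal sketch of what such a side-by-side can look like, fitting the article's toy y = 3x in both libraries (hyperparameters and names are placeholders):

```python
import numpy as np
import tensorflow as tf
import torch

xs = np.random.randn(100, 1).astype(np.float32)
ys = 3.0 * xs

# --- TensorFlow 1.x: build a graph, then run it in a session ---
x_ph = tf.placeholder(tf.float32, [None, 1])
y_ph = tf.placeholder(tf.float32, [None, 1])
w_tf = tf.Variable(0.0)
loss_tf = tf.reduce_mean(tf.square(w_tf * x_ph - y_ph))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss_tf)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(200):
        sess.run(train_op, feed_dict={x_ph: xs, y_ph: ys})
    print("TF w:", sess.run(w_tf))

# --- PyTorch: plain imperative code, gradients on demand ---
w_pt = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([w_pt], lr=0.1)
x_t, y_t = torch.from_numpy(xs), torch.from_numpy(ys)
for _ in range(200):
    opt.zero_grad()
    loss_pt = ((w_pt * x_t - y_t) ** 2).mean()
    loss_pt.backward()
    opt.step()
print("PyTorch w:", w_pt.item())
```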