apl commented on Ask HN: Which algorithms based on biological ideas do you know?    · Posted by u/labarilem
randomsearch · 4 years ago
Speculative comments: evolving neural network architectures may be where EAs prove their worth.

Plenty of work on it but it’s early days and we need massive, massive, compute power.

apl · 4 years ago
Neural architecture search (NAS) is a thing! But it's almost exclusively based on meta-gradients. Again, I wouldn't put my money on GAs ever outperforming gradient-based methods.
apl commented on Ask HN: Which algorithms based on biological ideas do you know?    · Posted by u/labarilem
api · 4 years ago
IMHO genetic algorithms are waiting for their AlexNet paper:

https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6...

The approach is clearly valid but I feel like there are some missing pieces in making it work effectively in a digital context. I played around with this stuff a lot in college, and my take was that the evolvable-encoding problem (a form of the representation problem) is fairly major and not really solved. There are also some unsolved issues around evolutionary dynamics and evolutionary game theory -- that is, how to structure the population and the "game" for best results. Not sure how much progress has been made since then, but my impression is not much.

The main area where GAs have seen use in the field so far is optimization problems with a lot of parameters related in unknown or hard to model ways or where a closed form solution is unknown or computationally too expensive (NP). These include shipping and air travel routing (traveling salesman with multiple optimization goals like distance + time + fuel + depreciation), circuit board and IC layout, antenna design for exotic RF modulations, drug discovery, materials science, etc. Problems like these are fairly easy to map to a GA, and current GAs are pretty good at finding local maxima in these functions.

Still those are a far cry from "design ex nihilo," which is really the promise that evolutionary computation carries. Those applications are using GAs as a bigger brother to things like Monte Carlo and simulated annealing.

One area that I'd look into if I were doing this now would be a hybrid approach where a GA is used to design deep learning architectures and their associated parameters. Seems like it could be very powerful but damn would that ever take a lot of computing power. The fitness function of the GA would consist of N deep learning runs for each candidate. Luckily GAs are parallelizable to an almost unlimited degree.
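A rough sketch of that hybrid loop, with everything hypothetical: the genome is just a tuple of made-up hyperparameters (depth, width, log10 learning rate), and a cheap analytic function stands in for the fitness that would really be N deep-learning runs:

```python
import random

# Toy sketch of the hybrid GA loop described above. In the real thing,
# fitness(genome) would train a model and return validation accuracy; here we
# swap in a cheap stand-in with a known optimum at (6, 128, -3).
def fitness(genome):
    depth, width, lr_exp = genome
    return -((depth - 6) ** 2) - ((width - 128) ** 2) / 100 - (lr_exp + 3) ** 2

def mutate(genome, rng):
    depth, width, lr_exp = genome
    return (max(1, depth + rng.choice([-1, 0, 1])),
            max(8, width + rng.choice([-16, 0, 16])),
            lr_exp + rng.gauss(0, 0.2))

def evolve(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [(rng.randint(1, 12), rng.randrange(8, 257, 8), rng.uniform(-5, -1))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 4]                  # truncation selection
        pop = elite + [mutate(rng.choice(elite), rng)
                       for _ in range(pop_size - len(elite))]
    return max(pop, key=fitness)

best = evolve()
```

Every `fitness` call is independent, which is the "parallelizable to an almost unlimited degree" property: in practice each candidate's training run would be farmed out to its own worker.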

apl · 4 years ago
An AlexNet/ResNet-type moment may be in the cards for GAs, but I wouldn't put any money on it. They're typically only marginally better than brute force. That can be good enough (and is certainly easy to implement), but if you can get a gradient for your problem -- you should use that. And nowadays, you can usually get a gradient!

Most recent advances in the fields you mentioned were driven by gradient-based optimization (e.g., drug design, routing, or chip design: https://www.nature.com/articles/s41586-021-03544-w).

Nature can't run SGD through genomes but has a metric ton of time, so evolution may be near-optimal given the constraints of sexual reproduction. We typically don't have billions of generations, trillions of instantiations, and complex environments to play with when optimizing functions... It's telling that the fastest-evolving biological system (our brain!) certainly doesn't employ a large-scale GA; if anything, it probably approximates gradients via funky distributed rules.

EDIT: The most modern application I can think of was some stuff from OpenAI (https://openai.com/blog/evolution-strategies/). But the point here is one of computational feasibility -- if they could backprop through the same workload, they would.
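For context, the estimator in that post is roughly the following (a toy quadratic stands in for the RL reward they actually optimize): sample Gaussian perturbations of the parameters, weight them by normalized fitness, and step along the resulting search gradient, with no backprop anywhere.

```python
import numpy as np

# Rough sketch of an OpenAI-style evolution-strategies update. Each step costs
# `pop` full fitness evaluations -- which is exactly why you'd backprop
# instead whenever the workload allows it.
def es_step(theta, fitness, rng, pop=50, sigma=0.1, lr=0.02):
    eps = rng.standard_normal((pop, theta.size))
    rewards = np.array([fitness(theta + sigma * e) for e in eps])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)  # rank-ish normalization
    grad_est = (rewards[:, None] * eps).mean(axis=0) / sigma       # search gradient
    return theta + lr * grad_est

rng = np.random.default_rng(0)
target = np.array([0.5, -1.0, 2.0])
fitness = lambda w: -np.sum((w - target) ** 2)   # toy stand-in objective

theta = np.zeros(3)
for _ in range(300):
    theta = es_step(theta, fitness, rng)
```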

apl commented on How to train large deep learning models as a startup   assemblyai.com/blog/how-t... · Posted by u/dylanbfox
apl · 4 years ago
Several of the tips here are severely outdated.

For instance, never train a model in end-to-end FP16. Use mixed precision, either via native TF/PyTorch or as a freebie when using TF32 on A100s. This’ll ensure that only suitable ops are run with lower precision; no need to fiddle with anything. Also, PyTorch DDP in multi-node regimes hasn’t been slower or less efficient than Horovod in ages.
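A minimal sketch of the native-PyTorch route (model and data here are illustrative; shown on CPU with bfloat16 so it runs anywhere -- on an A100 you'd pass `device_type="cuda"` and, for FP16, wrap the backward pass in a `torch.cuda.amp.GradScaler`):

```python
import torch
import torch.nn.functional as F

# Mixed precision via autocast: only numerically safe ops (matmuls etc.) run
# at lower precision; parameters and gradients stay FP32 throughout.
model = torch.nn.Linear(16, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 16), torch.randn(8, 4)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = F.mse_loss(model(x), y)   # the matmul inside runs in bf16

loss.backward()                      # gradients accumulate in FP32
opt.step()
```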

Finally, buying a local cluster of TITAN Xs is an outright weird recommendation for massive models. VRAM limitations alone make this a losing proposition.

apl commented on Analyzing the performance of Tensorflow training on M1 Mac Mini and Nvidia V100   wandb.ai/vanpelt/m1-bench... · Posted by u/briggers
fxtentacle · 5 years ago
"trainable_params 12,810"

laughs

(for comparison, GPT3: 175,000,000,000 parameters)

Can Apple's M1 help you train tiny toy examples with no real-world relevance? You bet it can!

Plus it looks like they are comparing Apples to Oranges ;) This seems to be 16 bit precision on the M1 and 32 bit on the V100. So the M1-trained model will most likely yield worse or unusable results, due to lack of precision.

And lastly, they are plainly testing against the wrong target. The V100 is great, but it is far from NVIDIA's flagship for training small low-precision models. At the FP16 that the M1 is using, the correct target would have been an RTX 3090 or the like, which has 35 TFLOPS. The V100 only gets 14 TFLOPS because it lacks the relevant Tensor Core acceleration.

So they compare the M1 against an NVIDIA model from 2017 that lacks the relevant hardware acceleration and, thus, is a whopping 60% slower than what people actually use for such training workloads.

I'm sure my bicycle will also compare very favorably against a car that is lacking two wheels :p

apl · 5 years ago
Hard disagree. V100s are a perfectly valid comparison point. They're usually what's available at scale (on AWS, in private clusters, etc.) because nobody's rolled out enough A100s at this point. If you look at any paper from OpenAI et al. (basically: not Google), you'll see performance numbers for large V100 clusters.
apl commented on TensorFlow, Keras and deep learning, without a PhD   codelabs.developers.googl... · Posted by u/blopeur
0-_-0 · 6 years ago
For what it's worth, I've found Pytorch to be much more rigid than TF. Maybe I just haven't found the easy way to do things. For example here's a function that applies an N×N box filter to all but the first 2 dimensions of a tensor (apologies to mobile users):

    def boxfilter(image, N=3):
        shape = image.shape
        image = tf.reshape(image, [1, shape[0], shape[1], -1])
        C = image.shape[-1]
        conv = tf.nn.conv2d(image, tf.eye(C, C, [N, N]), 1, "SAME")[0]
        return tf.reshape(conv, shape) / (N * N)
Is there a simple way to do this in Pytorch? Preferably without having to inherit from the base class for convolution. It seems to me that Pytorch is like Keras and Tensorflow is like Numpy.

apl · 6 years ago
You can almost 1:1 translate this by swapping "tf" and "torch". No need to use nn.Conv2d -- there's a functional API for all these layers:

https://pytorch.org/docs/master/nn.functional.html#conv2d

Torch doesn't have "same" padding, so you have to manually calculate the correct padding value for your input/output shapes.
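Under those caveats, a near-1:1 translation might look like this (channels-first layout, and for stride 1 with odd N the "SAME" padding is simply N // 2):

```python
import torch
import torch.nn.functional as F

# Near-1:1 translation of the TF snippet above. PyTorch is channels-first,
# so we permute to NCHW; the identity-across-channels kernel is the same
# filter tf.eye(C, C, [N, N]) builds, transposed to torch's weight layout.
def boxfilter(image, N=3):
    shape = image.shape
    x = image.reshape(1, shape[0], shape[1], -1).permute(0, 3, 1, 2)  # NCHW
    C = x.shape[1]
    weight = torch.eye(C)[:, :, None, None].repeat(1, 1, N, N)
    out = F.conv2d(x, weight, padding=N // 2)   # manual "SAME" for odd N
    return out.permute(0, 2, 3, 1).reshape(shape) / (N * N)
```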

apl commented on Jax – Composable transformations of Python and NumPy programs   github.com/google/jax... · Posted by u/lelf
chrisaycock · 6 years ago
Here is an earlier submission of JAX from December 2018:

https://news.ycombinator.com/item?id=18636054

JAX is pretty neat because it is effectively a derivatives compiler: it can automatically differentiate a function and JIT compile the result. This makes training in machine learning both fast and easy because gradient descent no longer has to be written by hand.

apl · 6 years ago
> gradient descent no longer has to be written by hand

Nobody's been writing derivatives by hand for 5+ years. All major frameworks (PyTorch, TensorFlow, MXNet, Autograd, Chainer, Theano, etc.) have decent-to-great automatic differentiation.

The differences and improvements are more subtle (easy parallelization/vectorization, higher-order gradients, good XLA support).
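A small sketch of those subtler wins (assuming JAX is installed; `f` is an arbitrary toy function): gradients, higher-order gradients, vectorization, and compilation all compose as plain function transforms.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.sin(x) * x

df = jax.grad(f)               # d/dx: sin(x) + x*cos(x)
d2f = jax.grad(df)             # higher-order gradients compose for free
batched_df = jax.vmap(df)      # vectorize over a batch of inputs
fast_df = jax.jit(batched_df)  # XLA-compile the whole pipeline

xs = jnp.linspace(0.0, 1.0, 4)
grads = fast_df(xs)
```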

apl commented on Being a programmer will make me a better doctor   scopeblog.stanford.edu/20... · Posted by u/chmaynard
monksy · 6 years ago
This is one of the frustrating things I run into when I go to the doctor.

As a developer, I generally want a lot of information in order to pose theories about what is going wrong. They always seem really annoyed by this.

apl · 6 years ago
Mainly because it is genuinely exhausting for any medical practitioner. That lots of patients "enjoy" googling symptoms and coming up with far-fetched self-diagnoses is a given. But couple that with the perceived intellectual superiority of (software) engineers and you get a recipe for disaster. It's the equivalent of a doctor leaning over your shoulder while you're coding and telling you to remove random keywords.
apl commented on Cell Segmentation with U-Net and Others   benjamin.computer/posts/2... · Posted by u/onidaito
apl · 6 years ago
For this particular problem, Mask R-CNN would have been the way to go -- it spits out instances as opposed to just deciding, for each pixel, to which class it belongs. Or an SSD (if we don't care about the mask at all).
apl commented on Neural Networks Are Essentially Polynomial Regression   matloff.wordpress.com/201... · Posted by u/ibobev
posterboy · 7 years ago
Could one say, in a sense, that recurrence is essentially differential equations and convolution essentially more complicated operations than those of arithmetic polynomials?

This might sound like nonsense. On the one hand, most trivial convolutions use trivial operators; "polynomials" might include higher operations anyhow, or approximate some of the more important ones, none of which is appealing if simplicity equals efficiency. On the other hand, I never really understood diff-eqs; ODEs seem like polynomials over self similar polynomials, to me, hence "recurrent"; All the other diff-eqs I can't begin to fathom.

apl · 7 years ago
There are many perspectives on everything. Deep ConvNets, for instance, can be expressed as a continuously evolving ODE. Here's a fantastic paper on this view:

https://papers.nips.cc/paper/7892-neural-ordinary-differenti...
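In miniature, the correspondence is just this (the vector field `f` is an arbitrary stand-in for a learned layer): a residual update x ← x + f(x) is exactly one explicit Euler step (h = 1) of the ODE dx/dt = f(x).

```python
import numpy as np

# ResNet-as-ODE, per the linked paper: residual blocks discretize an ODE.
def f(x):
    return 0.1 * np.tanh(x)       # hypothetical fixed "layer"

def resnet_forward(x, depth):
    for _ in range(depth):
        x = x + f(x)              # residual block
    return x

def euler_ode(x, t_end, steps):
    h = t_end / steps
    for _ in range(steps):
        x = x + h * f(x)          # explicit Euler step
    return x

x0 = np.array([1.0, -2.0, 0.5])
# A depth-10 ResNet coincides with 10 unit Euler steps over t in [0, 10]:
same = np.allclose(resnet_forward(x0, 10), euler_ode(x0, 10.0, 10))
```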

apl commented on Neural Networks Are Essentially Polynomial Regression   matloff.wordpress.com/201... · Posted by u/ibobev
olooney · 7 years ago
Terminology note: data like images and voice which have strong spatial or temporal patterns are actually referred to as "unstructured" data; while data you get from running "SELECT * FROM some_table" or the carefully designed variable of a clinical trial are referred to as "structured" data.

If this seems backwards to you (as it did to me at first), note that unstructured data can be captured raw from instruments like cameras and microphones, while structured data usually involves a programmer deciding exactly what ends up in each variable.

As you say, deep neural networks based on CNNs are SOTA on unstructured image data and RNNs are SOTA on unstructured voice and text data, while tree models like random forests and boosted trees are usually SOTA on problems involving structured data. The reason seems to be that the inductive biases inherent to CNNs and RNNs, such as translation invariance, are a good fit for the natural structure of such data, while the strong ability of trees to find rules is well suited to data where every variable is cleanly and unambiguously coded.

apl · 7 years ago
Yeah, that's right. Doing too little proofreading with HN comments...

u/apl

Karma: 2556 · Joined: September 23, 2010