These ML-compilers are being overhyped. It's all the same trade-off as a traditional compiler: you get a lot more throughput than hiring a specialist performance programmer, but the latter will typically outperform, possibly by orders of magnitude.
These things are inferior at many levels:
- Algorithmic: These things aren't feeding back to their human masters tips and tricks on how to modify the network to go faster beyond some very basic signals.
- Loss of intent: ML network designers are specifying architecture in Python, and by the time it's gone through many layers of lowering, you can get some complete garbage. Highly efficient garbage, but still garbage. (A recent example: we caught one of these compilers doing a slice update by first forming the range of all possible indices to the array, slicing that to get the indices to update, and then doing a scatter; we replaced it with a single memcpy call. See the rough sketch after this list.)
- Inefficient kernels. Every time we see the output of these compilers go head-to-head with an expert assembly programmer, the compiler loses, often by 30%+. This always seems like the sort of thing that should be easy to solve, but given no-one seems to have cracked it in the past 50 years, it's obviously not as simple as it sounds.
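To illustrate the slice-update example above, here is a rough numpy sketch of the pattern (the array names and shapes are made up for illustration; the real case was in the compiler's lowered IR, not Python):

    import numpy as np

    buf = np.zeros(1024, dtype=np.float32)   # destination buffer (hypothetical)
    new = np.ones(128, dtype=np.float32)     # values for the slice update
    start = 256

    # Roughly what the compiler emitted: materialize every index, slice the
    # index range, then scatter the update through that index array.
    all_idx = np.arange(buf.shape[0])
    upd_idx = all_idx[start:start + new.shape[0]]
    buf[upd_idx] = new                        # scatter via an index array

    # The hand-written replacement: one contiguous copy, i.e. a single memcpy.
    buf[start:start + new.shape[0]] = new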
Take a look at the chess engine Stockfish: they tossed out years and years of human-written board-evaluation heuristics in favor of a small neural net that does the same job, but better.
Now consider all the heuristics for inlining, loop unrolling, vectorization etc. in compilers; a neural net could certainly be beneficial there, and possibly easier to maintain than tons of human-written heuristics.
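As a toy sketch of that idea (the feature set, network, and threshold are all hypothetical; this is not how any production compiler currently decides):

    import torch
    import torch.nn as nn

    # Hypothetical call-site features: callee size, call count,
    # argument count, loop depth of the call site.
    features = torch.tensor([[120.0, 3500.0, 4.0, 2.0]])

    # A small MLP standing in for a pile of hand-written inlining heuristics;
    # in practice it would be trained against measured speedups.
    inline_scorer = nn.Sequential(
        nn.Linear(4, 16),
        nn.ReLU(),
        nn.Linear(16, 1),
        nn.Sigmoid(),      # probability that inlining pays off
    )

    should_inline = inline_scorer(features).item() > 0.5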
We'll have to see. I could definitely see someone spending a lot of time training for a specific algorithmic kernel and microarchitecture and beating the best human results (by a few percent).
I'd be very surprised if that can be extended to a large complex algorithmic system that is amenable to mathematical reformulations (at least within the next 10 years).
Funny you should say that, because traditional compilers have been incredibly useful.
> It's all the same trade-off as a traditional compiler: you get a lot more throughput than hiring a specialist performance programmer, but the latter will typically outperform, possibly by orders of magnitude.
That throughput is the point though? You cannot have performance specialists on every single ML workload. It's still significantly better than not having these kinds of optimization.
What's the actual state of these "ML compilers" currently, and what is the near-term promise?
One of the easiest approaches is torch.compile, the latest iteration of the PyTorch compiler (previous methods were TorchScript and FX Tracing).
You simply write:

    model = torch.compile(model)
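In context, a minimal runnable sketch (the toy model and input shapes are made up purely for illustration):

    import torch
    import torch.nn as nn

    # Any ordinary eager-mode model works; this toy MLP is just a placeholder.
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

    # torch.compile wraps the model: the first call traces and compiles it,
    # later calls reuse the compiled graph.
    model = torch.compile(model)

    x = torch.randn(32, 128)
    y = model(x)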
"Across these 163 open-source models torch.compile works 93% of time, and the model runs 43% faster in training on an NVIDIA A100 GPU. At Float32 precision, it runs 21% faster on average and at AMP Precision it runs 51% faster on average."[1]
What Google is trying to do is involve more people in the R&D of these kinds of methods.
The near-term promise is that you can use AMD, CUDA, TPUs, CPUs etc. without explicit vendor support for the framework on which the model was developed.
It actually sounds very useful and cool, I just completely did not get that from the article.
Disclaimer: I will be very handwavey, reality is complex.
This is achieved by compiling the graph into some intermediate representation and then implementing the right backend for it. For projects in this space, look at StableHLO, IREE, and OpenXLA.
You can argue that JAX's jit compiler is a form of such a compiler, mapping the traced operations down to XLA, which then does its own bit of magic to make it work on your backend.
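A small sketch of what that looks like from the user's side (assuming a recent JAX version; the lower()/as_text() inspection calls are the part to double-check against your installed release):

    import jax
    import jax.numpy as jnp

    def f(x):
        return jnp.sum(jnp.tanh(x) ** 2)

    x = jnp.ones((8, 128))

    # jit traces f into a graph of XLA/StableHLO operations...
    jitted = jax.jit(f)

    # ...which you can inspect before the backend-specific compilation step.
    print(jitted.lower(x).as_text()[:400])

    y = jitted(x)   # compiled for whatever backend (CPU/GPU/TPU) is available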
It's transformations and abstractions all the way down.
Summary: they improve prediction of the run-time performance of a computation graph using a GNN. They use an embedding dictionary for each node's opcode along with some other node features (e.g. shape, bits, window size; see [1]). They released a big dataset of these graphs, with varying XLA compilation configurations and their resulting perf on TPUs, in [2]. In [3] they improved prediction on bigger graphs than before by partitioning the graph (METIS graph partitioning, which was new to me) plus other training tricks.
This is only about predicting performance of a given graph and not about improving/suggesting/editing a new equivalent graph. As in FunSearch, models which have decent predictive power could be used with evolutionary search.
[1] https://github.com/google-research-datasets/tpu_graphs#featu...
[2] TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs, https://arxiv.org/abs/2308.13490
[3] Learning Large Graph Property Prediction via Graph Segment Training, https://arxiv.org/abs/2305.12322
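A minimal sketch of the featurization idea in the summary above (the opcode vocabulary size, feature count, and the single message-passing step are illustrative guesses, not the papers' actual model):

    import torch
    import torch.nn as nn

    NUM_OPCODES = 128     # hypothetical opcode vocabulary size
    NUM_NODE_FEATS = 6    # hypothetical numeric features (shape dims, bits, window size, ...)

    class TinyCostModel(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.opcode_emb = nn.Embedding(NUM_OPCODES, hidden)  # embedding dictionary per opcode
            self.feat_proj = nn.Linear(NUM_NODE_FEATS, hidden)
            self.msg = nn.Linear(hidden, hidden)
            self.readout = nn.Linear(hidden, 1)                  # predicted runtime (scalar)

        def forward(self, opcodes, feats, adj):
            # opcodes: [N] ints, feats: [N, NUM_NODE_FEATS], adj: [N, N] dense adjacency
            h = self.opcode_emb(opcodes) + self.feat_proj(feats)
            h = torch.relu(self.msg(adj @ h))      # one crude message-passing step
            return self.readout(h.mean(dim=0))     # pool over nodes, predict graph runtime

    # Toy usage on a random 10-node graph.
    model = TinyCostModel()
    pred = model(torch.randint(0, NUM_OPCODES, (10,)),
                 torch.randn(10, NUM_NODE_FEATS),
                 torch.eye(10))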
Can anyone explain how conv works in that graph? You have a tensor of shape [2,4,16] and you convolve with a kernel of shape [4,16,8], and that gives you a [2,8] tensor? How's that possible?
*1. Input:*
* Tensor shape: [2,4,16]
* `2`: This represents the *batch size*, meaning there are two independent data samples being processed.
* `4`: This is the *input feature dimension*, indicating each sample has 4 features.
* `16`: This is the *input channel dimension*, suggesting each feature has 16 channels of information.
*2. Kernel:*
* Shape: [4,16,8]
* `4`: This is the *kernel size*, meaning the filter window used to convolve has a width of 4.
* `16`: This matches the *input channel dimension*, ensuring the filter operates on the same number of channels as the input.
* `8`: This is the *output channel dimension*, indicating the convolution produces 8 new channels of information per sample.
*3. Output:*
* Shape: [2,8]
* `2`: This remains the *batch size* as the operation is applied to each sample independently.
* `8`: This matches the *output channel dimension* of the kernel, signifying the final tensor has 8 new features extracted from the input.
*4. How is it possible?*
Despite the seemingly mismatched dimensions in the input and output, convolution on graphs works by leveraging the *neighborhood structure* of the graph. Here's a simplified explanation:
* The kernel slides across the graph, applying its weights to the features of the current node and its neighbors within a specific radius.
* This weighted sum is then aggregated to form a new feature for the current node in each output channel.
* As the kernel moves across the graph, it extracts information from the local neighborhood of each node, creating new features that capture relationships and patterns within the graph.
*Additional considerations:*
* The graph structure and edge weights likely play a role in how information propagates during the convolution process.
* Specific details of the convolution implementation, including padding and stride, might also influence the output shape.
Thanks. What was confusing me is the kernel size 4. Normally in (2D) convolutions you have (in_channels, out_channels, k, k) for a kxk kernel size. In the example above, the k is the first dimension instead of the last. This is in PyTorch; not sure about Keras.
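For what it's worth, the shapes line up if you read the node as a 1-D convolution in NWC/WIO layout with no padding (an assumption about this particular graph, not something stated above): input [2,4,16] = (batch, width 4, 16 channels), kernel [4,16,8] = (width 4, 16 in-channels, 8 out-channels), so the spatial axis collapses to 1 and [2,1,8] is effectively [2,8]. A quick check:

    import jax
    import jax.numpy as jnp

    x = jnp.ones((2, 4, 16))   # (batch, spatial width, input channels) -- NWC
    w = jnp.ones((4, 16, 8))   # (kernel width, input channels, output channels) -- WIO

    # VALID padding: output width = 4 - 4 + 1 = 1, so the spatial axis collapses.
    y = jax.lax.conv_general_dilated(
        x, w,
        window_strides=(1,),
        padding="VALID",
        dimension_numbers=("NWC", "WIO", "NWC"),
    )
    print(y.shape)   # (2, 1, 8) -- squeeze the middle axis and you get [2, 8]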
Off the top of my head, I can think of at least five foundation models (Llama, Claude, Gemini, Falcon, Mistral) that are all trading blows, but GPT is still a head above them and has been for a year now. Transformer LLMs are simple enough that, demonstrably, anyone with a million bucks of GPU time can make one, but they can't quite catch up with OpenAI. What's their special sauce?
Their special sauce is most probably the quality of data and the amount of data cleaning effort they put in.
I'm speculating here, but I think Google always refrains from getting into the manual side of things. With LLMs, it became obvious very fast that data is what matters. Seeing Microsoft's phi-2 play, I'm even more convinced of this.
DeepMind understood the properties and came up with Chinchilla, but DeepMind couldn't integrate well with Google in terms of understanding what kind of data Google should supply to increase model quality.
OpenAI put in annotation/cleaning work almost right from the start. I'm not too familiar with this, but human labor was heavily utilized to increase training data quality after ChatGPT started.
I kinda wonder if maybe it's at least partially due to OpenAI hitting a kind of hyperparameter lottery. When each experiment costs millions, it might be that (aside from good/unique data) they just have a good set of hyperparameters used in training, and it's too expensive for a competitor to find equal or better settings.
Besides the fact that Gemini Pro is more comparable to GPT-3.5, one more interesting observation is that even OpenAI themselves have not been able (or have not intended) to deliver a significantly better model than GPT-4 for almost a year now. And OpenAI does not seem to be hiding their own magical "AGI" behind the scenes; reportedly they've been more focused on efficiency and engineering work, primarily driven by Sam, rather than on developing a new model. I'm reasonably sure that the current transformer itself as an architecture is at its peak and most improvements will be mostly incremental.
Note, Gemini Ultra, which they claim is competitive with or possibly even better than GPT-4, isn’t out yet. They have released a weaker model, Gemini Pro.
It will be interesting to see how capable Gemini Ultra actually is. For now we wait.
The pace at which ML is advancing right now is amazing. I don't believe in the singularity, but it's changing software, and in turn society, in ways no one can predict.
At great risk of sounding completely ignorant, this approach is basically what I thought the point of machine learning was - cleverly using feedback loops to improve things automatically. The thing that sticks out to me as particularly cool about FunSearch is the use of programs as inputs/outputs and the fact that they managed to automate feedback.
I'm pretty naive in terms of granular understanding here as I am barely proficient in Python, to be clear, but when I daydream about things you could solve with machine learning/AI, this is the approach I always think of and I guess is how I thought it already worked. Load it up with the best information we have currently, define the desired results as clearly as possible, implement some form of automatic feedback, and let it run iteratively until it produces something better than what you had before.
Is this a case of "well no shit, but actually implementing that effectively is the hard part"? Is it being able to quickly apply it to a wide variety of problems? I guess I'm trying to understand whether this is a novel idea (and if so, what parts are novel), or if the idea has been around and it's a novel implementation.
The first 3 have, as of today, resulted in trillions of dollars of economic activity. And they have changed societies, politics, political participation, access to knowledge etc. worldwide, for good and bad. So I don't get why you are so dismissive of them.
I think the biggest blind spot for many programmers/coders is that yes, it might not change much for them, but it will allow many more people to code and do stuff that they were not able to before. As the models get better and people use them more and learn how to use them more efficiently, they will start changing things.
I am hoping we get to the point where the models are good enough that classes in schools are introduced on how to use them rather than just build them, as the number of people wanting to or willing to learn programming is a lot smaller than the number of people looking for ways to do things more efficiently.
I've been programming since middle school. That would be 30 years. Nothing really changed much. C++ is incrementally more convenient but fundamentally the same. Code editors are the same. Debuggers are the same. The shell is the same.
I am certain in 30 years everything will still be the same.
I like programming how I do now. I don't plan to stop.
People do lots of things manually that machines have been able to do for a long time.
They’re making fun of your typo, but you’re right. Pretty much every software job in 5 years will be an AI job. This rustles a lot of feathers, but ignoring the truth will only hurt your career.
I think the era of big tech paying fat stacks to a rather large number of technical staff will start to wane as well. Better hope you have top AI paper publications and deep experience with all parts of using LLMs/whatever future models there are, because if not, you'll be in for a world of pain if you got used to cushy tech work and think it's inevitable in a world where AI is advancing so fast.
I want to see it come out with a cure for a disease that is tough to cure first. The singularity itself is pointless unless it benefits humans, which mainly means better health and less suffering.
I'd say advancement in mathematics, computer science, and heck, even art is far from "pointless". Why does it feel like the goalposts get moved every time there is significant progress in AI?
In order for an AI to evaluate the effect of a small molecule on the brain, it would have to... simulate the operation of a human brain in a simulated environment. Similarly, to avoid Thalidomide-style disasters, it would have to simulate the conception, development and growth to adulthood of a human.
These things are... physically possible, but have WBE and uploads as a hard requirement. Those are going to affect a hell of a lot of things more than the drug industry!
Amusingly, machine-phase nanotechnology and blood nanobots would be easier to evaluate, since simple cell-level mechanical interventions (reading surface proteins on cancer cells and chopping them up, say) will have fewer interactions than a small molecule that diffuses into every cell in the body.
“Cure” is a tough bar, but I believe Paxlovid, the anti-viral used to reduce Covid severity, was identified using ML. There’s many companies like Recursion Pharma which are entirely focused on using ML for drug discovery, and from what I can tell seem to have promising results, but drug development is slow enough that nothing will come of it for a while.
Also, while not medicine focused, Google's GNoME project results announced a few weeks ago were pretty remarkable. They discovered more theoretical new materials using their ML approach than the rest of human history combined, and they are already confirming many of the results in laboratory settings. That has the potential to be a revolution in countless scientific and engineering applications.
AlphaFold was a nice surprise when it happened, too.
This is impossible. A more likely scenario is that most people lose their jobs and starve; it is not certain that we can get to a society with UBI.