I recently started interviewing ML Engineers for my company. In general I'm quite surprised by the lack of knowledge of people applying for the job. People seem to have several misconceptions, very surface-level knowledge, and to lack even the fundamentals.
That made me question whether my expectations are set right. Is it possible that, working in the field every day, I expect candidates to know way more than is reasonable? I'm not sure, and I don't feel we have a solid way to deal with that.
I consider myself a decent ML expert, but the field is so vast that I think I could be tripped up in an interview rather easily. Plus I tend to get a bit rusty with the basics that I never use for anything.
Same experience hiring Junior Data Scientists, it's a shitshow out there. The level is so low that I had to hire an "old school" statistician for a senior position
These fields are relatively new. There doesn’t seem to be a clear path to break into them.
Personally, I believe Kaggle is one of the ways to slowly gain some practical experience: https://www.kaggle.com/
However, I’m not sure if it’s sufficient.
Recently, I’ve been taking a deeper dive into studying various types of competitions. For example, I’ve created a repo where I’m organizing notebooks, etc. for a regression competition: https://github.com/melling/ml-regression
I’m creating others for classification, NLP, vision, etc.
Of course, the self-study method means people have knowledge gaps, because there’s no syllabus tailored for an interview.
I once interviewed an ML engineer candidate with two PhDs in ML who couldn't properly distinguish regression from classification problems.
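For anyone following along, the distinction the commenter treats as table stakes can be shown with a toy example (my own illustration, not from the thread): the problem type is determined by the target, not the features.

```python
# Regression: the target is a continuous quantity (e.g. a price).
house_sizes = [50, 80, 120]                  # features (square meters)
prices = [150_000.0, 230_000.0, 340_000.0]   # continuous targets -> regression

# Classification: the target is a discrete label (e.g. spam / not spam).
emails = ["win money now", "meeting at 3pm", "free prize inside"]
labels = ["spam", "ham", "spam"]             # categorical targets -> classification

def problem_type(targets):
    """Crude heuristic for illustration: float targets suggest regression,
    anything label-like suggests classification."""
    if all(isinstance(t, float) for t in targets):
        return "regression"
    return "classification"

print(problem_type(prices))  # regression
print(problem_type(labels))  # classification
```

The heuristic is deliberately simplistic (integer targets can be either, depending on whether they're counts or class IDs), but it captures the interview-question version of the distinction.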
> and lack even the fundamentals
I don't think there is a strong consensus on what the fundamentals are. I've also noticed that the fundamentals differ remarkably between people who think of themselves as "data scientists" vs those who think of themselves as "machine learning practitioners".
Multiple PhDs in the same field is a big red flag. A PhD is supposed to teach you how to do research, so if you get to the end and need another one, you've failed.
What exactly do you ask an ML engineer? In my experience, if we ask 10 people about the scope of work of an ML engineer, we'll get 10 different answers, 9 of which will be all-inclusive, i.e. a Data Scientist who can also build robust production systems.
I ask for the actual fundamental skills in the job ad. Say, 5 skills for a junior, 15 skills for a senior, and organizing other people (incl. clients) for a manager
This is a fairly new role in our company. Officially it's described as 50% data scientist and 50% developer, but the loop composition is the same as for a developer, replacing one of the coding rounds with an ML round.
I'm still experimenting with the format, but I do a mix of asking theoretical ML questions, prior experience with ML, and designing a system to solve a business problem using ML.
I tend to notice that model evaluation is lacking, perhaps because it’s not especially interesting.
But to me, most business-applied ML falls under the optimization umbrella. For some reason it’s never portrayed this way, but perhaps if it were, junior practitioners would more commonly pay attention to learning to thoroughly examine how their trained models will perform.
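The evaluation habit the parent comment finds lacking boils down to one discipline: score the model on data it was not fit on. A minimal sketch, with a deliberately trivial model and made-up numbers of my own:

```python
def fit_mean_model(train_targets):
    """A trivially simple "model": always predict the training-set mean."""
    mean = sum(train_targets) / len(train_targets)
    return lambda _x: mean  # ignores the input feature entirely

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hold out the last two points; fit only on the first four.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
train_x, train_y = xs[:4], ys[:4]
test_x, test_y = xs[4:], ys[4:]

model = fit_mean_model(train_y)
train_mae = mean_absolute_error(train_y, [model(x) for x in train_x])
test_mae = mean_absolute_error(test_y, [model(x) for x in test_x])
print(train_mae, test_mae)  # held-out error is much worse than training error
```

The gap between the two numbers is exactly the thing a junior practitioner misses when they only report training metrics.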
ML is the "hot" field right now, like "big data" was a couple of years ago, so it attracts a vast number of people who are interested only in money (and a huge subset of those people aren't really that interested in "knowing")
The most baffling example is a candidate who admitted they didn't expect ML specific questions and hence hadn't prepared. I spent a minute figuring out if we were interviewing the right candidate.
My expectations are always evolving since this is a new role for us. The current guidelines are that candidates should have broad knowledge of ML fundamentals. We also work through a design challenge together where a candidate solves a business problem using ML.
I'm still figuring out the best ways to evaluate these.
The way I like to explain it, which is how I have seen it explained several times, is that all AI can be reduced to search/optimization. ML is just applying that search to the function that will produce the final answer over a dataset (either generated on the fly or prepared beforehand). For neural networks the hypothesis space (all the solutions you are searching through to find the best ones) is the weights of the neural network, and your search strategy/optimization is (usually) backpropagation. If you translated the weights into something traversable by other algorithms, they could do just "fine" (assuming infinite time and space) in its place. It really opens the mind up for experimentation on every bit of the process. The book that really hammered it home for me was Intelligence Emerging by Keith Downing: a short, great book on bio-inspired AI.
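The "learning is search" framing above can be made concrete with a toy of my own (not from the book): the hypothesis space is a single weight w for the model y = w·x, and the search strategy is gradient descent on squared error. Any other search over w (random search, evolution) could stand in for the gradient step, as the comment suggests.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by the "true" hypothesis w = 2

def loss(w):
    """Mean squared error of hypothesis w over the dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Analytic d(loss)/dw."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0  # start somewhere in the hypothesis space
for _ in range(100):
    w -= 0.05 * grad(w)  # follow the local gradient "downhill"

print(round(w, 3))  # the search converges to 2.0
```

Swapping the update line for, say, a random perturbation that is kept only when it lowers the loss turns the same program into hill climbing, with no change to the hypothesis space.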
Both Tom M. Mitchell's "Machine Learning" and Russell & Norvig's "Artificial Intelligence: A Modern Approach" define the whole process: from propagating the input until you have an output, to calculating the gradient and updating the weights.
A good run-down of the different algorithms: https://ruder.io/optimizing-gradient-descent/
I think people favour adaptive learning rate methods like Adam in practice since they generally do seem to perform well, and are often less sensitive to initial conditions and the exact hyperparameters used. There will always be people who like to test N optimizers with parameter sweeps to squeeze a tiny bit of extra performance out, but for the rest of us the default Adam or AdamW options are good, unobjectionable choices :)
It’s really hard to compare optimizers. Common architectures and default hyperparameters were discovered alongside Adam so you’d have to redo a bunch of sweeps if you wanted a “fair” comparison. In practice this doesn’t really matter and everyone just uses Adam. If you had infinite compute, you’d try every combo and select the one with the best results.
Adam was very effective when it was introduced, so it was widely adopted. Since then, only models that work well with Adam have made it from the idea stage to actually working. I think there's reason to believe we have overfit our model architectures to our loss functions and optimizers.
It’s a pleasant surprise to see this shared here, I am the author of this piece. Honestly I wrote this post for myself more than anything else. I also find that my knowledge in a lot of areas is very “surface level.” It’s really easy to regurgitate definitions, but it’s definitely harder to get to the core of those ideas. I hope you enjoyed!
This is basically what all "optimizers" achieve in various ways, including momentum.
Adam uses several times more memory, and is slower, than momentum or plain SGD. That's a reason not to use it when it's not needed.
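The memory point comes from the extra state each optimizer keeps per model parameter: plain SGD keeps none, momentum keeps one velocity buffer, Adam keeps two moment buffers. A back-of-the-envelope illustration (my own framing, with a hypothetical model size):

```python
n_params = 7_000_000_000  # hypothetical 7B-parameter model
bytes_per_float = 4       # fp32 optimizer state

extra_buffers = {
    "sgd": 0,        # no state beyond the weights themselves
    "momentum": 1,   # one velocity buffer
    "adam": 2,       # first- and second-moment buffers
}

for name, bufs in extra_buffers.items():
    gib = n_params * bufs * bytes_per_float / 2**30
    print(f"{name}: {bufs} extra buffer(s), ~{gib:.0f} GiB of optimizer state")
```

At that scale Adam's two fp32 buffers alone are on the order of 50 GiB, which is why memory-lean variants and lower-precision optimizer states get attention for large models.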
Every time they pull off another thing that's just "too dang silly to work", and yet it does... it really makes me smile.
I took the free Machine Learning course at Stanford way back; it was fun to get another toolkit, should I ever actually need it.