I recently started interviewing ML Engineers for my company. In general I'm quite surprised by the lack of knowledge of people applying for the job. People seem to have several misconceptions, very surface-level knowledge, and to lack even the fundamentals.
That made me question whether my expectations are set right. Is it possible that, working in the field every day, I expect candidates to know way more than is reasonable? I'm not sure, and I don't feel we have a solid way to deal with that.
I consider myself a decent ML expert, but the field is so vast that I think I could be tripped up in an interview rather easily. Plus I tend to get a bit rusty with the basics that I never use for anything.
Same experience hiring Junior Data Scientists, it's a shitshow out there. The level is so low that I had to hire an "old school" statistician for a senior position
These fields are relatively new. There doesn’t seem to be a clear path to break into them.
Personally, I believe Kaggle is one of the ways to slowly gain some practical experience: https://www.kaggle.com/
However, I’m not sure if it’s sufficient.
Recently, I’ve been taking a deeper dive into studying various types of competitions. For example, I’ve created a repo where I’m organizing notebooks, etc. for a regression competition: https://github.com/melling/ml-regression
I’m creating others for classification, NLP, vision, etc.
Of course, the self-study method means people have knowledge gaps, because there’s no syllabus tailored for an interview.
I once interviewed an ML engineer candidate with two PhDs in ML who couldn't properly distinguish regression from classification problems.
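For anyone following along, the distinction the commenter treats as table stakes can be shown with a toy example (my own illustration, not from the thread): the problem type is determined by the target, not the features.

```python
# Regression: the target is a continuous quantity (e.g. a price).
house_sizes = [50, 80, 120]                  # features (square meters)
prices = [150_000.0, 230_000.0, 340_000.0]   # continuous targets -> regression

# Classification: the target is a discrete label (e.g. spam / not spam).
emails = ["win money now", "meeting at 3pm", "free prize inside"]
labels = ["spam", "ham", "spam"]             # categorical targets -> classification

def problem_type(targets):
    """Crude heuristic for illustration: float targets suggest regression,
    anything label-like suggests classification."""
    if all(isinstance(t, float) for t in targets):
        return "regression"
    return "classification"

print(problem_type(prices))  # regression
print(problem_type(labels))  # classification
```

The heuristic is deliberately simplistic (integer targets can be either, depending on whether they're counts or class IDs), but it captures the interview-question version of the distinction.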
> and lack even the fundamentals
I don't think there is a strong consensus on what the fundamentals are. I've also noticed that the fundamentals differ remarkably between people who think of themselves as "data scientists" vs those who think of themselves as "machine learning practitioners".
Multiple PhDs in the same field is a big red flag. A PhD is supposed to teach you how to do research, so if you get to the end and need another one, you've failed.
What exactly do you ask an ML engineer? In my experience, if we ask 10 people about the scope of work of an ML engineer, we'll get 10 different answers, 9 of which will be all-inclusive, i.e. a Data Scientist who can also build robust production systems.
I ask for the actual fundamental skills in the job ad. Say, 5 skills for a junior, 15 skills for a senior, and organizing other people (incl. clients) for a manager
This is a fairly new role in our company. Officially it's described as 50% data scientist and 50% developer, but the loop composition is the same as for a developer, replacing one of the coding rounds with an ML round.
I'm still experimenting with the format, but I do a mix of asking theoretical ML questions, prior experience with ML, and designing a system to solve a business problem using ML.
I tend to notice that model evaluation is lacking, perhaps because it’s not especially interesting.
But to me, most business-applied ML falls under the optimization umbrella. For some reason it’s never portrayed this way, but perhaps if it were, junior practitioners would more commonly pay attention to learning to thoroughly examine how their trained models will perform.
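The evaluation habit the parent comment finds lacking boils down to one discipline: score the model on data it was not fit on. A minimal sketch, with a deliberately trivial model and made-up numbers of my own:

```python
def fit_mean_model(train_targets):
    """A trivially simple "model": always predict the training-set mean."""
    mean = sum(train_targets) / len(train_targets)
    return lambda _x: mean  # ignores the input feature entirely

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hold out the last two points; fit only on the first four.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9]
train_x, train_y = xs[:4], ys[:4]
test_x, test_y = xs[4:], ys[4:]

model = fit_mean_model(train_y)
train_mae = mean_absolute_error(train_y, [model(x) for x in train_x])
test_mae = mean_absolute_error(test_y, [model(x) for x in test_x])
print(train_mae, test_mae)  # held-out error is much worse than training error
```

The gap between the two numbers is exactly the thing a junior practitioner misses when they only report training metrics.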
ML is the "hot" field right now, like "big data" was a couple of years ago, so it attracts a vast number of people who are interested only in money (and a huge subset of those people aren't really that interested in "knowing")
The most baffling example is a candidate who admitted they didn't expect ML specific questions and hence hadn't prepared. I spent a minute figuring out if we were interviewing the right candidate.
My expectations are always evolving since this is a new role for us. The current guidelines are that candidates should have broad knowledge of ML fundamentals. We also work through a design challenge together where a candidate solves a business problem using ML.
I'm still figuring out the best ways to evaluate these.
The way I like to explain it, which is how I have seen it explained several times, is that all AI can be reduced to search/optimization. ML is just applying that search to the function that will produce the final answer over a dataset (either generated on the fly or prepared beforehand). For neural networks the hypothesis space (all the solutions you are searching through to find the best ones) is the weights of the neural network, and your search strategy/optimization is (usually) backpropagation. If you translated the weights into something traversable by other algorithms, they could do just "fine" (assuming infinite time and space) in its place. It really opens the mind up for experimentation on every bit of the process. The book that really hammered it home for me was Intelligence Emerging by Keith Downing: a short, great book on bio-inspired AI.
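The "learning is search" framing above can be made concrete with a toy of my own (not from the book): the hypothesis space is a single weight w for the model y = w·x, and the search strategy is gradient descent on squared error. Any other search over w (random search, evolution) could stand in for the gradient step, as the comment suggests.

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by the "true" hypothesis w = 2

def loss(w):
    """Mean squared error of hypothesis w over the dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grad(w):
    """Analytic d(loss)/dw."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0  # start somewhere in the hypothesis space
for _ in range(100):
    w -= 0.05 * grad(w)  # follow the local gradient "downhill"

print(round(w, 3))  # the search converges to 2.0
```

Swapping the update line for, say, a random perturbation that is kept only when it lowers the loss turns the same program into hill climbing, with no change to the hypothesis space.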
Both Tom M. Mitchell's "Machine Learning" and Russell & Norvig's "Artificial Intelligence: A Modern Approach" define the whole process: from propagating the input until you have an output, to calculating the gradient and updating the weights.
A good run-down of the different algorithms: https://ruder.io/optimizing-gradient-descent/
I think people favour adaptive learning rate methods like Adam in practice since they generally do seem to perform well, and are often less sensitive to initial conditions and the exact hyperparameters used. There will always be people who like to test N optimizers with parameter sweeps to squeeze a tiny bit of extra performance out, but for the rest of us the default Adam or AdamW options are good, unobjectionable choices :)
It’s really hard to compare optimizers. Common architectures and default hyperparameters were discovered alongside Adam so you’d have to redo a bunch of sweeps if you wanted a “fair” comparison. In practice this doesn’t really matter and everyone just uses Adam. If you had infinite compute, you’d try every combo and select the one with the best results.
Adam was very effective when it was introduced, so it was widely adopted. Since then, only models that work well with Adam have made it from the idea stage to actually working. I think there's reason to believe we have overfit our model architectures to our loss functions and optimizers.
It’s a pleasant surprise to see this shared here, I am the author of this piece. Honestly I wrote this post for myself more than anything else. I also find that my knowledge in a lot of areas is very “surface level.” It’s really easy to regurgitate definitions, but it’s definitely harder to get to the core of those ideas. I hope you enjoyed!
This is basically what all "optimizers" achieve in various ways, including momentum.
Adam uses several times more memory, and is slower, than momentum or plain SGD. That's a reason not to use it when it's not needed.
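The memory point comes from the extra state each optimizer keeps per model parameter: plain SGD keeps none, momentum keeps one velocity buffer, Adam keeps two moment buffers. A back-of-the-envelope illustration (my own framing, with a hypothetical model size):

```python
n_params = 7_000_000_000  # hypothetical 7B-parameter model
bytes_per_float = 4       # fp32 optimizer state

extra_buffers = {
    "sgd": 0,        # no state beyond the weights themselves
    "momentum": 1,   # one velocity buffer
    "adam": 2,       # first- and second-moment buffers
}

for name, bufs in extra_buffers.items():
    gib = n_params * bufs * bytes_per_float / 2**30
    print(f"{name}: {bufs} extra buffer(s), ~{gib:.0f} GiB of optimizer state")
```

At that scale Adam's two fp32 buffers alone are on the order of 50 GiB, which is why memory-lean variants and lower-precision optimizer states get attention for large models.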
Every time they pull off another thing that's just "too dang silly to work", and yet it does... it really makes me smile.
I took the free Machine Learning course at Stanford way back; it was fun to get another toolkit, should I ever actually need it.