I am very much a beginner in the space of machine learning and have been overwhelmed by the choices available. Eventually I do want to build my own rig and just train models on that, but I don't have that kind of money right now, nor is it easy to find GPUs even if I could afford them.
So I am basically stuck with cloud solutions for now, which is why I want to hear personal experiences from HN folks who have used any of the available ML platforms: their benefits, shortcomings, which are more beginner-friendly, cost-effective, etc.
I am also not opposed to configuring environments myself rather than using managed solutions (such as Gradient) if it is more cost-effective to do so, or affords better reliability or better-than-average resource availability... I have read complaints that Colab has poor GPU availability since GPUs are shared among subscribers, and that the more you use it, the less time is allocated to you... I'm not sure how big a problem that actually is, though.
I am very motivated to dive into this space (it's been on my mind a while) and I want to do it right, which is why I am asking for personal experiences on this forum: HN has a very healthy mix of technology hobbyists and professionals, and the opinions of both are equally valuable to me, for different reasons.
Also, please feel free to include any unsolicited advice such as learning resources, anecdotes, etc.
Thanks for reading until the end.
While the (precious and useful) advice here seems to mostly cover the bigger infrastructures, please note that
you can do a meaningful slice of machine learning work (study, personal research) with just a power-efficient laptop CPU (no GPU), in runs on the order of minutes, on battery. That comes well before any need for "Big Data".
And there are lightweight tools: I am currently enamoured with Genann («minimal, well-tested open-source library implementing feedforward artificial neural networks (ANN) in C», by Lewis Van Winkle), a single C file of 400 lines compiling to a 40 KB object, yet well sufficient to solve a number of the problems you may meet.
https://codeplea.com/genann // https://github.com/codeplea/genann
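Genann itself is C, but to give a concrete flavour of the kind of problem such a minimal feedforward library solves, here is a rough Python/NumPy sketch of the same idea: a tiny 2-4-1 network learning XOR by plain backpropagation. The layer sizes, learning rate, and iteration count are arbitrary choices of mine, and a different random seed may need more iterations:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # One hidden layer of 4 sigmoid units, one sigmoid output.
    W1 = rng.normal(scale=0.5, size=(2, 4))
    b1 = np.zeros(4)
    W2 = rng.normal(scale=0.5, size=(4, 1))
    b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(10000):
        # Forward pass.
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # Backward pass for a squared-error loss.
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # Plain gradient-descent updates (learning rate 1.0).
        W2 -= h.T @ d_out
        b2 -= d_out.sum(axis=0)
        W1 -= X.T @ d_h
        b1 -= d_h.sum(axis=0)

    print(out.round(3))  # should approach [[0], [1], [1], [0]]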
After all, is it a good idea to use tools that automate the optimization process while you are still learning the trade? Only partially. You should build, in general and even metaphorically, the legitimacy of your Python operations on solid C ground.
And note that you can also build ANNs in R (and other math or stats environments), if that is what you need or are comfortable with...
Also note, as a reminder, that Prof. Patrick Winston's MIT lessons for the Artificial Intelligence course (classical AI with a few lessons on ANNs) are freely available. That covers the groundwork before a climb into the newer techniques.
So, while one is learning, there is a case for being conservative and working directly with the tools already available, which will also be revealing about scalability requirements, often optimistically: you do not need a full lab to do (reasonable) linear regression, nor to train networks for OCR, and certainly not to get acquainted with the various techniques in the discipline.
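To make the "no full lab needed" point concrete, here is ordinary least-squares linear regression in plain NumPy on synthetic data of my own invention; it runs in well under a second on a laptop CPU:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))                # 1000 samples, 3 features
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 0.1 * rng.normal(size=1000)  # noisy targets

    # Closed-form least-squares fit; no GPU, no framework.
    w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(w_hat)  # close to [2.0, -1.0, 0.5]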
When the needs grow, sometimes even high-end consumer equipment will not solve your problem, so on the hardware side some practical notion of the actual constraints of scale will help orientation: you do not need a GPU for most pathfinding (nor for getting a decent grasp of the techniques I am aware of), and when you want to produce new masterpieces from a Rembrandt "ROM construct"¹ (and much humbler projects) a GPU will not suffice.
(¹reprising the Dixie Flatline module in William Gibson's Neuromancer)
GPT 5MB for the win. It really works.
Fast CPU convolutions: https://NN-512.com
Both are completely stand-alone (no external dependencies).
And especially, it is from Fabrice Bellard (of QEMU, FFmpeg...)
I do not know how you found it: it is not even in his site's index!
--
I see that NN-512 is a personal project of yours: congratulations! Though it seems to be a Go application that generates specialized C code for convolutional NNs... not a general-purpose library, and not for beginners.
My advice is to go with Colab Pro+ ($50/mo) and TensorFlow/Keras. You can go with PyTorch too if you prefer.
I made the mistake of buying a 2080 Ti for my desktop thinking it would be better, but no: consumer-grade hardware is nowhere near as good/fast as the server-grade hardware you get in Colab. Plus you have the option to use TPUs in Colab if you want to scale up quickly.
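For what it's worth, attaching a Colab TPU from TensorFlow 2 looks roughly like the sketch below; the exact calls have moved around between TF releases, so treat it as a starting point and check the current docs:

    import tensorflow as tf

    # Locate and initialize the Colab-provided TPU.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

    with strategy.scope():
        # Build and compile your Keras model here; it gets replicated
        # across the TPU cores automatically.
        ...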
You really don't need to get fancy with this setup. The best part of using Colab is that you can work on your laptop from anywhere, and never worry about your ML model hogging all your RAM (and swap) or compute and slowing your local machine down. Trust me, this sucks when it happens, and you have to restart!
As for your data, you can host it in a GCS bucket. For small data (<1 TB), even better is Google Drive (I know, crazy). Colab can mount your Google Drive and loads from it extremely quickly. It's like having a remote filesystem, except with a handy UI, collaboration options, and an easy way to inspect and edit your data.
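Mounting Drive is a one-liner with the documented Colab helper:

    # Documented Colab helper; your Drive appears under /content/drive/MyDrive.
    from google.colab import drive
    drive.mount('/content/drive')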
The $10 tier (plain Colab Pro) is more than enough for learning deep learning.
I use a Paperspace VM + Parsec for personal ML projects. Whenever I've done the math, paying an hourly rate for a standard VM with a GPU beats purchasing a local machine, and the complexity of a workflow-management tool for ML just isn't worth it unless you are collaborating across many researchers. As an added bonus, you can reuse these VMs for any hobby gaming you might do.
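As a back-of-the-envelope illustration (all numbers here are rough assumptions of mine, not quotes from any provider), the break-even math tends to look like this:

    # All numbers are rough assumptions of mine, not real quotes.
    vm_rate = 0.50        # $/hour for a mid-range GPU VM
    rig_cost = 2500.0     # up-front cost of a comparable local GPU box
    power_rate = 0.05     # rough $/hour of electricity for the local rig
    hours_per_week = 10   # hobby-scale usage

    # Hours of rented GPU time that cost as much as owning the rig.
    break_even_hours = rig_cost / (vm_rate - power_rate)
    years = break_even_hours / hours_per_week / 52
    print(f"break-even after {break_even_hours:.0f} GPU-hours (~{years:.1f} years)")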
The majority of ML methods train quickly on a single large modern GPU for typical academic datasets. Scaling beyond one GPU or one host leads into big-model research, and while big models are a hot field, that is where you would need large institutional support to do anything interesting. A model isn't big unless it's > 30 GB these days :)
Even in a typical industrial setting, you'll find the majority of scientists using various Python scripts to train and preprocess data on a single server. Data wrangling is the main component that requires large compute clusters.
As for software, I do everything with JAX, plus TensorBoard for viewing experiments. JAX is a phenomenal library for personal ML learning, as it's extremely flexible and has relatively low-level, composable abstractions.
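A small taste of what "composable abstractions" means in JAX: you write a plain loss function, then wrap it in grad and jit. This is just a toy linear-regression sketch of mine, not a recommended training loop:

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Plain mean-squared-error for a linear model.
        return jnp.mean((x @ w - y) ** 2)

    # Compose transforms: differentiate, then JIT-compile the gradient.
    grad_loss = jax.jit(jax.grad(loss))

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (32, 3))
    y = x @ jnp.array([1.0, -2.0, 0.5])
    w = jnp.zeros(3)

    for _ in range(100):
        w = w - 0.1 * grad_loss(w, x, y)  # vanilla gradient descent
    print(w)  # should approach [1.0, -2.0, 0.5]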
Maybe GPU prices will stabilize after Ethereum switches to proof of stake and manufacturing pipelines get back to normal, but then I'm not so sure, after seeing the US trying to go ham with sanctions all over the place.
The 3090 machine gets about the same use as the 1070 in my case. While it is nice to have more GPU memory for huge batches and faster training, this is a quality-of-life improvement (and bragging rights), to be honest. Serious work in some sub-areas needs multiple GPUs or enterprise-grade hardware (e.g. A100s).
Software-wise, I just use PyTorch / PyTorch Lightning / Keras, and Anaconda.
Edit: I used to build my own machines in my younger days. The two machines I spoke of above are Alienware, bought on Black Friday sales. Cost-wise, they were ridiculously cheap for the power they give and the impact on my career.
I am biased towards Keras, and I suggest you bookmark these curated examples: https://keras.io/examples/
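In that spirit, a minimal end-to-end Keras example (MNIST on CPU, with my own arbitrary hyperparameters; it trains in a few minutes without any GPU):

    import tensorflow as tf
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, validation_split=0.1)
    model.evaluate(x_test, y_test)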
I bought an at-home GPU rig 3 years ago and I regret that decision. As many other people here have mentioned, Google Colab is a great resource and will save you a lot of time because you will not be setting up your own infrastructure. Start with the free version and, when you really need to, switch to Pro or Pro+.
For more flexibility, set up a GPU VPS instance that you can stop when not in use to save money. I like GCP and AWS, but I used to use Azure and that is also a great service. When a VPS is in a stopped state, you only pay a little for storage. I will sometimes go weeks without starting up my GPU VPS to run an experiment; stick with Colab when it is good enough for what you are doing.
Now for a little off-topic tangent: be aware that most knowledge work is in the process of being automated. Don't be disappointed if things you spend time learning get automated away. Look at the value of studying new tech as transitory, and you will always be in the mode you are in right now: a healthy desire to learn new things. Also, think of deep learning in the context of using it for paid work to solve real problems. As soon as you feel ready, start interviewing for an entry-level deep learning or machine learning job.