Readit News
kmax12 · 7 years ago
Unlike most guides I've seen about ML, this one does a good job of focusing on developing and deploying a simple model first, then iterating. There are also a lot of practical tips here, especially around feature engineering.

> the second phase of machine learning involves pulling in as many features as possible and combining them in intuitive ways. During this phase, all of the metrics should still be rising

As Google points out, after you build an initial model, the next step to increase accuracy is to perform feature engineering. They explain that this can be done manually or automatically using something like deep learning. Another option that people here might consider is using a library like Featuretools (https://github.com/featuretools/featuretools) for "automated feature engineering". Note: I am one of the developers.

Our goal is to help you increase the performance of your models without sacrificing the interpretability of your features. We have a post up about how our algorithm works here: https://www.featurelabs.com/blog/deep-feature-synthesis/. There are also plenty of real world demos on our website: https://www.featuretools.com/demos
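For anyone who hasn't read the linked post: as I understand it, Deep Feature Synthesis builds features by applying aggregation primitives (count, sum, mean, ...) across table relationships and stacking them. Below is a minimal, stdlib-only sketch of one level of that idea. It is not the Featuretools API; the `customers`/`transactions` tables and the `aggregate_features` helper are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical example data: a parent table and a child table linked by customer_id.
customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
transactions = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 5.0},
]

# Aggregation primitives: each reduces a list of child values to a single number.
AGG_PRIMITIVES = {"COUNT": len, "SUM": sum, "MEAN": mean, "MAX": max}


def aggregate_features(parents, children, key, column, child_name):
    """Apply every aggregation primitive to `column` of the child rows grouped
    by `key`, adding one readable feature per primitive to each parent row."""
    grouped = defaultdict(list)
    for row in children:
        grouped[row[key]].append(row[column])

    result = []
    for parent in parents:
        values = grouped.get(parent[key], [])
        row = dict(parent)
        for name, fn in AGG_PRIMITIVES.items():
            feature = f"{name}({child_name}.{column})"
            # A parent with no children gets COUNT 0 and None for the rest,
            # since mean() and max() raise on an empty list.
            if values:
                row[feature] = fn(values)
            else:
                row[feature] = 0 if name == "COUNT" else None
        result.append(row)
    return result


for row in aggregate_features(customers, transactions, "customer_id", "amount", "transactions"):
    print(row)
```

The real library does much more (transform primitives, deeper stacking across several relationships), but even this one level shows why generated features can stay interpretable: each feature name says exactly how it was computed.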

easythings · 7 years ago
Your article is cool. I have some information about machine learning; just check it out: http://tricks4321.blogspot.com/2018/07/machine-language-for-...
abdujava · 7 years ago
One random question I have: what does Google gain by spending resources on developing a course like this? Do they want more people doing machine learning because there is a shortage of developers with this skill in the market, or is there something else involved in the mix?

Secondly, for some reason data science just doesn't excite me as much as typical software development does. Why am I not excited enough to go down the path of specializing in data science in the field of machine learning? Even if there is more money in it, I'm still not extremely motivated to learn it.

What I particularly enjoy is good ol' back end web development. I don't have a degree in computer science, but I'm working on an information systems degree with a focus on "programming", and I dream of, and am working my ass off toward, joining the cult of the "software engineer" type II: a sophisticated software developer/programmer. I love building layers, optimizing code, learning new tools, algorithms and data structures (without knowing math), creating unit tests, following programming paradigms. It excites me so much. And the core skill I want to dive into is blockchain. I love studying that topic too, along with all the algorithms that come with it.

But when I see data science, no excitement. All I imagine is image manipulation and fancy charts. I know I sound a bit ignorant, but that's how it is.

boxspam · 7 years ago
> One random question I have: what does Google gain by spending resources on developing a course like this?

Mindshare or more generally PR. Also to "collect" the talent on their platforms (Tensorflow, Google Cloud, ...). Also these guides were repurposed from existing (internal) guides and are a few years old by now, so the cost is low.

You are further describing the role of a data engineer or ML engineer. If you approach data science with a focus on engineering and tool use, you could be one of the few dangerous data scientists who are able to go end-to-end (that should be safe for at least 5 years, until such pipelines evolve without much human intervention).

> But when I see data science, no excitement. All I imagine is image manipulation and fancy charts.

This is because, while there is legit substance to the hype, the hype is real and it is focused on deep learning for ImageNet (and later GANs, Atari games, Go). Being able to show deepdreamed images and cat neurons is like catnip to journalists. Computer vision is but a very small part of ML, and lots of data-driven companies have no need for such skills. Charts are made by analysts.

Everything (including blockchain) will move closer to the ML paradigm of learning software. Data infra engineers will see their infra increasingly used for ML. It all remains software (very advanced, but accessible to anyone) and hardware (there is still an asymmetry here between industry labs and practitioners). Don't get left out: do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.

hhs19832 · 7 years ago
Great and honest points.

> Secondly, for some reason data science just doesn't excite me as much as typical software development does

Fair enough. Part of the reason is that "data science" has been so jam-packed with nonsense and with people who don't do the actual work of building things, as you describe below.

> What I particularly enjoy is good ol' back end web development. I don't have a degree in computer science, but I'm working on an information systems degree with a focus on "programming", and I dream of, and am working my ass off toward, joining the cult of the "software engineer" type II: a sophisticated software developer/programmer. I love building layers, optimizing code, learning new tools, algorithms and data structures (without knowing math), creating unit tests, following programming paradigms. It excites me so much. And the core skill I want to dive into is blockchain.

Ok, this makes sense. But I'd be worried about 5 years from now. When all the little gears and things that go on in the backend become a commodity (or get abstracted away in the "cloud"), what are you going to do?

> I love studying that topic too, along with all the algorithms that come with it.

That spark of interest in the algorithms (which are just about logic, which is basically what math is about in the end) is the essence of what makes "data science" so attractive.

scarecrowbob · 7 years ago
"But I'd be worried about 5 years from now. When all the little gears and things that go on in the backend become a commodity (or get abstracted away in the "cloud"), what are you going to do?"

Well, I started out in a similar kind of place about 8 years ago, and have since gotten quite good at building CRUD, business logic, and glue, fixing crap on the front end, and configuring servers.

Maybe I can stand in for the OP a few years down the line?

Over the last quarter, I've been splitting my time between things like Linux admin automation and a set of pre-calculus core classes.

To answer your question on my personal scale: my whole ability to do this kind of work with my mediocre CS education (my BA is in Philosophy, and my PhD work is in Lit) is premised on leveraging the points in the systems where "all the little gears and things that go on in backend have [become] a commodity". Hence I just integrate ERP systems with WordPress, or try to clean up some business's Drupal hosting setup on AWS, or some crap like that. That's been a fun and rewarding conjunction of my love for systems and the commodification of parts of IT/programming work.

My hope is that by the time all the little bits of these data science topics become "abstracted away" over the next couple of years, I will understand the general underlying things well enough to use them. But who knows if that is a good bet or not... certainly not me.

However, it feels perfectly fine to learn things like math... I'm way, way better at it than I was as an undergrad 20 years ago and so it's quite a lot more fun for me. It's not like knowing some math has no application outside of this narrow field.

I dunno if my personal answer (keep learning, and enjoy fixing crap) matches the OP or helps extend your points/ question, but I've been getting a lot of fun (and some money) out of following my answer.

technologia · 7 years ago
I think I have an answer to that first question. Altruistically, I'd like to think it's to help produce more ML engineers and scientists. Realistically, among the other reasons noted by everyone else, it's a way to attract enterprise users to their technology and, invariably, their cloud.

Consider a larger organization (1000+ people, perhaps). If groups within that org can train their people with these materials, or even send them to Google to be trained in this subject matter, they can come back with a nice shiny credential. Whether that ultimately becomes useful to the individual or the group is up to them, but really it helps Google foster a relationship with the main organization and eventually snag higher contract values.

That probably made no sense, but I thought I'd give my two cents (however crummy they might look).

dws · 7 years ago
> One random question I have: what does Google gain by spending resources on developing a course like this?

s/Google/someone at Google/

20% time leaves discretionary time for people who're motivated to get something like this started. Official approval may come along the way.

gaius · 7 years ago
Everyone I know at Google says 20% time comes on top of 100% time these days
gaius · 7 years ago
> Do they want more people doing machine learning because there is a shortage of developers with this skill in the market, or is there something else involved in the mix?

They want to sell TPUs; this is part of generating the demand.

amorphous · 7 years ago
> What I particularly enjoy is good ol' back end web development.

By all means, keep at it! Better to be an exceptional backend dev than an average ML engineer. No one can predict the future anyway. It's certainly possible that the ML job surge is gonna stop abruptly once most of the advances have been captured by APIs.

codeisawesome · 7 years ago
Regarding your latter part, do read “define: CTO OpenAI” (I don’t have the link; I’m on mobile). The author has fascinating insights into just how important engineering the specifics you describe is for ML work to progress and show results.
lovelearning · 7 years ago
An ML/data engineer tasked with productizing a data pipeline still does all of those things: building layers, optimizing code, learning new tools, algorithms and data structures, creating unit tests, following programming paradigms.
tabtab · 7 years ago
Perhaps it's somewhat off-topic, but I've built a spam detector similar to the article's withOUT using "direct" AI, but rather via a key-word or key-phrase "ranker". A simplified example is given below.

The advantage over other techniques is that one can easily trace the exact math of a conclusion, and tune it as needed. The disadvantage is that one probably has to manually tune it all rather than let the machine "learn". However, a hybrid approach could be used whereby "pure" AI suggests words and phrases to encode.

     rule.addList("nigerian, prince", rank=7);
     rule.addPhrase("great opportunity", rank=5);
     rule.addPhrase("lisa smith", rank=-4); // probably good
Here a "list" means that word order doesn't matter, but with a "phrase" it does. A negative value means it's less likely to be spam, usually because it's specific to your business or task. Actually, I had multiple categories rather than just "spam" versus "non-spam", but that would complicate the example. I also used a database. One could perhaps call it a "weighted" version of MS-Outlook's rule engine. Somebody had a similar idea: http://dergipark.gov.tr/download/article-file/45302
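A runnable Python sketch of the ranker described above, assuming the simplest reading of the rules: a "list" rule fires when all its words appear anywhere in the message, a "phrase" rule fires only on the exact word sequence, and the message score is the sum of the ranks of the rules that fire. `RuleRanker`, `add_list`, and `add_phrase` are hypothetical names that mirror the pseudocode, not the original implementation.

```python
import re


class RuleRanker:
    """Hypothetical keyword/phrase ranker: sums the ranks of matching rules."""

    def __init__(self):
        self.rules = []  # (kind, words, rank) tuples

    def add_list(self, words, rank):
        # "list" rule: comma-separated words, order does not matter.
        self.rules.append(("list", [w.strip().lower() for w in words.split(",")], rank))

    def add_phrase(self, phrase, rank):
        # "phrase" rule: words must appear consecutively, in this order.
        self.rules.append(("phrase", phrase.lower().split(), rank))

    def score(self, text):
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        token_set = set(tokens)
        total = 0
        for kind, words, rank in self.rules:
            if kind == "list":
                hit = all(w in token_set for w in words)
            else:
                n = len(words)
                hit = any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))
            if hit:
                total += rank
        return total


rule = RuleRanker()
rule.add_list("nigerian, prince", rank=7)
rule.add_phrase("great opportunity", rank=5)
rule.add_phrase("lisa smith", rank=-4)  # a known contact: less likely to be spam

print(rule.score("A great opportunity from a Prince in Nigerian banking"))  # 12
```

Because every rule contributes a fixed, visible rank, any score can be explained line by line, which is the traceability advantage mentioned above.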

webmaven · 7 years ago
You're essentially doing a rough manual version of Bayesian classification on n-grams (which is still very explicable): http://www.paulgraham.com/spam.html
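To make the comparison concrete: in naive Bayes each n-gram gets a weight, log P(n-gram | spam) - log P(n-gram | ham), and the message score is the sum of those weights plus a prior. That is essentially a rank table like the one above, learned from labelled mail instead of typed in. The sketch below is plain multinomial naive Bayes with add-one smoothing on unigrams and bigrams; it is not the exact scheme from Graham's essay, which combines per-token probabilities differently. The class name and training messages are made up for illustration.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Lowercased unigrams plus bigrams (each bigram joined with a space)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = list(words)
    for n in range(2, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


class NaiveBayesSpam:
    """Multinomial naive Bayes over n-grams with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(ngrams(text))
        self.docs[label] += 1

    def weights(self, text):
        """(n-gram, log P(g|spam) - log P(g|ham)) for each n-gram occurrence.
        Positive weights push toward spam, negative toward ham."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        totals = {c: sum(self.counts[c].values()) for c in self.counts}

        def log_p(g, c):
            return math.log((self.counts[c][g] + self.alpha) / (totals[c] + self.alpha * vocab_size))

        return [(g, log_p(g, "spam") - log_p(g, "ham")) for g in ngrams(text)]

    def score(self, text):
        """Log-odds of spam: log prior ratio plus the sum of per-n-gram weights."""
        prior = math.log(self.docs["spam"] / self.docs["ham"])
        return prior + sum(w for _, w in self.weights(text))


nb = NaiveBayesSpam()
nb.train("win a free prize now", "spam")
nb.train("free money prize", "spam")
nb.train("meeting notes for monday", "ham")
nb.train("lunch on monday", "ham")
for gram, w in nb.weights("free prize"):
    print(f"{gram:12} {w:+.2f}")
```

Printing `weights()` for a message gives the same kind of per-rule explanation as a hand-built scoring sheet, so the result stays explicable.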
tabtab · 7 years ago
The idea of my approach was that a "power user" could add the rules and scores without having to understand something that may take a while to explain. A scoring sheet can be displayed for a given message that would make sense to just about anybody with an associate degree. Example scoring sheet for a given message:

     Category: Spam
       Rule-ID    Score
       ----------------
       NgrPrnc1       7
       bPills         5
       knownPeople   -3      
         Total:       9 Threshold Exceeded!

     Category: Tech Support
       knownWidgets   3
       offer1        -2
         Total:       1 Insufficient total

     Category: Etc...
One could click on the rule-ID as a hyperlink to see specifics of a given rule (if details don't fit on screen).
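A scoring sheet like this falls out naturally if each rule carries an ID and a category. A minimal sketch, assuming case-insensitive substring matching and per-category thresholds; the rule phrases, the `scoring_sheet` helper, and the threshold values (8 and 3) are invented so the totals match the example sheet, not taken from the original system.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    rule_id: str
    category: str
    phrase: str  # matched as a case-insensitive substring (a simplification)
    score: int


# Hypothetical rules and thresholds, chosen to mirror the sheet above.
RULES = [
    Rule("NgrPrnc1", "Spam", "nigerian prince", 7),
    Rule("bPills", "Spam", "cheap pills", 5),
    Rule("knownPeople", "Spam", "lisa smith", -3),
    Rule("knownWidgets", "Tech Support", "widget", 3),
    Rule("offer1", "Tech Support", "special offer", -2),
]
THRESHOLDS = {"Spam": 8, "Tech Support": 3}


def scoring_sheet(text):
    """Return (report lines, set of categories whose total exceeds the threshold)."""
    text = text.lower()
    lines, flagged = [], set()
    for category, threshold in THRESHOLDS.items():
        hits = [r for r in RULES if r.category == category and r.phrase in text]
        total = sum(r.score for r in hits)
        exceeded = total > threshold
        if exceeded:
            flagged.add(category)
        lines.append(f"Category: {category}")
        lines.append(f"  {'Rule-ID':<12}{'Score':>6}")
        lines.extend(f"  {r.rule_id:<12}{r.score:>6}" for r in hits)
        verdict = "Threshold Exceeded!" if exceeded else "Insufficient total"
        lines.append(f"    Total: {total} {verdict}")
        lines.append("")
    return lines, flagged


report, flagged = scoring_sheet(
    "Lisa Smith forwarded this: a Nigerian prince sells cheap pills, "
    "plus a special offer on your widget."
)
print("\n".join(report))
```

Keeping the rule IDs in the output is what makes the hyperlink-per-rule idea easy: each line of the sheet points back to exactly one stored rule.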

minimaxir · 7 years ago
These guides also give good heuristics on how to look at data before throwing a model at it, and deciding what's the most logical model approach/architecture.

A good example is the text preprocessing flowchart (also shared by fchollet on Twitter): https://developers.google.com/machine-learning/guides/text-c...
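If I remember the guide correctly, the flowchart's main branch is a single ratio: number of samples divided by the median number of words per sample. Below roughly 1500 it recommends n-gram features with a simple MLP; above it, token sequences with a sepCNN. The sketch encodes only that branch; the function name and return strings are my own, and `threshold` is a parameter in case I've misremembered the cutoff.

```python
from statistics import median


def choose_text_model(texts, threshold=1500):
    """Sketch of the guide's rule of thumb (as I remember it): compute
    number of samples / median words per sample, then pick a model family."""
    words_per_sample = median([len(t.split()) for t in texts])
    ratio = len(texts) / max(words_per_sample, 1)  # guard against empty texts
    if ratio < threshold:
        return ratio, "n-gram vectors + simple MLP"
    return ratio, "token sequences + sepCNN"


few_long = ["word " * 200] * 1000     # 1,000 samples of 200 words each
many_short = ["word " * 10] * 50000   # 50,000 samples of 10 words each
print(choose_text_model(few_long))
print(choose_text_model(many_short))
```

The flowchart itself has further steps (vectorization and feature selection choices) that this leaves out; it's the "look at the data before picking an architecture" part in miniature.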

Puer · 7 years ago
This is something that's almost always glossed over and I'm glad they included it. It's easy to apply ML algorithms to perfect data that doesn't need cleaning and get great results. Finding a productive model when presented with a nuanced, messy problem is a much more difficult task, however, and something most ML crash courses don't focus enough time on.

I think there's a tendency on Hacker News and other tech websites to diminish the importance of having a PhD in ML fields. The problem solving and communication skills you learn during the course of a PhD program are precisely the skills companies value when they're trying to solve hard problems. It's important to know not just how to apply ML algorithms, but when they're appropriate.

zamalek · 7 years ago
Back in 2006, in high school, I was investigating multilayer feed-forward NNs. I found them magical. I wrote a network for the XOR problem, etc., etc.

What always confounded me was the choice of the number and width of hidden layers. This is even more confusing now, with the advent of deep and recursive networks. We need empirical work on this that can be taught in much the same way that gravity is taught as an apple falling from a tree.

We need a way to determine the entropy of a network, and how to route and exploit that entropy. Specific scenarios are not adequate.
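For anyone who hasn't run into it: XOR is the classic example of why a hidden layer is needed at all. No single linear threshold unit can separate {(0,1), (1,0)} from {(0,0), (1,1)}, but two hidden units can. The weights below are hand-picked for illustration rather than trained, which makes the role of each hidden unit easy to read; a trained network has no such guarantee, which is part of why width and depth choices feel like guesswork.

```python
def step(x):
    """Threshold activation: fires (1) only for strictly positive input."""
    return 1 if x > 0 else 0


def xor_net(a, b):
    # Hidden unit 1 behaves like OR, hidden unit 2 like AND (hand-picked weights).
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    # Output fires when OR is on but AND is off, i.e. exactly one input is 1.
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```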

QML · 7 years ago
> gravity is taught as an apple falling from a tree.

Is this advocating more for a theory of neural networks than for empirical evidence?

nothis · 7 years ago
These guides have been popping up left and right lately. I can't comment on their quality (I assume it's somewhat decent), but it's kinda ridiculous to try compressing a college degree's worth of knowledge into a bunch of sleek online tutorials.
tomrod · 7 years ago
> it's kinda ridiculous to try compressing a college degree's worth of knowledge into a bunch of sleek online tutorials.

Honest question, why?

We used to give degrees (albeit hundreds of years ago) for material that is now covered, at a high level, in a single course (e.g. physical sciences). The amount of material to cover, and to master, increases dramatically over time. It makes sense to compress the knowledge into a compendium simply to keep up with progress.

johntiger1 · 7 years ago
Not OP but I suspect they will say something about the math behind it. It's very true you can get quite adept at plug-and-play machine learning models (and indeed be quite successful) but the theoretical statistics, linear algebra and overall mathematical maturity take a long time to develop in my opinion.
jaimex2 · 7 years ago
Good guides; I just finished the text classification one. The approach is very much: grow a good dataset, then find and tune a model that works well for your needs.