samvher · 3 years ago
> Nobody knew or even cared what the difference was between good and bad data science work. Meaning you could absolutely suck at your job or be incredible at it and you’d get nearly the same regards in either case.

In my experience it's even a little bit worse than that. Approaches that are wrong from a statistics point of view are more likely to generate impressive-seeming results. But the flaws are often subtle.

A common one I've seen many times is a flawed validation strategy (e.g. one that rewards the model for using data "leaked" from the future), or relying too heavily on in-sample results in other ways.
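A minimal sketch of that future-leak failure mode, on made-up i.i.d. "returns" (nothing here is from a real model): a centered moving average quietly includes today's value, so it appears to predict a series that is pure noise, while the honest trailing version correctly finds nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(size=5000)  # i.i.d. "daily returns": genuinely unpredictable

# Leaky feature: a *centered* 3-day moving average. It quietly includes
# today's return, so it appears to predict a pure-noise series.
leaky = (np.roll(returns, 1) + returns + np.roll(returns, -1)) / 3

# Honest feature: a trailing average built only from strictly past returns.
honest = (np.roll(returns, 1) + np.roll(returns, 2) + np.roll(returns, 3)) / 3

# Trim the edges where np.roll wraps around.
corr_leaky = np.corrcoef(returns[3:-3], leaky[3:-3])[0, 1]    # ~0.58
corr_honest = np.corrcoef(returns[3:-3], honest[3:-3])[0, 1]  # ~0.0
```

The leaky feature's correlation of roughly 1/√3 is entirely an artifact of the validation setup, which is exactly what makes the wrong approach look more impressive than the right one.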

Because these issues are subtle, management will often not pick up on them or not be aware that this kind of thing can go wrong. With a short-term focus they also won't really care, because they can still put these results in marketing materials and impress most outsiders as well.

_jx7j · 3 years ago
>Meaning you could absolutely suck at your job or be incredible at it and you’d get nearly the same regards in either case.

One of the things I don't like about statements like this said in a Data Science context is that they are true outside of Data Science as well. Executives make big decisions, managers make smaller decisions, and nobody can evaluate how good or bad they really were for months or years. Engineers build something amazing, or build a house of cards; nobody cares as long as the money people are happy, even if the business use case turns out to be wrong in the long run.

>With a short-term focus they also won't really care, because they can still put these results in marketing materials and impress most outsiders as well.

Forget Data Science, you see this in KPIs as well. Say a crappy metric has to be moved by Q2 next year and people will destroy the company to move it.

I feel like Data Science is just one of those areas where you are exposed to a wider range of people and get to feel the full crapola of the insanity of working in a corporation. For lots of roles (e.g. Engineering) you get to hide in a hole behind layers of people and not see some of this insanity.

scottLobster · 3 years ago
Not to get too off topic, but as a 35 year old engineer it seems the world in general has far fewer consequences than I was raised to expect. Everything from businesses with bullshit ideas flourishing at a loss, to January 6 even being possible (politics aside, I expected the Capitol Police to crack a lot more skulls than they did once people started smashing windows), to the whole FTX situation and the tepid response in the media/government, to petty crime being outright tolerated, to times in my own career when I've burned through enough money (albeit with good intentions) that I thought I was going to be fired, only to be told in a performance review that I was doing a good job (grateful to stay employed, but WTF, I would have fired or at least demoted me). Importantly, the motivation for this lack of consequences doesn't seem to stem from a desire for forgiveness or positive reinforcement or any mechanism that might make things better.

It seems like there's a general apathy/nihilism growing in society, whereas by contrast, throughout my entire education from childhood up, I was held to strict standards and reliably punished when I failed to meet them, and this was in US public schools (albeit a highly ranked school district) and a public university. That, or I was just raised in a bubble, and the historical examples I referenced growing up and reference to this day are just a case of survivorship bias, and all the bullshit that existed alongside them back in the day has simply been forgotten. I'm not sure, but it is disappointing how little people at large seem to give a shit. Maybe it's a side effect of the obesity epidemic and people just have less energy or something.

giaour · 3 years ago
> One of the things I don't like about statements like this said in a Data Science context is that they are true outside of Data Science as well. Executives make big decisions, managers make smaller decisions, and nobody can evaluate how good or bad they really were for months or years. Engineers build something amazing, or build a house of cards; nobody cares as long as the money people are happy, even if the business use case turns out to be wrong in the long run.

This is purely anecdata, but I have found that this is more pronounced in a data science context. Managers and executives are (in my experience) more willing to admit they don't understand engineering work product and seek input from technical advisors, and executives and managers deal with decision making on a daily basis and understand that it can be nuanced. But since almost everyone reads financial reports or has to make a chart in Excel every now and then, they know enough to read someone else's analysis but not enough to recognize their knowledge gaps (particularly wrt advanced statistics).

remram · 3 years ago
This reminds me of the seminal article by The Correspondent about online advertising: https://thecorrespondent.com/100/the-new-dot-com-bubble-is-h...

Relying on your data science or marketing department to tell you how good your data science or marketing department is doing, with their own metrics and their own evaluation methods that you don't understand, can only really lead to one outcome.

CapmCrackaWaka · 3 years ago
I've seen this a LOT in my professional group. Many people (who often have PhDs!!) I interview for data science positions seem to know absolutely nothing about the algorithms they use professionally, or how to optimize them, or why they are a good fit for their use case, etc etc etc. I usually see through LinkedIn that these same people are now in impressive-sounding positions at other companies.

I had one candidate who was in charge of a multi-armed-bandit project at their current company. I asked them how it worked and how they settled on that approach. Their response was "you know, I'm not really sure, the code was set up when I got there." They had been there for over a year and could tell me nothing!

> A common one I've seen many times is a flawed validation strategy (e.g. one that rewards the model for using data "leaked" from the future), or relying too heavily on in-sample results in other ways.

It's funny you mention this: we have a direct competitor who does exactly this and advertises flawed metrics to clients. Oftentimes our clients will come back to us saying "XYZ says they can get better performance", the performance in this case being something that is simply impossible without data leakage or some flawed validation strategy.

adamsmith143 · 3 years ago
Where are these jobs where you can interview this badly and still get hired? In my experience DS interviews are extremely hard and often expect people to have very strong stats skills as well as data structures/algorithms skills at FAANG level.
Radim · 3 years ago
> clients will come back to us saying "XYZ says they can get better performance"

Oh yes, good old marketing.

Along with buying off "Industry Awards" – hey, we're objectively the "Best cybersecurity company of 2022!" With a matching "platinum/gold badge" to go on our website! Or buying a place in the "10 Best Products for X" and "Independent X-vs-Y Comparison", another classic.

Because it works. Are your customers not sophisticated? Are they unable (or unwilling) to follow up on defects and outright lies? Or does reality simply not matter all that much to them? Humans LOVE a good story more than reality, after all.

Then your contribution as an engineer to your company's success, and hence its longevity and your job security, is strictly inferior to that of marketing. Not everything is the work of evil marketers – a lot of the supplied BS is in response to an existing demand for BS.

DenisM · 3 years ago
> XYZ says they can get better performance

Can you do your analysis both ways? Give your customers both, then tell them your method is more modern, but if they want the outdated methods you have those too.

f1shy · 3 years ago
Is this the US? I'm concerned about the extremely low bar for getting a PhD in Europe... and I'm wondering if that's a global problem, or only a European one.
mumblemumble · 3 years ago
The problem is that nobody actually wants data science. They want data pseudoscience.

And for the same reason that people tend to want pseudoscience instead of science in any other domain, too. Science is slow, tentative, and messy, and usually responds to questions with even more questions rather than with answers.

Pseudoscience tends to be much more concerned with exuding confidence and providing clean-cut answers. It's what happens when a desire for science meets a need for instant gratification. Along the way, things like blinding and controls and watching for bias and validating assumptions tend to get dropped when they're inconvenient or difficult to explain. And they're always inconvenient and difficult to explain.

onlyrealcuzzo · 3 years ago
> The problem is that nobody actually wants data science. They want data pseudoscience.

Technically, I think investors & owners would want the company to use real data science to improve products & maximize profits.

Everybody in the middle just wants to use data to lie to get promoted faster - because you don't get promoted for actually doing a good job - you get promoted for convincing people you did a good job, and lying is a VERY useful / effective tool.

urthor · 3 years ago
In my experience, a little more nuanced.

LARGE INTERNET DATA COMPANIES. They want the real data science.

For them, data science actually allows them to perform a core business function (targeting their customers) in a profitable way (a one-way, asynchronous relationship; note the complete lack of any "talking to a human being" in your relationship with big tech).

For everyone who isn't a large internet data company with an asynchronous relationship with their customers... what's the point?

Usually, they have only a handful of technical projects that benefit from data science.

In my experience, my multi-billion dollar organization got by with a shockingly small number of "real" data scientists.

etempleton · 3 years ago
I become wary any time someone utters the phrase "show me the data" or any variation thereof. There is a specific type of leader who thinks that within the data lurks a magical solution just waiting to be discovered. There is also the leader who uses data as a trump card to win arguments, and these folks are perhaps even worse. This is not new: the phrase "lies, damned lies, and statistics" can be traced to the 1800s. I propose the following update:

There are three kinds of lies: Lies, damned lies, and data

I am being glib; I of course do not think all data is inconsequential. Rather, it is so often used from a place of ignorance or a place of ill intent that it is rendered, on the whole, useless.

lotsofpulp · 3 years ago
I have only heard “show me the data” when someone wants someone else to support a claim. I do not see why this would necessarily be a bad thing.
scottLobster · 3 years ago
Yep, these are the same people who backtest their portfolio and go "see, if you'd held this exact portfolio I put together through trial and error, you'd have turned one dollar into a million without any additional contributions!"

Not a data scientist, but it seems like a lot of people in business refuse to accept the fact that reality is generally boring, best practices are often "best" for a reason, and meaningful progress is hard. Of course it is possible to be too conservative, but 95% of ideas to improve a business or product are ego-stroking bullshit. Everyone wants the V10 engine to go down the highway at 65 mph, while towing a trailer, and there's only budget for an oil change every 15000 miles; don't look at the transmission fluid, just don't look.
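That backtest cherry-picking is easy to sketch (all numbers invented): generate a pile of random coin-flip "strategies" over the same historical window, keep the best one, and it looks brilliant in-sample purely through selection.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_strategies = 252, 1000

# 1000 "strategies" that each just flip a coin (+1/-1 P&L) every day for a year.
in_sample = rng.choice([-1.0, 1.0], size=(n_strategies, n_days)).sum(axis=1)

best = int(np.argmax(in_sample))
best_in_sample = in_sample[best]  # the winner looks great purely by chance

# The typical strategy, of course, made roughly nothing.
typical = in_sample.mean()

# "Deploy" the winner: its future P&L is just another coin-flip sequence.
out_of_sample = rng.choice([-1.0, 1.0], size=n_days).sum()
```

With 1000 tries, the best backtest sits several standard deviations above zero even though every strategy is pure noise, which is the trial-and-error portfolio in miniature.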

naijaboiler · 3 years ago
if you torture data long enough, it will confess to anything
6gvONxR4sf7o · 3 years ago
This is the defining pain point for data science, in my experience. There’s no simple ground truth to test competence against.

If someone tells you that the data says their work is good, the only real way to know if they’re right or wrong is to look at what the data says yourself. If 99% of the work is building and 1% is checking something like latency, then you’re likely to have more than one set of eyeballs on that 1%. But if 99% of the work is putting the data together and doing the analysis, then you’re unlikely to have more than one person ever look at that part.

So incompetence goes unchecked (or worse, it is rewarded).

vsareto · 3 years ago
That's the same for many tech jobs. Competence is often only a local thing, subject to politics, reputation, and appearances. There's also no ground truth because the ground changes so fast. No one knows if the technologies mentioned in the OP will be popular 5-10 years from now.
mmcnl · 3 years ago
This is somewhat captured in the article as well.

"Managers will say they want to make data-driven decisions, but they really want decision-driven data. If you strayed from this role– e.g. by warning people not to pursue stupid ideas– your reward was their disdain, then they’d do it anyway, then it wouldn’t work (what a shocker). The only way to win is to become a stooge."

In science, a good scientific result can be bad for business. There is often little appreciation for the "science" in data science.

throwingit0 · 3 years ago
>There is often little appreciation for the "science" in data science.

It feels like even Google falls prey to this at times: they keep redoing the same A/B test until it comes up in favor of the change (or the designer whose pet project it is runs out of political capital, presumably).
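Back-of-envelope sketch (assuming, for simplicity, independent re-runs with a 5% per-test false-positive rate): the chance that at least one re-run comes up "significant" on a change that does nothing grows quickly with the number of attempts.

```python
alpha = 0.05  # per-test false-positive rate

def p_false_win(n_reruns: int) -> float:
    """Chance that at least one of n independent null tests looks 'significant'."""
    return 1 - (1 - alpha) ** n_reruns

for n in (1, 5, 10, 20):
    print(f"{n:>2} re-runs -> {p_false_win(n):.1%} chance of a false 'win'")
```

By ten re-runs you're at roughly a 40% chance of a spurious win, so a pet project that keeps getting re-tested will eventually "succeed".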

chasely · 3 years ago
I've been pitched by many "data-driven" vendors offering predictions. They often have very impressive accuracy metrics (RMSE, R², etc.). When I dive into the details, these metrics are often reported using in-sample predictions.

I see this pointing to any of the following:

a) DS teams overpromising the accuracy of their approaches

b) marketing driving the narrative and DS getting pulled along

c) incompetence from the DS team
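A toy illustration of how in-sample metrics flatter a model (synthetic data, invented setup): a degree-9 polynomial through 10 noisy points scores a near-perfect in-sample R², then degrades on a fresh draw from the same process.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 1, 10)
signal = np.sin(2 * np.pi * x)

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Overfit: a degree-9 polynomial through 10 noisy points memorizes the noise.
y_train = signal + rng.normal(scale=0.3, size=10)
coefs = np.polyfit(x, y_train, deg=9)
r2_in = r2(y_train, np.polyval(coefs, x))  # essentially 1.0

# A fresh draw from the same process: the "perfect" model falls apart.
y_test = signal + rng.normal(scale=0.3, size=10)
r2_out = r2(y_test, np.polyval(coefs, x))
```

Reporting `r2_in` to a client is exactly the vendor pitch described above; only the held-out number says anything about future performance.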

alexpetralia · 3 years ago
Are these inferential statistics not designed to be in-sample?

I would imagine predictive statistics use more out-of-sample metrics like precision and recall.

2devnull · 3 years ago
This gave me a chuckle. If you read the feature article you understand that this is also because management wants “decision driven data.” They have an idea and use ds to provide charts and tables to support their idea. The harder the idea is to support, the greater value data science is able to provide.

I guess data science is inferior to research in this way. People care about research methods, rigor, etc… Maybe data scientists should adopt stricter standards, like actual scientists.

samvher · 3 years ago
I did read the article - some of the problems with judgements of work quality also come up with (hypothetical) well-intentioned truth-seeking non-political long-term-optimizing managers who just don't happen to be stats experts.
hcks · 3 years ago
> Because these issues are subtle, management will often not pick up on them or not be aware that this kind of thing can go wrong.

Management is not your teacher at school; it is not there to check that your results make sense.

Management mostly assumes you’re competent at your job.

dqpb · 3 years ago
> Approaches that are wrong from a statistics point of view are more likely to generate impressive-seeming results.

This is to be expected from an information theory point of view. It's why "fake news" will always be a thing.

jrochkind1 · 3 years ago
> Approaches that are wrong from a statistics point of view

When OP talked about "the main bottleneck to my work" in terms of areas he would need to learn more about -- I was expecting him to talk about facility with statistical methods and using them appropriately!

I'm not sure what to take from the fact that he never did! I would like to ask him what he thinks about that!

rgrieselhuber · 3 years ago
I’ve always disliked how data science is positioned within companies as well: it sits outside the critical path of product and engineering, which means it becomes a mere abstraction to management (e.g. “throw that problem to the data science team and see what they come up with”), resulting in very vague and abstract requirements and, hence, deliverables. I think there is huge value in the discipline and technologies, but it gets unfairly relegated when it isn't integrated into the whole product/engineering process. Hence, the title/concept of Data Engineer seems like a much better fit for this role within many companies.
rgavuliak · 3 years ago
Yeah, as a Data Science manager I've experienced this pain (not being part of the critical path) a lot. I am now an Engineering Manager working with a cross-functional team including FE/BE/DS/DevOps, and it's the most power I've ever had to put Data Science in front of our clients in a meaningful way.
whiplash451 · 3 years ago
You should pay the price for data leakage very quickly in production.

Does management look at slides or AB test dashboards?

disgruntledphd2 · 3 years ago
People don't pay attention to production metrics, and a noisy problem (like marketing or whatnot) can often be pretty bad for a looooonnnnnggg time before anyone notices.
whatever1 · 3 years ago
Blame statistics for that. Wrong outcome? Well you were unlucky you fell into the 1% error range.

Correct outcome? You totally predicted it correctly.

There is literally no way you can screw something up in statistics and not be able to make up a story to defend your approach.

lagt_t · 3 years ago
You don't look at single outcomes with statistics.
jmount · 3 years ago
Super point. I can't resist repeating it back. When incorrect work outperforms correct work in superficial evaluation, it is then selected for.
angry_moose · 3 years ago
In a recent past life, I was an HPC (high performance computing) administrator for a mid-size company (just barely S&P 400) in the transportation industry, so I had a lot of interaction with the "data science" team, and it was just a fascinating delusion to watch.

Our CTO did the "Quick, this is the future! I'll be fired if I don't hop on this trend" panic thing and picked up a handful of recent grads and gave them an obscene budget by our company's standard.

The main problem they were expected to solve - forecasting future sales - was functionally equivalent to "predict the next 20 years of ~25% of the world economy". Somehow these 4 guys with a handful of GPUs were expected to out-predict the entirety of the financial sector.

The amazing part was they knew it was crap. All of their stakeholders knew it was crap. Everyone else who heard about it knew it was crap. But our CTO kept paying them a fortune and giving them more hardware every year with almost no expectation of results or performance. It was a common joke (behind the scenes) that if they actually got it right, we'd shut down our original business and become the world's largest bank overnight.

At least it finally gave the physics modelers access to some decent GPUs which led to some breakthrough products, as they finally were able to sneak onto some modern hardware.

fnands · 3 years ago
Honestly, I can't tell you how many job ads I saw where I was wondering: "What would they expect me to bring to the table here?"

Some companies just don't have the data, or heck, even the need, for data scientists, yet they try to hire them anyway.

Give smart people a fundamentally ill-posed problem and they won't get anywhere anyway.

wheels · 3 years ago
My ex worked at a startup where she was hired as the second or third data scientist. Their entire Postgres database dump was 20 MB. And they had three people working full time on analyzing ... that 20 MB.
SilverBirch · 3 years ago
It’s a great skill to walk in to a job and say “hey I’m the expert, that’s not a reasonable proposal, here’s the problem we can solve and here’s what we’ll do”. Much more value to the company, but hard to do.
quickthrower2 · 3 years ago
Not the first HN comment I have seen where $real_useful_department borrows resources from overfunded $bullshit_department to get the job done in spite of management.
mrtksn · 3 years ago
In retrospect, maybe they made the right call for themselves while the money was pouring in. Probably everyone involved was paid very well for the charade. The ethics are questionable, but maybe it's some kind of wealth redistribution: after all, the people with money are trusted to make the calls, and their falling for this may simply mean the money is better off somewhere else.
drgiggles · 3 years ago
Unfortunately it seemed pretty clear from the start that this is what data science would turn into. Data science effectively rebranded statistics but removed the requirement of deep statistical knowledge to allow people to get by with a cursory understanding of how to get some python library to spit out a result. For research and analysis, data scientists must have a strong understanding of the underlying statistical theory and at least a decent ability to write passable code. With regard to engineering ability, certainly people exist with both skill sets, but it's an awfully high bar. It is similar in my field (quant finance): the number of people who understand financial theory, valuation, etc. and have the ability to design and implement robust production systems is small, and you need to pay them. I don't see data science openings paying anywhere near what you would need to pay a "unicorn", so you can't really expect the folks who fill those roles to perform at that level.
jghn · 3 years ago
I worked adjacent to the data science field when it was in its infancy. As in I remember people who are now household names in the field debating what it should be called.

At the time I considered going down that path, but decided I did not have anywhere near the statistics & math knowledge to get very far. So I stuck with the path I had been on. Over time I saw a lot of acquaintances jumping into the data science game. I couldn't figure out how they were learning this stuff so fast. At some point I realized that most of them knew less than I did when I decided I didn't know enough to even begin that journey.

Of course, I was comparing myself against the giants of the field and not the long tail of foot soldiers. But it made for a great example to me of how with just about everything there's a small handful of people who are the primary movers, and then everybody else.

codeulike · 3 years ago
Data science effectively rebranded statistics but removed the requirement of deep statistical knowledge to allow people to get by with a cursory understanding of how to get some python library to spit out a result.

I don't know anything about Data Science, but as a bystander with a mathematical background that's what I assumed was going on, so it's kind of interesting to see it spelled out like that. Like you've put words to a preconception that I didn't even know I had.

rafaelero · 3 years ago
That's because businesses don't require a deep level of math knowledge.
ShredKazoo · 3 years ago
>Data science effectively rebranded statistics but removed the requirement of deep statistical knowledge

An important thing people miss is that shallow statistical knowledge can cause subtle failures, but shallow software engineering knowledge can cause subtle failures too.

A junior frontend developer will write buggy code, notice that the UI is glitched, and fix the bug. A junior data analyst will write buggy code, fix any bugs which cause the results to be obviously way off, but bugs which cause subtler problems will go unfixed.

Writing correct code without the benefit of knowing when there is a bug is challenging enough for senior developers. I don't trust newbie devs to do it at all.

Context here is I used to work in email marketing and at one point I was reading some SQL that one of the data scientists wrote and observed that it was triple-counting our conversions from marketing email. Triple-counting conversions means the numbers were way off, but not so far off as to be utterly absurd. If I hadn't happened to do a careful read of that code, we would've just kept believing that our email marketing was 3x as effective as it actually was.
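A toy reconstruction of that class of bug in SQLite (schema and numbers invented, not the actual pipeline): one purchase joined against three marketing emails for the same user gets summed three times.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emails (user_id INT, campaign TEXT);
    CREATE TABLE purchases (user_id INT, revenue REAL);
    INSERT INTO emails VALUES (1, 'spring'), (1, 'summer'), (1, 'fall');
    INSERT INTO purchases VALUES (1, 100.0);
""")

# Fan-out bug: the single purchase matches all three email rows,
# so its revenue is counted three times.
(buggy,) = con.execute(
    "SELECT SUM(p.revenue) FROM purchases p JOIN emails e USING (user_id)"
).fetchone()

# One fix: collapse each side to one row per user before joining.
(fixed,) = con.execute("""
    SELECT SUM(r.revenue)
    FROM (SELECT user_id, SUM(revenue) AS revenue
          FROM purchases GROUP BY user_id) r
    JOIN (SELECT DISTINCT user_id FROM emails) e USING (user_id)
""").fetchone()

print(buggy, fixed)  # 300.0 vs 100.0
```

The inflated number is plausible enough to survive a glance at a dashboard, which is what makes this class of bug so persistent.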

So, it's impossible to know how much of a problem this is. But there is every reason to believe it is a significant problem, and lots of code written by data scientists is plagued by bugs which undermine the analysis. (When's the last time you wrote a program which ran correctly on the first try?) Any serious data science effort would enforce stern practices around code review, assertions, TDD, etc. to make the analysis as correct as possible -- but my impression is it is much more common for data analysis to be low-quality throwaway code.

Breza · 3 years ago
This is an important point. I used to work in adtech. It's amazing how terrible the modeling is in that space. You can generate a model that identifies a given target audience and simply assert that it works without any real validation.
adamsmith143 · 3 years ago
On the flip side, you used to have statisticians writing code that was frankly unusable in a production environment. You would weep at the R code I've seen and had to turn into something that actually produces business value.
fnands · 3 years ago
There is a bit of a joke that a data scientist is someone who can do better stats than the average SWE and write better code than the average statistician. Both of those are relatively low bars to clear, though.
drgiggles · 3 years ago
This is exactly my point. Let subject matter experts in their respective disciplines handle what they know and communicate through the lingua franca of R. Most data scientists/statisticians probably shouldn't be writing production code, I think that's ok. It's a failing of management to think that coding is coding and not understand the value of true engineering ability.
numbsafari · 3 years ago
My first job basically consisted of taking code in FORTRAN, translating it into C++ with robust testing and engineering, and then frontending that code into a ton of spreadsheet packages. So you had quants doing quant work, software engineers doing software engineering, and analysts and traders being analysts and traders, instead of having quants fail at all three, which is more or less what data science is.
esparrohack · 3 years ago
Yeah but in the end it’s just code. And even better, just R.

The business value comes from the stats guy.

layman51 · 3 years ago
> Data science effectively rebranded statistics but removed the requirement of deep statistical knowledge to allow people to get by with a cursory understanding of how to get some python library to spit out a result.

That's a good way of putting it. I remember in my first calculus-based probability+statistics class in college, I felt incredibly challenged by the theory. I wondered why there are so many probability distributions out there, why the standard stats formulas look like they do, what "kernel density estimation" even is, etc.

On the other hand, my data science course did include some theory, but a big part of it was also learning how to type the right commands in R to perform the "featured analysis of the week" on a sample data set. Something about these lab exercises felt off because it felt more like training rather than education. The professor expressed something along the lines that if we wanted to go far with this in the future, he would expect us to design the algorithms behind the function calls. I think the analogy he used was "baking a cake from scratch rather than buying a ready made one at the store."

kxc42 · 3 years ago
That answer somehow reminds me of an article in logicmag: An Interview with an Anonymous Data Scientist [1].

[1]: https://logicmag.io/intelligence/interview-with-an-anonymous...

manicennui · 3 years ago
I don't know many software engineers who have the ability to design and implement robust production systems.
jmhammond · 3 years ago
> But there’s also a part of me that’s just like, how can you not be curious? How can you write Python for 5 years of your life and never look at a bit of source code and try to understand how it works, why it was designed a certain way, and why a particular file in the repo is there? How can you fit a dozen regressions and not try to understand where those coefficients come from and the linear algebra behind it? I dunno, man.

This is true everywhere. As a professor, every semester I’m baffled by students who aren’t curious. But I’ve come to terms that there is a difference between those who will graduate and go on to be readers of hacker news and write this kind of article, and those who won’t.

boringg · 3 years ago
To counter your professor's opinion: the amount of extra time I had as a student to pursue things of interest was in the negative. All academic time was spent getting course content accomplished.

I am a naturally curious individual, but time limitations prevent further exploration in most circumstances. Additionally, there is a relevancy factor weighed on top of it: if something looks interesting, I have to pre-determine whether the time spent pursuing that rabbit hole has any value. Granted, you never know the outcome; it is always a gamble.

Davertron · 3 years ago
Well good luck then, in my experience the most free time I've ever had in my life was during college. I squandered massive amounts of that time doing things completely unrelated to education, and I definitely don't regret doing that. College isn't just about book learning after all. But still, BY FAR, college is the time of my life when I had the most free time to do whatever I wanted.
dizzant · 3 years ago
Did you happen to attend a prestigious school? I find that the level of rigor (and corresponding freedom) varies tremendously from program to program.

I did my undergrad at a state school with a middling engineering program, where I had ample free time to explore topics in depth, pursue extracurriculars that taught me far more than my classes, and have a thriving social life.

Contrast that experience to what I saw as a teaching assistant at Georgia Tech: undergrads who are so full of classwork that they're punting on the least-valuable graded assignments, never mind extracurriculars. The level of rigor in courses is much higher, but it presses out freedom to explore independently.

Another datapoint: I competed against GT extracurricular teams during my undergrad years, and we beat them handily almost every time because their students couldn't justify high effort for work that wasn't graded. I once saw a GT team arrive a day late to a competition, work on a robot for three hours at the adjacent table, realize their robot did not work, and drive home without competing.

mr_gibbins · 3 years ago
Nope, nope, and nope again. I reject this utterly, as a teaching academic.

Contact hours at most universities are around 2-4 hours per week per 15-credit module. To gain a degree, you have to take 120 credits a year, typically two terms of 4 x 15 credit modules, or 8-16 hours of contact per week maximum with the entire summer off.

You therefore have at least 24 hours a week to study on your own to bring your working week up to 40 hours. Maybe you're working, fair enough. But if you don't have time to study subjects in depth then you need to reduce your working hours. If you can't, then by definition you are not a full-time student.

This is not a personal attack on you. Perhaps you were genuinely studious and spent all your time poring over the coursework. It is a commentary on the whole academic sector where we repeatedly see students do nothing for most of the time and spend the last 2 weeks cramming and putting in substandard assessments, then blame the course material/their lecturers/their anxiety etc. for their poor results. And of course the leadership teams lap it up and tell us to make our courses easier.

Hasz · 3 years ago
I see lots of concurring and dissenting opinions here, and will add one more:

For context, I double majored in two adjacent subjects, physics and math. I went to a state school that has a very strong physics program. I also worked in a physics lab for my last ~2 years and graduated a semester early. While I did OK academically, I had no desire to run the gauntlet again in grad school, and left to work in tech.

I have never, ever, been as busy as I was in college, nor do I ever want to be. I think that's a good thing! I have much more time to explore things that don't pan out, to do things I know are not "productive" (e.g., play video games), and am generally happier.

Apart from quality-of-life improvements, I think there are additional financial and intellectual benefits to not being overly burdened - the time to explore topics that were not immediately adjacent to my field of study resulted in extremely useful skill development and better cross-pollination of ideas.

BeetleB · 3 years ago
> The amount of extra time available as a student that I had to pursue things of interest was in the negative. All academic time was spent getting course content accomplished.

Did you go to a "good" school?

I went to a mediocre one for undergrad and a top school for grad. The one glaring difference I saw between the two: The top school's undergrad program gave students way, way too much busy work. All that work didn't give any insights, and was merely used to artificially distinguish students for grades. Their grad program was nothing like this.

Really glad I went to a mediocre school. Still learned everything, but had plenty of time to explore.

mayankkaizen · 3 years ago
You must have been a very sincere and disciplined student. :)

In my case, when I was studying, I had all the time in the world. During that time I did try to learn many things, but I didn't go deep and wasn't consistent. I mostly wasted time goofing around. Looking back, the amount of time I wasted during my college years has become the biggest pain of my current life, now that I have neither those skills nor the time to learn them.

nisegami · 3 years ago
I think this inclination to be curious can still be apparent even when someone doesn't have the time to pursue that inclination. It will be more subtle, but I think it's something rather fundamental that applies in broad ways across our lives.
jrochkind1 · 3 years ago
Were you working at a job in college? Full-time or part-time?
warinukraine · 3 years ago
> But there’s also a part of me that’s just like, how can you not be curious? How can you write Python for 5 years of your life and never look at a bit of source code and try to understand how it works, why it was designed a certain way, and why a particular file in the repo is there? How can you fit a dozen regressions and not try to understand where those coefficients come from and the linear algebra behind it? I dunno, man.

Because there's a lot of things out there which are also interesting, and you don't have time to do all of them, so you choose. And different people choose differently.

overgrownzygote · 3 years ago
I’m surprised I had to scroll so far to find this response.

It was a revelation to me when I realized that, no, it’s not that “most people” lack intellectual curiosity. Their interests are just different than mine.

nicoburns · 3 years ago
I'm pretty curious, but I wonder whether I would have come across that way to my college professors. I felt like college stifled my curiosity. Undergraduate courses rarely care about original or creative work, or about students pursuing their individual interests. They more or less want students to learn what the authorities in the field think.

I did student representation while I was at college, so I had quite a bit of contact with teaching staff around discussing the learning process. There were a lot of complaints from their side that students weren't engaging with the course and were rote learning answers for exams.

My perspective was that most of the courses were badly taught (students were given little guidance and struggled to learn the basics) AND badly examined (you had to guess at what the professor wanted in order to score well - it wasn't actually assessing learning accurately). The courses where you found truly curious students were the ones that taught the basics in a way other professors would consider hand-holding (which meant students could get past the basics and onto more advanced material), and gave clear advice on what was expected in the exam and how to approach it (so that students didn't have to worry about that and could focus on learning and their interests).

You'll always get some students who just aren't interested (perhaps they picked the wrong course, or simply aren't that academic), but you'll also find that the same students respond dramatically differently to different environments.

strikelaserclaw · 3 years ago
curiosity is good but there is so much stuff to learn out there that for many fields learning things deeply is much less important than learning a lot at 25-35% depth.
mschuster91 · 3 years ago
Not everyone is wired that way. Personally, I have taken apart and reassembled most of the tech stuff I have at home simply because it interests me how things work (and broke and repaired a non-negligible number of them in the process, I should add). I've dabbled in repairing cars and gas boilers, and do my own electrical work... but in my social circle, I'm pretty much the only one. And as I grew older and managed to land myself an s/o, I kind of get why - two other factors come into play:

The first issue is that acquiring broad-spectrum knowledge often involves risking quite an amount of money. A good DSLR can easily run a few thousand euros; a fully spec'd Mac Pro or a larger drone crosses into five digits without blinking. Messing around with gas and electricity can kill you; messing with water pipes can cause immense water damage. It takes a lot of... let's say recklessness to even think about dealing with this if you're not a professional, and you have to have the resources in the first place.

But the real issue is time. Students, at least here in Europe, don't have the luxury of taking six or seven years for their basic diploma - "thanks" to the Bologna reforms, you're fucked if you can't make it in the designed timeframe, as you won't be eligible for most kinds of financial aid. That means you simply cannot afford to "waste" a week getting that deep level of knowledge; you're happy enough if it runs well enough to get a passing grade.

And once you've entered the workforce, it becomes even harder to have actual hobbies. If you live alone, no one will bat an eye if you pull an all-nighter on a weekend with just yourself, a crate of beer and a laptop - and that's assuming you're not completely drained from your average 40-hour work week, 10 hours of commuting, and another 10 hours of domestic chores. When you live with another person, the game completely changes: they also want time and attention from you - bonus points if your s/o has roughly the same interests you have (which is why I suspect so many people meet their s/o at work). And with children... forget about hobbies of any kind unless you have enough resources for either yourself or your s/o to be a stay-at-home parent.

This is why I so strongly advocate for a four-day and six-hour work week, a proper minimum wage and government-subsidized affordable housing for everyone. Just imagine what useful things people could run as side projects if they actually had the time to pull them off, not to mention the obvious physical and mental health benefits of not having to struggle with survival every single day. Add to that the elimination of "bullshit jobs" and an end of wasting the best minds of the world on financial bullshit (i.e. HFT, "quant investment funds") or advertising... or getting rid of racism and other discrimination. We as humanity could make so much more progress if we were not so hell-bent on exploiting each other.

throwaway2037 · 3 years ago
You wrote: <<Students, at least here in Europe, don't have the luxury of taking six or seven years for their basic diploma>>

Is this true in Germany?

lifeisstillgood · 3 years ago
Hear hear
oneoff786 · 3 years ago
Seems pretentious to me. I’ve never bothered to look through many things I use. I look extensively at how to use them and what the API offers. I have a good intuition for how most models work. I don’t really care about the specifics of the implementations.

I have more important things to do. The hacker mentality, imo, is about identifying what’s useful for you to explore to accomplish whatever you need. Often that’s a lot of glue between things that other people built. Other times it’s tweaking the internals to do something a bit different.

6gvONxR4sf7o · 3 years ago
If you think it’s about implementation details, you’re misunderstanding. It’s about understanding the principles behind it.

As an example, it’s more about understanding the statistics and linear algebra around estimating uncertainty in GLM regression estimates, than about reading the code for how the statsmodels library implements that.

hnews_account_1 · 3 years ago
This is not about the hacker mentality. This is a researcher mentality from a daily life perspective. Some people just aren’t curious. I like to understand the math and the computing models behind many things I use. That doesn’t mean I want to know what’s happening in Windows internals or something just because I use Windows everyday. But if I’m creating an app connecting to Office DLLs, I want to know what it does beyond “here’s a bunch of methods and constants you can use”.

I’d further argue that the nature of a hacker / power user is to break things apart once you want to get deep enough. If I need to know where in the cluster my instance of some software got lost into, I should be able to investigate all the tools I have available to somehow find it. Not just give up and say some garbage collector will get it for me.

7thaccount · 3 years ago
I see what you're saying, but the post above seems to indicate that understanding those models SHOULD be important to you if your job is to run those models and explain results and make corporate decisions based off your forecasts. You shouldn't be just passing data through and thinking that it's not your job to actually understand things. The subtlety matters a lot as the software hitting some edge case could completely skew the results. There is a vague line drawn somewhere that tells you what is necessary to learn and what is superfluous. Finding the line isn't easy, but those that label too much as superfluous will likely get more erroneous results and that is a problem.

With regards to your API statement, I'm just as guilty regarding reading the code, but I do run some manual tests to ensure that my script calling the database actually does what I think it should. Is that good enough? Who knows :)

r-zip · 3 years ago
That's all fine, as long as you still understand the underlying assumptions and pitfalls. Many people who skim documentation and throw things together haphazardly do not.
mark_l_watson · 3 years ago
I agree with you. Super powers are seeing value in doing something, and then finding the easiest and most efficient path to get there.

That said, sometimes I do like to read the code in libraries I use but often this is more for enjoyment with occasionally learning something interesting.

wittycardio · 3 years ago
"it's pretentious to know what you're doing" just be a haxxor xD
3pt14159 · 3 years ago
> Meaning you could absolutely suck at your job or be incredible at it and you’d get nearly the same regards in either case.

Story time.

There was once a junior data scientist at Shopify who had learned Python and SQL and was tasked with figuring out how to fix their "broken app store recommendation engine", but since they didn't know Ruby, they asked for my help in figuring out what was going on.

Well somewhere in the soup of math was a fuzz factor at the very top. Think of it like

    factor = 0.something  # Not 100% sure what the decimal portion was.
    some_complicated_math_that_maxed_out_at_one_pt_zero() + rand(factor)
Now the thing about Ruby is that rand is basically broken for floats.

    Negative or floating point values for max are allowed, but may give surprising results.
https://ruby-doc.org/core-2.4.0/Kernel.html#method-i-rand

So basically what they thought they were doing was introducing a bit of randomness that would hinder others from reverse engineering their algorithm. What they actually did was make the recommendation algorithm fifty percent total noise. Yes, it's true. On every load half the recommended app scores were noise.
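The surprise is easy to demonstrate. As far as I can tell, Kernel#rand truncates a Float argument toward zero, and rand(0) falls back to returning a float in [0.0, 1.0) - so a call like rand(0.5) (using 0.5 here as a stand-in for the unknown fuzz factor) produces noise across the whole unit interval instead of capping at 0.5:

```ruby
# Sketch of the surprise, with a hypothetical stand-in value:
# rand(max) with a Float max truncates max toward zero, and
# rand(0) returns a float in [0.0, 1.0) - so the "cap" is ignored.
factor = 0.5  # stand-in; the real decimal portion is unknown
samples = Array.new(10_000) { rand(factor) }

puts format("max sample:      %.3f", samples.max)
puts format("share above cap: %.2f", samples.count { |x| x >= factor } / samples.size.to_f)
# Roughly half the samples land above the intended cap.
```

In other words, that + rand(factor) term added up to a full unit of noise on top of a score that maxed out at 1.0 - which is how half of each recommendation score ended up random.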

They fixed the bug, and I'm sure a ton of balance sheets for businesses around the world are markedly different now because of it, but I never heard of it again.

This is one of the core problems with data science.

The lack of feedback.

mattkrause · 3 years ago
Ironically....

Having a subset of totally random recommendations wouldn't be a totally terrible idea---especially if you know which they were! It could help push the system out of local minima and it's the obvious benchmark to beat.

euix · 3 years ago
I have to agree with a lot of this - I started my career as a data scientist right out of a STEM PhD, back when the term was just coming into existence. At the time, anyone who wanted to get hired as a Data Scientist needed to be trained as a professional scientist, i.e. have a PhD. At first my expectation was that the purpose of my job was to apply the scientific method to solve business problems, leveraging the company's own data as the empirical evidence - whether I did this using machine learning, Excel tables or a chalkboard didn't matter. ML was a barely used term at the time; the first version of TensorFlow wasn't released until later that year.

But over time, the higher up I climbed, the more I realized the job had marginal business impact. Usually a big company would hire a bunch of PhDs with fancy degrees, stick them in some "Advanced Analytics" department, and leverage them as internal consultants - which just meant creating some models, writing a PowerPoint deck, and getting a pat on the back from the execs; not a single model would ever see the light of day. I got all the way up to Director this way before calling it quits this January. At the end I had basically nothing to do except work on "corporate AI strategy", which meant writing presentations and white papers for upper management.

It was a comparatively easy job; one could coast their entire life in some of these corporations - especially in government-sanctioned oligopolies like banking.

wheelinsupial · 3 years ago
> the higher up I climbed the more I realized the job had marginal business impact

Do you have any observations why? I'm a pretty lowly business analyst, but my observation is if you don't own the decision making (usually by having profit and loss responsibility), you can't have much impact. Possibly it's the companies and industries I've worked at, but at the end of the day if the results don't meet expectations, it's the business owner that gets fired and not the people providing the recommendations.

mjburgess · 3 years ago
For the same reason science takes 100s (or 1000s) of years to develop.

All the "intelligence" takes place in the humans who design experiments to collect unambiguous data. "Data" absent a profoundly intelligent (and expensive, fraught, ...) experimental design is basically useless.

euix · 3 years ago
Here is an example: target metrics are heavily manipulated and people don't really want to know what's going on. At my first job the Director of Product would change the way a target KPI was measured every few months but would not back-propagate the changes, the end result was that to upper management the product always looked good, because the product owner would just redefine the metric in a way that made the numbers go up. This was at a multi-billion marketcap company in the SP500 and this particular person was promoted two levels to managing vice president in 1.5 years.

Basically, like some other people have already said, companies are inherently political - they do not want data-driven decisions they want their decisions to be data-validated. If their view of reality aligns with the data that is all the better, but if it doesn't, their alignment takes priority. Moving up as a DS then involves delivering "evidence" that fits whatever narrative your boss and senior management want. Sometimes that evidence will be rock solid, other times there is no evidence. That's why I suspect in the beginning they loved hiring STEM PhDs from "elite" universities. If your degree is from Harvard Astronomy Dept, people will borrow your credentials to further their agenda - because you got a golden halo.

TLDR: science is not gospel; it's just a method of thinking to deduce natural laws. If you keep digging, you can find your initial assumptions proven wrong - sometimes completely wrong. In business and politics, if you dig too hard, you start finding things that nobody wants to hear.

Regarding your point about owning profit and loss, that is very true as well. In my second job I was on a center of excellence team, and it was extremely hard to get any traction because we didn't own any sources of revenue - we were a cost center like HR or Accounting. Teams that owned LOBs wanted to hire their own analytics rather than "outsource" to a COE team, as a way to retain control and expand their own power base.

Would I ever do it again? Who knows, maybe, I still believe it's possible to do good scientific work outside of academia (not to say good science always gets done in academia either). I am living off investments and savings right now and working on hobby projects that may or may not pan out. People always take less than ideal jobs for want of reality.

I think there is real value in scientific analysis in business but it's closer to operations research where you solve complex optimization problems that are directly pertinent to the core business (like traffic routing or container packing) than in busting out the latest DNN techniques.

ZephyrBlu · 3 years ago
Props for saying this, but you didn't realize the "marginal business impact" bit until you were a Director..? Seems awfully convenient.
agomez314 · 3 years ago
"Managers will say they want to make data-driven decisions, but they really want decision-driven data"

Ooofff. This is too true. How often is the case that data is collected to test hypotheses vs confirming priors?

cobbzilla · 3 years ago
So true. This article accurately describes DS at many companies.

The preceding sentence is a hilariously cynical zinger:

“Those who have seen my Twitter posts know that I believe the role of the data scientist in a scenario of insane management is not to provide real, honest consultation, but to launder these insane ideas as having some sort of basis in objective reality even if they don’t.”

phkahler · 3 years ago
>> "Managers will say they want to make data-driven decisions, but they really want decision-driven data" Ooofff. This is too true. How often is the case that data is collected to test hypotheses vs confirming priors?

Find me some evidence of WMDs in Iraq! Yessss Sir!

yamtaddle · 3 years ago
I've found this to be the rule, not the exception. Pointing out extremely-obvious (to me? Maybe I'm just unusually good at it? I don't even have much formal science training, and hell, barely any math training by the standards of HN folks, though) damning errors in experimental construction that should invalidate the whole thing won't earn you any friends, even if you do it before the work is undertaken, and even if you're telling the person who's claiming to want good data and useful results. Everyone seems to just want a veneer of science to what they're doing, not actually good efforts at it. As long as you have a paper-thin layer of justification that falls apart if anyone looks at it long enough, that's considered good enough and people will sit around in meetings nodding along.

Of course, in many situations the business totally lacks what it needs to correctly do the "data-driven" stuff they want to, and it'd take a good deal of up-front effort by competent people to get it, amounting to entire new projects or deep modification of existing projects.

So, given the choice between:

- going without that stuff and acknowledging that a lot of what they're doing is guesswork, gut decision making, or simply arbitrary;

- putting a smaller but still-large amount of work into finding out what they can glean from what's available;

- spending the time and money to collect what they need, the right way, to do the data-driven decision making they claim to want to do; or

- insisting they're doing things "data driven" while all their data is hopelessly ruined by e.g. selection bias and comically-bad experimental construction that can't possibly yield reliable results, so they can cheap out and get no actual "data-driven" benefits aside from falsely claiming that's what they're doing

they tend to go with that last option, nearly every time!

A4ET8a8uTh0 · 3 years ago
That one stood out to me as well, but, to be fair, this predated current 'fashionable trend' for data driven decisions. It is, sadly, not a new development, but something to still be overcome.
lajosbacs · 3 years ago
This especially sucks if you are the middle manager. You know that what you are asked to do is complete BS, but you have to somehow communicate it to your underlings (who see through the BS) without using sarcasm or snarky remarks.
Reimersholme · 3 years ago
Rather than wanting to confirm priors, I believe this usually is a problem with neither the PM nor the data scientist ensuring that the problem formulation is good enough before diving in. I.e., what data would be needed to actually test the hypothesis? Do we have that data or not? Is the hypothesis even formulated in a way to be falsified in theory?

I've seen so many analysis tasks where data scientists without questioning went away for a few weeks to crunch data and come back with some random graphs and statistics that are completely useless as decision support.

nisegami · 3 years ago
You're overthinking it. Executives and managers quite literally want to see data that confirms their existing convictions and beliefs so they can act on those beliefs under the guise of it being "data-driven".
starwind · 3 years ago
I made this same transition from data science to data engineering about 18 months ago and I've never looked back.

I hated working with bad code and dealing with arrogant PhDs who don't value good code. I've seen so many terrible Jupyter notebooks copied and pasted into VS Code, with the data scientist washing their hands of it and calling it "production ready." Here's a conversation I've had multiple times:

Me: have you ever considered not making every variable global scope

Them: that's just software engineering. We do machine learning

Me: if it's just software engineering, then why can't you do it?

Meanwhile, automated data science tools are getting halfway decent. If you know what algorithm to pick and you don't need to run millions of records through the model every minute, your standard business analyst could probably get a solid model going--at least as well as most data scientists for all the reasons the article mentions.

And I like that I know I can do data engineering. With data science you can never really know if you can hit your target metrics given the data you have. So data scientists end up encouraged to fudge their results or make sloppy decisions. With data engineering I can say "yes this is doable or no that's not" and people believe me.

My prediction: there's value in the massive volume of data, but most of it can be had through standard dashboards, some summary statistics, a graph network, or maybe a linear/logistic regression. Most data science is BS and companies aren't getting the return they need to pay for these guys. (And good God, you almost certainly don't need a neural network.) Meanwhile, data engineering will get integrated into software development, and machine learning - by virtue of its proliferation through academia - will just become another tool for software developers. Data scientists won't get laid off en masse, but they will go the way of the webmaster: either pick up new skills and evolve, or move on 'til they end up with new titles.

srajabi · 3 years ago
This resonates with me so much, I stumbled into data science out of University a decade ago. Left it to do SWE and came back to it in the last 3 years.

So many data scientists are full of themselves thinking they are magicians and software developers are blacksmiths who are beneath them.

Incrementally at my company the SWE's have automated so much of the data scientists workflow that they end up just as you describe, using the tooling and being relegated to becoming analysts.

After 3 years coming back to this field, I see the writing on the wall: In the 90's most models were created by software developers, in the 2030's most models will be created by software developers.

wodenokoto · 3 years ago
I don't get it. As a data engineer, aren't you the one who has to deal with the DS code? Whereas as a DS, someone else has to deal with your code?