Readit News
shadowmint commented on An understanding of AI’s limitations is starting to sink in   economist.com/technology-... · Posted by u/martincmartin
TrackerFF · 6 years ago
That's actually something ML is incredibly useful for when it comes to machines with sensors: failure prediction, anomaly detection, etc.

In the industry, (preventive) maintenance takes up a pretty huge chunk of resources. It's something techs need to do frequently, and it's often a laborious task, but it's obviously done to reduce downtime.

So the business insight, as they like to call it, is to reduce costs tied up to repairs and maintenance.

All critical applications have multiple levels of redundancy, so that a complete breakdown is very unlikely, but it's still a very expensive process if you're dealing with contractors. If you can get techs to swap out parts before the whole unit goes to sh!t, then that's often going to be a much cheaper alternative.

But, in the end, it comes down to the quality of the data and the models being built. A lot of industrial businesses hire ML / AI engineers for this task alone, but expect some magic black box that will warn x days / hours / minutes ahead that a machine or part is about to break down and it's time to get it fixed. And they unfortunately expect near-perfect accuracy, because someone in sales assured them that this is the future, and the future is now.
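The kind of detector being discussed can be mundane statistics rather than a magic black box. Here is a deliberately naive rolling z-score monitor on a stream of sensor readings; all data, thresholds, and function names are invented for illustration, and a real system would be trained against labelled failure history:

```python
import statistics

def rolling_zscore_alerts(readings, window=50, threshold=3.0):
    """Flag readings that deviate sharply from the recent baseline.

    A toy stand-in for the anomaly detection discussed above, not a
    production failure-prediction model.
    """
    alerts = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.stdev(baseline)
        if stdev > 0 and abs(readings[i] - mean) / stdev > threshold:
            alerts.append(i)  # index of the suspicious reading
    return alerts

# Steady periodic signal with one injected spike at index 80.
data = [10.0 + 0.1 * (i % 5) for i in range(100)]
data[80] = 25.0
print(rolling_zscore_alerts(data))  # [80]: only the injected spike
```

The hard parts in practice are exactly the ones the comment names: data quality, labelled failures to tune the threshold against, and the cost of false alarms.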

shadowmint · 6 years ago
If the business had wanted to track the rate of failures and create predictive models of when things fail, or to detect anomalous behaviour, that's the goal they would have set out with. Perhaps some ML model might have helped, but probably it would've been too unreliable, and any number of standard predictive models with well-known characteristics would have been used instead.

That's not what they wanted.

What people are being sold is AI/ML as a magic bullet that will do something useful regardless of the situation. It lets business people avoid making decisions about what they actually want, because AI/ML can be anything, so they just sign up for it and expect to get 20 things they didn't know they wanted handed to them on a plate.

Turns out, it's not enough to just collect a bunch of data and wave your magic wand at it. It wasn't with web analytics 10 years ago, and it still isn't.

What you actually need is someone who has a bunch of tricks up their sleeve and has done this before, who can suggest a bunch of Business Insights the business might need before they start building anything: people who actually decide what to do, and what actions to take to investigate and solve those problems.

I mean, to some degree you're right; perhaps ML models could be useful for tracking hardware failures, but that's not what the parent post is talking about. The previous post was talking about just collecting the data and expecting predictive failure models to jump out magically.

That doesn't happen; it needs a person to have the insight that the data could be used for such a thing, and that needs to happen before you go and randomly collect all the wrong frigging metrics.

...but hiring experts is expensive, and making decisions is hard. So ML/AI is sold like snake-oil to managers who want to avoid both of those things. :)

shadowmint commented on Angular v8.0   github.com/angular/angula... · Posted by u/tashoecraft
stupidcar · 7 years ago
This feels like an argument made by somebody who has never built a serious project in either, and has instead formed an opinion based on quick glance at the respective syntaxes.

To me, Angular is what HTML and the DOM would look like if they had been designed from the beginning for application development:

- Custom elements backed by controller classes.
- Data-binding and event-binding syntax baked into HTML.
- Component style encapsulation, on by default.

React seems far more like a project created by people who dislike front-end development: as I recall, the genesis of the project was to replace traditional DOM mutation with a more PHP-esque approach of updating state and re-rendering everything, just as you would on the back-end.
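The "update state and re-render everything" model described here can be sketched framework-agnostically. This toy version (in Python, with all names invented purely to illustrate the idea) treats rendering as a pure function of state and rebuilds the output from scratch on every change:

```python
def render(state):
    # A pure function from state to a "virtual DOM" (here, just a string).
    return f"<button>Count: {state['count']}</button>"

state = {"count": 0}
dom = render(state)

def set_state(update):
    # The React-style loop: replace state, re-render everything,
    # swap in the result (a real framework would diff, not swap).
    global state, dom
    state = {**state, **update}
    dom = render(state)

set_state({"count": 1})
print(dom)  # <button>Count: 1</button>
```

The whole debate in this subthread is essentially about whether that model, versus Angular's in-place bindings, degrades more gracefully in large codebases.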

shadowmint · 7 years ago
Why does having a different opinion from you mean someone has no idea what they’re talking about, or has never used a thing “seriously”?

I used to love Angular. Then I got a job on a “| async” dumpster fire and spent a year watching a team of smart C# developers wallow in a mire of disaster so bad that changing a text field on a form became a two-week regression. The code was so full of amazing functional statements that no one, not even the original authors, could touch it without breaking something in the process.

so.

Your mileage may vary. I no longer particularly like Angular, personally, because I find it a chore to herd inexperienced FactoryInjectorConstructorFactoryPattern Angular developers into not screwing things up.

...but a talented team can do well with it, and I’ve seen people screw up React projects too.

It really is more about good practice and experience than the framework; your personal preference, like mine, is probably basically irrelevant.


shadowmint commented on Negotiations Failed: How Oracle Killed Java EE   headcrashing.wordpress.co... · Posted by u/omnibrain
evgen · 7 years ago
No, in practice it works something like this:

1) write everything in python

2) yeah, the performance here is good enough so ship it

3) there is no 3

There are very few situations where performance is going to be an issue for you where there is not an existing C module solution that will solve the problem for you. The tired old 'python is slow' trope is getting more and more irrelevant every day. There are other aspects of the language that may make it a mediocre solution to the problem at hand, but out in the real world most people are simply getting the job done with python.

shadowmint · 7 years ago
I spent 4 years as a professional python developer.

We certainly shipped (using Django), and it was certainly slow, and it remains a painfully slow, very successful enterprise app.

I’m not arguing that the slowness is a deal breaker, but it is slow, and it does routinely break the SLAs it’s supposed to meet.

So... unusably slow? no.

...but slow? yes, it really is.

imo. Your mileage may vary. /shrug

shadowmint commented on Negotiations Failed: How Oracle Killed Java EE   headcrashing.wordpress.co... · Posted by u/omnibrain
rbanffy · 7 years ago
The hot parts we can implement in C. There's not much overhead in crossing over to native code.
shadowmint · 7 years ago
Lovely theory, but in practice it works more like this:

1) write everything in python because it's easy and quick to do so.

2) it's slow as.

3) abandon the software and write it in something else, or live on with slow-ass software and blame python for being slow and rubbish forever more.

Re-writing python in C is a hideously painful process, and it's proven to be very unsuccessful in practice.

Writing new code in C/C++/whatever and exposing a Python API is where successful projects like numpy and tensorflow live.

python is very good at what it is, but no one is ever going to go and rewrite your python code in C to make it faster; it's just going to be slow forever.
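The distinction being drawn, calling into C through a Python API rather than porting Python to C, can be seen in miniature with the stdlib's ctypes. This is a POSIX-flavoured sketch (library lookup differs on Windows) that borrows `sqrt` from the system's C math library:

```python
import ctypes
import ctypes.util

# find_library may return None on some platforms; on POSIX, CDLL(None)
# then falls back to the process's own symbols, which include libm's.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# ctypes assumes int return values unless told otherwise, so declare
# the C signature: double sqrt(double).
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # 1.4142135623730951
```

numpy and tensorflow are this pattern at industrial scale: the hot loops were written natively from the start, with Python as the front door, rather than being translated from existing Python.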

shadowmint commented on Asynchronous Programming in Rust   blog.yoshuawuyts.com/runt... · Posted by u/Supermighty
pornel · 7 years ago
Someone has to try it out first, before it gets frozen.
shadowmint · 7 years ago
Sure, but it sucks to be left with code you have to rewrite because you're donating your time to the cause to find problems and smooth the path for other people in the future.

Maybe some people are into that for fun, but for the majority of people, the message should be:

stick with stable, folks.

shadowmint commented on Ask HN: How do I improve our data infrastructure?    · Posted by u/remilouf
usgroup · 7 years ago
I’d contest this personally. You can employ disk or row compression on PG if you want. Compressed disk will actually make your queries faster. You can use cstore for ORC-based column storage with PG if you want.

Presumably the cost of a few TB on EBS is the least of your worries.

Finally, the time saving of full transactional support and constraints + sql to write etl in will drastically reduce the amount of work needed to write etl.

IMO, if RDBMS is an option for you, do it whilst your data is small enough.

shadowmint · 7 years ago
> sql to write etl in will drastically reduce the amount of work needed to write etl.

:)

My experience with writing ETL in SQL is that it is almost never quick, easy, correct, or easy to test, and it's almost always denormalized or unconstrained (dimensional keys that aren't 'real' foreign keys, just numbers, so you can parallelize the data inserts and updates without constraint errors).

So... your mileage may vary with that.

It's most certainly not true that writing ETL in SQL saves time in all cases.
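The "dimensional keys that aren't real foreign keys" trade-off can be demonstrated with a toy star schema in sqlite3 (table and column names invented for illustration): dropping the constraint makes parallel loads painless, but orphaned facts then have to be hunted down by hand.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
    -- customer_key below is "just a number": no FOREIGN KEY, so loads
    -- never hit constraint errors, and orphans slip in silently.
    CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);
""")
con.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
con.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                [(1, 100.0), (999, 50.0)])  # 999 has no dimension row

# The data-quality check the database never ran for you:
orphans = con.execute("""
    SELECT COUNT(*) FROM fact_sales f
    LEFT JOIN dim_customer d USING (customer_key)
    WHERE d.customer_key IS NULL
""").fetchone()[0]
print(orphans)  # 1
```

That orphan-hunting query is exactly the kind of testing overhead that eats the time the SQL-for-ETL pitch claims to save.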

shadowmint commented on Ask HN: How do I improve our data infrastructure?    · Posted by u/remilouf
shadowmint · 7 years ago
It sounds like you already have an idea of what you want to do, but I think you should pause and think more deeply about what you have, vs. what you want.

What I would want in your situation is:

    - All the data in one place.
    - An easy way to explore the data. 
    - A single source of truth for transformed data.
    - Metadata to explain the data model (ie. documentation).
What you're proposing does some of those things, but it also:

    - Adds yet another maintain-forever technology to your stack.
    - Adds yet another pipeline (or set of pipelines) that does the same thing.
    - Moves from an architecture that is clustered for scale (ie. spark) to one that only scales vertically (postgres). 
    - Potentially introduces *yet more* sources of truth for some data.
> I was thinking that in a first iteration, data scientists would explore their denormalized, aggregated data and create their own feature with code.

^ Moving data into postgres doesn't make this somehow trivial; it just enables people to use a different SQL dialect. The spark API is, for anyone competent enough to be writing code, not meaningfully more complicated than the postgres API.

I appreciate the naive attractiveness of having a traditional "data warehouse" in a SQL database, but there is actually a reason why people are moving away from that model:

    - it doesn't scale
    - SQL is a terrible language to write transformations in (it's a *query* language, not an ETL language)
    - it's only vaguely better when you have many denormalised tables vs. s3 parquet blobs
    - you have to invent data for schema changes (ie. new table schema, old data in the table) (ie. migrations are hard)
More tangibly, I know people who have done exactly what you're talking about, and regretted it. Unless you can very clearly demonstrate that what you're making is meaningfully better, it won't be adopted by the other team members and you'll have to either live forever in your silo, or eventually abandon it and go back to the old system. :/

So... I don't recommend it.

The points you're making are all valid, and for a small scale like this, if you were doing it from scratch it would be a pretty compelling option... but migrating entirely will be prohibitively expensive, and migrating partially will be a disaster.

Could you perhaps find a better way to orchestrate your spark tasks, eg. with airflow or ADF or AWS Glue or whatever?

Personally I think that Databricks offers a very attractive way to allow data exploration without a significant architecture change.

The architecture you're using isn't fundamentally bad; it just needs strong, across-the-board data management... but that's something very difficult to drive from the bottom up.

shadowmint commented on Please be more careful when interpreting the Stack Overflow Developer Survey   meta.stackoverflow.com/q/... · Posted by u/SoReadyToHelp
ChrisSD · 7 years ago
> I don’t accept you can survey 90000 developers and cannot offer any generalisation from those results without quantitatively proving there is an overwhelming sample bias, and specifically quantifying the degree of that bias.

Surely you have this backwards? If you want to argue that a survey offers any generalisation, then surely the onus is on you to prove you've accounted for sample bias (amongst others)?

shadowmint · 7 years ago
That seems fair; but they have a whole methodology section.

If you want to argue with it, surely the onus is on you to do it concretely?

> Because of your methodology, we must assume a biased sample.

^ I find this quote problematic.

Why must we assume that? If you want to do distribution comparisons and point out that their survey results are skewed by X compared to some other survey Y... ok.

...but that’s not what’s happening, right? It’s just a flat-out arbitrary assumption.

I don’t like arbitrary assumptions when I’m doing maths.

It’s easy to say something is wrong, but if you can’t quantify how it’s wrong, I’m struggling to see why I should accept the assumption being raised here.

The js survey was very similar; it was arbitrarily asserted that it went to more React developers... but no one actually proved that. They just... assumed it.

shadowmint commented on Please be more careful when interpreting the Stack Overflow Developer Survey   meta.stackoverflow.com/q/... · Posted by u/SoReadyToHelp
shadowmint · 7 years ago
> you cannot generalize from a non-random sample

So, honest question:

If any survey of any size can be ignored on the basis that the sample is not random, then how is any survey meaningful?

Isn’t this a self-defeating argument?

You can’t prove the sample is random; all you can do is show differences between samples and suggest it’s not consistent... but how do we go away and prove that some other survey we’re comparing it to is from a random sample?

ie. Isn’t this just a convenient excuse to deny that a survey is meaningful?

Statistically, how do you mathematically quantify the effect of selection bias?

...because, it seems to me, unless you can actually do that, you’re just doing some armchair hand-waving because you don’t like the results you’re seeing.

This has come up several times (eg. js survey about react vs angular), and no one has ever given me a meaningful and mathematical response.

It’s always just... “it must be sample bias”, regardless of the 90000 people they surveyed.

I don’t accept you can survey 90000 developers and cannot offer any generalisation from those results without quantitatively proving there is an overwhelming sample bias, and specifically quantifying the degree of that bias.

Am I missing something here? Everyone seems thoroughly convinced that this is perfectly normal.

(I’m not proud, I’ll take your downvotes, but please answer and explain what I’m missing)
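The question of how one would quantify the effect of selection bias can at least be explored with a toy simulation (every number here is invented): a population where people holding one preference are more likely to answer the survey, compared against a genuinely random sample.

```python
import random

random.seed(0)

# Toy population: 30% of developers "prefer X", but preference-X holders
# are three times as likely to respond to the survey.
POP = 1_000_000
population = [1] * int(POP * 0.3) + [0] * int(POP * 0.7)

def biased_sample(n):
    """Draw n respondents, where response probability depends on preference."""
    out = []
    while len(out) < n:
        person = random.choice(population)
        if random.random() < (0.9 if person else 0.3):
            out.append(person)
    return out

big_biased = biased_sample(90_000)
small_random = random.sample(population, 1_000)

# Expected biased proportion: 0.3*0.9 / (0.3*0.9 + 0.7*0.3) = 0.5625
print(sum(big_biased) / len(big_biased))      # ~0.56, despite n = 90,000
print(sum(small_random) / len(small_random))  # ~0.30, despite n = 1,000
```

Whether the real survey's response process looks anything like this is exactly the empirical question being argued in the thread; the simulation only shows that, when differential response does exist, sample size alone doesn't dilute it.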
