There was a buzzword, "Big Data", which regularly popped up in tech news. I haven't seen it used much lately. What advances are being made on it?
I imagine with various privacy scandals it fell out of favour since your data should be /your/ data only.
And many have talked about data being the ‘new oil’ when really it should be reframed as radioactive waste.
What happened to using this term to hype up your brand: ‘We use Big Data to infer information about how to improve and go forward’?
Was it just a hyped up buzzword?
A similar language shift happened to terms like "dynamic web" and "interactive web". In the late 1990s, when Javascript started to be heavily used, we called attention to that new trend with the phrase "dynamic web". Today, the "interactive web" phrase has mostly gone away. But that doesn't mean Javascript-enabled web pages were a fad. On the contrary, we're using more Javascript than ever before. We just take it as a given that the web is interactive.
Examples of the rise and fall of these phrases in language use, peaking around 2004:
https://books.google.com/ngrams/graph?content=dynamic+web&ye...
https://books.google.com/ngrams/graph?content=interactive+we...
But even the transfer of text files over HTTP or some other protocol was a buzzword once!
Then Machine Learning comes along, and the same people think it means you can just feed the beast your big data and it will be clever enough to tell you what you want to know. Then those same companies realise they still don't have the skills to work out what to tell the ML algorithm to do.
My favorite of these was an ML model demo we got, with the incredibly insightful analysis: "As customer dissatisfaction increases, customer satisfaction decreases."
Now it is everywhere. Enterprises use it constantly.
A laptop hard disk is now capable of holding databases with tens of millions of rows.
Traditional "Data Science" and modern Deep Learning rely entirely on it. Millions of datapoints are used to create models everyday.
A sensor on a human wrist collects and stores thousands of data points each day.
So do refrigerators, cars, and your washing machine, with the ubiquity of IoT.
Giant tech companies process billions of rows each day to show users products, or to sell their attention as the product.
Big Data became ubiquitous, and it became so common that nobody calls it that anymore.
Tools like BigQuery, Dask, and even Pandas and SQL can handle hundreds of thousands to hundreds of millions of rows (or other structures) with normal, everyday programming.
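To make the point concrete, here's a minimal sketch (synthetic data, hypothetical column names) of Pandas aggregating a couple of million rows entirely in memory on an ordinary laptop — the kind of workload that would once have been pitched as a "Big Data" problem:

```python
import numpy as np
import pandas as pd

# Synthetic example: 2 million rows of fake sensor readings.
N = 2_000_000
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "device_id": rng.integers(0, 10_000, size=N),  # 10k hypothetical devices
    "reading": rng.normal(loc=20.0, scale=5.0, size=N),
})

# A groupby over millions of rows runs in seconds with vectorized code.
per_device = df.groupby("device_id")["reading"].agg(["mean", "count"])

print(len(per_device))                 # number of distinct devices seen
print(per_device["count"].sum() == N)  # every row accounted for
```

The same shape of query scales to hundreds of millions of rows with Dask or a SQL engine, without changing the programming model.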
> A sensor on human wrist collects and stores thousands of data points each day.
I feel we have a very different view of what comprises big data :)
But that is now.
If you are old enough, you must remember the early years of the big data hype, when "big" was not far from millions or tens of millions of rows.
Re: sensors on fitbits, I thought everyone would read between the lines and consider that hundreds of thousands of these devices sold every year (every month?) will definitely amount to "big data".
Either these companies are plainly hoarding all of it and running some kind of analysis, or maybe they are doing federated learning. From the companies' standpoint, yes, it is big data.
Still tons of folks out there using Hadoop (ew), Snowflake, etc., and new technologies keep coming out: Trino, Apache Iceberg, and so on. So it's there ... no one cares about the moniker anymore ... people are just getting things done.
There are a lot of approaches like Change Data Capture (CDC) or HyperLogLog - but the norm? Far from it.
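To illustrate why a sketch like HyperLogLog is appealing: counting distinct items exactly needs memory proportional to the number of distinct values, while an HLL gets within a few percent using a fixed few kilobytes. A minimal, illustrative Python version (register count and constants are typical choices, not from any particular library):

```python
import hashlib
import math

P = 12          # 2^12 = 4096 registers, a few KB total
M = 1 << P
registers = [0] * M

def add(item: str) -> None:
    # 64-bit hash: first P bits pick a register, the rest feed the rank.
    h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    # rank = position of the leftmost 1-bit in the remaining bits
    rank = (64 - P) - rest.bit_length() + 1
    registers[idx] = max(registers[idx], rank)

def estimate() -> float:
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * M and zeros:
        # small-range correction: fall back to linear counting
        return M * math.log(M / zeros)
    return raw

for i in range(100_000):
    add(f"user-{i}")
print(round(estimate()))  # typically within a few percent of 100,000
```

Production implementations (Redis's PFCOUNT, BigQuery's APPROX_COUNT_DISTINCT) add bias corrections and sparse encodings, but the core idea is this small.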
I think the marketing BS fell out of fashion when every database designer became a data scientist, but that's another issue.
That said if you want to do streaming nowadays then you just integrate with Segment. If you want to track your database then you can dump data using Fivetran. If you want to track client events in excruciating detail then you can use Fullstory/Heap to do so in real time. That's all now table stakes for any company and outsourced to those platforms.
I guess it is similar to other technologies that most companies or developers will never really need at their limited scale, like distributed databases, NoSQL, or microservices: it is interesting technology, and engineers want to get their hands on it because that's what the big boys play with, even if they don't really need it. Meanwhile, the industry hypes it because the technology is difficult, so they know they can make money doing consulting.
I'm not saying that it is not useful technology; I work at a company where we had a genuine need to go from Postgres to "Big Data" tooling. But for tons of businesses it just doesn't make any sense. And even in our case, one of the questions I ask most frequently is: What business decision are you making based on processing this enormous amount of data? Could we make the same decision with less data?
If what you’re doing is
1. Easily parallelizable
2. CPU intensive
3. In the tens of petabytes or more
Then one of these machine-gun-like setups makes sense in 2022. Otherwise, YAGNI (you aren't gonna need it).
— Mark Imbriaco, 37Signals
These quotes are from 2009 and 2010, and yet here most of us are in 2022, having learned the lesson the hard way over the last decade that there is no refuting this simple logic. I'll add my own truism: All else being equal, designing and maintaining simpler systems will always cost less than complex ones.
quote references:
https://signalvnoise.com/posts/2479-nuts-bolts-database-serv...
http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on...
Storing data on S3 or using BigQuery removes a lot of the challenges compared with doing this stuff in the data centre. You then also have services such as EMR, Databricks, and Snowflake to provide the tooling and platforms as IaaS/SaaS. The actual work then moves up the stack.
Businesses are doing more with data than ever before, and the volumes are growing. I just think the challenge has moved on from managing large datasets, as a result of new tooling, infrastructure, and practices.
Plus people started wising up to COST.
For some reason there's a ridiculous level of FOMO in executive ranks, so any new trend is something they feel they need to jump on, as if it will be what keeps their company around in 10 years. The result is fad-jumping, which I've seen happen from Big Data to ML to Blockchain, costing companies millions that could have been better invested in their own products or offerings and actually competing better. It's a really expensive education for leadership, IMO.