Readit News
tjhunter commented on Ask HN: Does (or why does) anyone use MapReduce anymore?    · Posted by u/bk146
tjhunter · 2 years ago
(2nd user & developer of Spark here). It depends on what you ask.

MapReduce the framework is proprietary to Google, and some pipelines are still running inside Google.

MapReduce as a concept is very much in use. Hadoop was inspired by MapReduce. Spark was originally built around the primitives of MapReduce, and you still see that in the description of its operations (exchange, collect). However, Spark and all the other modern frameworks realized that:

- users did not care about mapping and reducing, they wanted higher-level primitives (filtering, joins, ...)

- MapReduce was great for one-shot batch processing of data, but struggled to accommodate other very common use cases at scale (low latency, graph processing, streaming, distributed machine learning, ...). You can do it on top of MapReduce, but if you really start tuning for the specific case, you end up with something rather different. For example, Kafka (scalable streaming engine) is inspired by the general principles of MR, but the use cases and APIs are now quite different.
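
To make the first point concrete, here is a toy, single-process sketch of the map / shuffle / reduce pattern in plain Python. It is not Google's MapReduce, Hadoop, or Spark code: `map_reduce`, `join_mapper`, `join_reducer` and the sample data are all made up for illustration. Word count fits the model naturally, while an inner join has to be hand-encoded by tagging records and pairing them in the reducer, which is roughly the boilerplate that higher-level primitives hide.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy MapReduce (hypothetical helper): map, group by key, reduce per key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase: emit (key, value) pairs
            groups[key].append(value)      # shuffle phase: group values by key
    # reduce phase: collapse each key's values into one result
    return {key: reducer(key, values) for key, values in groups.items()}


# Word count: the canonical job, a natural fit for map + reduce.
lines = ["spark hadoop spark", "kafka spark"]
word_counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda _word, ones: sum(ones),
)

# Inner join: tag each record with its source, key by the join column,
# then pair up the two sides inside the reducer.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]


def join_mapper(tagged_record):
    source, (user_id, payload) = tagged_record
    yield user_id, (source, payload)


def join_reducer(_user_id, tagged_values):
    names = [payload for source, payload in tagged_values if source == "user"]
    items = [payload for source, payload in tagged_values if source == "order"]
    return [(name, item) for name in names for item in items]


tagged_input = [("user", u) for u in users] + [("order", o) for o in orders]
joined = map_reduce(tagged_input, join_mapper, join_reducer)
```

In a higher-level API the join is a single call and the engine decides how to shuffle; here the user has to spell out the tagging and pairing by hand, and keys with no match (users without orders, orders without users) come back as empty lists that still need filtering.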

tjhunter commented on Most companies do not need Snowflake or Databricks   kjhealey.medium.com/cache... · Posted by u/whoiskatrin
tjhunter · 2 years ago
This article has valid points but does not understand the perspective of companies. Companies do not buy technologies. Companies buy solutions.

- Companies do not buy Spark, they buy the ability to process their data and to have multiple personas collaborate (data scientists, data engineers, ...)

- You can do it yourself. It will be cheaper but it will require time, expertise and money, all things that companies do not give easily

- Snowflake and Databricks are elastic: you can start small and grow as you need. This is much easier than justifying the upfront cost of hiring specialized people, or asking for trust that your ad-hoc solution will respect whatever enterprise governance rules apply

(disclaimer: I worked at Databricks for 6 years and talked to hundreds of prospective and actual Databricks users and customers)

u/tjhunter

Karma: 68 · Cake day: August 23, 2023