mble_ (u/mble_) - Readit News

mble_ commented on Leadership Power Tools: SQL and Statistics matt.blwt.io/post/leaders... · Posted by u/PaulHoule

PaulHoule · a year ago

From my POV I have a choice of a database or a tool like pandas. Anybody who is interested in this sort of work has a choice of doing it with databases or with a specialized data analysis tool. What's your take on that?

mble_ · a year ago

Why not both?

There are times when pushing the work down to the database layer is appropriate - databases are quite good at a lot of these operations - but if you need more nuanced approaches (e.g. ANOVA, ARIMA, other kinds of forecasting or analysis), leverage the appropriate tools.
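
As a rough illustration of "both", here is a minimal sketch that lets the database do the grouping and then runs a one-way ANOVA in Python on the raw rows. The `deploys` table, its columns, and the numbers are all hypothetical, and the F statistic is computed by hand so that only the standard library is needed.

```python
import sqlite3
from statistics import fmean

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deploys (team TEXT, lead_time_hours REAL)")
conn.executemany(
    "INSERT INTO deploys VALUES (?, ?)",
    [("a", 4.0), ("a", 5.0), ("a", 6.0),
     ("b", 7.0), ("b", 8.0), ("b", 9.0),
     ("c", 5.0), ("c", 6.0), ("c", 7.0)],
)

# Step 1: push the grouping and averaging down to the database.
summary = conn.execute(
    "SELECT team, COUNT(*), AVG(lead_time_hours) "
    "FROM deploys GROUP BY team ORDER BY team"
).fetchall()

# Step 2: pull raw rows only for the analysis SQL does not offer.
groups = {}
for team, hours in conn.execute("SELECT team, lead_time_hours FROM deploys"):
    groups.setdefault(team, []).append(hours)


def one_way_anova_f(samples):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [v for s in samples for v in s]
    grand_mean = fmean(all_values)
    k, n = len(samples), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (fmean(s) - grand_mean) ** 2 for s in samples)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - fmean(s)) ** 2 for s in samples for v in s)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


f_stat = one_way_anova_f(list(groups.values()))
print(summary)
print(f_stat)
```

The split is the point rather than the specific test: the `GROUP BY` stays in SQL, and only the part the database cannot express moves into the analysis tool.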

mble_ commented on Leadership Power Tools: SQL and Statistics matt.blwt.io/post/leaders... · Posted by u/PaulHoule

conductr · a year ago

Thanks for chiming in, great post, I like the premise - I just think we must have completely different working experiences. I'm typically in a larger org that has multiple systems feeding data into a data lake or something similar that has been normalized but usually still has some quirks. Articulating the right request to BI is certainly a skill, but my approach/experience is that I try to paint the picture of the end goal and let them fill in the gaps as needed. Sometimes that's literally drawing out a graph or chart that I want to exist.

Even when no BI team is dedicated, there's usually someone that's wearing that hat. Someone set up those schemas and data pipelines, etc., or is responsible for maintaining them. That person is probably the one that knows "make sure you exclude the NULL items" or something similar.
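
To make the NULL point concrete, here is a minimal sketch of how the same question gives different answers depending on how NULLs are treated. The `tickets` table and its values are hypothetical; the aggregate behavior shown (`COUNT(*)` counts every row, while `COUNT(col)` and `AVG(col)` skip NULLs) is standard SQL.

```python
import sqlite3

# Hypothetical data: two resolved tickets and two still open (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, resolution_hours REAL)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?)",
    [(1, 2.0), (2, 4.0), (3, None), (4, None)],
)

total, resolved, avg_resolved, avg_null_as_zero = conn.execute(
    """
    SELECT COUNT(*),                           -- counts every row
           COUNT(resolution_hours),            -- skips NULLs
           AVG(resolution_hours),              -- also skips NULLs
           AVG(COALESCE(resolution_hours, 0))  -- treats NULL as zero
    FROM tickets
    """
).fetchone()

print(total, resolved, avg_resolved, avg_null_as_zero)
```

Neither average is wrong in itself; which one answers the business question depends on what a NULL means in that dataset, which is exactly the knowledge the person wearing the BI hat tends to have.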

I do like being in touch with changing data trends from a leadership perspective. It's either real and could be a valuable insight or it's a bug that needs to be addressed before any ill-advised decisions are made from the 'info'. I find this can often be set up proactively and put into a dashboard. In that way, identifying it and raising concern can be 'my job' but when investigating it, it could be a team effort.

mble_ · a year ago

> I just think we must have completely different working experiences.

Likely! I've generally worked in smaller orgs (including as part of a much larger org, as with my current employer) and there is less access to dedicated resources.

> Even when no BI team is dedicated, there's usually someone that's wearing that hat.

100%. Unfortunately, this has commonly been me, in my personal experience.

> In that way, identifying it and raising concern can be 'my job' but when investigating it, it could be a team effort.

Totally agreed.

For some additional context, I've spent my working career on data systems so I likely feel a much stronger affinity to this type of self-serve analysis than your average bear.

mble_ commented on Leadership Power Tools: SQL and Statistics matt.blwt.io/post/leaders... · Posted by u/PaulHoule

wjnc · a year ago

If you’d just had a business controller, you’d have x*$10M saved and have more time for your PM-role.

Yes, calling BS on leadership running their own SQL. Bring strategy and tactics, find good people, create clear roles and expectations and sure don’t get lost in running naive scripts you’ve written because you can do all roles better than the people actually occupying those roles.

mble_ · a year ago

Agreed, if you have the budget for it. There are often times when living off the land is necessary.

mble_ commented on Leadership Power Tools: SQL and Statistics matt.blwt.io/post/leaders... · Posted by u/PaulHoule

conductr · a year ago

As a somewhat technical leader/manager, I’m pretty comfortable with SQL but that also means I know I could pretty easily goof up these queries. I don’t know much about the quality or exceptions that may be present in the underlying data either. I simply wouldn’t trust my own results for fear I was overlooking something. So, I’d rather ask the BI person to make this for me. They should be more intimately familiar with any footguns.

For that reason, I see the technical part of this post at odds with the initial premise that a leadership role should need to learn how to query their data. If the resulting information, stats, etc are being used to answer business questions and make business decisions, it would be best that the person that specializes in this produces said information/queries.

If there’s some GUI tool and the datasets are clean or well documented, then maybe the self-service nature is on the table again, but anything moderately complex will likely still be run through the BI team. In a sense, it’s just basic QC; it’s not that I’m completely helpless. I might even do a first pass and kick it to BI for them to review, but seldom do I find myself in a real-world situation where I’m confident enough about my knowledge of the underlying data, so that gives me pause most of the time.

mble_ · a year ago

Author here.

One of the main things here is that you should know your data well enough to articulate the right request from BI. In my experience, BI often end up as pure order takers - if you ask the wrong question, you get a lovingly formatted but wrong answer.

The other thing is that this assumes you have a BI team at hand - smaller teams/orgs often don't! Perhaps I should make this a little more explicit.

My central thesis, also not made explicit, is that leaders should be appropriately curious _and_ leverage the tools they have to be able to do things like "hey, this looks weird, what's up?" and share the data and their methodology - that way it can be corrected/investigated etc.

mble_ commented on 7 Databases in 7 Weeks for 2025 matt.blwt.io/post/7-datab... · Posted by u/yarapavan

refset · a year ago

The upcoming XTDB v2 is a SQL-first engine. We also built an experimental Clojure/Datalog-like 'XTQL' language to go along with it, to provide some continuity for v1 users, but the primary API is now SQL over the Postgres wire protocol, where we implemented a variation on SQL:2011 - see https://docs.xtdb.com/quickstart/sql-overview.html

mble_ · a year ago

Oh, very cool. I'll have to add this to my list to check out next year.

mble_ commented on 7 Databases in 7 Weeks for 2025 matt.blwt.io/post/7-datab... · Posted by u/yarapavan

breadwinner · a year ago

ClickHouse is awesome, but there's a newer OLAP database in town: Apache Pinot, and it is significantly better: https://pinot.apache.org/

Here's why it is better:

1. User-facing analytics vs. business analytics. Pinot was designed for user-facing analytics, meaning the analytics result is used by the end user (for example, "what is the expected delivery time for this restaurant?"). The demands are much higher, including latency, freshness, concurrency and uptime.

2. Better architecture. To scale out, ClickHouse uses sharding, which means if you want to add a node you have to bring down the database, re-partition the database and reload the data, then bring it back up. Expect downtime of 1 or 2 days at least. Pinot, on the other hand, uses segments, which are smaller (but self-contained) pieces of data, and there are lots of segments on each node. When you add a node, Pinot just moves segments around, no downtime needed. Furthermore, for high availability ClickHouse uses replicas. Each shard needs 1 or 2 replicas for HA. Pinot does not have shard vs. replica nodes. Instead each segment is replicated to 2 to 3 nodes. This is better for hardware utilization.

3. Pre-aggregation. OLAP cubes became popular in the 1990s. They pre-aggregate data to make queries significantly faster, but the downside is high storage cost. ClickHouse doesn't have the equivalent of OLAP cubes at all. Pinot has something better than OLAP cubes: Star trees. Like cubes, star trees pre-aggregate data along multiple dimensions, but don't need as much storage.
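
For readers unfamiliar with pre-aggregation, here is a minimal sketch of the cube idea: every combination of dimension values is summed once at load time, so a query becomes a dictionary lookup. The rows, the dimension names, and the `build_cube` and `lookup` helpers are hypothetical. This is a plain full cube, not Pinot's star-tree index, and it shows the storage cost mentioned above: the number of stored keys grows with every dimension added.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (country, device, revenue).
rows = [
    ("US", "mobile", 10),
    ("US", "desktop", 20),
    ("DE", "mobile", 5),
    ("DE", "mobile", 7),
]
DIMENSIONS = ("country", "device")


def build_cube(rows):
    """Pre-aggregate revenue for every subset of the dimensions."""
    cube = defaultdict(int)
    for country, device, revenue in rows:
        values = {"country": country, "device": device}
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                # Key is the tuple of (dimension, value) pairs being fixed.
                key = tuple((d, values[d]) for d in dims)
                cube[key] += revenue
    return dict(cube)


def lookup(cube, **filters):
    """Answer a filtered SUM from the cube without scanning the rows."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, 0)


cube = build_cube(rows)
print(lookup(cube))                                # grand total
print(lookup(cube, country="DE"))                  # one dimension fixed
print(lookup(cube, country="US", device="desktop"))
print(len(cube))                                   # stored aggregates
```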

mble_ · a year ago

Pinot is something I haven't had any personal experience with, so that's why it wasn't on the list - same with StarRocks, or Druid.

Something for me to look into next year, clearly.

mble_ commented on 7 Databases in 7 Weeks for 2025 matt.blwt.io/post/7-datab... · Posted by u/yarapavan

joeevans1000 · a year ago

No bitemporal db (i.e. xtdb)?

mble_ · a year ago

I love Datalog, but it's such a niche technology. If I had included it, I would have probably swapped out TigerBeetle for it.

mble_ commented on 7 Databases in 7 Weeks for 2025 matt.blwt.io/post/7-datab... · Posted by u/yarapavan

wb14123 · a year ago

Ever since CockroachDB changed their license, I've been searching for alternatives. PostgreSQL is an obvious choice but is there a good HA solution? What people usually do for HA with PostgreSQL or do they just not care about it? I tested Patroni, which is the most popular one to my knowledge, but found some HA issues that make me hesitate to use it: https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...

mble_ · a year ago

> What people usually do for HA with PostgreSQL or do they just not care about it?

Patroni for most cases. At Heroku we have our own control plane to manage HA and fencing which works very reliably. I also like the approach the Cloud Native PG folks have taken with implementing it in the k8s API via the instance manager[1].

Other options like Stolon or repmgr are popular too. Despite the Jepsen testing, Patroni is used without issues in the majority of circumstances. I wouldn't overthink it.

[1]: https://cloudnative-pg.io/documentation/1.24/instance_manage...

mble_ commented on 7 Databases in 7 Weeks for 2025 matt.blwt.io/post/7-datab... · Posted by u/yarapavan

maxloh · a year ago

Apache Pinot is written in Java, which is a garbage-collected language and also a rare choice for mainstream databases.

Any idea if that would affect its performance?

The last time I checked, there were several databases written in Go, which is also garbage-collected, but I never saw one in Java except Apache Derby.

mble_ · a year ago

Apache Cassandra would probably be the most notable one (outside of Kafka etc).

mble_ commented on 7 Databases in 7 Weeks for 2025 matt.blwt.io/post/7-datab... · Posted by u/yarapavan

biggestlou · a year ago

As a co-author of the book of the same name, I’m disappointed that you didn’t see fit to provide any kind of attribution.

mble_ · a year ago

It was not intentional. I've corrected this oversight, and attribution is now provided - my apologies.