Readit News

capkutay commented on The DuckDB Local UI   duckdb.org/2025/03/12/duc... · Posted by u/xnx
jarpineh · a year ago
The UI looks nice and is by itself a welcome addition.

I am somewhat at odds with it being a default extension built into the DuckDB release. This is still a feature/product from a company other than the makers of DuckDB [1], though they did announce a partnership with the makers of this UI [2]. Whilst DuckDB has so far thrived without VC money, MotherDuck has raised (at least) $100M in VC [3].

I guess I'm wondering where the lines are between free and open source work compared to commercial work here. My assumption has been that the line is what DuckDB ships and what others in the community do. This release seems to change that.

Yes, I do like and use nice, free things. And I understand that things have to be paid for by someone. Sometimes that someone is even me. I guess I'd like clarification on the future of DuckDB as its popularity and reach grow.

[1] https://duckdblabs.com

[2] https://duckdblabs.com/news/2022/11/15/motherduck-partnershi...

[3] https://motherduck.com/blog/motherduck-open-for-all-with-ser...

edit: I don't want to leave this negative-sounding post here without an addendum. I'm just concerned about the future monetization strategy and roadmap of DuckDB. DuckDB is a good, useful, versatile tool. I mainly use it from Python through Jupyter, in the browser, and natively. I haven't felt the need for commercial services (plus purchasing them in my professional setting is too convoluted). This UI, whilst undoubtedly useful, seems to lean towards the commercial side. I merely wanted some clarity on what it might entail. I do wish DuckDB and its community even greater, better things, with requisite compensation for those who work to ensure them.

capkutay · a year ago
I think this is a bit of a non-issue. The UI is just that, a UI. Take it or leave it. If it makes your life easier, great. If not, nothing changes about how you use DuckDB.

There is always going to be some overlap between open source contributions and commercial interests, but unless a real problem emerges, like core features getting locked behind paywalls, there is no real cause for concern. If that happens, then sure, let's talk about it and raise the issue in a public forum. But for now it is just a nice convenience feature that some people (like me) will find useful.

capkutay commented on Ask for no, don't ask for yes (2022)   mooreds.com/wordpress/arc... · Posted by u/mooreds
JackFr · a year ago
This is a recipe for disaster the first time you break something. Getting a yes or a no indicates that your boss is aware of it.

When you’re in the hot seat, and someone asks “Who approved this?”, the truthful answer is that no one approved it.

capkutay · a year ago
Owning things is breaking things (and fixing them).
capkutay commented on Salesforce will hire no more software engineers in 2025, says Marc Benioff   salesforceben.com/salesfo... · Posted by u/lordswork
didgeoridoo · a year ago
I was at Salesforce for 4 years, and during those years the company made a massive deal about:

2020: the first AI craze, introducing “Einstein” as their name for their analytics platform, and officially changing the corporate vision to being the “No. 1 AI CRM company”.

2021: Now it’s all about “Customer 360”, i.e. account-based marketing, i.e. what basically everyone else does without such a memeable name. You wouldn’t believe the number of slide decks I had to sit through with all our little product logos orbiting this stock art character straight out of Women Laughing While Eating Salad.

2022: Never mind, now we’re betting the company on a real-time unified database called Genie, which was neither real-time nor unified (and eventually not called Genie either). Got sued for that one.

2024: AGENTS. AGENTS EVERYWHERE. WE ARE AN AGENT COMPANY NOW.

So, let’s see how this holds up in the face of the next hot thing.

capkutay · a year ago
Salesforce is a sales and marketing company first, tech company second. It's in their interest to create a ton of buzz and hype around whatever the current thing is and claim they are that thing. Then they go on to sell a basic CRUD app that has to be customized by consultants.
capkutay commented on Show HN: BemiDB – Postgres read replica optimized for analytics   github.com/BemiHQ/BemiDB... · Posted by u/exAspArk
cocoflunchy · a year ago
What I would really love is a dead simple way to: 1) connect to my transactional Postgres db, 2) define my materialized views, 3) have these views update in realtime, 4) query these views with a fast engine.

And ideally have the whole thing open source and be able to run it in CI

We tried PeerDB + ClickHouse, but ClickHouse materialized views are not refreshed when joining tables.

Right now we’re back to standard materialized views inside Postgres, refreshed once a day, but the full refreshes are pretty slow… the operational side is great though: a single db to manage.
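The fallback described here, a plain Postgres materialized view with a scheduled full refresh, looks roughly like the sketch below; the table, view, and column names are hypothetical:

```sql
-- Hypothetical schema: a daily rollup over two joined tables.
CREATE MATERIALIZED VIEW daily_totals AS
SELECT o.customer_id,
       date_trunc('day', o.created_at) AS day,
       sum(o.amount) AS total
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY o.customer_id, day;

-- CONCURRENTLY avoids blocking readers during the refresh, but it
-- requires a unique index and still recomputes the entire view,
-- which is why the once-a-day full refresh is slow on large tables.
CREATE UNIQUE INDEX ON daily_totals (customer_id, day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_totals;
```

Incremental (realtime) maintenance of such views is exactly the gap described above; stock Postgres only supports full refreshes.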

capkutay · a year ago
That's been supported in Striim since 2016:

https://dl.acm.org/doi/10.1145/3129292.3129294

capkutay commented on Understanding the Limitations of Mathematical Reasoning in LLMs   arxiv.org/abs/2410.05229... · Posted by u/hnhn34
woopwoop · a year ago
This paper, among other things, shows that LLMs have dramatically worse performance on basic algebra questions when you add in irrelevant information. The examples are things like "John picked 43 kiwis on Monday, 24 kiwis on Tuesday. On Wednesday, 5 of the kiwis he picked were smaller than usual. Altogether, on Monday, Tuesday, and Wednesday, John picked 87 kiwis. How many kiwis did John pick on Wednesday?" In this question, the remark about some of the kiwis on Wednesday being small is irrelevant, but adding things like this reduces performance on a popular benchmark from 95% to 77% for GPT-4o, for example.
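For reference, the intended arithmetic simply ignores the size remark; a quick check in Python:

```python
# Total across Mon-Wed is given; Wednesday's count is the remainder.
monday, tuesday, total = 43, 24, 87
wednesday = total - monday - tuesday
print(wednesday)  # 20 (the 5 smaller-than-usual kiwis still count)
```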

I don't find this very impressive. Forget LLMs for a second. Let's say _you_ read a question of that kind with some bit of irrelevant information. There are two possibilities you have to consider: the question may as well have excluded the irrelevant information, or the question was miswritten and the irrelevant information was meant to be relevant. The latter is a perfectly live possibility, and I don't think it's a dramatic failure to assume that this is correct. I have to confess that when I read some people's LLM gotcha questions, where they take some popular logic puzzle and invert things, I think I would get them "wrong" too. And not wrong because I don't understand the question, but wrong because with no context I'd just assume the inversion was a typo.

capkutay · a year ago
I agree that it's not particularly surprising that trying to trick an LLM with irrelevant text makes it perform worse.

I don't see this as a material limitation of LLMs, but rather something that can be addressed at the application level by stripping out irrelevant information.

capkutay commented on Layoffs push down scores on Glassdoor – how companies respond   newsletter.pragmaticengin... · Posted by u/EvgeniyZh
thejackgoode · 3 years ago
Glassdoor is one good candidate for "disruption". Does anyone know of a competing alternative for a global workplace review website?
capkutay · 3 years ago
Why is it a good candidate to disrupt? 1 year in, the “disruptor” would run into the exact same problems if they reach any meaningful scale or adoption.

Asking happy team members to review your company is no different than apps asking frequent users to review on the App Store.

capkutay commented on Who wants to be tracked?   quantable.com/analytics/w... · Posted by u/jhpacker
capkutay · 3 years ago
Anecdotal but my wife actually likes that she’s fed such relevant ads on instagram and ends up researching and buying many of the products.
capkutay commented on DuckDB – An in-process SQL OLAP database management system   duckdb.org/... · Posted by u/freilanzer
SnowflakeOnIce · 3 years ago
DuckDB is terrific. I'm bullish on its potential for simplifying many big data pipelines. Particularly, it's plausible that DuckDB + Parquet could be used on a large SMP machine (32+ cores and 128GB+ memory) to deal with data munging for 100s of gigabytes to several terabytes, all from SQL, without dealing with Hadoop, Spark, Ray, etc.

I have successfully used DuckDB like above for preparing an ML dataset from about 100GB of input.

DuckDB is undergoing rapid development these days. There have been format-breaking changes and bugs that could lose data. I would not yet trust DuckDB for long-term storage or archival purposes. Parquet is a better choice for that.
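The DuckDB + Parquet pattern described above is only a few lines of SQL; the file paths, column names, and thread count below are illustrative:

```sql
-- Use the machine's cores for the aggregation.
SET threads = 32;

-- Query raw Parquet files in place; no ingestion step required.
SELECT user_id, count(*) AS events
FROM read_parquet('data/events/*.parquet')
GROUP BY user_id;

-- Write the munged result back out as Parquet for the ML pipeline.
COPY (
    SELECT * FROM read_parquet('data/events/*.parquet')
    WHERE amount > 0
) TO 'clean.parquet' (FORMAT parquet);
```

Because DuckDB streams Parquet scans, the working set does not have to fit in memory, which is what makes the 100s-of-gigabytes use case on a single machine plausible.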

capkutay · 3 years ago
Reminds me of this blog post on streaming data to Parquet files and running queries on the data in its native format:

https://pedram.substack.com/p/streaming-data-pipelines-with-...

u/capkutay

Karma: 3332 · Cake day: September 30, 2008