Readit News
otter-in-a-suit commented on Stream Kafka Topic to the Iceberg Tables with Zero-ETL   vutr.substack.com/p/strea... · Posted by u/fatezero
fatezero · a month ago
Better support for real-time stream data analysis has become a new trend in the Kafka world.

We've noticed a clear trend in the Kafka ecosystem toward integrating streaming data directly with data lake formats like Apache Iceberg.

What is your opinion on this matter?

otter-in-a-suit · a month ago
Seems interesting. The article is a bit light on details around producers and schema management - what would this look like for existing protobufs going to Kafka? How does it handle types that might differ between proto and Iceberg, or evolutions that are invalid in Iceberg but valid in proto (`oneof` comes to mind)?

fwiw, I've hand-written pretty much exactly this - proto on Kafka to Iceberg via Flink with dynamic source schemas - and things like schema evolution are a nightmare.
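
To make the mismatch concrete, here is a minimal sketch of the kind of check such a pipeline ends up needing. It is not the Flink job mentioned above. The names `PROTO_TO_ICEBERG`, `ICEBERG_PROMOTIONS` and `iceberg_allows` are hypothetical, the type mapping is an assumption, and the promotion table reflects my reading of the Iceberg v1/v2 spec (int to long, float to double; decimal precision widening is left out for brevity). Proto treats several of these changes as wire-compatible, which is where the two disagree.

```python
# Assumed proto -> Iceberg primitive mapping, for illustration only.
PROTO_TO_ICEBERG = {
    "int32": "int",
    "int64": "long",
    "float": "float",
    "double": "double",
    "string": "string",
    "bytes": "binary",
    "bool": "boolean",
}

# Type promotions Iceberg accepts (assumption: v1/v2 rules, decimals omitted).
ICEBERG_PROMOTIONS = {("int", "long"), ("float", "double")}


def iceberg_allows(old_proto: str, new_proto: str) -> bool:
    """Return True if changing a field's proto type maps to a legal Iceberg change."""
    old = PROTO_TO_ICEBERG[old_proto]
    new = PROTO_TO_ICEBERG[new_proto]
    return old == new or (old, new) in ICEBERG_PROMOTIONS


# Changes that proto producers can get away with on the wire.
changes = [("int32", "int64"), ("int64", "int32"), ("string", "bytes")]
results = {change: iceberg_allows(*change) for change in changes}
```

Only the widening case passes. Narrowing and string-to-bytes are the sort of change a producer can ship without breaking consumers, but the table schema cannot follow. `oneof` adds a further wrinkle that this sketch does not cover.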

otter-in-a-suit commented on The future is not self-hosted   drewlyton.com/story/the-f... · Posted by u/drew_lytle
kreco · a month ago
I strongly agree with the global sentiment.

If you can't actually download a copy of digital content as a mere file, then you can't really host it and serve it.

You can't host your own Spotify-clone even if you are allowed to listen to songs. However, you can still download music on Bandcamp to feed your Spotify-clone.

You usually can't host your own digital video game store because of various DRM, or because it's painful to "export" the content and painful to "import" it back.

Still on the video game side, you can't even back up your game saves (at least on the Nintendo Switch, Nintendo Switch 2 and Xbox Series). That's not because of any copyright infringement or IP misuse; it's only a way for them to sell more online subscriptions that include save backups.

There is still a positive side: when it becomes impossible to legally own anything, I'm pretty sure some illegal system will enable you to have a massive library of whatever you want at the cost of a few clicks and/or a couple of bucks. I'm saying "positive side" even though it's illegal because I'm mostly talking about the comfort of having your own local library.

otter-in-a-suit · a month ago
Exactly. It's a great article, but the depressing part is that there's a very limited catalog of legal media available to use these services with (except for immich, I suppose).

For games, there's GOG. Good luck finding bigger releases.

For music, there's Bandcamp and CDs and vinyl. Fortunately, most albums still release on either one of these.

Audiobookshelf can be used for most podcasts (some do not have a traditional RSS feed and are in some walled garden) and some audio books are available DRM free, but tons of books are Audible exclusives. I'm relatively sure that they also stop authors from publishing e.g. on Royal Road once they're on there.

The same is true for e-books - HumbleBundle and co are great, but good luck finding certain titles. I regret buying a new Kindle, but at least had the foresight to download all my books before they stopped allowing that. Physical books are an option, but that's not equivalent to an e-book.

I stopped caring about TV shows and movies a long time ago (largely due to the atrocious streaming fragmentation, pricing, and the sheer audacity to include ads in paid plans), but I assume 95% of all shows are exclusive to some streaming giant, too.

otter-in-a-suit commented on Fair Pricing   kagi.com/changelog#6155... · Posted by u/CleverLikeAnOx
pietz · 7 months ago
It's so crazy to me to hear these super positive opinions. I gave kagi a shot for several months but the results were quite a bit worse than Google or DuckDuckGo. Maybe it's because I live in Germany and kagi doesn't do well with German content but I never understood the hype of kagi.
otter-in-a-suit · 7 months ago
It is worse than Google at some queries, but for me that's a tiny fraction of my total searches. I usually only use Google if I need local results / Google Maps.

What makes Kagi great is that they let you customize results. I've pinned wikipedia, for instance. Google first throws AI slop in your face (with no way of disabling it), followed promptly by (presumably also AI-generated) blog spam, Pinterest links, and other useless garbage that I can't filter.

fwiw, I search in German every once in a while and the results are a lot better than Google (in the US, anyways), since I don't need a VPN to get "good" results and have a quick toggle button for my location built into Kagi.

Also, as a company, they seem great: They are neutral, run as a PBC, are very open and transparent about what they offer and why it costs money ("no BS", if you will), are receptive to feedback and do consumer-friendly stuff like this change.

otter-in-a-suit commented on GitHub Is Down   githubstatus.com... · Posted by u/floriangosse
bigfatfrock · 7 months ago
Who's going to kick off the holy war around self hosting?
otter-in-a-suit · 7 months ago
Since you asked, https://about.gitea.com/ is a great tool. MIT license.
otter-in-a-suit commented on Ask HN: Is maintaining a personal blog still worth it?    · Posted by u/namanyayg
otter-in-a-suit · 7 months ago
Yes.

I like _writing_ because it's an effective way of learning (at least for me), since explaining something is very different from "just" doing something. I don't track visitors/analytics on my blog, so I don't really care how many people read it, but it forces me to dive just a little bit deeper into topics than I usually would for side projects and experiments. I also have no problem admitting if I misunderstand something and/or that my way of describing something might not be 100% accurate, but at least it forces me to reason about it.

I find that I write a lot about my homelab these days, since a lot of the things I experiment with there are not things I would encounter at work, since they tend to be behind several layers of abstractions (think running bare metal hypervisors and messing around with ansible and zfs pools + hardware vs. getting a new EC2 instance via terraform).

I run my own forked blog template (ink-free for hugo) and have added fun little statistics - turns out, based on a pure word count, I wrote about 1.3x "The Hobbit" by Tolkien since 2016 (~128,000 words: https://chollinger.com/blog/tags/). My blog is decidedly a worse choice as far as literature goes, but writing all these articles taught me a lot.

otter-in-a-suit commented on How we built ngrok's data platform   ngrok.com/blog-post/how-w... · Posted by u/samber
boltzmann-brain · a year ago
scala? why not haskell instead?
otter-in-a-suit · a year ago
Not assuming you’re serious, but in any case: the reason is the JVM (+ Scala) ecosystem in the data space.
otter-in-a-suit commented on How we built ngrok's data platform   ngrok.com/blog-post/how-w... · Posted by u/samber
1a527dd5 · a year ago
Blimey, that is a lot of moving parts.

Our data team currently has something similar and its costs are astronomical.

On the other hand our internal platform metrics are fired at BigQuery [1] and then we used scheduled queries that run daily (looking at the -24 hours) that aggregate/export to parquet. And it's cheap as chips. From there it's just a flat file that is stored on GCS that can be pulled for analysis.

Do you have more thoughts on Preset/Superset? We looked at both (slightly leaning towards cloud hosted as we want to move away from on-prem) - but ended up going with Metabase.

[1] https://cloud.google.com/bigquery/docs/write-api

otter-in-a-suit · a year ago
I’m the author, but posting as a private individual here, these being just my opinions and all that… but I can shed some more light on why I did move us to Superset.

Preset is great, as are most of these tools’ hosted versions! Lots of great folks working on these.

But, tbh, as an infrastructure company this is somewhat the core business of ngrok - hosting another DB + K8s service is something that we have great tooling for and lots of expertise in the infra space. And using ngrok makes it even easier.

The whole dogfooding aspect is important too - if I don’t run an app in production with ngrok I have a hard time empathizing with customers who want to do the same. My previous job encouraged that too and I’ve always liked that.

Also, yes, lots of moving parts - but most of them are very reusable and they share a lot of code, infra, and logic/operations playbooks etc. Costs are manageable - Athena charges $5/TB scanned iirc, which tends to be the biggest factor.
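
As a back-of-the-envelope illustration of that scan-based pricing, here is a small sketch. It assumes the $5/TB figure quoted from memory above and 1 TB = 1024^4 bytes, and it ignores any per-query minimums or rounding. The function name `athena_scan_cost_usd` is hypothetical.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def athena_scan_cost_usd(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate query cost from bytes scanned at a flat per-TB rate."""
    return bytes_scanned / BYTES_PER_TB * usd_per_tb


# A query that scans a full terabyte vs. one pruned down to 1/8 of that,
# e.g. via partitioning or columnar formats.
full_scan = athena_scan_cost_usd(BYTES_PER_TB)
pruned_scan = athena_scan_cost_usd(BYTES_PER_TB // 8)
```

Since the bill scales linearly with bytes scanned, anything that reduces the scan (partition pruning, Parquet column projection) cuts cost proportionally.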

otter-in-a-suit commented on How we built ngrok's data platform   ngrok.com/blog-post/how-w... · Posted by u/samber
valzam · a year ago
i pity the developer who has to maintain tagless final plumbing code after the “functional programming enthusiast” moves on… in a Go first org no less.
otter-in-a-suit · a year ago
Author here. This decision went through all proper architecture channels, including talks with our engineers, proof of concepts and the like.

I’ve been doing this too long to shoehorn in my pet languages if I didn’t think they’re a good fit. And I think that scala/FP + Flink _is_ a good fit for this use case.

We did also explore the go ecosystem fwiw - the options there are limited (especially around the data tooling like iceberg) and go is simply not a language that’s popular enough in the data world.

Python’s typing system (or lack thereof) is a huge hindrance in this space in general (imo), and Java didn’t cause many happy faces on the Eng team either, but it’s certainly an option. I just find FP semantics a better fit for data / streaming work (lots of map and flat map anyways), and Scala makes that easy.

Also no cats/zio - just some tagless final _inspired_ composition and type classes. Not too difficult to reason about, not using any obscure patterns. I even mutate references sometimes. :-)
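
For readers less familiar with the "lots of map and flat map" shape of streaming code, here is a tiny language-agnostic sketch, written in Python rather than Scala and not taken from the ngrok codebase. `flat_map` and `parse_line` are hypothetical names. The idea is that a parser returns zero or one records per input, so malformed events drop out without explicit branching in the pipeline.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def flat_map(f: Callable[[A], Iterable[B]], xs: Iterable[A]) -> Iterator[B]:
    """Apply f to each element and flatten the results by one level."""
    return chain.from_iterable(map(f, xs))


def parse_line(line: str) -> list[int]:
    """Return a single-element list on success, an empty list on bad input."""
    try:
        return [int(line)]
    except ValueError:
        return []


lines = ["1", "oops", "3"]
parsed = list(flat_map(parse_line, lines))      # bad records are dropped
doubled = list(map(lambda n: n * 2, parsed))    # plain one-to-one transform
```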

u/otter-in-a-suit

Karma: 429 · Cake day: September 5, 2018
About
https://chollinger.com/blog/

Contact: hackernews@chollinger.com
