This and Materialize seemed like great tools. I met some of the RisingWave team at the Kafka conference in London last year and was impressed by their work. It may be great if you need such a tool.
In the end, I went with ClickHouse and its materialized views feature. It might not be quite as powerful as what these other tools are doing, but it works for us, and it's really easy to set up. Before, we were using Timescale's continuous aggregates, which had good performance but required some domain knowledge to set up. ClickHouse materialized views are great because you don't need to be an expert to use them. And even so, performance is still very good.
We wrote about it briefly here: https://blog.picnic.nl/building-a-real-time-analytics-platfo...
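To give a flavour of what that looks like (made-up table and column names, not our actual schema): you point a materialized view at a raw table and ClickHouse maintains the rollup on every insert.

    -- raw table receiving inserts
    CREATE TABLE events (
        device_id UInt64,
        ts        DateTime,
        value     Float64
    ) ENGINE = MergeTree
    ORDER BY (device_id, ts);

    -- hourly rollup, updated automatically as rows land in `events`
    CREATE MATERIALIZED VIEW events_hourly
    ENGINE = SummingMergeTree
    ORDER BY (device_id, hour)
    AS SELECT
        device_id,
        toStartOfHour(ts) AS hour,
        sum(value)        AS value_sum,
        count()           AS value_count
    FROM events
    GROUP BY device_id, hour;

Queries against the rollup should still GROUP BY and sum the partial rows, since SummingMergeTree only collapses them at merge time.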
My use case is IoT devices sending data; I'd want to keep e.g. the last 6 months of data for review, plus some aggregates, and archive and delete older data.
I was going to go with TimescaleDB for "simplicity" (e.g. having a single database). Would Postgres + ClickHouse be indicated for this?
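(For the retention part, my understanding is that ClickHouse can do this with a TTL clause. A sketch with an invented schema, so treat it as a guess rather than advice:)

    CREATE TABLE device_readings (
        device_id UInt64,
        ts        DateTime,
        value     Float64
    ) ENGINE = MergeTree
    ORDER BY (device_id, ts)
    -- drop raw rows once they are older than 6 months;
    -- a TO DISK / TO VOLUME action can move them to cheaper storage instead
    TTL ts + INTERVAL 6 MONTH DELETE;

The aggregates would then live in a materialized view like the one described above, whose target table isn't touched when the raw rows expire.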
I've been running this in prod, self-hosted, for around 6 months (Podman with Docker Compose, MinIO for S3, streaming with Pulsar). We have built position calculations for risk monitoring and booking enrichment pipelines. RisingWave is a much better alternative to Kafka Streams, primarily around consistency, SQL-first development, easy state querying, and deployment.
The RisingWave team are pretty responsive on Slack, and the ask-AI feature also helps answer questions. They have coverage from Singapore, China, and California.
Issues we have seen have mostly been related to the reliability of our on-prem MinIO cluster, which is used to store the data. Other bugs do appear from release to release, but once raised they get attention quickly.
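For anyone curious what the position piece looks like, it's roughly this shape (illustrative schema and connector options from memory, not our actual pipeline; check the RisingWave docs for the exact Pulsar settings):

    -- declare the Pulsar topic as a streaming source
    CREATE SOURCE trades (
        account_id VARCHAR,
        symbol     VARCHAR,
        qty        DOUBLE PRECISION,
        price      DOUBLE PRECISION,
        traded_at  TIMESTAMP
    ) WITH (
        connector = 'pulsar',
        topic = 'persistent://public/default/trades',
        service.url = 'pulsar://pulsar:6650'
    ) FORMAT PLAIN ENCODE JSON;

    -- incrementally maintained positions, queryable like a normal table
    CREATE MATERIALIZED VIEW positions AS
    SELECT account_id,
           symbol,
           sum(qty)         AS net_qty,
           sum(qty * price) AS notional
    FROM trades
    GROUP BY account_id, symbol;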
Looking at the contributor list, I doubt they speak English or frequent HN, so you'll only get the engineers' perspective. Looks new, and the cloud offering seems to be the way to sell it.
A. Some of the team members are in the Bay Area, including the founder, who writes well.
B. Used it for streaming SQL on a Citus cluster and am planning to use it more.
This seems very good. I've always wondered: what are the use cases for this apart from observability/real-time analytics? Do people use this for incremental view maintenance in Postgres?
I'm thinking of using it to replace an analytics pipeline at my job, which now uses expensive batch jobs.
If the tech is solid, we would have instant and incremental updates, instead of recomputing everything every X hours.
This would simplify things a lot.
I think Materialize offers a similar product, but last I checked it was only available as a SaaS solution.
I hope to do a proof of concept soon to compare both solutions.
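Roughly the before/after I have in mind (schema made up for illustration):

    -- today: a scheduled batch job recomputes the whole rollup every X hours
    INSERT INTO daily_orders
    SELECT user_id,
           date_trunc('day', created_at) AS day,
           count(*)                      AS orders
    FROM orders
    GROUP BY user_id, day;

    -- hoped-for replacement: the same query as a materialized view that the
    -- streaming database keeps up to date incrementally as events arrive
    CREATE MATERIALIZED VIEW daily_orders AS
    SELECT user_id,
           date_trunc('day', created_at) AS day,
           count(*)                      AS orders
    FROM orders
    GROUP BY user_id, day;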
The use cases section also mentions event-driven applications, which is quite broad. Like another comment on the thread, I'd be curious to hear from anyone with experience using RisingWave in this use case area.
I apologize for this stupid question, but whenever I see products like this or Kafka, I can't help but wonder: when exactly do you need a system like this compared to traditional Redis pub/sub?
It's very useful any time the input to some system is a stream of events, potentially from a whole bunch of different sources, but you want the output to be a unified relational data model.
I used to work in insurance, and we had a whole bunch of systems of record for different functions of the business -- CRM, policy management, billing, claims, etc. Some were our own tech, many were SaaS. It's great to be able to keep these systems decoupled operationally. That way, you can replace pieces and have your business areas have fairly independent IT stacks.
But many backoffice tasks, like finance, accounting, and servicing, need a holistic view of what's going on. It's helpful to ingest all the data into a centralized warehouse and build up a unified model of the state of the business. A lot of analysts like to write these data transformations in SQL.
Insurance is not a fast-paced business, so we largely ingested the data in structured form. But you can imagine that for faster businesses, like advertising, monitoring, IoT, or trading, the data from the systems of record might be an event stream, rather than a data model. These stream processing databases are designed for this type of situation, where you may want real-time ETL, event-by-event.
EDIT: Also, their website has a use cases section: https://risingwave.com/use-cases/
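To make the "unified relational data model" point a bit more concrete, a toy sketch in streaming SQL (every table name here is invented, and each would be fed by a change/event stream from a different system of record):

    -- one continuously updated view joining streams from separate systems
    CREATE MATERIALIZED VIEW policy_overview AS
    SELECT p.policy_id,
           p.status,
           b.balance_due,
           coalesce(c.open_claims, 0) AS open_claims
    FROM policies p
    LEFT JOIN billing_balances b ON b.policy_id = p.policy_id
    LEFT JOIN (
        SELECT policy_id, count(*) AS open_claims
        FROM claims
        WHERE status = 'open'
        GROUP BY policy_id
    ) c ON c.policy_id = p.policy_id;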
They often have watermarking, windowing, and all that good stuff built in, whereas with Redis you would have to build all of that in your application.
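To make that concrete, here's roughly what the built-in version looks like in RisingWave-flavoured SQL (syntax and connector options from memory, so double-check the docs): the source declares a watermark on the event-time column, and a tumbling-window aggregate uses it to know when a window can be closed.

    CREATE SOURCE clicks (
        user_id    VARCHAR,
        event_time TIMESTAMP,
        -- allow events to arrive up to 5 seconds late before a window is finalized
        WATERMARK FOR event_time AS event_time - INTERVAL '5 seconds'
    ) WITH (
        connector = 'kafka',
        topic = 'clicks',
        properties.bootstrap.server = 'kafka:9092'
    ) FORMAT PLAIN ENCODE JSON;

    -- per-minute counts over event time, maintained incrementally
    CREATE MATERIALIZED VIEW clicks_per_minute AS
    SELECT window_start,
           user_id,
           count(*) AS clicks
    FROM TUMBLE(clicks, event_time, INTERVAL '1 minute')
    GROUP BY window_start, user_id;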