Posted by u/hislaziness 3 years ago
Ask HN: Consolidating Multiple Data Sources
How do you connect multiple data sources? I have a use case where I need to analyze multiple data sources, both batch and streaming, together. I have used a database to consolidate the various sources, but I do not get the real-time results I need. I am exploring https://getdozer.io/. Any suggestions or feedback?
gunnarmorling · 3 years ago
Sounds like a great use case for Debezium (capturing changes from databases with low latency) and Apache Flink (processing these change event streams, e.g. filtering them, joining them, applying pattern searches, pushing aggregated data to a dashboard, etc.).

Disclaimer: I work for Decodable, where we build a managed platform around these technologies and their use cases.
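To make the "processing change event streams" part concrete, here is a minimal plain-Python sketch of what a stream processor like Flink does incrementally with Debezium events. It is not Flink's or Debezium's API. It only assumes Debezium's JSON change-event envelope (`op`, `before`, `after`, optionally wrapped in `payload` when schemas are enabled in the JSON converter) and that the source emits full "before" images (e.g. `REPLICA IDENTITY FULL` in Postgres). The `orders` table and its `customer_id`/`amount` columns are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Optional


def parse_change_event(raw: Optional[str]) -> Optional[dict]:
    """Return the Debezium envelope, or None for a Kafka tombstone (null value)."""
    if raw is None:
        return None
    event = json.loads(raw)
    if event is None:
        return None
    # With schemas enabled, the envelope sits under a "payload" key.
    return event.get("payload", event)


class RevenueByCustomer:
    """Incrementally maintained SUM(amount) GROUP BY customer_id over a
    change stream of a hypothetical 'orders' table (amounts in cents)."""

    def __init__(self) -> None:
        self.totals: Dict[int, int] = defaultdict(int)

    def apply(self, event: Optional[dict]) -> None:
        if event is None:
            return  # tombstones carry no row data
        op, before, after = event["op"], event.get("before"), event.get("after")
        # Retract the old row for updates and deletes...
        if op in ("u", "d") and before is not None:
            self.totals[before["customer_id"]] -= before["amount"]
        # ...then add the new row for creates, snapshot reads, and updates.
        if op in ("c", "r", "u") and after is not None:
            self.totals[after["customer_id"]] += after["amount"]


events = [
    '{"op": "c", "before": null, "after": {"id": 1, "customer_id": 7, "amount": 500}}',
    '{"op": "u", "before": {"id": 1, "customer_id": 7, "amount": 500},'
    ' "after": {"id": 1, "customer_id": 7, "amount": 800}}',
    '{"payload": {"op": "c", "before": null, "after": {"id": 2, "customer_id": 9, "amount": 300}}}',
    '{"op": "d", "before": {"id": 2, "customer_id": 9, "amount": 300}, "after": null}',
    None,  # tombstone following the delete
]

view = RevenueByCustomer()
for raw in events:
    view.apply(parse_change_event(raw))
```

The retract-then-add pattern is what lets an aggregate stay correct under updates and deletes; Flink keeps equivalent state per key, but durably and at scale.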

hislaziness · 3 years ago
Thanks. Debezium looks interesting, but it is a bit different from what I am looking for. I want an API approach, so that other systems can pull the data without having to worry about whether it is batch or stream.
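One way to read the "API approach" here: put batch loads and stream events behind a single offset-based pull interface, so consumers only ever ask "give me records after offset N". The sketch below is hypothetical (the `UnifiedSource` class and its methods are illustrative names, not from any product mentioned in the thread) and keeps everything in memory; a real system would persist the log.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


@dataclass
class UnifiedSource:
    """Hypothetical facade: consumers pull records by offset and never see
    whether a record arrived via a batch load or a live stream."""

    _log: List[Record] = field(default_factory=list)

    def ingest_batch(self, rows: Iterable[Record]) -> None:
        # A batch load is appended to the same ordered log as stream events.
        self._log.extend(rows)

    def ingest_event(self, row: Record) -> None:
        # A single streaming event is just one more entry in the log.
        self._log.append(row)

    def pull(self, offset: int = 0, limit: int = 100) -> Tuple[List[Record], int]:
        """Return up to `limit` records starting at `offset`, plus the next offset."""
        page = self._log[offset:offset + limit]
        return page, offset + len(page)


src = UnifiedSource()
src.ingest_batch([{"id": 1}, {"id": 2}])  # e.g. a nightly file load
first, cursor1 = src.pull()
src.ingest_event({"id": 3})               # e.g. a real-time event
second, cursor2 = src.pull(cursor1)       # consumer resumes from its cursor
```

The consumer-held cursor is the same idea as Kafka consumer offsets: the reader decides where to resume, and the producer side is free to mix batch and streaming ingestion.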
ronnykylin · 3 years ago
I’ve just learned about the Multi-Catalog feature of Apache Doris (an analytic database). It lets you connect to various external data sources without moving the data, and query them as simply as you would query internal tables. (https://doris.apache.org/docs/dev/lakehouse/multi-catalog/)
iamdeedubs · 3 years ago
Definitely check out https://debezium.io/. I've been using it to stream data out of MongoDB and Postgres to great effect.

You can run it with Kafka Connect or as a standalone process.
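For the Kafka Connect route, a Debezium connector is registered by POSTing a JSON definition to the Connect REST API's `/connectors` endpoint. The sketch below builds such a definition for the Postgres connector using Debezium 2.x property names (older releases used `database.server.name` instead of `topic.prefix`). The connector name, host, database, credentials, and table list are hypothetical placeholders; 8083 is Kafka Connect's default REST port. The MongoDB connector and the standalone (Debezium Server) setup are not shown.

```python
import json
import urllib.request
from typing import List


def debezium_postgres_connector(name: str, host: str, dbname: str,
                                user: str, password: str,
                                tables: List[str]) -> dict:
    """Build a Kafka Connect definition for Debezium's Postgres connector."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",  # Postgres' built-in logical decoding plugin
            "database.hostname": host,
            "database.port": "5432",
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": name,  # topics are named <prefix>.<schema>.<table>
            "table.include.list": ",".join(tables),
        },
    }


def register(connect_url: str, connector: dict) -> int:
    """POST the definition to the Kafka Connect REST API and return the HTTP status."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(connector).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical values; register() needs a running Kafka Connect worker.
cfg = debezium_postgres_connector(
    "inventory", "db.internal", "shop", "debezium", "secret",
    ["public.orders", "public.customers"],
)
# register("http://localhost:8083", cfg)
```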