Readit News
qoega commented on Timescale Is Now TigerData   tigerdata.com/blog/timesc... · Posted by u/pbowyer
freilanzer · 6 months ago
DuckDB seems to be the most interesting there.
qoega · 6 months ago
It is meant for a single reader/writer workload, so it's not meant to be used as a service
qoega commented on Show HN: Turn CSV Files into SQL Statements for Quick Database Transfers   github.com/ryanwaldorf/ge... · Posted by u/ryanwaldorf
adammarples · 2 years ago
Yeah I'm not sure about redshift, but bigquery uses "autodetect" so something like

bq load --autodetect --source_format=CSV mydataset.mytable ./myfile.csv

And Snowflake uses INFER_SCHEMA; I believe you can do this

select * from table( infer_schema( location=>'@stage/my file.csv', file_format=>'my_csv_format' ) );

Although tbh I'm not sure if that's what you're looking for. You might enjoy looking at duckdb for stuff like this. My policy when starting data engineering was to bung everything into pandas dataframes, and now my policy is to try to avoid them at all costs because they're slow and memory hungry!
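For example, DuckDB can infer the schema straight from the file, something like this if I remember the syntax correctly (the file name is just a placeholder):

SELECT * FROM read_csv_auto('myfile.csv');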

qoega · 2 years ago
In ClickHouse it is just `INSERT INTO t FROM INFILE 'data.csv.gz'`. Any supported format and compression is autodetected from the file name, and column types, delimiters etc. are inferred from a sample of the data. Separate tools to convert CSV are not necessary if you can just import into the DB and export as SQL statements.

echo "name,age,city John,30,New York Jane,25,Los Angeles" > example.csv

clickhouse local -q "SELECT * FROM file('example.csv') FORMAT SQLInsert"

INSERT INTO table (`name`, `age`, `city`) VALUES ('John', 30, 'New York'), ('Jane', 25, 'Los Angeles');
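For the `INSERT ... FROM INFILE` form, a rough sketch (the table schema here is made up; the .gz compression is picked up from the file name):

clickhouse client -q "CREATE TABLE t (name String, age UInt8, city String) ENGINE = MergeTree ORDER BY name"

clickhouse client -q "INSERT INTO t FROM INFILE 'data.csv.gz' FORMAT CSVWithNames"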

qoega commented on Show HN: Postgres Columnstore index vs. ClickHouse for OLAP queries   tablespace.io/blog/postgr... · Posted by u/smythe123
qoega · 2 years ago
Are you guys comparing 16 vCPU/32 GB vs 8 vCPU/32 GB and saying yours is only 1.6x faster?
qoega commented on Ask HN: Does (or why does) anyone use MapReduce anymore?    · Posted by u/bk146
qoega · 2 years ago
Now you rarely use basic MapReduce primitives directly; you have another layer of abstraction that can run on the infrastructure that was previously running MR jobs. This infrastructure can efficiently allocate compute resources for "long"-running tasks in a large cluster, respecting memory/CPU/network and other constraints. Basically, the schedulers of MapReduce jobs and the cluster management tools became that good because the MR methodology had trivial abstractions but required an efficient implementation to work seamlessly.

Abstraction layers on top of this infrastructure can now optimize the pipeline as a whole, merging several steps into one when possible and adding combiners (partial reduce before the shuffle). This requires the whole processing pipeline to be defined in more specific operations. Some systems propose SQL to formulate the task, but it can be done with other primitives as well. Given such a pipeline, it is easy to implement optimizations that make the whole system much more user-friendly and efficient than plain MapReduce, where the user has to think about all the optimizations and implement them inside single map/reduce/(combine) operations.
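To make this concrete with a toy example: the classic word count, which once meant hand-written map, combine and reduce steps, can be stated declaratively, and the engine decides where to merge stages and where to insert the combiner-style partial aggregation before the shuffle (`words` here is just a hypothetical dataset):

SELECT word, count(*) AS cnt FROM words GROUP BY word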

qoega commented on ClickHouse Keeper: A ZooKeeper alternative written in C++   clickhouse.com/blog/click... · Posted by u/eatonphil
insanitybit · 2 years ago
So could I just point my Kafka at this thing and use it?
qoega · 2 years ago
You can even migrate your ZooKeeper to ClickHouse Keeper. It requires a small downtime, but all your ZooKeeper data will be carried over and your clients will just work once the keeper is back.
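If I remember correctly, the offline conversion is done with clickhouse-keeper-converter, roughly like this (paths are placeholders; check the docs for the exact flags):

clickhouse-keeper-converter --zookeeper-logs-dir /var/lib/zookeeper/version-2 --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 --output-dir /var/lib/clickhouse/coordination/snapshots

After that you start ClickHouse Keeper with the generated snapshot and point the clients back at it.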
qoega commented on Uses and abuses of cloud data warehouses   materialize.com/blog/ware... · Posted by u/Malp
RyanHamilton · 2 years ago
For real-time and large historical data, open source there's TDengine/QuestDB; commercial, there's DolphinDB and kdb+. If you only need fast recent data and not large historical data, embedding is a good solution, which means H2/DuckDB/SQLite if open source, eXtremeDB if commercial. I've benchmarked and run applications on most of these databases, including running real-time analytics.
qoega · 2 years ago
Open-source ClickHouse also handles both real-time and large historical data.
qoega commented on Neeva acquired by Snowflake   snowflake.com/blog/snowfl... · Posted by u/danielcampos93
riku_iki · 3 years ago
That link doesn't provide many details on how they tried to test ClickHouse for joins, or whether they tried to test it at all.
qoega · 3 years ago
I think atwong just promotes his product https://news.ycombinator.com/threads?id=atwong
qoega commented on Show HN: Gitbi – Lightweight BI app based on Git repo   github.com/ppatrzyk/gitbi... · Posted by u/pieca
qoega · 3 years ago
It is nice to have an image of the expected dashboard in the readme.
qoega commented on Building ClickHouse Cloud from scratch in a year   clickhouse.com/blog/build... · Posted by u/techn00
mayank · 3 years ago
This is a wonderful article, architecture, and project. Can anyone from Clickhouse comment on any non-technical factors that allowed such a rapid pace of development, e.g. team size, structure, etc.?
qoega · 3 years ago
Passion and experience

u/qoega

Karma: 17 · Cake day: July 15, 2020