Table definition:
CREATE TABLE hackernews_history
(
update_time DateTime DEFAULT now(),
id UInt32,
deleted UInt8,
type Enum('story' = 1, 'comment' = 2, 'poll' = 3, 'pollopt' = 4, 'job' = 5),
by LowCardinality(String),
time DateTime,
text String,
dead UInt8,
parent UInt32,
poll UInt32,
kids Array(UInt32),
url String,
score Int32,
title String,
parts Array(UInt32),
descendants Int32
)
ENGINE = ReplacingMergeTree(update_time) ORDER BY id;
A shell script:
BATCH_SIZE=1000
TWEAKS="--optimize_trivial_insert_select 0 --http_skip_not_found_url_for_globs 1 --http_make_head_request 0 --engine_url_skip_empty_files 1 --http_max_tries 10 --max_download_threads 1 --max_threads $BATCH_SIZE"
rm -f maxitem.json
wget --no-verbose https://hacker-news.firebaseio.com/v0/maxitem.json
clickhouse-local --query "
SELECT arrayStringConcat(groupArray(number), ',') FROM numbers(1, $(cat maxitem.json))
GROUP BY number DIV ${BATCH_SIZE} ORDER BY any(number) DESC" |
while read ITEMS
do
echo $ITEMS
clickhouse-client $TWEAKS --query "
INSERT INTO hackernews_history SELECT * FROM url('https://hacker-news.firebaseio.com/v0/item/{$ITEMS}.json')"
done
It takes a few hours to download the data and fill the table.
<Trace> ReadWriteBufferFromHTTP: Failed to make request to 'https://hacker-news.firebaseio.com/v0/item/40298680.json'. Error: Timeout: connect timed out: 216.239.32.107:443. Failed at try 3/10. Will retry with current backoff wait is 200/10000 ms.
I googled with no luck. I was wondering if you have a solution for it.