One alternative way I’ve been testing recently: load the entire CSV file into a single text column (one row) in Postgres, then use a view to parse it into lines and values, and cache the result with a materialized view. 16 MB / 500k rows (Formula 1 lap times) takes only a few seconds to load and map into materialized-view records.
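Roughly what that looks like, in case it helps anyone — the table/column names and the three-field lap-time layout are made up here, and a real version needs header-row and quoting handling that I'm skipping:

-- the whole file ends up as one text value (loaded by the app, or e.g. pg_read_file())
create table raw_csv (doc text);

-- plain view that splits the blob into lines, and the lines into fields
create view laptimes_parsed as
select split_part(line, ',', 1)      as race_id,
       split_part(line, ',', 2)      as driver_id,
       split_part(line, ',', 3)::int as lap_ms
from raw_csv,
     regexp_split_to_table(raw_csv.doc, E'\n') as line
where line <> '';

-- cache the parsed rows
create materialized view laptimes as select * from laptimes_parsed;
-- refresh materialized view laptimes;   -- re-run after reloading the file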
The improvements from using a transaction are familiar to me and make intuitive sense. But can someone explain why using a prepared statement results in a roughly 5x improvement? Parsing the very simple SQL doesn't seem like it would account for much time, so is the extra time spent redoing query planning or something else?
>Parsing the very simple SQL doesn't seem like it would account for much time, so is the extra time spent redoing query planning or something else?
If you're inserting one million rows, even 5 microseconds of parse and planning time per query is five extra seconds on a job that could be done in half a second.
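To make that concrete, here is the SQL-level version of the idea using PostgreSQL's PREPARE/EXECUTE (SQLite does the same thing through its C API — sqlite3_prepare_v2 once, then bind/step/reset per row — rather than through SQL; the table here is just for illustration):

-- parse + plan happen once
prepare ins (integer, integer) as insert into laps (lap, ms) values ($1, $2);

-- each execution only binds values and reuses the already-planned statement
execute ins(1, 92345);
execute ins(2, 91872);

deallocate ins;

When the statement itself is trivial, that fixed per-row parse/plan overhead is a large fraction of the total work, which is why the speedup looks so big.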
One big caveat to this sort of test: with DBs you are very often trading throughput for latency.
Meaning it's certainly possible to get "the absolute fastest inserts possible", but at the same time you are impacting both other readers of that table AND writers across the DB.
This also gets messier when you are talking about multiple instances.
With SQLite in its default rollback-journal mode, nobody else can even read the database while it's being written [0], so your comment would be better directed at a conventional database server.
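For reference, changing that behavior is one statement — WAL mode ([0] below) lets readers keep going while a writer is active, with the trade-offs described in that doc:

pragma journal_mode=WAL;   -- persistent setting; readers no longer block on the writer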
I didn't have OP's patience to push it to 1B, though.
https://github.com/eatonphil/databases-intuition
[0] https://www.sqlite.org/wal.html
create table testInt(num Integer);

-- generate 1..1,000,000 with a recursive CTE and insert in a single statement
WITH n(n) AS (
  SELECT 1
  UNION ALL
  SELECT n+1 FROM n WHERE n < 1000000
)
insert into testInt select n from n;
On my machine (Mac mini, 3 GHz dual-core Intel Core i7, 16 GB 1600 MHz DDR3) this takes:

Run Time: real 0.362 user 0.357236 sys 0.004530