nine_k · 6 years ago
The counter-intuitive part is that a 100MB file is considered large on a machine with 8-16GB RAM.

It's definitely not a problem of resources, but of application architecture.

(Indeed, the right application is a DBMS, not a spreadsheet.)

mamcx · 6 years ago
You might be interested in:

https://github.com/BurntSushi/xsv

alecdibble · 6 years ago
Thank you for linking this. It seems to be much more intuitive than awk, especially for this particular purpose. I wish I had found this a few months ago when I was slicing and dicing lots of data for an extensive system migration project.
codesnik · 6 years ago
Nice. A long time ago I had much success importing and filtering multiple gigabytes of CSV data into Elasticsearch using CSVfix (if I'm not mistaken), with jq converting it to JSON Lines in between. xsv seems to cover some of the areas where I used jq.
adouzzy · 6 years ago
When I saw "large", I expected >10GB, and "big data" if it needs to be batch-processed or processed in a distributed fashion.
paulryanrogers · 6 years ago
No mention of vi or SQLite? While I'm no vi expert, it's a great tool for working with big files when you want to browse around without grep. And SQLite is similarly ubiquitous and capable of crunching large files.
alecdibble · 6 years ago
vi is an editor, so it doesn't really meet my requirement of spreadsheet-like editing capabilities. SQLite is a good idea; I never thought of that. I will investigate it and add it to the article. Thanks!
gav · 6 years ago
SQLite does a great job of importing CSV, and then you're able to use something like "DB Browser for SQLite"[1] to browse the data.

I use something like this to fix up column names and import:

    # Normalize the header row: turn " / ", parentheses, and spaces
    # into underscores (BSD sed: -i '' edits in place, no backup file)
    sed -i '' -e '1 s/ \/ /_/g; 1 s/[() ]/_/g' "$csv_file"

    # Import the cleaned CSV into a SQLite table
    sqlite3 "$db_file" <<SQL
    .mode csv
    .import $csv_file $table_name
    SQL
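Once it's in SQLite you can also query it straight from Python's standard library. A minimal sketch (the table and column names here are made up):

    import sqlite3

    # Connect to the database created by the import above
    conn = sqlite3.connect("data.db")

    # Ad-hoc SQL over the imported table; "orders", "customer",
    # and "amount" are hypothetical names
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer LIMIT 10"
    for row in conn.execute(query):
        print(row)

    conn.close()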
[1] https://sqlitebrowser.org/

dbt00 · 6 years ago
Two of my favorite tools for this kind of thing:

http://visidata.org/

And

http://recs.pl/

throwGuardian · 6 years ago
> Excel for Mac performed well but it is a paid solution so I did not consider it viable for many developers

Because developers handling gigabyte-sized data, and wanting to reliably manipulate it in a GUI, cannot possibly be expected to pay/afford the $7/month to Microsoft.

That said, the recommended solution is probably the best option for developers, not just because it's free, but for the ability to run complex SQL statements and visualize the results.

If I were to edit this article, that'd be my takeaway: use tool X for snappy visualization of SQL queries, even on multi-gigabyte CSVs.

apta · 6 years ago
> cannot possibly be expected to pay/afford the $7/month to Microsoft.

Especially after paying $2000+ for a mac.

lerigner · 6 years ago
I wonder how well Table Tool [1] would perform with your large dataset? It's an open-source CSV editor for Mac from the developer of Postico, my favorite PostgreSQL client for Mac [2].

[1] https://github.com/jakob/TableTool

[2] https://eggerapps.at/postico/

coverman · 6 years ago
Python + Pandas
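A minimal sketch of the kind of spreadsheet-style slicing this gives you (file and column names are made up):

    import pandas as pd

    # Load the whole CSV into memory (fine at ~100MB)
    df = pd.read_csv("large.csv")

    # Spreadsheet-style operations: filter rows, edit a column
    df = df[df["status"] == "active"]
    df["price"] = df["price"].round(2)

    # Write the result back out
    df.to_csv("filtered.csv", index=False)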
ekianjo · 6 years ago
Or R + Tidyverse will do the job nicely too.
suslik · 6 years ago
R without tidyverse (which is just sugar) will do just as nicely.
appleiigs · 6 years ago
Python + Pandas + Jupyter Notebook/Lab
anst · 6 years ago
Python + Jupyter, OK, but pandas actually reads everything at once, doesn't it? 100MB is no problem, but bigger files could result in high memory pressure and swapping.
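By default it does, though read_csv can also stream the file in chunks to keep memory bounded. A rough sketch (file name, chunk size, and column name are made up):

    import pandas as pd

    total = 0
    # chunksize makes read_csv return an iterator of DataFrames
    # instead of loading the whole file at once
    for chunk in pd.read_csv("huge.csv", chunksize=100_000):
        # each chunk is an ordinary ~100k-row DataFrame
        total += chunk["amount"].sum()

    print(total)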
