nine_k · 6 years ago
The counter-intuitive part is that a 100MB file is considered large on a machine with 8-16GB RAM.

It's definitely not a problem of resources, but of application architecture.

(Indeed, the right application is a DBMS, not a spreadsheet.)

mamcx · 6 years ago
You might be interested in:

https://github.com/BurntSushi/xsv

alecdibble · 6 years ago
Thank you for linking this. It seems to be much more intuitive than awk, especially for this particular purpose. I wish I had found this a few months ago when I was slicing and dicing lots of data for an extensive system migration project.
codesnik · 6 years ago
Nice. A long time ago I had much success importing and filtering multiple gigabytes of CSV data into Elasticsearch using CSVfix (if I'm not mistaken), with jq converting it to JSON Lines in between. xsv seems to cover some of the areas where I used jq.
adouzzy · 6 years ago
When I saw "large", I expected >10GB, and "big data" if it needs to be batch-processed or processed in a distributed fashion.
paulryanrogers · 6 years ago
No mention of vi or SQLite? While I'm no vi expert, it's a great tool for working with big files when you want to browse around without grep. And SQLite is similarly ubiquitous and capable of crunching large files.
alecdibble · 6 years ago
vi is an editor, so it doesn't really meet my requirement of spreadsheet-like editing capabilities. SQLite is a good idea; I never thought of that. I will investigate it and add it to the article. Thanks!
gav · 6 years ago
SQLite does a great job of importing CSV, and then you're able to use something like "DB Browser for SQLite"[1] to browse the data.

I use something like this to fix up column names and import:

    # Normalize the header row: turn " / ", parentheses, and spaces
    # into underscores (BSD sed: -i '' edits in place, no backup file)
    sed -i '' -e '1 s/ \/ /_/g; 1 s/[() ]/_/g' "$csv_file"

    # Import the cleaned CSV into a SQLite table
    sqlite3 "$db_file" <<SQL
    .mode csv
    .import $csv_file $table_name
    SQL
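Once it's in SQLite you can also query it straight from Python's standard library. A minimal sketch (the table and column names here are made up):

    import sqlite3

    # Connect to the database created by the import above
    conn = sqlite3.connect("data.db")

    # Ad-hoc SQL over the imported table; "orders", "customer",
    # and "amount" are hypothetical names
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer LIMIT 10"
    for row in conn.execute(query):
        print(row)

    conn.close()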
[1] https://sqlitebrowser.org/

dbt00 · 6 years ago
Two of my favorite tools for this kind of thing:

http://visidata.org/

And

http://recs.pl/

throwGuardian · 6 years ago
> Excel for Mac performed well but it is a paid solution so I did not consider it viable for many developers

Because developers handling gigabyte-sized data, and wanting to reliably manipulate it in a GUI, cannot possibly be expected to pay/afford the $7/month to Microsoft.

That said, the recommended solution is probably the best option for developers, not just because it's free, but for the ability to run complex SQL statements and visualize the results.

If I were to edit this article, that'd be my takeaway: use tool X for snappy visualization of SQL queries, even on multi-gigabyte CSVs.

apta · 6 years ago
> cannot possibly be expected to pay/afford the $7/month to Microsoft.

Especially after paying $2000+ for a mac.

lerigner · 6 years ago
I wonder how well Table Tool [1] would perform with your large dataset? It's an open-source CSV editor for Mac from the developer of Postico, my favorite PostgreSQL client for Mac [2].

[1] https://github.com/jakob/TableTool

[2] https://eggerapps.at/postico/

coverman · 6 years ago
Python + Pandas
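A minimal sketch of the kind of spreadsheet-style slicing this gives you (file and column names are made up):

    import pandas as pd

    # Load the whole CSV into memory (fine at ~100MB)
    df = pd.read_csv("large.csv")

    # Spreadsheet-style operations: filter rows, edit a column
    df = df[df["status"] == "active"]
    df["price"] = df["price"].round(2)

    # Write the result back out
    df.to_csv("filtered.csv", index=False)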
ekianjo · 6 years ago
Or R + Tidyverse will do the job nicely too.
suslik · 6 years ago
R without tidyverse (which is just sugar) will do just as nicely.
appleiigs · 6 years ago
Python + Pandas + Jupyter Notebook/Lab
anst · 6 years ago
Python + Jupyter, OK, but pandas actually reads everything at once, doesn't it? 100MB is no problem, but bigger files could result in high memory pressure and swapping.
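By default it does, though read_csv can also stream the file in chunks to keep memory bounded. A rough sketch (file name, chunk size, and column name are made up):

    import pandas as pd

    total = 0
    # chunksize makes read_csv return an iterator of DataFrames
    # instead of loading the whole file at once
    for chunk in pd.read_csv("huge.csv", chunksize=100_000):
        # each chunk is an ordinary ~100k-row DataFrame
        total += chunk["amount"].sum()

    print(total)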
