Thank you for linking this. It seems much more intuitive than awk, especially for his particular purpose. I wish I had found it a few months ago, when I was slicing and dicing lots of data for an extensive system migration project.
Nice.
A long time ago I had a lot of success importing and filtering multiple gigabytes of CSV data into Elasticsearch using CSVfix (if I'm not mistaken) plus jq (converting it to JSON Lines in between, also with jq).
xsv seems to cover some of the areas where I've used jq.
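For readers without CSVfix or jq at hand, the CSV-to-JSON-Lines step can be sketched with Python's standard library instead (a swapped-in technique, not the tooling the comment used). It streams one row at a time, so memory stays flat on large files; all values remain strings, and the function name `csv_to_jsonl` is hypothetical.

```python
import csv
import io
import json


def csv_to_jsonl(src, dst):
    """Stream CSV rows from src into dst as JSON Lines (one object per line).

    Rows are read one at a time, so memory use does not grow with file size.
    Values stay strings, exactly as the csv module yields them.
    Returns the number of rows written.
    """
    count = 0
    for row in csv.DictReader(src):  # header row supplies the JSON keys
        dst.write(json.dumps(row, ensure_ascii=False))
        dst.write("\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = io.StringIO("id,name\n1,alice\n2,bob\n")
    out = io.StringIO()
    print(csv_to_jsonl(sample, out))
    print(out.getvalue(), end="")
```

For real files, open the input with `newline=""` as the `csv` module documentation recommends, e.g. `with open("data.csv", newline="", encoding="utf-8") as src, open("data.jsonl", "w", encoding="utf-8") as dst: csv_to_jsonl(src, dst)`, where both file names are placeholders.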
No mention of vi or SQLite? While I'm no vi expert, it's a great tool for browsing big files when you want to look around without grep. And SQLite is similarly ubiquitous and capable of crunching large files.
vi is an editor, so it doesn't really meet my requirement of spreadsheet-like editing capabilities. SQLite is a good idea; I never thought of that. I will investigate it and add it to the article. Thanks!
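A minimal sketch of the SQLite route, assuming Python's built-in `sqlite3` module: the header row becomes the column list, rows are inserted in batches with `executemany`, and the data can then be queried with ordinary SQL. The helper name, the `sales` table and the sample columns are hypothetical; the sketch assumes every row has as many fields as the header, and because the table name is interpolated into the SQL it should come from trusted input. (The `sqlite3` command-line shell can also load a file directly with `.mode csv` followed by `.import`.)

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, src, batch_size=10_000):
    """Create `table` from the CSV header and bulk-insert rows in batches.

    Columns are declared without a type, so values are stored as the text
    the csv module produced; cast them in queries when doing arithmetic.
    """
    reader = csv.reader(src)
    header = next(reader)
    # Double-quote column names and escape embedded quotes for SQLite.
    cols = ", ".join('"{}"'.format(h.replace('"', '""')) for h in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, cols))
    insert = 'INSERT INTO "{}" VALUES ({})'.format(table, placeholders)
    batch = []
    for row in reader:
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)  # flush a full batch
            batch.clear()
    if batch:
        conn.executemany(insert, batch)  # flush the remainder
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    sample = io.StringIO("city,amount\nOslo,10\nOslo,5\nRome,7\n")
    load_csv_into_sqlite(conn, "sales", sample)
    query = ("SELECT city, SUM(CAST(amount AS INTEGER)) FROM sales "
             "GROUP BY city ORDER BY city")
    for row in conn.execute(query):
        print(row)
```

Passing a file path instead of `":memory:"` to `sqlite3.connect` keeps the imported data on disk, so a multi-gigabyte CSV does not have to fit in RAM.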
> Excel for Mac performed well but it is a paid solution so I did not consider it viable for many developers
Because developers handling gigabyte-sized data, and wanting to reliably manipulate it in a GUI, cannot possibly be expected to pay/afford the $7/month to Microsoft.
That said, the recommended solution is probably the best option for developers, not because it's free, but because of the ability to run complex SQL statements and visualize the results.
If I were to edit this article, that'd be my takeaway: use tool X for snappy visualization of SQL queries, even on multi-gigabyte CSVs.
I wonder how well Table Tool [1] would perform with your large dataset.
It is an open-source CSV editor for Mac from the developer of Postico, my favorite PostgreSQL client for Mac [2].
Python + Jupyter is OK, but pandas actually reads everything into memory at once, doesn't it? 100 MB is no problem, but bigger files could cause heavy swapping.
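On the pandas question: `pandas.read_csv` does build the whole DataFrame in memory by default, but its `chunksize` parameter makes it return an iterator of smaller DataFrames instead. The same streaming idea needs nothing beyond the standard library. This sketch (hypothetical function name and columns, and it assumes the value column parses with `float()`) aggregates a column per key while holding only one row at a time, so memory grows with the number of distinct keys rather than with the file size.

```python
import csv
import io
from collections import defaultdict


def sum_by_key(src, key_col, value_col):
    """Sum value_col per key_col while reading the CSV one row at a time.

    Only the running totals are kept in memory, never the whole file.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(src):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


if __name__ == "__main__":
    sample = io.StringIO("city,amount\nOslo,10\nOslo,5\nRome,7\n")
    print(sum_by_key(sample, "city", "amount"))
```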
It's definitely not a problem of resources, only of application architecture.
(Indeed, the right application is a DBMS, not a spreadsheet.)
https://github.com/BurntSushi/xsv
I use something like this to fix up column names and import:
[1] https://sqlitebrowser.org/
http://visidata.org/
And
http://recs.pl/
Especially after paying $2000+ for a Mac.
[1] https://github.com/jakob/TableTool
[2] https://eggerapps.at/postico/