Great Tables has done some really nice work on Python/Jupyter tables. It looks like they are almost building a "grammar of tables" similar to a grammar of graphics. More projects should write about their philosophy and aims like this.
I have built a different table library for Jupyter called Buckaroo. My approach has been different: Buckaroo aims to let you cycle through different formats and post-processing functions to quickly glean important insights from a table while working interactively. I took the view that since I type the same commands over and over to perform rudimentary exploratory data analysis, those commands and insights should be built into the table.
Great Tables seems built so that you can manually format a table for presentation.
https://github.com/paddymul/buckaroo
https://youtu.be/GPl6_9n31NE
Thanks for your work on Buckaroo! Jupyter's print() and IPython's display() have limitations given their static output; they feel like the printf debugging of yore, which I know Buckaroo was written to address.
What are your thoughts on Visidata's hotkeys and controls? I used Visidata in the past and always wondered why it couldn't be added into Jupyter (eventually) for dataframe explorations.
>It looks like they are almost building a "grammar of tables" similar to a grammar of graphics.
Agreed that Great Tables seems to be taking another crack at formalizing a "grammar of tables", and I welcome this approach too given the power of tabular formats and the wider adoption of the dataframe concept via the R/pandas/Arrow/polars ecosystem, although I believe the term was first used in the 90s[1] in the statistical S language.
[1] https://towardsdatascience.com/preventing-the-death-of-the-d...
Buckaroo started as a low-code UI with an accompanying table. The low-code UI lets you click on columns and perform actions (drop, fillna, groupby). The dataframe is then modified, AND Python code to perform the same action is emitted. Controlling the low-code UI through keyboard shortcuts should be fairly straightforward.
The other feature I have played with in this area is auto-cleaning. Auto-cleaning looks at individual columns and emits cleaning commands to the low-code UI. Different cleaning strategies can be implemented and toggled through.
Buckaroo takes the view that being opinionated is good, so long as you can toggle through opinions to quickly get the right combination of cleaning, display, or post-processing you are looking for. All of Buckaroo's features are also built to be easily extendable by users.
The auto-cleaning feature saw very little use, so I haven't developed it much (I had to disable it after some refactorings). The low-code UI is demonstrated at the end of the YouTube video linked above.
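To make the click-to-code idea concrete, here is a minimal sketch (illustrative only, not Buckaroo's actual internals) of how a recorded UI action can be turned into equivalent pandas code, emitted as a plain string:

```python
# Illustrative sketch only -- not Buckaroo's actual internals.
# Each low-code UI action is recorded as (operation, column), and an
# equivalent line of pandas code is emitted so the interactive session
# stays reproducible as a plain script.
TEMPLATES = {
    "drop":    "df = df.drop(columns=[{col!r}])",
    "fillna":  "df[{col!r}] = df[{col!r}].fillna(0)",
    "groupby": "df = df.groupby({col!r}).sum()",
}

def emit_code(actions):
    """Turn a list of recorded UI actions into pandas code."""
    return "\n".join(TEMPLATES[op].format(col=col) for op, col in actions)

print(emit_code([("drop", "id"), ("fillna", "price")]))
```

The real library of course does much more (applying the operation, undo, state tracking), but the action-to-emitted-code mapping is the core of the idea.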
The example they show of a Great Table is, to my taste, way too busy. Here is my unsolicited opinion:
The top and bottom horizontal rules on the title appear to be superfluous, and I dislike how it is aligned with the first column (row labels) rather than the second. I feel like a little space to breathe at the bottom, along with a bold font, would add visual hierarchy w/o the clutter.
The row label backgrounds are far too dark and the font weight makes it hard to read. I'd prefer a very light blue here instead. I don't like the row group label ("Name") being italicized.
The spanner labels floating in the centre make the table hard to scan. Would be much nicer aligned left.
Finally, I really dislike the font (maybe this is just my browser, though).
I mocked up some of the changes here; I think this is a much easier-to-read table:
https://i.imgur.com/iMMf5vo.png
You might want to read Edward Tufte's Beautiful Evidence.[1] He discusses exactly the kind of thing you brought up: readability, and elements that distract from the message of the data.
If you've seen sparklines,[2] Tufte coined the term.
Whenever I do a UI review I end up paging through it just to see if there's something we're not thinking about, and it's an interesting book to just open to a random page and read.
Plus he has an entire treatise on why PowerPoint is terrible.
[1] https://www.edwardtufte.com/tufte/books_be
[2] https://en.wikipedia.org/wiki/Sparkline
As someone trying to build a PowerPoint competitor, this is awesome. I'm going to start here and work my way through his whole corpus.
Table titles should be either centered above, or captioned below. Left-aligning them above any column instantly conveys a generally false/unintended impression of the title being a top level in the information hierarchy of the table. In the modernist makeover above I was immediately uneasy that the title stipulated “names, addresses, characteristics” whilst apparently aligned to exclude the names.
In contrast the census manual chooses to center almost all labels within their box, and when not it is almost always due to indentation, and moreover is unafraid to set column widths to fit the data not the labels, with indent and hyphenation to match. The result is both horizontally compact and intuitively comprehensible.
edit: on further reflection I also think it’s a crappy title. Titles and captions should convey context, scope, purpose - and may otherwise be omitted entirely for the editorial sin of failing to justify their own existence. As given, this one could be retitled “Table 1” with no loss of information or generality. For an article that’s trying to discuss and reformulate tabular presentation from first principles, that’s a tad disappointing. Since table titles form a crucial layer of their information catalogue, it is hardly surprising that the census manual devotes an entire chapter to the matter of title construction, and even though it is somewhat domain-specific and archaically worded, it is well worth the visit.
Keep going IMO: shorten the title to “remote correspondents”, since the rest is redundant with the column names. The blue highlight is now redundant with the title, so ditch all of it. “Personal characteristics” vs “location” don’t meaningfully improve the organization, so ditch those as well.
the white text on a dark background really was a glaring misfeature in the original example, to the extent that i wonder if the colours looked different on the author's monitor
This is a good article with some fascinating history.
More recent history involves the production of CALS tables https://en.wikipedia.org/wiki/CALS_Table_Model. The company Datalogics https://en.wikipedia.org/wiki/Datalogics was heavily involved in the CALS table initiative. Datalogics staff was part of the ISO committee forming SGML, and trained many people on SGML, including DoD staff and their contractors involved with documentation.
I was involved with the team that produced an editor for SGML-based documents. It had as one of its features the ability to specify the formatting of an element based on the SGML context of that element. This was before XSLT and its kin.
Alumni of Datalogics helped Microsoft learn about XML ("No, you can't arbitrarily switch case on XML element tags").
Also TeX practitioners have pretty well-formed opinions about how tables should be formatted.
Odd side-note: I learned that the documentation for a fighter airplane of the time, if printed out, would weigh more than the aircraft and would fill a football-field sized collection of filing cabinets.
And as much as many today don't like XML, coming from the SGML world it is a boon.
> This is a good article with some fascinating history.
Indeed. I don’t think it’s all correct, though. On VisiCalc, it says “The grid cells couldn’t be styled with borders for presentation purposes, the values couldn’t be formatted, and the tables couldn’t even be printed”
I think even the first version had (limited, of course) formatting support.
http://www.bricklin.com/history/refcard3.htm says the “/F” command allowed setting justification and setting the number format to, for example, dollars and cents.
That reference card is for version 1.35, but I think even the first version shipped with support for at least showing dollars and cents.
While the article is overall good, there is a bunch of history that is not covered, or covered too briefly.
Something that always annoyed me about numeric data like dollar amounts in tables is that the visual comparison between quantities is logarithmic rather than linear.
E.g.:

    Cost
    $1500
    $130
    $110
    $210
The text in the last three rows looks 4/5ths the size of the text in the first row. However, even summed, the last three costs add up to only about 1/3rd of the top row! People visually see the number of digits, which grows roughly as log10 of the value.
I’ve so often had this issue that I started putting in-cell bar charts into every finance-related spreadsheet.
Otherwise meetings will get derailed debating the cost of something trivial that is totally irrelevant compared to the biggest absolute costs.
As a real example, I had many meetings spent debating a $15 monthly cost for server log collection in the cloud for a VM running a database engine that costs $15K monthly for the license alone.
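The in-cell bar idea is simple to sketch: scale each value linearly against the column maximum, so a cost ten times larger gets a bar ten times longer (figures below reuse the $15/$15K example; the VM line is a made-up filler value):

```python
# Render linear in-cell bars next to dollar amounts, so visual length
# tracks magnitude instead of digit count (which grows like log10).
def bar(value, max_value, width=30):
    # At least one block, so tiny values stay visible rather than vanishing.
    return "#" * max(1, round(width * value / max_value))

costs = [("Database license", 15000), ("VM", 1500), ("Log collection", 15)]
top = max(v for _, v in costs)
for name, value in costs:
    print(f"{name:<18}${value:>7,}  {bar(value, top)}")
```

Rendered this way, the $15 log-collection line visibly collapses next to the $15K license, which tends to end the debate before it starts.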
Hey, one of the co-maintainers of Great Tables (along with Rich Iannone) here!
I just wanted to say that Rich is the only software developer I know who, when asked to lay out the philosophy of his package, would give you 5,000 years of history on the display of tables. :)
It makes me wonder how we've gone this long with increasingly poor data table presentations (the mid-century modern tables are astutely pointed to as shining examples).
This makes me excited to get back into data analysis with python. Moreover, I see some possible API improvements and extensions I'd like to make.
I love this package and have been using it for a few years in R. It's great for making tables in HTML, but the PDF and docx output is a little less polished. I do worry that the recent push to bring the Python version up to speed with the R version has slowed down R development. Still, it's well worth checking out, whatever your language.
Wonderful! In the 90s a colleague and I wrote a book (the EBRI Databook on Employee Benefits) which was mostly tables. In addition to SAS, our other primary tool was an ancient language called Table Producing Language ("TPL"). Despite dating back to the 1970s, TPL was incredibly flexible, expressive, and efficient - once you figured out the syntax.
The designers of Great Tables might want to check out TPL. It covers everything Great Tables aims to do, and I think it may have a few more tricks up its sleeve:
https://www.ojp.gov/pdffiles1/Digitization/68013NCJRS.pdf
Regardless, thanks for making Great Tables! This goes a long way towards making table production in Python much better.