In case it saves anyone a headache: Pivoting with Postgres’ JSON features (json_build_object and json_object_agg, specifically) is usually less difficult than crosstab IME.
I’ve used it for exactly this same sorta metadata logging.
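For what it's worth, a minimal sketch of that approach, assuming the same tall (column_name, meta_key, value) layout described downthread and text-typed values -- each source column comes back as one row carrying a JSON object of its metrics:

SELECT
    column_name,
    json_object_agg(meta_key, value) AS metrics
FROM tall_table
GROUP BY column_name;

Individual metrics can then be pulled back out with ->>, e.g. metrics ->> 'total_rows', so you never have to enumerate the output columns up front.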
Just ran into the fiddliness of crosstab this week while attempting to extend an existing query that used it. I ended up converting the query to use filters instead, which was a failure on my part to get through that fiddliness -- but I'm consoling myself by saying it's a little more explicit this way for the next person. :)
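For context, roughly what the crosstab version looks like (a sketch assuming the tablefunc extension and the tall (column_name, meta_key, value) table from the examples below). Having to spell out every output column by hand in the AS clause, in the same order the category query produces them, is most of the fiddliness:

CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
FROM crosstab(
    $$ SELECT column_name, meta_key, value FROM tall_table ORDER BY 1, 2 $$,
    $$ SELECT DISTINCT meta_key FROM tall_table ORDER BY 1 $$
) AS ct(column_name text, not_null_count text, total_rows text);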
Given what this function is doing, a limit on the output seems reasonable. Others have mentioned 1600 columns; that's PostgreSQL's hard per-table column limit, so it makes sense that a pivoted result row runs into the same tuple-width ceiling.
In a quick skim of various searches I don't see any good alternatives to crosstab (AKA pivot) for such monstrously large use cases. Nearly all the toy examples pivot on months, or at most days, with the aggregates collecting summary statistics. At 1600+ columns, it's probably better to re-process the data and convert it from tabular to columnar format in whatever way best fits the particular data.
So for example when you have a table like the one described (column_name, meta_key, value), you would create a query like this:
SELECT
    column_name,
    MAX(CASE WHEN meta_key = 'total_rows'     THEN value END) AS total_rows,
    MAX(CASE WHEN meta_key = 'not_null_count' THEN value END) AS not_null_count
    -- , ...one MAX(CASE ...) expression per remaining metric
FROM tall_table
GROUP BY 1;
Most of the time I use `filter ... where` for cases like these, for example:
SELECT
    column_name,
    MAX(value) FILTER (WHERE meta_key = 'total_rows')     AS total_rows,
    MAX(value) FILTER (WHERE meta_key = 'not_null_count') AS not_null_count,
    -- FILTER works with any aggregate, e.g. (assuming hypothetical
    -- amount_in_cents and created_at columns):
    ROUND(SUM(amount_in_cents) FILTER (WHERE EXTRACT(MONTH FROM created_at) = 1) / 100.0, 2) AS january_sub_total
FROM tall_table
GROUP BY column_name;
In a similar vein, here is my very similar rant and a teeny tiny SQLAlchemy plugin that deals with the grossness a bit if you're using Python. Not entirely sure if it still works; I was floored to realize I'd written this a decade ago! But the SQLAlchemy API tends to be quite stable.
Maybe you could use column_name, dtype, min, max columns and sidestep the nulls, but you’d have to convert “min” and “max” to text so they’re all the same type?
Could also split the table by dtype outside of SQL to ditch the null values without needing to convert non-null ranges to text, but then your data ends up split across multiple tables, and maybe breaking the schema into chunks like that is inconvenient?
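A rough sketch of the first option (the single wide table with min/max converted to text), assuming a hypothetical source_table with one numeric and one timestamp column -- the casts to text are what let every dtype share the same four columns:

SELECT 'amount_in_cents' AS column_name,
       'integer'         AS dtype,
       MIN(amount_in_cents)::text AS "min",
       MAX(amount_in_cents)::text AS "max"
FROM source_table
UNION ALL
SELECT 'created_at',
       'timestamptz',
       MIN(created_at)::text,
       MAX(created_at)::text
FROM source_table;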
https://stackoverflow.com/questions/20618323/create-a-pivot-...
(edit: formatting)
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-usi...
https://blogs.oracle.com/sql/post/how-to-convert-rows-to-col...
https://github.com/makmanalp/sqlalchemy-crosstab-postgresql
I recently implemented a pivot table widget at work and it was mostly based on your code.
It was a while ago, but I remember it worked with very few changes.