wswope · 2 years ago
In case it saves anyone a headache: Pivoting with Postgres’ JSON features (json_build_object and json_object_agg, specifically) is usually less difficult than crosstab IME.

I’ve used it for exactly this same sorta metadata logging.
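
A rough sketch, assuming a tall (column_name, meta_key, value) table like the one discussed further down (names illustrative):

  SELECT
    column_name,
    json_object_agg(meta_key, value) AS metrics
  FROM tall_table
  GROUP BY column_name;

You never have to enumerate the output columns; the trade-off is that the result is a JSON document rather than real columns, so the consumer pulls fields back out with `->>`.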

singingfish · 2 years ago
Interesting, thanks! On a related note, I think self-updating CTEs are a better bet than MERGE INTO, for similar reasons.
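
In case the pattern isn't familiar, a minimal sketch of a data-modifying ("self-updating") CTE upsert; the table and values are made up:

  WITH updated AS (
    UPDATE column_metrics
    SET value = '42'
    WHERE column_name = 'foo' AND meta_key = 'total_rows'
    RETURNING 1
  )
  INSERT INTO column_metrics (column_name, meta_key, value)
  SELECT 'foo', 'total_rows', '42'
  WHERE NOT EXISTS (SELECT 1 FROM updated);

(Like MERGE, this isn't bulletproof under concurrent writers; INSERT ... ON CONFLICT is the usual answer when a unique constraint exists.)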
nicholasjon · 2 years ago
Just ran into the fiddliness of crosstab this week while attempting to extend an existing query that used it. I ended up converting the query to use FILTER clauses instead, which was a failure on my part to get through that fiddliness -- but I'm consoling myself by saying it's a little more explicit this way for the next person. :)
mjevans · 2 years ago
https://www.postgresql.org/docs/current/tablefunc.html#TABLE...

Given what this function is doing, a limit on the output seems reasonable. Others have mentioned 1600 columns, which seems like a bit of a strange number, but it's the same 1600-column cap Postgres applies to tables in general, and likely corresponds internally to a maximum tuple of pointers to the source data.

From a quick skim of search results, I don't see any good alternatives to crosstab (AKA pivot) for such monstrously large use cases. Nearly all the toy examples pivot on months, or possibly days, with the aggregates collecting summary statistics. At 1600+ columns, it's probably better to re-process the data and convert it from tabular to columnar format in whatever way best fits the particular data.
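
For reference, the crosstab shape being discussed looks roughly like this against the same kind of tall (column_name, meta_key, value) table used elsewhere in the thread (names made up):

  SELECT *
  FROM crosstab(
    $$ SELECT column_name, meta_key, value FROM tall_table ORDER BY 1, 2 $$,
    $$ VALUES ('total_rows'), ('not_null_count') $$
  ) AS ct(column_name text, total_rows text, not_null_count text);

Every output column has to be spelled out in the AS list and kept in sync with the category query by hand, which is most of the fiddliness and where a column ceiling would eventually bite.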

kanobi · 2 years ago
Whenever I needed to do a pivot in Postgres, I used the approach described in this Stack Overflow answer:

https://stackoverflow.com/questions/20618323/create-a-pivot-...

So, for example, when you have a table shaped like (column_name, meta_key, value), you would write a query like this:

  SELECT
    column_name,
    MAX(CASE WHEN meta_key='total_rows' THEN value ELSE NULL END) AS total_rows,
    MAX(CASE WHEN meta_key='not_null_count' THEN value ELSE NULL END) AS not_null_count,
    -- for all other metrics....
  FROM tall_table
  GROUP BY 1

jasonpbecker · 2 years ago
Most of the time I use `filter ... where` for cases like these. For example:

  SELECT
    column_name,
    MAX(value) FILTER (WHERE meta_key = 'total_rows') AS total_rows,
    MAX(value) FILTER (WHERE meta_key = 'not_null_count') AS not_null_count,
    -- FILTER also composes with richer aggregates; this one is from a different
    -- context (the timestamp literal stands in for a real timestamp column):
    ROUND(SUM(amount_in_cents) FILTER (WHERE EXTRACT(MONTH FROM TIMESTAMP '2006-01-01 03:04:05') = 1) / 100.0, 2) AS january_sub_total
  FROM tall_table
  GROUP BY column_name

kanobi · 2 years ago
Yup, that's even better. I think it didn't work for me because I was using Redshift, which didn't support FILTER at that time.
TheRealDunkirk · 2 years ago
AND the function is hard-limited to 1600 columns. Don't ask me how I know; you already know.
ttfkam · 2 years ago
Someday Postgres will get support for PIVOT/UNPIVOT, but today is sadly not that day.

https://learn.microsoft.com/en-us/sql/t-sql/queries/from-usi...

https://blogs.oracle.com/sql/post/how-to-convert-rows-to-col...
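
For anyone who hasn't seen it, the T-SQL version from the first link looks roughly like this (same illustrative names as elsewhere in the thread; it does not run on Postgres):

  SELECT column_name, [total_rows], [not_null_count]
  FROM (SELECT column_name, meta_key, value FROM tall_table) AS src
  PIVOT (
    MAX(value) FOR meta_key IN ([total_rows], [not_null_count])
  ) AS pvt;

The Oracle syntax in the second link is nearly the same shape.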

makmanalp · 2 years ago
In a similar vein, here's my very similar rant and a teeny tiny SQLAlchemy plugin that deals with the grossness a bit if you're using Python. Not entirely sure if it still works; I was floored to realize I'd written this a decade ago! But the SQLAlchemy API tends to be quite stable.

https://github.com/makmanalp/sqlalchemy-crosstab-postgresql

tommasoamici · 2 years ago
Thanks for your work on that!

I recently implemented a pivot table widget at work and it was mostly based on your code.

It was a while ago, but I remember it worked with very few changes.

makmanalp · 2 years ago
Oh wow, it's a tiny world! Glad to hear it worked out and you're more than welcome! :-)
bionhoward · 2 years ago
Maybe you could use column_name, dtype, min, max columns and sidestep the nulls, but you’d have to convert “min” and “max” to text so they’re all the same type?

You could also split the table by dtype outside of SQL to ditch the null values without converting the non-null ranges to text, but then your data ends up split across multiple tables; maybe it's inconvenient to break the table schema into chunks like that?
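
A rough sketch of the first idea, with min/max coerced to text so every dtype fits one schema (names are guesses):

  CREATE TABLE column_stats (
    column_name text,
    dtype       text,
    min_value   text,  -- cast to text so numeric, date, timestamp, etc. share one column
    max_value   text
  );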