It's clear that whatever tests he writes cover well established and understood concepts.
This is where I believe people are missing the point. GPT-4 is not a general intelligence. It is a highly overfit model, but it's overfit to literally every piece of human knowledge.
Language is humanity's way of modelling real-world concepts, so GPT is able to leverage the relationships our language draws to those concepts. It has just learned all of that language up until today.
It's an incredible knowledge-retrieval machine. It can even mimic very well how our language is used to conduct reasoning.
It can't do this efficiently, nor can it actually stumble upon a new insight, because it's not being exposed to the real world in real time.
So, this professor's 'new' test is not really new. It's just a test whose subject matter has, fundamentally, already been modelled.
If you were to say “pandas in long format only”, then yes, that would be correct, but the power of pandas comes from its ability to work in either a long relational or a wide ndarray style. Pandas was originally written to replace Excel in financial/econometric modelling, not as a replacement for SQL. Models written solely in the long relational style are near-unmaintainable for constantly evolving models with hundreds of data sources and thousands of interactions, developed and tuned by teams of analysts and engineers. For example, this is how some basic operations would look.
Bump prices in March 2023 up 10%:
# pandas
prices_df.loc['2023-03'] *= 1.1
# polars
polars_df.with_columns(
    pl.when(
        pl.col('timestamp').is_between(
            datetime(2023, 3, 1),
            datetime(2023, 3, 31),
            closed='both',
        )
    )
    .then(pl.col('val') * 1.1)
    .otherwise(pl.col('val'))
    .alias('val')
)
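For concreteness, here is a minimal sketch of the wide-format setup the pandas one-liner assumes: a DatetimeIndex down the rows and one column per series (the column names here are hypothetical, not from the original model):

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format price table: dates down, one column per asset.
idx = pd.date_range('2023-01-01', '2023-04-30', freq='D')
prices_df = pd.DataFrame({'asset_a': 100.0, 'asset_b': 50.0}, index=idx)

# Partial string indexing on the DatetimeIndex selects all of March 2023,
# and the in-place multiply writes back through .loc.
prices_df.loc['2023-03'] *= 1.1
```

Because the frame is a 2-D ndarray under the hood, the bump applies across every column at once, with no explicit filter expression spelled out.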
Add expected temperature offsets to the base temperature forecast at the state/county level:
# pandas
temp_df + offset_df
# polars
(
    temp_df
    .join(offset_df, on=['state', 'county', 'timestamp'], suffix='_r')
    .with_columns(
        (pl.col('val') + pl.col('val_r')).alias('val')
    )
    .select(['state', 'county', 'timestamp', 'val'])
)
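A sketch of why the pandas side of this second example is a single `+`: both frames share a (state, county, timestamp) MultiIndex, and pandas aligns on it automatically (the data below is made up for illustration):

```python
import pandas as pd

# Hypothetical frames keyed by the same (state, county, timestamp) MultiIndex.
idx = pd.MultiIndex.from_product(
    [['TX'], ['Harris', 'Travis'], pd.date_range('2023-03-01', periods=2)],
    names=['state', 'county', 'timestamp'],
)
temp_df = pd.DataFrame({'val': [20.0, 21.0, 25.0, 26.0]}, index=idx)
offset_df = pd.DataFrame({'val': [1.5, 1.5, -0.5, -0.5]}, index=idx)

# Addition aligns on the index labels, not on row position,
# so no explicit join keys appear in the model code.
forecast = temp_df + offset_df
```

The join, the column arithmetic, and the key bookkeeping that polars writes out explicitly are all implied by the shared index.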
Now imagine thousands of such operations, and you can see the necessity of pandas in models like this.