closed (u/closed) - Readit News

closed commented on Polars Cloud and Distributed Polars now available pola.rs/posts/polars-clou... · Posted by u/jonbaer

phailhaus · 6 days ago

The problem with the dataframe API is that whenever you want to change a small part of your logic, you usually have to rethink and rewrite the whole solution. It is too difficult to write reusable code. Too many functions that try to do too many things with a million kwargs that each have their own nuances. This is because these libraries tend to favor fewer keystrokes over composable design. So the easy stuff is easy and makes for pretty docs, but the hard stuff is obnoxious to reason through.

This article explains it pretty well: https://dynomight.net/numpy/

closed · 6 days ago

I have used NumPy, but I don't understand what it has to do with dataframe APIs.

Take two dataframe APIs, dplyr and Ibis. Both can run on a range of SQL backends because dataframe APIs are very similar to SQL DML.
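To illustrate why dataframe operations map so directly onto SQL, here is a minimal sketch of a toy lazy dataframe that records `filter` and `select` calls and compiles them into one `SELECT` statement. The `LazyTable` class is hypothetical and is not how dplyr or Ibis are implemented; it also ignores details a real translator must handle, such as filtering on a column that an earlier `select` dropped.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyTable:
    """Hypothetical toy dataframe API that records operations and emits SQL."""

    name: str
    columns: tuple[str, ...] = ("*",)
    filters: tuple[str, ...] = ()

    def select(self, *cols: str) -> "LazyTable":
        # Each method returns a new object, so calls can be chained.
        return replace(self, columns=cols)

    def filter(self, condition: str) -> "LazyTable":
        # Accumulate row conditions; they become a WHERE clause.
        return replace(self, filters=self.filters + (condition,))

    def to_sql(self) -> str:
        # Translate the recorded operations into a single SELECT statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.name}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql


query = LazyTable("penguins").filter("year > 2008").select("species", "body_mass_g")
```

Calling `query.to_sql()` yields `SELECT species, body_mass_g FROM penguins WHERE year > 2008`. Swapping the `to_sql` step for a different dialect is, roughly, what lets one dataframe API target many backends.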

Moreover, the tools in R that translate pivot_longer() to SQL are a good illustration of the complex, dynamic operations dataframe APIs can support, which you'd otherwise need something like dbt to implement in your SQL models. DuckDB allows dynamic column selection in UNPIVOT, but in some SQL dialects this is impossible. Tools that translate dataframe APIs to SQL (or dbt) make these operations possible in those dialects.
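A minimal pure-Python sketch of what "dynamic column selection" in an unpivot means: which columns get stacked is decided by a rule evaluated against the column names at run time, rather than by a list written out in the query. The `unpivot` helper and its predicate argument are hypothetical, loosely modeled on `pivot_longer(cols = starts_with("score_"))`; they are not the API of tidyr, DuckDB, or Polars.

```python
from typing import Callable

Row = dict[str, object]


def unpivot(
    rows: list[Row],
    select: Callable[[str], bool],
    names_to: str = "name",
    values_to: str = "value",
) -> list[Row]:
    """Stack every column whose name matches `select` into name/value pairs.

    Hypothetical helper: the stacked columns are chosen by a predicate at
    run time, so new matching columns are picked up without editing code.
    """
    if not rows:
        return []
    # Decide which columns to stack from the data's own column names.
    stacked = [col for col in rows[0] if select(col)]
    kept = [col for col in rows[0] if col not in stacked]
    out: list[Row] = []
    for row in rows:
        for col in stacked:
            new_row = {k: row[k] for k in kept}
            new_row[names_to] = col
            new_row[values_to] = row[col]
            out.append(new_row)
    return out


wide = [
    {"id": 1, "score_2023": 10, "score_2024": 12},
    {"id": 2, "score_2023": 7, "score_2024": 9},
]
long = unpivot(wide, select=lambda c: c.startswith("score_"))
```

If a `score_2025` column were added to `wide`, the same call would stack it too. A dialect that requires every column name to be spelled out in the query can't express that rule directly, which is the gap translation tools fill by expanding the selection before emitting SQL.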

closed commented on What makes code hard to read: Visual patterns of complexity (2023) seeinglogic.com/posts/vis... · Posted by u/homarp

memhole · 6 months ago

Good to know. I assumed it was all done via objects or things like objects.

So is piping more functional programming?

closed · 6 months ago

I think it's often a syntax convenience. For example, Polars and pandas both have DataFrame.pipe(...) methods that create the same effect, but they're a bit cumbersome to write.

Here's a comparison:

* Method chaining: `df.pipe(f1, a=1, b=2).pipe(f2, c=1)`

* Pipe syntax: `df |> f1(a=1, b=2) |> f2(c=1)`
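A minimal sketch of what a `.pipe()` method does, using a toy `Frame` class (hypothetical, standing in for a Polars or pandas DataFrame): it calls the given function with the frame as its first argument and forwards any remaining arguments. That is why `df.pipe(f1, a=1, b=2)` is equivalent to `f1(df, a=1, b=2)`, and why a pipe operator would mostly remove the `.pipe(...)` wrapper. The steps `f1` and `f2` are hypothetical too.

```python
from typing import Any, Callable


class Frame:
    """Toy stand-in for a DataFrame; holds a list of numbers."""

    def __init__(self, values: list[int]) -> None:
        self.values = values

    def pipe(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Call func with this frame as the first argument, like DataFrame.pipe.
        return func(self, *args, **kwargs)


def f1(df: Frame, a: int, b: int) -> Frame:
    # Hypothetical step: scale each value by a, then add b.
    return Frame([v * a + b for v in df.values])


def f2(df: Frame, c: int) -> Frame:
    # Hypothetical step: skip the first c rows.
    return Frame(df.values[c:])


df = Frame([0, 1, 2, 3])
chained = df.pipe(f1, a=1, b=2).pipe(f2, c=1)
nested = f2(f1(df, a=1, b=2), c=1)  # the same calls, written inside-out
```

Both versions produce a frame holding `[3, 4, 5]`. The chained form reads in the order the steps run; the nested form has to be read from the inside out.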

closed commented on What makes code hard to read: Visual patterns of complexity (2023) seeinglogic.com/posts/vis... · Posted by u/homarp

memhole · 6 months ago

For anyone interested in this as a design pattern, it's called method chaining.

closed · 6 months ago

I think piping and method chaining are a little bit different.

Piping generally chains functions by passing the result of one call into the next (e.g. the result becomes the first argument of the next call).

Method chaining, as in Python, can't do this through syntax alone, because methods live on an object. Pipes work on any function, not just an object's methods (which can only chain to other methods, not to an arbitrary function whose first argument could accept that object).

For example, if you access Polars.DataFrame.style it returns a great_tables.GT object. But in a piping world, we wouldn't have had to add a style property that just calls GT() on the data. With a pipe, people would just be able to pipe their DataFrame to GT().
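Python has no `|>` operator, but a plain helper function can approximate piping for any callable, not just methods. The sketch below uses a hypothetical `pipe(value, *steps)` helper and a hypothetical `make_table()` standing in for a table builder such as `GT()`; neither is part of Polars or Great Tables. The point is that `make_table` can sit in the chain without being attached to the data object as a property.

```python
from functools import reduce
from typing import Any, Callable


def pipe(value: Any, *steps: Callable[[Any], Any]) -> Any:
    """Pass value through each step in order: steps[-1](...steps[0](value))."""
    return reduce(lambda acc, step: step(acc), steps, value)


def drop_missing(rows: list[dict]) -> list[dict]:
    # Hypothetical cleaning step: remove rows containing a None value.
    return [r for r in rows if None not in r.values()]


def make_table(rows: list[dict]) -> str:
    # Hypothetical stand-in for a table builder such as GT(): render rows
    # as comma-separated text, header line first.
    header = ",".join(rows[0])
    body = [",".join(str(v) for v in r.values()) for r in rows]
    return "\n".join([header, *body])


data = [{"x": 1, "y": "a"}, {"x": None, "y": "b"}, {"x": 3, "y": "c"}]
table = pipe(data, drop_missing, make_table)
```

Here `table` is `"x,y\n1,a\n3,c"`. Steps that need extra arguments can be wrapped with `functools.partial` or a lambda.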

closed commented on Revisiting Stereotype Threat speakandregret.michaelinz... · Posted by u/systemstops

disconap · 9 months ago

I participated as a subject in a research study at Stanford involving race and stereotype threat in the early 2000s. The details are hazy, but the final readout was how far I placed my chair from a group of chairs where students of a particular racial group were supposed to sit. Evidently I put it in a position that was contrary to the effect the researcher was seeking. She intensely asked me a ton of questions about my background and eventually tossed my data point because I had grown up in a racially diverse area. This wasn't a pre-specified inclusion criterion; it was possibly an act of scientific fraud. It's a huge bummer, since there are honest people in every profession, and I imagine a lot of them didn't succeed the way the fraudsters did.

closed · 9 months ago

Sounds like this study (published in 2008):

"The space between us: stereotype threat and distance in interracial contexts"

It mentions being run at Stanford and was pretty popular (Claude Steele discussed it in his book Whistling Vivaldi).

https://psycnet.apa.org/doiLanding?doi=10.1037%2F0022-3514.9...

closed commented on A data table thousands of years old (2020) datafix.com.au/BASHing/20... · Posted by u/rickcarlino

closed · 9 months ago

It's neat to see tablets discussed in the context of modern tools. I recently helped edit an article for Great Tables[1] that discusses the history of tables like this, and Hannes mentioned a proto-cuneiform tablet in his DuckDB keynote at posit::conf()[2].

There's something really inspiring about realizing how far back tables go.

[1]: https://posit-dev.github.io/great-tables/blog/design-philoso...

[2]: https://youtu.be/GELhdezYmP0?si=bSISmFjeRpKxfLWq

closed commented on The Opposite of Documentation is Superstition (2020) buttondown.com/hillelwayn... · Posted by u/BerislavLopac

closed · 9 months ago

This is an interesting case, since the pigeon study is about what happens when the underlying process is random.

But if the shape-drawing process isn't random, I think the author's experience of feeling unable to articulate the rules AND gravitating to a set of behaviors is a good example of procedural memory (implicit vs. explicit memory).

Explicit rules would probably help speed things up, though!

closed commented on Research in psychology: are we learning anything? experimental-history.com/... · Posted by u/ctoth

godelski · a year ago

I think one of the great ironies is that psychology is one of the hardest sciences but is treated as so soft. I say this holding a degree in physics! (undergrad physics, grad CS/ML)

By this I mean that to make confident predictions you need some serious statistics, but psych is one of the least math-heavy sciences (thankfully they recently learned about Bayes and there's a revolution going on). Unlike in physics or chemistry, you have so little control over your experiments.

There's also the problem of measurements. We stress in experimental physics that you can only measure things by proxy. When you measure distance with a ruler, you're not really measuring "a meter" but the ruler's approximation of a meter. This is why we care so much about calibration and uncertainty, and about making multiple measurements with different measuring devices (to get statistics on that class of device) and with different measuring techniques (e.g. ruler, laser range finder, etc.). But psych? What the fuck does it even mean "to measure attention"?! It's hard enough dealing with the fact that "a meter" is "a construct", but in psych your concepts are much less well defined (i.e. higher uncertainty). And then everything is just empirical?! Causal models barely even attempted?! (In case you've ever wondered, this is a glimpse of why physicists struggle in ML: not because of the work, but because of accepting the results. See also Dyson and von Neumann's elephant.)
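To make the calibration point concrete, here is one textbook way physicists combine measurements of the same quantity from instruments of different precision: the inverse-variance weighted mean. This technique is swapped in for illustration and is not from the comment; the readings below are made-up numbers, and the method assumes independent errors with known standard uncertainties $\sigma_i$. With weights $w_i = 1/\sigma_i^2$, the estimate is $\bar{x} = \sum_i w_i x_i / \sum_i w_i$ with uncertainty $\sigma_{\bar{x}} = 1/\sqrt{\sum_i w_i}$.

```python
import math


def weighted_mean(readings: list[tuple[float, float]]) -> tuple[float, float]:
    """Combine (value, standard_uncertainty) pairs by inverse-variance weighting.

    Each reading gets weight 1 / sigma**2, so more precise instruments count
    more. Returns the combined estimate and its standard uncertainty.
    """
    weights = [1.0 / sigma**2 for _, sigma in readings]
    total = sum(weights)
    mean = sum(w * x for w, (x, _) in zip(weights, readings)) / total
    return mean, 1.0 / math.sqrt(total)


# Illustrative (made-up) readings of one distance, in metres:
readings = [
    (1.002, 0.002),    # ruler: larger uncertainty
    (1.0004, 0.0005),  # laser range finder: smaller uncertainty
]
mean, sigma = weighted_mean(readings)
```

The combined estimate lands close to the more precise laser reading, and its uncertainty is smaller than either instrument's alone. The catch the comment points at is that this only works when each instrument's uncertainty is itself well calibrated, which is exactly what is hard to establish for a construct like attention.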

I've jokingly likened psych to alchemy, meaning proto-chemistry (chemistry prior to the atomic model; chemistry is "the study of electrons"), or to astrology (astronomy pre-Kepler, not the astrology we see today). I do think that's where the field is at, because there are no fundamental laws. That doesn't mean it isn't useful. Copernicus, Brahe, Galileo (a contemporary of Kepler; they fought), and many others did amazing work and are essential figures in astronomy and astrophysics today. But psych is in an interesting boat. There are many tools at its disposal that could really help it make major strides towards determining these "laws". But it'll take a serious revolution and a major push to build some extremely tough math chops to get there. It likely won't come from ML (which suffers from similar issues of rigor), but maybe from neuroscience or plain old stats (econ surprisingly contributes, though more to sociology). My worry is that the slop has too much momentum and that criticism will be dismissed because it is viewed as saying that the researchers are lazy, dumb, or incompetent, rather than as pointing to the monumental difficulties that are natural to the field (though both may be true, and one can cause the other). But I do hope to see it, especially as someone in ML. We can really see the need to pin down concepts such as cognition, consciousness, intelligence, reasoning, emotions, desire, thinking, will, and so on. These are not remotely easy problems to solve. But it is easy to convince yourself that you do understand, as long as you stop asking why after a certain point.

And I do hope these conversations continue. Light is the best disinfectant. Science is about seeking truth, not answers. That often requires a lot of nuance, unfortunately. I know it will cause some to distrust science more, but I have the feeling they were already looking for reasons to.

closed · a year ago

As someone who studied statistics and psychology, I'm very surprised by this take, for a few reasons:

1. Many of the early pioneers in statistics were psychologists.

2. The econ x psych connection is strong (e.g. econometrics and psychometrics have a lot in common and know of each other).

3. Many of the people I see with math chops trying to do psychology are bad at the philosophy side (e.g. what is a construct, and how do constructs like intelligence get established?).

closed commented on Increasing Retention Without Increasing Study Time [pdf] files.eric.ed.gov/fulltex... · Posted by u/JustinSkycak

CuriouslyC · a year ago

This is pretty much the current state of the art in learning research: https://bjorklab.psych.ucla.edu/research/

closed · a year ago

From a more applied angle, a book like "Ten Steps to Complex Learning" might be helpful.

I come from a cog psych background similar to the Bjork Lab's, so I'm a big fan of their research, but books like Ten Steps come from instructional design, which is more focused on the big picture (designing a whole course vs. individual mechanisms).

closed commented on The design philosophy of Great Tables posit-dev.github.io/great... · Posted by u/randyzwitch

closed · a year ago

Hey, one of the co-maintainers of Great Tables (along with Rich Iannone) here!

I just wanted to say that Rich is the only software developer I know who, when asked to lay out the philosophy of his package, would give you 5,000 years of history on the display of tables. :)