ketozhang (u/ketozhang)

ketozhang commented on Read to forget mo42.bearblog.dev/read-to... · Posted by u/diymaker

HPsquared · 6 months ago

That's the idea behind "Getting Things Done" (GTD)

ketozhang · 6 months ago

GTD adds the requirement that you build a system of reminders/follow-ups. It's great practice for being okay with forgetting things and trusting your tracking system.

ketozhang commented on Polars Cloud and Distributed Polars now available pola.rs/posts/polars-clou... · Posted by u/jonbaer

drej · 6 months ago

Having done a bit of data engineering in my day, I'm growing more and more allergic to the DataFrame API (which I used 24/7 for years). From what I've seen over the past ~10 years, 90+% of use cases would be better served by SQL, both from the development perspective as well as debugging, onboarding, sharing, migrating etc.

Give an analyst AWS Athena, DuckDB, Snowflake, whatever, and they won't have to worry about looking up what m6.xlarge is and how it's different from c6g.large.

ketozhang · 6 months ago

I think your argument focuses mostly on the scenario where you already have clean data (i.e., a data warehouse). There, I and many other data engineers agree: you're better off hosting it in a SQL RDBMS.

However, before that, you need a lot of code to clean the data, and raw data does not fit well into a structured RDBMS. Here you have to map your raw data into either a row view or a table view. That leaves you choosing between inventing your own domain objects (row view) or using a dataframe (table view).
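A minimal sketch of the two options using only the standard library. The record fields, `clean`, `Payment`, `to_rows` and `to_columns` are hypothetical, and a plain dict of lists stands in for a real dataframe (Polars, pandas), which stores data column by column in the same spirit.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical raw input: string-typed fields, an inconsistent key, a blank value.
RAW = [
    {"user": "alice", "amount": "12.50"},
    {"user": "bob", "amount": ""},
    {"User": "carol", "amount": "7"},
]


def clean(rec: dict) -> tuple[str, Optional[float]]:
    """Shared cleaning rules: normalize the key, parse the amount, blank -> None."""
    user = rec.get("user") or rec.get("User")
    amount = float(rec["amount"]) if rec.get("amount") else None
    return user, amount


@dataclass
class Payment:
    """Row view: a hand-rolled domain object, one instance per record."""
    user: str
    amount: Optional[float]


def to_rows(raw: list[dict]) -> list[Payment]:
    return [Payment(*clean(rec)) for rec in raw]


def to_columns(raw: list[dict]) -> dict[str, list]:
    """Table view: one list per column, the layout a dataframe is built on."""
    columns: dict[str, list] = {"user": [], "amount": []}
    for rec in raw:
        user, amount = clean(rec)
        columns["user"].append(user)
        columns["amount"].append(amount)
    return columns
```

With the row view, behavior lives as methods on `Payment`; with the table view, column-wise operations such as `sum(a for a in cols["amount"] if a is not None)` come naturally, which is the kind of operation a dataframe library is designed around.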

ketozhang commented on Where's the shovelware? Why AI coding claims don't add up mikelovesrobots.substack.... · Posted by u/dbalatero

noodletheworld · 6 months ago

> people are going to get a variety of results.

Yes, but the point of this article is surely that on average if it's working, there would be obvious signs of it working by now.

Even if there are statistical outliers (e.g., 10x productivity using the tools), if on average it does nothing for developer productivity, something isn't working as promised.

ketozhang · 6 months ago

We need long-running averages, and 2023-2025 is still too early to conclude it's not effective. I'd argue the barriers to entry in 2023 and 2024 were too high for inexperienced developers to start churning out software. Seasoned developers were still skeptical, and company adoption wasn't there yet (and still isn't).

ketozhang commented on Where's the shovelware? Why AI coding claims don't add up mikelovesrobots.substack.... · Posted by u/dbalatero

ketozhang · 6 months ago

The data is surprising. However, I do wish this article had looked carefully into barriers to entry, as they could explain the lack of an increase in your data.

For example, on Steam it costs $100 to release a game. You can extend your game with what's called DLC, and that costs $0 to release. If I were building shovelware, especially with AI-generated content, I'd be more keen to make a single game with a bunch of DLC.
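The arithmetic behind that incentive, as a small sketch using the fees stated above ($100 per game, $0 per DLC). `release_cost` and `new_game_count` are hypothetical names; the point is only that bundling content as DLC costs less and shows up as a single new game in any statistic that counts new game releases.

```python
# Fees as stated in the comment (USD); treat them as assumptions.
GAME_RELEASE_FEE = 100
DLC_RELEASE_FEE = 0


def release_cost(n_items: int, bundle_as_dlc: bool) -> int:
    """Cost of shipping n_items pieces of content.

    Either every item is its own game, or one base game is released and the
    remaining n_items - 1 items ship as its DLC.
    """
    if n_items <= 0:
        return 0
    if bundle_as_dlc:
        return GAME_RELEASE_FEE + (n_items - 1) * DLC_RELEASE_FEE
    return n_items * GAME_RELEASE_FEE


def new_game_count(n_items: int, bundle_as_dlc: bool) -> int:
    """How many new games a release-count statistic would see."""
    if n_items <= 0:
        return 0
    return 1 if bundle_as_dlc else n_items
```

For 20 pieces of content, that is $2,000 and 20 visible releases as separate games versus $100 and one visible release as a game plus DLC.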

For game development, integrating AI into engines is another barrier. There aren't many engines that give AI an interface to work with. The obvious interface is games that can be built entirely in code (e.g., pygame; even Godot is a big stretch).

ketozhang commented on uv: An extremely fast Python package and project manager, written in Rust github.com/astral-sh/uv... · Posted by u/chirau

incognito124 · 9 months ago

uv is almost perfect. my only pet peeve is updating dependencies. sometimes I just want to go "uv, bump all my dependencies to the latest versions possible while respecting their constraints". I still haven't found an elegant way to do this, but I have written a script that parses pyproject.toml, removes the deps, and invokes `uv add --upgrade` with them.

other than that, it's invaluable to me, with the best features being uvx and PEP 723

ketozhang · 9 months ago

You could either delete the .venv and recreate it or run `uv pip install --upgrade .`

Much prefer not thinking about venvs.

ketozhang commented on The Dunning-Kruger effect is autocorrelation economicsfromthetopdown.c... · Posted by u/ljosifov

crazygringo · 2 years ago

Yup. Assuming the sample sizes are statistically significant, the original paper clearly shows:

- On average, people estimate their ability around the 65th percentile (actual results) rather than the 50th (simulated random results) -- a significant difference

- That people's self-estimation increases with their actual ability, but only by a surprisingly small degree (actual results show a slight upwards trend, simulated random results are flat) -- another significant difference

The author's entire discussion of "autocorrelation" is a red herring that has nothing to do with anything. Their randomly-generated results do not match what the original paper shows.

None of this really sheds much light on to what degree the results can be or have been robustly replicated, of course. But there's nothing inherently problematic whatsoever about the way it's visualized. (It would be nice to see bars for variance, though.)

ketozhang · 2 years ago

The autocorrelation is important because it shows that transforming independent variables into a D-K plot will always give you the D-K effect.

However, the focus on autocorrelation is not very illuminating. We can explain the behaviors found quite easily:

- If everyone's self-assessment scores are (uniformly) random guesses, then the average self-assessment score for any quantile is 50%. Naturally, then, those in the lower quantiles (less skilled) are overestimating.

- If self-assessment and actual scores are proportionally dependent, then the average of each quantile is always at least its quantile value. This is the D-K effect, which gets weaker as the correlation grows.

- The opposite is true for a disproportional relation.

So, the D-K plot is extremely sensitive to correlations and can easily exaggerate even the weakest of correlations.
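A small simulation of the first bullet and of how the pattern changes with dependence, as a sketch under stated assumptions: actual scores are uniform on 0-100 (standing in for percentiles), and the self-estimate is a hypothetical blend `weight * actual + (1 - weight) * noise`. This is not the original paper's data or method.

```python
import random
from statistics import mean


def quartile_means(actual, self_est):
    """Group people into quartiles by actual score.

    Returns (mean actual, mean self-estimate) for each quartile, lowest first.
    """
    pairs = sorted(zip(actual, self_est))
    n = len(pairs)
    quartiles = [pairs[i * n // 4:(i + 1) * n // 4] for i in range(4)]
    return [(mean(a for a, _ in q), mean(s for _, s in q)) for q in quartiles]


def simulate(weight, n=10_000, seed=0):
    """Self-estimate = weight * actual + (1 - weight) * uniform noise (0-100 scale).

    weight=0 is pure guessing (independent variables); larger weights mean the
    self-estimate depends more strongly on the actual score.
    """
    rng = random.Random(seed)
    actual = [rng.uniform(0, 100) for _ in range(n)]
    self_est = [weight * a + (1 - weight) * rng.uniform(0, 100) for a in actual]
    return quartile_means(actual, self_est)


if __name__ == "__main__":
    for w in (0.0, 0.5):
        print(w, [(round(a), round(s)) for a, s in simulate(w)])
```

With `weight=0` every quartile's mean self-estimate sits near 50, so the bottom quartile (actual mean near 12.5) "overestimates" and the top quartile "underestimates" even though the variables are independent. With `weight=0.5` the gap shrinks but does not vanish.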

ketozhang commented on The Dunning-Kruger effect is autocorrelation economicsfromthetopdown.c... · Posted by u/ljosifov

snarkconjecture · 2 years ago

Nonstandard terminology warning: the author is using "autocorrelation" in a way I've never seen before. There is a much more common usage of "autocorrelation" to refer to the correlation of a timeseries with itself (shifted by some amount).

If you use autocorrelation to refer to the thing in OP, you'll probably confuse people who know statistics, and vice versa.

ketozhang · 2 years ago

The more common encounter with autocorrelation is in time series, but what the author said holds even in that context. A time-series autocorrelation relates the same time-series function at different times. At its simplest, you plot the array X against itself, where X[i] = f(t[i]). You can then complicate it further with some transformation, plotting g(X) vs X (e.g., a moving average).
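For comparison, the time-series sense as a minimal sketch: the sample autocorrelation at lag k compares X[i] with X[i + k], so lag 0 is the "X vs X" case. The function name and the normalization by the full-series variance are one common convention, chosen here as an assumption.

```python
from statistics import fmean


def autocorrelation(x, lag):
    """Sample autocorrelation of series x at the given lag.

    Compares x[i] with x[i + lag], i.e. the same series at two different
    times; lag=0 is the "X vs X" case and always gives 1.0.
    """
    if not 0 <= lag < len(x):
        raise ValueError("lag must be in [0, len(x))")
    m = fmean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[i] - m) * (x[i + lag] - m) for i in range(len(x) - lag))
    return num / denom
```

For an alternating series such as `[1.0, -1.0] * 50`, lag 1 gives a strongly negative value (-0.99) and lag 2 a strongly positive one (0.98).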

ketozhang commented on Fast self-hostable open-source workflow engine windmill.dev/blog/launch-... · Posted by u/rubenfiszel

ketozhang · 2 years ago

Did you guys consider existing standards for representing workflow definitions before choosing OpenFlow? For example, the Common Workflow Language.

ketozhang commented on Use Timestamps jankremer.eu/micro/timest... · Posted by u/jankremer

lijok · 2 years ago

Few things irk me as much as systems that show you "N hours/minutes/seconds ago" instead of the timestamp. GitHub for example, of all systems, should know better. Trying to write up a report of any sort and not having access to accurate timestamps is very annoying.

ketozhang · 2 years ago

Hover your mouse over those and you should see the absolute date. Some, if not many, of them use HTML `<time>` tags.
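A minimal sketch of the pattern, assuming a server-rendered HTML `<time>` element: its `datetime` attribute carries the machine-readable timestamp, and a `title` attribute supplies the hover tooltip. `relative_label` and `time_tag` are hypothetical helpers, not GitHub's or any other site's actual code.

```python
from datetime import datetime, timezone
from html import escape

# (unit name, length in seconds), largest first
_UNITS = (("day", 86400), ("hour", 3600), ("minute", 60), ("second", 1))


def relative_label(ts: datetime, now: datetime) -> str:
    """Render the 'N units ago' text a reader sees by default."""
    seconds = int((now - ts).total_seconds())
    for unit, size in _UNITS:
        if seconds >= size:
            n = seconds // size
            return f"{n} {unit}{'' if n == 1 else 's'} ago"
    return "just now"


def time_tag(ts: datetime, now: datetime) -> str:
    """Wrap the relative label in a <time> element that keeps the absolute time.

    datetime= holds the ISO 8601 timestamp; title= is shown as a tooltip on hover.
    """
    absolute = ts.strftime("%Y-%m-%d %H:%M %Z")
    return (
        f'<time datetime="{escape(ts.isoformat())}" title="{escape(absolute)}">'
        f"{escape(relative_label(ts, now))}</time>"
    )


if __name__ == "__main__":
    posted = datetime(2023, 5, 1, 12, 0, tzinfo=timezone.utc)
    viewed = datetime(2023, 5, 1, 15, 30, tzinfo=timezone.utc)
    print(time_tag(posted, viewed))
    # <time datetime="2023-05-01T12:00:00+00:00" title="2023-05-01 12:00 UTC">3 hours ago</time>
```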

ketozhang commented on SciPy builds for Python 3.12 on Windows are a minor miracle labs.quansight.org/blog/b... · Posted by u/todsacerdoti

aj7 · 2 years ago

Does anyone else find the handling of arrays by Python so horrific that they can’t bring themselves to use it?

ketozhang · 2 years ago

It's not popular because you're mostly hearing from the scientific community, who want more features in their arrays (vectors/matrices/tensors).

Why would you want to use C-like arrays in Python anyways?
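For reference, the standard library does ship a C-like typed array in the `array` module. A short sketch of how it differs from a list; the variable names and the `try_append` helper are illustrative only.

```python
from array import array

# A list holds references to arbitrary Python objects.
mixed = [1.5, "two", None]

# array.array stores raw C values of a single type code; 'd' is a C double.
prices = array("d", [1.5, 2.0, 3.25])


def try_append(arr: array, value) -> bool:
    """Append value if it fits the array's C type; report whether it did."""
    try:
        arr.append(value)
        return True
    except TypeError:
        return False
```

`prices.itemsize` reports the bytes per element (8 for a double on common platforms), and appending a string is rejected, which is the C-like behavior; the richer vector/matrix/tensor features people ask for come from libraries such as NumPy instead.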