quodlibetor (u/quodlibetor)

quodlibetor commented on Polars Cloud and Distributed Polars now available pola.rs/posts/polars-clou... · Posted by u/jonbaer

bobbylarrybobby · 3 days ago

You'd still have problems.

1. There are no variadic functions so you need to take a tuple: `|(Col<i32>("a"), Col<f64>("b"))|`

2. Turbofish! `|(Col::<i32>("a"), Col::<f64>("b"))|`. This is already getting quite verbose.

3. This needs to be general over all expressions (such as `col("a").str.to_lowercase()`, `col("b") * 2`, etc), so while you could pass a type such as Col if it were IntoExpr, its conversion into an expression would immediately drop the generic type information because Expr doesn't store that (at least not in a generic parameter; the type of the underlying series is always discovered at runtime). So you can't really skip those `.i32()?` calls.

Polars definitely made the right choice here — if Expr had a generic parameter, then you couldn't store Expr of different output types in arrays because they wouldn't all have the same type. You'd have to use tuples, which would lead to abysmal ergonomics compared to a Vec (can't append or remove without a macro; need a macro to implement functions for tuples up to length N for some gargantuan N). In addition to the ergonomics, Rust’s monomorphization would make compile times absolutely explode if every combination of input Exprs’ dtypes required compiling a separate version of each function, such as `with_columns()`, which currently is only compiled separately for different container types.

The reason web frameworks can do this is because of `$( $ty: FromRequestParts<S> + Send, )*`. All of the tuple elements share the generic parameter `S`, which would not be the case in Polars — or, if it were, would make `map` too limited to be useful.

quodlibetor · a day ago

Thanks for the insight!

quodlibetor commented on Polars Cloud and Distributed Polars now available pola.rs/posts/polars-clou... · Posted by u/jonbaer

bobbylarrybobby · 3 days ago

The issue with Rust is that as a strict language with no function overloading (except via traits) or keyword arguments, things get very verbose. For instance, in Python you can treat a string as a list of columns as in `df.select('date')` whereas in Rust you need to write `df.select([col("date")])`. Say you want to map a function over three columns; it's going to look something like this:

```
df.with_column(
    map_multiple(
        |columns| {
            // Downcast each input column to its concrete type at runtime.
            let col1 = columns[0].i32()?;
            let col2 = columns[1].str()?;
            let col3 = columns[2].f64()?;
            col1.into_iter()
                .zip(col2)
                .zip(col3)
                .map(|((x1, x2), x3)| {
                    // A null in any input yields a null in the output.
                    let (x1, x2, x3) = (x1?, x2?, x3?);
                    Some(func(x1, x2, x3))
                })
                .collect::<StringChunked>()
                .into_column()
        },
        [col("a"), col("b"), col("c")],
        GetOutput::from_type(DataType::String),
    )
    .alias("new_col"),
);
```

Not much polars can do about that in Rust, that's just what the language requires. But in Python it would look something like

```
df.with_columns(
    pl.struct("a", "b", "c")
    .map_elements(
        lambda row: func(row["a"], row["b"], row["c"]),
        return_dtype=pl.String
    )
    .alias("new_col")
)
```

Obviously the performance is nowhere close to comparable because you're calling a python function for each row, but this should give a sense of how much cleaner Python tends to be.
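
For anyone who doesn't read Rust, here is a rough plain-Python sketch of what the closure above does per row: zip the three columns and emit a null whenever any input is null. It uses lists in place of Polars columns, and `map_rows` is a made-up helper name, not part of any Polars API.

```python
from typing import Any, Callable, List, Optional, Sequence


def map_rows(func: Callable[..., Any], *columns: Sequence[Optional[Any]]) -> List[Optional[Any]]:
    """Apply func row by row; a None in any input gives None in the output."""
    if len({len(c) for c in columns}) > 1:
        raise ValueError("all columns must have the same length")
    out: List[Optional[Any]] = []
    for row in zip(*columns):
        # Mirrors `(x1?, x2?, x3?)` in the Rust closure: bail out on any null.
        if any(value is None for value in row):
            out.append(None)
        else:
            out.append(func(*row))
    return out


a = [1, None, 3]
b = ["x", "y", "z"]
c = [0.5, 1.5, 2.5]
new_col = map_rows(lambda x1, x2, x3: f"{x1}-{x2}-{x3}", a, b, c)
```

Here `new_col` has a null in the second position because `a` is null there.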

quodlibetor · 3 days ago

> Not much polars can do about that in Rust

I'm ignorant about the exact situation in Polars, but it seems like this is the same problem that web frameworks have to handle to enable registering arbitrary functions, and they generally do it with a FromRequest trait and macros that implement it for functions of up to N arguments. I'm curious if there were attempts that failed for something like FromDataframe to enable at least `|c: Col<i32>("a"), c2: Col<f64>("b")| {...}`

https://github.com/tokio-rs/axum/blob/86868de80e0b3716d9ef39...

quodlibetor commented on Hurricane category 6 could be introduced under new storm severity scale livescience.com/planet-ea... · Posted by u/geox

varenc · 7 days ago

I believe this is the real paper for those curious: https://pure.lib.usf.edu/ws/portalfiles/portal/40758246/Adeq...

This new rating system uses the old system and 2 new rating categories

   Wind (from old system, 1min sustained speeds)
   Cat 1: 33–42 m/s (~74–95 mph)
   Cat 2: 43–49 m/s (~96–110 mph)
   Cat 3: 50–58 m/s (~111–129 mph)
   Cat 4: 59–69 m/s (~130–156 mph)
   Cat 5: >70 m/s   (>157 mph)
   
   Storm surge (peak surge height above tide)
   Cat 1: 0.75–1.54 m
   Cat 2: 1.55–2.34 m
   Cat 3: 2.35–3.14 m
   Cat 4: 3.15–3.99 m
   Cat 5: >4.00 m
   
   Accumulated rainfall (event total)
   Cat 1: 100–262 mm
   Cat 2: 263–425 mm
   Cat 3: 426–588 mm
   Cat 4: 589–749 mm
   Cat 5: >750 mm
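
A minimal sketch of turning a measurement into a per-hazard category using the bins listed above. The assumptions are mine, not the paper's: values below the Cat 1 bound map to 0, a value falling in the small gap between two listed bins gets the lower category, and the Cat 5 bound is treated as inclusive even though the listing writes ">". `LOWER_BOUNDS` and `hazard_category` are illustrative names.

```python
from bisect import bisect_right

# Lower bound of Cat 1..5 for each hazard, copied from the listing above.
LOWER_BOUNDS = {
    "wind_ms": [33, 43, 50, 59, 70],
    "surge_m": [0.75, 1.55, 2.35, 3.15, 4.00],
    "rain_mm": [100, 263, 426, 589, 750],
}


def hazard_category(hazard: str, value: float) -> int:
    """Return 0 (below Cat 1) through 5 for one hazard measurement."""
    # bisect_right counts how many lower bounds are <= value.
    return bisect_right(LOWER_BOUNDS[hazard], value)


print(hazard_category("wind_ms", 45))   # wind of 45 m/s
print(hazard_category("surge_m", 3.2))  # surge of 3.2 m
print(hazard_category("rain_mm", 800))  # 800 mm of rain
```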

quodlibetor · 7 days ago

And the following criteria:

(a) The final category can never be lower than the highest hazard-based category;

(b) The TCSS should adequately reflect the case of high potential risk of two or more hazards. We consider a hazard of high risk when its respective category is classified as 3 or higher (equal to the definition for a Major Hurricane on the SSHWS). Whenever (at least) two high risk hazards have the same category value and the third hazard has a lower category value, the final category should increment the highest hazard-based category. This implies that a TC scoring a Category 3 on both wind and storm surge, and a Category 1 on rainfall, will be classified as a Category 4.

(c) To warn the general public for an event with multiple extreme hazards, a high-risk TC can be classified as a Category 6 when either 1. at least two of the hazard-based categories are of Category 5; or 2. two categories are of Category 4, and one of Category 5.
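
Here is how I read criteria (a) through (c) as code. It is a sketch of my interpretation, not the paper's reference implementation. One assumption to flag: the quoted text for (b) only covers two equal high-risk categories with a lower third, and I also bump a three-way tie at Category 3 or 4, which the "(at least)" seems to suggest but does not state outright. `final_category` is a made-up name.

```python
def final_category(wind: int, surge: int, rain: int) -> int:
    """Combine three hazard categories (1-5) into one final category (up to 6)."""
    cats = sorted((wind, surge, rain))
    top = cats[-1]

    # (c) Category 6: at least two hazards at 5, or two at 4 and one at 5.
    if cats[1] == 5 or cats == [4, 4, 5]:
        return 6

    # (b) At least two high-risk hazards (>= 3) share the highest category:
    # increment it. ASSUMPTION: a three-way tie counts as well.
    if top >= 3 and cats[1] == top:
        return top + 1

    # (a) Otherwise the final category is the highest hazard category.
    return top


print(final_category(3, 3, 1))  # the paper's own example
print(final_category(4, 5, 4))  # two 4s and a 5
print(final_category(1, 2, 3))  # a single high-risk hazard
```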

quodlibetor commented on The anti-abundance critique on housing is wrong derekthompson.org/p/the-a... · Posted by u/rbanffy

ch4s3 · a month ago

Vacancy doesn’t mean units held empty as either a parking place for cash or held off the market. Vacancy happens when you’re painting and repairing between rentals. Vacancy happens when there’s a renovation. Things like that are normal and not nefarious. Having a 1.4% vacancy rate means there is essentially no usable housing for rent.

I was talking about the myth that there are tons of apartments held by rich people who don’t use them for anything.

quodlibetor · a month ago

My understanding is that vacancy means available units for rent. So, plausibly, if you say 50 of the 100 units in your building aren't available for rent because you say they're being painted then they don't contribute to the vacancy of your building.

That's almost the exact opposite of your definition, but I agree that a 1.4% vacancy rate means there's almost nothing available for rent.

I'm having trouble finding an official definition from a source that reports them, but my definition matches things that I can find online, eg https://www.brickunderground.com/rent/vacancy-rate-what-does...

quodlibetor commented on The anti-abundance critique on housing is wrong derekthompson.org/p/the-a... · Posted by u/rbanffy

tptacek · a month ago

1.4% vacancy in a housing market is extraordinarily low. Remember: there is structurally always some material amount of vacancy, because people vacate housing units well before new people move into them. This, by the way, is a stat whose interpretation you can just look up. Real estate people use it as a benchmark.

quodlibetor · a month ago

Yeah, I know it's among the lowest in the world, but it's still an ~order of magnitude higher than a few tenths of a percent, which would be shocking for the reasons you mention.

My point though was just that I've seen arguments that these numbers can be manipulated, and the city's own data doesn't make sense by itself: either the 1.4% number is wrong or the slowly recovering population estimate is wrong. Especially considering the 60,000 housing units (representing 2% growth) created.

quodlibetor commented on The anti-abundance critique on housing is wrong derekthompson.org/p/the-a... · Posted by u/rbanffy

ch4s3 · a month ago

Empty properties barely exist as a percentage of total housing supply in high cost of living areas in the US. You’re looking at no more than a few tenths of a percentage point of NYC’s more than 4 million units.

quodlibetor · a month ago

> a few tenths of a percentage point of NYC

Feb 2024 (the latest year with data, I think) was a record low and it was 1.4% empty, according to NYC[1].

But I don't really know the methodology, and according to other nyc gov data it's surprising, since we still haven't recovered our population from COVID[2].

The first statistic (housing pressure) is based on population growth, but the NYC population statistics suggest still meaningful population loss since 2020.

I have seen articles in the past that suggest that apartment vacancy rates in NYC are self-reported and misleading at best, but I don't really understand how that would work and I can't find any sources on that now.

It's also my understanding that some classes of landlords can mark empty apartments as income losses, basically or partially making up for the loss of revenue in tax rebates. But that's also not something I understand well, just something I have seen asserted.

[1]: https://www.nyc.gov/site/hpd/news/007-24/new-york-city-s-vac...

[2]: https://s-media.nyc.gov/agencies/dcp/assets/files/pdf/data-t...

quodlibetor commented on S5cmd: Parallel S3 and local filesystem execution tool github.com/peak/s5cmd... · Posted by u/polyrand

quodlibetor · 3 months ago

I recently wrote a similar tool focused more on optimizing the case of exploring millions or billions of objects when you know a few aspects of the path: https://github.com/quodlibetor/s3glob

It supports glob patterns like so, and will do smart filtering at every stage possible: */2025-0[45]-*/user*/*/object.txt
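
To make the pattern concrete, here is a small Python sketch of per-segment glob matching plus the kind of literal-prefix narrowing a tool like this could do before listing. It is not how s3glob is implemented; `key_matches` and `literal_prefix` are names invented for this example, and the keys are made up.

```python
from fnmatch import fnmatchcase

GLOB_CHARS = "*?["


def key_matches(pattern: str, key: str) -> bool:
    """Match an object key against a glob, one path segment at a time."""
    pattern_parts = pattern.split("/")
    key_parts = key.split("/")
    if len(pattern_parts) != len(key_parts):
        return False
    # Matching per segment keeps `*` from crossing a `/` boundary.
    return all(fnmatchcase(k, p) for p, k in zip(pattern_parts, key_parts))


def literal_prefix(pattern: str) -> str:
    """Longest leading part of the pattern with no glob characters."""
    for index, char in enumerate(pattern):
        if char in GLOB_CHARS:
            return pattern[:index]
    return pattern


pattern = "*/2025-0[45]-*/user*/*/object.txt"
print(key_matches(pattern, "eu/2025-04-17/user42/run1/object.txt"))
print(key_matches(pattern, "eu/2025-06-01/user42/run1/object.txt"))
print(literal_prefix("logs/2025-0[45]-*/object.txt"))
```

The literal prefix is what you could hand to a prefix-filtered list request so that only keys under it are enumerated; the remaining segments are then filtered with the glob.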

I haven't done real benchmarks, but it's parallel enough to hit S3 parallel request limits/file system open file limits when downloading.

quodlibetor commented on Some of us like "interdiff" code review gist.github.com/thoughtpo... · Posted by u/todsacerdoti

quodlibetor · a year ago

I have been chasing the gerrit code review high since I left a company that used it almost 5 years ago.

Stacked pull requests are usually what people point to to get this back, but this article points out that _just_ stacked pull requests don't handle it correctly. Specifically with github, you can't really see the differences in response to code review comments, you just get a new commit. Additionally, often github loses conversations on lines that have disappeared due to force pushes.

That said, I have a couple scripts that make it easier to work with stacks of PRs (the git-*stack scripts in[1]) and a program git-instafix[2] that makes amending old commits less painful. I recently found ejoffe/spr[3] which seems like a tool that is similar to my scripts but much more pleasant for working with stacked PRs.

There's also spacedentist/spr[4] which gets _much_ closer to gerrit-style "treat each commit like a change and make it easier for people to review responses" with careful branch and commit management. Changes don't create new commits locally, they only create new commits in the PR that you're working on. It's, unfortunately, got many more rough edges than ejoffe/spr and is less maintained.

[1]: https://github.com/quodlibetor/dotfiles/tree/main/dot_local/...

[2]: https://github.com/quodlibetor/git-instafix/

[3]: https://github.com/ejoffe/spr

[4]: https://github.com/spacedentist/spr

quodlibetor commented on Show HN: Inshellisense – IDE style shell autocomplete github.com/microsoft/insh... · Posted by u/cpendery

guessmyname · 2 years ago

> It is very possible to write sub 100ms procedures in TS, […]

I won’t dispute this statement since I currently lack the means to assess inshellisense. Would it be possible for you (or someone with a functional Node + NPM setup) to install inshellisense and share the actual performance figures? You could use a tool like hyperfine (https://github.com/sharkdp/hyperfine) for this purpose.

As an attempt to test this myself, I used a Docker image (version 21.1.0-bookworm from https://hub.docker.com/_/node/). The TypeScript tool installed without any issues, along with the binding, which simply adds the following line into ~/.bashrc:

    [ -f ~/.inshellisense/key-bindings.bash ] && source ~/.inshellisense/key-bindings.bash

However, when I initiated a new Bash session within the same Docker container to activate the updated Bash configuration, I encountered the following error:

    bash: /root/.inshellisense/key-bindings.bash: line 1: syntax error near unexpected token `$'{\r''
    'ash: /root/.inshellisense/key-bindings.bash: line 1: `__inshellisense__() {

Due to this issue, I am unable to perform a performance test using hyperfine.

The version of Bash available in this Docker image is 5.2.15(1)-release.

I verified that the content of /root/.inshellisense/key-bindings.bash is exactly the same as https://github.com/microsoft/inshellisense/blob/main/shell/k...

quodlibetor · 2 years ago

I'm pretty sure that the scripts generated by inshellisense are CRLF, and the carriage returns aren't recognized by unix shells.

You should be able to fix it with:

    vi $HOME/.inshellisense/key-bindings.bash -c "set ff=unix" -c ":wq"
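
If vi isn't available in the container, the same conversion can be sketched in a few lines of Python. This assumes the only problem is CRLF line endings, which I haven't confirmed; `crlf_to_lf` is a helper name I made up, and the temporary file below stands in for the real key-bindings script.

```python
import tempfile
from pathlib import Path


def crlf_to_lf(path: str) -> int:
    """Rewrite a file with LF line endings; return how many CRLFs were replaced."""
    target = Path(path).expanduser()
    data = target.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed != data:
        target.write_bytes(fixed)
    return data.count(b"\r\n")


# Demo on a throwaway file that mimics a CRLF-encoded shell script.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "key-bindings.bash"
    script.write_bytes(b"__inshellisense__() {\r\n  :\r\n}\r\n")
    replaced = crlf_to_lf(str(script))
    cleaned = script.read_bytes()
    print(replaced, cleaned)
```

Against the real file you would call `crlf_to_lf` with the path of whichever key-bindings script your shell sources.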