I personally find this a fascinating topic, so much so that I did a PhD on it. The most natural loss function to optimize is bending energy, and there are solutions to minimizing that going back about 300 years. However, in practice that's unlikely to be what you want: its scaling properties give an advantage to curves with a longer arc length even if they have more curvature variation. Intuitively, the smoothest curve through a set of co-circular points should be a circular arc, but that's not what you get from the minimum-energy curve, at least not unless you impose an additional arc length constraint.
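To make the scaling point concrete, the bending energy in question is the integral of squared curvature over arc length:

    E[\gamma] = \int_0^L \kappa(s)^2 \, ds

If you uniformly scale a curve by a factor a > 0, curvature shrinks to \kappa / a while the arc length element grows to a\,ds, so E[a\gamma] = E[\gamma] / a. The functional can always be lowered by spreading the same bending over more length, which is the sense in which longer arc length gets an advantage.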
The long story is of course much more complicated, but the short version is that the Euler spiral fixes these scaling issues and has some other really nice properties. If your problem is "what is the smoothest curve that goes through this sequence of points" then arguably the piecewise Euler spiral G2 continuous spline is the distinguished answer.
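For reference, an Euler spiral segment is the curve whose curvature varies linearly with arc length,

    \kappa(s) = \kappa_0 + \kappa_1 s

and the spline joins such segments at the interpolation points so that position, tangent direction, and curvature all match there; that curvature matching is the G2 condition.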
Edit to note: there is one obvious advantage to in-band markup such as HTML -- streaming formatted content. Though I wonder if this could be done with a non-hierarchical method, for example using in-band start tags which also encode the length.
Edit 2: looks like Condé Nast maintains a similar technology called atjson [2].
ProseMirror (a JavaScript library for building rich text editors) also employs a document model like this. The docs for that project [2] do a good job of explaining how their implementation of this idea works, and what problems it solves.
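To illustrate the out-of-band idea being contrasted with HTML here, a toy sketch (the names and fields are made up for illustration; this is not atjson's or ProseMirror's actual model) of plain text plus offset-based annotations:

    # Toy model: the text is a flat string, and formatting lives in a
    # separate list of annotations that point into it by character offset.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Annotation:
        type: str            # e.g. "bold", "link"
        start: int           # inclusive character offset
        end: int             # exclusive character offset
        attrs: Optional[dict] = None

    text = "Save the whales"
    annotations = [
        Annotation("bold", 0, 4),
        Annotation("link", 9, 15, {"href": "https://example.com"}),
    ]

    # Compare the in-band equivalent: "<b>Save</b> the <a href=...>whales</a>".
    # With offsets there is no nesting to keep balanced and overlapping
    # ranges are unremarkable, but you can't render anything until you have
    # the full text, which is the streaming trade-off mentioned above.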
[1]: https://medium.engineering/why-contenteditable-is-terrible-1...
Personally I like being able to type math symbols on occasion, but I don't do so often enough to benefit from a custom keyboard layout that I'd then have to memorize. I didn't have a good way to do this until about a year ago, when I learned about Espanso [1], a cross-platform text expander. I installed it and set it up to expand various (vaguely LaTeX-inspired) macros into UTF-8 strings. For example, typing the following keystrokes
x = R cos(:phi) sin(:lambda :minus :lambda:nought)
becomes x = R cos(φ) sin(λ − λ₀).

I chose ':' as a prefix for all my macros, but this is just a self-enforced convention; you can configure a substitution for any sequence of keystrokes. Since I gave all the characters names that made sense to me, I don't have to think much when I type them.
A few of the substitutions I get the most mileage out of:
- The Greek alphabet, both upper and lowercase (:theta → θ and :Omega → Ω)
- Double-struck letters for numerical sets; e.g. :RR → ℝ
- :infinity → ∞
- :neq → ≠
- :pm → ±
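In case anyone wants to replicate this, the substitutions live in an Espanso YAML match file (typically something like ~/.config/espanso/match/base.yml, though the exact path depends on platform and Espanso version). A few of the entries above look roughly like this:

    # Rough sketch of the match file; exact file location varies by
    # platform and Espanso version.
    matches:
      - trigger: ":theta"
        replace: "θ"
      - trigger: ":Omega"
        replace: "Ω"
      - trigger: ":RR"
        replace: "ℝ"
      - trigger: ":infinity"
        replace: "∞"
      - trigger: ":neq"
        replace: "≠"
      - trigger: ":pm"
        replace: "±"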
ZIP/postal codes are generally smaller than counties (the county I live in contains almost a hundred ZIP codes). I'm not sure they're even guaranteed to lie entirely within one county. We tend to think of ZIP codes as boundaries, but they're actually delivery routes (which, if you squint, can be converted into boundaries by joining together the properties that those routes serve).
You might be interested in this article: https://carto.com/blog/zip-codes-spatial-analysis/
[0]: https://github.com/jake-low/covid-19-wa-data
[1]: https://observablehq.com/@jake-low/covid-19-in-washington-st...
Doing this for just one state was a pretty substantial effort. I imagine there are multiple people at the Times spending several hours a day reviewing and cleaning scraped data (it seems like every couple of days some formatting change breaks your scripts, or a source publishes data that later needs to be retracted).
The Times dataset appears to contain per-county case and death observations in a time series, going all the way back to the first confirmed U.S. case in January in Snohomish County, WA. This makes it by far the most comprehensive time series dataset of U.S. COVID-19 cases publicly available.
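As a rough sketch of how convenient that shape is to work with (assuming the repository layout I remember: a us-counties.csv at the top level with date, county, state, fips, cases, and deaths columns), pulling one county's cumulative case series is just a few lines of pandas:

    # Sketch only: the file name and column names are my recollection of
    # the nytimes/covid-19-data layout and may differ or change.
    import pandas as pd

    url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
    df = pd.read_csv(url, parse_dates=["date"])

    # Cumulative confirmed cases in Snohomish County, WA, one row per day.
    snohomish = (
        df[(df["state"] == "Washington") & (df["county"] == "Snohomish")]
        .set_index("date")["cases"]
        .sort_index()
    )
    print(snohomish.tail())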
Some people in this thread linked to the Johns Hopkins CSSE dataset; I've looked at that data, but it doesn't go back very far in time for the U.S., and the tables are published as daily summaries with differing schemas, which makes them hard to use out of the box. For some days earlier in March, "sublocations" aren't even consistently structured (the same column contains both "Boston, MA" and "Los Angeles County"), which makes the data awkward to parse. No disrespect to the team behind the JHU dataset; it attempts to cover the whole world from the start of the outbreak, which is an incredible and difficult goal. But for mapping and studying the outbreak in the U.S., the Times dataset will likely be the best choice right now.
Huge kudos to the New York Times team for making this data freely available.
[0]: https://tug.org/TUGboat/tb34-2/tb107jackowski.pdf