I personally find this a fascinating topic, so much so that I did a PhD on it. The most natural loss function to optimize is bending energy, and there are solutions to minimizing that going back about 300 years. However, in practice that's unlikely to be what you want: its scaling properties give an advantage to curves with a longer arc length even if they have more curvature variation. Intuitively, the smoothest curve through a set of co-circular points should be a circular arc, but that's not what you get from the minimum-energy curve, at least not unless you impose an additional arc length constraint.
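To make the scaling point concrete, the bending energy in question is the integral of squared curvature over arc length:

    E[\gamma] = \int_0^L \kappa(s)^2 \, ds

If you uniformly scale a curve by a factor a > 0, curvature shrinks to \kappa / a while the arc length element grows to a\,ds, so E[a\gamma] = E[\gamma] / a. The functional can always be lowered by spreading the same bending over more length, which is the sense in which longer arc length gets an advantage.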
The long story is of course much more complicated, but the short version is that the Euler spiral fixes these scaling issues and has some other really nice properties. If your problem is "what is the smoothest curve that goes through this sequence of points" then arguably the piecewise Euler spiral G2 continuous spline is the distinguished answer.
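For reference, an Euler spiral segment is the curve whose curvature varies linearly with arc length,

    \kappa(s) = \kappa_0 + \kappa_1 s

and the spline joins such segments at the interpolation points so that position, tangent direction, and curvature all match there; that curvature matching is the G2 condition.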
Edit to note: there is one obvious advantage to in-band markup such as HTML -- streaming formatted content. Though I wonder if this could be done with a non-hierarchical method, for example using in-band start tags which also encode the length.
Edit 2: looks like Condé Nast maintains a similar technology called atjson [2].
ProseMirror (a JavaScript library for building rich text editors) also employs a document model like this. The docs for that project [2] do a good job of explaining how their implementation of this idea works, and what problems it solves.
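To illustrate the out-of-band idea being contrasted with HTML here, a toy sketch (the names and fields are made up for illustration; this is not atjson's or ProseMirror's actual model) of plain text plus offset-based annotations:

    # Toy model: the text is a flat string, and formatting lives in a
    # separate list of annotations that point into it by character offset.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Annotation:
        type: str            # e.g. "bold", "link"
        start: int           # inclusive character offset
        end: int             # exclusive character offset
        attrs: Optional[dict] = None

    text = "Save the whales"
    annotations = [
        Annotation("bold", 0, 4),
        Annotation("link", 9, 15, {"href": "https://example.com"}),
    ]

    # Compare the in-band equivalent: "<b>Save</b> the <a href=...>whales</a>".
    # With offsets there is no nesting to keep balanced and overlapping
    # ranges are unremarkable, but you can't render anything until you have
    # the full text, which is the streaming trade-off mentioned above.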
[1]: https://medium.engineering/why-contenteditable-is-terrible-1...
Personally I like being able to type math symbols on occasion, but I don't do so often enough to benefit from a custom keyboard layout that I'd then have to memorize. I didn't have a good way to do this until about a year ago, when I learned about Espanso [1], a cross-platform text expander. I installed it and set it up to expand various (vaguely LaTeX-inspired) macros into UTF-8 strings. For example, typing the following keystrokes
x = R cos(:phi) sin(:lambda :minus :lambda:nought)
becomes x = R cos(φ) sin(λ − λ₀).

I chose ':' as a prefix for all my macros, but this is just a self-enforced convention; you can configure a substitution for any sequence of keystrokes. Since I gave all the characters names that made sense to me, I don't have to think much when I type them.
A few of the substitutions I get the most mileage out of:
- The Greek alphabet, both upper and lowercase (:theta → θ and :Omega → Ω)
- Double-struck letters for numerical sets; e.g. :RR → ℝ
- :infinity → ∞
- :neq → ≠
- :pm → ±
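In case anyone wants to replicate this, the substitutions live in an Espanso YAML match file (typically something like ~/.config/espanso/match/base.yml, though the exact path depends on platform and Espanso version). A few of the entries above look roughly like this:

    # Rough sketch of the match file; exact file location varies by
    # platform and Espanso version.
    matches:
      - trigger: ":theta"
        replace: "θ"
      - trigger: ":Omega"
        replace: "Ω"
      - trigger: ":RR"
        replace: "ℝ"
      - trigger: ":infinity"
        replace: "∞"
      - trigger: ":neq"
        replace: "≠"
      - trigger: ":pm"
        replace: "±"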
ZIP/postal codes are generally smaller than counties (the county I live in contains almost a hundred ZIP codes). I'm not sure they're even guaranteed to lie entirely within one county. We tend to think of ZIP codes as boundaries, but they're actually delivery routes (which, if you squint, can be converted into boundaries by joining together the properties that those routes serve).
You might be interested in this article: https://carto.com/blog/zip-codes-spatial-analysis/
[0]: https://github.com/jake-low/covid-19-wa-data
[1]: https://observablehq.com/@jake-low/covid-19-in-washington-st...
Doing this for just one state was a pretty substantial effort. I imagine there are multiple people at the Times spending several hours a day reviewing and cleaning scraped data (it seems like every couple of days some formatting change breaks your scripts, or a source publishes data that later needs to be retracted).
The Times dataset appears to contain per-county case and death observations in a time series, going all the way back to the first confirmed U.S. case in January in Snohomish County, WA. This makes it by far the most comprehensive time series dataset of U.S. COVID-19 cases publicly available.
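As a rough sketch of how convenient that shape is to work with (assuming the repository layout I remember: a us-counties.csv at the top level with date, county, state, fips, cases, and deaths columns), pulling one county's cumulative case series is just a few lines of pandas:

    # Sketch only: the file name and column names are my recollection of
    # the nytimes/covid-19-data layout and may differ or change.
    import pandas as pd

    url = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"
    df = pd.read_csv(url, parse_dates=["date"])

    # Cumulative confirmed cases in Snohomish County, WA, one row per day.
    snohomish = (
        df[(df["state"] == "Washington") & (df["county"] == "Snohomish")]
        .set_index("date")["cases"]
        .sort_index()
    )
    print(snohomish.tail())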
Some people in this thread linked to the Johns Hopkins CSSE dataset; I've looked at that data, but it doesn't go back very far in time for the U.S., and the tables are published as daily summaries with differing schemas, which makes them hard to use out of the box. For some days earlier in March, "sublocations" aren't even consistently structured (the same column contains both "Boston, MA" and "Los Angeles County"), which makes the data awkward to parse. No disrespect to the team behind the JHU dataset; it attempts to cover the whole world from the start of the outbreak, which is an incredible and difficult goal. But for mapping and studying the outbreak in the U.S., the Times dataset will likely be the best choice right now.
Huge kudos to the New York Times team for making this data freely available.
[0]: https://tug.org/TUGboat/tb34-2/tb107jackowski.pdf