I found the most interesting part of the NIST outage post [1] to be NIST's special Time Over Fiber (TOF) program [2], which "provides high-precision time transfer by other service arrangements; some direct fiber-optic links were affected and users will be contacted separately."
I've never heard of this! Very cool service, presumably for … quant / HFT / finance firms (maybe for compliance with FINRA Rule 4590 [3])? Telecom providers synchronizing 5G clocks for time-division duplexing [4]? Google/hyperscalers as input to Spanner or other global databases?
Seriously fascinating to me -- who would be a commercial consumer of NIST TOF?

[1] https://groups.google.com/a/list.nist.gov/g/internet-time-se...
[2] https://www.nist.gov/pml/time-and-frequency-division/time-se...
[3] https://www.finra.org/rules-guidance/rulebooks/finra-rules/4...
[4] https://www.ericsson.com/en/blog/2019/8/what-you-need-to-kno...
I never saw a need for this in HFT. In my experience, GPS was used instead, but there was never any critical need for microsecond accuracy in live systems. Sub-microsecond latency, yes, but when that mattered it was in order to do something as soon as possible rather than as close as possible to Wall Clock Time X.
Still useful for post-trade analysis; perhaps you can determine that a competitor now has a faster connection than you.
The regulatory requirement you linked (and other typical requirements from regulators) allows a tolerance of one second, so it doesn't call for this kind of technology.
> I never saw a need for this in HFT. In my experience, GPS was used instead, but there was never any critical need for microsecond accuracy in live systems.
My guess would be scientific experiments where they need to correlate or sequence data over large regions. Things like correlating gravitational waves with radio signals and gamma ray bursts.
But they do not need absolute time, and internal rubidium clocks can keep the required accuracy for a few days. After that, sync can be transferred with a portable plug, which is completely viable in tactical/operational level EW systems.
Google doesn't use chrony specifically, just an algorithm that is somewhat chrony-like (but very different in other ways). It's called Google TrueTime.
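For anyone who hasn't run into it: the interesting part of TrueTime is that it exposes uncertainty explicitly. Instead of a single timestamp you get an interval the true time is guaranteed to fall in, and Spanner waits out that uncertainty before making a commit visible. A minimal sketch of that shape (the names, the 4 ms bound, and the polling loop are hypothetical illustrations, not Google's actual API):

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float  # lower bound on "true" time, seconds since the epoch
    latest: float    # upper bound on "true" time

def tt_now(uncertainty_s: float = 0.004) -> TTInterval:
    """Hypothetical TrueTime-style call: rather than one timestamp,
    return bounds the true time is known to lie within. The real
    system derives the bound from GPS and atomic references."""
    t = time.time()
    return TTInterval(t - uncertainty_s, t + uncertainty_s)

def commit_wait(commit_ts: float) -> None:
    """Spanner-style commit wait: don't expose a commit until even the
    earliest possible 'now' has passed the chosen commit timestamp, so
    no later reader can be handed an earlier time."""
    while tt_now().earliest < commit_ts:
        time.sleep(0.001)

ts = tt_now().latest   # pick a commit timestamp at the upper bound
commit_wait(ts)        # then wait out the uncertainty
print("safe to acknowledge commit at", ts)
```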
This page: https://tf.nist.gov/tf-cgi/servers.cgi shows that NIST has more than 16 NTP servers on IPv4; of those, 5 are in Boulder and were affected by the power failure. The rest were fine.
However, most entities should not be using these top-level servers anyway, so this should have been a problem for exactly nobody.
I believe time.nist.gov round-robins DNS requests, so there's a chance you'd have connected to a Boulder server. Some people would therefore have gotten NIST time that was 5 μs off.
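For the curious, you can watch the rotation yourself; a quick sketch (nothing assumed beyond the public hostname) that resolves time.nist.gov a few times and prints the addresses handed back:

```python
import socket

# Resolve time.nist.gov repeatedly; the A records rotate, so different
# lookups (and different clients) can land on different NIST sites.
for _ in range(3):
    infos = socket.getaddrinfo("time.nist.gov", 123, socket.AF_INET, socket.SOCK_DGRAM)
    print(sorted({info[4][0] for info in infos}))
```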
Who does use those top-level servers? Aren’t some of them propagating the error, or are all secondary-level servers configured to use dispersed top-level servers? And how do they decide who is right when they don’t match?
Is pool.ntp.org dispersed across possible interference and error correlation?
You can look at who the "Stratum 2" servers are, in the NTP.org pool and otherwise. Those are servers that sync from Stratum 1, like NIST.
Anyone can join the NTP.org pool so it's hard to make blanket statements about it. I believe there's some monitoring of servers in the pool but I don't know the details.
For example, Ubuntu systems point to their Stratum 2 timeservers by default, and I'd have to imagine that NIST is probably one of their upstreams.
An NTP server usually has multiple upstream sources and can steer its clock to minimize the error across them, as well as detect misbehaving servers and reject them ("falsetickers"). Different NTP server implementations may do this a bit differently.
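The classic selection step is an interval intersection in the spirit of Marzullo's algorithm: each source reports an offset plus an error bound, and sources whose intervals don't overlap where most of the others do are discarded as falsetickers. A rough sketch of the idea (deliberately simplified, not any particular daemon's code):

```python
def best_interval(sources):
    """sources: list of (name, offset_s, error_s), each claiming the
    true offset lies in [offset - error, offset + error].
    Marzullo-style sweep: find the point covered by the most intervals;
    sources whose intervals miss that point are the falsetickers."""
    edges = []
    for name, off, err in sources:
        edges.append((off - err, +1))   # interval opens
        edges.append((off + err, -1))   # interval closes
    edges.sort()
    best_count = count = 0
    best_point = None
    for point, delta in edges:
        count += delta
        if count > best_count:
            best_count, best_point = count, point
    return best_point, best_count

srcs = [("a", 0.000002, 0.000005),
        ("b", -0.000001, 0.000004),
        ("c", 0.250000, 0.000005)]   # wildly off relative to the others
point, agreeing = best_interval(srcs)
print(f"{agreeing} of {len(srcs)} sources agree around offset {point:+.6f} s")
```

The real NTP selection and clustering algorithms add refinements (weighting by stratum, jitter, root distance), but the core "believe the intervals that agree" idea is the same.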
From my own experience managing large numbers of routers, and troubleshooting issues, I will never use pool.ntp.org again. I’ve seen unresponsive servers as well as incorrect time by hours or days. It’s pure luck to get a good result.
Instead I’ll stick to a major operator like Google/Microsoft/Apple, which have NTP systems designed to handle the scale of all the devices they sell, and are well maintained.
Nitpick: UTC stands for Coordinated Universal Time. The ordering of the letters was chosen to not match the English or the French names so neither language got preference.
That doesn't quite match what the Wikipedia page says:
> The official abbreviation for Coordinated Universal Time is UTC. This abbreviation comes as a result of the International Telecommunication Union and the International Astronomical Union wanting to use the same abbreviation in all languages. The compromise that emerged was UTC, which conforms to the pattern for the abbreviations of the variants of Universal Time (UT0, UT1, UT2, UT1R, etc.).
> ... in English the abbreviation for coordinated universal time would be CUT, while in French the abbreviation for "temps universel coordonné" would be TUC. To avoid appearing to favor any particular language, the abbreviation UTC was selected.
Not exactly the topic of discussion, but also not not on topic: just wanted to sing the praises of chrony, which has performed better than the traditional OS-native NTP clients in our testing on a myriad of real and virtualized hardware.
I'm missing the nuance, or perhaps the difference, between the first scenario, where sending inaccurate time was worse than sending no time, and the present, where they are sending inaccurate time. Sorry if it's obvious.
The 5 µs inaccuracy is basically irrelevant to NTP users. From the second update to the Internet Time Service mailing list [1]:
> To put a deviation of a few microseconds in context, the NIST time scale usually performs about five thousand times better than this at the nanosecond scale by composing a special statistical average of many clocks. Such precision is important for scientific applications, telecommunications, critical infrastructure, and integrity monitoring of positioning systems. But this precision is not achievable with time transfer over the public Internet; uncertainties on the order of 1 millisecond (one thousandth of one second) are more typical due to asymmetry and fluctuations in packet delay.

[1] https://groups.google.com/a/list.nist.gov/g/internet-time-se...
> Such precision is important for scientific applications, telecommunications, critical infrastructure, and integrity monitoring of positioning systems. But this precision is not achievable with time transfer over the public Internet
How do those other applications obtain the precise value they need without encountering the Internet issue?
It's a good question, and I wondered the same. I don't know, but I'd postulate:
As it stands at the minute, the clocks are a mere 5 microseconds out and will slowly get better over time. That's well below the uncertainty of time transfer over the public Internet, so they know it's not going to have a major effect on anything.
When the event started and they lost power and access to the site, they also lost their management access to the clocks. At this point they don't know how wrong the clocks are, or how much more wrong they're going to get.
If someone restores power to the campus, the clocks are going to come online (all the switches and routers connecting them to the internet suddenly boot up) before they've had a chance to get admin control back. If something happened while they were offline and the clocks drifted significantly, then when they came back online half the world might decide to believe them and suddenly step their own clocks to follow. This could cause absolute havoc.
Potentially safer to scram something than have it come back online in an unknown state, especially if (lots of) other things are going to react to it.
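That's roughly what client-side sanity limits are for: ntpd, for instance, refuses to follow an apparent offset beyond its panic threshold (1000 s by default) and exits instead, and chrony only steps the clock when explicitly allowed via makestep. A toy sketch of that kind of gate (the thresholds are illustrative, loosely modelled on ntpd's defaults):

```python
# Toy illustration of the step/slew/panic decision an NTP client makes
# when an upstream source suddenly reports a very different time.
STEP_THRESHOLD_S = 0.128     # small offsets: just slew; above this, step
PANIC_THRESHOLD_S = 1000.0   # huge offsets: refuse and wake a human

def handle_offset(offset_s: float) -> str:
    magnitude = abs(offset_s)
    if magnitude >= PANIC_THRESHOLD_S:
        return "panic: refuse to follow this source; needs operator review"
    if magnitude >= STEP_THRESHOLD_S:
        return "step the clock once, then resume normal slewing"
    return "slew: nudge the clock rate until the offset disappears"

for off in (0.000005, 0.5, 86400.0):
    print(f"{off:>12.6f} s -> {handle_offset(off)}")
```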
In the last NIST post, someone linked to The Time Rift of 2100: How We lost the Future --- and Gained the Past. It's a short story that highlights some of the dangers of fractured time in a world that uses high precision timing to let things talk to each other: https://tech.slashdot.org/comments.pl?sid=7132077&cid=493082...
> […] where sending inaccurate time was worse than sending no time […]
When you ask a question, it is sometimes better to not get an answer—and know you have not gotten an answer—than to get the wrong answer. If you know that a 'bad' situation has arisen, you can start contingency measures to deal with it.
If you have a fire alarm: would you rather have it fail in such a way that it gives no answer, or fail in a way where it says "things are okay" even if it doesn't know?
I work at a particle accelerator. We use White Rabbit (https://white-rabbit.web.cern.ch/) to synchronize some very sensitive devices, mostly the RF power systems and related data acquisition systems, down to nanosecond accuracy.
As far as I'm aware they just timestamp the sample streams against a local GPS-backed atomic reference. Then, when they get the data/tapes together in one computing center, they can run a more sophisticated correlation entirely in software to smooth things out.
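For a sense of what that after-the-fact correlation looks like, here's a hedged sketch with synthetic data: two recordings of the same signal, offset by a known number of samples standing in for a residual clock offset, and a plain cross-correlation recovers it (numpy only; nothing here is specific to any real experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 1_000_000                          # pretend 1 MHz sample rate
signal = rng.standard_normal(5_000)

# Two stations record the same signal; station B's copy is shifted by a
# known number of samples, standing in for a residual clock offset.
true_lag = 37
a = signal[:-true_lag]
b = signal[true_lag:] + 0.1 * rng.standard_normal(len(signal) - true_lag)

# Cross-correlate and pick the lag where the two streams line up best.
corr = np.correlate(a, b, mode="full")
lag = int(np.argmax(corr)) - (len(b) - 1)
print(f"recovered offset: {lag} samples = {lag / fs * 1e6:.1f} µs")
```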
Maybe I missed something, but I don't quite understand the video title "NIST's NTP clock was microseconds from disaster". Is there some limit of drift before it's unrecoverable? Can't they just pull the correct time from the other campus if it gets too far off?
> The regulatory requirement you linked (and other typical requirements from regulators) allows a tolerance of one second, so it doesn't call for this kind of technology.
MiFID II (UK/EU) minimum is 1 µs granularity:
https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:...
You can get 50ns with this. Of course, you would verify at NIST.
Where does it say these are commercial consumers?
https://en.wikipedia.org/wiki/Schriever_Space_Force_Base#Rol...
> Building 400 at Schriever SFB is the main control point for the Global Positioning System (GPS).
To start with, probably for scientific stuff, à la:
* https://en.wikipedia.org/wiki/White_Rabbit_Project
But fibre-based time is important in case of GNSS time signal loss:
* https://www.gpsworld.com/china-finishing-high-precision-grou...
They're also the largest holder of IPv4 space, still. https://bgp.he.net/report/peers#_ipv4addresses
Think Google might have rolled their own clock sources and corrections.
Ex: Sundial, https://www.usenix.org/conference/osdi20/presentation/li-yul... / https://storage.googleapis.com/gweb-research2023-media/pubto... (pdf)
To say NIST was off is clickbait hyperbole.
> However, most entities should not be using these top-level servers anyway, so this should have been a problem for exactly nobody.
IMHO, most applications should use pool.ntp.org
(On Google TrueTime, see https://docs.cloud.google.com/spanner/docs/true-time-externa...)
I defer to the experts.
If (and it's hard to imagine how) GPS satellites were to get 5 µs out of whack, we would be back to Loran-C levels of accuracy for navigation.
... unless someone with real experience needing those tolerances chimes in and explains why it's true.
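For what it's worth, the back-of-the-envelope behind the GPS comparison above: an uncorrected satellite clock error turns into a pseudorange error at the speed of light, so 5 µs is on the order of 1.5 km of range error:

```python
C = 299_792_458          # speed of light, m/s
clock_error_s = 5e-6     # 5 µs of uncorrected satellite clock error
print(f"{C * clock_error_s:.0f} m of pseudorange error")   # ~1499 m
```

(Whether the whole constellation drifting together by the same amount would actually hurt a position fix is a separate question, since receivers solve for their own clock bias against a consistent system time; that may be part of the doubt being expressed here.)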