NAT Is the Enemy of Low Power Devices

The problem(-s) described in the blog post are really acute for IoT in general, especially if you want your device to run on batteries or you have a limited data budget.

> Therefore, when you try to continue talking to the server over a previously established session, it will not recognize you. This means you’ll have to re-establish the session, which typically involves expensive cryptographic operations and sending a handful of messages back and forth before actually delivering the data you were interested in sending originally.

The blog post mentions Session IDs as a solution, but these require servers to be stateful, which can be challenging in most deployments. An alternative is Session Tickets (https://datatracker.ietf.org/doc/html/rfc5077), but these may cause issues when offloading networking to another device, such as a cellular modem, as their implementation may be non-standard or faulty.

incoming rant

These issues could be mitigated—or even solved—by using mature software platforms like Zephyr RTOS and their native networking stacks, which receive more frequent updates than traditional “all-in-one-package” SoCs. However, many corporations choose to save a few dollars on hardware at the expense of incurring thousands in software engineering costs and bug-hunting. It is often seen as more cost-effective to purchase a cellular modem with an internal MCU rather than a separate cellular modem and a host MCU to run the networking stack. It is one of the many reasons why many IoT devices are utter garbage.

hasheddan · 7 months ago

Author here -- thanks for engaging in the discussion! You won't find any pushback from us on using Zephyr -- we are contributors, the firmware example in the post is using it (or Nordic's NCS distribution of it), and we offer free Zephyr training [0] every month :)

[0]: https://training.golioth.io/

bsder · 7 months ago

> It is often seen as more cost-effective to purchase a cellular modem with an internal MCU rather than a separate cellular modem and a host MCU to run the networking stack.

This one isn't just cost--the compliance restrictions that the cellular carriers place on you are idiotic.

The big one we bumped into is "must allow allow carrier initiated firmware updates with no restrictions on scheduling" which translates to "the carrier will eat your battery often and without warning".

In addition, many IoT devices may not call home more than once every couple of months. And the carrier will happily roll out tower firmware that will kill those being able to call home.

If I use a module with my own firmware, the modem folks will simply point fingers at me. If I use a module with integrated SoC and firmware and it gets updeathed, I get the "joy" of yelling at the cellular module manufacturer.

(I had the wonderful experience of watching a cellular IoT project go gradually dead over 3 days as the carrier rolled out an "upgrade" across its system. We got a front seat as the module manufacturer was screaming bloody murder at the carrier who simply did "We Don't Care. We Don't Have To. We're the Phone Company.")

vv_ · 7 months ago

Your experience is bizzare. Was it Verizon (or AT&T/T-Mobile) and did you use a Cat-1bis/Cat-M/NB-IoT or higher class modem?

Normally US carriers require Firmware-Over-The-Air (FOTA) capability for the modem firmware. This is not the case for deployments outside of the United States, to my knowledge.

It would be interesting to hear more about your story!

immibis · 7 months ago

Sounds like you need a more watertight contract and more lawyers.

shipp02 · 7 months ago

What makes a separate cellular modem better than an internal cellular modem? Is it because software updates are available for the separate modems?

I am evaluating some Nordic semiconductor parts for a project. They seem to have an internal modem but Nordic uses zephyr. Any thoughts?

gwbas1c · 7 months ago

> What makes a separate cellular modem better than an internal cellular modem?

The US 3G shutdown required some rather expensive and unexpected upgrades. Vendors signed long-term contracts with 3G providers, and then "someone" was on the hook, to replace something, when the 3G vendors terminated their contracts prematurely.

The deeper the modem was integrated into a product, the harder it was to change. The shallower the modem was integrated, the easier it was to change.

For example:

One of my cars just lost its internet connectivity, and the automaker never offered any way to fix it. (I didn't care, I only used Android Auto in that car.)

My employer (IOT) sent out free chips to our customers. They had to arrange someone to go do a site visit and swap a chip while on a phone call with us. We're small and early enough that it wasn't a big deal.

My solar panel vendor wanted me to $pend big buck$ on a new smart meter and refused to honor their warranty. I told them to run a cable to the ethernet port in my meter.

vv_ · 7 months ago

> What makes a separate cellular modem better than an internal cellular modem?

When using a separate cellular modem, you can connect it to your MCU via either a USB or UART interface. In IoT applications, UART is the more common choice. Then you can multiplex the interface with CMUX, allowing simultaneous use of AT commands and PPP.

With frameworks like lwIP or Zephyr supporting PPP, you can get your network running really quickly and have full control over the networking and crypto stacks. Using Zephyr you get POSIX-compliant sockets which allows you to leverage existing protocol implementations.

In contrast, using a SoC's often requires reintegrating the entire network stack, as they typically do not support POSIX sockets. I've worked on SoC's that only support TLS 1.1 and the vendor refused to upgrade it, as it would require them to re-certify their modem. Switching to a different SoC can mean repeating this process from scratch as different vendors implement their own solutions (sometimes even the same vendor will have different implementation(-s) for different modems).

> I am evaluating some Nordic semiconductor parts for a project. They seem to have an internal modem but Nordic uses zephyr. Any thoughts?

It runs on Zephyr RTOS and can be built as a standalone modem (https://docs.nordicsemi.com/bundle/ncs-latest/page/nrf/appli...). You can either use Zephyr's native networking stack or offload it to the modem while retaining the same API. This means you get access to all of Zephyr’s protocol implementations without additional effort. The design makes it feel as though you have a completely independent MCU connected to an external modem.

That said, it does have quirks and bugs, particularly when offloading networking. It also has relatively limited resources, with only 256 kB of RAM and 1 MB of flash storage.

Overall, it is the best SoC I’ve worked with, but it is still an SoC. Whether it suits your project depends on your specific use case. If you plan to integrate additional radios in the future (e.g., UWB, BLE, Wi-Fi), I’d recommend using a separate MCU if your budget allows. This will provide significantly more flexibility. Otherwise it is definitely one of the better SoC's currently in the market, to my knowledge.

PS. It only supports Cat-M (& NB-IoT but I'm going to skip over it intentionally!) which is not globally supported, so you should make sure the region you want to deploy in supports that technology.

baobun · 7 months ago

Security.

On one hand, licensing requirements and regulation often mean that modems are locked down in terms of firmware updates, reference documentation, source, and capabilities. This often translates into a larger "black box" area, and one embedded inside your SoC instead of physically separate and connected over a serial bus.

On the other, on-chip modems often (not sure about those Nordics) have DMA.

The combination of those two is scary.

bluGill · 7 months ago

cellular modems go nonfunctional/obsolete much faster than other systems. 3g is almost entirely gone worldwide. 4g is still around, but providers are already reducing how much their towers dedicate to it. The standards body is working on 6g, who knows when that will come and push out older stuff.

If the case of my car I don't care - I have never found a use for the cellular connectivity it has (if any). However there are lots of other devices where the cellular connectivity is important and users will want to upgrade the modem to keep it working. If cellular connectivity is just a marketing bullet point nobody cares about then integrated is good enough, but if your device really isn't useful without the modem make that modem replaceable for somewhat cheap.

jasonjayr · 7 months ago

> The blog post mentions Session IDs as a solution, but these require servers to be stateful, which can be challenging in most deployments.

Doesn't this just move the 'state' into the operating system, or networking layer, in the form of an active TCP connection?

vv_ · 7 months ago

You can use Session Ticket's w/ UDP too.

Nevertheless, with modern cloud providers moving the state into the OS/Networking layer is still easier to scale. You don't need to write your own services to handle it.

oakwhiz · 7 months ago

Is DTLS a workaround for the session issue? Haven't had much experience with it myself but it does cut down some of the statefulness.

vv_ · 7 months ago

You don’t want to redo the handshake (which, as far as I know, is identical to TLS) every time you send a packet using DTLS. Therefore, you still need to retain state information. In my opinion, using DTLS (UDP) is considerably more nuanced than using TLS (TCP) in an embedded IoT context.

hasheddan · 7 months ago

The post details the use of CoAP over DTLS, employing Connection IDs.

Dead Comment

I have worked on a device with this exact same "send a tiny sensor reading every 30 minutes" use case, and this has not been my experience at all. We can run an STM32 and a few sensors at single digit microamps, add an LCD display and a few other niceties and it's one or two dozen. Simply turning on a modem takes hundreds of microamps, if not milliamps. In my experience it's always been better for power consumption to completely shut down the modem and start from scratch each time [1] - which means you're paying to start a new session every time anyway. Now I'll agree it's still inefficient to start up a full TLS session, a protocol like in the post will have it's uses, but I wouldn't blame it on NAT.

[1] Doing this of course kills any chance at server-to-device comms, you can only ever apply changes when the device next checks in. This does cause us complaints from time to time, especially for those with longer intervals.

vv_ · 7 months ago

Power Saving Mode (PSM), a power-saving mechanism in LTE, was specifically designed to address such issues. It allows the device to inform the eNB (base station) that it will be offline for a certain period while ensuring it periodically wakes up to perform a Tracking Area Update (TAU), preventing the loss of registration. This concept is similar to Session Tickets or Session IDs in (D)TLS—or at least, that’s how I like to think about it. However, there are no guarantees that the operator will support this feature or that they will support the report-in period that you want!

Maintaining an active session for communication between the endpoint and the edge device is highly power-intensive. Even with (e)DRX, the average power consumption remains significantly higher than in sleep mode. Moreover, the vast majority of devices do not need to frequently ping a management server, as configuration and firmware updates are typically rare in most IoT deployments.

hasheddan · 7 months ago

Great pointer! My sibling post in this thread references a few other blog entries where we have detailed using eDRX and similar low power modes alongside Connection IDs. I agree that many devices don't need to be immediately responsive to cloud to device communication, and checking in for firmware updates on the order of days is acceptable in many cases.

One way to get around this in cases where devices need to be fairly responsive to cloud to device communication (on the order of minutes) but in practice infrequently receive updates is using something like eDRX with long sleep periods alongside SMS. The cloud service will not be able to talk to the device directly after the NAT entry is evicted (typically a few minutes), but it can use SMS to notify the device that the server has new information for it. On the next eDRX check in, the SMS message will be present, then the device can ping the server, and if using Connection IDs, can pull down the new data without having to establish a new session.

lxgr · 7 months ago

802.11 supports the same thing. A STA (client) can tell an AP that it'll be going away for some time, and the AP will queue all traffic for the STA until it actively reports back. Broadcast traffic can also be synchronized to particular intervals (but low power devices are usually not interested in that anyway for efficiency reasons).

hasheddan · 7 months ago

Author of this post here -- thanks for sharing your experience! One thing I'll agree with immediately is that if you can afford to power down hardware that is almost always going to be your best option (see a previous post on this topic [0]). I believe the NAT post also calls this out, though I believe I could have gone further to disambiguate "sleeping" and "turning off":

> This doesn’t solve the issue of cloud to device traffic being dropped after NAT timeout (check back for another post on that topic), but for many low power use cases, being able to sleep for an extended period of time is more important than being able to immediately push data to devices.

(edit: there was originally an unfortunate typo here where the paragraph read "less important" rather than "more important")

Depending on the device and the server, powering down the modem does not necessarily mean that a session has to be started from scratch when it is powered on again. In fact, this is one of the benefits of the DTLS Connection ID strategy. A cellular device, for example, could wake up the next time in a completely different location, connect to a new base station, be assigned a fresh IP address, and continue communication with the server without having to perform a full handshake.

In reality, there is a spectrum of low power options with modems. We have written about many of them, including a post [1] that followed this one and describes using extended discontinuous reception (eDRX) [2] with DTLS Connection IDs and analyzing power consumption.

[0]: https://blog.golioth.io/power-optimization-recommendations/ [1]: https://blog.golioth.io/turn-off-subsystems-remotely-to-redu... [2]: https://www.everythingrf.com/community/what-is-edrx

dent9876543 · 7 months ago

That NAT is a problem presumes that we actually want our IoT devices reaching out to the out-of-intranet zone.

NAT gets the blame, and the intranet as a concept is generally a big corp term.

But I prefer my IoT devices not to need to reach out of my network. For me, NAT is an unwitting ally in the fight against such nonsense.

vollbrecht · 7 months ago

The mere existence of Tailscale should give a hint that NAT is only a speedbump and not any protection whatsoever. It protects you against nothing. Every method that Tailscale uses to traverse NAT can be in isolation used by any other piece of software. For more info about that you can read the following article.

https://tailscale.com/blog/how-nat-traversal-works

What people really want is a firewall, and since NAT acts as a firewall, they confuse it with that.

My university has a public IP for every computer, but you could still only connect to the servers, not random computers, from the outside. Because they had a firewall.

phendrenad2 · 7 months ago

"not any protection whatsoever" is way too strong a statement. NAT does raise the bar to exploiting a random smart lightbulb in your house significantly higher.

kccqzy · 7 months ago

The big distinction is that for Tailscale both endpoints know they want to talk to each other, and that both have Internet access. That's not the usual case firewalls are designed for.

Tailscale doesn't strictly need NAT traversal. They can run only their DERP servers and still continue to work. If your firewall tries to block two devices from communicating and yet allows both devices internet access, you have already lost.

Sounds like you like the idea of a stateful firewall, and good news: There are stateful firewalls for IPv6!

They have all the upsides of NATs (i.e. an option to block inbound connections by default), with none of the downsides (they preserve port numbers, can be implemented statelessly, they greatly simplify cooperative firewall traversal, you can allow inbound connections for some hosts).

Spivak · 7 months ago

I found it weird that IPv6 folks are so against NAT as a cultural thing when it works perfectly well on IPv6. They're not fundamentally opposed.

I could have all of my servers in public subnets and give them all public IP addresses, but I still prefer to put everything I can in private. Not only does the firewall not allow traffic in, but you can't even route to them. It now becomes really hard to accidentally grant more access than you intended.

I would hazard that most devices on there internet are in the boat of want to talk to the internet but not be reachable on it.

Deleted Comment

rcxdude · 7 months ago

If you don't want that, then complain about a lacking a configuration as such and configure your firewall so that that they can't. But don't cheer on something that's breaking functionality that others might want (especially if it doesn't actually achieve your own goals reliably).

Oh, I do that too.

But to your point about not cheering on NAT, well I will because I see NAT as useful tool.

It is not an opinion well aligned with the preferences of the IETF. But the purist model of transparent end-to-end networking has never sat well with me. It’s just not a thing we want.

A telematics tracker in a vehicle that logistic companies use (e.g. Amazon, FedEx) is also considered as an IoT device. I don't believe that the author is talking about Smart Home appliances exclusively.

kstrauser · 7 months ago

What NAT are you using that doesn’t have a firewall? I haven’t personally used one of those since the ‘90s.

kazinator · 7 months ago

The first NAT I used in the middle 90's was IP Masquerading in the Linux kernel, by Pauline Middelink. That had a firewall.

I agree until I discover I'm doing something where I want to access/change that device. It is really nice when I'm returning home early that I can change my thermostat out of vacation mode. I've often wished I had a way to tell if I left a door unlocked.

Security and privacy is of course critical to all this, but the concept of internet itself is not wrong.

craftkiller · 7 months ago

That's what a VPN is for. Every router I've had in the past decade has had support for running a VPN server so you can have one running 24/7 without any additional hardware. Even my retired elderly parents run a VPN server on their home router.

procaryote · 7 months ago

Especially if the data is unencrypted and only authenticated by source ip and a long lived token-like thing

raggi · 7 months ago

What I want from platforms, and I fought for at one time in Google with no success: a platform API that provides applications a way to schedule packets when the radio turns on.

The mobile platforms in particular continue to assume that you can live with their high level HTTP stuff, and it's just not good enough. The non-mobile platforms largely don't even approach the problem at all.

freedomben · 7 months ago

Indeed, that sounds like an obvious feature. Hard to believe it hasn't been implemented! I'd love to have that feature on Linux desktop/laptops. I think you could make lots of applications behave a whole lot better.

the arguments from folks on the "architecture review boards" was that multiple connections are always bad and that developers can't be trusted. i'm willing to accept that they did get beat up quite a bit over power at various points when at times applications were a big part of the problem. That said this is also a gross misunderstanding of the problem and overall solution space, as well as very much gatekeeping.

apple1417 · 7 months ago

clearint · 7 months ago

This article should clarify at the start whether TCP or UDP is under consideration. NAT idle timeouts for both are typically very different. RFC 5382 [0] specifies no less than 2 hours and 4 minutes for TCP. RFC 4787 [1] specifies no less than 2 minutes for UDP. Towards the end of the article it becomes clear that it's UDP.

The example diagrams also incorrectly show port numbers exceeding 65535. The port fields in TCP and UDP headers are 16 bits [2].

[0]: https://www.rfc-editor.org/rfc/rfc5382 [1]: https://www.rfc-editor.org/rfc/rfc4787 [2]: https://textbook.cs161.org/network/transport.html

apitman · 7 months ago

That's interesting. Do NATs in the wild tend to be spec-compliant with their timeouts?

justahuman74 · 7 months ago

Interestingly, IPv6 is not listed as the solution

LegionMammal978 · 7 months ago

An IPv6 router with a stateful firewall blocking incoming connections could have just the same issues with timeouts, I'd imagine. Switching to IPv6 doesn't just mean that anyone can make a P2P connection to anyone else (even STUN needs a third-party server to coordinate the two peers).

(D)TLS session resumption (I'm not sure if their "Connection IDs" are that or something similar) seems like the most foolproof solution to this scenario, assuming that the remote host can support it.

namibj · 7 months ago

But it'd be trivial to tell it to free the device from it, unlike with NAT, where you pretty much have to expire sessions to not run out of memory.

MerManMaid · 7 months ago

>An IPv6 router with a stateful firewall blocking incoming connections could have just the same issues with timeouts, I'd imagine.

You'd be surprised... PCP (Port Control Protocol) implemented by large vendors such as Cisco and Apple are able to punch through a firewall for up to 24 hours in a single session.

https://github.com/Self-Hosting-Group/wiki/wiki/Port-Mapping...

api · 7 months ago

The weird fear around it is crazy. It’s mostly just bigger IPs and it makes so much complexity and ugly hacks like NAT go away.

withinboredom · 7 months ago

and also so many other things. ARP goes away, dhcp goes away -- yet people reinvented dhcp anyway and did it wrong (IHMO).

I'm of the opinion that IPV6 changed some small things just enough to get people to have to learn new stuff -- and also forgot that NAT is not a firewall, somewhere along the way.

userbinator · 7 months ago

IPv6 is its own complexity.

gnabgib · 7 months ago

It was 3 months ago: https://news.ycombinator.com/item?id=41884515

Few cellular modems commonly used in IoT support IPv6, and not all mobile network operators provide an IPv6 address. Since cellular connectivity plays a major role in the industry, IPv6 cannot be used as a blanket solution to this problem.

altairprime · 7 months ago

Either it would or it wouldn’t help in most cases, but the absence of consideration of it at all weakens the article’s arguments from a carrier perspective. IPv6 adoption was at 90% by US mobile carriers a couple years ago, and the US is not known for its telco infrastructure investment; so, while using IPv6 may not be a uniform cure for their issues, the article’s total focus on legacy IPv4 NAT issues is in stark contrast to its availability carrier-side in one of the weakest examples available. China regulates that IPv6 be supported and enabled by default on all hardware sold for use in-country since last year, and telcos have six months left until a first-stage IPv4 new-hardware prohibition goes into effect later this year, so the assumption that most cellular modems don’t support IPv6 seems unlikely as well given their regulatory climate. This deserves more research or at least an explanation of why such was not done for the initial release of the paper.

wavesound · 7 months ago

On modern firewalls/routers, NAT is only one cooks in the network kitchen raining on this author's parade. Stateful Packet Inspection has timeouts too!

But for TCP, you don't even need to be stateful just to prevent inbound connections, which is a huge win over NATs.

UDP still needs state tracking, unfortunately.

It seems brave to let an IoT device talk to someone over an unencrypted wan though. They're often of pretty varying software quality and rarely updated.

If you really want IoT wifi devices, put them on a separate wifi, and only let them talk to a local device that you can keep up to date. Assume they're vulnerable to local attacks over wifi and act accordingly, e.g. don't give the IoT wifi access to your other devices beyond to that controller, and definitely not to the wider internet.

If they're closed source, assume they're already compromised from the factory

The issue in IoT is that most people expect to be able to control their IoT (e.g. Smart Home) devices from outside of their network. This requires you to have a central server these devices communicate with or a similar deployment on-premises with a public IP address.

I've always wondered how it is economically feasible to run these central services without a monthly subscription. If you stop selling devices you'll go under fairly quickly.

leptons · 7 months ago

I have all my "Tuya" IoT devices on a separate network, isolated from my personal intranet, and the cloud servers these devices talk to are in China. I pay nothing to "Tuya" for their cloud service, never have. If the "Tuya" service ever goes down, my home automation is also down, and that sucks. I'm trying to replace it all with "Tasmota" devices now, but it's not quite as easy or as cheap to do.

Yeah, sadly, people care about convenience and not security. So you get things like this:

https://www.malwarebytes.com/blog/news/2024/04/ring-agrees-t...

and somehow they are still in business, and popular.

If you do care about security, keeping your home-automation within your own control is probably the only sane path. Homeassistant and similar open source things like openhab are pretty good if a bit fiddly, and a wireguard vpn like tailscale a fairly practical way to access it when away from home.