Yeah, so now you're basically running a heavy instance in order to get the network throughput and the RAM, but not really using that much CPU when you could probably handle the encode with the available headroom. Although the article lists TLS handshakes as being a significant source of CPU usage, I must be missing something because I don't see how that is anywhere near the top of the constraints of a system like this.
Regardless, I enjoyed the article and I appreciate that people are still finding ways to build systems tailored to their workflows.
The scalable in-memory solution took quite a bit of testing to get right. Building this early in the business, when the requirements are not well known, can be a giant budget and time tar pit. Plus, without customers, it’s hard to confidently test at scale.
Using S3 for an MVP and marking this component as “done” seems like the right solution, regardless of the serverless paradigm.
Agreed, but the first design principle is "eliminate complexity at the design level." MVPs, and what they represent (a failure to design), are an albatross.
They didn’t actually do what the headline claims. They made an in-memory cache that sits in front of S3 for the happy path. Cool, but nowhere near rolling your own S3.
True, but if I'm following, the memory cache has to clone the S3 API for existing clients that can't easily be altered. Regardless of what you title it, it's an interesting project report!
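To make the "memory cache in front of S3 for the happy path" idea concrete, here is a minimal Python sketch. It assumes segments are keyed by a string, are read exactly once (the "delete on read" behaviour mentioned further down the thread), and that a miss or an expired entry can fall back to some slower durable store. `SegmentCache`, `take()` and the `fallback` hook are hypothetical names for illustration, not Nanit's code.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class SegmentCache:
    """Hypothetical in-memory store for short-lived video segments.

    put() keeps a segment in RAM; take() returns it and removes it
    (delete-on-read). Entries older than ttl_seconds count as missing,
    and any miss is delegated to an optional fallback (e.g. an S3-like store).
    """

    def __init__(self, ttl_seconds: float = 30.0,
                 fallback: Optional[Callable[[str], Optional[bytes]]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._fallback = fallback
        self._clock = clock
        self._lock = threading.Lock()
        self._items: Dict[str, Tuple[float, bytes]] = {}

    def put(self, key: str, data: bytes) -> None:
        with self._lock:
            self._items[key] = (self._clock(), data)

    def take(self, key: str) -> Optional[bytes]:
        with self._lock:
            entry = self._items.pop(key, None)  # delete-on-read
        if entry is not None:
            stored_at, data = entry
            if self._clock() - stored_at <= self._ttl:
                return data  # happy path: served straight from memory
        # Miss or expired entry: use the backing store if one is configured.
        return self._fallback(key) if self._fallback else None

    def evict_expired(self) -> int:
        """Drop entries nobody read in time; returns how many were dropped."""
        now = self._clock()
        with self._lock:
            stale = [k for k, (t, _) in self._items.items() if now - t > self._ttl]
            for k in stale:
                del self._items[k]
        return len(stale)
```

The design choice worth noting is the trade-off raised below in the thread: whatever is only in `_items` is gone if the process dies, so the fallback (or retries on the producer side) is what provides durability.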
My first thought is: why bother with local storage if your turnaround on video chunks is 2 seconds? What's disk going to add besides a little more resiliency in that 2-second window? And that comes at the cost of slower pod startups, since you have to mount the PVC, plus a small performance hit from writing to a filesystem instead of memory.
All moot anyway, given that the cameras/proxy allegedly have retries built in, but I'm interested to hear your thoughts.
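Client-side retries like the ones mentioned above are commonly implemented as exponential backoff with jitter. A minimal Python sketch, assuming the upload is wrapped in a callable that raises on failure; `retry_with_backoff` and its parameters are hypothetical, and nothing here reflects what Nanit's cameras actually do.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    op: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op(); if it raises, wait a randomized delay and try again.

    Uses "full jitter": the wait before retry n (0-based) is drawn uniformly
    from [0, min(max_delay, base_delay * 2**n)], so many cameras failing at
    once don't all retry in lockstep. The last exception is re-raised after
    `attempts` failed calls.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    rng = rng if rng is not None else random.Random()
    for attempt in range(attempts):
        try:
            return op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
    raise AssertionError("unreachable")
```

Usage would look like `retry_with_backoff(lambda: upload_segment(chunk))`, where `upload_segment` is a hypothetical uploader that raises on failure. With retries in place, losing an in-memory segment becomes a re-upload rather than lost video, which is the point being made about disk adding little.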
In HN style, I'm going to diverge from the content and rant about the company:
Nanit needs this storage because they run cloud-based baby cameras. Every Nanit user is uploading video and audio of their home/baby live to Nanit without any E2EE. It's a hot mic sending anything you say near it to the cloud.
Their hardware essentially requires a subscription to use, even though it costs $200/camera. You must spend an additional $200 on a Nanit floor stand if you want sleep tracking. This is purely a software limitation since there's plenty of other ways to get an overhead camera mount. (I'm curious how they even detect if you're using the stand since it's just a USB-C cable. Maybe etags?)
Of course Nanit is a popular and successful product that many parents swear by. It just pains me to see cloud-based in-home audio/video storage being so normalized. Self-hosted video isn't that hard but no one makes a baby-monitor centric solution. I'm sure the cloud-based video storage model will continue to be popular because it's easy, but also because it helps justify a recurring subscription.
edit: just noticed an irony in my comment. I'm ranting about Nanit locking users into their 3rd party cloud video storage, and the article is about Nanit's engineering team moving off a 3rd party (S3) and self-hosting their own storage. Props to them for getting off S3.
As a happy customer, I picked Nanit because it actually worked. We didn’t even use the “smart” features, but “you can turn on the app from anywhere you happen to be and expect the video feed to work” is unfortunately a bar that no competitor I tried could meet. The others were mostly made by non-software companies with outsourced apps that worked maybe 50% of the time.
I wish we could have local-first and e2ee consumer software for this sort of thing, but given the choice of that or actually usable software, I am going to pick the latter.
I self-host my "baby monitor" with UniFi Protect on a UCG-Max and a G6 Instant wireless camera. It's more work to set up, but pretty easy for a techie. It has the "turn on the app anywhere and it works" feature, and with a 2TB SSD I get a month+ of video storage. Because storage is local, it doesn't need to compress the video as aggressively, and I get a super clear 4K image. I also use Homebridge to expose the camera to Apple HomeKit, which is a convenient and more user-friendly way to access it, and HomeKit also gives you out-of-home access with a hub. I love my setup, but I couldn't in good conscience recommend it to a non-techie friend, especially if they're sleep deprived from their infant.
But I do miss having baby-specific features like sleep tracking. It supports crying detection, but that's it.
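For scale, "a month+ on a 2TB SSD" puts an upper bound on the average recorded bitrate. A back-of-the-envelope Python sketch, assuming one camera recording continuously, decimal units, and no filesystem overhead or other data on the drive (all assumptions, not figures from the comment).

```python
def max_average_bitrate_mbps(storage_tb: float, days: float) -> float:
    """Highest average bitrate (megabits/s) that fits `days` of continuous
    recording into `storage_tb` terabytes.

    Assumes decimal units (1 TB = 10**12 bytes, 1 Mb = 10**6 bits) and
    ignores filesystem overhead and any other data on the drive.
    """
    total_bits = storage_tb * 10**12 * 8
    seconds = days * 24 * 60 * 60
    return total_bits / seconds / 10**6


print(f"{max_average_bitrate_mbps(2, 30):.2f} Mb/s")  # prints "6.17 Mb/s"
```

So a 30-day retention on 2 TB works out to an average of roughly 6 Mb/s at most; longer retention means a proportionally lower average.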
Which competitors have you actually tried? My girlfriend’s parents have a few cheap TP-Link solar-powered CCTV cameras, and they've worked flawlessly since setup. I used to jerry-rig an Android phone with Alfred, and that worked well too.
My £15 TP-Link camera that we use as a baby monitor works 100% of the time. I can use it completely locally too with nothing sent to their servers at all, or use it through the internet if I want to. Got 4+ years of continuous use and counting, with zero issues.
I have 2 free-roaming rabbits in one room of the house. We've been using a Eufy camera to access the live feed and found no issues with it; I'd definitely buy again. And the SD card recording allows us to seek a couple of days into the past. It's pretty fun to watch the rabbits scramble to the automatic feeder at the set time.
> you can turn on the app from anywhere you happen to be and expect the video feed to work
If I'm understanding "anywhere you happen to be" right: Real question -- I'm not a parent. What is your use case for wanting to monitor your baby remotely from a different location than your baby? Obviously someone is with them at the house or location with the baby! You don't trust em? Or do you just like seeing/hearing your baby when you are out?
I see why a baby monitor in general is helpful so you can be in another room in the house and still keep an eye/ear on baby, but obv someone has to actually be in the location with the baby! (and the monitor at least needs to be on the wifi, right? So the monitor is in a place you have network access to, yes?)
The VTech camera is working well enough for me, for what it’s worth. But any such app-based solution generally implies transfer through the company’s servers.
> Every Nanit user is uploading video and audio of their home/baby live to Nanit without any E2EE. It's a hot mic sending anything you say near it to the cloud.
Your way of phrasing it makes it sound like it would be fine to upload the video if it were end-to-end encrypted. I think this is worth clarifying (since many don’t really understand the E2EE trade-off): E2EE is for smart clients that do all the processing, plus dumb servers that are only used for blind routing and storage. In this instance, it sounds like Nanit aren’t doing any routing or (persistent) storage: the sole purpose of the upload is offloading processing to the cloud. Given that, you can have transport encryption (typically TLS), but end-to-end encryption is not possible.
If you wanted the same functionality with end-to-end encryption, you’d need to do the video analysis locally, and upload the results, instead of uploading the entire video. This would presumably require more powerful hardware, or some way of offloading that to a nominated computer or phone.
Exactly. There is no video analysis if the video is encrypted and they cannot decrypt it. If there is E2EE and you expect them to do the video analysis, they need to be able to decrypt the video. Alternatively, you do it locally, but then why bother uploading anything at all, encrypted or not? So ultimately E2EE would not help here at all.
It's true. But Nanit only gives you things like sleep insights if you buy their $200 stand and pay for a bigger subscription. Many users aren't making use of this. They do provide motion alerts, but those could happen on device.
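The "motion alerts could happen on device" point can be illustrated with the simplest possible detector, frame differencing: compare each grayscale frame with the previous one and report only the frames that cross a threshold. A minimal Python sketch; representing frames as flat lists of 0-255 pixel values and the default threshold of 12 are assumptions for illustration, and a real product would need something far more robust (lighting changes, noise, night mode).

```python
from typing import Iterable, List, Optional, Sequence


def mean_abs_diff(prev: Sequence[int], cur: Sequence[int]) -> float:
    """Average per-pixel absolute difference between two grayscale frames."""
    if len(prev) != len(cur):
        raise ValueError("frames must have the same number of pixels")
    if not cur:
        return 0.0
    return sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)


def motion_events(frames: Iterable[Sequence[int]], threshold: float = 12.0) -> List[int]:
    """Indices of frames that differ from the previous frame by more than
    `threshold`. Only these tiny event records would need to leave the
    device; the video itself could stay local."""
    events: List[int] = []
    prev: Optional[Sequence[int]] = None
    for i, frame in enumerate(frames):
        if prev is not None and mean_abs_diff(prev, frame) > threshold:
            events.append(i)
        prev = frame
    return events
```

For example, with `still = [10] * 16` and `moved = [10] * 8 + [200] * 8`, `motion_events([still, still, moved, moved, still])` returns `[2, 4]`: the frame where something appears and the frame where it leaves.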
Apple has done some interesting things with privacy-centric cloud processing. There might eventually be some way to get the benefits of cloud-based detections without revealing your video.
My other gripe is that they also store audio, which I personally feel is even more sensitive. I wish there was an option to allow live audio listening without storing any audio in the cloud.
My parents bought a camcorder in 1995 and "self-hosted" the video just fine. But you're right that it shouldn't even be something consumers have to think about, because it should be the default and it should be easy. You can get low-power SSD-based NAS devices now, so hopefully this will change soon.
I meant more that in the abstract technical sense it's not that hard of a problem, but I agree that given the options available to consumers it is hard.
If UniFi Protect were re-skinned, stripped of a bunch of its security-camera complexity, and optimized for the baby-camera use case, it'd be friendly enough for normal consumers.
We've used an offline Infant Optics baby camera for three kids and have never wished for any of the smart features that online cameras offer. You really just want to know whether they are asleep and when they are crying. I just don't see a good use case for recording all that video for most kids. (I'm sure there are special needs situations where it is helpful)
I actually don’t really get the point of a cloud service for this. Aren’t babies usually left in situations where there’s at least one trusted adult locally available?
The "point" of the cloud service is that it's sadly usually the easiest way to create an [on-premise device]<->[user's smartphone/laptop] connection for B2C/residential deployments of appliances (like the baby monitor in this case).
It's much easier to create a device<->internet connection + a smartphone<->internet connection than it is to deal with the myriad of issues that occur if you try to do local device<->smartphone connections in networks with unknown topology and quirks (e.g. an ISP being overly conservative in their firewall presets). If that were a more trivial problem in general, you would see fewer cloud services.
(You would probably still see a similar number of cloud services due to the increased monetization options, but this would level the playing field for local-only options.)
Yes, a parent is always around. The part you might be missing is that the parent doesn’t want to have to limit their movements to areas where WiFi coverage is good.
Many cheap baby monitors are WiFi connected. You have to haul the video unit around and keep it live to hear when it cuts out, then move back toward where WiFi coverage was good.
This won’t seem like a big deal to someone who lives in an apartment or who has a house with 7 Ubiquiti APs covering everywhere inside and out, but it is a big deal to a parent who has a single WiFi router and wants to be able to do something like pull weeds in the yard, have a conversation with the neighbor, or go to a detached garage and work on a project without having to worry about their exact WiFi coverage at every moment to check on the baby.
It’s an over-engineered solution to a relatively simple problem: accessing the device on the local network. This used to be a hard problem to solve, but in 2025 I’d question why they’re going through the headache of all this cloud stuff when they could just build a quality device that runs locally, with a simple base station that triggers alerts. The only hosting they really need is something to send alerts to an app.
The leading cause of death under one year is sudden infant death syndrome, which happens mostly at nap time, in situations where the adult may need rest, self-care or housekeeping. You cannot possibly watch an infant 24/7, especially if one parent is working and there's a minimal support system (living far from relatives, working grandparents, etc.).
> You must spend an additional $200 on a Nanit floor stand if you want sleep tracking. This is purely a software limitation since there's plenty of other ways to get an overhead camera mount. (I'm curious how they even detect if you're using the stand since it's just a USB-C cable. Maybe etags?)
I made a simple wood mount and painted it to match the crib. It worked well. There was no software enforcement requiring you to buy their mount at the time. Has this changed recently?
> Self-hosted video isn't that hard but no one makes a baby-monitor centric solution.
It's not that easy. The only use case that is actually really fucking easy is when both the camera and the device trying to access it are on the same network - broadcasts for discovery, that's it. Although I've seen people turn on "client isolation" in their wifi back when I did computer repairs, so it's not a given that this works!
But as soon as that assumption goes out the window - and if it's just you going into the garden to check on some weeds where the wifi doesn't reach - the task suddenly becomes so, so much harder:
- the "easiest" case is an ISP that hands your wifi router a globally routed IPv4 address, allows UPnP to be configured, and the user has UPnP configured. All that the camera has to do here is to request a port opening and that's it. Still, you as manufacturer need a server to store a mapping between user, IP address and port. (And you need to hope that the user's mobile device or their ISP doesn't have a nasty firewall blocking non-standard ports)
- No UPnP? Now you as manufacturer either need some STUN/TURN server or explain to the user how to manually enable port forwarding.
- Worst case: the user's ISP either has IPv6 only, CGNAT, double/triple/... NAT or similar shit in play because they don't have enough IP addresses to supply to their customer base. That's pretty much impossible even with STUN/TURN, sooo many ways for things to go wrong along the path.
- even a theoretical fully IPv6 world where everyone has globally routed IPv6 addresses everywhere and all ISPs have their routing working still wouldn't solve the issue... because consumer ISP routers enable a firewall on IPv6 to avoid stuff like "online game cheaters 0wning their opponents running an outdated version of their game".
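STUN, mentioned in the list above, is the piece that lets a device behind a NAT learn which public address and port its outgoing UDP packets appear to come from. A minimal Python sketch of a STUN Binding request and the XOR-MAPPED-ADDRESS parsing from RFC 5389, IPv4 only. The server you pass to `public_address` is up to you (no specific server is assumed), and this only covers address discovery; it does not by itself get traffic through the harder cases listed above.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id: bytes) -> bytes:
    """20-byte STUN header (type, length, magic cookie, transaction ID), no attributes."""
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def parse_xor_mapped_address(resp: bytes, txn_id: bytes) -> Optional[Tuple[str, int]]:
    """Return the public (ip, port) from a Binding success response (IPv4 only)."""
    if len(resp) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", resp[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or resp[8:20] != txn_id:
        return None
    pos, end = 20, min(len(resp), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", resp[pos:pos + 4])
        value = resp[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    return None


def public_address(server: str, port: int = 3478, timeout: float = 2.0) -> Optional[Tuple[str, int]]:
    """Ask a STUN server how our UDP socket looks from outside.
    Raises socket.timeout if the server does not answer in time."""
    txn_id = os.urandom(12)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(txn_id), (server, port))
        data, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(data, txn_id)
```

Even with the public mapping in hand, the device and the phone still need a server to exchange those addresses, which is part of why the thread keeps landing on "you need some cloud component anyway".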
The sad reality is, running a cloud service is the only actually pain-free way for any given smart Thing to work as the customer expects it.
And on top of that, a NAS capable of storing video costs 300-ish bucks with an HDD capable of running 24/7, and eats about 10 watts of electricity, which is quite the cost factor on its own.
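To put the "10-ish watts" in perspective, a quick Python sketch of the yearly running cost. The electricity price is purely an example value (in whatever currency you pay per kWh), not a figure from the thread.

```python
def yearly_energy_cost(watts: float, price_per_kwh: float) -> float:
    """Electricity cost of running a device 24/7 for a 365-day year."""
    kwh_per_year = watts * 24 * 365 / 1000  # W * hours -> Wh, then kWh
    return kwh_per_year * price_per_kwh


# 10 W around the clock is 87.6 kWh/year; the 0.30/kWh price is only an example.
print(round(yearly_energy_cost(10, 0.30), 2))  # prints 26.28
```

Plug in your local tariff; the energy figure (87.6 kWh per year for a constant 10 W) is the part that doesn't change.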
Sure, the "nerd population" here on HN can rig something up that works in a matter of a few days, including some rudimentary AI to spot if the baby managed to escape the crib. But the 99% of people out there will crash at the "please open your router's config page to allow UDP port 65535 passthrough" step, if only because they forgot the password that they set five years ago.
> But as soon as that assumption goes out the window - and if it's just you going into the garden to check on some weeds where the wifi doesn't reach - the task suddenly becomes so, so much harder:
Exactly. There are a lot of comments in this thread from people who are either non-parents or who haven’t lived in a situation where they didn’t have perfect WiFi coverage of their entire living area.
Being able to visit the neighbors or go out in the yard without worrying about missing baby monitor events is a huge advantage that many parents will pay for.
I think this entire comment section is a prime example of HN not understanding non-technical audiences.
> Self-hosted video isn't that hard but no one makes a baby-monitor centric solution
It sounds like they're not hosting it though. They are processing it, and storing it temporarily while it's queued.
A fully self-hosted, AI-powered baby monitor that accurately detects sleep states and danger situations would be incredibly expensive today. Maybe not in a few years, though.
You'll never convince me that the term "cloud" came into existence for any purpose other than to separate itself from "the internet". That way, normal people who were very steadfast for years about not putting personal information on the internet would start putting their personal information in the "cloud".
We just used IP cams with our kids. Now with Ubiquiti it's dead simple to set up storage for it too. I think Synology supports anything that emits RTSP.
Baby monitors around here (Alecto is a popular brand) cost twice as much and have only half the capabilities.
They don't provide a display, so I put a Raspberry Pi, a display, and an audio HAT in an enclosure. It plays an RTSP stream from the camera at startup and works pretty well.
+1 for UniFi. They’ve added “baby crying” to the audio monitoring for triggering alerts. Everything is kept local on your LAN. You can access it remotely via an app if you wish, but that’s simply accessing the device on your LAN, so there's no dumping all your footage into some random “cloud.” Stuff just works and requires no subscription, so all your money goes towards better quality hardware.
This feels like they were using the wrong architecture from the start, and are now papering over that problem with additional layers of cache.
The only practical reason to put a video in S3 for an average of 2 seconds is to provide additional redundancy, and replacing that with a cache removes most of the redundancy.
Feels like if you uploaded these to an actual server, the server could process them on upload, and you could eliminate S3, the queue in SQS, and the lambdas all in one fell swoop...
What a great and helpful write-up, love when people share things like this so I can learn.
It's less about whether I would have a use case for this exact thing (or whether or not it was appropriate for this use case, i dunno, prob don't have enough context to know).
More just seeing what is possible, how they thought about it and analyzed it, what they found unexpected and how, etc. I learned a lot!
Exactly, my first thought was "Why on earth would anyone think that S3 was the right service to store millions of tiny ephemeral files?" And now it seems they have invented their own in-memory store instead of just using something like Redis. I also wonder what happens if their DIY thingy crashes: are the videos lost? Why not send to Kinesis or SQS in the first place?
From the article, individual video segments were 2-6 MB in size and SQS and Kinesis have a 1MB limit for individual records so they couldn’t have used either service directly. At least not without breaking their segments into even smaller chunks.
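To see what "breaking their segments into even smaller chunks" would involve: split each 2-6 MB segment into records under the limit, tag each with enough metadata to put it back together, and reassemble on the consumer side. A minimal Python sketch, assuming a 1,000,000-byte per-record limit (check the current SQS/Kinesis quotas), a 16-byte segment ID, and a hypothetical header of (ID, chunk index, chunk count). It ignores duplicate delivery and timeouts for chunks that never arrive, which a real pipeline would have to handle.

```python
import struct
from typing import Dict, List

# Assumption: a per-record limit of 1,000,000 bytes ("1 MB" vs 1 MiB matters at the edges).
MAX_RECORD_BYTES = 1_000_000
# Hypothetical header: 16-byte segment ID, chunk index, total chunk count.
HEADER = struct.Struct("!16sII")


def split_segment(segment_id: bytes, data: bytes,
                  max_record: int = MAX_RECORD_BYTES) -> List[bytes]:
    """Split one video segment into records no larger than max_record."""
    if len(segment_id) != 16:
        raise ValueError("segment_id must be exactly 16 bytes (e.g. uuid4().bytes)")
    payload = max_record - HEADER.size
    if payload <= 0:
        raise ValueError("max_record too small for the header")
    count = max(1, -(-len(data) // payload))  # ceiling division, at least one record
    return [HEADER.pack(segment_id, i, count) + data[i * payload:(i + 1) * payload]
            for i in range(count)]


def reassemble(records: List[bytes]) -> Dict[bytes, bytes]:
    """Rebuild every segment whose chunks have all arrived, in any order."""
    chunks: Dict[bytes, Dict[int, bytes]] = {}
    totals: Dict[bytes, int] = {}
    for rec in records:
        seg_id, index, count = HEADER.unpack(rec[:HEADER.size])
        chunks.setdefault(seg_id, {})[index] = rec[HEADER.size:]
        totals[seg_id] = count
    return {
        seg_id: b"".join(parts[i] for i in range(totals[seg_id]))
        for seg_id, parts in chunks.items()
        if all(i in parts for i in range(totals[seg_id]))
    }
```

A 2.56 MB segment becomes three records this way, so a consumer has to wait for all of them before processing can start, which is extra state and latency the original S3-based design didn't need.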
That's a great point. Sometimes we look for architecture or technology solutions for a problem that could be easily solved at the sales level by negotiating a PPA (Private Pricing Addendum) with AWS.
I suspect it's a massive amount, as S3 is one of the cheaper services. As we evaluate moving all of our compute off of AWS, S3 (and SQS) are probably services we'll retain because they are still amazing values.
This may be an obvious point, but I didn't see it mentioned in the (otherwise excellent) article: I would have been interested in the cost saving in just implementing the 'delete on read' with S3 that they ended up using with the home-made in-memory cache solution. I can't see this on the S3 billing page, but if the usage is billed per-second, as with some other AWS services, then the savings may be significant.
The solution they document also matches the S3 'reduced redundancy' storage option, so I hope they had this enabled from day one.
Sticking something with a 2-second lifespan on disk to shoehorn it into the AWS serverless paradigm created problems and costs out of thin air here.
Good move going at least partially to an in-memory solution, though.
My impression is live feed is a solved problem.
In the case of this product, there is only one client (and a server).
E2EE then boils down to having the traffic encrypted, like you have with an HTTPS website.
Self-hosting video is not something the typical user of a baby monitor would ever even consider.
From the product description though it sounds like sleep analysis is what you're paying for, which they do on servers analyzing the video.
I'm not leaving a baby at home while I go on vacation. I would never even be on another network. Why would I need the cloud?
I don't understand this attitude. Sure, it's easy for some people, but MOST people want an easy, out-of-the-box solution.
There's nothing wrong with that.
Don't know how they came up with such a bad and complicated cloud design for something that is straightforward.
> We used S3 even though it wasn’t the right service