Readit News logoReadit News
Posted by u/huntaub 9 months ago
Launch HN: Regatta Storage (YC F24) – Turn S3 into a local-like, POSIX cloud FS
Hey HN, I’m Hunter the founder of Regatta Storage (https://regattastorage.com). Regatta Storage is a new cloud file system that provides unlimited pay-as-you-go capacity, local-like performance, and automatic synchronization to S3-compatible storage. For example, you can use Regatta to instantly access massive data sets in S3 with Spark, Pytorch, or pandas without paying for large, local disks or waiting for the data to download.

Check out an overview of how the service works here: https://www.youtube.com/watch?v=xh1q5p7E4JY, and you can try it for free at https://regattastorage.com after signing up for an account. We wanted to let you try it without an account, but we figured that “Hacker News shares a file system and S3 bucket” wouldn’t be the best experience for the community.

I built Regatta after spending nearly a decade building and operating at-scale cloud storage at places like Amazon’s Elastic File System (EFS) and Netflix. During my 8 years at EFS, I learned a lot about how teams thought about their storage usage. Users frequently told me that they loved how simple and scalable EFS was, and -- like S3 -- they didn’t have to guess how much capacity they needed up front.

When I got to Netflix, I was surprised that there wasn’t more usage of EFS. If you looked around, it seemed like a natural fit. Every application needed a POSIX file system. Lots of applications had unclear or spikey storage needs. Often, developers wanted their storage to last beyond the lifetime of an individual instance or container. In fact, if you looked across all Netflix applications, some ridiculous amount of money was being spent on empty storage space because each of these local drives had to be overprovisioned for potential usage.

However, in many cases, EFS wasn’t the perfect choice for these workloads. Moving workloads from local disks to NFS often encountered performance issues. Further, applications which treated their local disks as ephemeral would have to manually “clean up” left over data in a persistent storage system.

At this point, I realized that there was a missing solution in the cloud storage market which wasn’t being filled by either block or file storage, and I decided to build Regatta.

Regatta is a pay-as-you-go cloud file system that automatically expands with your application. Because it automatically synchronizes with S3 using native file formats, you can connect it to existing data sets and use recently written file data directly from S3. When data isn’t actively being used, it’s removed from the Regatta cache, so you only pay for the backing S3 storage. Finally, we’re developing a custom file protocol which allows us to achieve local-like performance for small-file workloads and Lustre-like scale-out performance for distributed data jobs.

Under the hood, customers mount a Regatta file system by connecting to our fleet of caching instances over NFSv3 (soon, our custom protocol). Our instances then connect to the customer’s S3 bucket on the backend, and provide sub-millisecond cached-read and write performance. This durable cache allows us to provide a strongly consistent, efficient view of the file system to all connected file clients. We can perform challenging operations (like directory renaming) quickly and durably, while they asynchronously propagate to the S3 bucket.

We’re excited to see users share our vision for Regatta. We have teams who are using us to build totally serverless Jupyter notebook servers for their AI researchers who prefer to upload and share data using the S3 web UI. We have teams who are using us as a distributed caching layer on top of S3 for low-latency access to common files. We have teams who are replacing their thin-provisioned Ceph boot volumes with Regatta for significant savings. We can’t wait to see what other things people will build and we hope you’ll give us a try at regattastorage.com.

We’d love to get any early feedback from the community, ideas for future direction, or experiences in this space. I’ll be in the comments for the next few hours to respond!

garganzol · 9 months ago
I used the same approach based on Rclone for a long time. I wondered what makes Regatta Storage different than Rclone. Here is the answer: "When performing mutating operations on the file system (including writes, renames, and directory changes), Regatta first stages this data on its high-speed caching layer to provide strong consistency to other file clients." [0].

Rclone, on the contrary, has no layer that would guarantee consistency among parallel clients.

[0] https://docs.regattastorage.com/details/architecture#overvie...

huntaub · 9 months ago
This is exactly right, and something that we think is particularly important for applications that care about data consistency. Often times, we see that customers want to be able to quickly hand off tasks from one instance to another which can be incredibly complex if you don't have guarantees that your new operations will be seen by the second instance!
wanderingmind · 9 months ago
Might be useful to show the differences with Rclone, s3fs as a table to make it obvious
benatkin · 9 months ago
The headline seems misleading, then.

rclone can work with AWS' different offerings, some of which at least partially address this: https://aws.amazon.com/blogs/aws/new-amazon-s3-express-one-z...

huntaub · 9 months ago
I'm not totally sure what you mean. I don't think that S3 Express One Zone offers any additional atomic semantics in the file system world.
freedomben · 9 months ago
Thanks, this was my thought as well. I use and love rclone and it wasn't immediately clear what this offered above that
dheera · 9 months ago
I suppose rclone doesn't provide byte range file locking? Running sqlite over rclone would be a disaster.
garganzol · 9 months ago
Running sqlite over rclone is not a disaster as long as you run only a single instance working with that database. Rclone provides no support for locking semantics.
huntaub · 9 months ago
That would be my expectation, you need something in the middle to actually broker the file locks.
memset · 9 months ago
This is honestly the coolest thing I've seen coming out of YC in years. I have a bunch of questions which are basically related to "how does it work" and please pardon me if my questions are silly or naive!

1. If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more that could be cached locally?) Would I immediately see degradation, or thrashing, at the 10 GB mark?

2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

3. I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

4. Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?

5. I have to ask - how do you think about open source here?

6. Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)

I haven't played with the so maybe doing so would help answer questions. But I'm really excited about this! I have tried using EFS for small projects in the past but - and maybe I was holding it wrong - I could not for the life of me figure out what I needed to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.

huntaub · 9 months ago
Wow, thanks for the nice note! No questions are silly, and I'll also note that we now have a docs site (https://docs.regattastorage.com) and feel free to email me (hleath [at] regattastorage.com) if I don't fully address your questions.

> If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more that could be cached locally?) Would I immediately see degradation, or thrashing, at the 10 GB mark?

We don't actually do caching on your instance's disk. Instead, data is cached in the Linux page cache (in memory) like a regular hard drive, and Regatta provides a durable, shared cache that automatically expands with the working set size of your application. For example, if you were trying to work with data in the 50 GiB range, Regatta would automatically cache all 50 GiB -- allowing you to access it with sub-millisecond latency.

> Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

For now, yes -- the speed is highly dependent on latency -- which is highly dependent on distance between your instance and Regatta. Today, we are only in AWS, but we are looking to launch in other clouds by the end of the year. Shoot me an email if there's somewhere specifically that you're interested in.

> I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

There are a couple of different questions bundled together in this. Today, Regatta exposes an NFSv3 file system that you can mount. We are working on a new protocol which will be mounted via FUSE. However, in Docker environments, we also provide a CSI driver (for use with K8s) and a Docker volume plugin (for use with just Docker) that handles the mounting for you. We haven't released these publicly yet, so shoot me an email if you want early access.

> Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?

Yes, you should be able to run a database on Regatta.

> I have to ask - how do you think about open source here?

We are in the process of open sourcing all of the client code (CSI driver, mount helper, FUSE), but we don't have plans currently to open source the server code. We see the value of Regatta in managing the infrastructure so you don't have to, and if we release it via open-source, it would be difficult to run on your own.

> Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)

Yes, you can mount on multiple servers simultaneously! We haven't specifically stress-tested the number of clients we support, but we should be good for O(100s) of mounts. Unfortunately, AWS locks down Lambda so we can't mount arbitrary file systems in that environment specifically.

> efs performance

Yes, the challenge here is specifically around the semantics of NFS itself and the latency of the EFS service. We think we have a path to solving both of these in the next month or two.

gizmo · 9 months ago
Do I understand correctly that the data gets decrypted at your Regatta AWS instances, before the data ends up in the customer's S3 bucket? It sounds like the SSL pipe used for NFS is terminated at Regatta servers. Can customers run the Regatta service on their own hardware?

Or does Regatta only have access to filesystem metadata -- enough to do POSIX stuffs like locks, mv, rm -- but the file contents themselves remain encrypted end-to-end?

memset · 9 months ago
Thank you for the detailed answers! Honestly, this project inspires me to work on infrastructure problems.

So you are saying that regatta's own SaaS infrastructure provides the disk caching layer. So you all make sure the pipe between my AWS instance and your servers are very fast and "infinitely scalable", and then the sync to S3 happens after the fact.

0x1ceb00da · 9 months ago
Are you planning to support android? How? AFAIK android doesn't have FUSE or NFS.

Dead Comment

mritchie712 · 9 months ago
Pretty sure we're in your target market. We [0] currently use GCP Filestore to host DuckDB. Here's the pricing and performance at 10 TiB. Can you give me an idea on the pricing and performance for Regatta?

Service Tier: Zonal

Location: us-central1

10 TiB instance at $0.35/TiB/hr

Monthly cost: $2,560.00

Performance Estimate:

Read IOPS: 92,000

Write IOPS: 26,000

Read Throughput: 2,600 MiB/s

Write Throughput: 880 MiB/s

0 - https://www.definite.app/blog/duckdb-datawarehouse

huntaub · 9 months ago
Yes, you should be in our target market. I don't think that I can give a cost estimate without having a good sense of what percentage of your data you're actively using at any given time, but we should absolutely support the performance numbers that you're talking about. I'd love to chat more in detail, feel free to send me a note at hleath [at] regattastorage.com.
mritchie712 · 9 months ago
I'll send you a note!

Found this in the docs:

> By default, Regatta file systems can provide up to 10 Gbps of throughput and 10,000 IOPS across all connected clients.

Is that the lower bound? The 50 TiB filestore instance has 104 Gbps read through put (albeit at a relatively high price point).

_bare_metal · 9 months ago
Out of curiosity, why not go bare metal in a managed colocation? Is that for the geographic spread? Or unpredictable load?

Every few months of this spend is like buying a server

Edit: back at my pc and checked, relevant bare metal is ~$500/m, amortized:

https://baremetalsavings.com/c/LtxKMNj

Edit 2: for 100tb..

mritchie712 · 9 months ago
agreed, one month of 50 TiB is $12,800!

we're using Filestore out of convenience right now, but actively exploring alternatives.

nthh · 9 months ago
This is compelling but it would be useful to compare upfront costs here. Investing $20,000+ in a server isn't feasible for many. I'd also be curious to know how much a failsafe (perhaps "heatable" cold storage, at least for the example) would cost.
nine_k · 9 months ago
Hiring someone who knows how to manage bare metal (with failover and stuff) may take time %)
cuno · 9 months ago
Founder of cunoFS here, brilliant to see lots of activity in this space, and congrats on the launch! As you'll know, there's a whole galaxy of design decisions when building file storage, and as a storage geek it's fun to see what different choices people make!

I see you've made some similar decisions to what we did for similar reasons I think - making sure files are stored 1:1 exactly as an object without some proprietary backend scrambling, offering strong consistency and POSIX semantics on the file storage, with eventual consistency between S3 and POSIX interfaces, and targeting high performance. Looks like we differ on the managed service vs traditional download and install model, and the client-first vs server-first approach (though some of our users also run cunoFS on an NFS/SMB gateway server), and caching is a paid feature for us versus an included feature for yours.

Look forward to meeting and seeing you at storage conferences!

huntaub · 9 months ago
Great to hear from you, I think cunoFS is doing a lot of things right! It’s certainly a fun problem space!
Andys · 9 months ago
Is that Gweo? Didn't know you were in the storage space, good to see you!
cuno · 9 months ago
Good to see you too! Lets catchup sometime - tried to connect with you separately but no luck.
jitl · 9 months ago
I’m very interested in this as a backing disk for SQLite/DuckDB/parquet, but I really want my cached reads to come straight from instance-local NVMe storage, and to have a way to “pin” and “unpin” some subdirectories from local cache.

Why local storage? We’re going to have multiple processes reading & writing to the files and need locking & shared memory semantics you can’t get w/ NFS. I could implement pin/unpin myself in user space by copying stuff between /mnt/magic-nfs and /mnt/instance-nvme but at that point I’d just use S3 myself.

Any thoughts about providing a custom file system or how to assemble this out of parts on top of the NFS mount?

huntaub · 9 months ago
Hey -- I think this is something that's in-scope for our custom protocol that we're working on. I'd love to chat more about your needs to make sure that we build something that will work great for you. Would you mind shooting an email to hleath [at] regattastorage.com and we can chat more?
juancampa · 9 months ago
We're also interested in SQLite shared by multiple processes on something like Regatta but my concerns are the issues described in the SQLite documentation about NFS [1]. Notably "SQLite relies on exclusive locks for write operations, and those have been known to operate incorrectly for some network filesystems."

[1] https://sqlite.org/useovernet.html

daviesliu · 9 months ago
Founder of JuiceFS here, congrats to the Launch! I'm super excited to see more people doing creative things in the using-S3-as-file-system space. When we started JuiceFS back in 2017, applied YC for 2 times but no luck.

We are still working hard on it, hoping that we can help people with different workloads with different tech!

huntaub · 9 months ago
Wow, thanks for coming out! I hope that you're heartened to see the number of people who immediately think of JuiceFS when they see our launch. I totally agree with you, storage is such an interesting space to work in, and I'm excited that there are so many great products out there to fit the different needs of customers.
gumbojuice · 9 months ago
As someone who is already happily using JuuceFS, perhaps you can provide a short list of differences (conceptual and/or technical). Thanks for a great product.
dsvf · 9 months ago
I'm a happy and satisfied JuiceFS user here, so I too would be interested in the difference between these. Is the Regatta key point caching?
ChocolateGod · 9 months ago
Also a user of JuiceFS, replaced a GlusterFS cluster a few years ago, far cheaper and easier to scale with no issues or changes needed to the applications using GlusterFS.
huntaub · 9 months ago
I know that I've answered this question a couple times in the thread, so I don't know if my words add extra value here. But, I agree that it would be interesting to hear what Davies is thinking.
boulos · 9 months ago
I love this space, and I have tried and failed to get cloud providers to work on it directly :). We could not get the Avere folks to admit that their block-based thing on object store was a mistake, but they were also the only real game in town.

That said, I feel like writeback caching is a bit ... risky? That is, you aren't treating the object store as the source of truth. If your caching layer goes down after a write is ack'ed but before it's "replicated" to S3, people lose their data, right?

I think you'll end up wanting to offer customers the ability to do strongly-consistent writes (and cache invalidation). You'll also likely end up wanting to add operator control for "oh and don't cache these, just pass through to the backing store" (e.g., some final output that isn't intended to get reused anytime soon).

Finally, don't sleep on NFSv4.1! It ticks a bunch of compliance boxes for various industries, and then they will pay you :). Supporting FUSE is great for folks who can do it, but you'd want them to start by just pointing their NFS client at you, then "upgrading" to FUSE for better performance.

huntaub · 9 months ago
> That is, you aren't treating the object store as the source of truth. If your caching layer goes down after a write is ack'ed but before it's "replicated" to S3, people lose their data, right?

This is exactly why we're building our caching layer to be highly-durable, like S3 itself. We will make sure that the data in the cache is safe, even if servers go down. This is what gives us the confidence to respond to the client before the data is in S3. The big difference between the data living in our cache and the data living in S3 is cost and performance, not necessarily durability.

> I think you'll end up wanting to offer customers the ability to do strongly-consistent writes (and cache invalidation). You'll also likely end up wanting to add operator control for "oh and don't cache these, just pass through to the backing store" (e.g., some final output that isn't intended to get reused anytime soon).

I think this is exactly right. I think that storage systems are too often too hands off about the data (oh, give us the bytes and we will store them for you). I believe that there are gains to be had by asking the users to tell you more about what they're doing. If you have a directory which is only used to read files and a directory which is only used to write files, then you probably want to have different cache strategies for those directories? I believe we can deliver this with good enough UX for most people to use.

> Finally, don't sleep on NFSv4.1! It ticks a bunch of compliance boxes for various industries, and then they will pay you :). Supporting FUSE is great for folks who can do it, but you'd want them to start by just pointing their NFS client at you, then "upgrading" to FUSE for better performance.

I certainly don't, and this is why we are supporting NFSv3 right now. That's not going away any time soon. We want to offer something that's highly compatible with the industry at large today (NFS-based, we can talk specifics about whether or not that should be v3 or v4) and then something that is high-performance for the early adopters who can use something like FUSE. I think that both things are required to get the breadth of customers that we're looking for.

convivialdingo · 9 months ago
Wow, looks like a great product! That's a great idea to use NFS as the protocol. I honestly hadn't thought of that.

Perfect.

For IBM, I wrote a crypto filesystem that works similarly in concept, except it was a kernel filesystem. We crypto split the blocks up into 4 parts, stored into cache. A background daemon listened to events and sync'ed blocks to S3 orchestrated with a shared journal.

It's pure magic when you mount a filesystem on clean machine and all your data is "just there."

huntaub · 9 months ago
> It's pure magic when you mount a filesystem on clean machine and all your data is "just there."

I totally agree! I am hoping that Regatta can power a future where teams don't need more than ~8 GiB of local storage for their operating system, and can store the rest on something like Regatta to get rid of the waste of overprovisioned block volumes.

lijok · 9 months ago
That would sell like hot cakes to the public sector.