noahl commented on Colossus for Rapid Storage   cloud.google.com/blog/pro... · Posted by u/alobrah
immibis · 8 months ago
I meant it sarcastically, but for "serious money" you can have any software system you can dream of. You have to dream of it, though - that's one of the hard parts.

It looks like every other clustered file system. What's special about Google's Colossus?

noahl · 8 months ago
There are some semantic differences compared to POSIX filesystems. A couple big ones:

  - You can only append to an object, and each object can only have one writer at a time. This is useful for distributed systems - you could have one process adding records to the end of a log, and readers pulling new records from the end (a rough sketch of that pattern follows the list).
  - It's also possible to "finalize" an object, meaning that it can't be appended to any more.
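A minimal sketch of that single-writer append / reader-tail pattern, as a loose analogy using a local file rather than the actual Rapid Storage API (the file name and helper functions are made up for illustration):

    # Loose analogy only: one appender, readers tailing from a saved offset,
    # and a "finalize" step after which no further appends can land.
    import os

    LOG_PATH = "records.log"  # stand-in for an append-only object

    def append_record(data: bytes) -> None:
        # Single writer appends one record per line.
        with open(LOG_PATH, "ab") as f:
            f.write(data + b"\n")

    def tail_records(offset: int) -> tuple[list[bytes], int]:
        # A reader pulls whatever was written since `offset` and
        # returns the new offset to resume from next time.
        with open(LOG_PATH, "rb") as f:
            f.seek(offset)
            new_bytes = f.read()
        return new_bytes.splitlines(), offset + len(new_bytes)

    def finalize_log() -> None:
        # Analogue of finalizing: make the object read-only.
        os.chmod(LOG_PATH, 0o444)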
(I work on Rapid Storage.)

noahl commented on Colossus for Rapid Storage   cloud.google.com/blog/pro... · Posted by u/alobrah
derefr · 8 months ago
In Google Cloud parlance, "regional" usually means "transparently master-master replicated across the availability zones within a region", while "zonal" means "not replicated, it just is where it is."
noahl · 8 months ago
Slight nit: "zonal" doesn't necessarily mean "not replicated"; it means that the replicas could all be within the same zone. That means they can share more points of failure. (I don't know if there's an official definition of zonal.)

NB: I am on the Rapid Storage team.

noahl commented on Colossus for Rapid Storage   cloud.google.com/blog/pro... · Posted by u/alobrah
dang · 8 months ago
That link doesn't work for me, so here's the relevant bit:

Rapid Storage: A new Cloud Storage zonal bucket that enables you to colocate your primary storage with your TPUs or GPUs for optimal utilization. It provides up to 20x faster random-read data loading than a Cloud Storage regional bucket.

(Normally we wouldn't allow a post like this which cherry-picks one bit of a larger article, but judging by the community response it's clear that you've put your finger on something important, so thanks! We're always game to suspend the rules when doing so is interesting.)

noahl · 8 months ago
There's now another blog post about Rapid Storage specifically: https://cloud.google.com/blog/products/storage-data-transfer... (That wasn't up yet when the original post was made.)
noahl commented on Google Launches AI Supercomputer Powered by Nvidia H100 GPUs   tomshardware.com/news/goo... · Posted by u/jonbaer
ttul · 3 years ago
And my main worry is: are they just going to cancel the new thing that my company invested six months and $250,000 of engineering time integrating with…
noahl · 3 years ago
No, not for GCP stuff.

I don't know of a single GCP product that's been shut down, although I could be missing something. But Google's track record for GCP is, I think, what you would want a cloud provider's record to be.

(I should mention that I work for GCP. But this is just based on my own memory.)

noahl commented on Discovering Azure's unannounced breaking change with Cosmos DB   metrist.io/blog/how-we-fo... · Posted by u/jmartens
HorizonXP · 3 years ago
Azure is our cloud provider. Interface is flexible, since our current implementation leverages Prisma ORM connected to Postgres & SQL Server. We're going to have to rebuild it anyway.
noahl · 3 years ago
Got it, thank you! CockroachDB is the only one I know offhand that does what you're looking for. Another comment mentioned Vitess, which might also work.

It seems like there are a lot of options for large-scale analytics, but I don't know of many for high-throughput, geo-redundant transaction processing.

noahl commented on Discovering Azure's unannounced breaking change with Cosmos DB   metrist.io/blog/how-we-fo... · Posted by u/jmartens
HorizonXP · 3 years ago
So, as someone who was in the midst of planning a migration of a multi-billion $ revenue platform to using CosmosDB...

Alternatives? LOL

Basically just looking for geo-redundant, high read & write throughput. Our intention was to leverage Azure Event Grid/Kafka Connect for event streaming to coordinate writes between Redis (cache), Cosmos (transactional DB), and our systems of record (legacy). The majority of reads/writes would occur via our API, but some would occur via the systems of record, hence the use of a log-based architecture.
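A minimal sketch of that log-based coordination pattern, assuming the kafka-python and redis clients, with hypothetical topic/key names and the transactional-DB write left as a stub (an illustration of the pattern, not the actual design described above):

    # Hypothetical: a consumer reads change events from the log and applies
    # them to the cache and the transactional store in order.
    import json

    import redis
    from kafka import KafkaConsumer

    cache = redis.Redis(host="localhost", port=6379)
    consumer = KafkaConsumer(
        "write-events",                       # hypothetical topic name
        bootstrap_servers=["localhost:9092"],
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    def apply_to_transactional_db(event: dict) -> None:
        # Placeholder for the transactional-DB write (Cosmos, Spanner, ...).
        pass

    for message in consumer:
        event = message.value
        apply_to_transactional_db(event)                     # durable write first
        cache.set(event["key"], json.dumps(event["value"]))  # then refresh the cache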

noahl · 3 years ago
Spanner offers that on GCP, and I believe CockroachDB offers something similar cross-cloud.

Do you have any specific requirements for which cloud provider you use, or any particular interface you really need?
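For illustration, a minimal sketch of a transactional write and read using the google-cloud-spanner Python client (the instance, database, and table names here are hypothetical):

    # Hypothetical names throughout; shows the basic transactional write/read shape.
    from google.cloud import spanner
    from google.cloud.spanner_v1 import param_types

    client = spanner.Client()
    database = client.instance("my-instance").database("my-database")

    def record_order(transaction):
        # Strongly consistent, transactional write.
        transaction.execute_update(
            "INSERT INTO Orders (OrderId, Amount) VALUES (@id, @amount)",
            params={"id": "order-123", "amount": 42},
            param_types={"id": param_types.STRING, "amount": param_types.INT64},
        )

    database.run_in_transaction(record_order)

    # Read it back with a strong snapshot.
    with database.snapshot() as snapshot:
        for row in snapshot.execute_sql("SELECT OrderId, Amount FROM Orders"):
            print(row)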

noahl commented on Show HN: I made a simple platform to buy/sell side projects    · Posted by u/heyarviind2
xEnOnn · 3 years ago
What are the usual solutions to such problems with a two-sided marketplace?
noahl · 3 years ago
The solution is posting this marketplace on a forum full of people who might like to buy and sell side projects.
noahl commented on Project Starline: Feel like you're there, together   blog.google/technology/re... · Posted by u/ra7
Pxtl · 5 years ago
This is cool as hell, but I have to say I feel like we're solving top-level problems when most consumers don't even seem to be getting solutions to the most basic pain-points.

For me, the problem with video-calling isn't the image-quality. It's all the much more mundane technological problems - high latency, lag-spikes caused by bad ISPs, failed noise-cancellation for people who don't use headsets for audio, bad wifi routers cutting out, etc.

First thing I did when I realized we were going to be WFH long-term was buy myself a $100 gaming headset. Next thing I did was get all my home computer stations wired with Cat 6.

That stuff is far more fundamental and far less interesting than 3D telepresence, but it's the real unsexy problem that so many people have been suffering through during this pandemic.

Even simple things like latency make simple, natural reactions agonizing. Talkover and crosstalk are incessant, and I've developed a filthy habit of just talking over people because otherwise it's a solid 20 seconds of "you go no you go" caused by awful latency. I've had to defuse angry reactions by co-workers who feel they're being interrupted by other co-workers and explain to them that the latency makes interruptions feel worse than they are.

I've tried to push friends to join me on my private Mumble server where the latency is near-nil and the call quality is excellent, but there's always one person who doesn't have a working headset and wants to just use a laptop or tablet mic with no feedback cancelling, which destroys the conversation through echoes (plus Mumble's auth system is needlessly bewildering).

Then with video, problems are similar but less impactful - cheap cameras, poor lighting, compression artifacts, poor sync with the audio, etc. And it's infuriating because every person has a wonderfully powerful camera in their pocket right now - and there's software to connect them but it's just too tricky for most people.

Good on Google for taking an interest in the subject, but I feel like they're decorating the apex of the technological pyramid while most people are pushing stones around at the bottom.

noahl · 5 years ago
I mean, to be fair, Google has also tried very, very hard to improve home internet access for people, to the point of setting up its own ISP and running municipal fiber networks. That's a pretty big try, and I really wish it had taken off beyond the places where Google Fiber operates.

(NB: I work at Google, but this comment has nothing to do with my work.)


noahl commented on Improving large monorepo performance on GitHub   github.blog/2021-03-16-im... · Posted by u/todsacerdoti
crecker · 5 years ago
I can bet whatever you want that they did this improvement for the microsoft/windows repo.
noahl · 5 years ago
The microsoft/windows repo is hosted on Azure DevOps, and they have also blogged about what they've done to improve its performance!

Here's a recent post: https://devblogs.microsoft.com/devops/introducing-scalar/
