_ben_ commented on Show HN: Edge HTTP to S3   edge.mq/... · Posted by u/_ben_
whycombinetor · 3 months ago
Would you mind sharing a motivating use case for those of us who don't think S3 is complicated or unreliable? Doesn't S3 already include HTTP upload capability? Are ML engineers really avoidant of basic operations like "HTTP retries and S3 multipart uploads"?
_ben_ · 3 months ago
Thanks for the question. You’re right that S3 itself is simple and reliable, and yes, most engineers *can* write HTTP retries and multipart uploads. EdgeMQ isn’t trying to replace S3’s API, it’s what you need around S3 when you have lots of producers on the public internet.

It gives you:

* edge HTTPS endpoints (auto-scale, multi-region HA)
* a WAL so accepted events aren't lost
* segmentation + compression
* explicit commit markers for consumers
* backpressure instead of silent data loss
* a standardized way every team lands data in S3

You could build that yourself on top of S3; many companies do. EdgeMQ exists for folks who want that behavior but don't want to operate a custom HTTP-to-S3 ingest service forever.
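For a concrete sense of what "build it yourself" involves, here's a minimal sketch of just the retry piece in Python. Everything here is hypothetical and illustrative, not EdgeMQ's code; a real client would wrap this around the actual S3 upload calls:

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Retry fn() with capped exponential backoff and full jitter,
    as an uploader might do around transient 5xx/network errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # full jitter: sleep a random amount up to the backoff ceiling
            sleep(random.uniform(0, base_delay * (2 ** attempt)))

# Simulate a flaky upload that succeeds on the third try.
calls = {"n": 0}
def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "uploaded"

result = retry_with_backoff(flaky_upload, sleep=lambda _: None)
print(result, calls["n"])  # → uploaded 3
```

And that's only retries; the WAL, segmentation, commit markers and backpressure from the list above are each their own project.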

It's also worth noting that EdgeMQ is in its early stages. The next features to be developed are transformations, so you can send input in format A (say, JSON) and deliver it to S3 in format B (e.g. CSV, Parquet, etc.).

_ben_ commented on How Modern SQL Databases Are Changing Web Development: Part 1   blog.whimslab.io/how-mode... · Posted by u/thunderbong
skybrian · 3 years ago
This pitch is rather opaque to me. How does cache invalidation actually work?

I don't see how cache invalidation happens at all unless all changes go through PolyScale. What about making a change to the database directly?

_ben_ · 3 years ago
Thanks for the questions. At a very high level, the AI uses statistical models that learn in real time to estimate how frequently the data in the database is changing. TTLs are set accordingly, per SQL query. The model looks at many inputs, such as the payload sizes being returned from the database as well as query arrival rates.
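As a rough intuition only (this is a toy sketch, not PolyScale's actual model; the EWMA formula and every name are made up), a TTL-from-change-rate estimator could look like this:

```python
class TtlEstimator:
    """Toy model: track the interval between observed data changes with
    an exponentially weighted moving average (EWMA), then set the TTL to
    a conservative fraction of that mean interval."""

    def __init__(self, alpha=0.3, safety=0.5):
        self.alpha = alpha        # EWMA smoothing factor
        self.safety = safety      # fraction of mean interval used as TTL
        self.mean_interval = None

    def observe_change(self, seconds_since_last_change):
        if self.mean_interval is None:
            self.mean_interval = seconds_since_last_change
        else:
            self.mean_interval = (self.alpha * seconds_since_last_change
                                  + (1 - self.alpha) * self.mean_interval)

    def ttl(self, default=60.0):
        if self.mean_interval is None:
            return default  # no signal yet: fall back to a default TTL
        return self.safety * self.mean_interval

est = TtlEstimator()
for interval in [10.0, 12.0, 8.0, 11.0]:
    est.observe_change(interval)
print(round(est.ttl(), 2))  # → 5.09
```

The real system folds in more signals (payload sizes, arrival rates) per query, but the shape of the idea is the same: faster-changing data gets shorter TTLs.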

If PolyScale can see mutation queries (inserts, updates, deletes), it will automatically invalidate just the affected data from the cache, globally.

If you make changes directly to the database, out of band from PolyScale, you have a few options depending on the use case. First, the statistical models will invalidate on their own. Second, you can purge manually, for example after a scheduled import. Third, you can plug in CDC streams to power the invalidations.
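To make the mutation-driven path concrete, here's a toy sketch in Python. Note it is deliberately coarser than the behavior described above (it invalidates per table rather than per affected row), and every name in it is hypothetical:

```python
class QueryCache:
    """Toy table-level invalidation: cached results are indexed by the
    table they read from, so a mutation evicts only that table's entries.
    Finer-grained invalidation would track the affected rows instead."""

    def __init__(self):
        self.entries = {}   # sql -> cached result
        self.by_table = {}  # table -> set of cached sql statements

    def put(self, sql, table, result):
        self.entries[sql] = result
        self.by_table.setdefault(table, set()).add(sql)

    def get(self, sql):
        return self.entries.get(sql)

    def on_mutation(self, table):
        # Called when an INSERT/UPDATE/DELETE touching `table` is seen
        # on the wire (or arrives via a CDC stream).
        for sql in self.by_table.pop(table, set()):
            self.entries.pop(sql, None)

cache = QueryCache()
cache.put("SELECT * FROM users", "users", [("ada",)])
cache.put("SELECT * FROM orders", "orders", [(1,)])
cache.on_mutation("users")
print(cache.get("SELECT * FROM users"))   # → None (evicted)
print(cache.get("SELECT * FROM orders"))  # → [(1,)] (untouched)
```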

Feel free to ping me if you would like to dig deeper (ben at), and this document provides more detail on the caching protocol: https://docs.polyscale.ai/how-does-it-work#caching-protocol

This blog also goes into detail on how invalidation works: https://www.polyscale.ai/blog/approaching-cache-invalidation...

_ben_ commented on How Modern SQL Databases Are Changing Web Development: Part 1   blog.whimslab.io/how-mode... · Posted by u/thunderbong
_ben_ · 3 years ago
At PolyScale [1] we tackle many of the same challenges. Some of this article feels a little dated to me but the data distribution, connectivity and scaling challenges are valid.

We use caching to store data and run SQL compute at the edge. It is wire-protocol compatible with various databases (Postgres, MySQL, MS SQL, MariaDB), and it dramatically reduces query execution times and lowers latency. It also has a JS driver for SQL over HTTP, as well as connection pooling for both TCP and HTTP.

https://www.polyscale.ai/

_ben_ commented on So, you want to deploy on the edge?   zknill.io/posts/edge-data... · Posted by u/zknill
arrty88 · 3 years ago
What about local read only replicas of the db in each region, and one primary in your primary region? Or a write through cache in each region.
_ben_ · 3 years ago
PolyScale [1] focuses on many of these issues. It provides a globally distributed database cache at the edge. Writes pass through to the database and reads are cached locally to the app tier. The Smart Invalidation feature inspects updates/deletes/inserts and invalidates just the changed data from the cache, globally.
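As a rough illustration of that read/write split (not PolyScale's implementation; all interfaces here are made up, and the invalidation is much coarser than the Smart Invalidation described above):

```python
class EdgeProxy:
    """Toy read/write split: reads are served from a local cache when
    possible; writes always pass through to the origin database and,
    in this simplified sketch, clear the whole cache."""

    def __init__(self, origin):
        self.origin = origin  # object exposing query(sql) / execute(sql)
        self.cache = {}

    def query(self, sql):
        if sql not in self.cache:
            self.cache[sql] = self.origin.query(sql)  # miss: go to origin
        return self.cache[sql]

    def execute(self, sql):
        self.origin.execute(sql)  # writes always reach the database
        self.cache.clear()        # coarse invalidation, for illustration

class FakeOrigin:
    """Stand-in database that counts how many reads reach it."""
    def __init__(self):
        self.reads = 0
    def query(self, sql):
        self.reads += 1
        return ["row"]
    def execute(self, sql):
        pass

proxy = EdgeProxy(FakeOrigin())
proxy.query("SELECT 1")
proxy.query("SELECT 1")           # served from cache, no extra origin read
print(proxy.origin.reads)         # → 1
proxy.execute("UPDATE t SET x=1") # write passes through, cache cleared
proxy.query("SELECT 1")           # hits the origin again
print(proxy.origin.reads)         # → 2
```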

1. https://www.polyscale.ai/

_ben_ commented on How PlanetScale Boost serves SQL queries faster   planetscale.com/blog/how-... · Posted by u/mrbbk
rbranson · 3 years ago
I tried to use PolyScale in the past but had issues with performance because updating a row would invalidate the entire cache. I wonder if that has improved?
_ben_ · 3 years ago
Yes, in the early versions of the automated invalidation, the logic cleared all cached data based on tables. That is no longer the case. The invalidations only remove the affected data from the cache, globally. You can read more here: https://docs.polyscale.ai/how-does-it-work#smart-invalidatio...
_ben_ commented on How PlanetScale Boost serves SQL queries faster   planetscale.com/blog/how-... · Posted by u/mrbbk
_ben_ · 3 years ago
For database caching outside of PlanetScale, PolyScale.ai [1] provides a serverless database edge cache that is compatible with Postgres, MySQL, MariaDB and MS SQL Server. It requires zero configuration or capacity sizing.

1. https://www.polyscale.ai/
