Supabase Wrappers: A framework for building Postgres foreign data wrappers

hey hn, supabase ceo here

this one might be more in of a “Show HN” because it’s a pre-release - something that you might want to try yourself or contribute to. The GitHub repo is here: https://github.com/supabase/wrappers

For context, Postgres has Foreign Data Wrappers (FDW) [0]. These allow you to connect one Postgres database to another, and then query the secondary database directly in the first.

    CREATE FOREIGN TABLE other_database_table ( 
      id integer, 
      title text 
    )
    SERVER other_database;

The data does not need to be “moved” to the primary database - it stays in the secondary and is fetched when you run a select query:

    select * from other_database_table;

Our release today is a framework which extends this functionality to other databases/systems. If you’re familiar with Multicorn[1] or Steampipe[2], then it’s very similar. The framework is written in Rust, using the excellent pgx[3].

We have developed FDWs for Stripe, Firebase, BigQuery, Clickhouse, and Airtable - all in various pre-release states. We'll focus on the systems we’re using internally while we stabalize the framework.

There’s a lot in the blog post into our goals for this release. It’s early, but one of the things I’m most excited about.

[0] Postgres FDW: https://www.postgresql.org/docs/current/sql-createforeigndat...

[1] Multicorn: https://multicorn.org/

[2] Steampipe: https://steampipe.io/

[2] pgx: https://github.com/tcdi/pgx

chrisjc · 3 years ago

The blog post, repo and your comment still leave me a little confused about what Supabase's wrappers project is, the intent and its current state.

So this "wrappers" project is just a framework that simplifies the development and manages the curation and distribution of Postgres Foreign Data Wrappers? At the end of the day, are these just FDWs that run on top of any Postgres instance? Do I have to run Supabase to take advantage of the FDWs created through this framework?

Additionally one of your examples is Snowflake, but I don't see a matching SF FDW in your repo? Is there a generic JDBC/SQL/etc FDW for any SQL based database like Snowflake, Oracle, etc? Or is this just to create a spark in someone's mind (guilty!) about what is possible and yet to be implemented?

Despite my confusion, this sounds like a very exciting project to follow.

burmecia · 3 years ago

| At the end of the day, are these just FDWs that run on top of any Postgres instance?

Yes, the FDWs developed by Wrappers can be used on any Postgres db.

| Do I have to run Supabase to take advantage of the FDWs created through this framework?

No, you don't need to. Those are just normal FDWs can be used on any Postgres db.

| Is there a generic JDBC/SQL/etc FDW for any SQL based database like Snowflake, Oracle, etc?

The Snowflake FDW isn't developed yet, the example is just for concept demo. It has ClickHouse and BigQuery FDWs although they are not JDBC based. Generic JDBC FDW will need JVM embedded so it is not in plan at this moment.

Disclaimer: I am Supabase developer.

kiwicopple · 3 years ago

Full documentation is here: https://supabase.github.io/wrappers/

This is wonderful!

I love the pragmatism behind tools like osquery and steampipe that expose a lot of APIs as database tables. It makes these datasets available to non-programmers that are more comfortable with a database/tabular format.

Is it fair to say though, that FDWs have to run as compiled shared libs? I've always wondered if there can be (like with VS code and language servers) a protocol where we can run a generic FDW that speaks a particular API over the network, and then you can build out-of-process connectors to it and just have to know the usual tools (HTTP, JSON, etc).

Thoughts? Maybe this already exists.

Then anyone could potentially build a bespoke API/dataset and just point an FDW to it.

kiwicopple · 3 years ago

> speaks a particular API over the network

it's a interesting idea, and one of the things that we were toying with in our pg_net extension (https://github.com/supabase/pg_net). This is a "generic" async network extension, so you can fetch/put/post. It works well for APIs.

I think the generic approach works for some things where the data is less "fixed" - for example, an OpenAI API endpoint.

But for "fixed" data (data warehouses), the wrapper usually needs some custom work for security, protocols, and "push down". I'll be interested to get HN's take on this - they might have some suggestions for us for this framework

atonse · 3 years ago

I don't think so... in your example those are functions we just call in SQL right? I'm talking about exposing arbitrary data as a table like an FDW, but without having to compile a whole module.

So more like, (this is a bit contrived but bear with me):

Let's say I wanted to make a table that lists all my keybase proofs (so sad Keybase is basically dead, but I digress).

I'd build a graphql endpoint where you can query those proofs.

Then I'd have to define a metadata endpoint that maybe returns a JSON schema?

Then you can tell the FDW to define a virtual table that (as much as possible) uses the JSON schema to generate columns in this virtual table.

You'd define the FDW to say, here's the virtual table (like you'd do now), and then put in an option to say "you'll get the mappings for this table with the JSON schema returned by this metadata endpoint" and then tell the FDW "send select queries to this REST/graphql endpoint" which would be something in our example that returns keybase proofs).

I wouldn't need to build a whole new keybase FDW. I'd just use the generic one to hit that existing endpoint.

Is that making sense? It's kind of a dynamic FDW that can be configured for the long tail of API endpoints.

tpetry · 3 years ago

Like pgsql-http?

https://github.com/pramsey/pgsql-http

sisve · 3 years ago

So much great postgres stuff comming out from supabase. I'm really impressed with how much focus and direction the team have. So many companies would try to broaden their product/feature suite, but fail at understanding what made the core sucsessful and stretch the company in many directions. It really seems like they are just making them self better and better experts at postgres. And how to use it well. Really looking forward to see where this is going.

On a sidenote, is not the wrappers for Airtable, BigQuery and ClickHouse opensourced? Or why did they skip that column in the second table?

Is all of supabase opensourced now? I'm meaning I heard something about function not being opensourced but I can be remembering wrong. Most stuff is on GitHub I see

oliverrice · 3 years ago

> On a sidenote, is not the wrappers for Airtable, BigQuery and ClickHouse opensourced? Or why did they skip that column in the second table?

All of the wrappers are open source. You can see the source for the Airtable, BigQuery, and ClickHouse wrappers here https://github.com/supabase/wrappers/tree/5fac8afb62e6e8362b...

The `self-hosted` column is only missing from the "under development" wrappers in the blog post's table because those are not production ready and shouldn't be self hosted (yet).

> Is all of supabase opensourced now?

Yep! The whole stack is open source and can be self hosted

nixalaister · 3 years ago

I'm excited about Wrappers because of an idea born in the GraphQL community: using GraphQL as the point of integration for many different services in your app. A use case GraphQL seemed made for. It's not without problems, though, such as when you need data from one of those services inside your database. So this takes that idea and moves it one level down into the database. You still get the benefits of a unified API (through REST or GraphQL) without the same limitations. I think there couldn't be a more ideal point of integration!

disclaimer: I am a Supabase employee, but these views are my own.

jensneuse · 3 years ago

We (https://wundergraph.com) do exactly this, API integration with GraphQL as the integration layer. Integrating APIs is messy. It could require custom middleware, etc... Glueing something together is the easy part. How do you mock it? How do you test it? Add custom logic? Doing it at the API layer allows you to seamlessly scale your serverless workloads. Doing all this work in the database means that we're eating up resources for tasks that the database is not designed to do. Just because FDW exists doesn't mean we should use it for this use case.

> How do you mock it? How do you test it?

Stripe and Firebase both offer test endpoints for their services. The foreign data wrappers for each allow subbing in a user defined endpoint in their `options` so thats how I'd recommend testing the two that have released. Some of the pre-release integrations can be spun up locally in docker e.g. clickhouse. The FDWs for those similarly take an arbitrary connection string making it pretty straightforward to write tests. Here's an example from the repo https://github.com/supabase/wrappers/blob/5fac8afb62e6e8362b...

> Add custom logic?

Views!

> tasks that the database is not designed to do

The C API for foreign data wrappers is baked right into Postgres proper. They've been around since 2013 and are pretty battle hardened at this point. Supabase Wrappers exposes that API in rust so users can write FDWs without worrying about accidentally tanking the Postgres process. Its more about making a great Postgres feature more accessible than tricking the database into doing anything new.

claytongulick · 3 years ago

I love Supabase and what they're doing! I evaluated them heavily when designing architecture for a healthcare product.

I'm not sure about this one though - rust is a great systems language, but it wouldn't be my first choice for bridging the db <-> api gap.

I wonder why this wasn't built on top of, or an enhancement to, the existing (excellent) multicorn[1] project. Python seems like a better choice of language for dealing with io bound problems like talking to remote APIs.

Multicorn is mature, stable, well tested and has a ton of FDW implementations already.

The dynamic nature of python simplifies the development/debug cycle, and serialization to/from JSON is easier than in any mainstream language except for javascript.

I'd love to understand more about the technical rationale that drove this.

[1] https://multicorn.org/

> I wonder why this wasn't built on top of, or an enhancement to, the existing (excellent) multicorn[1] project

Have to agree with you there, multicorn is extremely cool. I'm a big sqlalchemy fan so their default SQLA wrapper was a killer feature to give up (although maybe we could do something similar with launchbadge/sqlx[1]). We investigated using multicorn early this year and had a few hiccups. Activity on the original repo[2] quieted way down ~3 years ago. For example, pg14 support hasn't landed and the newest supported python version is EOL in 2022. There is new fork[3] with pg14 support (15 in the pipe) that might pick up in adoption but thats still TBD.

Supabase aims to support new major Postgres versions within 2-3 months so we have to be very careful taking on dependencies that might slow that process.

> I'd love to understand more about the technical rationale that drove this.

Architecturally, multicorn has postgres communicate with a separate python process on the host that does all the hard work. That's convenient, but it can bloat over time and/or be memory hungry for larger result sets. The rust implementation runs in-process and is generally a lot lighter.

Currently I'd say supabase/wrappers is a safer/easier version of the C API vs a direct analog to multicorn. Over time I think we'll see that comparison become more appropriate. There's a lot of excitement around the concept internally and we've already been floating some ideas wrt `auto-mapping` tables for common SQL dialects, a generic JSON HTTP API wrapper, etc. Stay tuned!

[1]https://github.com/launchbadge/sqlx [2]https://github.com/Segfault-Inc/Multicorn [3]https://github.com/pgsql-io/multicorn2

This was an excellent write-up, thank you for taking the time to explain, can't wait to see where it goes!

abuehrle · 3 years ago

Sorry I couldn't figure this out from the docs, but Stripe data is queried "live" from Stripe, right? The abstraction is great, but won't this lead to unexpected N API calls when joining across my domain + Stripe?

Yes, the data is queried live.

> unexpected N API calls when joining across my domain

I'm not sure why they would be unexpected (because it should displace some other API calls). I'll hazard a guess that you're worried about fetching the same data multiple times? If that's the case, then yes, you should materialize the data into your database.

The Wrapper itself handles pagination, but you'd also want to make sure you're using filters & limits in your SQL statements.

Thanks for the reply. I'll flesh out my thought process in case it's helpful. My immediate reaction was excitement about the abstraction. An example use case is joining my users to their corresponding Stripe Customers in SQL. The kinds of queries I can reasonably write depend on implementation details of the connector. For example, if Stripe has a bulk customer lookup (list of customer IDs -> Customers), and the connector uses it, I can estimate I'd be able to query on the order of 500 Users at a time in a performant way. But if the API only supports looking up one customer at a time, that 500 User query kicks off 500 API requests, which isn't going to work.

You're right -- it's not unexpected -- maybe more like a leaky abstraction.

ako · 3 years ago

Materialized views enable you to "cache" the response, and only refresh it periodically.

Wait, you're telling me that you can create materialized views using foreign tables in Postgres?

Is there a way to propagate changes from the foreign data source through the FDW to Postgres?

Or would it just be some kind of task polls the foreign data source pulling a delta?

thedangler · 3 years ago

So you're telling me I can use this to create my own wrapper to get data directly from another API?

I can now use this instead of creating my own workflow to get the data via an api which will be stored in the DB anyways?

yes, that's correct. we've built a read-only version for Stripe (which is API-based), and we aim to have the read/write implementation done soon.

You will be able to do something like this:

    insert into stripe_products (name)
    values ('Pizza'), ('Pasta');

This will insert a value into Stripe via the API. Then you can query your stripe products like this:

    select *
    from stripe_products
    limit 10;

omnibrain · 3 years ago

So I could implement one myself for arbitrary REST APIs?

I can't wait for somebody offering a generator, where you plug in your API, it pulls and parses the JSON, then you can select the fields you want and it generates the wrapper. (Alternatively, for put/post you could supply your own JSON).

Practically like some low/no code tools like Appsmith/Budibase and the likes already do today.

Sadly I lack the necessary skills and more important the time to dive into that.

Is there documentation on how to interact with a API for the wrapper? The API I'm probably going to try this with needs some crazy logic to parse the responses and does everything through query parameters.

Can't wait until this is ready.

jzelinskie · 3 years ago

I really want to love foreign data wrappers for Postgres and this seems like a big improvement over existing Python library, but the lack of support for them in managed databases services makes them a non-starter for so many use-cases.

Because RDS, for example, will only support the foreign data wrapper for reading from another "Postgres", what we really need is a server that supports the Postgres wire protocol (easier said than done) and you implement your drivers as a handler to that server.

radiowave · 3 years ago

Never tried this with RDS, but it's entirely possible to use postgres_fdw to interact with a foreign table on another postgres server, where you might have more choice about what extensions you run.

I'm doing this right now because I have a postgres installation that it's not yet convenient to upgrade beyond v12, but where I'd really like the benefit of the recently-improved JDBC FDW, which requires at least v13.

intrasight · 3 years ago

I am very interested in this idea. Has anyone used postgres_fwd in RDS to connect to their own postgres instance where they run custom extensions?