How and why GraphQL will influence the Sourcehut alpha

I don't understand the attraction to Graphql. (I do understand it if maybe you actually want the things that gRPC or Thrift etc gives you)

It seems like exactly the ORM solution/problem but even more abstract and less under control since it pushes the orm out to browser clients and the frontend devs.

ORM suffer from being at beyond arms length from the query analyzer in the database server.

https://en.wikipedia.org/wiki/Query_optimization

A query optimizer that's been tuned over decades by pretty serious people.

Bad queries, overfetching, sudden performance cliffs everywhere.

Graphql actually adds another query language on top of the normal orm problem. (Maybe the answer is that graphql is so simple by design that it has no dark corners but that seems like a matter of mathematical proof that I haven't seen alluded to).

Why is graphql not going to have exactly this problem as we see people actually start to work seriously with it?

Four or five implementations in javascript, haskell and now go. From what I could see none of them were mentioning query optimization as an aspiration.

baddox · 5 years ago

GraphQL is quite similar to SQL. They’re both declarative languages, but GraphQL is declaring a desired data format, whereas SQL is declaring (roughly) a set of relational algebra operations to apply to a relational database. GraphQL is really nothing like an ORM beyond the fact that they are both software tools used to get data from a database. You might use an ORM to implement the GraphQL resolvers, but that’s certainly not required.

I wouldn’t expect the performance issues to be much more problematic than they would be for REST endpoints that offer similar functionality. If you’re offering a public API, then either way you’re going to need to solve for clients who are requesting too many expensive resources. If you control the client and the server, then you probably don’t need to worry about it beyond the testing of your client code you would need to do anyway.

As far as query optimization goes, that’s largely out of scope of GraphQL itself, although many server implementations offer interesting ways to fulfill GraphQL queries. Dataloader is neat, and beyond that, I believe you can do any inspection of the query request you want, so you could for example see the nested path “Publisher -> Book -> Author -> name” and decide to join all three of those tables together. I’m not aware of any tools that provide this optimization automatically, but it’s not difficult to imagine it existing for some ORMs like those in Django or Rails.

snorremd · 5 years ago

Hasura is one example of a graphql server that uses the SQL joins pattern to optimize graphql queries. See https://hasura.io/blog/architecture-of-a-high-performance-gr.... It automatically generates graphql queries based on your own schema and translates it to efficient postgres sql queries. The post I linked also briefly discuss the Dataloader pattern for batching graphql resolvers to avoid the N + 1 problem.

zwkrt · 5 years ago

For a fun hack-week project I created a graphql server that would automatically create any necessary SQL tables to support a query given to it. So if you made the query

    query {
        user {
            name
            email {
                primary
                secondary
            }
            posts {
                body
                karma
            }
        }
    }

It would create an entire database schema with users, emails, and posts, and the correct indexes and fk relations to support the graphql query. It would also generate mutations for updating the entities in a relational, cascading manner, so deleting a user would also delete their email and posts.

kevan · 5 years ago

Seems like you're looking at this through the lens of a single system that could submit a query to a single database and get all the data it needs. From that perspective GraphQL is definitely an extra layer that probably doesn't make sense. But even then there's still some value in letting the client specify the shape of the the data it needs and having client SDKs (there's definitely non-GraphQL ways to achieve these too).

My impression is GraphQL starts to shine when you have multiple backend systems, probably separated based on your org chart, and the frontend team needs to stitch them together for cohesive UX. The benchmark isn't absolute performance here, it's whether it performs better than the poor mobile app making a dozen separate API calls to different backends to stitch together a view.

kabes · 5 years ago

That's indeed one of its selling points. But most products I see that adopt graphql are exactly 1 database.

WhatIsDukkha · 5 years ago

Yeah as you say """(there's definitely non-GraphQL ways to achieve these too)."""

These are largely matter of architecture design and graphql doesn't really fix those problems (my sense is it will make those problems harder actually).

devit · 5 years ago

The advantage of GraphQL is that the code for each API endpoint, which depends on frontend design (e.g. how many comments should be visible by default on a collapsed Facebook story), is now part of the frontend codebase (as a GraphQL query, that is then automatically extracted and moved to the backend), and thus frontend and backend development are no longer entangled.

Without it or a similar system frontend developers have to ask backend developers to create or modify an API endpoint every time the website is redesigned.

Also, it allows to combine data fetching for components and subcomponents automatically without having to do that manually in backend code, and automatically supports fine-grained caching of items.

macca321 · 5 years ago

Or the frontend devs could have just, like, learned how to write sql queries... :D

A major issue with pushing it to the frontend is that malicious clients can issue unexpected requests, putting strain on the database.

If the graphql query implementation doesn't allow that level of querying on the database, then it's not offering much more before you need to speak to the backend devs than a filterable rest endpoint.

This all came up years ago with OData.

real_ben_michel · 5 years ago

Having seen many product teams implement graphQL, concerns were never around performances, and more around speed of development.

A typical product would require integrations with several existing APIs, and potentially some new ones. These would be aggregated (and normalised) into a single schema built on top of GraphQL. Then the team would build different client UIs and iterate on them.

By having a single queryable schema, it's very easy to build and rebuild interfaces as needed. Tools like Apollo and React are particularly well suited for this, as you can directly inject data into components. The team can also reason on the whole domain, rather than a collection of data sources (easier for trying out new things).

Of course, it would lead to performance issues, but why would you optimise something without validating it first with the user? Queries might be inefficient, but with just a bit of caching you can ensure acceptable user experience.

jayd16 · 5 years ago

I wonder if GraphQL would make more sense as a client side technology. The goals of dev ease seem better served by a graph the client can build (and thus span multiple remote services). Instead of transforming the backend, you simply get a better UI dev experience and the middleware handles query aggregation.

You'd want code gen to easily wrap REST services.

You could get some of the pipeline query/subquery stuff back (and lose caching) by setting up a proxy running this service or fallback to client side aggregation to span services not backed by the graph system (and maybe keep caching).

Maybe we're back to SOAP and WSDLs, though.

stevenjohns · 5 years ago

Where I'm at now is my first foray with GraphQL - Graphene on the Django backend and Apollo on the frontend.

I'm not sure if it is the implementation - and it could very well be - but there has been more overhead and complexities than with traditionally accessed REST APIs. I can't see much value-add.

This becomes a lot more apparent when you start to include TS in the mix.

Perhaps it just wasn't a good use case.

tomnipotent · 5 years ago

It all depends on whether you expect to 1) use GraphQL as an ORM replacement or 2) use GraphQL as a layer to aggregate multiple disparate services into a unified API.

Most people want #1, Graphene is a bad choice because you still have to write a lot of boilerplate code. It has the added benefit that the current process is responsible for parsing the GraphQL query and directly calling the database, vs. using something like Prisma/Hasura which (may) require a separate process which in turn calls your database (so 2 network hops).

GraphQL was never intended to be an ORM replacement, but many have steered it towards that direction. It's not a bad thing, but it's still the same level of abstraction and confusion that people have wrestled with when using traditional ORM's except now you're introducing a different query API vs. native code/fluent interfaces/SQL.

dmitriid · 5 years ago

> I don't understand the attraction to Graphql.

It's attractive primarily to frontend developers. Instead of juggling various APIs (oftne poorly designed or underdesigned due to conflicting requirements and time constraints) you have a single entry into the system with almost any view of the data you want.

Almost no one ever talks about what a nightmare it becomes on the server-side, and how inane the implementations are. And how you have to re-do so many things from scratch, inefficiently, because you really have no control of the queries coming into the system.

My takeaway from GraphQL so far has been:

- good for frontend

- usable only for internal projects where you have full control of who has access to your system, and can't bring it down because you forgot an authorisation on a field somewhere or a protection against unlimited nested queries.

rhlsthrm · 5 years ago

As a full-stack dev, I'm going to always reach for things like Hasura for building my backend from now on. It auto generates a full CRUD GraphQL API from my Postgres DB schema. Most of the backend boilerplate is eliminated this way. If I need to add additional business logic, I can use serverless functions that run between the GraphQL query from the front end and the DB operations (actions in Hasura). Most of the heavy lifting is through the front end anyways, and this keeps everything neatly in sync.

GordonS · 5 years ago

> - good for frontend

I'm not even sure about this part.

I worked on a project recently where the mobile front end team hated working with GraphQL. They were far more used to working with REST/HTTP APIs, and in this particular project they only communicated with a single backend.

The team saw it as extra layers of complexity for no benefit.

The GraphQL backend was responsible for pulling together data from several upstream systems and providing it to clients. But the architect never was able to convince me of a single benefit here compared to REST.

hurricaneSlider · 5 years ago

> usable only for internal projects where you have full control of who has access to your system, and can't bring it down because you forgot an authorization on a field somewhere or a protection against unlimited nested queries.

As someone who is building a public facing GraphQL API, I would disagree with this. Directives make it easy to add policies to types and fields in the schema itself, making it amenable to easy review.

A restful API also has the problem that if you want fine grained auth, you'll need to remember to add the policy to each controller or endpoint, so not that different.

The typed nature of GraphQL offers a way of extending and enriching behavior of your API in a very neat, cross cutting way.

For example we recently built a filtering system that introspected over collection types at startup to generate filter input types. We then built middleware that converted filter inputs into query plans for evaluation.

I previously worked at another company that offers a public REST API for public transport. Public transport info is quite a rich interconnected data set. Despite efforts to ensure that filtering was fairly generic, there was a lot of adhoc code that needed to be written to handle filtering. The code grew exponentially more complex as more filters were added. Maybe this system could have been architected in a better way, but the nature of REST doesn't make that exactly easy to do.

Bottom line is that I feel for public APIs, that there is a lot of demand for flexibility, and eventually a public facing RESTful API will grow to match or even exceed that of a GraphQL API in complexity.

jayd16 · 5 years ago

I agree with you but I do wish for something that can improve on rest in ways GraphQL at least purports to.

Query chaining/batching and specifying a sub-selection of response data seem like solid features.

The graph schema seems to make good on some of the HATEOS promises.

I like the idea of GraphQL but the downsides have me worried.

jblwps · 5 years ago

Graphiti always seemed like a cool project in this vein.

https://www.graphiti.dev/guides/

tomnipotent · 5 years ago

> but the downsides

What do you consider the downsides?

rojobuffalo · 5 years ago

GraphQL was developed by Facebook to be used in conjunction with their frontend GraphQL client library called Relay. Most people opt Apollo + Redux because they were more active early on in releasing open source, and people argue it is an easier learning curve. IMO Relay is a huge win for the frontend to deal with data dependencies; and is a much better design than Apollo + Redux.

GraphQL formalizes the contract between front and back end in a very readable and maintainable way, so they can evolve in parallel and reconcile changes in a predictable, structured place (the GraphQL schema and resolvers). And it allows the frontend, with Relay, to deal with data dependencies in a very elegant and performant way.

eveningcoffee · 5 years ago

It is attractive for a front end dev who does not have control over the backend endpoints i.e. public API like Facebook.

It clearly looks questionable adaption for a single organization.

searchableguy · 5 years ago

That is upto the graphql framework and the consumers of them. Graphql is just a query language.

You need to have data loader (batching) on the backend to avoid n+1 queries and some other similar stuff with cache to improve the performance.

You also have cache and batching on the frontend usually. Apollo client (most popular graphql client in js) uses a normalized caching strategy (overkill and a pain).

For rate/abuse limiting, graphql requires a completely different approach. It's either point based on the numbers of nodes or edges you request so you can calculate the burden of the query before you execute it or deep introspection to avoid crashing your database. Query white listing is another option.

There are few other pain points you need to implement when you scale up. So yeah defo not needed if it's only a small project.

kabes · 5 years ago

"You have to calculate the burden of the query before you execute it so you don't end up crashing your database."

This sounds like disaster waiting to happen.

cnorthwood · 5 years ago

I don't see GraphQL as an ORM type solution, I see it more like a replacement for REST.

AsyncAwait · 5 years ago

What would be the advantage of GraphQL over gRPC in a REST replacement scenario?

Doxin · 5 years ago

Honestly graphql is a fairly small step up from REST if you squint at it hard enough. You could get pretty much 90% of the effect of graphql with a REST framework and a couple of conventions:

- Have the client specify which fields to return, and return only those fields

- Use the above to allow for expanding nested objects when needed

- Specify an API schema somehow.

All GraphQL does is formalize these things into a specification. In my experience the conditional field inclusion is one of the most powerful features. I can simply create a query which contains all of the fields without paying for a performance penalty unless the client actually fetches all those fields simultaneously.

GraphQL queries tend to map rather neatly on ORM queries. Of course you run into the same sort of nonsense you get with ORMS, such as the n+1 one problem and whatnot. The same sort of tools for fixing those issues are available since your graphql query is just going to call the ORM in any case, with one large addition. Introspecting graphql queries is much easier than ORM or SQL queries. I can avoid n+1 problems by seeing if the query is going to look up a nested object and prefetch it. With an ORM I've yet to see one which allows you to do that.

Lastly GraphQL allows you to break up your API very smartly. Just because some object is nested in another doesn't mean they are nested in source code. One object type simply refers to another object type. If an object has some nested objects that needs query optimizing you can stick that optimization in a single place and stop worrying about it. All the objects referring to it will benefit from the optimization without knowing about it.

GraphQL combines all of the above rather smartly by having your entire API declared as (more or less) a single object. That only works because queries only run if you actually ask for the relevant fields to be returned. It's very elegant if you ask me!

Long story short: yes you run into the same sort of issues optimization wise you get with an ORM, but importantly they don't stack on top the problems your ORM is causing already.

osrec · 5 years ago

I couldn't agree more. While GraphQL does allow you to be explicit about what you want from your backend, I've yet to see an implementation/solution that gives you back your data efficiently. If anything, the boilerplate actually seems to introduce inefficiency, with some especially inefficient joins.

And when you are explicit about how you want to implement joins etc, you pretty much have to hand code the join anyway, so I don't see the point.

In almost all use cases that I've come across, a standard HTTP endpoint with properly selected parameters works just as well as a GraphQL endpoint, without the overhead of parsing/dealing with GraphQL.

umvi · 5 years ago

Super useful for bandwidth sensitive situations where you need to piece together a small amount of data from several APIs that normally return a large amount of data.

kodablah · 5 years ago

Pardon my naivete, are the benefits of the flexibility that GraphQL give worth the unpredictability costs (or costs of customizing to add limits) of that same flexibility compared to writing a tailored server side call to do those calls and return the limited data instead?

nawgszy · 5 years ago

I would say that you're also completely ignoring the benefits of typing. It's fair to say JS's lack of typing is a deep flaw, and so tools like TypeScript and GraphQL (which pair magically by the way; free type generation!) are ways to lift the typing from the backend to the frontend and give frontends stories around typing and mocking APIs that greatly improve the testability of the code.

goto11 · 5 years ago

I don't see the conflict? If the GraphQL query is translated into SQL on the server, then then the query optimizer would optimize that just as effectively as if the query had been written in SQL originally.

jmull · 5 years ago

The SQL your graphql implementation’s ORM middleware generates won’t optimize as well as hand-written SQL in many cases.

A decent system will provide the hooks you need to hand optimize certain cases somehow, but There are always limitations and hoops to jump through and additional complexity to manage. The extra layers that are meant to make your life easier are getting in the way instead. (May or may not still be worth it, but the point is, it’s not a foregone conclusion.)

mixedCase · 5 years ago

> then the query optimizer would optimize that just as effectively as if the query had been written in SQL originally

...and other lies we tell ourselves to sleep soundly at night.

But just like ORMs, they do work for the simple cases which tend to abound and you can hand-optimize the rest.

Deleted Comment

Dead Comment

DrFell · 5 years ago

GraphQL is instant API to frontenders.

Their justification for needing it is that the API team takes too long to implement changes, and endpoints never give them the data shape they need.

The silent reason is that server-side code, databases, and security are a big scary unknown they are too lazy to learn.

A big project cannot afford to ask for high standards from frontenders. You need a hoard of cheap labor to crank out semi-disposable UIs.

yuchi · 5 years ago

There’s also a different side of the story: UI are very pricey to get right — that’s because user research is not cheap — so a very efficient approach is having disposable UIs since you know you are going to get them wrong at the start.

Carefully designed endpoints require a lot of back and forth between teams with very different skill set so to build them you need to plan in advice, but it does not work here where you literally have to move fast (and fail fast!).

You only have two options left: (1) you either ask frontenders (or UX-wise devs) to do the whole thing or (2) you build an abstraction layer that let frontenders query arbitrarily complex data structures with near-perfect performances (YMMV).

In case (1) you’re looking for REAL full-stacks, and it’s not that easy to find such talented developers. In case (2) well… that’s GraphQL.

GordonS · 5 years ago

> Their justification for needing it is that the API team takes too long to implement changes, and endpoints never give them the data shape they need.

I've certainly seen timing issues between frontend and backed teams; actually, I don't think I've ever been on a project where that wasn't an issue!

But on my last project, which had a GraphQL backend, this was still a problem. The backend integrated with several upstream databases and REST APIs, so the backend team had to build the queries and implementations before the frontend could do anything. At least with REST they would have been able to mock out some JSON more easily.

Well, that's too bad. I always thought this was a cool project. But if you can't dev your way into decent performance for a small alpha project using python/flask/sql, I don't think your tools are the problem. And I guarantee that a graphql isn't the solution.

So, I mean, good luck.

searchableguy · 5 years ago

I didn't read the post that way. I feel scaling is a small issue more so organization. Graphql does make sense for something like source hunt.

Why not use type hints in python? Isn't that a good enough substitute?

I wonder why go instead of rust if he wanted static typing, long term ease of maintanence and performance. Go's type system is not great especially for something like graphql. Gqlgen relies heavily on code generation. Last time I used it, I ran into so many issues. I ditched go together after several painful clashes with it that community always responded with: oh you don't need this.

(yeah except they implemented all those parts in arguably worse ways and ditched community solutions in the next few years)

One major benefit the GP fails to mention is that with graphql, it is easy to generate types for frontend. This makes your frontend far more sane. It's also way easier to test graphql since there are tools to automatically generate queries for performance testing unlike rest.

There is no need to add something for docs or interactivity like swagger.

erk__ · 5 years ago

As for the reason to use Go instead of Rust it is probably just down to the creator of Sourcehut he has multiple times expressed that he dislikes rust quite a bit.

He has written a blog post about how he chooses programming languages as well https://drewdevault.com/2019/09/08/Enough-to-decide.html

CameronNemo · 5 years ago

>One major benefit the GP fails to mention is that with graphql, it is easy to generate types for frontend. This makes your frontend far more sane.

By GP (grandparent?) do you mean the article / blog post?

Because if so I see no indication that Drew plans to adopt a SPA architecture -- he seems intent on continuing to use server side rendering with little javascript, which would make "frontend types" sort of irrelevant.

TeeWEE · 5 years ago

Code generation (of types and clinet/server stubs) can be done with any IDL (interface definition language). OpenAPI, GRPC, Thrift etc etc all support it. No reason to choose GraphQL only because of this.

The power in GraphQL comes from the graph and flexibility in fetching what you need. Usefull in general purpose APIs (like GitHub has).

CameronNemo · 5 years ago

This quote stuck out to me:

>Today, the Python backends to the web services communicate directly with PostgreSQL via SQLAlchemy, but it is my intention to build out experimental replacement backends which are routed through GraphQL instead. This way, the much more performant and robust GraphQL backends become the single source of truth for all information in SourceHut.

I wonder how adding a layer of indirection can significantly improve performance. If I were writing this service, I would go all in on GraphQL and have the frontend talk to the GraphQL services directly rather than routing the requests from Python through to a GraphQL service then presumably to PostgreSQL.

Perhaps I am missing something. Indeed good luck to Drew here.

ianamartin · 5 years ago

That quote is sort of exactly what's conceptually wrong with what's goin on in my opinion. Yes, I know, armchair quarterback and I'm not the one out there building stuff like this for free, etc., etc.

But claiming some nebulous backend that's more performant and robust than Postgres is like, WTF? Are you using an actual GraphDB like Neo4J? Are you putting a graph frontend on Postgres like PostGraphQL? None of the post really makes any sense because GraphQL is a Query Language, not a data store. What are the CAP theorem tradeoffs in the new backend? What does more robust mean? What does more performant mean? This is a source control app. Those tradeoffs are meaningful.

There seems to be a lot of conflation between API design and data store and core programming tools all mixed into a big post that mostly sounds to me like, "I don't get how to make this (extremely popular and well-known platform that drives many websites 10000x my size) work well, so I'm trying something different that sounds cool."

Which, again, the author has always said this is an experiment, and that's cool. But the conceptual confusion in the post makes me think that moving away from boring tools and trying new tools is not going to end up going well.

But this is a source control app, and it's hopefully backed up somewhere besides sourcehut so it should be fine if he needs to backtrack.

ddevault · 5 years ago

It will be performing this indirection over localhost. I don't see it as much different from the indirection of SQLAlchemy. Yes, there is the question of parsing the GQL and so on, but I think that they're surmountable and fit well within our desired performance budget.

Performance is also just one of many reasons why this approach is being considered.

Deleted Comment

ddevault · 5 years ago

Performance is a secondary concern. SourceHut already has the best performance in the industry, built on Python:

https://forgeperf.org

But I think it could be even better, and this work will help. It will make it easier to write performant code without explicitly hand-optimizing everything.

There are more important reasons to consider GraphQL than performance, which I cover in detail in TFA.

pknopf · 5 years ago

> And I guarantee that a graphql isn't the solution.

I agree. Out of the pan and into the frier.

He had a good idea though.

ghoshbishakh · 5 years ago

what do you feel the problem is?

ianamartin · 5 years ago

There are lots of production sites that serve 10,000x as much traffic as sourcehut that are built on Python/flask/sqlalchemy serving RESTful APIs.

If you can't make that combination work well, there's another place to look for problems besides your tool kit. You might need to ask yourself if you really understand the tools you're trying to use.

But like I said, this has always been a very cool project. My "good luck" was meant more as actual good luck than a Morgan Freeman You're-trying-to-blackmail-batman kind of good luck.