> The QUERY method provides a solution that spans the gap between the use of GET and POST. As with POST, the input to the query operation is passed along within the content of the request rather than as part of the request URI. Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.
The practical limits (i.e., not standard-specified limits) on query param size seem fair, but it's worth mentioning that there are practical limits on body size as well, introduced by web servers, proxies, cloud API gateways, etc., or ultimately by hardware. From what I can see this is something like 2 KB for query params at the low end (the old 2,083-character limit in Internet Explorer, for example) and 10-100 MB for body size (but highly variable, and potentially only hardware-bound).
In both cases it's worth stating that the spec is being informed by practical realities outside of the spec. In terms of the spec itself, there is no limit on either body size or query param length. How much should the spec be determined by particular implementation details of browsers, cloud services, etc.?
With a request body you can also rely on all the other standard semantics that request bodies afford, such as different content types and content encodings. Query parameters are also often considered less secure for things like PII, and (less important these days) they don't really define a character encoding.
But generally the most important reason is that you can take advantage of the full range of MIME types, and yes, practically speaking, there's a limit on how much you should stuff into a query parameter.
This has resulted in tons of protocols using POST instead, such as GraphQL. QUERY is a nice middle ground.
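For concreteness, this is roughly what that middle ground could look like from a client, sketched with fetch(); the endpoint and payload shape here are hypothetical, and the server has to implement QUERY for any of this to work:

```typescript
// Sketch only: QUERY is still an Internet-Draft, and this endpoint and
// payload shape are made up. The server must implement QUERY.
async function searchProducts() {
  const response = await fetch("https://api.example.com/products/search", {
    method: "QUERY", // declared safe and idempotent, unlike POST
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      // Structured input that would be painful to url-encode into a query string.
      filter: { category: "books", price: { lt: 20 } },
      sort: ["-published", "title"],
      limit: 100,
    }),
  });
  return response.json();
}
```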
- URI length limits (not just browsers, but also things like Jersey)
- q-params are ugly and limiting by comparison to an arbitrarily large and complex request body with a MIME type
However, q-params are super convenient because the query is then part of the URI, which you can then cut-n-paste around. Of course, a convenient middle of the road is to use a QUERY which returns not just query results but also a URI with the query expressed as q-params so you can cut-n-paste it.
This avoids changing the definition of GET. Who knows how many middleboxes would mess with things if you did that, because they “know” what GET means and so their thing is “safe”.
Until GET changes.
People aren’t using QUERY yet so that problem doesn’t exist.
> Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.
A shitload of answers to GET requests are stale, though, even when they're cacheable. If you issue a GET for a page that contains stuff like "Number of views: xxx" or "Number of users logged in: yyy" or "Last updated on: yyyy-mm-dd", there goes idempotency.
Some GET requests are actually idempotent ("Give me the value of AAPL at close on 2021-07-21") but many aren't.
Stale data won't break much but there's a world between "it's an idempotent call" and "I'm actually likely to be seeing stale data if I'm using a cached value".
I mean... enter "https://example.org" in your browser; that's a GET. And that's definitely not idempotent for most sites we're all using on a daily basis.
Idempotence does not mean immutability, it means that two or more identical operations have the same effect on the resource as a single one. Since GET operations, by virtue of also being safe, generally have no effect at all, this is almost always trivially true. Just because the resource's content changed for some other reason doesn't mean GET is not idempotent.
Idempotency in the context of HTTP requests isn't about the response you receive, but about the state of the resource on the server. You're supposed to be able to call GET /{resource}/{id} any number of times without the resource {id} being different at the end. You can't do the same for POST, as POST /{resource} could create a new entity with every call.
A view counter also doesn't break this, as the view counter isn't the resource you're interacting with. As long as you're not modifying the resource on a `GET` call, the request is idempotent.
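A minimal sketch of that distinction, using Node's built-in http module with a hypothetical in-memory store:

```typescript
import { createServer } from "node:http";

// Hypothetical in-memory store, just to illustrate the distinction above.
const articles = new Map<string, { title: string }>();
const viewCounts = new Map<string, number>();

createServer((req, res) => {
  const match = /^\/articles\/(\w+)$/.exec(req.url ?? "");
  if (req.method === "GET" && match) {
    // Safe and idempotent with respect to the resource: the article itself is
    // never modified here. Bumping a separate view counter is server-side
    // bookkeeping, not a change to the resource that was requested.
    viewCounts.set(match[1], (viewCounts.get(match[1]) ?? 0) + 1);
    const article = articles.get(match[1]);
    res.writeHead(article ? 200 : 404, { "Content-Type": "application/json" });
    res.end(JSON.stringify(article ?? { error: "not found" }));
  } else if (req.method === "POST" && req.url === "/articles") {
    // Not idempotent: every call creates a new entity.
    const newId = String(articles.size + 1);
    articles.set(newId, { title: "untitled" }); // body parsing elided
    res.writeHead(201, { Location: `/articles/${newId}` });
    res.end();
  } else {
    res.writeHead(405);
    res.end();
  }
}).listen(8080);
```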
I suppose the counterargument is that "users logged in" or "views" is stale all the time, because it could have changed while the response was in flight.
If you really need live data for "views" or what have you, then perhaps the front end should be querying the backend repeatedly via a separate POST.
I was just working with OpenSearch's API recently (https://opensearch.org/docs/latest/api-reference/search/), which sort of abuses the semantics of the HTTP spec by using GET with a body to perform search queries, largely to solve a similar problem. A "QUERY" message type could easily replace the GET-with-a-body used by OpenSearch and others. I'd go as far as to argue that's largely what this new QUERY type is: official recognition of a "GET" request with a body.
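For illustration, the swap looks roughly like this; the index name and query DSL are just an example, and whether OpenSearch itself would ever adopt QUERY is of course an open question:

```typescript
// Today: an OpenSearch-style search is a GET (or POST) carrying a JSON body.
const searchBody = JSON.stringify({
  query: { match: { title: "http query method" } },
  size: 10,
});

// Hypothetically, the same operation with the draft QUERY method: same body,
// same endpoint, but the method itself now declares the request safe and
// idempotent, so caches and retry logic can treat it accordingly.
const res = await fetch("https://search.example.com/articles/_search", {
  method: "QUERY",
  headers: { "Content-Type": "application/json" },
  body: searchBody,
});
console.log(await res.json());
```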
It's not abuse of the spec. The spec says that GET is allowed to have a body. Whether an endpoint must honor it is left undefined, and this has led people to believe, wrongly, that that means endpoints are not allowed to expect it. But certainly the endpoint itself defines what it does with your requests, and that's what actually matters.
QUERY is just GET once more with feeling, because people are worried about existing software that made the wrong decision about denying or stripping GET bodies and would need to be updated to allow them through, and detecting an unimplemented verb is I guess maybe simpler than detecting some middle layer being a dick and dropping your data.
This is not a correct interpretation of the spec, as far as I can tell. I did some more research on this, plus primary sources, on my blog if you're interested in why, and why people are confused about this: https://evertpot.com/get-request-bodies/
I don't like using the body. GETs are easily shareable, bookmarkable, even editable by humans, thanks to query strings. I'd rather have a GET with better (shorter) serialization for parameters and a standardized max length of 16k or something like that.
>A server MAY create or locate a resource that identifies the query operation for future use. If the server does so, the URI of the resource can be included in the Location header field of the response (see Section 10.2.2 of [HTTP]). This represents a claim that a client can send a GET request to the indicated URI to repeat the query operation just performed without resending the query parameters.
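A minimal sketch of how a client might use that optional behaviour, assuming a server that implements it (the endpoint and query shape are hypothetical):

```typescript
// 1. Run the query once, with the parameters in the request content.
const first = await fetch("https://api.example.com/contacts", {
  method: "QUERY",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ family_name: "Smith" }), // made-up query shape
});

// 2. If the server created a resource identifying this query, it MAY
//    advertise it via Location; that URI can then be bookmarked, shared,
//    and re-fetched with a plain GET, no body required.
const savedQuery = first.headers.get("Location");
if (savedQuery) {
  const repeat = await fetch(new URL(savedQuery, "https://api.example.com"), {
    method: "GET",
  });
  console.log(await repeat.json());
}
```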
I can't stop thinking about how HTTP REST is just an abuse of the original design of HTTP, and how the whole thing of using HTTP verbs to add meaning to requests is not a very good abstraction for RPC calls.
The web evolved from being a tool to access documents on directories to this whole apps in the cloud thing and we kept using the same tree abstraction to sync state to the server, which doesn't make a lot of sense in lots of places.
Maybe we need a better abstraction to begin with, something like discoverable native RPCs using a protocol designed for it like thrift or grpc.
HTTP as a pure transport protocol keeps coming back as the default, because it works. Its superpower is that it pushes stateless (and secure) design from end to end. You have to fight a losing battle to get around that, so if you play along, you can end up with better power efficiency, resiliency and scalability.
REST is just very simple to understand and easy to prototype with. There's better abstractions on top of HTTP, like GraphQL and gRPC (as you mentioned), but you can layer those on after you have a working solution and are looking for more performance.
HTTP/3 is on the way this decade and I'm excited about its promises. Given how long it took HTTP/2 to standardize, I'm not optimistic it will be soon, but it does mean we have a path forward.
HTTP/2 has already been mostly replaced by HTTP/3, IIRC. We now mostly have a split between HTTP/1.1 and HTTP/3, if I'm remembering an article I read correctly.
> The web evolved from being a tool to access documents on directories to this whole apps in the cloud thing and we kept using the same tree abstraction to sync state to the server, which doesn't make a lot of sense in lots of places.
The first part is correct, but hard disagree on the rest. HTTP makes a lot of sense for RPC-ish things because a) it can do those things better than RPC, b) HTTP can do things that RPC typically can't (like: content type negotiations and conversions, caching, online / indefinite length content transmission, etc).
HTTP is basically a filesystem access protocol with extra semantics made possible by a) headers, b) MIME types, and if you think of some "files" as active/dynamic resources rather than static resources, then presto, you have a pretty good understanding of HTTP. ("Dynamic" means code processes a request and produces response content, possibly altering state, that a plain filesystem can't possibly do. An RDBMS is an example of "dynamic", while a filesystem is an example of "static".)
REST is quite fine. It's very nice actually, and much nicer than RPC. And everything that's nice about REST and not nice about RPC is to do with those extensible headers and MIME types, and especially semantics and cache controls.
But devs always want to generate APIs from IDLs, so RPC is always so tempting.
As for RPC, there's nothing terribly wrong with it as long as one can generate async-capable APIs from IDLs. The thing everyone hates about RPC is that typically it gets synchronous interfaces for distributed computations, which is a mistake. But RPC protocols do not imply synchronous interfaces -- that's just a mistake in the codegen tools designs.
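As an illustration of what "async-capable from an IDL" could mean, here is an invented example; it is not any particular RPC framework's codegen output:

```typescript
// Invented service shape, for illustration only.
interface Balance {
  accountId: string;
  amount: number;
  currency: string;
}

// The usual mistake: a generated client that hides the network round trip
// behind a blocking-looking call.
interface AccountServiceSync {
  getBalance(accountId: string): Balance;
}

// An async-capable generator can surface the distributed nature of the call
// in the type itself, so callers can compose, time out, or cancel it.
interface AccountServiceAsync {
  getBalance(accountId: string, opts?: { signal?: AbortSignal }): Promise<Balance>;
}
```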
Ultimately, the non-RESTful things about RPCs that suck are:
- no URIs (see the response below to your point about discovery)
- nothing is exposed about RPC idempotence, which makes "routing" fraught
- lack of 3xx redirects, which makes "routing" hard
- lack of cache controls
- lack of online streaming ("chunked" encoding with indefinite content-length)
Conversely, the things that make HTTP/REST good are:
- URIs!!
- idempotence is explicitly part of the interface because it is part of the protocol
- generic status codes, including redirects
- content type negotiation
- conditional requests
- byte range requests
- request/response body streaming
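Two items from that list -- conditional requests and byte range requests -- are easy to show concretely with nothing but headers (the URLs below are placeholders):

```typescript
// Conditional request: only transfer the body again if the resource changed.
const first = await fetch("https://example.com/report.json");
const etag = first.headers.get("ETag");

const revalidated = await fetch("https://example.com/report.json", {
  headers: etag ? { "If-None-Match": etag } : {},
});
// 304 Not Modified means the previously fetched copy is still good.
console.log(revalidated.status);

// Byte range request: resume or sample a large resource.
const tail = await fetch("https://example.com/big-file.bin", {
  headers: { Range: "bytes=1048576-" }, // everything after the first 1 MiB
});
console.log(tail.status); // 206 Partial Content if the server honours ranges
```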
> Maybe we need a better abstraction to begin with, something like discoverable native RPCs using a protocol designed for it like thrift or grpc.
That's been tried. ONC RPC and DCE RPC for example had a service discovery system. It's not enough, and it can't be enough. You really need URIs. And you really need URIs to be embeddable in contents like HTML, XML, JSON, etc. -- by convention if need be (e.g., in JSON it can only be by convention / schema). You also need to be able to send extra metadata in request/response headers, including URIs.
(Re: URIs and URIs in headers: HATEOAS really depends on very smart user-agents, which basically haven't materialized, because it turns out that HTML+JS is enough to make UIs good; so URIs in headers are not useful for UIs, but they are useful for APIs.)
It took me a long time to understand all of this, that REST is right and typical RPCs are lame. Many of the points are very subtle, and you might have to build a RESTful application that uses many of these features of HTTP/REST in order to come around -- that's a lot to ask for!
The industry seems to be constantly vacillating between REST and RPC. SOAP came and went; no one misses it. gRPC is the RPC of the day, but I think that, in the end, the only nice things about it are the binary encoding and the schema, and that it won't survive in the long run.
Looks interesting. Sadly it will likely be hamstrung by CORS rules. When designing an API, I frequently send all non-GET requests as POSTs with content type text/plain so that they classify as simple requests and avoid CORS preflights, which add an entire round trip per request. Obviously this is only safe if you're doing proper authorization with a token or something. Another fun bit is that you have to put the token in the query string, going against best practices for things like OAuth2, because the Authorization header isn't CORS-approved. CORS enforcement is an abomination.
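A sketch of the trick being described; the endpoint and the token handling are illustrative only, not a recommendation:

```typescript
const token = "example-token"; // illustrative only

// This stays a CORS "simple request" (POST + text/plain + no non-safelisted
// headers), so the browser sends it directly with no OPTIONS preflight.
await fetch(
  "https://api.example.com/rpc?access_token=" + encodeURIComponent(token),
  {
    method: "POST",
    headers: { "Content-Type": "text/plain" },
    body: JSON.stringify({ op: "search", term: "widgets" }),
  },
);

// Any of these would trigger a preflight round trip instead:
//   - Content-Type: application/json
//   - an Authorization header
//   - a non-safelisted method such as QUERY
```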
Because of the semantics of QUERY, CORS rules should apply to it as they do to GETs -- one would think, anyway. The Internet-Draft ought to say something about CORS, though; that's for sure.
It looks great. It covers a few pain points in modelling APIs where you need to retrieve data idempotently but the URL does not identify a specific resource, i.e., "The content returned in response to a QUERY cannot be assumed to be a representation of the resource identified by the target URI".
This is a "work in progress". Is there any estimate of when it will be finalized? Something like during 2025, with frameworks/libraries starting to support it by something like 2026?
Just to have a reference, anyone remember how long it took for PATCH?
This appears to be a Working Group item of the HTTPbis WG [0] which has a Zulip stream [1], a mailing list [2][3], a site [4], and a GitHub organization [5], thus lots of ways to send feedback. The Internet-Draft itself has a GitHub repository [6]. For this I think a GitHub issue/issues [7] would be best; some already exist now for issues in this thread.
Many things never need explicit support, because good HTTP citizens allow unknown HTTP methods (and treat them basically as POST). This is true of PHP and fetch(), for example.
Node.js needs explicit support, this got added a few months ago. There's a good chance that as long as your server supports it, you can start using it today.
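Assuming your runtime's HTTP parser accepts the method at all (the comment above says recent Node.js does), handling QUERY is just another branch on req.method. A minimal sketch, with a made-up JSON echo standing in for the actual query logic:

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.method === "QUERY") {
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      // Treat the content as the query input, much like a POST handler would,
      // but keep the QUERY contract: no state changes on the server.
      const query = body ? JSON.parse(body) : {};
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ received: query, results: [] }));
    });
  } else {
    res.writeHead(405, { Allow: "QUERY" });
    res.end();
  }
}).listen(8080);
```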
> When doing so, caches SHOULD first normalize request content to remove semantically insignificant differences, thereby improving cache efficiency, by: [...]
That part sounds like it's asking for trouble. I'm curious if this will make it to the final draft. If the client mis-identifies which parts of the request body are semantically insignificant, the result would be immediate cache poisoning and fun hard-to-debug bugs.
If it's meant as a "MAY", then that seems kind of meaningless: If the client for some reason knows that one particular aspect of the request body is insignificant, it could just generate request bodies that are normalized in the first place..?
Request body normalization is not really feasible, not without having the normalization function be specified by the request body's MIME type, and MIME types specify no such thing. Besides, normalization of things like JSON is fiendishly tricky, and may not be feasible in the case of many MIME types. IMO this should be removed from the spec.
Instead the server should normalize if it can and wants to, and the resulting URI should be used by the cache. The 3xx approach might work well for this, whereas having the server immediately assign a Content-Location: URI as I propose elsewhere in this thread would not allow for server-side normalization in time for a cache.
Yeah, that’s nuts and is obviously flawed behaviour that can interact poorly with any number of things - not least of all any kind of checksums within the response.
I’m surprised to see that in an RFC.
Edit: it’s only for the cache key:
> Note that any such normalization is performed solely for the purpose of generating a cache key; it does not change the request itself.
Still super dangerous.
Edit edit: I just typed out a long message on the GitHub issue tracker for this, but submitting it errored and I’ve lost all the content. Urgh
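To make the earlier suggestion concrete -- that the origin server, which knows its own media type, should do any normalizing rather than a shared cache -- here is a minimal sketch for JSON query bodies; the cache-key format and the choice of SHA-256 are arbitrary:

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so that {"a":1,"b":2} and {"b":2,"a":1}
// produce the same canonical form. Array order is kept as-is, since it is
// usually significant.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>)
        .sort(([a], [b]) => a.localeCompare(b))
        .map(([k, v]) => [k, canonicalize(v)]),
    );
  }
  return value;
}

// Derive a stable cache key (or a Location/Content-Location URI) from the body.
function queryCacheKey(targetUri: string, jsonBody: string): string {
  const canonical = JSON.stringify(canonicalize(JSON.parse(jsonBody)));
  const digest = createHash("sha256").update(canonical).digest("hex").slice(0, 16);
  return `${targetUri}#${digest}`; // illustrative key format only
}
```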
I'm not necessarily against it - I've had the urge to send a body on a GET request before (can't recall or justify the use case, however).
Reasons I can think of:
- browsers limit URL (and therefore query string) length
- general dev ergonomics
Long query strings are also just impossible to read, and URL encoding is full of parser alignment issues. I have to imagine that QUERY would support a JSON request body.
Not showing the URL in various logs can be a concern if your query parameters are sensitive.
OPTIONS and PROPFIND don't get treated that way, though.
There should be an HTTP method (or a .well-known URL path prefix) to query and list every Content-Type available for a given URL.
From https://x.com/westurner/status/1111000098174050304 :
> So, given the current web standards, it's still not possible to determine what's lurking behind a URL given the correct headers? Seems like that could've been the first task for structured data on the internet
You could also use the Accept header in response to an HTTP OPTIONS request, or as part of a 415 error response.
https://httpwg.org/specs/rfc9110.html#field.accept (yes, Accept can be used in responses)
.well-known is not a good place for this kind of thing. That discovery mechanism should only be used in situations where only a domain name is known and a feature or endpoint needs to be discovered for a full domain. In most cases you want a link.
The building blocks for most of this stuff are certainly there. There's a lot of wheels being reinvented all the time.
I think that would solve it.
HTTP servers SHOULD send one or more Accept: {content_type} HTTP headers in the HTTP Response to an HTTP OPTIONS Request in order to indicate which content types can be requested from that path.
https://github.com/schemaorg/schemaorg/issues/1423#issuecomm... :
> Examples of how to represent GraphQL, SPARQL, LDP, and SOLID APIs [as properties of one or more rdfs:subClassOf schema:WebAPI (as RDFa in HTML or JSON-LD)]
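Going back to the Accept-in-response-to-OPTIONS suggestion above, here is a minimal sketch of what a handler could send; the path, the 204 status, and the advertised media types are made up, and note that RFC 9110 frames Accept in responses as advisory rather than mandating this exact usage:

```typescript
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.method === "OPTIONS" && req.url === "/reports") {
    res.writeHead(204, {
      Allow: "GET, HEAD, OPTIONS, QUERY",
      // Advertise which media types this path can serve, per the suggestion above.
      Accept: "application/json, text/csv, text/html",
    });
    res.end();
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);
```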