A lot of people are commenting that SemVer doesn't work, because it's still at the mercy of humans choosing good version numbers.
Elm's package manager, elm-package, actually tries to remove humans from the equation, by automatically choosing the next version number, based on a diff of the API and the exported types of a package: https://github.com/elm-lang/elm-package#publishing-updates
It's not perfect, but it's better than anything else I've seen.
It literally changes nothing. I've had to fork a lib to bump a dependency version just recently.
The only things protecting Elm from dependency hell are the young age of the ecosystem and the fact that there are not that many common package publishers outside of the core team.
That’s a pretty neat strategy. Elm really brings a lot of cool (and fun) sugar to dev flows. I’ve been looking for a reason (project) to use it in so I can transition from an enamored fan-person to a true evangelist :)
I'm doing this kind of automation with Maven in Java. There is a plugin (build helper I believe is the name) that gives you properties like "next.release.version", "current.release.version", "next.snapshot.version", etc.
So I've set up an infrastructure where you just click a button and it performs a release with a _proper_ version number in accordance with semver; it simply does the right thing. Works like a charm.
I don't understand the complaints about the human factor when you can eliminate it with ease.
But yes, tools like this would eliminate most of the semver issues.
I can't believe such logic isn't built into more package managers.
No, it isn't computable (that is, correctly determining one of "these functions behave the same" or "these functions behave differently", and not "unknown") in general, as it is equivalent to the halting problem. Consider these two versions of a function; are they API compatible?
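A minimal sketch of the kind of pair being described, in Python; the names and mystery_computation are placeholders for an arbitrary program:

def mystery_computation():
    # Stands in for an arbitrary program; whether it halts is, in general,
    # undecidable.
    pass

# Version 1 of foo: returns its argument unchanged.
def foo_v1(x):
    return x

# Version 2 of foo: runs the arbitrary computation first. If it halts, this
# behaves exactly like version 1; if it never halts, every caller hangs.
def foo_v2(x):
    mystery_computation()
    return x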
They're only truly API compatible if the arbitrary computation halts, but that computation can be any program, so proving that two 'foo's of this style are equivalent amounts to solving the halting problem.
Of course, one can still likely get useful answers of "definitely incompatible" etc., with much more tractable analyses. AIUI, the Elm version ends up just looking at the function signature, and catches things like removing or adding arguments: for appropriately restricted languages, it is likely to even be possible to determine if downstream code will still compile, but that's not a guarantee it will behave correctly.
I'd imagine it isn't, at least depending on how you define API compatibility, and whether you're only looking at the API interfaces. Imagine two versions of a library that implement the function "add".
Version 1:
add : Int -> Int -> Int
add x y = x + y
Version 2:
add : Int -> Int -> Int
add x y = x * y
Both versions expose the same API interface, but the functions that conform to that interface are semantically different. A stronger type system could probably differentiate between the two functions, but I doubt you could generally compute whether both functions implement the same behavior.
Perhaps with some sort of functional extensionality you'd be able to compute compatibility perfectly, but I can't imagine that ever being feasible in practice.
That being said, what Elm does offer is still a huge improvement over humans trying to guess whether they made any breaking changes :)
But behavior testing is undecidable; that follows easily from the halting theorem. I doubt it would even be recognizable (semi-decidable).
Only as far as the type system supports, but I feel like constraining the minimum version unit to increment is far better than doing nothing. Disallowing a patch-version increment when the types of the existing API change is great.
This seems like something solved decades ago with C header files; they're easy to do a diff on, and the only false negative is from adding a function. Even that would be fine if you weren't exporting raw structs.
It seems like we gave up simple ways to do stuff like this because we hated header files and moved to languages like Java and C# that eschewed them. Then they got reinvented and renamed to interfaces, and we've come up with all sorts of other complicated tools (elm-package, mocking, IoC) just to recreate the functionality we lost in header files.
I love the MaxInboundMessageSize example. I've run into that many times.
Often there will be a note in the release notes about it, and I know I should read the release notes in detail when I upgrade dependencies, but like many people I don't always. Sometimes it's just laziness or complacency -- especially for "utility" libraries like for compression or encoding -- but other times it's a challenge with release notes:
* Each version has each release note published independently (or worse: only on the Releases tab in GitHub, and you have to click to expand to read each)
* The release notes are really long or dense, and breaking changes are easily missed
There are also worse problems:
* The release notes don't actually call out the breaking change (you have to read each ticket in detail)
* The release notes just say "Bug fixes" or there are no release notes
I think along with the suggestions in this article, library authors should also put effort into making good release notes. This includes realizing that some people are upgrading from a version that is a couple of major versions and/or years old.
While it is a good example, you could also use the same example and conclude that the problem was inadequate tests. SemVer is great, but you can't count on dependencies that you do not control actually adhering to it, either intentionally or unintentionally.
The only thing that could have prevented something like this for sure was mentioned:
> And while nothing in our early testing sent messages larger than 256k, there were plenty of production instances that did.
To me, this was the clear failure; not the fact that some dependency broke semver. Their production system relied on being able to send messages larger than 256k, and their tests did not.
While it's easy to say, how far do you go? Do you test every bit of every upstream library you use? The ideal is probably yes, but the reality is this rarely happens.
Even with a test, you may not find this. In the IOException example, the author calls out why:
> When we upgraded, all our tests passed (because our test fixtures emulated the old behavior and our network was not unstable enough to trigger bad conditions)
The only way to catch this type of thing is to emulate the entire network side of things, and that's still only as good as your simulation of the real world. Again, the reality is that even if you test your upstream dependencies to this extent, you're probably mocking a bunch of things, and that may mask something in a way you won't see until production use.
But isn't it impractical to test every feature of every library you are using? In an ideal world you would have everything tested in isolation as well as in integration. But in practice there will always be a corner case that remains untested, because you don't know all the internals of the libraries you use.
God yes! For the apps that I maintain (and which have users outside my team), I enforce high-quality release notes like you describe. Representative example: https://github.com/sapcc/swift-http-import/blob/master/CHANG... (note that this also takes SemVer seriously)
Nitpick: you are not using SemVer.
A normal version number MUST take the form X.Y.Z where X, Y, and Z are non-negative integers, and MUST NOT contain leading zeroes. X is the major version, Y is the minor version, and Z is the patch version. Each element MUST increase numerically. For instance: 1.9.0 -> 1.10.0 -> 1.11.0. [1]
Some of your version numbers lack the patch version.
[1] https://semver.org/#spec-item-2
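For the record, a minimal Python check for just the X.Y.Z "normal version" form quoted above (pre-release and build-metadata suffixes, which the spec also allows, are ignored here):

import re

# X, Y, Z: non-negative integers, no leading zeroes (a lone 0 is fine).
NORMAL_VERSION = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")

def is_normal_semver(version: str) -> bool:
    return NORMAL_VERSION.match(version) is not None

print(is_normal_semver("1.10.0"))  # True
print(is_normal_semver("1.10"))    # False: the patch version is missing
print(is_normal_semver("01.2.3"))  # False: leading zero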
> Stop trying to justify your refactoring with the "public but internal" argument. If the language spec says it's public, it's public. Your intentions have nothing to do with it.
This is so wrong. APIs are for people, not tools, so intent is primary. When tools are not expressive enough to capture and enforce intent, you document it, but it's still primary. Someone using a "public" API that clearly says "for internal use only" is no different from a person who uses workarounds like reflection or direct memory access, and there is no obligation to keep things working for them.
> there is no obligation to keep things working for them.
You opened with the correct observation that APIs are mostly for people. Saying there is no obligation here contradicts the expected social norms. And even more importantly, intent does not tightly correspond with reality, and what can happen, tends to happen. The actual code actually existing always has the final say. If you intend to have the best outcome for everyone involved, conform to the unalterable realities as much as possible: if the interface should be public, make it public. If the interface should be private, make it private.
I specifically said "when tools are not expressive enough to capture and enforce intent".
Suppose you're writing a library in Python. Everything in it is public. Even the dunder class members are, because it's just name mangling, and the language spec even documents what exactly it does!
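As a small illustration of that mangling (the class and attribute names are made up): the double-underscore attribute isn't hidden, its name is just rewritten to a documented, reachable form.

class Widget:
    def __init__(self):
        self.__secret = 42  # "private" by convention; stored as _Widget__secret

w = Widget()
# w.__secret raises AttributeError outside the class body...
print(w._Widget__secret)  # ...but the mangled name works: prints 42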
Now, is anyone going to seriously claim that every single identifier in every Python library is part of its public API, and any change that affects it is a breaking change? Because that's certainly not the "expected social norm".
Granted, Python is a somewhat extreme example. But in practice, this also comes up in languages like Java and C#, when dependencies are more intricate than what the access control system in those languages can fully express.
And then there are backdoors:
> What can happen, tends to happen. The actual code actually existing always has the final say.
You can use Reflection to access any private field of any object in Java. There's actual existing code doing that in practice, too. Does it have the final say, and does it mean that internal representation of any Java class in any shipped Java library has to be immutable, so as to not break the API clients?
The language I'm using doesn't let me express the public/private divide I wish to make correctly (e.g. "private" implementation functions for a public C macro).
The API is 100% intended for internal use only, but someone insists on ignoring that and consuming the private API anyway. Instead of forcing them to write their own headers, which silently break at runtime when function signatures change under certain calling conventions that don't check those signatures, I instead allow them to include headers with a few keywords like "private", "internal", "do_not_use", or "i_am_voiding_my_semver_warranty" in the path, perhaps only after they add some similarly scary #defines, so it's at least a build failure.
I see this a lot in Java libraries, for instance.
In my experience the more things are public the better. Very often a quick workaround turns into a monster bodge because some method is marked strict private instead of protected or public.
So I usually make most stuff public as such, but put internals in a namespace/scope that makes it clear that these are implementation details. Relying on implementation details always carries the risk of breaking when upgrading.
This allows for a lot of flexibility when needed, while also not polluting the "truly public" API.
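A minimal sketch of that split in Python (names invented for the example): everything remains importable, but __all__ and the underscore prefix make the boundary between the "truly public" API and the implementation details obvious. In a larger library the internals would sit in a separate submodule (e.g. mylib._internal) instead.

__all__ = ["render_template"]  # the "truly public" API, covered by SemVer

def render_template(template: str, values: dict) -> str:
    """Public entry point; its behavior is what version numbers promise."""
    return _render(template, values)

def _render(template: str, values: dict) -> str:
    """Implementation detail: reachable if you really need it, but it may
    change in any release without a major bump."""
    return template.format(**values)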
Making an API public means that people can do whatever they want with it. If you are not sure if you want to allow the API in the future it should not be public. People will always look to do the laziest thing possible which might mean hooking into your "public internal API". Then you will never be able to change it and you will have to maintain it forever.
They can do whatever they want with it but you have no obligation to maintain nor support it if it’s not a documented public API, in my opinion.
It’s a bit like a house on a corner with a big front yard. People may cut through the grass to save time but you can’t blame the homeowner when he finally puts up a fence.
Django is my gold standard for this. They have great deprecation policies where they deprecate something in the same release that they add alternatives (allowing for you to fix things up before upgrading Django), they document these changes liberally and offer alternatives, make good use of the warnings system (meaning you can run tests in "deprecated functions not allowed" mode to catch stuff), and generally are careful.
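The warnings part of that flow looks roughly like this in plain Python (a generic sketch, not Django's actual deprecation helpers): the old name keeps working for a deprecation cycle, and CI promotes deprecation warnings to errors so callers notice before the removal.

import warnings

def fetch_items(**filters):
    """New, preferred API."""
    return []  # stand-in implementation

def get_items(**filters):
    """Old name, kept alongside the new one for a deprecation cycle."""
    warnings.warn(
        "get_items() is deprecated; use fetch_items() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return fetch_items(**filters)

# Run the test suite in "deprecated functions not allowed" mode, e.g.:
#   python -W error::DeprecationWarning -m pytest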
I'm still shocked at the number of projects that make breaking changes without first releasing a "support both versions" release that lets people test their changes easily. It's especially frustrating when you have really basic environment variable renames that could so easily support the deprecated name as a one-liner.
Give people the space to upgrade please!
It's also amazing how long they stayed at 0.96 despite being very stable. Then 1.x for a long time too.
It has a cost though: Django has a hard time going async since it breaks everything.
All in all, the Python community has a good culture for this. Even the 2 to 3 migration was given more than a decade to proceed.
Yet, I feel like we still get a lot of complaints, while in JS or Ruby land you can break things regularly and fans scream that it's fine.
SemVer is a social construct, not a contract. It's nice when it applies, but you cannot rely on other developers to adhere to it.
One man's bugfix is another man's breaking change. If product A implements a workaround for a bug in product B, but the bug gets fixed in a patch version, it could break product A's code, so it becomes a breaking change. The only way to anticipate these changes is reading the change logs/release notes, and thorough automated regression testing. (Obviously unfeasible for every dependency.)
Maybe versions should be a single number, like a build number. It just gets tricky when you have multiple versions out there, each requiring patches.
Rich Hickey's take on SemVer makes for a really fantastic talk.
"Change isn't a thing. It's one of two things: Growth or Breakage." Growth means the code requires less, or provides more, or has bug fixes. Breakage is the opposite; the code requires more, provides less, or does something silly like reuse a name for something completely different.
While the distinction of construct vs contract is subtle, improved tooling will eventually elevate the "construct" to a contract.
semver needs to be combined with a package manager and strong version locking semantics for it to be useful.
Both npm and yarn in the Node ecosystem certainly provide this, with any remaining kinks being ironed out fast.
Using micro-modules as dependencies is a rather pleasant experience in node/js - especially when the dependencies follow semver. This is even more true of popular modules, where authors take their versioning responsibility seriously.
I regularly use automated version updates (npm-check -u [1] / npm audit fix [2]). Coupled with good test coverage of my code, I've been really happy.
[1] https://www.npmjs.com/package/npm-check [2] https://docs.npmjs.com/getting-started/running-a-security-au...
What's the solution here? Fuck standards? Imagine if we had that same attitude with regards to HTTP.
I've found that the golden rule of "everyone is lying to you" works well enough. Assume that every change will be a breaking change. Test everything, verify everything, and then continue to test and verify when it's in production.
I've never found standards to be all that standard.
I haven’t figured out how to implement this, but the key observation is that Semver is trying to delineate degrees of substitutability (LSP).
I think the right solution here is to build version numbers off of your black box tests.
The devil is in the details though. What does a change to the tests mean, exactly? Adding new tests is probably a patch release. Modifying tests demands at least a minor version number, but what makes it a major version number? Deleting tests probably qualifies. Removing assertions probably does too.
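A toy sketch of those rules (purely hypothetical tooling; a test's "fingerprint" could be a hash of its source): compare the two suites and report the smallest bump the rules allow.

def suggest_bump(old_tests: dict, new_tests: dict) -> str:
    """old_tests/new_tests map a test id to a fingerprint of its body."""
    removed = old_tests.keys() - new_tests.keys()
    shared = old_tests.keys() & new_tests.keys()
    modified = {t for t in shared if old_tests[t] != new_tests[t]}
    added = new_tests.keys() - old_tests.keys()

    if removed:   # deleted tests (or dropped assertions): treat as breaking
        return "major"
    if modified:  # changed expectations: at least a minor bump
        return "minor"
    if added:     # only new tests: a patch release, per the rule above
        return "patch"
    return "none"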
But now what if the author is bad at tests too? Can I substitute my own? (I could do that right now for regression testing, and in fact I do occasionally for interdepartmental issues).
> Imagine if we had that same attitude with regards to HTTP.
Well, many do. You wouldn't believe the number of broken HTTP clients/servers out there.
The difference is that most HTTP clients/servers have to work with some significant subset of the pre-existing HTTP infrastructure (otherwise why would anyone use them?) and so that constrains their implementation to be "mostly correct". It's literally network effects :).
Libraries on the other hand usually start out with a single consumer and have no such constraints on their implementation so you end up with various levels of ossified brokenness or breaking non-ossification.
SemVer is a starting point. If it's a major release you can prepare yourself mentally for a lot of breaking changes. A point release... probably not
In any case you need to read through the changelog or (absent that) the actual code diff and think about how you use the application. It's not 100%, but no versioning system will be able to identify how you use code and what you expect the semantics of an application to be.
The issue is that because not everyone follows SemVer, you have to assume that no-one follows SemVer and act defensively, otherwise you will have problems.
Ideally, there would be a tool that can inspect your entire codebase and determine if the change is "breaking". This still has issues if the change lies outside of your codebase (such as a change to the configuration of your AWS services).
Or use ComVer
https://github.com/staltz/comver/blob/master/README.md
A problem with SemVer I often see is that it's unclear whether a project adheres to it. You just can't assume every project having x.y.z version numbers uses semantic versioning.
Build numbers are engineering, semver is marketing.
Private processes vs what you tell the world.
Build numbers unlock the delta-debugging achievement. Add 'last known good' and 'found' build numbers to tickets, along with repro steps, then use the diff to find the bug.
Build numbers also unlock the QA/testing achievement. Add 'found', 'fixed', 'verified' fields to tickets. Now your team is certain when individual changes are ready to merge, ship/deploy.
In programming, type contracts are not. Wouldn't it be nice if we had some sort of infallible static analysis tool that absolutely determines what constitutes a breaking change in your codebase?
This is all great, but I feel like all these problems could be caught just with properly written tests. If your tests correctly cover the API usage of your code, and I mean both your code complying with the intended API and the API complying with the intended usage, then the implementation behind that API should be totally transparent. No need to check versions, release notes, or any of that, just run the API compliance tests on the new version, and if it works then your code should work too.
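A rough sketch of such an API-compliance test (pytest style, against a hypothetical dependency called some_lib): it exercises the documented contract with the real library rather than a fixture, so rerunning it against the new version is the upgrade check.

import pytest
import some_lib  # hypothetical third-party dependency

def test_large_messages_round_trip():
    # The contract we rely on: payloads up to 1 MiB pass through unchanged.
    client = some_lib.Client()
    payload = b"x" * (1024 * 1024)
    assert client.echo(payload) == payload

def test_network_failures_raise_ioerror():
    # We catch IOError upstream, so any other exception type breaks us.
    client = some_lib.Client(host="192.0.2.1", timeout=0.1)  # TEST-NET-1 address
    with pytest.raises(IOError):
        client.echo(b"ping")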
They built tests that explicitly assumed that the library's interface wasn't going to change:
>When we upgraded, all our tests passed (because our test fixtures emulated the old behavior)
Doing that, upgrading your dependencies and expecting everything to work just because those tests passed? That's naivete.
If they'd built decent integration tests that used the actual library (instead of "assume nothing changes" fixtures) and made more of an effort to simulate realistic scenarios then their tests probably would have flagged up most of the issues they had.
Alas, this seems to be one of the side effects of following the "test pyramid" "best practice".
Treating the test suite as your contract and having a well structured + documented test suite also makes SemVer considerations a matter of what tests changed and how they changed (new cases, new features, changes to existing features, removed features, etc.).
Wouldn't it be easier to write optional parameters, allowing you to keep the same method name? This results in cleaner code that doesn't break existing usage.
For example:
func ListItems(query Query, limit int = 0, offset int = 0) Items {
// ....
}
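In a language with default parameter values the same idea is directly runnable; a Python sketch with made-up helpers: existing calls like list_items(query) keep their old behavior, and new callers can opt into limit/offset.

def run_query(query):
    """Stand-in for the pre-existing data-access helper."""
    return ["item-%d" % i for i in range(10)]

def list_items(query, limit=0, offset=0):
    # limit=0 and offset=0 reproduce the old behavior, so existing call
    # sites are unaffected by the new parameters.
    items = run_query(query)
    if offset:
        items = items[offset:]
    if limit:
        items = items[:limit]
    return items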
> But for less intrusive changes, I personally feel like you can make some minor SemVer transgressions provided...
Kind of contradicts "Often, it seems that version numbers are incremented by "gut feel" instead of any consistent semantic: "This feels like a minor version update."
> The value of MaxBufferSize was adjusted downward to 2048 because we discovered a buffer overflow in a lower level library for any larger buffer size. See issue #4144
Technically it's a major version bump, as I understand it. But security is important, so what should we do here in addition to writing it down as the first sentence of the release notes? Perhaps having an excuse to potentially break downstream code in the name of security should be OK and well communicated (i.e. in the readme)?