Very relieved that they chose to back off the initial opt-out proposal. It’s always refreshing to see a language listening to its user base. This was the kind of decision that could instantly change the reputation of a PL, for purely political / psychological reasons.
This implies some sort of malicious intent on the part of the Go maintainers, which doesn't really seem fair given the way the original proposal was written and how feedback was incorporated.
I very much doubt that everyone would be okay with them "accidentally" turning telemetry on by default while continuing to say that it's opt-in. In fact, it would be very damaging for the Go project's reputation, especially since everyone already associates Google with "spying on users"...
I think adding telemetry to a compiler goes in the opposite direction of the current "minimal privileges, sandbox all the things" trend.
For instance, it would make sense to create a set of SELinux rules (or something like them) to make sure that a compiler cannot do anything other than reading its input files (and system headers/libraries/etc) and writing to its output directory, even if, for instance, a buffer overflow triggered by a malicious source file led to shellcode running within the compiler. Having to allow network access for the telemetry would require weakening these rules.
That would actually already be difficult with the current go tool, since it's more than "just" a compiler: it also fetches dependencies. If all dependencies are already in place it won't hit the network, so there are options, but you'd have to find another way to retrieve those.
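To make that concrete, here's a rough sketch of how I'd approach a network-free build today. The unshare trick is my own substitution for the SELinux rules described above, and the exact flags are from memory, so treat this as an assumption rather than a recipe:

    # vendor dependencies once, while network access is still allowed
    go mod vendor

    # later builds never need the network for modules
    GOFLAGS=-mod=vendor GOPROXY=off go build ./...

    # or cut the network off entirely with an unprivileged user+net namespace
    unshare --map-root-user --net go build ./...

The first two commands only stop the module machinery from downloading anything; the last one is closer in spirit to the SELinux idea, since nothing in the build can reach the network at all.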
The telemetry is opt-in, and failing to send it won't fail the compile (it doesn't even run on every compile). It wouldn't really prevent you from applying your SELinux policy if you want, even if it had been opt-out.
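As I understand the design from the x/telemetry proposal, the switch is a small local command along these lines (the exact command name and modes are from my memory of the proposal, so treat them as an assumption, not the definitive interface):

    gotelemetry off    # collect nothing at all
    gotelemetry local  # collect counters locally, never upload anything
    gotelemetry on     # also allow the periodic, aggregated upload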
> to make sure that a compiler cannot do anything other than reading its input files (and system headers/libraries/etc)
Define "input files." Tools have to do a combination of reading, parsing, and sometimes even version unification/downloads just to get the complete set of inputs to feed to the compiler.
Of course you can define the compiler as the tool that parses text and writes machine code, but then you're just shoveling dirty water around.
Can someone explain how we managed to have programming languages and toolchain development for half a century without using telemetry, but somehow we need it today in our tools? Their "Why Telemetry?" blog post just doesn't cut it for me [1].
[1] https://research.swtch.com/telemetry-intro
Humans managed to live for most of history without penicillin, or even boiling water, at the cost of most humans dying before making it to adolescence. People managed to have global communications with only steamships and the telegraph, at the cost of a slower pace of information dissemination. NASA managed to make it to the moon with less computing power than the cellphone in your pocket, at high resource, monetary, human and time costs. Cars managed to work with more rudimentary designs than today's, without any computers, at the cost of shorter lifespans, lower efficiency and higher pollution.
You can make many arguments for and against telemetry in developer tools. But not acknowledging that telemetry gives visibility into how those tools actually behave in the wild, which in turn helps lower the incidence of bugs and speed up development, is disingenuous. You can arrive at the conclusion that even inert, opt-in telemetry is not worth it, but don't dismiss its usefulness to development out of hand as if it were some crazy idea.
You can say "we managed to do X without Y" for a lot of values of X and Y.
I think that's the wrong way to go about things; instead it's more useful to ask "will this be useful?"
There's a long list of real-world use cases in part 3 of that blog series.
I miss telemetry in my app sometimes too; there are some features where I wonder if anyone actually uses them, and I also don't really know what kind of things people run into. Simply "ask people" is tricky, as I don't really have a way to contact everyone who cloned my git repo, and in general most people tend to be conservative in reporting feedback. I have found this to be a problem even in a company setting with internal software: people would tell me about issues they'd been frustrated with for months over beers in the pub, when the fix was sometimes just a simple five-minute tweak I would have been happy to make.
Can I make my app without telemetry? Obviously, yes. And I have no plans to ever add it. But that doesn't mean it's not useful.
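If I ever did add it, the version I'd want is tiny and strictly opt-in. A minimal sketch of what I mean (all names here are hypothetical, this is not any real project's API): nothing is recorded unless the user has explicitly created an opt-in marker, and what is recorded is just anonymous counters.

    package telemetry

    import (
        "os"
        "path/filepath"
        "sync"
        "sync/atomic"
    )

    var (
        once     sync.Once
        optedIn  bool
        counters sync.Map // feature name -> *atomic.Int64
    )

    // enabled reports whether the user explicitly opted in by creating a
    // marker file; without it, everything below is a no-op.
    func enabled() bool {
        once.Do(func() {
            dir, err := os.UserConfigDir()
            if err != nil {
                return
            }
            if _, err := os.Stat(filepath.Join(dir, "myapp", "telemetry-opt-in")); err == nil {
                optedIn = true
            }
        })
        return optedIn
    }

    // Count records one anonymous use of a named feature, locally only.
    func Count(feature string) {
        if !enabled() {
            return
        }
        v, _ := counters.LoadOrStore(feature, new(atomic.Int64))
        v.(*atomic.Int64).Add(1)
    }

Call sites would just say telemetry.Count("export-pdf"); writing the counts to a local file the user can inspect, and only ever uploading aggregates, would sit on top of this.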
> instead it's more useful to ask "will this be useful?"
Well, that's also the wrong way to look at it. Because everything, no matter how broadly bad it might be, is useful to someone somewhere.
Of course telemetry can be useful to the developer of the application (if they look at the data and act on it). But at the same time it violates the privacy of all its users, who vastly outnumber (at least for most projects) its developers.
For any argument we need to look at pros & cons, not just the pros.
I think that's the wrong question as well. The right question is "does this provide benefits in excess of the costs"?
You can manage to have things without telemetry, but having telemetry is incredibly useful. I think the example of "how much of our user base actually uses these features" is a very good one, especially in a compiler, where maintaining old features can add a lot of complexity to the code base. And, as they also explain, a lot of bugs and undesired behaviors are things that users won't know they have to report and just accept as part of the normal behavior: cache misses, slowed-down compilation in certain situations, sporadic crashes... All of those things could be improved if the developers knew about them.
A common concern I see is people worrying that once "how much of our user base actually uses these features" is answered, the reaction will be to remove the feature entirely. In practice it is more like "we think implementation A of the feature is no longer in use and we have migrated everyone to implementation B; is that actually the case?" or "implementation A produces the user-visible effects while implementation B runs in parallel to detect divergences; have any been detected in the wild in the past X months?"
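As a toy illustration of that second pattern (run B in the shadows, report only whether it disagreed with A): everything below is made up for the example, not any real compiler's code.

    package main

    import (
        "fmt"
        "strconv"
        "strings"
        "sync/atomic"
    )

    // In a real tool this would be an opt-in telemetry counter;
    // here it is just a process-local number.
    var divergences atomic.Int64

    // parseA is the implementation users actually see.
    func parseA(s string) (int, error) { return strconv.Atoi(strings.TrimSpace(s)) }

    // parseB is the candidate replacement running in the shadows.
    func parseB(s string) (int, error) { return strconv.Atoi(s) }

    func parse(s string) (int, error) {
        n, err := parseA(s)
        if m, errB := parseB(s); m != n || (err == nil) != (errB == nil) {
            // Record only the fact that A and B disagreed, never the input.
            divergences.Add(1)
        }
        return n, err
    }

    func main() {
        parse("42")
        parse(" 42 ") // A accepts it, B rejects it: one divergence
        fmt.Println("divergences:", divergences.Load())
    }

Once a release or two goes by with that counter staying at zero in the wild, deleting implementation A becomes a much easier call.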
In Rust we've had large migrations like that (the "new" borrow checker comes to mind), and we had really long periods of time where they were tested against the latest crate versions on crates.io until crater came back clean, and even then we got bug reports about regressions in the wild only after the change was released on stable.
For me personally there's one big blind spot when testing only against published code, no matter how big the corpus is: humans are excellent fuzzers, and the malformed code they write and try to compile is hard to replicate. Having visibility into uncommon cases that only show up on users' machines would be incredibly useful. An example is the "botched" 1.52.0 Rust release[1]: invalid incremental compilations were changed from silent to visible Internal Compiler Errors in nightly for several releases, to the point where the team felt all of the outstanding incremental-compilation bugs related to them had been addressed, but when the change was turned on in stable it immediately hit users in the real world, making an emergency dot-release reverting it necessary. If we had telemetry on stable compilers, instead of turning on the silent-to-ICE change we could have added a metric for the silent error and known ahead of time that the feature wasn't ready for prime time. With more telemetry we could have known what was causing it. Snooping over a user's shoulder while they hit the case could make identifying the bug trivial, but of course that user would then turn around and ask me, in unfriendly terms, "who are you and how did you get in?"
Arguments can be made against even implementing telemetry, and they can be compelling enough to decide against it, but shouting the conversation down before it can even happen is not helpful.
[1]: https://blog.rust-lang.org/2021/05/10/Rust-1.52.1.html
Among the telemetry-collecting applications and websites I use, or have used, I can't think of a single case where the software obviously improved due to this data being collected.
In fact, it seems to be the opposite in some cases. Firefox and Windows, for example, have generally become significantly worse for me over time, despite the telemetry that they're collecting.
In the "best" case, software like Visual Studio Code and Homebrew have merely remained mediocre.
I've seen much better results from developers who base decisions on feedback and bug reports that have been manually submitted by users, rather than trying to make assumptions based on automatically-collected telemetry data.
> Can someone explain how we managed to have programming languages and toolchain development for half a century without using telemetry, but somehow we need it today in our tools?
Back in the 90s in companies I worked for, attempting to add any kind of phone-home code was a fireable offense. Or at least, would get you a very stiff talking to from a few very high up people in the organization. You'd never even think of doing that again. Customer trust and privacy was paramount.
As we all know, spyware slowly started creeping into end user apps and later became a flood. Now it's difficult to find any consumer app that doesn't continuously leak everything the user does.
It's become so normalized that now even developer tools seem to think it's somehow ok to leak user data.
I think there is a whole generation of developers who have no experience with how to do these things in the absence of telemetry, so they genuinely believe it's not possible.
Two things happened in the '00s in relation to this question: one, on-demand computing infrastructure (the cloud), and two, the scaling requirements of a new breed of networked services. The germinal change in response to these shifts was that processes replaced components, and system boundaries spanned devices.
When your code was running on application servers and your applications were composed of components, all the tools were already there, in the OS and as add-ons (like dtrace), and in whatever monitoring tools came with your application server. Today, instead of components, we compose systems out of (lightweight) processes, and processes can be created on any device, and the replacement for the application server is the whole gamut of k8s, terraform, elastic, ..., etc.
Nothing has changed in the abstract structure of our systems; it's just that the current approach has the beast dismembered, spread out, and loosely connected via protocols, instead of through a linker or the dispatch mechanism of a platform.
It'll be interesting to see what effect telemetry has on the ongoing development of the tools and language, since we don't have another tool / language chain to compare it to.
I like that they changed their approach to opt-in. I also like how much effort they've put into making the data collection as anonymous as possible despite being a Google project.
Well done, golang team. Other companies with supposedly open languages (looking at you, Microsoft) can learn a thing or two from you.
Given how happy people seem to be sending large parts of their codebases to LLMs these days, privacy concerns over telemetry logging look quaint in comparison.
As long as they keep this opt-in, I see no reason to accuse them of anything. The telemetry collection design is clearly made to be as privacy-preserving as possible.
There's always the risk that they'll roll out the telemetry setup now as an opt-in feature and then switch it to opt-out down the line, but I don't think this is the current team's intention.
It'll be interesting to see how this is accepted by the community at large. I think explicit opt-in, combined with having the discussion about which metrics to collect in public, would be enough assurance for most that this isn't a bad idea - but that doesn't necessarily mean that most will opt in as a result.
That they made it opt-in means that I will no longer completely rule out using golang. I don't know if I'd actually opt in, though. I haven't evaluated that issue.
They get users accustomed to the idea that telemetry is now in the toolchain. The next step will be to "accidentally" turn it on for everyone.
Then it will be "oopsie daisy", is everyone okay? See, nothing happened, so we'll leave it opt-out.
Go already has a bad reputation for political reasons.
> it would make sense to create a set of SELinux rules (or something like them) to make sure that a compiler cannot do anything other than reading its input files (and system headers/libraries/etc) and writing to its output directory
It reminds me of the classic "confused deputy" article (https://css.csail.mit.edu/6.858/2015/readings/confused-deput...), which coincidentally also involved a compiler tracking statistics about its usage.
Many will likely switch away from Go intentionally because of this.