Add opt-in transparent telemetry to Go toolchain

bsaul · 2 years ago

Very relieved that they chose to back off the initial opt-out proposal. It’s always refreshing to see a language listening to its user base. This was the kind of decision that could instantly change the reputation of a PL, for purely political / psychological reasons.

varispeed · 2 years ago

It's a classic case of Overton Window

They get users accustomed to the idea that now telemetry is in the toolchain. Next step will be to "accidentally" turn it on for everyone.

Then it will be "oopsie daisy" everyone is okay? See. Nothing happened, so we'll leave it opt out.

stanleydrew · 2 years ago

This implies some sort of malicious intent on the part of the Go maintainers, which doesn't really seem fair given the way the original proposal was written and how feedback was incorporated.

rob74 · 2 years ago

I very much doubt that everyone would be okay with them "accidentally" turning telemetry on by default while continuing to say that it's opt-in. In fact, it would be very damaging for the Go project's reputation, especially since everyone already associates Google with "spying on users"...

deafpolygon · 2 years ago

This will be the case. They're just moving the goal post to a safe distance for now.

barsonme · 2 years ago

This is a very unfair and unfounded criticism of the Go maintainers.

duped · 2 years ago

> This was the kind of decision that could instantly change the reputation of a PL, for purely political / psychological reasons.

Go already has a bad reputation for political reasons.

bsaul · 2 years ago

care to give more details ? I'm not sure what you're thinking about.

cesarb · 2 years ago

I think adding telemetry to a compiler goes in the opposite direction of the current "minimal privileges, sandbox all the things" trend.

For instance, it would make sense to create a set of selinux rules (or something like it) to make sure that a compiler cannot do anything other than reading its input files (and system headers/libraries/etc) and writing to its output directory, even if for instance a buffer overflow triggered by a malicious source code file led to running shell code within the compiler. Having to allow access to the network for the telemetry would require weakening these rules.

It reminds me of the classic "confused deputy" article (https://css.csail.mit.edu/6.858/2015/readings/confused-deput...), which coincidentally also involved a compiler tracking statistics about its usage.

arp242 · 2 years ago

That would actually already be difficult with the current go tool, since it's more than "just" a compiler but also fetches dependencies. If all dependencies are already in place it won't hit the network, so there are options, but you'd have to find another way to retrieve those.

The telemetry is opt-in,and failing to send them won't fail the compile (it won't even run on every compile). It's not really preventing you from applying your SELinux policy if you want, even if it would have been opt-out.

duped · 2 years ago

> to make sure that a compiler cannot do anything other than reading its input files (and system headers/libraries/etc)

Define "input files." Tools have to do a combination of reading, parsing, and sometimes even version unification/downloads just to get the complete set of inputs to feed to the compiler.

Of course you can define the compiler as the tool that parses text and writes machine code, but then you're just shoveling dirty water around.

kklisura · 2 years ago

Can someone explain how we managed to have programming languages and toolchain development for half a century without using telemetry, but somehow we need it today in our tools? Their Why Telemetry? blog, just doesn't cut it for me [1].

[1] https://research.swtch.com/telemetry-intro

estebank · 2 years ago

Humans managed to live for most of history without penicillin, or even boiling water, at the cost of most humans dying before making it to adolescence. People managed to have global communications with only steam ships and telegraph, at the cost of slower pace of information dissemination. NASA managed to make it to the moon with less computing power than the cellphone in your pocket, at high resource, monetary, human and time costs. Cars managed to work with more rudimentary design than today's, without any computers, at the cost of lower life-spans, lower efficiency and higher pollution.

You can make many arguments for and against telemetry in developer tools. Not acknowledging that telemetry helps with visibility into how those tools actually work in the wild, which in turn helps lower the incidence of bugs and speed up development, is disingenuous. You can arrive to the conclusion that even inert, opt-in telemetry is not worth it, but don't disregard out of hand the utility of it in helping their development as if it were some crazy idea.

hinkley · 2 years ago

And yet when NASA went to the moon, they had telemetry data.

arp242 · 2 years ago

You can say "we managed to do X without Y" for a lot of values of X and Y.

I think that's the wrong way to go about things; instead it's more useful to ask "will this be useful?"

There's a long list of real-world use cases in part 3 of that blog series.

I miss telemetry in my app sometimes too; there's some features where I wonder if anyone actually uses this, and I also don't really know what kind of things people run in to. Simply "ask people" is tricky, as I don't really have a way to contact everyone who cloned my git repo, and in general most people tend to be conservative in reporting feedback. I have found this a problem even in a company setting with internal software: people would tell me issues they've been frustrated at for months over beers in the pub, when this was sometimes just a simple 5 minute tweak that I would be happy to make.

Can I make my app without telemetry? Obviously, yes. And I have no plans to ever add it. But that doesn't mean it's not useful.

jjav · 2 years ago

> instead it's more useful to ask "will this be useful?"

Well, that's also the wrong way to look at it. Because everything, no matter how broadly bad it might be, is useful to someone somewhere.

Of course telemetry can be useful to the developer of the application (if they look at the data and act on it). But at the same time it violates the privacy of all its users, who vastly outnumber (at least for most projects) its developers.

For any argument we need to look at pros & cons, not just the pros.

JohnFen · 2 years ago

> I think that's the wrong way to go about things; instead it's more useful to ask "will this be useful?"

I think that's the wrong question as well. The right question is "does this provide benefits in excess of the costs"?

Deleted Comment

gjulianm · 2 years ago

You can manage to have things without telemetry, but having telemetry is incredibly useful. I think the example of "how much of our user base actually uses these features" is a very good one, specially in a compiler where maintaining old features could be adding a lot of complexity to the code base. And, as they also explain, a lot of bugs and undesired behaviors are things that the users won't know they have to report and just accept as part of the normal behavior. Things like cache misses, slowed down compilation times in certain situations, sporadic crashes... All of those things could be improved if the developers knew about them.

estebank · 2 years ago

A common concern I see is people being worried that the reaction to having "how much of our user base actually uses these features" answered will be to remove the feature entirely, when in practice it is more of "we think implementation A of the feature is no longer in use and we have migrated everyone to implementation B, is that the case?" or "we have implementation A produce user visible effects, while we have implementation B run parallel to it to detect divergences, have any been detected in the wild in the past X months?".

In Rust we've had large migrations like that (the "new" borrow checker comes to mind) and we had a really long periods of time where they were tested against the latest crate versions on crates.io until crater came back clean, and even then we had bug reports about regressions in the wild only after it was released on stable.

For me personally there's one big blind spot when testing only against published code, no matter how big the corpus is: humans are excellent fuzzers and the malformed code they write and try to compile is hard to replicate. Having visibility into uncommon cases that are only visible on users machines would be incredibly useful. An example of this could be the "botched" 1.52.0 Rust release[1], where invalid incremental compilations were changed from silent to visible Internal Compiler Errors in nightly for several releases to the point where the team felt all of the outstanding incr comp bugs related to them had been addressed, but when turned on in stable immediately hit users in the real world, making it necessary to do an emergency dot-release reverting that change. If we had telemetry on stable compilers, instead of turning on the silent to ICE change we could have added a metric for the silent error and know ahead of time that the feature wasn't ready for prime time. With more telemetry we could have known what was causing this. Snooping over a user's shoulder while they hit the case could make identifying the bug trivial, but of course then that user would then turn around and ask me in unfriendly terms "who are you and how did you get in?"

Arguments can be made against even implementing telemetry, and they can be compelling enough to elect against it, but shouting down the conversation from even happening is not helpful.

[1]: https://blog.rust-lang.org/2021/05/10/Rust-1.52.1.html

VancouverMan · 2 years ago

Among the telemetry-collecting applications and websites I use, or have used, I can't think of a single case where the software obviously improved due to this data being collected.

In fact, it seems to be the opposite in some cases. Firefox and Windows, for example, have generally become significantly worse for me over time, despite the telemetry that they're collecting.

In the "best" case, software like Visual Studio Code and Homebrew have merely remained mediocre.

I've seen much better results from developers who base decisions on feedback and bug reports that have been manually submitted by users, rather than trying to make assumptions based on automatically-collected telemetry data.

chaxor · 2 years ago

Telemetry is incredibly useful for ensuring that people don't use the product.

Many likely be doing a lot of switching away from Go intentionally, due to this.

jjav · 2 years ago

> Can someone explain how we managed to have programming languages and toolchain development for half a century without using telemetry, but somehow we need it today in our tools?

Back in the 90s in companies I worked for, attempting to add any kind of phone-home code was a fireable offense. Or at least, would get you a very stiff talking to from a few very high up people in the organization. You'd never even think of doing that again. Customer trust and privacy was paramount.

As we all know, spyware slowly started creeping into end user apps and later became a flood. Now it's difficult to find any consumer app that doesn't continuously leak everything the user does.

It's become so normalized that now even developers tools seem to think it's somehow ok to leak user data.

JohnFen · 2 years ago

I think there is a whole generation of developers who have no experience with how to do these things in the absence of telemetry, so they genuinely believe it's not possible.

eternalban · 2 years ago

Two things happened in '00s in relation to this question. One, on demand computing infrastructure (cloud), and two, the scaling requirements of a new breed of networked services. The germinal change in response to these shifts was that processes replaced components, and system boundaries spanned devices.

When your code is running on application servers and your applications are composed of components, all the tools were already there, in the OS and as add ons, like dtrace, and in whatever monitoring tools came with your application server. Today, instead of components, we compose systems out of (lightweight) processes, and processes can be created on any device, and the replacement for the application server is the whole gamut of k8, terraform, elastic, ..., etc.

Nothing has changed in the abstract structure of our systems, its just that the current approach has the beast dismembered and spread out and loosely connected via protocols, instead of a linker or a dispatch mechanism of a platform.

shadowgovt · 2 years ago

It'll be interesting to see what effect telemetry has on the ongoing development of the tools and language, since we don't have another tool / language chain to compare it to.

metaltyphoon · 2 years ago

You can somewhat compare to dotnet where it has opt-out telemetry.

jeroenhd · 2 years ago

I like that they changed their approach to opt-in. I also like how much effort they've put into making the data collection as anonymous as possible despite being a Google project.

Well done, golang team. Other companies with supposedly open languages (looking at you, Microsoft) can learn a thing or two from you.

liampulles · 2 years ago

Given how happy people seem to be sending large parts of their codebases to LLMs these days, privacy concerns over telemetry logging look quaint in comparison.

potamic · 2 years ago

Programs running on your machine have access to much more than the code they are working with.

JohnFen · 2 years ago

The actions of a few shouldn't be taken as representative of the whole.

laputan_machine · 2 years ago

Google doing more spying, unsurprising. At least it's opt-in (for now)

jeroenhd · 2 years ago

As long as they keep this opt-in, I see no reason to accuse them if anything. The telemetry collection design is clearly made to be as privacy preserving as possible.

There's always the risk that they'll roll out the telemetry setup now as an opt-in feature and then switch it to opt-out down the line, but I don't think this is the current team's intention.

parabyl · 2 years ago

It'll be interesting to see how this is accepted by the community at large. I think explicit opt-in, combined with having the discussion about which metrics to collect in public, would be enough assurance for most that this isn't a bad idea - but doesn't neccesarily mean that most will opt-in as a result.

JohnFen · 2 years ago

That they made it opt-in means that I will no longer completely rule out using golang. I don't know if I'd actually opt in, though. I haven't evaluated that issue.

TDiblik · 2 years ago

I might be wrong, but isn't dotnet telemetry opt-out by default?