tylerl commented on Reproducible Builds in January 2022   reproducible-builds.org/r... · Posted by u/pabs3
dlor · 4 years ago
Hey Tyler!

Funny to see you here. Matt and I haven't given up on this; we're giving a lot of that another try at Chainguard.

tylerl · 4 years ago
Sweet. Glad to hear someone's working on it who knows what they're doing. :-P
tylerl commented on Reproducible Builds in January 2022   reproducible-builds.org/r... · Posted by u/pabs3
dub · 4 years ago
It's no accident that the Google Open Source Security Team is a sponsor of reproducible-builds.org: they'd like to get the open source world up to speed on best practices that have been applied widely inside Google for well over a decade.

Blaze (a.k.a. Bazel) rolled out at Google around 2007-ish, and the idea that consistent inputs should produce consistent outputs was fundamental to its philosophy.

Build rules like https://github.com/GoogleContainerTools/distroless that create minimal, reproducible docker images would seem radical and new to most people building docker containers these days (almost everyone uses the Dockerfile format), but they'd seem perfectly ordinary and very old-fashioned to any Googler.
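
For a sense of what that looks like in practice, here's a hedged sketch of a BUILD file in the rules_docker style that distroless-based images build on; the target names and base-image label are illustrative, not taken from the distroless repo:

    # Illustrative BUILD file (Starlark). container_image is a real
    # rules_docker rule; the labels and names here are placeholders.
    load("@io_bazel_rules_docker//container:container.bzl", "container_image")

    container_image(
        name = "app_image",
        # Hypothetical distroless base, pinned by digest in the WORKSPACE
        # so the same inputs always produce the same image bits.
        base = "@distroless_base//image",
        files = [":server_binary"],
        entrypoint = ["/server_binary"],
    )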

tylerl · 4 years ago
I spent a fair amount of time at work five or six years ago trying to figure out how to make supply chain security actually possible in the general case with standard open-source tools. And I can tell you that the fact that Docker builds are fundamentally non-deterministic caused me no end of frustration and difficulty.

This was about the time that Bazel was being open-sourced, and Matt's rules_docker extension was already in there. A solution existed, so to speak, but it would have been nutty to assume that the average project would switch from the straightforward-looking Dockerfile format to using Bazel and BUILD files to construct docker containers. And Docker Inc wasn't going to play along; they were riding a high valuation that depended on them being the final word about containerization, so vocally pretending the problem didn't exist was their safest way forward.

At one point I put together a process and POC for porting the concept of reproducible builds to docker in a user-friendly format -- essentially you'd define a spec that listed your dependencies with no more specificity than you needed. Then tooling would dep-solve that spec and freeze it into a fully-reproducible manifest that encoded all the timestamps, package versions, and other bits that would otherwise have been determined at build time. Then the _actual_ build process left nothing to chance: grab the identified sources and build and assemble in a hermetic environment. You'd attach the manifest to the container, and it gave you a precise bill of materials in a format that you could confidently use for identifying vulnerabilities. Since the builds were fully hermetic, a given manifest would only ever produce one set of bits, which could be reproduced in an automated fashion, allowing you to spot supply chain inconsistencies.
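
To make the shape of that concrete, here's a minimal sketch in Python of the "freeze" step; the spec format, field names, and resolve_version hook are all hypothetical stand-ins (the real resolution step spliced into package-manager internals, as noted below):

    # Hypothetical sketch: freeze a loose spec into a reproducible
    # manifest. resolve_version() stands in for a real dependency
    # solver (e.g. apt's); none of this is a real tool's API.
    import hashlib
    import json

    def freeze(spec, resolve_version):
        pinned = []
        for dep in spec["dependencies"]:             # e.g. {"name": "openssl"}
            version, source_url, digest = resolve_version(dep["name"])
            pinned.append({
                "name": dep["name"],
                "version": version,    # exact version, never a range
                "source": source_url,  # where the hermetic build fetches from
                "sha256": digest,      # verified before building
            })
        manifest = {
            "base": spec["base"],
            "build_timestamp": 0,      # fixed epoch, not wall-clock time
            "dependencies": sorted(pinned, key=lambda d: d["name"]),
        }
        # The manifest hash identifies the build: one manifest, one set
        # of output bits, so mismatches reveal supply chain tampering.
        manifest["manifest_sha256"] = hashlib.sha256(
            json.dumps(manifest, sort_keys=True).encode()).hexdigest()
        return manifest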

In my tooling, I leaned heavily on package providers like Debian as "owning" the upstream software dependency graph, since this was a problem they'd already solved, and Debian in particular was already serious about reproducibility in their packages.

In the end, it didn't go anywhere. There were a LOT of hacks to make it work since the existing software wasn't designed to allow this kind of integration. For example, the dependency resolution step required splicing in a lot of internal code from package managers, and the docker container format was (and probably still is) a mess that didn't allow the end products to be properly identified as reproducible without breaking other things.

Plus, this is a problem that only people trying to do security at scale even care about. We needed a sea-change of industry thought around verifiability before my solution would seem at all valuable to people outside a few huge tech companies.

tylerl commented on Pyflow – Visual and modular block programming in Python   github.com/Bycelium/PyFlo... · Posted by u/amai
tylerl · 4 years ago
Does anyone know of a more generalized framework for doing this kind of thing? I'd been meaning to write a framework kind of like this for some time, but never got around to it, and was hoping someone else would. This one unfortunately doesn't really check the important boxes, but it's a good start. I was hoping more for:

* Target language agnostic (this one seems to get mostly there) -- the nodes communicate data/logic flow, you then serialize/deserialize accordingly.
* Focus on data flow, not just execution -- IO from nodes, visual indicators of data types (colors or something)
* Capable of visually encapsulating complexity -- define flows within flows
* Ideally embeddable in web apps (e.g. a browser/electron frontend or something)
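
To sketch what I mean, the heart of such a framework is a fairly small, language-agnostic data model; something like this hypothetical Python version, where typed ports drive both validation and the visual indicators (none of this is Pyflow's actual API):

    # Hypothetical node-graph data model: typed ports, nodes, and a
    # graph of connections that can itself be wrapped up as a node
    # (flows within flows). Purely illustrative.
    from dataclasses import dataclass, field

    @dataclass
    class Port:
        name: str
        dtype: str          # e.g. "int", "image"; drives port/edge colors

    @dataclass
    class Node:
        name: str
        inputs: list[Port] = field(default_factory=list)
        outputs: list[Port] = field(default_factory=list)

    @dataclass
    class Graph:
        nodes: dict[str, Node] = field(default_factory=dict)
        # edge: ((src_node, src_port), (dst_node, dst_port))
        edges: list = field(default_factory=list)

        def connect(self, src, src_port, dst, dst_port):
            a = next(p for p in self.nodes[src].outputs if p.name == src_port)
            b = next(p for p in self.nodes[dst].inputs if p.name == dst_port)
            if a.dtype != b.dtype:
                raise TypeError(f"{a.dtype} output cannot feed {b.dtype} input")
            self.edges.append(((src, src_port), (dst, dst_port)))

        def as_node(self, name):
            # Encapsulate the whole graph as one reusable node: any
            # unconnected ports become the subgraph's interface.
            used_in = {(d, p) for _, (d, p) in self.edges}
            used_out = {(s, p) for (s, p), _ in self.edges}
            ins = [p for n in self.nodes.values() for p in n.inputs
                   if (n.name, p.name) not in used_in]
            outs = [p for n in self.nodes.values() for p in n.outputs
                    if (n.name, p.name) not in used_out]
            return Node(name, ins, outs)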

These are pretty popular to embed in complex "design" oriented applications, especially ones where non-programmers (e.g. artists, data scientists) craft procedures. Examples that come to mind include Blender, Unity, and Unreal.

A core part of the design of every successful implementation is that it lets people who don't think they understand code craft efficient code. Making it visual helps engage the brains of certain kinds of people. The "code as text" paradigm is spatially efficient, but it's like a brick wall for some people.

tylerl commented on About Google's approach to research publication – Jeff Dean   docs.google.com/document/... · Posted by u/yigitdemirag
theobeers · 5 years ago
And the examples of issues flagged in review that Jeff keeps highlighting—like Timnit’s alleged failure to mention recent work to reduce the environmental impact of large models—are themselves a bit worrisome. Jeff gives the impression that they demanded retraction (!) because they wanted Timnit and her coauthors to soften their critique. The more I read about this, the worse it looks.
tylerl · 5 years ago
The communication doesn't give that impression; instead it says that the paper makes claims that ignore significant and credible challenges to those claims. Dean said that these factors would need to be addressed, not agreed with.

Publishing a transparently one-sided paper in Google's name would be a problem, not because of the side it picks, but because it suggests the researchers are too ideologically motivated to see the problem clearly.

Ironically, it indicates systemic bias on the part of the researchers who are explicitly trying to eliminate systemic bias. That's just a bit too relevant to ignore.

tylerl commented on Apple's M1 SoC Shreds GeForce GTX 1050 Ti in New Graphics Benchmark   tomshardware.com/news/app... · Posted by u/jbergstroem
tylerl · 5 years ago
Apple's new chip "shreds" a 2016-era GPU?

Wow.

tylerl commented on More than 1/3 of all access to Google is now over IPv6   google.com/intl/en/ipv6/s... · Posted by u/AndrewDucker
dijit · 5 years ago
And yet, when I beg my google cloud rep for IPv6 addresses on instances (or on anything that isn’t the load balancer) I get told that it is not on the immediate roadmap.

The cloud providers have pushed back IPv6 adoption so hard, IMO; at least native IPv6 access.

I know they’ve thrown in some token support and you /can/ make something work; but compared to VPS providers, which consistently deliver machines with IPv6 addresses by default, it’s a huge barrier to adoption. You have to really /want/ it, and most people don’t see the value, unless it’s a backend for an iOS app.

It really frustrates me.

tylerl · 5 years ago
IIRC, all of GCP's IPv6 support is complicated by the fact that they adopted IPv6 from the get-go for internal routing and layered the user-visible virtual address space on top of it, embedding the user-visible addresses inside the invisible "actual" VM addresses. That layering strategy allows for something super amazing or fast or something. Something like that.

So then you ask the engineers, "when are you going to adopt IPv6?" And they're like: "What do you mean? We've never NOT used IPv6 for everything important."

On the one hand, my GCP server's "native" IP address that the OS sees is always an IPv4 address. On the other hand, it's always in the 10.x.x.x/8 range. Everything else is NAT and LB.
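
For illustration, the general mechanics of carrying one address family inside another look something like this; this is RFC 6052-style NAT64 embedding, not GCP's actual internal scheme, which I don't know:

    # Illustrative only: embed an IPv4 address in the low 32 bits of
    # an IPv6 prefix, RFC 6052-style. GCP's real scheme isn't public.
    import ipaddress

    PREFIX = ipaddress.IPv6Network("64:ff9b::/96")   # well-known NAT64 prefix

    def embed(v4):
        return ipaddress.IPv6Address(
            int(PREFIX.network_address) | int(ipaddress.IPv4Address(v4)))

    def extract(v6):
        return ipaddress.IPv4Address(int(v6) & 0xFFFFFFFF)

    addr = embed("10.128.0.2")
    print(addr)           # 64:ff9b::a80:2
    print(extract(addr))  # 10.128.0.2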

tylerl commented on Alarmed as Covid patients' blood thickened, New York doctors try new treatments   reuters.com/article/us-he... · Posted by u/adventured
tylerl · 5 years ago
My brother is an ER doc in a well-known facility, and he says this covid thing is freaking the everliving shit out of the front-line medical profession. This virus is just not behaving like a normal disease should.

The doctors who have been around long enough say that the feeling in the hospitals is just like the early days of AIDS. All you knew was that patients were dying from a disease that didn't follow any of the normal rules, and nobody was sure why, and all the healthcare workers were nervous AF that they were going to get it too, but everyone was trying to be brave because the patients and their families were scared out of their minds, and calm needs to start somewhere, right?

tylerl commented on Our data centers now work harder when the sun shines and wind blows   blog.google/inside-google... · Posted by u/martincollignon
mtrovo · 5 years ago
I think this is much more useful to a cloud provider than to a customer.

As a customer, I think you could configure some things like you said using spot instances on AWS, but that’s it: you’re going to save a small amount of money in a year, and if you account for the engineering hours needed to set this up, it’s maybe not really worth it.

As a cloud provider, you could juggle your clients between datacenters depending on the load and the price of energy there. A flat rate for a cloud region means that there’s an opportunity for arbitrage between datacenters that could mean thousands more in profit on their side.

tylerl · 5 years ago
Thing is, Google requires an absolutely stupid amount of computing resources for running their core business. YouTube transcoding is a great example and a big one for sure, but I bet they have even bigger ones in there somewhere. I have no real data to base this on (and I'm sure nobody does), but I'd bet 5:1 odds that if Google were an AWS customer, they'd be bigger than all the others combined.

So in that case, optimizing for a single customer makes perfect sense if it's the right customer.
