You have to stop the leak into side channels in the first place; it's simply not practical to prevent secrets from escaping once they're already in a side channel. This is, unfortunately, the much harder problem, with much worse performance implications (and indeed the reason why Spectre v1 is still almost entirely unmitigated).
ldr x2, [x2]     /* if this load misses, the branch below stays unresolved for a long time */
cbnz x2, skip    /* predicted not taken, so the code below runs ahead speculatively */
/* bunch of slow operations */
/* probe loads: these only get launched if the speculation window is long, i.e. x2 missed */
ldr x1, [x1]
add x1, x1, CACHE_STRIDE
ldr x1, [x1]
add x1, x1, CACHE_STRIDE
ldr x1, [x1]
add x1, x1, CACHE_STRIDE
ldr x1, [x1]
add x1, x1, CACHE_STRIDE
skip:
Here, if the branch condition is predicted not taken and ldr x2 misses in the cache, the CPU will speculatively execute long enough to launch the four other loads. If x2 is in the cache, the branch condition will resolve before we execute the loads. This gives us a 4x signal amplification using absolutely no external timing, just exploiting the fact that misses lead to longer speculative windows.
After repeating this procedure enough times and amplifying your signal, you can then directly measure how long it takes to load all of these amplified lines (no mispredicted branches required!). Simply start the clock, load each line one by one in a for loop, and then stop the clock.
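For concreteness, here's a minimal sketch of that final measurement step in C, assuming the amplified probe lines sit CACHE_STRIDE bytes apart in a probe buffer. The names (probe, NUM_LINES, now_ns) and the clock_gettime-based timer are mine, not part of the gadget above:

  #include <stdint.h>
  #include <stddef.h>
  #include <time.h>

  #define CACHE_STRIDE 4096
  #define NUM_LINES    4

  static uint64_t now_ns(void) {
      struct timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
  }

  /* Time how long it takes to touch every amplified line: a large value means
     the lines were not cached, i.e. the speculative loads never ran. */
  static uint64_t probe_all(volatile uint8_t *probe) {
      uint64_t start = now_ns();              /* start the clock           */
      for (size_t i = 0; i < NUM_LINES; i++)  /* load each line one by one */
          (void)probe[i * CACHE_STRIDE];
      return now_ns() - start;                /* stop the clock            */
  }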
As I mentioned earlier, unless your plan is to treat every hit as a miss to DRAM, you can't hide this information.
The current sentiment around Spectre mitigations is that once information has leaked into side channels, you can't do anything to stop attackers from extracting it. There are simply too many ways to expose uarch state (and caches are not the only side channel!). Instead, your best and only bet is to prevent important information from leaking in the first place.
This is not to mention the fact that you can use transient execution itself (without any side channels) to amplify a single cache line being present/not present into >100ms of latency difference. Unless your plan is to burn 100ms of compute time to hide such an issue (nobody is going to buy your core in that case), you can't solve this problem like this.
You are not reading it correctly. It is not code as everyone knows it. It's like an electrical circuit with variable names attached to each conductor, and the code propagates information like electricity would.
There are tools dedicated to this that can draw pictures of such code circuits (e.g. Simulink, ASCET), and those pictures can be automatically translated into C code that looks even worse than anything translated by hand.
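A hypothetical sketch of the flavor of C such generators tend to emit (all names here are invented for illustration): every wire in the diagram becomes a variable, and a step function propagates values block by block once per cycle.

  /* Invented example: one "wire" variable per block output. */
  typedef struct {
      double Sum1_out;         /* wire leaving the Sum1 block        */
      double Gain2_out;        /* wire leaving the Gain2 block       */
      double Saturation1_out;  /* wire leaving the Saturation1 block */
  } Signals;

  static Signals sig;

  /* Called once per cycle: push the input through the "circuit". */
  void model_step(double sensor_in) {
      sig.Sum1_out        = sensor_in + 0.5;                /* Sum1        */
      sig.Gain2_out       = sig.Sum1_out * 3.7;             /* Gain2       */
      sig.Saturation1_out = sig.Gain2_out > 100.0 ? 100.0   /* Saturation1 */
                                                  : sig.Gain2_out;
  }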
In the end, the tests of course prove that the code behaves exactly as the picture of the circuit shows, and therefore the car must work correctly! This spares anyone working only on the code from having to understand the car.
In reality, things usually work in the end only because everything is kept so simple and because of the sheer number of iterations.
This only appears true because we've made our entire world so safe that we can call one fatality every 100 million miles dangerous. Given everything we use cars for, their immense utility, and the fact that they're driven almost exclusively by amateurs ... cars are remarkably safe.
And if you could find a way to reliably remove the 1% that cause most of the problems, it would be even safer.
https://www.reddit.com/r/RISCV/comments/z6xzu0/multi_core_im...
When I got my ECE degree in 1999, I was so excited to start an open source project for at least a 256+ core (MIPS?) processor in VHDL on an FPGA to compete with GPUs so I could mess with stuff like genetic algorithms. I felt at the time that too much emphasis was being placed on manual layout, when even then, tools like Mentor Graphics, Cadence and Synopsys could synthesize layouts that were 80% as dense as what humans could come up with (sorry if I'm mixing terms, I'm rusty).
Unfortunately the Dot Bomb, 9/11 and outsourcing pretty much gutted R&D and I felt discouraged from working on such things. But supply chain issues and GPU price hikes for crypto have revealed that it's maybe not wise to rely on the status quo anymore. Here's a figure that shows just how far behind CPUs have fallen since Dennard scaling ended when smartphones arrived in 2007 and cost/power became the priority over performance:
https://www.researchgate.net/figure/The-Dennard-scaling-fail...
FPGA performance on embarrassingly parallel tasks scales linearly with the number of transistors, so it more closely approaches the top line in that figure.
I did a quick search and found these intros:
https://www.youtube.com/watch?v=gJno9TloDj8
https://www.hackster.io/pablotrujillojuan/creating-a-risc-v-...
https://en.wikipedia.org/wiki/Field-programmable_gate_array
https://www.napatech.com/road-to-fpga-reconfigurable-computi...
Looks like the timeline for the largest FPGAs went:
2000: 100-500 million transistors
2010: 3-5 billion transistors
2020: 50-100 billion transistors
https://www.umc.com/en/News/press_release/Content/technology...
https://www.design-reuse.com/news/27611/xilinx-virtex-7-2000...
https://www.hpcwire.com/off-the-wire/xilinx-announces-genera...
I did a quick search on Digi-Key, and it looks like FPGAs are overpriced by a factor of about 10-100, with prices as high as $10,000. Since most of the patents have probably expired by now, that would be a huge opportunity for someone like Micron to use Inflation Reduction Act money and introduce a 100+ billion transistor, 1 GHz FPGA at a similar price to something like an Intel i9, say $500 or less.
Looks like about 75 transistors per gate, so I'm mainly interested in how many transistors it takes to make a 32- or 64-bit ALU, and how many for SRAM or DRAM. I'm envisioning an 8x8 array of RISC-V cores, each with perhaps 64 MB of memory for 4 GB total. That would compete with Apple's M1, but with no special heterogeneous computing hardware, so we could get back to generic multicore desktop programming and not have to deal with proprietary GPU drivers and the function-coloring problems around CPU code vs shaders.
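A rough back-of-envelope for that transistor budget, using assumed (not sourced) rules of thumb: ~30 transistors per full-adder bit for the adder core of an ALU, 6 transistors per SRAM bit, and 1 transistor (plus a capacitor) per DRAM bit:

  #include <stdio.h>
  #include <stdint.h>

  int main(void) {
      const uint64_t cores    = 8 * 8;                    /* 8x8 array of cores      */
      const uint64_t mem_bits = 64ull * 1024 * 1024 * 8;  /* 64 MB per core, in bits */

      const uint64_t adder64  = 64 * 30;       /* ~30T per full-adder bit (adder only) */
      const uint64_t sram     = mem_bits * 6;  /* 6T SRAM cell                         */
      const uint64_t dram     = mem_bits * 1;  /* 1T1C DRAM cell                       */

      printf("64-bit adder   : ~%llu transistors\n", (unsigned long long)adder64);
      printf("64 MB as SRAM  : ~%.1f billion transistors per core\n", sram / 1e9);
      printf("64 MB as DRAM  : ~%.1f billion transistors per core\n", dram / 1e9);
      printf("64 cores, SRAM : ~%.0f billion transistors\n", cores * sram / 1e9);
      printf("64 cores, DRAM : ~%.0f billion transistors\n", cores * dram / 1e9);
      return 0;
  }

Under those assumptions the on-chip memory dominates the budget: 6T SRAM for 64 MB per core would blow well past a 100-billion-transistor die, while DRAM-style cells would fit with room to spare (at the cost of needing a DRAM-capable process).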
But, for what it's worth, there do seem to be some practical reasons why your idea of a hugely parallel computer would not meaningfully rival the M1 (or any other modern processor). The issue everyone has struggled with for decades now is that lots of tasks are simply very difficult to parallelize. Hardware people would love to just give software N times more cores and have it go N times faster, but that's not how it works. The most famous formulation of this is Amdahl's Law [2]. So, for most programs people use today, 1024 tiny, slow cores may very well be significantly worse than the eight fast, wide cores you can get on an M1.
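To put a number on that: Amdahl's Law says speedup = 1 / ((1 - p) + p/n), where p is the fraction of the work that parallelizes and n is the core count. A quick sketch (the 95% parallel fraction is just an illustrative assumption):

  #include <stdio.h>

  /* Amdahl's Law: p = parallel fraction, n = number of cores. */
  static double amdahl(double p, double n) { return 1.0 / ((1.0 - p) + p / n); }

  int main(void) {
      double p = 0.95;  /* assume 95% of the work parallelizes */
      printf("8 cores    : %.1fx\n", amdahl(p, 8));     /* ~5.9x  */
      printf("64 cores   : %.1fx\n", amdahl(p, 64));    /* ~15.4x */
      printf("1024 cores : %.1fx\n", amdahl(p, 1024));  /* ~19.6x */
      return 0;
  }

Even with only 5% serial work, 1024 cores top out under a 20x speedup, which is why a few fast, wide cores often win for everyday software.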
[1] https://chipyard.readthedocs.io/en/stable/Chipyard-Basics/in...