pizlonator · 5 years ago
The lack of condition codes is a big deal for anyone relying on overflow checked arithmetic, like modern safe languages that do this for all integer math by default, or dynamic languages where it’s necessary for the JIT to speculate that the dynamic “number” type (which in those languages is either like a double or like a bigint semantically) is being used as an integer.

RISC-V means three instructions instead of two in the best case. It requires five or more instead of two in bad cases. That’s extremely annoying since these code sequences will be emitted frequently if that’s how all math in the language works.

Also worth noting, since this always comes up, that these things are super hard for a compiler to optimize away. JSC tries very aggressively but only succeeds a minority of the time (we have a backwards abstract interpreter based on how values are used, a forward interpreter that uses a simplified octagon domain to prove integer ranges, and a bunch of other things - and it’s not enough). So, even with very aggressive compilers you will often emit sequences to check overflow. It’s ideal if this is just a branch on a flag because this maximizes density. Density is especially important in JITs; more density means not just better perf but better memory usage since JITed instructions use dirty memory.
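To make the shape of these checks concrete, here is a rough C sketch of what a safe language's codegen does around every add. This is illustrative, not JSC's actual code: the function name and the -1 slow-path sentinel are made up, and it uses the GCC/Clang `__builtin_add_overflow` builtin.

```c
#include <stdint.h>

// Roughly what a safe language emits for every add: compute the sum and
// take a slow path if it overflowed. On x86 this lowers to `add; jo`,
// on AArch64 to `adds; b.vs`; on RISC-V the compiler must synthesize
// the check from extra ALU instructions.
int64_t checked_add(int32_t a, int32_t b) {
    int32_t sum;
    if (__builtin_add_overflow(a, b, &sum))   // GCC/Clang builtin
        return -1;                            // slow path (e.g. box into a bigint)
    return sum;                               // fast path: result fits in int32
}
```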

rwmj · 5 years ago
The idea is that high end processors will recognise these sequences of instructions and optimize them (something called macro-op fusion). Whether this is a good idea is an open question because we don't yet have such high performance RISC-V chips, but that's what the RISC-V designers are thinking. At the same time it permits very simple implementations which wouldn't be possible if the base instruction set contained every pet instruction that someone thought was a good idea.

Note macro op fusion is widely used for other architectures already, particularly ones like x86 where what the processor actually runs looks nothing like the machine code.

pizlonator · 5 years ago
Two words: instruction density.

It doesn’t matter if they’re fused or not if the reduced instruction density increases memory usage and puts more pressure on I$.

Also, I don’t buy the whole fusion argument on the grounds that having to fuse super complex (5 instruction or more) sequences adds enough complexity that you’ve got opportunity cost. Much better for everyone if the CPU doesn’t have to do that fusion. That’s the whole point of good ISA design - to prevent the need for fusing in cases where you’re doing something super common.

CalChris · 5 years ago
I don't think the instruction sequence from the article would qualify for macro-op fusion. Berkeley looked at this for the simplest case of LEA [1]:

  // &(array[offset])
  slli rd, rs1, {1,2,3}
  add  rd, rd,  rs2
The sequence in the article uses what Intel calls the fast case but it still wouldn't qualify for Berkeley's two-instruction fusion. Dunno if anyone does three-instruction fusion.

As an aside, LEA is never getting added to the base RISC-V nor should it be. But I'm surprised it isn't considered for an extension.

[1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-...

IshKebab · 5 years ago
> The idea is that high end processors will recognise these sequences

I'm not sure that is the idea, given that RISC-V is targeted at processors so low-end that they don't even implement multiply.

spacenick88 · 5 years ago
Instead of fusing them, shouldn't it be possible to speculate that it will not overflow, process the check on a separate slow path, and do a rollback in case it did overflow?
8note · 5 years ago
I'm not super familiar with compiler and processor design, but why do I want the processor to do this optimization instead of the compiler?
brandmeyer · 5 years ago
> Also worth noting, since this always comes up, that these things are super hard for a compiler to optimize away. JSC tries very aggressively but only succeeds a minority of the time (we have a backwards abstract interpreter based on how values are used, a forward interpreter that uses a simplified octagon domain to prove integer ranges,

RISC-V has some closely-related sharp corners in indexed address arithmetic as well. Some choices for the type of the index variable perform much worse on rv64.

Consider: an LP64 machine uses 32-bit integers for 'int' and 'unsigned', but 64-bit integers for `long`, `size_t`, `ptrdiff_t` and so on.

If you use an array index variable of type `unsigned`, then the compiler must prove that wraparound doesn't happen. That's pretty weird considering that half the point of using unsigned is to elide such proofs of correctness. If it cannot prove the absence of unsigned wraparound, then it will be forced to emit zero-extension sequences prior to using the index variable to generate the addresses.

ARMv8 side-steps the whole problem by providing indexed memory addressing modes that include the complete suite of zero and sign extension of a narrow-width index in the load or store instruction itself.

So here we have an example of a three-way system engineering choice.

  - Provide a small amount of hardware that performs the operation on-demand.
  - Provide new and inventive forms of value-range analysis in the compiler.  Despite decades of research into this problem, the world's best solutions still frequently saturate at "the entire width of the type the programmer requested".
  - Change the habits of the world's C programmers.
RISC-V chose options 2 and 3.
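A concrete illustration of the burden option 2 puts on the compiler (hypothetical function names; the codegen claim is about rv64 compilers in general, not any specific one): with a 32-bit `unsigned` index whose update the compiler can't bound, wraparound at 2^32 is possible, so the index may need a zero-extension before every address computation; the `size_t` version is already pointer-width.

```c
#include <stddef.h>

// With an `unsigned` (32-bit on LP64) index and an unknown step, i can
// legitimately wrap at 2^32, so on rv64 the compiler must zero-extend i
// before each address computation.
long sum_u(const long *a, unsigned n, unsigned step) {
    long s = 0;
    for (unsigned i = 0; i < n; i += step)  // wraparound is possible
        s += a[i];
    return s;
}

// With a size_t index the arithmetic is already full register width, so
// no extension is needed to form the address.
long sum_z(const long *a, size_t n, size_t step) {
    long s = 0;
    for (size_t i = 0; i < n; i += step)
        s += a[i];
    return s;
}
```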

saagarjha · 5 years ago
> If you use an array index variable of type `unsigned`

This is usually why your array indexing should be done with an iterator or size_t :)

fulafel · 5 years ago
Nitpick: s/LP64 machine/LP64 C implementation/

But isn't size_t (or ptrdiff_t) the preferred indexing type in C for this reason (among others)? Sometimes you of course do want wrap around modulo semantics but that's much rarer, right?

_chris_ · 5 years ago
The problem is that providing extra bits for "sign-extension mode" and "read 32b or 64b" blows through the opcode space very quickly.
zozbot234 · 5 years ago
The RISC-V spec includes recommended code sequences to check for overflow, so that the hardware can potentially use insn fusion as an optimization. The "bad" cases you mention can be a bit clunky, but they should also be rare.
pizlonator · 5 years ago
I’m aware of those sequences and it’s a myth that they will be rare. For dynamic languages they will be super common.
jhallenworld · 5 years ago
Maybe overflow checking could be included as an ISA extension. If it is, what is the least impactful design?

Overflow is part of the result, so maybe add extra bits to each register that can be an arithmetic destination. These bits would not be included in moves, but could be tested with new instructions.

Another way that avoids flags is new arithmetic instructions: add-but-jump-on-overflow. Maybe this reduces to add-and-skip-next-instruction-except-on-overflow, but things might be simpler if the only allowed next instruction is a jump, so the pair effectively becomes a single longer instruction.

jhallenworld · 5 years ago
After thinking about this some more: I think the extension instruction should work like "slt" (set on less than). So we have "sov"- set if add would overflow:

    add t2, t1, t0
    sov t3, t1, t0
    bnez t3, overflow
Why this way? "extra bits on destination registers"- this is really flags. The flags have to be preserved during interrupts, so extending the registers is not so easy (I think it just reduces to classic flags).

"add but jump on overflow" or "add and skip on no overflow"- I don't like this because you can not break it into separate operations without flags. I think you might have to add hidden flags in a real implementation.

An add followed by an sov could be fused, but that requires an expensive multi-register write. Fusion might be more likely if the sov result always goes to a fixed destination register:

    add t2, t1, t0
    sov tflags, t1, t0
    bnez tflags, overflow
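For what it's worth, the proposed `sov` semantics can be modeled in C without any flags (a sketch; `sov32` is a made-up name, and the wrapping add is done in unsigned arithmetic to stay defined in C):

```c
#include <stdint.h>

// Model of the proposed `sov`: decide, without flags, whether the
// signed 32-bit add of a and b would overflow.
int sov32(int32_t a, int32_t b) {
    // Wrapping add, done in unsigned arithmetic to avoid signed-overflow UB.
    int32_t sum = (int32_t)((uint32_t)a + (uint32_t)b);
    // Signed overflow iff the operands share a sign and the wrapped
    // sum's sign differs from theirs.
    return ((a < 0) == (b < 0)) && ((sum < 0) != (a < 0));
}
```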

pizlonator · 5 years ago
The best design is condition codes, which RISC-V doesn’t have but x86 and ARM do have.

The best design is part of the core ISA and not an extension, since overflow checking is fundamental to modern languages.

tom_mellior · 5 years ago
> JSC [...] a forward interpreter that uses a simplified octagon domain to prove integer ranges

Off-topic, but could you point me to more details on this? Someone (else?) recently mentioned octagon analysis in JSC in a HN thread. I grepped through the sources at the time but didn't find any indication that it exists. At least not under the name "octagon".

pizlonator · 5 years ago
I didn’t call it octagon when I wrote it because I didn’t know that I reinvented a shitty/awesome version (shitty because it misses optimization opportunities, awesome because by missing them it converges super fast while still nailing the hard bits).

Look at DFGIntegerRangeOptimizationPhase.cpp

spacenick88 · 5 years ago
I wonder how this interacts with branch prediction. Since overflows should happen very rarely I guess the branch on overflow should almost always predict as non taken. So wouldn't it be possible to have a "branch if add would overflow" instruction or even canonical sequence that a higher end CPU can completely speculate around and just use speculation rollback if it overflows?

I think an important design point here is that the languages that need a lot of dynamic overflow checks are primarily used on beefier CPUs so if you can get around the code size issue, making it performant only on more capable designs is fine since the overflow check will be rare on simpler CPUs.

pizlonator · 5 years ago
I don’t think that beefier cpu and overflow checks are that related. I mean, you’re right, I just want to place some limits on how right you are.

1. Folks totally run JS and other crazy on small CPUs.

2. Other safe languages (rust and swift I think?) also use overflow checks. It’s probably a good thing if those languages get used more on small cpus.

3. The C code that normally runs on small cpus is hella vulnerable today and probably for a long time to come. Compiling with sanitizer flags that turn on overflow checks is a valuable (and oft requested) mitigation. So there’s a future where most arithmetic is checked on all cpus and with all languages.

And yeah, it’s true that the overflow check is well predicted. And yeah, it’s true that what arm and x86 do here isn’t the best thing ever, just better than risc-v.

brandmeyer · 5 years ago
The current world record holder (in the published literature) for branch prediction is TAGE and its derivatives. The G stands for Geometric. It is composed of a family of global predictors that increase in length with a geometric progression. That's somewhat relieving since it means that the storage growth is not unlike that of mipmapping in computer graphics. A small constant k times maximum history length N.

But to a first approximation, if you double the density of conditional branches in the program, then you will need to roughly double the size of the branch prediction tables to get the same performance, even if all of them are correctly predicted 100% of the time.

avianes · 5 years ago
The RISC-V spec is not yet finished.

Currently only the most basic extensions are available. But nothing prevents RISC-V from introducing, in the future, an extension that adds condition codes, or an extension for integer/float overflow checking.

bertr4nd · 5 years ago
I’d be curious to see the instruction sequences for handling overflow without condition codes. I’m not even sure I see how to do it as efficiently as 3 or 5 instructions :-/
pizlonator · 5 years ago
One example of three is branching on 32-bit add overflow on a 64-bit cpu, where you do a 32-bit add, a 64-bit add, and compare/branch on the two results.
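That three-instruction trick can be modeled in C (hypothetical helper name): the 32-bit wrapping add plays the role of `addw`, the widened add plays the 64-bit `add`, and a mismatch means the 32-bit add overflowed.

```c
#include <stdint.h>

// 32-bit add overflow check via a parallel 64-bit add: if the
// sign-extended 32-bit sum disagrees with the exact 64-bit sum, the
// 32-bit add overflowed.
int add32_overflowed(int32_t a, int32_t b) {
    int32_t w = (int32_t)((uint32_t)a + (uint32_t)b); // 32-bit wrapping add (addw)
    int64_t d = (int64_t)a + (int64_t)b;              // exact 64-bit add (add)
    return (int64_t)w != d;                           // mismatch => overflow
}
```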

simias · 5 years ago
I enjoyed reading this a lot, I keep seeing RISC-V being touted as a potential replacement for ARM but I had yet to read a good critique of the ISA by people who know what they're talking about.

This point I didn't quite understand:

>Highly unconstrained extensibility. While this is a goal of RISC-V, it is also a recipe for a fragmented, incompatible ecosystem and will have to be managed with extreme care.

Most successful ISAs (including ARM) have their share of extensions, coprocessors, optional opcodes etc... ARM has the various Thumb encodings, Jazelle, VFP, NEON and more. Toolchains and embedded developers are used to dealing with optional features of computers, I'm not sure why RISC-V would fare worse here.

Beyond that I notice that many of the ascribed weaknesses are shared with other RISC ISAs like MIPS (but not ARM):

- No condition codes

- Less powerful, simpler instructions, so more of them are needed to do the same thing, but each can potentially run faster.

- No MOV instruction

- The "unconstrained extensibility" is arguably a thing on MIPS too, with the four coprocessors that can be used to implement all sorts of custom logic.

Of course ARM has been more successful than MIPS, so maybe that's a sign those things are indeed bad ideas, but given that this comes from an ARM dev I wonder if part of it is not just "that's not how ARM does it".

On the other hand I must say that I was surprised that RISC-V made multiplication optional; in this day and age it seems like such a useful instruction that it's well worth the die area. Optional DIV I can understand, but an ISA without MUL? That's rough, even for small microcontroller-type scenarios.

tom_mellior · 5 years ago
> ARM has the various Thumb encodings, Jazelle, VFP, NEON and more.

Having done just a tiny bit of compiler development for ARM, I can assure you that having all of these variants is a pain. Making compiler writers' lives harder means you're less likely to get optimal performance. At least on the more exotic variants, but possibly even on the most common ones.

simias · 5 years ago
>Having done just a tiny bit of compiler development for ARM, I can assure you that having all of these variants is a pain. Making compiler writers' lives harder means you're less likely to get optimal performance. At least on the more exotic variants, but possibly even on the most common ones.

I can empathize, but isn't that just part of the job of making a compiler? Any successful, long-lived ISA is going to have extensions and revisions that will need to be handled in the toolchain. I guess my point is not so much that it isn't painful, it's more that I don't really see what makes RISC-V really different besides the fact that it's a younger ISA and therefore we don't already know for sure which extensions are going to become de-facto standard and which ones will be less common.

>I believe the author doesn't identify as a "guy".

Arg, of course the one time I don't use gender-neutral language I manage to mess it up. Edited, thanks.

pizlonator · 5 years ago
As a compiler pro, I view availability of better instructions to select as an asset rather than as a liability. Sure it’s more work for me and my team but if it makes shit fast then who cares how much work it was.

One of the best lessons I got when I was being inducted into the compiler club was: compilers are hard. It’s a hard job so other people can have easier jobs. It’s ok if compilers turn complex and managing that complexity is just something you have to learn to do. I don’t think it’s true that the need for that complexity leads to lower perf.

blueflow · 5 years ago
> I believe the author doesn't identify as a "guy".

I don't think this was meant as assumption about the authors gender. The same way i wouldn't assume that there is physical, actual pain involved when you said "having all of these variants is a pain" even when you literally wrote it.

floatboth · 5 years ago
All this Thumb etc. stuff is not relevant to the 64-bit world though. AArch64 is the least fragmented of the big ISAs, with a solid list of base functionality — e.g. NEON is guaranteed to exist on everything.
notanotherycom1 · 5 years ago
I don't understand why it's a pain. Are you saying it makes choosing an encoding/extension more difficult? I would only see this as a pain if you had to mix and match these extensions, and one isn't guaranteed to exist when another does, making you write several versions for every combination.

But I kind of assumed most ops perform well enough, and that important optimizations have test cases and will get done.

leeter · 5 years ago
Example: the Commodore 128 (or the Plus/4 works too). The C128 had a more powerful CPU capable of 16-bit work, access to more RAM, faster disk access, etc. But in general it was rarely targeted. Why? Because developers were looking for the widest degree of compatibility, and that meant restricting themselves to the minimal subset shared between the C128 and the C64. This meant that by and large the 128KB of RAM went unused, as most applications ran in C64 mode.

The same applies here: the more extensions and things you pile on... the less likely they are to get used unless they are de-facto mandatory. Even today you can see games getting released that won't touch AVX instructions on both Intel and AMD because of compatibility reasons.

Valve provides data to developers on penetration of various ISA extensions via the hardware survey ( https://store.steampowered.com/hwsurvey/Steam-Hardware-Softw... ) but for RISC-V there is no way to do the same. So most utility writers will be highly constrained in what extensions they can expect to use, and that will significantly harm performance. The alternative is recompiling for every single target, which is also likely.

In essence: you need a good compatibility baseline that people can expect to use. It makes moving software between targets easier. A piece of software might, for example, certify on RV64GC but not on RV32IF, because the double-precision emulation might not work as expected, or the lack of carry could be an issue, etc.

simonh · 5 years ago
For embedded applications, which are RISC-V’s bread and butter, the hardware and software are designed hand in glove. E.g. if you are designing a custom RISC-V chip for a media encoding system, you will want extensions for that application, and you make use of those in your development tool chain. The software for your hardware, or at least the performance- and functionality-critical part, is often only targeted at your hardware, not any arbitrary RISC-V system.
devit · 5 years ago
This is just a tooling problem that can be solved trivially by having release builds build multiple binaries by default, for all the major extension profiles.
klelatti · 5 years ago
The point on unconstrained extensibility is, I think, largely about the fact that anyone can add extensions, whereas with ARM the company retains control.

There are advantages to the RISC-V approach, but it is likely to lead to more fragmentation - and worse, it gives a major implementation the ability to add proprietary extensions that are not licensed to anyone else, putting smaller players at a disadvantage and leading to fragmentation not only in the hardware but also in the software ecosystems.

Whilst you may not like ARM having control at least everyone (for a fee) has full access to the ISA and implementations.

Symmetry · 5 years ago
The extensibility that leads to the danger of fragmentation for general purpose computing is a great advantage for embedded computing, where your software or firmware is targeting one particular piece of hardware and doesn't have to be compatible with anyone else. Western Digital is free to put in the instructions they need for their hard drive controllers and NVidia is free to put in the instructions they need to control their video cards, and the incompatibility between them just doesn't matter.
notanotherycom1 · 5 years ago
> I had yet to read a good critique of the ISA by people who know what they're talking about.

I still wonder about RISC-V. To me, it seems pointless. But a lot of companies are buying into it so I'm wrong

Why would you ever want a standard ISA? If you're buying chips you either want a cheap standard one or a powerful efficient one. To be efficient (or cheap) you'd want to only support what's required and what works best with the implementation.

I don't really understand the point of a generic ISA. Why not have some kind of bytecode or standard format (like LLVM IR) that gets optimized for the CPU, producing a native binary that doesn't need interpretation?

Like how the f* is it easier to make something regular+generic fast rather than something custom for your hardware/chip/cpu fast?

Do you want to know how many times I used XML when it's not required? 0. Do you know how many times I used SQLite or my own binary file? I lost count. SQLite has far more constraints than XML, and custom binary files/formats aren't hard after you've done them a few times.

eddyb · 5 years ago
Betting on smart compilers without putting in the effort to build them is how the Itanic happened.
Veedrac · 5 years ago
I've never really been a fan of this take. It seems to rest a lot on the ideas that:

1) Fusion is hard. While it can be hard (fusing x86 will be), I do not see why it would be meaningfully difficult for RISC-V hardware (except on devices so small it's better to have the simpler base ISA anyway), or compilers, who can in the worst case just treat fused pairs as their own instructions.

2) There's anything wrong with just most software assuming a fairly fixed set of extensions, as seems to have happened. If microcontrollers want to use a subset without multipliers, that doesn't mean anyone else has to care. If bitmanip is stabilized before RISC-V breaks into more common consumer use, why not assume it when writing code? It's only a problem if people make it one.

Most of the rest don't matter much in a global sense. The arguments about which operations go in which extensions might have meaningful merit, but it seems not very important to me.

Jasper_ · 5 years ago
Fusion is hard. At least, unconstrained fusion is hard. If you have a couple instructions pairs to fuse together, that's fine, but fusing any possible compiler output together is going to make decode even more complicated in practice.

This is why Intel publishes software optimization guides that go over what their fusions are, but it doesn't seem like the RISC-V spec is going to do that yet for many cases. And compiler authors need to know which instruction stream to generate to ensure fused execution.

Veedrac · 5 years ago
Mostly the concern around the lack of instructions in RISC-V revolves around a few well-known cases (eg. indexed loads) where the instructions to fuse are pretty canonical.

There is always room for creativity, but that would be the same with or without indexed loads in the base instruction set. Any non-monopolistic hardware ecosystem has this problem; we've been able to ignore it largely on x86 since Intel had had a performance monopoly for so long, but once you have multiple competing core implementations compilers will have to worry about the edge-case performance differences.

What I'm talking about is more specific to groups of instructions that are safe to treat as fused by default. Note that even if the compiler outputs a pair of instructions but the hardware running the code doesn't fuse it, out-of-order execution means the penalty will generally be extremely small versus the best unfused instruction schedule.

RISC-V does give guidelines on which instructions are good fusion candidates. See for example section 2.13 in the bitmanip extension document.

Hardware, naturally, just has a fixed set of fusions it does.

pizlonator · 5 years ago
Fusion is not just hard; it’s opportunity cost. Let’s say you have budget to implement the top 10 most important fusions. On other ISAs you’d use that budget on things that aren’t about condition codes or array access.
Veedrac · 5 years ago
Opportunity cost in what sense? If the ISA is simple and regular, then the silicon costs should be miniscule, and the engineering costs not meaningfully larger than if those instructions were separate unfused instructions that had separate encodings instead.
saagarjha · 5 years ago
Some discussion back when it was written in 2019: https://news.ycombinator.com/item?id=20541144
bonzini · 5 years ago
The worst issues, at least for the versions of the ISA that will run a "real" OS, are the lack of conditional move instructions and the lack of bitwise rotation instructions. The lack of shift-and-add instructions, or equivalently of addresses with shifted indexes, is usually mitigated by optimization of induction variables in the compiler. They are nice to have (I have written code where I took advantage of x86's ability to compute a+b*9 with a single instruction) but not particularly common with the massive inlining that is common in C++ or Rust.
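The a+b*9 case decomposes, on an ISA without scaled addressing, into a shift and two adds. A minimal sketch (function name made up; the `*8` is what compilers strength-reduce into the shift):

```c
#include <stdint.h>

// a + b*9: one x86 lea when a is a constant displacement
// (lea dst, [b + b*8 + a]); elsewhere, shift-and-add.
int64_t a_plus_9b(int64_t a, int64_t b) {
    return a + b * 8 + b;   // compilers turn b*8 into b << 3
}
```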

The ugly parts are indeed all ugly, though they have now added hint instructions.

fulafel · 5 years ago
Have previous CPUs gotten decent speedups from the addition of conditional moves? IIRC for some the SPECcpu impact was negligible, and many RISCs don't have it. RISC is about quantifying this kind of thing and skipping marginal additions, after all.
gergo_barany · 5 years ago
> Have decent speedups been gotten by previous CPUs by the addition of conditional moves?

This is not a direct answer to your question, but: I recently had to tune the conditional move generation heuristics in the GraalVM Enterprise Edition compiler. My experience has been that you can absolutely get decent speedups of 10-20% or more with a few well-placed conditional moves. The cases where this matters are rare, but they do occur in some real-world software, where sticking a conditional move in some very hot place will have such an impact on the entire application. Conversely, you can get slowdowns of the same magnitude with badly placed conditional moves.

It's a difficult trade-off, since most branches are fairly predictable, and good branch prediction and speculative execution can very often beat a conditional move.

PeCaN · 5 years ago
I'm not sure about this "RISC way" stuff. From a uarch standpoint the RISC vs CISC distinction is moot and from an ISA standpoint the only real quantifiable difference seems to be being a load-store architecture.

ISAs without conditional moves tend to have predicated instructions, which are functionally the same thing. I'm not actually aware of any traditionally RISC architectures that have neither conditional moves nor predicated instructions. While AArch64 removed predicated instructions as a general feature, ARMv8 gained a few "conditional data processing" instructions (e.g. CSEL is basically cmov), so clearly at least ARM thinks there's a benefit even with modern branch predictors.

Conditional instructions are really, really handy when you need them. It's an escape hatch for when you have an unbiased branch and need to turn control flow into data flow.
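A minimal illustration of that escape hatch (hypothetical function names): the branchless form below is the kind of code compilers lower to cmov on x86 or csel on AArch64 when the branch is unpredictable.

```c
#include <stdint.h>

// Branchy select: compiles to a compare-and-branch.
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b) return a;
    return b;
}

// Branchless select: control flow turned into data flow.
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t mask = -(int32_t)(a > b);   // all-ones if a > b, else zero
    return (a & mask) | (b & ~mask);    // select without a branch
}
```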

pizlonator · 5 years ago
Yes. There are cases where cmov is a killer beast and for example it makes your browser faster.

JSC goes to great efforts to select it in certain cases where it’s a statistically significant overall speed up. I think the place where it’s the most effective for us is converting hole-or-undefined to undefined on array load. Even on x86 where cmov is hella weird (two operands, no immediates) it ends up being a big win.

ncmncm · 5 years ago
You get 2x speedup on Quicksort and all related algorithms using CMOV instructions, so: yes.

https://cantrip.org/sortfast.html

saagarjha · 5 years ago
Yeah, IDK about the RISC ISAs–they seem to be designed around being architecturally simple (and I guess easy to teach?) but they really don't seem to map back to actual code at all, nor do they seem particularly grounded in hardware design either. (Or sometimes they're too close to the hardware and burn themselves…)
d33 · 5 years ago
Could it be because of some patents that made it impossible to do it properly?
vardump · 5 years ago
> I have written code where I took advantage of x86's ability to compute a+b*9 with a single instruction...

Didn't check, but I suspect that decodes into at least two microinstructions.

pbsd · 5 years ago
Not only is it a single uop for the last 10 years of Intel chips, you can also run 2 of them per cycle.
pizlonator · 5 years ago
I’m assuming a is a constant in your example and that you’re doing lea(b, b, 8). That’s one cycle on modern Intels I believe (I think the manual promised this for Nehalem). OP also alludes to this fact when talking about fusion.

varispeed · 5 years ago
Can you not add these instructions? At least when you use FPGA IP, it can be done. But you would have to update the toolchain etc to support these new instructions.
Veedrac · 5 years ago
They have added them; it's just that they're in bitmanip, which isn't finalized, nor is the extension mandatory.
tralarpa · 5 years ago
I thought the scale factor is either 1, 2, 4, or 8?
saagarjha · 5 years ago
You can combine them. For example, [rax+rax*8+1] (base register, register scaled by 8, constant).
dathinab · 5 years ago
Having taken a look at the RISC-V ISA spec, I'm wondering whether they crippled LL/SC (LR/SC in RISC-V).

Basically:

- LL/SC can prevent ABA if the ABA-prone part is in between an LL and SC instruction

- To have an ABA-prone problem you need some state implicitly dependent on the atomic state but not encoded in it. Normally (always?) the atomic state is a pointer and we depend on some state behind the pointer not changing in the context of an ABA situation (roughly: switch out ptr, change ptr target, switch back in ptr, though often more complex, to prevent race conditions).

This means that, in all situations I'm aware of, LL/SC only prevents the ABA problem if you can do at least one atomic (relaxed-ordering) load that "somehow" depends on the LL load (LL loads a pointer, then you load through it at some offset, or similar).

But the RISC-V spec not only doesn't guarantee forward progress in these cases (which I guess is fine), it goes as far as explicitly stating that guaranteed lack of forward progress is OK; e.g. doing any load between the load-reserved and store-conditional is allowed to make the store-conditional fail.

Doesn't that mean that if you target RISC-V you will not benefit from the LL/SC-based ABA workaround, and instead it's just a slightly more flexible and potentially faster compare-exchange which can spuriously fail?

The spec says you are supposed to detect whether it works and potentially switch implementations. But how can you do that reasonably if it means you have to switch to fundamentally different data structures, which isn't something easily done at runtime?

Or am I missing something fundamental?

souprock · 5 years ago
The use of LL/SC for atomics is a common mistake. It makes replay debuggers like rr impossible to implement.
EE84M3i · 5 years ago
saagarjha · 5 years ago
I am surprised that the link you posted works…
any1 · 5 years ago
LL/SC is superior to CAS in that a modification to the memory will be detected even though the value has since been set back to the original value. This avoids the ABA problem.
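That blindness of CAS to history can be demonstrated with C11 atomics (a single-threaded sketch standing in for the thread interleaving; the function name is made up):

```c
#include <stdatomic.h>

// Why CAS alone can't see ABA: the value goes A -> B -> A, and the
// compare-and-swap still succeeds because it compares values, not
// history. An LL/SC pair whose reservation survived the intervening
// stores would fail instead.
int cas_misses_aba(void) {
    atomic_int x = 1;          // A
    atomic_store(&x, 2);       // another thread writes B...
    atomic_store(&x, 1);       // ...then writes A back
    int expected = 1;
    // Succeeds even though x was modified in between:
    return atomic_compare_exchange_strong(&x, &expected, 3);
}
```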
souprock · 5 years ago
No. CAS is superior to LL/SC. There is no possible undetected modification to the memory. That's how atomic operations work. That's the whole point of an atomic operation. It's atomic.

Botching the code can be done with either mechanism. Don't do that.

dathinab · 5 years ago
This only avoids the ABA problem if the ABA-prone code runs in between the LL and SC instructions.

But this is where the problem starts. E.g. on RISC-V, when using LR/SC in a way which prevents ABA you always lose all forward-progress guarantees, and it's totally valid for an implementation to be built in a way which will just never complete in such cases...

saagarjha · 5 years ago
Looks like someone read the Wikipedia link posted in a sibling comment ;)
saagarjha · 5 years ago
Last I heard, adding a hardware counter for failed SCs may help work around this on ARM–presumably RISC-V could do the same thing here?
garaetjjte · 5 years ago
Not necessarily; it needs to allow trapping on failures.
mark_l_watson · 5 years ago
Sorry in advance, this is going to be a little off topic: while I agree with the technical points in the article and the comments here about RISC-V deficiencies, I hope for a world where Free Software licenses and open hardware rule. (And I admit to some hypocrisy since my daily drivers are the Apple closed ecosystem and Google Workplace.)

What gives me some hope is that an open hardware and Free Software world would help so many people, businesses, and governments. I think it would be a rising tide that lifted all boats except for specific tech industries.

That said, good article.

nsajko · 5 years ago
The misconception you present here seems to be widely held, so I'm actually going to upvote you: RISC-V being an open ISA just means that the standards describing the ISA are open; there is no relation to open-source hardware. I.e., proprietary RISC-V cores exist, as do open-source ARM cores.