Readit News
timhh commented on GPT Blind Voting: GPT-5 vs. 4o   gptblindvoting.vercel.app... · Posted by u/findhorn
vbezhenar · 19 days ago
I don't like this test, because on the very first question I was presented with, both answers looked equally good. Actually they were almost the same, just phrased differently, so my choice was essentially random. That means the end score will be polluted by randomness. They should have added options like "both answers are good" and "both answers are bad".
timhh · 19 days ago
I have a lot of experience with pairwise testing so I can explain this.

The reason there isn't an "equal" option is because it's impossible to calibrate. How close do the two options have to be before the average person considers them "equal"? You can't really say.

The other problem is that when two things are very close, providing an "equal" option loses the very slight preference information. One test I did was getting people to say which of two greyscale colours is lighter. With enough comparisons you can easily get the correct ordering even down to 8 bits (i.e. people can distinguish 0x808080 and 0x818181), but they really look the same if you just look at a pair of them (unless they are directly adjacent, which wasn't the case in my test).

The "polluted by randomness" issue isn't a problem with sufficient comparisons because you show the things in a random order so it eventually gets cancelled out. Imagine throwing a very slightly weighted coin; it's mostly random but with enough throws you can see the bias.

...

On the other hand, 16 comparisons isn't very many at all, and I did implement an ad-hoc "they look the same" option for my tests, which actually performed significantly better, even if it isn't quite as mathematically rigorous.

Also player skill ranking systems like Elo or TrueSkill have to deal with draws (in games that allow them), and really most of these ranking algorithms are totally ad-hoc anyway (e.g. why does Bradley-Terry use a sigmoid model?), so it's not really a big deal to add more ad-hocness into your model.
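For illustration, a draw slots naturally into the standard Elo update, because the update just compares the actual score against a sigmoid expected score (a hedged sketch; K = 32 is an arbitrary conventional choice, not anything from the linked test):

```python
def elo_update(r_a, r_b, score_a, k=32):
    """One Elo update. score_a is 1.0 for a win by A, 0.5 for a draw,
    0.0 for a loss; draws need no special casing because the update
    only compares the actual score with the expected score."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    new_a = r_a + k * (score_a - expected_a)
    new_b = r_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b
```

Note that a draw between equally rated players changes nothing, while a draw against a stronger opponent gains points, which is exactly the "slight preference" signal an equal option would otherwise discard.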

timhh commented on The Sail instruction-set semantics specification language   alasdair.github.io/manual... · Posted by u/weinzierl
Cieric · a month ago
I really like the idea of this. I wonder if I can convince my work to use it for our hardware. Are things like SIMD, SIMT, and other weird formats easy to represent in this kind of language? Or should I just assume anything describable in Verilog/HDL can be described in this language?

This also brings up another question, if anyone knows: is there a term for hardware description languages analogous to Turing completeness for programming languages, or is there a different set of common terms?

timhh · a month ago
Yeah you can describe basically any ISA including SIMD. The RISC-V model doesn't support packed SIMD (the P extension) but it does support Vector.
timhh commented on The Sail instruction-set semantics specification language   alasdair.github.io/manual... · Posted by u/weinzierl
Y_Y · a month ago
I see the RISC-V Sail repo mentions compiling to SystemVerilog. That would be amazing, if you could specify instruction semantics and have that transformed all the way into silicon.
timhh · a month ago
It's still kind of experimental. Also the purpose is more for formal verification against a real design. The RISC-V model doesn't have any microarchitectural features you'd need for a real chip - not even pipelining - so it would be very slow.

Still... it is tantalisingly close to a really nice HDL for design purposes. I have considered trying to make a pipelined RISC-V chip in Sail with all the caches, branch predictor etc.

One feature that makes it a little awkward though is that there isn't really anything like a class or a SV module that you can reuse. If you want to have N of anything you pretty much have to copy & paste it N times.

timhh commented on The Sail instruction-set semantics specification language   alasdair.github.io/manual... · Posted by u/weinzierl
timhh · a month ago
I've used this a lot via the RISC-V Sail model: https://github.com/riscv/sail-riscv

It's a really nice language - especially the lightweight dependent types. Basically it has dependent types for integers and bit-vector lengths so you can have some really nice guarantees. E.g. in this example https://github.com/Timmmm/sail_demo/blob/master/src/079_page... we have this function type

  val splitAccessWidths : forall 'w, 0 <= 'w . (xlenbits, int('w)) ->
    {'w0 'w1, 'w0 >= 0 & 'w1 >= 0 & 'w0 + 'w1 == 'w . (int('w0), int('w1))}
This basically means it returns a tuple of two integers, and they must sum to the input integer; the type system knows this. Then when we do this:

  let (width0, width1) = splitAccessWidths(vaddr, width);
  let val0 = mem_read_contiguous(paddr0, width0);
  let val1 = mem_read_contiguous(paddr1, width1);
  val1 @ val0
The type system knows that `length(val0) + length(val1) == width`. When you concatenate them (@ is bit-vector concatenation; wouldn't have been my choice but it's heavily OCaml-inspired), the type system knows `length(val1 @ val0) == width`.

If you make a mistake and do `val1 @ val1` for example you'll get a type error.
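In a language without these types the same invariant can only be checked at runtime. Here is a hypothetical Python analogue (the page-splitting logic is my guess at what such a function does; only the `w0 + w1 == w` contract comes from the Sail type above):

```python
def split_access_widths(vaddr, width, page_size=4096):
    """Split a `width`-byte access at a page boundary into two parts.
    The Sail type proves statically what we can only assert here:
    both parts are non-negative and sum to `width`."""
    first = min(width, page_size - (vaddr % page_size))
    w0, w1 = first, width - first
    assert w0 >= 0 and w1 >= 0 and w0 + w1 == width
    return w0, w1
```

The assertion can only fail at runtime on some particular input, whereas Sail rejects an incorrect implementation at compile time for all inputs.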

A simpler example is https://github.com/Timmmm/sail_demo/blob/master/src/070_fanc...

The type `val count_ones : forall 'n, 'n >= 0. (bits('n)) -> range(0, 'n)` means that it's generic over any length of bit vector and the return type is an integer from 0 to the length of the bit vector.
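As a rough Python analogue, the `range(0, 'n)` guarantee degrades into a runtime assertion (this sketch is mine, not from the Sail demo):

```python
def count_ones(value, n):
    """Population count of an n-bit value. Sail's return type
    range(0, 'n) guarantees statically that the result is in 0..n;
    in Python we can only assert it after the fact."""
    ones = bin(value & ((1 << n) - 1)).count("1")
    assert 0 <= ones <= n
    return ones
```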

I added it to Godbolt (slightly old version though) so you can try it out there.

It's not a general purpose language so it's really only useful for modelling hardware.

timhh commented on Code execution through email: How I used Claude to hack itself   pynt.io/blog/llm-security... · Posted by u/nonvibecoding
jcelerier · a month ago
> These types of vulnerabilities

I don't understand why it's called a vuln. It's, like, the whole point of the system to be able to do this! It's how it's marketed!

timhh · a month ago
Yeah I also don't understand how this is unexpected. You gave Claude the ability to run arbitrary commands. It did that. It might unexpectedly run dangerous commands even if you don't connect it to malicious emails.
timhh commented on Retro gaming YouTuber Once Were Nerd sued and raided by the Italian government   androidauthority.com/once... · Posted by u/BallsInIt
user_7832 · a month ago
> Authorities believe Once Were Nerd's activities may still run afoul of Article 171 in Italy's copyright law, which allows for up to three years imprisonment for violations. (Emphasis mine)

That seems... very excessive? Who's actually being hurt here? No one is buying 20 year old consoles and games that probably aren't even sold by the original company anymore. Seems pretty much like a classic victimless crime IMO.

> Agents accused the creator of promoting pirated copyrighted materials stemming from his coverage of Anbernic handheld game consoles.

Seems hardly something worthy of arresting, let alone jailing someone.

> Italy has a history of heavy-handed copyright enforcement—the country's Internet regulator recently demanded that Google poison DNS to block illegal streams of soccer. So it's not hard to believe investigators would pursue a case against someone who posts videos featuring pirated games on YouTube.

Oh well... didn't realize Italy was like that

timhh · a month ago
That's a maximum. It's extremely unlikely that they would actually get jail time. Maybe a suspended sentence at worst.
timhh commented on Show HN: Improving search ranking with chess Elo scores   zeroentropy.dev/blog/impr... · Posted by u/ghita_
timhh · a month ago
Explanation of Bradley-Terry here: https://stats.stackexchange.com/a/131270/60526

It's such a great and simple algorithm. I feel like it deserves to be more widely known.

I used it at Dyson to evaluate really subjective things like how straight a tress of hair is - pretty much impossible to say if you just look at a photo, but you can ask a bunch of people to compare two photos and say which looks straighter, then you can get an objective ranking.
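A minimal sketch of the idea in Python, using the classic MM (Zermelo) update for the Bradley-Terry likelihood; the three "true" strengths in the simulation are made up for illustration:

```python
import random

def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the MM (Zermelo) update.
    wins[i][j] = number of times item i was preferred over item j."""
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])  # total wins for item i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom if denom else p[i])
        s = sum(new_p)
        p = [x * n / s for x in new_p]  # normalise: strengths are only relative
    return p

# Simulate noisy pairwise judgements of three items with hidden
# "true" strengths, then recover their ordering from comparisons alone.
rng = random.Random(0)
true = [1.0, 2.0, 4.0]
wins = [[0] * 3 for _ in range(3)]
for _ in range(2000):
    i, j = rng.sample(range(3), 2)
    winner, loser = (i, j) if rng.random() < true[i] / (true[i] + true[j]) else (j, i)
    wins[winner][loser] += 1
p = bradley_terry(wins)
```

Even though each individual judgement is noisy, the fitted strengths recover the underlying ordering, which is exactly the trick with subjective comparisons like hair straightness.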

timhh commented on Deno 2.4   deno.com/blog/v2.4... · Posted by u/hackandthink
blinkingled · 2 months ago
Crazy that Deno is still not workable on FreeBSD because of the Rust V8 bindings not being ported.
timhh · 2 months ago
I mean... you can probably see why they don't spend any effort on that.
timhh commented on Building the Rust Compiler with GCC   fractalfir.github.io/gene... · Posted by u/todsacerdoti
saagarjha · 2 months ago
> Normally, debugging the compiler is fairly straightforward: it is more or less a run of the mill executable.

> In the bootstrap process, the entire thing becomes way more complex. You see, rustc is not invoked directly. The bootstrap script calls a wrapper around the compiler.

> Running that wrapped rustc is not easy either: it requires a whole lot of complex environment flags to be set.

> All that is to say: I don’t know how to debug the Rust compiler. I am 99.9 % sure there is an easy way to do this, documented somewhere I did not think to look. After I post this, somebody will tell me "oh, you just need to do X".

> Still, at the time of writing, I did not know how to do this.

> So, can we attach gdb to the running process? Nope, it crashes way too quickly for that.

It's kind of funny how often this problem crops up and the variety of tricks I have in my back pocket to deal with it. Sometimes I patch the script to invoke gdb --args [the original command] instead, but this is only really worthwhile if it's a simple shell script and I can track where stdin/stdout are going. Otherwise I might patch the code to sleep a bit before actually running anything to give me a chance to attach GDB. On some platforms you can get notified of process execs and sometimes even intercept them (e.g. as an EDR solution), and sometimes I will use that to suspend the process before it gets a chance to launch. But I kind of wish there was a better way to do this in general… LLDB has a "wait for launch" flag but it just spins in a loop waiting for new processes and can't catch anything that dies too early.

timhh · 2 months ago
I have a C library (I've also done a Python one in the past) that you load into the executable you want to debug. It activates based on an environment variable, so normally I just permanently link it.

When it is loaded it will automatically talk to VSCode, tell it to start a debugger and attach to the process, and then wait for the debugger to attach.

End result is you just have to run your script with an environment variable set and it will automatically attach a nice GUI debugger to the process no matter how deeply buried in scripts and Makefiles it is.
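A stripped-down Python sketch of the env-var gate (this is my illustration of the idea, not the actual autodebug code; the handshake that notifies the editor is elided):

```python
import os
import sys
import time

def maybe_wait_for_debugger(timeout=None):
    """If AUTODEBUG is set in the environment, announce our PID and block
    until a debugger attaches (here we just spin until `timeout`; a real
    implementation would notify the editor over a socket and wait for the
    actual attach). Returns True if we waited, False if the var was unset."""
    if not os.environ.get("AUTODEBUG"):
        return False  # variable unset: zero overhead in normal runs
    print(f"pid {os.getpid()} waiting for debugger...", file=sys.stderr)
    deadline = None if timeout is None else time.monotonic() + timeout
    while deadline is None or time.monotonic() < deadline:
        time.sleep(0.01)
    return True
```

Because the gate is an environment variable rather than a code change, it works no matter how many layers of scripts and Makefiles sit between you and the process.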

https://github.com/Timmmm/autodebug

I currently use this for debugging C++ libraries that are dynamically loaded into Questa (a commercial SystemVerilog simulator) that is started by a Python script running in some custom build system.

In the past I used it to debug Python code running in an interpreter launched by a C library loaded by Questa started by a Makefile started by a different Python interpreter that was launched by another Makefile. Yeah. It wasn't the only reason by a long shot but that company did not survive...

timhh commented on Why Use Structured Errors in Rust Applications?   home.expurple.me/posts/wh... · Posted by u/todsacerdoti
arccy · 3 months ago
Having had to work with various applications written in Rust... I find they have some of the most terrible errors: "<some low level operation> failed" with absolutely no context on why the operation was invoked in the first place, or with what arguments.

This is arguably worse than crashing with a stack trace (at least I can see a call path) or Go's typical chains of human-annotated errors.

timhh · 3 months ago
Yeah I agree - in fact I ran into this very issue only hours ago. The entire error message was literally "operation not supported on this platform". Yeay.

https://github.com/rust-lang/rust/issues/141854

> I'll also note that this was quite annoying to debug since the error message in its entirety was `operation not supported on this platform`, you can't use RUST_BACKTRACE, rustfmt doesn't have extensive logging, and I don't fancy setting up WASM debugging. I resorted to tedious printf debugging. (Side rant: for all the emphasis Rust puts on compiler error messages its runtime error messages are usually quite terrible!)

Even with working debugging it's hard to find the error since you can't set a breakpoint on `Err()` like you can with `throw`.
