Readit News
torusle commented on RoboPianist: Dexterous Piano Playing with Deep Reinforcement Learning (2023)   kzakka.com/robopianist/#d... · Posted by u/bemmu
torusle · 6 months ago
Honestly,

this is really bad. It might be a breakthrough in what you are doing, but when I listen to the output, all of the timing and phrasing is awful.

torusle commented on Reciprocal Approximation with 1 Subtraction    · Posted by u/mbitsnbites
torusle · 8 months ago
There are a couple of tricks you can do if you fiddle with the bits of a floating point value using integer arithmetic and binary logic.

That was a thing back in the '90s...

I wonder how big the performance hit from moving values between the integer and float pipelines is nowadays.

Last time I looked into that was the Cortex-A8 (first-iPhone era). Back then, that kind of trick cost around 26 cycles (back and forth) due to pipeline stalls.
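The kind of trick the submitted article describes can be sketched roughly like this. Note the assumptions: the magic constant 0x7EF477D5 is a commonly cited value for single-precision reciprocal approximation, not taken from the article, and the sketch only handles positive normal floats.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1.0f/x (x positive, normal) with a single integer
 * subtraction on the float's bit pattern, then refine with one
 * Newton-Raphson step.  The raw guess alone is only good to a few
 * percent; the Newton step squares the error down. */
static float approx_recip(float x)
{
    uint32_t i;
    memcpy(&i, &x, sizeof i);      /* type-pun via memcpy, not a pointer cast */
    i = 0x7EF477D5u - i;           /* the "1 subtraction" */
    float r;
    memcpy(&r, &i, sizeof r);
    return r * (2.0f - x * r);     /* one Newton-Raphson refinement */
}
```

On modern hardware you would benchmark this against a plain division; the integer/float pipeline crossing mentioned above can easily eat the savings.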

torusle commented on Show HN: A1 – Faster, Dynamic, More Reliable NFC Transactions   github.com/overlay-paymen... · Posted by u/AffableSpatula
AffableSpatula · a year ago
Hi there, author here! I think you've highlighted a couple of things worth clarifying in the doc:

1. Apple having just announced it is opening up NFC to developers means that both major mobile platforms can now act as responding devices; so widely distributing new NFC protocols to consumer devices has become very fast and inexpensive through an update to the OS or installing a third party app.

2. Mobile consumer hardware is sufficiently fast for the application operations (e.g., cryptographic operations) that the roundtrip and metadata overheads of APDU do make a meaningful contribution to the total time it takes to complete a transaction. Experiencing this in my development efforts here was the motivation for designing this alternative.

3. A1 is interoperable with APDU infrastructure and can therefore be adopted by terminals immediately, since reader terminals can attempt an A1 initiation and any APDU response from a legacy device is considered a failure; at which point the terminal can fall back to its default APDU message exchange.

I will update the doc to clarify these points. What do you think?

Given your experience I'd be interested in your detailed feedback, maybe we could jump on a call soon if you have time?

torusle · a year ago
> 1. Apple having just announced it is opening up NFC to developers means that both major mobile platforms can now act as responding devices;

> 2. Mobile consumer hardware is sufficiently fast for the application operations (eg. Cryptographic operations)

You are right here. It is possible to emulate a card using mobile phones. We've been able to shim/emulate any card for much longer.

The thing is: To connect to the payment system you need a certificate. And you simply don't get it unless you can prove that you have all kinds of security measures applied.

For Android and Apple, the actual payment stuff runs inside an isolated little micro-controller which has been certified and is tamper-proof. This little thing is fortified to the point that it will destroy itself when you try to open it up. There are alarm meshes, light sensors and much more inside the chip to detect any intrusion, just to protect the certificate.

If you don't have that security, the payment providers will deny you their certificate, plain and simple.

You can build your own thing using card emulation via apps, but you will take all the risks.

How it works in practice is the following: these tamper-proof micro-controllers can run Java Card code. You can write an applet for them and get it installed (if you have the key from Apple/Google, which is not easy). Then you have two-way communication: your applet inside the secure world talking to the payment world, and your ordinary mobile app showing things on the screen.

torusle commented on Show HN: A1 – Faster, Dynamic, More Reliable NFC Transactions   github.com/overlay-paymen... · Posted by u/AffableSpatula
torusle · a year ago
I've worked in the payment industry and among other things built a payment terminal, so I know a thing or two about it.

1st: The message overhead during communication is not an issue. It is tiny compared to the time it takes for the credit card to do its crypto thing.

2nd: This won't ever be adopted. There is simply way too much working infrastructure out there built on the APDU protocol. And there are so many companies and stakeholders involved that any change will take years. If they start aligning on a change, it would be something that makes a difference, not something that saves time on the order of 5 to 20 milliseconds per payment.

torusle commented on A good day to trie-hard: saving compute 1% at a time   blog.cloudflare.com/pingo... · Posted by u/eaufavor
hooli42 · a year ago
Hash functions can be as simple as a single modulo.
torusle · a year ago
Or as simple as using the hardware-accelerated CRC32 that we have in our x86 CPUs.

Last time I checked, CRC32 worked surprisingly well as a hash.
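For illustration, here is a portable bitwise sketch of CRC-32C (the Castagnoli polynomial) — which, to be precise, is the variant the SSE4.2 `crc32` instruction (the `_mm_crc32_u8` family of intrinsics) implements, not the zlib CRC32. The hardware instruction replaces the inner loop below:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32C (Castagnoli), reflected form, polynomial 0x82F63B78.
 * Same result as chaining the x86 crc32 instruction over the bytes. */
static uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (uint32_t)-(int)(crc & 1));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

As a quick sanity check, the standard check value for CRC-32C over the ASCII string "123456789" is 0xE3069283.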

torusle commented on Things I learned while writing an x86 emulator (2023)   timdbg.com/posts/useless-... · Posted by u/fanf2
aengelke · a year ago
Bonus quirk: there's BSF/BSR, for which the Intel SDM states that on zero input, the destination has an undefined value. (AMD documents that the destination is not modified in that case.) And then there's glibc, which happily uses the undocumented fact that the destination is also unmodified on Intel [1]. It took me quite some time to track down the issue in my binary translator. (There's also TZCNT/LZCNT, which is BSF/BSR encoded with F3-prefix -- which is silently ignored on older processors not supporting the extension. So the same code will behave differently on different CPUs. At least, that's documented.)

Encoding: People often complain about prefixes, but IMHO, that's by far not the worst thing. It is well known and somewhat well documented. There are worse quirks: For example, REX/VEX/EVEX.RXB extension bits are ignored when they do not apply (e.g., MMX registers); except for mask registers (k0-k7), where they trigger #UD -- also fine -- except if the register is encoded in ModRM.rm, in which case the extension bit is ignored again.

APX takes the number of quirks to a different level: the REX2 prefix can encode general-purpose registers r16-r31, but not xmm16-xmm31; the EVEX prefix has several opcode-dependent layouts; and the extension bits for a register used depend on the register type (XMM registers use X3:B3:rm and V4:X3:idx; GP registers use B4:B3:rm, X4:X3:idx). I can't give a complete list yet, I still haven't finished my APX decoder after a year...

[1]: https://sourceware.org/bugzilla/show_bug.cgi?id=31748
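The same zero-input pitfall surfaces at the compiler-builtin level: GCC and Clang document `__builtin_ctz` as undefined for a zero argument, just like BSF's destination register. A small illustrative wrapper with TZCNT-style semantics (returning the operand width on zero) has to guard that case explicitly:

```c
#include <stdint.h>

/* Trailing-zero count with TZCNT semantics: defined for all inputs.
 * __builtin_ctz(0) is undefined (it may lower to a bare BSF), so the
 * zero case must be handled before calling the builtin. */
static int tzcnt32(uint32_t x)
{
    return x ? __builtin_ctz(x) : 32;
}
```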

torusle · a year ago
Another bonus quirk, from the 486 and Pentium era...

BSWAP EAX converts from little endian to big endian and vice versa. It was a 32 bit instruction to begin with.

However, we have the 0x66 prefix that switches between 16 and 32 bit mode. If you apply that to BSWAP EAX, undefined, funky things happen.

On some CPU implementations (Intel and AMD differed) the prefix was just ignored. On others it did something that I call an "inner swap": of the four bytes stored in EAX, bytes 1 and 2 are swapped.

  0x11223344 became 0x11332244.
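The two behaviors can be modeled in C like this (a sketch of the observed results, not of the undocumented hardware itself; byte numbering counts down from the most significant byte):

```c
#include <stdint.h>

/* Normal 32-bit BSWAP: full byte reversal. */
static uint32_t bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* The "inner swap" reportedly produced by a 0x66-prefixed BSWAP on
 * some CPUs: only the two middle bytes trade places. */
static uint32_t inner_swap(uint32_t v)
{
    return (v & 0xFF0000FFu)            /* outer bytes stay put   */
         | ((v >> 8) & 0x0000FF00u)     /* byte 2 moves to byte 1 */
         | ((v << 8) & 0x00FF0000u);    /* byte 1 moves to byte 2 */
}
```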

torusle commented on Own Constant Folder in C/C++   neilhenning.dev/posts/you... · Posted by u/todsacerdoti
tonetegeatinst · a year ago
I'm a security student. My main experience has been python and java....but I have started to learn c to better learn how low level stuff works without so much abstraction.

My understanding is that C is a great language, but I also get that it's not for everyone. It's really powerful, and yet you can easily make mistakes.

For me, I'm just learning how to use C; I'm not trying to understand the compiler or makefiles yet. From what I get, the compiler is how you can achieve even better performance, but you need to understand how it is doing its black magic... otherwise you just might make your code slower or more inefficient.

torusle · a year ago
Nah, it is not that bad.

Sure, you can mess up your performance by picking bad compiler options, but most of the time you are fine with just the default optimizations enabled, letting the compiler do its thing. No need to understand the black magic behind it.

This is only really necessary if you want to squeeze the last bit of performance out of a piece of code. And honestly, how often does this occur in day to day coding, unless you write a video or audio codec?

torusle commented on How much of a genius-level move was binary space partitioning in Doom? (2019)   twobithistory.org/2019/11... · Posted by u/davikr
weinzierl · a year ago
From my memory: In the '90s, if you had anything to do with computer graphics[1] you knew about binary space partitioning trees.

Off the top of my head I remember two different articles from local German computer magazines (that are still in my basement) discussing and implementing them.

But so were many, many other ideas and approaches.

Not taking away from Carmack's genius; in my opinion it was not in inventing a horse, nor in finding one, but in betting on the right one.

[1] If you were a programmer in the 90s - any kind of programmer - you most certainly had.

torusle · a year ago
Correct.

And every graphics programmer worth their salt had a copy of "Computer Graphics: Principles and Practice" on the desk and followed whatever came out of the "Graphics Gems" series.

We knew about BSP, Octrees and all that stuff. It was common knowledge.

torusle commented on Fast linked lists   dygalo.dev/blog/blazingly... · Posted by u/dmitry_dygalo
SJC_Hacker · a year ago
How do linked lists prevent allocation errors? If anything, it would seem to make them worse.

My experience in embedded, everything is hardcoded as a compile time constant, including fixed size arrays (or vectors of a fixed capacity)

torusle · a year ago
In embedded, you often need message queues.

A common way to implement these is to have an array of messages, sized for the worst-case scenario, and use this as the message pool.

You keep the unused messages in a singly linked "free list", and the used messages in a doubly linked queue or FIFO structure.

That way you get O(1) allocation, de-allocation, enqueue and dequeue operations for your message queue.

Another example of this paradigm is job queues. You might have several actuators or sensors connected to a single interface and want to talk to them. The high level "business" logic enqueues such jobs, and interrupt-driven logic works through them in the background.

And because you only move some pointers around for each of these operations it is perfectly fine to do so in interrupt handlers.

What you really want to avoid is to move kilobytes of data around. That quickly leads to missing other interrupts in time.
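A minimal sketch of that pool-plus-free-list pattern (the pool size, payload size, and all names are made up for illustration; a real embedded version would also need interrupt masking around the list operations):

```c
#include <stddef.h>

#define POOL_SIZE 16

typedef struct msg {
    struct msg *next;
    struct msg *prev;               /* used only while on the FIFO */
    unsigned char payload[32];
} msg_t;

static msg_t pool[POOL_SIZE];       /* worst-case-sized message pool */
static msg_t *free_list;            /* singly linked free list       */
static msg_t *q_head, *q_tail;      /* doubly linked FIFO            */

void pool_init(void)
{
    free_list = NULL;
    for (int i = 0; i < POOL_SIZE; i++) {
        pool[i].next = free_list;   /* push every slot on the free list */
        free_list = &pool[i];
    }
    q_head = q_tail = NULL;
}

msg_t *msg_alloc(void)              /* O(1): pop the free list */
{
    msg_t *m = free_list;
    if (m)
        free_list = m->next;
    return m;                       /* NULL if the pool is exhausted */
}

void msg_free(msg_t *m)             /* O(1): push back on the free list */
{
    m->next = free_list;
    free_list = m;
}

void msg_enqueue(msg_t *m)          /* O(1): append at the FIFO tail */
{
    m->next = NULL;
    m->prev = q_tail;
    if (q_tail)
        q_tail->next = m;
    else
        q_head = m;
    q_tail = m;
}

msg_t *msg_dequeue(void)            /* O(1): pop the FIFO head */
{
    msg_t *m = q_head;
    if (!m)
        return NULL;
    q_head = m->next;
    if (q_head)
        q_head->prev = NULL;
    else
        q_tail = NULL;
    return m;
}
```

Every operation is a handful of pointer moves, which is why it is safe to call them from interrupt handlers.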

torusle commented on Fast linked lists   dygalo.dev/blog/blazingly... · Posted by u/dmitry_dygalo
torusle · a year ago
> Linked lists are taught as fundamental data structures in programming courses, but they are more commonly encountered in tech interviews than in real-world projects.

I beg to disagree.

In kernels, drivers, and embedded systems they are very common.

u/torusle

Karma: 43 · Cake day: August 20, 2023