I just started playing around with PIO and DMA on a Pico, and it's remarkable how much you can do on the chip without involving the main CPU. For context, PIO gives you tiny programmable state machines, with their own mini instruction set, sitting at the edge of the chip where they can respond to and drive external IO directly. DMA lets you stream data between memory and peripherals without the CPU touching each word, and channels can be chained, looped, or set to raise an interrupt so the CPU only has to get involved occasionally. The linked repo uses both heavily for its fast Ethernet communication.
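If you haven't played with these, here's a rough sketch (Pico SDK, C) of what "DMA feeding a PIO state machine with no CPU involvement" looks like. It assumes a PIO program is already loaded and that state machine `sm` shifts words out of its TX FIFO; none of this is the repo's actual code.

```c
#include "hardware/dma.h"
#include "hardware/pio.h"

// Point a DMA channel at a PIO state machine's TX FIFO so a buffer in RAM
// streams out through the state machine without the CPU copying anything.
static void start_tx_dma(PIO pio, uint sm, const uint32_t *buf, uint count) {
    int chan = dma_claim_unused_channel(true);
    dma_channel_config c = dma_channel_get_default_config(chan);

    channel_config_set_transfer_data_size(&c, DMA_SIZE_32);
    channel_config_set_read_increment(&c, true);    // walk through the buffer
    channel_config_set_write_increment(&c, false);  // always write the same FIFO
    // Pace the transfer by the TX FIFO's data request line, so the DMA only
    // pushes a word when the state machine has room for it.
    channel_config_set_dreq(&c, pio_get_dreq(pio, sm, true));

    dma_channel_configure(chan, &c,
                          &pio->txf[sm],  // write address: PIO TX FIFO
                          buf,            // read address: data in RAM
                          count,          // number of 32-bit words
                          true);          // start immediately
}
```

Chain two such channels so that each one's completion retriggers the other and the stream keeps going indefinitely with essentially no CPU involvement.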
Thanks, and you're correct; not sure why you got downvoted for this. For anyone curious, here are the data sheets for the RP2040 [the original Pico] and RP2350 [the Pico 2], which describe the systems in detail:
RP2040: https://datasheets.raspberrypi.com/rp2040/rp2040-datasheet.p...
RP2350: https://datasheets.raspberrypi.com/rp2350/rp2350-datasheet.p...
> receive side uses a per-packet interrupt to finalize a received packet
Per-packet interrupts have kept much faster systems from processing packets at line speed. A classic example: standard Gigabit network cards and contemporary CPUs couldn't process VoIP packets (which are tiny) at line speed, while they could easily download files (which arrive as basically MTU-sized packets) at line speed.
Fortunately, the receive ISR isn't cracking packets open, just calculating a checksum and passing the packet on to lwIP. I wish there were two DMA sniffers, so that the checksum could be calculated by the DMA engine(s), as that's where a lot of processor time is spent (even with a table-driven CRC routine).
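For anyone curious what the single sniffer buys you, here's roughly what a sniffer-assisted CRC looks like with the Pico SDK. This is a sketch, not the repo's code, and it glosses over the bit-reverse/invert options in SNIFF_CTRL that a real Ethernet FCS needs.

```c
#include "hardware/dma.h"

// Have the DMA sniffer accumulate a CRC-32 while a DMA channel copies a
// buffer (RAM-to-RAM here, purely for illustration).
static uint32_t dma_crc32(const uint8_t *data, uint len, uint8_t *scratch) {
    int chan = dma_claim_unused_channel(true);
    dma_channel_config c = dma_channel_get_default_config(chan);
    channel_config_set_transfer_data_size(&c, DMA_SIZE_8);
    channel_config_set_read_increment(&c, true);
    channel_config_set_write_increment(&c, true);
    channel_config_set_sniff_enable(&c, true);   // let the sniffer watch this channel

    dma_hw->sniff_data = 0xffffffff;             // CRC-32 seed
    dma_sniffer_enable(chan, 0x0, true);         // mode 0x0 = CRC-32 (see datasheet)

    dma_channel_configure(chan, &c, scratch, data, len, true);
    dma_channel_wait_for_finish_blocking(chan);

    uint32_t crc = dma_hw->sniff_data;
    dma_sniffer_disable();
    dma_channel_unclaim(chan);
    return crc;
}
```

The catch, as noted above, is that there's only the one sniffer to share between directions.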
You can do it using PIO. I did that when emulating a Memory Stick slave on the RP2040: one PIO SM plus two DMA channels with chained descriptors. The XOR is done through the atomic alias of any IO register you don't need, at a +0x1000 offset (the datasheet calls this the XOR alias).
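For readers who haven't met the atomic aliases: every register is also mapped at fixed offsets that turn an ordinary bus write into an atomic XOR, set, or clear, so a DMA channel can XOR into a register with zero CPU involvement. A minimal sketch; the choice of WATCHDOG_SCRATCH0 as the spare register is mine, purely for illustration.

```c
#include <stdint.h>
#include "hardware/dma.h"
#include "hardware/structs/watchdog.h"

// Every RP2040 register is also mapped at +0x1000 (atomic XOR on write),
// +0x2000 (atomic bitmask set) and +0x3000 (atomic bitmask clear).
// Pointing a DMA channel's write address at the XOR alias of a register you
// don't otherwise need turns each DMA write into an XOR against that
// register's current contents.
static void route_dma_writes_through_xor(uint chan) {
    volatile uint32_t *xor_alias =
        (volatile uint32_t *)((uintptr_t)&watchdog_hw->scratch[0] + 0x1000u);
    dma_channel_set_write_addr(chan, xor_alias, false);  // don't trigger yet
}
```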
Luckily the RP2040 has a dual-core CPU, so one core can be dedicated entirely to servicing the interrupts, passing packets to user code on the other core via a FIFO or whatever else you fancy.
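A minimal sketch of that split using the SDK's inter-core FIFO; the packet structure and function names here are hypothetical, not from the linked repo.

```c
#include "pico/stdlib.h"
#include "pico/multicore.h"

// Hypothetical packet descriptor; the real project's structures will differ.
typedef struct { uint8_t data[1536]; uint16_t len; } rx_pkt_t;

// Core 1: called from the per-packet (DMA/PIO) interrupt handler.
// All it does is shove a pointer into the 8-entry inter-core hardware FIFO.
static void on_packet_received(rx_pkt_t *pkt) {
    multicore_fifo_push_blocking((uint32_t)(uintptr_t)pkt);
}

static void core1_main(void) {
    // ... bring up PIO/DMA here and call on_packet_received() from the ISR ...
    while (true) tight_loop_contents();
}

int main(void) {
    multicore_launch_core1(core1_main);
    while (true) {
        // Core 0 blocks until core 1 hands it a packet, then processes it
        // with the network stack / user code.
        rx_pkt_t *pkt = (rx_pkt_t *)(uintptr_t)multicore_fifo_pop_blocking();
        (void)pkt;  // e.g. pass to lwIP here
    }
}
```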
I expect the RP2350 to perform much better in this scenario! At a minimum, one of the DMA channels should be eliminated, and I'm hoping the CRC calculation will get faster.
> Is there enough room to have it control the ethernet port for another weaker or perhaps more powerful microcontroller?
Well, there's a whole unused core and plenty of built-in SRAM. Seems like a good way to get an open-source version of the Wiznet chips [1]. It could support full protocol offloading like Wiznet's parts, or act as a lower-level raw packet sender/receiver like the ENC424J600.

[1] https://docs.wiznet.io/Product/iEthernet
I just quickly tried to fit the whole RP2040 + Ethernet PHY into the WIZ850io form factor (mainly because I've already used that module in some projects) and haven't yet been able to make it fit without using the more expensive JLCPCB features like buried vias. It would be very cool to have, though, since the W5500 really needs an update.
Make a package that has a rp2050 mounted on a microSD and you've got a NAS that nobody will ever find.
Back when I was doing a dumb-server/smart-client desktop environment, something like this would have been pretty cool. It needed a tiny API to save files, but the bulk of the environment worked off a static server.
This all already exists: the Raspberry Pi Zero 2 W. The board is slightly bigger than a Pico but runs a full-blown Linux system with a 4-core arm64 CPU, 512 MB of RAM, an SD card slot and WiFi, though no Ethernet (add-ons are available). Or you could use a larger Pi.
It would be interesting to see a short writeup of what kind of magic was required to achieve this, as there have been multiple failed attempts before this.
I'm also curious about the performance boost from 2.81Mbit/link failure at 150MHz to 65.4Mbit/31.4Mbit at 200MHz. That doesn't sound like basic processor bottlenecks, but rather some kind of catastrophic breakdown at a lower level? Does it just occasionally completely fail to lock onto an incoming clock signal or something?
I did some further investigating: it's apparently due to not having enough setup time on the RX PIO SM. Even though the PIO clocking is fixed at 100 MHz, there are CRC errors at the lower system clocks. I tried changing the delay in the PIO instruction that starts the RX sampling, but that only made things worse (as expected). I also tried disabling the synchronizers, with no improvement.
Hmm, interesting. Am I understanding it correctly that you're doing some kind of reset on the RX PIO from regular C code, and the time for "RX finish -> interrupt CPU -> reset RX PIO" is longer than the gap between packets?
If so, might it be possible to use two RX PIOs, automatically starting the next one via inter-PIO IRQ when a packet is finished? That'd give you an entire packet receive time to reset the original PIO, which should be plenty.
Usually I can grok the significance of almost any item on HN that catches my eye, but here I'm at a loss. Can someone explain why this matters?
As far as I can tell, someone has figured out how to send Ethernet packets at a relatively high rate using hardware with a very limited CPU. Cool, but what can you _do with that_? If the RPi Pico has the juice to run interesting network _application-level traffic_ at line rate it's more intriguing, but I doubt anyone's going to claim it can serve web traffic at line rate on this device, for example.

What am I missing?
It's quite popular in the retro-computing scene, for example, to bring these old machines into the 21st century by using modern microcontrollers to add peripheral support.
For example, the Oric-1/Atmos computers recently got a project called "LOCI", which adds USB support to the 40-year-old computer [1] by using an RP2040's PIO capabilities to interface the 8-bit data bus with a microcontroller capable of acting as the gateway to all of the devices on the USB peripheral bus.
This is amazing, frankly.
And now, being able to do Ethernet in such a simple way means that hundreds of retro-computing platforms could be put on the Internet with relative ease.

[1] https://forum.defence-force.org/viewtopic.php?t=2593&sid=2d3...
RP2040/2350 are IO monsters. You could, for example, make a logic analyzer that streams captured data over Ethernet.
This "very limited" microcontroller has two cores. Both of them can execute about 25 instructions per byte for generating "application-level traffic". You could definitely saturate a 100 Mbps connection with just one core.
Now that you mention it, I think I would like to see a logic analyzer that does just that. No buffering, just straight-up shovel the data to a MAC address, or even an IP address, and be done with it (maybe lose a few frames here and there). Let the PC worry about what to do with it, like triggers etc.
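A rough sketch of what that inner loop could look like with lwIP's raw UDP API. The chunk size and the polling loop are made up for illustration, it ignores lwIP's locking rules, and a real capture path would DMA samples out of the PIO RX FIFO rather than poll it.

```c
#include <string.h>
#include "hardware/pio.h"
#include "lwip/udp.h"
#include "lwip/ip_addr.h"

#define SAMPLES_PER_DATAGRAM 256   // arbitrary chunk size for this sketch

// Shovel raw PIO samples straight into UDP datagrams and let the PC sort
// out triggering, decoding, etc. Dropped datagrams are simply ignored.
static void stream_samples(PIO pio, uint sm, struct udp_pcb *pcb,
                           const ip_addr_t *dest, uint16_t port) {
    uint32_t samples[SAMPLES_PER_DATAGRAM];
    while (true) {
        for (int i = 0; i < SAMPLES_PER_DATAGRAM; i++)
            samples[i] = pio_sm_get_blocking(pio, sm);  // one word per sample

        struct pbuf *p = pbuf_alloc(PBUF_TRANSPORT, (u16_t)sizeof samples, PBUF_RAM);
        if (!p) continue;                  // out of buffers: drop this chunk
        memcpy(p->payload, samples, sizeof samples);
        udp_sendto(pcb, p, dest, port);    // fire and forget
        pbuf_free(p);
    }
}
```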
Should be cheap, right? Though a 1 Gbit version might still be expensive.
Context switching between processors will hurt cache locality and hence hit rates, but yeah, it might be worth the tradeoff on busy systems.
At first I thought it was the new Pico 2 (RP2350), but no, it’s the old Pi Pico with RP2040.
Is there enough room to have it control the ethernet port for another weaker or perhaps more powerful microcontroller?
Can you combine multiple picos with one being the ethernet stack and another that modifies certain packets?
Are there any other interesting things that can be done?
Is this an effective rate, or just the reflection of a hardware limit?
7 byte preamble
1 byte SFD
6 byte dst MAC
6 byte src MAC
2 byte ethertype or length
46-1500 bytes of payload (ignoring “Jumbo” frames and 802.1q tags)
4 byte CRC
12 byte IFG (which is silence, but still counts for time on the wire)
Add it up and you have 1538 bytes “on the wire”.
TCP overhead for IPv4 is 20 bytes for IP(v4) (no options) and 20 bytes for TCP (again, no options).
So 1460 bytes of data for 1538 bytes on the wire. 1460/1538 = 0.949284
So for 100M Ethernet, 94.9284Mbps is “perfect”.
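Same arithmetic as a throwaway C snippet, for anyone who wants to plug in other payload sizes (e.g. the tiny VoIP packets mentioned upthread):

```c
#include <stdio.h>

/* Theoretical TCP goodput on 100 Mbit/s Ethernet for a given TCP payload,
 * assuming no IP/TCP options, no VLAN tag, no jumbo frames, and ignoring
 * the 46-byte minimum frame payload (which matters for tiny packets). */
static double goodput_mbps(double tcp_payload_bytes) {
    const double per_packet_overhead = 7 + 1      /* preamble + SFD        */
                                     + 6 + 6 + 2  /* MACs + ethertype      */
                                     + 4 + 12     /* CRC + inter-frame gap */
                                     + 20 + 20;   /* IPv4 + TCP headers    */
    return 100.0 * tcp_payload_bytes / (tcp_payload_bytes + per_packet_overhead);
}

int main(void) {
    printf("%.4f Mbit/s\n", goodput_mbps(1460));  /* ~94.93 for full-size frames */
    return 0;
}
```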