Why are you making your structs smaller? Probably for "performance". If your goal is literally just reducing memory usage, then this is all fine. Maybe you're running on a small microcontroller and you just straight up don't have much RAM.
But these days, in most areas, memory is cheap and plentiful. What's expensive is CPU cache. If that's why you're optimizing memory use, it's really about runtime speed, not total memory usage. You're trying to make things small so you have fewer cache misses and the CPU doesn't get stuck waiting on memory as much.
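As a concrete illustration (a minimal sketch with hypothetical field names), just reordering fields so the largest come first can shrink a struct by eliminating padding, which directly changes how many instances fit in a 64-byte cache line:

```c
#include <stdio.h>
#include <stdint.h>

struct loose {            /* typically 24 bytes on a 64-bit ABI       */
    uint8_t  flag;        /* 1 byte + 7 bytes of padding before `id`  */
    uint64_t id;
    uint16_t count;       /* 2 bytes + 6 bytes of tail padding        */
};

struct reordered {        /* typically 16 bytes: largest fields first */
    uint64_t id;
    uint16_t count;
    uint8_t  flag;        /* 5 bytes of tail padding remain           */
};

int main(void) {
    /* With a 64-byte cache line, 16-byte structs fit 4 per line
       instead of ~2.6, so a sequential scan touches fewer lines. */
    printf("loose:     %zu bytes\n", sizeof(struct loose));
    printf("reordered: %zu bytes\n", sizeof(struct reordered));
    return 0;
}
```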
In that case, the advice here is only part of the story. Using bitfields and smaller integers saves memory, but to access those fields and do anything with them, they need to be loaded into registers. I'm not an expert here, but my understanding is that loading a bitfield or a small int into a word-sized register can add a bit of overhead (extra shift/mask or extension work), so you have to weigh that trade-off.
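Here's a rough sketch of what I mean (hypothetical struct and function names; the exact instructions depend on the compiler and target): reading a bitfield member generally compiles to a load plus shift/mask work, while reading a plain byte is usually a single load.

```c
#include <stdint.h>

struct flags_bitfield {
    uint32_t kind   : 4;   /* packed into one 32-bit word */
    uint32_t level  : 4;
    uint32_t active : 1;
};

struct flags_plain {
    uint8_t kind;          /* each field individually addressable */
    uint8_t level;
    uint8_t active;
};

uint32_t get_level_bitfield(const struct flags_bitfield *f) {
    return f->level;       /* load, shift right by 4, mask with 0xF */
}

uint32_t get_level_plain(const struct flags_plain *f) {
    return f->level;       /* single byte load, zero-extended */
}
```

Whether that extra ALU work matters at all depends on whether the code is memory-bound or compute-bound, which is exactly why you want measurements.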
If I were optimizing for overall runtime speed and considering packing structs to get there, I'd want good profiling and benchmarks in place first. Otherwise it's easy to think you're making progress when you're actually going backwards.
Personally, I only pay attention to cache-line size and layout when the structure is a concurrent data structure that needs to be shared across multiple cores.
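For that case, something like this sketch is what I have in mind (assuming a 64-byte cache line and C11 atomics; the names are made up): padding each per-core slot out to its own cache line avoids false sharing.

```c
#include <stdalign.h>
#include <stdatomic.h>

#define CACHE_LINE 64   /* assumed line size; verify for the target CPU */

/* Hypothetical per-core counters. Without the alignment, adjacent
   counters would share a cache line, and every increment on one core
   would invalidate that line in the other cores' caches. */
struct percore_counter {
    alignas(CACHE_LINE) atomic_ulong value;  /* pads each slot to a full line */
};

static struct percore_counter counters[8];

void bump(int core) {
    atomic_fetch_add_explicit(&counters[core].value, 1,
                              memory_order_relaxed);
}
```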
Nevertheless, the perf increases in IO devices these days are insane. I'm wondering whether and when those perf promises will materialize. We're only getting PCIe 5 this year and it's not that common yet, so I'm curious how fast adoption will be, since that's what pushes manufacturers to iterate. The thing is, even at the current PCIe 5 level, a lot of software already needs to be rewritten to take full advantage of the new devices, and rewriting software takes time. If software iteration is slow, it's questionable whether consumers will keep paying for new generations of devices.