Someone has to design each of those reconfigurable digital circuits and take them through an implementation flow.
Only certain problems map well to easy FPGA implementation: anything involving memory access is quite tedious.
I would also question the premise that memory access is less tedious, or easy, for MCUs/CPUs, especially if you need deterministic performance and response times. Most CPUs have memory hierarchies, so access latency depends on where the data happens to sit.
The more practical attempts at dynamic partial reconfiguration involve swapping out accelerators for specific functions: encoders and decoders for different wireless standards, or different curves in crypto, for example. And yes, somebody has to implement those.
Profit is dependent on scale. FPGAs are useful if the scale is so small that an ASIC production line is more expensive than buying a couple of FPGAs.
If the scale is large enough that ASIC production is cheaper, you reap the performance improvements.
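To put rough numbers on the scale argument, here is a minimal sketch of the break-even calculation. It assumes a very simple cost model: the ASIC has a one-time up-front cost plus a per-unit cost, and the FPGA has only a per-unit cost. The function names and all the figures are made up for illustration, not real pricing.

```python
import math

def total_cost_asic(units, asic_nre, asic_unit_cost):
    # One-time up-front cost (hypothetical figure) plus per-unit cost.
    return asic_nre + units * asic_unit_cost

def total_cost_fpga(units, fpga_unit_cost):
    # Assumption: no up-front cost on the FPGA side, only per-unit cost.
    return units * fpga_unit_cost

def break_even_units(asic_nre, asic_unit_cost, fpga_unit_cost):
    """Smallest volume at which the ASIC route is strictly cheaper in total.

    Returns None if the ASIC is not cheaper per unit, since it then
    never recovers its up-front cost.
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None
    # ASIC wins once units * saving_per_unit > asic_nre.
    return math.floor(asic_nre / saving_per_unit) + 1

# Illustrative numbers only: 2,000,000 up front, 5 per ASIC, 105 per FPGA.
n = break_even_units(2_000_000, 5, 105)
print(n)
```

Below that volume the FPGA is the cheaper option under this model; above it the up-front ASIC cost is amortised and the per-unit saving dominates.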
Think of it this way: FPGAs are programmed using ASIC circuitry. If you programmed an FPGA using an FPGA (using ASIC circuitry), do you think you'll achieve the same performance as the underlying FPGA? Of course not (assuming you're not cheating with some "identity" compilation). Same thing applies with any other ASIC.
Each layer of FPGA abstraction incurs a cost: more silicon/circuitry/resistance/heat/energy and lower clock speeds.
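As a toy illustration of how the cost stacks up, the sketch below assumes every layer of reconfigurable fabric divides the achievable clock by the same constant factor. That factor is hypothetical and the model is deliberately crude; real overheads depend on the design and the device.

```python
def effective_clock_mhz(native_mhz, slowdown_per_layer, layers):
    """Clock after stacking `layers` levels of emulated fabric.

    Assumption: each layer costs the same multiplicative slowdown.
    """
    if slowdown_per_layer < 1:
        raise ValueError("a layer cannot make the circuit faster in this model")
    return native_mhz / (slowdown_per_layer ** layers)

# Hypothetical figures: 1000 MHz native silicon, 10x slowdown per layer.
for layers in range(3):
    print(layers, effective_clock_mhz(1000, 10, layers))
```

With zero layers you get the native speed back, which is the "identity" case mentioned above; each extra layer compounds the penalty.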