In the end I found that the Python trafilatura library extracts the best-quality content, with accurate metadata.
You might want to compare your implementation to trafilatura to see if there is room for improvement.
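A rough sketch of how such a comparison could look, assuming your own extractor is a function that takes an HTML string and returns plain text. The names `similarity`, `compare_with_trafilatura`, and `my_extract` are made up for illustration; only `trafilatura.extract` comes from the library itself, and the word-level similarity score is just one simple way to compare outputs, not an established benchmark.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between two extracted texts (0.0 to 1.0)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def compare_with_trafilatura(html: str, my_extract) -> float:
    """Score a hypothetical extractor `my_extract` against trafilatura's output."""
    # Third-party dependency, imported lazily: pip install trafilatura
    import trafilatura

    # extract() returns None when nothing usable is found, so fall back to "".
    reference = trafilatura.extract(html) or ""
    candidate = my_extract(html) or ""
    return similarity(candidate, reference)
```

A low score on a given page does not say which extractor is wrong, only that the two disagree, so it is mostly useful for finding pages worth inspecting by hand.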
It's also important to note that there are other control systems in the body that affect control systems in the mind, e.g. the endocrine system.
Someone paranoid might think that the for-profit management at Elastic is trying to pull some of their previously free software behind a paid-for product. Perhaps they accidentally marked all repos private when they only intended to make a few of them private. They have had beef with AWS in the past where they changed their licensing due to things AWS was doing. So I'll fully believe that it was a genuine accident if all the formerly public repos become public again.
Wow!! That seems so simple, and literally a few weeks of work in today's ecosystem. Thorough testing may take a little more time, but wow, I wonder if it was even attempting to do RAG.