That might be quite an interesting identity resolution challenge (disclosure: I build identity resolution tech).
I would not mind taking a look. Always interested to see how others are handling such data.
We built this while working at a European credit bureau, where we needed to deduplicate and match millions of monthly record updates from various sources. Traditional approaches using graph databases and Spark couldn't handle the scale, so we built our own solution using AWS Serverless.
Each identity is stored as its own graph structure, with records linked by rules-based and ML matching. Performance: <300ms ingest (tested to 5,000/sec), <150ms search regardless of graph size. Several fintech companies use it for fraud detection, KYC, and customer 360.
Unlike vector databases, which can blur similar entities together, IdentityRAG maintains distinct customer identities while pulling data from multiple systems, even when customer details differ across databases.
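To make the "one graph per identity" idea concrete, here is a toy sketch in Python. It is not our production code and not the Tilores API: the records, field names, rule functions and `resolve` helper are all made up for illustration, and it only covers the rules-based part (no ML matching). Records are nodes, each rule that fires adds an edge, and every connected component becomes one identity.

```python
from itertools import combinations

# Hypothetical records from three source systems; all field names are illustrative.
RECORDS = [
    {"id": "crm-1", "name": "Anna Schmidt", "email": "anna.schmidt@example.com", "phone": "+49 30 1234567"},
    {"id": "billing-7", "name": "A. Schmidt", "email": "ANNA.SCHMIDT@example.com", "phone": None},
    {"id": "support-3", "name": "Anna Schmitt", "email": None, "phone": "+49301234567"},
    {"id": "crm-2", "name": "Anna Schmidt", "email": "a.schmidt@example.org", "phone": "+49 40 7654321"},
]


def normalize_email(value):
    """Lowercase and trim an email; None stays None."""
    return value.strip().lower() if value else None


def normalize_phone(value):
    """Keep digits only, so formatting differences do not matter."""
    if not value:
        return None
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits or None


def rule_same_email(a, b):
    ea, eb = normalize_email(a["email"]), normalize_email(b["email"])
    return ea is not None and ea == eb


def rule_same_phone(a, b):
    pa, pb = normalize_phone(a["phone"]), normalize_phone(b["phone"])
    return pa is not None and pa == pb


# Deliberately no "same name" rule: a shared name alone should not merge people.
RULES = [rule_same_email, rule_same_phone]


def resolve(records):
    """Return (identities, edges): sets of record ids plus the rule hits linking them."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for a, b in combinations(records, 2):
        for rule in RULES:
            if rule(a, b):
                edges.append((a["id"], b["id"], rule.__name__))
                parent[find(a["id"])] = find(b["id"])

    identities = {}
    for r in records:
        identities.setdefault(find(r["id"]), set()).add(r["id"])
    return list(identities.values()), edges


if __name__ == "__main__":
    groups, links = resolve(RECORDS)
    for group in groups:
        print(sorted(group))
    for link in links:
        print(link)
```

With this sample data, `crm-1`, `billing-7` and `support-3` end up in one identity despite the differing name spellings, while `crm-2` stays separate even though the name is identical. The pairwise comparison is quadratic and only fine for a toy; a real system would need indexing or blocking to avoid comparing every pair.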
You can try it out with our sample chatbot in the GitHub repo (linked above). It is free to sign up; we charge based on the number of unified customer records, and playing and testing is free. We would love to hear your comments and questions.
There is also a demo video in the repo and you can find more details about us here: https://tilores.io/
Yes, I know we can register a UG, but in the end you don't. And it is not just the share capital that is annoying; it is everything else.
It literally costs 10x more to do the bookkeeping for a German company vs a UK one. Plus getting investors is much more difficult because of the notary requirements.
We used an SPV for our first round, but even that is annoying. I had some angel investors pull out purely because we are a German GmbH.