My best guess so far is that I somehow embed a long text and then break the returned embedding up into multiple parts and search each separately? But that doesn't sound right.
Then, you want to partition the document into chunks. Late chunking pairs really well with semantic chunking, because semantic chunking can use late chunking's improved sentence embeddings to find more semantically cohesive chunks. In fact, you can cast the chunking step as a binary integer programming problem and find the ‘best’ chunks that way. See RAGLite [1] for an implementation of both techniques, including the formulation of semantic chunking as an optimization problem.
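To make that concrete, here is a minimal sketch of semantic chunking, assuming you already have one (late-chunked) embedding per sentence as a numpy array. This is a greedy approximation, not RAGLite's integer-programming formulation, and the similarity threshold and chunk-size cap are made-up parameters:

    import numpy as np

    def semantic_chunks(sentences: list[str], embeddings: np.ndarray,
                        similarity_threshold: float = 0.7,
                        max_sentences_per_chunk: int = 10) -> list[list[str]]:
        """Group consecutive sentences into chunks, starting a new chunk
        whenever the similarity to the previous sentence drops."""
        # Normalize rows so the dot product equals cosine similarity.
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        chunks, current = [], [sentences[0]]
        for i in range(1, len(sentences)):
            similarity = float(normed[i - 1] @ normed[i])
            if similarity < similarity_threshold or len(current) >= max_sentences_per_chunk:
                chunks.append(current)
                current = []
            current.append(sentences[i])
        chunks.append(current)
        return chunks

The optimization-based version instead chooses split points globally (e.g. minimizing similarity across chunk boundaries subject to chunk-size constraints), rather than deciding one boundary at a time like this greedy loop does.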
Finally, you have a sequence of document chunks, each represented as a multi-vector sequence of sentence embeddings. You could choose to pool these sentence embeddings into a single embedding vector per chunk. Or, you could leave the multi-vector chunk embeddings as-is and apply a more advanced querying technique like ColBERT's MaxSim [2].
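Here's a rough sketch of both querying options, assuming each chunk is just a numpy array of sentence embeddings; the mean pooling and the brute-force MaxSim computation are illustrative, not how RAGLite or ColBERT actually implement retrieval at scale:

    import numpy as np

    def pool_chunk(sentence_embeddings: np.ndarray) -> np.ndarray:
        """Option 1: collapse a chunk's sentence embeddings into one
        normalized vector by mean pooling."""
        pooled = sentence_embeddings.mean(axis=0)
        return pooled / np.linalg.norm(pooled)

    def maxsim_score(query_embeddings: np.ndarray, chunk_embeddings: np.ndarray) -> float:
        """Option 2: ColBERT-style MaxSim. For each query vector, take its
        best-matching chunk vector, then sum those maxima."""
        q = query_embeddings / np.linalg.norm(query_embeddings, axis=1, keepdims=True)
        c = chunk_embeddings / np.linalg.norm(chunk_embeddings, axis=1, keepdims=True)
        return float((q @ c.T).max(axis=1).sum())

At query time you'd embed the query as a single vector and rank chunks by cosine similarity against their pooled embeddings, or embed it as multiple vectors and rank chunks by their MaxSim scores.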
[1] https://github.com/superlinear-ai/raglite
[2] https://huggingface.co/blog/fsommers/document-similarity-col...
The implementation learns to play Battleship in about 2000 steps, pretty neat!