Any plans to write a tutorial for fine-tuning local models?
- Airbyte has two self-hosted options: OSS & Enterprise
- Langchain: OSS
- OpenAI: you can replace it with a self-hosted OSS model if you want to
- Pinecone: there are OSS/self-hosted alternatives
When using a local vector db, what is the security model between my data and Airbyte? For example, do I need to allow Airbyte's IPs into my environment, and is there a VPN-type option for private connectivity?
Airbyte comes in 3 flavors: OSS, Cloud, Enterprise.
For OSS & Enterprise, data doesn't leave your infra, since Airbyte runs in your own infrastructure. For Cloud, you would have to allowlist Airbyte's IP addresses so that we can access your local db.
If you have data with PII:
One option would be to use Airbyte to bring the data into files or a local db rather than directly into the vector store, add an extra step that strips all PII from the data, and then configure Airbyte to move the cleaned files/records to the vector store.
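A minimal sketch of that intermediate scrubbing step, assuming Airbyte has landed the raw records as JSON Lines files on local disk. The file layout, the function names (`redact`, `scrub_record`, `scrub_jsonl`) and the regex patterns are all hypothetical illustrations, not part of Airbyte. Regexes only catch obvious patterns such as e-mail addresses and phone numbers; real PII removal usually needs a dedicated detection tool.

```python
import json
import re

# Hypothetical, deliberately crude patterns: they miss names, addresses, IDs, etc.,
# and the phone pattern will also catch some non-phone numbers such as dates.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-number-like strings with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)  # e-mails first, so digits inside them are not treated as phones
    return PHONE_RE.sub("[PHONE]", text)


def scrub_record(record: dict) -> dict:
    """Return a copy of a flat record with every string field redacted."""
    return {k: redact(v) if isinstance(v, str) else v for k, v in record.items()}


def scrub_jsonl(src_path: str, dst_path: str) -> int:
    """Read JSON Lines from src_path, write redacted records to dst_path, return the count."""
    count = 0
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            dst.write(json.dumps(scrub_record(json.loads(line))) + "\n")
            count += 1
    return count
```

The cleaned output file would then be the source that a second Airbyte connection moves into the vector store, so raw PII never reaches it.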
The option jmorgan mentioned, using a self-hosted model, is also relevant here.
The great thing about plugging this whole stack together is that we keep getting refreshed data as new issues/connectors get created.