The source repo seems to be unavailable (private?), so I wanted to ask: is this a language model fine-tuned on your data, or are you using OpenAI's APIs, with all of your own data stored as embeddings that are searched against the user's query?
Ah, I haven't open sourced it yet! I need to remove that link for now. I am going the Embeddings route. My approach is (roughly) documented here: https://indieweb.org/OpenAI
The embeddings are stored in Faiss. The ~25 nearest neighbours to the query are fed to GPT with a prompt written to ensure sources are cited (although how reliably it does so varies, and more work will be needed).
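
For anyone curious what that retrieve-then-prompt flow looks like, here is a minimal sketch in plain Python. It is not the actual implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the brute-force `nearest` function stands in for the Faiss index, and the documents, URLs, and prompt wording are all made up for illustration.

```python
import math
from collections import Counter

K = 25  # roughly the neighbour count mentioned above


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query, docs, k=K):
    """Brute-force k-nearest-neighbour search (the job an index does at scale)."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, neighbours):
    """Number each retrieved source so the model can cite it as [n]."""
    sources = "\n".join(
        f"[{i}] {d['url']}: {d['text']}" for i, d in enumerate(neighbours, 1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite each source you use by its number, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical documents; a real corpus would be the author's own posts.
docs = [
    {"url": "https://example.com/coffee", "text": "I brew coffee with an aeropress every morning"},
    {"url": "https://example.com/faiss", "text": "Faiss stores embeddings for nearest neighbour search"},
    {"url": "https://example.com/cats", "text": "My cat sleeps on the keyboard"},
]

query = "how do you brew coffee"
top = nearest(query, docs, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

The resulting `prompt` string is what would be sent to the language model. Numbering the sources gives the model something concrete to cite, though as noted it may not always do so.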