We are also developing an open-source solution for those who would like to test it out and/or contribute. It can be consumed as a web service or embedded into .NET apps. The project is codenamed "Semantic Memory" (available on GitHub) and offers customizable external dependencies, such as Azure Queues, RabbitMQ, or other alternatives for queuing, and Azure Cognitive Search or Qdrant for vector storage (with plans to include Weaviate and more). The architecture is similar, with queues and pipelines.
We believe that enabling custom dependencies and logic, as well as the ability to add/remove pipeline steps, is crucial. As of now, there is no definitive answer to the best chunk size or embedding model, so our project aims to provide the flexibility to inject and replace components and pipeline behavior.
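To make that concrete, here is a minimal sketch of what such a pluggable pipeline could look like in C#. All interface and class names below (IQueue, IVectorDb, IPipelineStep, IngestionPipeline, PipelineContext) are hypothetical, chosen only to illustrate injecting/replacing components and adding/removing steps; they are not the actual Semantic Memory API.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical abstractions: any queue or vector store implementation can be
// injected, e.g. Azure Queues vs. RabbitMQ, or Azure Cognitive Search vs. Qdrant.
public interface IQueue        { Task EnqueueAsync(string message); }
public interface IVectorDb     { Task UpsertAsync(string id, float[] embedding, string text); }
public interface IPipelineStep { Task<PipelineContext> RunAsync(PipelineContext context); }

// State carried between steps (chunks, embeddings, etc.).
public record PipelineContext(string DocumentId, List<string> Chunks, List<float[]> Embeddings);

public class IngestionPipeline
{
    private readonly List<IPipelineStep> _steps = new();

    // Steps can be appended, removed or reordered before execution,
    // so the default flow is just one possible configuration.
    public IngestionPipeline AddStep(IPipelineStep step)    { _steps.Add(step); return this; }
    public IngestionPipeline RemoveStep(IPipelineStep step) { _steps.Remove(step); return this; }

    public async Task<PipelineContext> ExecuteAsync(PipelineContext context)
    {
        foreach (var step in _steps)
            context = await step.RunAsync(context);
        return context;
    }
}
```

Because the queue and vector store sit behind small interfaces, swapping infrastructure (or experimenting with chunk sizes and embedding models inside individual steps) doesn't require touching the pipeline logic itself.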
Regarding scalability, LLM text generators and GPUs remain a limiting factor in this area as well. LLMs hold great potential for analyzing input data, and I believe the focus should be less on the speed of queues and storage and more on finding the optimal way to integrate LLMs into these pipelines.
Queues and storage are the foundation on which some of these other integrations can be built. I fully agree on the need for LLMs within the pipelines to help with data analysis.
Our current focus has been on leveraging LLMs as part of async processes to help analyze data. This only really works when the data follows a template, so the same analysis can be applied to a vast number of documents; otherwise it becomes too expensive to do on a per-document basis.
What types of analysis are you doing with LLMs? Have you started to integrate some of these into your existing solution?
Currently we use LLMs to generate a summary, which is stored as an additional chunk. As you might guess, this can take time, so we postpone summarization to the end (the current default pipeline is: extract, partition, generate embeddings, save embeddings, summarize, generate embeddings of the summary, save those embeddings).
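Using the hypothetical IngestionPipeline from the earlier sketch, that default ordering could be wired roughly like this. The step classes here are placeholders for hypothetical IPipelineStep implementations, not the project's real handlers:

```csharp
// Illustrative only: summarization and its embedding/save steps are appended
// last, so the raw partitions are searchable before the slower LLM
// summarization completes.
var defaultPipeline = new IngestionPipeline()
    .AddStep(new ExtractStep())                      // extract text from the source file
    .AddStep(new PartitionStep(chunkSize: 1000))     // split into chunks (size is configurable)
    .AddStep(new EmbeddingStep(embeddingGenerator))  // generate embeddings for the chunks
    .AddStep(new SaveEmbeddingsStep(vectorDb))       // save embeddings
    .AddStep(new SummarizeStep(textGenerator))       // LLM summary, stored as an extra chunk
    .AddStep(new EmbeddingStep(embeddingGenerator))  // embeddings of the summary
    .AddStep(new SaveEmbeddingsStep(vectorDb));      // save the summary embeddings
```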
Initial tests, though, are showing that summaries can affect the quality of answers, so we'll probably remove summarization from the default flow and use it only for specific data types (e.g. chat logs).
There are a bunch of synthetic-data scenarios we want to leverage LLMs for. Without going too much into detail: sometimes "reading between the lines", some memory consolidation patterns (e.g. a "dream phase"), and so on.
For instance, given the user "ask" (which could be any generic message in a copilot), deciding how to query one or multiple storages. Ultimately, companies and users have different storages, and a few can be indexed with vectors (and additional fine-tuned models). But there's a lot of "legacy" structured data accessible only with SQL and similar languages, so a "planner" (in the SK sense of planners) could be useful to query vector indexes, text indexes, and knowledge graphs, combining the results.
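A rough sketch of that routing idea follows, assuming a hypothetical IMemorySource abstraction over each storage (vector index, text index, SQL, knowledge graph) and a selection delegate that would typically be backed by an LLM prompt. None of this reflects an existing SK planner API:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical abstraction over one storage backend.
public interface IMemorySource
{
    string Name { get; }                              // e.g. "vectors", "sql", "graph"
    Task<IReadOnlyList<string>> QueryAsync(string ask);
}

public class MemoryPlanner
{
    // Decides which storages are relevant for a given ask;
    // typically backed by an LLM prompt over the source descriptions.
    private readonly Func<string, Task<IReadOnlyList<IMemorySource>>> _selectSources;

    public MemoryPlanner(Func<string, Task<IReadOnlyList<IMemorySource>>> selectSources)
        => _selectSources = selectSources;

    public async Task<IReadOnlyList<string>> GatherContextAsync(string ask)
    {
        // 1. Decide which storages to query for this ask.
        var selected = await _selectSources(ask);

        // 2. Query them (vector index, SQL, knowledge graph, ...) in parallel.
        var results = await Task.WhenAll(selected.Select(s => s.QueryAsync(ask)));

        // 3. Combine the results; a real implementation would also re-rank and deduplicate.
        return results.SelectMany(r => r).ToList();
    }
}
```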
We could easily add that if there's interest, e.g. using Pub/Sub and Cloud Storage. If there are .NET libraries, it should be straightforward to implement the required interfaces.
Similar considerations apply to the inference part: embedding and text generation.
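As a sketch of what that would involve, adding a Google Cloud backend should mostly mean implementing a small interface like the hypothetical IQueue from the earlier sketch; the Pub/Sub call is left as a placeholder rather than real SDK code, and the same pattern would apply to embedding and text-generation connectors:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical queue adapter: swapping infrastructure means implementing a
// small interface, not changing the pipeline.
public class GcpPubSubQueue : IQueue
{
    private readonly string _topic;

    public GcpPubSubQueue(string topic) => _topic = topic;

    public Task EnqueueAsync(string message)
    {
        // A real implementation would publish `message` to the Pub/Sub topic
        // here, e.g. via the Google.Cloud.PubSub.V1 client library.
        throw new NotImplementedException("Pub/Sub publish goes here");
    }
}
```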