RAG and knowledge base
RAG (Retrieval-Augmented Generation) grounds the LLM's answer in chunks retrieved from the documents you have uploaded.
Flow
- Documents are uploaded and split into chunks.
- Chunks are turned into embedding vectors.
- The user's question is embedded the same way, and the most similar chunks are found via vector similarity search.
- Selected chunks are passed to the model as context; the model generates an answer.
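The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the backend's actual implementation: the function names (`split_into_chunks`, `embed`, `retrieve`, `build_prompt`) are hypothetical, and the bag-of-words "embedding" is a toy stand-in for a real embedding model so that the example runs with the standard library only.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, max_words: int = 40) -> list[str]:
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts (assumption: a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Embed the question and return the top_k (score, chunk) pairs."""
    question_vector = embed(question)
    scored = [(cosine_similarity(question_vector, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, retrieved: list[tuple[float, str]]) -> str:
    """Pass the selected chunks to the model as context."""
    context = "\n".join(f"- {chunk}" for _, chunk in retrieved)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Example documents (made up for illustration).
documents = [
    "Invoices are sent on the first day of each month.",
    "Password resets require access to the registered email address.",
]
chunks = [c for doc in documents for c in split_into_chunks(doc)]
question = "How do I reset my password?"
top = retrieve(question, chunks, top_k=1)
prompt = build_prompt(question, top)
```

The resulting `prompt` string is what would be sent to the model. In a real deployment the chunk vectors would typically be computed once at upload time and stored, rather than recomputed for every question as this sketch does.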
Test mode
The query/test endpoint can return the source chunks, their similarity scores, or embedding info, depending on which backend flags are enabled.
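As a rough illustration of flag-dependent output, the sketch below assembles a test-mode response in Python. Everything named here is an assumption made for the example: the flag names (`show_sources`, `show_scores`, `show_embedding_info`), the response keys, and the `build_test_response` helper are hypothetical and may not match the real backend.

```python
from dataclasses import dataclass


@dataclass
class DebugFlags:
    # Hypothetical flag names; the real backend's flags may differ.
    show_sources: bool = False
    show_scores: bool = False
    show_embedding_info: bool = False


def build_test_response(
    answer: str,
    retrieved: list[tuple[float, str]],
    embedding_dim: int,
    flags: DebugFlags,
) -> dict:
    """Attach debug details to the answer only when the matching flag is on."""
    response: dict = {"answer": answer}
    if flags.show_sources:
        response["sources"] = [chunk for _, chunk in retrieved]
    if flags.show_scores:
        response["scores"] = [round(score, 3) for score, _ in retrieved]
    if flags.show_embedding_info:
        response["embedding_info"] = {"dimensions": embedding_dim}
    return response


# Made-up retrieval result: (similarity score, chunk text) pairs.
retrieved = [(0.8123, "Password resets require access to the registered email address.")]
response = build_test_response(
    answer="Use the reset link sent to your registered email.",
    retrieved=retrieved,
    embedding_dim=384,
    flags=DebugFlags(show_sources=True, show_scores=True),
)
```

With only `show_sources` and `show_scores` enabled, `response` carries the answer, the source chunks, and their scores, and omits the embedding info.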