
RAG and knowledge base

RAG (Retrieval-Augmented Generation) grounds the LLM's answer in chunks retrieved from the documents you uploaded, so the model answers from your content as well as from its training data.

Flow

  1. Documents are uploaded and split into chunks.
  2. Chunks are turned into embedding vectors.
  3. The user question is embedded; similar chunks are found via similarity search.
  4. Selected chunks are passed to the model as context; the model generates an answer.
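
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's implementation: the function names (`split_into_chunks`, `embed`, `retrieve`, `build_prompt`) are hypothetical, the "embedding" is a toy bag-of-words count vector standing in for a real embedding model, and the final call to the LLM is left out.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=12):
    """Step 1: split a document into fixed-size word windows (toy chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Steps 2 and 3: toy 'embedding' as a bag-of-words count vector.

    Assumption: a real system would call an embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Step 3: embed the question and return the top_k (score, chunk) pairs."""
    question_vector = embed(question)
    scored = [(cosine_similarity(question_vector, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, retrieved):
    """Step 4: pass the selected chunks to the model as context."""
    context = "\n".join(f"- {chunk}" for _, chunk in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    chunks = [
        "Invoices are due within 30 days of issue.",
        "The office is closed on public holidays.",
        "Late invoices incur a 2 percent fee.",
    ]
    question = "When are invoices due?"
    print(build_prompt(question, retrieve(question, chunks)))
```

The `(score, chunk)` pairs returned by `retrieve` are the same kind of information that test mode can surface: which chunks were selected and how closely each matched the question.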

Test mode

In test mode, the query/test endpoint can return the source chunks, their similarity scores, or embedding info alongside the answer. Which of these are included depends on backend flags.

Knowledge bases — usage · Knowledge API

Cere Insight 2.0 documentation