
How Answers Are Produced

End-to-End Flow

When you ask a question, the app runs a retrieval-augmented generation (RAG) workflow:

  1. Your question is received in the web interface.
  2. The question is converted into an embedding (a vector of numbers that captures its meaning).
  3. The vector database finds the most semantically relevant document sections.
  4. Those sections are sent to the language model as context.
  5. The model generates an answer grounded in those sections.
  6. The answer is returned in the interface.
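The six steps above can be sketched in a few lines of Python. Everything here is illustrative: the function names (`embed`, `search`, `generate`) are stand-ins for the app's real components, and the "embeddings" are toy character counts rather than output of a real model such as BAAI bge-m3.

```python
# Minimal sketch of the retrieve-then-generate flow. All names are
# illustrative stand-ins, not the app's actual API.

def embed(text: str) -> list[float]:
    """Stand-in for the embedding model: map text to a fixed-length
    vector. Here: crude character statistics, not real semantics."""
    return [float(text.count("a")), float(text.count("e")), float(len(text))]

def search(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Stand-in for the vector database: return the k stored chunks
    whose vectors are closest (smallest squared distance) to the query."""
    def dist(v: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(query_vec, v))
    return sorted(index, key=lambda chunk: dist(index[chunk]))[:k]

def generate(question: str, context: list[str]) -> str:
    """Stand-in for the LLM call: real code would send a prompt built
    from the retrieved context plus the question."""
    return f"Answer to {question!r}, grounded in {len(context)} sections."

# Tiny in-memory "vector database" of pre-embedded document chunks.
index = {chunk: embed(chunk) for chunk in [
    "Invoices are archived for seven years.",
    "Refunds are processed within five days.",
    "The warehouse address is on the contact page.",
]}

question = "How long are invoices kept?"
context = search(embed(question), index)  # steps 2-3: embed, then retrieve
answer = generate(question, context)      # steps 4-5: grounded generation
print(answer)                             # step 6: answer returned to the UI
```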

```mermaid
flowchart LR
    A["User asks a question"] -->|Question| B["Web interface"]
    B -->|Query text| C["Question converted to an embedding with the BAAI bge-m3 model"]
    C -->|Vector query| D[("Vector index in PostgreSQL with pgvector")]
    D -->|Top relevant chunks| E["Relevant document sections returned"]
    E -->|Context plus user question| F["LLM generates a grounded answer"]
    F -->|Final answer| B
    B --> G["User sees answer"]
```
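Steps 4 and 5 of the flow amount to building a grounded prompt: the retrieved sections go in first, then the user's question, with an instruction to answer only from that context. A minimal sketch follows; the template wording is an assumption for illustration, not the app's actual prompt.

```python
def build_prompt(question: str, sections: list[str]) -> str:
    """Assemble a grounded prompt from retrieved sections plus the
    question. The wording is illustrative, not the app's real template."""
    context = "\n\n".join(
        f"[Section {i + 1}]\n{s}" for i, s in enumerate(sections)
    )
    return (
        "Answer the question using only the sections below. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_prompt(
    "How long are invoices kept?",
    ["Invoices are archived for seven years.",
     "Refunds are processed within five days."],
)
print(prompt)
```

Because the model is told to answer only from the supplied sections, weak or incomplete source content leads directly to partial answers, as noted below.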


Tools Used (Non-Code View)

  • Document parsing and OCR: extract text from PDFs, including scanned pages.
  • Embedding model: converts text and questions into vectors for similarity search.
  • Vector database (PostgreSQL + pgvector): retrieves the most relevant sections.
  • LLM: drafts the final response using retrieved context.
  • Web app UI: where you ask questions and view answers.
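The retrieval step can be approximated in a few lines: rank stored vectors by cosine distance (the metric behind pgvector's `<=>` operator) and return the nearest. A toy in-memory sketch, with made-up 3-dimensional vectors standing in for real embedding rows:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: the metric pgvector's <=> operator uses."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

# Toy stored vectors; a real deployment stores one row per document chunk.
stored = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.9, 0.1, 0.0],
    "chunk-c": [0.0, 1.0, 0.0],
}

query = [1.0, 0.05, 0.0]
top = sorted(stored, key=lambda k: cosine_distance(query, stored[k]))[:2]
print(top)  # the two chunks most similar to the query
```

In the real database this ranking is a single SQL query ordered by distance with a LIMIT, which is what makes retrieval fast even over many chunks.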

What You Should Expect

  • Typical response time is a few seconds, depending on load and document size.
  • Similar prompts may produce different wording, even when evidence is similar.
  • If source content is weak, ambiguous, or incomplete, answers may be partial.
  • Focused follow-up questions usually improve results.

Diagram Legend

  • Rectangle ["..."]: processing step.
  • Cylinder [(...)]: stored or indexed data.
  • Diamond {...}: decision step (none appear in the current flow).
  • Arrow labels: data passed between steps.