How Answers Are Produced
End-to-End Flow
When you ask a question, the app runs a retrieval-plus-generation workflow:
- Your question is received in the web interface.
- The question is converted into an embedding (a numeric representation of its meaning).
- The vector database finds the most semantically relevant document sections.
- Those sections are sent to the language model as context.
- The model generates an answer grounded in those sections.
- The answer is returned in the interface.
```mermaid
flowchart LR
    A["User asks a question"] -->|Question| B["Web interface"]
    B -->|Query text| C["Question converted to an embedding using model BAAI bge-m3"]
    C -->|Vector query| D["Vector search in PostgreSQL with pgvector"]
    D -->|Top relevant chunks| E["Relevant document sections returned"]
    E -->|Context plus user question| F["LLM generates a grounded answer"]
    F -->|Final answer| B
    B --> G["User sees answer"]
```
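The flow above can be sketched in miniature. This is an illustrative toy, not the app's code: the real system embeds text with BAAI bge-m3 and searches PostgreSQL with pgvector, whereas here a bag-of-words vector, an in-memory list of chunks, and a stubbed "LLM" stand in for them. All function and variable names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy stand-in for the embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Stand-in for the pgvector search: rank stored chunks by similarity.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def answer(question: str, chunks: list[str]) -> str:
    # Stand-in for the LLM call: show what would be sent as grounded context.
    context = "\n".join(retrieve(question, chunks))
    return f"Context:\n{context}\n\nQuestion: {question}"

chunks = [
    "Invoices are archived for seven years.",
    "The cafeteria opens at eight.",
    "Archived invoices can be requested from finance.",
]
print(answer("How long are invoices archived?", chunks))
```

The shape is the point: embed, rank by similarity, hand the top chunks plus the question to the generator.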
Tools Used (Non-Code View)
- Document parsing and OCR: extract text from PDFs, including scanned pages.
- Embedding model: converts text and questions into vectors for similarity search.
- Vector database (PostgreSQL + pgvector): retrieves the most relevant sections.
- LLM: drafts the final response using retrieved context.
- Web app UI: where you ask questions and view answers.
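To make the vector-database step concrete: pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), and retrieval amounts to ordering rows by that distance and keeping the closest few. The sketch below mimics that ranking in plain Python over fixed vectors; the SQL in the comment uses hypothetical table and column names.

```python
from math import sqrt

# In SQL, retrieval would look roughly like (names hypothetical):
#   SELECT content FROM chunks ORDER BY embedding <=> %(query)s LIMIT 2;

def cosine_distance(a: list[float], b: list[float]) -> float:
    # pgvector's <=> operator: 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

rows = {
    "chunk A": [0.9, 0.1, 0.0],
    "chunk B": [0.1, 0.9, 0.0],
    "chunk C": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

# ORDER BY embedding <=> query LIMIT 2, done by hand:
top = sorted(rows, key=lambda name: cosine_distance(rows[name], query))[:2]
print(top)
```

Smaller distance means a closer semantic match, so the chunks most aligned with the query vector come back first.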
What You Should Expect
- Typical response time is a few seconds, depending on load and document size.
- The same question may be answered with different wording, even when the retrieved evidence is the same.
- If source content is weak, ambiguous, or incomplete, answers may be partial.
- Focused follow-up questions usually improve results.
Diagram Legend
- Rounded rectangle: processing step.
- Cylinder [(...)]: stored or indexed data.
- Diamond {...}: decision step.
- Arrow labels: data passed between steps.