Where AI Outputs Come From
Virza uses AI across several features: chat, summaries, extraction, and search ranking. This page explains how each AI output is generated and how far you can rely on the results.
Chat responses
When you ask a question in Virza’s chat, the answer is produced in four steps:
- Evidence retrieval: Virza searches your workspace documents (or a specific collection/document, depending on your selected scope) for passages relevant to your question
- Source evaluation: Retrieved passages are scored for relevance using a neural cross-encoder model
- Response generation: A large language model generates the answer and grounds it in the retrieved evidence whenever relevant passages are found
- Citation attachment: Every claim derived from your documents includes an inline citation linking to the specific source passage
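The four steps above can be pictured with a small, self-contained sketch. It is only an illustration, not Virza's implementation: word overlap stands in for both the real retrieval and the neural cross-encoder, a text template stands in for the language model, and every name (`Passage`, `retrieve`, `score`, `answer`, `min_score`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Passage:
    doc_id: str
    text: str


def tokenize(text: str) -> set:
    """Lowercase words with surrounding punctuation removed."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def retrieve(question: str, passages: list, scope: Optional[set] = None) -> list:
    """Step 1: keep in-scope passages that share at least one word with the question."""
    q = tokenize(question)
    return [
        p for p in passages
        if (scope is None or p.doc_id in scope) and q & tokenize(p.text)
    ]


def score(question: str, passage: Passage) -> float:
    """Step 2: toy relevance score (a stand-in for the cross-encoder)."""
    q = tokenize(question)
    return len(q & tokenize(passage.text)) / len(q) if q else 0.0


def answer(question: str, passages: list, scope: Optional[set] = None,
           min_score: float = 0.3) -> dict:
    scored = [(score(question, p), p) for p in retrieve(question, passages, scope)]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    evidence = [p for s, p in ranked if s >= min_score]
    if not evidence:
        # No relevant evidence: the model would fall back on training knowledge.
        return {"text": "(answer drawn from the model's training knowledge)",
                "citations": []}
    # Step 3: template stand-in for the language model's grounded answer.
    text = " ".join(f"{p.text} [{n}]" for n, p in enumerate(evidence, start=1))
    # Step 4: one citation record per inline marker.
    citations = [{"marker": n, "doc_id": p.doc_id}
                 for n, p in enumerate(evidence, start=1)]
    return {"text": text, "citations": citations}
```

In this sketch, restricting `scope` to a set of document IDs mirrors choosing a specific collection or document before asking a question.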
The AI model may also draw on its training knowledge when:
- No relevant documents were found in your workspace
- Your question is clearly general (e.g., “What is quantum computing?”)
- Your question requires context beyond what’s in your documents
The knowledge source badge always tells you which sources were used.
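One way to think about the badge is as a function of two facts about a response: whether any workspace passages were cited, and whether the model drew on training knowledge. The sketch below assumes three badge states; the labels and the `knowledge_source` function are hypothetical and may not match what the product actually displays.

```python
from enum import Enum


class KnowledgeSource(Enum):
    # Hypothetical badge labels, for illustration only.
    DOCUMENTS = "Your documents"
    MODEL = "Model knowledge"
    MIXED = "Your documents + model knowledge"


def knowledge_source(cited_passages: int, used_training_knowledge: bool) -> KnowledgeSource:
    """Pick a badge from how the response was produced."""
    if cited_passages > 0 and used_training_knowledge:
        return KnowledgeSource.MIXED
    if cited_passages > 0:
        return KnowledgeSource.DOCUMENTS
    # With no cited passages, the answer can only come from the model itself.
    return KnowledgeSource.MODEL
```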
Document summaries
Summaries are generated by passing the document’s extracted text to a language model with a structured prompt. The model produces a summary based solely on the document’s content. No external knowledge is added.
The model used depends on your plan:
- Free and Starter plans: standard AI model
- Pro and higher: advanced model (Claude) for higher accuracy and depth
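As a rough sketch of what "a structured prompt" and plan-based model selection could look like: the prompt wording, the tier names, and the assumption that Enterprise is the plan above Pro are all illustrative, not Virza's actual configuration.

```python
# Plans treated as "Pro and higher". Enterprise is assumed to sit above Pro.
ADVANCED_PLANS = {"pro", "enterprise"}


def model_tier(plan: str) -> str:
    """Return a hypothetical tier name for the given plan."""
    return "advanced" if plan.lower() in ADVANCED_PLANS else "standard"


def build_summary_prompt(document_text: str) -> str:
    """Wrap the extracted text in instructions that forbid outside knowledge."""
    return (
        "Summarize the document between the markers.\n"
        "Use only information stated in the document; "
        "do not add outside knowledge.\n"
        "<document>\n"
        f"{document_text}\n"
        "</document>"
    )
```

Keeping the document text between explicit markers makes it clear to the model which content it may summarize.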
Table and figure descriptions
AI-generated descriptions of tables and figures come from a vision model that analyzes the cropped image of each artifact. The description is based only on what the model can see in that image; it does not draw on other parts of the document.
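To make the "cropped image only" point concrete, here is a minimal sketch that represents a page as a grid of pixel values and cuts out one artifact by its bounding box. The `crop` function and the `(left, top, right, bottom)` convention are assumptions for illustration; the point is that nothing outside the box reaches the vision model.

```python
def crop(page: list, bbox: tuple) -> list:
    """Return the pixels inside bbox = (left, top, right, bottom).

    right and bottom are exclusive, as in Python slicing.
    """
    left, top, right, bottom = bbox
    return [row[left:right] for row in page[top:bottom]]


# A tiny 3x4 "page"; each value encodes row * 10 + column.
page = [[r * 10 + c for c in range(4)] for r in range(3)]
figure = crop(page, (1, 0, 3, 2))  # only this region would be described
```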
Search ranking
Virza’s search uses AI in two ways:
- Embedding generation: your query and document content are converted to vector representations that capture semantic meaning, enabling concept-level matching beyond keywords
- Neural reranking: a cross-encoder model reads your query alongside each candidate result to produce a fine-grained relevance score
These AI components improve result quality but do not generate text. They score and rank existing content.
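The two stages can be sketched as a shortlist-then-rerank pipeline. This is a toy version under stated assumptions: the vectors are hand-written rather than produced by an embedding model, `overlap_score` is a simple stand-in for the cross-encoder, and the function names and `top_k` parameter are hypothetical.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlap_score(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: counts words shared by query and text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def search(query_vec: list, query_text: str, docs: list,
           rerank_score=overlap_score, top_k: int = 3) -> list:
    # Stage 1: shortlist candidates by embedding similarity.
    shortlist = sorted(docs, key=lambda d: cosine(query_vec, d["vector"]),
                       reverse=True)[:top_k]
    # Stage 2: rerank the shortlist by reading query and text together.
    return sorted(shortlist, key=lambda d: rerank_score(query_text, d["text"]),
                  reverse=True)
```

Both stages only reorder documents that already exist, which matches the point that these components score and rank content rather than generate it.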
What is inferred vs. extracted
| Output | Source | Reliability |
|---|---|---|
| Document title, authors, DOI | Extracted from document + verified via CrossRef/arXiv | High; cross-referenced with external databases |
| Section boundaries | Detected from document structure | High for standard academic layouts |
| Table data (headers, rows) | Extracted from document using Docling TableFormer | High (97.9% accuracy on standard tables) |
| Figure crops | Extracted from document pages using bounding box detection | High |
| Citation records | Parsed from bibliography section via GROBID + CrossRef | High for well-formatted references |
| Executive summary | Inferred by AI model from document text | Medium; captures key points but may miss nuance |
| AI chat answers | Inferred by AI model, grounded in retrieved evidence | Varies; check the confidence meter and source badges |
| Artifact descriptions | Inferred by vision AI from cropped image | Medium; best for clear charts and diagrams |
| Claims (Enterprise) | Inferred by AI model from results sections | Medium; always verify against the original text |
Rule of thumb: anything labeled “extracted” comes directly from the document and is highly reliable. Anything labeled “inferred” was generated by an AI model and should be verified before you rely on it for important work.
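The table and rule of thumb can be expressed as a simple lookup. The sketch below copies the output names from the table; treating "detected" and "parsed" outputs as extracted is an assumption made here, and the `PROVENANCE` mapping and `needs_verification` function are hypothetical names.

```python
# Provenance of each output type, taken from the table above.
# "Detected" and "parsed" rows are grouped with "extracted" (an assumption).
PROVENANCE = {
    "Document title, authors, DOI": "extracted",
    "Section boundaries": "extracted",
    "Table data (headers, rows)": "extracted",
    "Figure crops": "extracted",
    "Citation records": "extracted",
    "Executive summary": "inferred",
    "AI chat answers": "inferred",
    "Artifact descriptions": "inferred",
    "Claims (Enterprise)": "inferred",
}


def needs_verification(output: str) -> bool:
    """Apply the rule of thumb: inferred outputs should be checked."""
    return PROVENANCE[output] == "inferred"
```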
Privacy of AI processing
- Your documents are processed on secure infrastructure, never on shared public AI endpoints
- Document content is never used to train AI models
- AI conversation history is stored within your workspace and accessible only to workspace members with appropriate roles
- Workspace admins can audit AI usage from Settings
See Privacy for the full data handling policy.