Where AI Outputs Come From

Virza uses AI across several features: chat, summaries, extraction, and search ranking. This page explains how each AI output is generated and how much you can rely on the results.

Chat responses

When you ask a question in Virza’s chat:

Evidence retrieval: Virza searches your workspace documents (or a specific collection/document, depending on your selected scope) for passages relevant to your question
Source evaluation: Retrieved passages are scored for relevance using a neural cross-encoder model
Response generation: A large language model generates an answer, grounding it in the retrieved evidence when relevant passages were found
Citation attachment: Every claim derived from your documents includes an inline citation linking to the specific source passage
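
The four steps above can be pictured with a minimal sketch. It is an illustration only, not Virza's implementation: every name here (Passage, retrieve, score, answer, min_score) is hypothetical, word overlap stands in for both the search and the neural cross-encoder, and a simple template stands in for the language model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str  # hypothetical identifier of the source document
    text: str


def tokenize(text: str) -> set[str]:
    # Lowercased word set with basic punctuation stripped (toy text processing).
    return {w.strip(".,?!").lower() for w in text.split()} - {""}


def retrieve(question: str, passages: list[Passage]) -> list[Passage]:
    # Step 1, evidence retrieval: keep passages sharing at least one word with the question.
    q = tokenize(question)
    return [p for p in passages if q & tokenize(p.text)]


def score(question: str, passage: Passage) -> float:
    # Step 2, source evaluation: fraction of question words found in the passage.
    # Assumption: this ratio is only a stand-in for a cross-encoder relevance score.
    q = tokenize(question)
    return len(q & tokenize(passage.text)) / len(q) if q else 0.0


def answer(question: str, passages: list[Passage], min_score: float = 0.3) -> dict:
    scored = [(score(question, p), p) for p in retrieve(question, passages)]
    ranked = sorted(scored, key=lambda sp: sp[0], reverse=True)
    evidence = [p for s, p in ranked if s >= min_score]
    if not evidence:
        # No relevant passages: the answer would rely on model training knowledge.
        return {"text": "(answer from model training knowledge)", "citations": []}
    # Step 3, response generation (placeholder), and step 4, citation attachment:
    # each statement taken from a passage carries an inline marker such as [1].
    text = " ".join(f"{p.text} [{i}]" for i, p in enumerate(evidence, start=1))
    return {"text": text, "citations": [p.doc_id for p in evidence]}
```

Calling answer with a question that matches a workspace passage returns text with inline markers and a matching citations list. A general question with no matching passage returns an empty citations list, which corresponds to the training-knowledge case described next.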

The AI model may also draw on its training knowledge when:

No relevant documents were found in your workspace
Your question is clearly general (e.g., “What is quantum computing?”)
Your question requires context beyond what’s in your documents

The knowledge source badge always tells you which sources were used.

Document summaries

Summaries are generated by passing the document’s extracted text to a language model with a structured prompt. The model produces a summary based solely on the document’s content. No external knowledge is added.

The model used depends on your plan:

Free and Starter plans: standard AI model
Pro and higher plans: advanced model (Claude) for higher accuracy and depth

Table and figure descriptions

AI-generated descriptions of tables and figures use a vision model that analyzes the cropped image of each artifact. The description is generated from what the model can see in the image. It does not combine information from other parts of the document.

Search ranking

Virza’s search uses AI in two ways:

Embedding generation: your query and document content are converted to vector representations that capture semantic meaning, enabling concept-level matching beyond keywords
Neural reranking: a cross-encoder model reads your query alongside each candidate result to produce a fine-grained relevance score

These AI components improve result quality but do not generate text. They score and rank existing content.
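
As a rough illustration of this two-stage approach, the sketch below first selects candidates by vector similarity and then reranks them with a scorer that reads the query and each candidate together. All of it is assumed for illustration: word counts stand in for learned embeddings (which capture meaning rather than shared words), an exact-phrase check stands in for the cross-encoder, and the names embed, cosine, pair_score, search, and top_k are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy "embedding": word counts. A real embedding model produces dense
    # vectors that capture meaning, which word counts do not.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pair_score(query: str, doc: str) -> float:
    # Stand-in for the cross-encoder: scores the query and candidate jointly,
    # here by rewarding candidates that contain the query as an exact phrase.
    if query.lower() in doc.lower():
        return 1.0
    return cosine(embed(query), embed(doc))


def search(query: str, docs: list, top_k: int = 2) -> list:
    q = embed(query)
    # Stage 1: vector similarity picks the top_k candidates.
    candidates = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]
    # Stage 2: the pairwise scorer reorders only those candidates.
    return sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
```

Both stages only reorder text that already exists; nothing is generated. Running the cheaper vector comparison first keeps the more expensive pairwise scoring limited to a short candidate list.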

What is inferred vs. extracted

Output	Source	Reliability
Document title, authors, DOI	Extracted from document + verified via CrossRef/arXiv	High, cross-referenced with external databases
Section boundaries	Detected from document structure	High for standard academic layouts
Table data (headers, rows)	Extracted from document using Docling TableFormer	High (97.9% accuracy on standard tables)
Figure crops	Extracted from document pages using bounding box detection	High
Citation records	Parsed from bibliography section via GROBID + CrossRef	High for well-formatted references
Executive summary	Inferred by AI model from document text	Medium, captures key points but may miss nuance
AI chat answers	Inferred by AI model, grounded in retrieved evidence	Varies, check the confidence meter and source badges
Artifact descriptions	Inferred by vision AI from cropped image	Medium, best for clear charts and diagrams
Claims (Enterprise)	Inferred by AI model from results sections	Medium, always verify against the original text

Rule of thumb: anything extracted, detected, or parsed comes directly from the document and is highly reliable, subject to the caveats in the table. Anything labeled “inferred” was generated by an AI model and should be verified before you rely on it for important work.

Privacy of AI processing

Your documents are processed on secure infrastructure, never on shared public AI endpoints
Document content is never used to train AI models
AI conversation history is stored within your workspace and accessible only to workspace members with appropriate roles
Workspace admins can audit AI usage from Settings

See Privacy for the full data handling policy.