Glossary

Key terms used throughout Virza, defined in the context of the platform.

A

Artifact A structured element extracted from a document, such as a table, figure, equation, citation, or claim. Each artifact has its own status, confidence score, and bounding-box location within the original document.

AI credits Usage units consumed by AI-powered operations (chat, summaries, extraction, embeddings). Different operations consume different amounts based on computational cost. Credits reset monthly on your billing date.

C

Claims extraction (Enterprise) The process of extracting structured evidence statements from research papers, including typed claims (statistical, causal, comparative), p-values, effect sizes, and confidence intervals.

Collection A group of related documents within a workspace. Documents can belong to multiple collections. Collections can be nested hierarchically.

Confidence meter A visual indicator showing what percentage of claims in an AI response are grounded in cited evidence from your workspace documents.
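
The percentage behind the meter can be sketched as a simple ratio. This is a minimal illustration, not Virza's implementation: the `Claim` type, its `cited_sources` field, and the rule that one citation is enough to count a claim as grounded are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One claim in an AI response (hypothetical structure)."""
    text: str
    # IDs of workspace documents cited for this claim (assumed field name).
    cited_sources: list[str] = field(default_factory=list)


def grounded_percentage(claims: list[Claim]) -> float:
    """Return the share of claims with at least one cited source, as a percentage."""
    if not claims:
        return 0.0
    grounded = sum(1 for claim in claims if claim.cited_sources)
    return 100.0 * grounded / len(claims)


response_claims = [
    Claim("Effect size was moderate.", ["doc-12"]),
    Claim("The sample included 400 participants.", ["doc-12", "doc-31"]),
    Claim("Results replicate earlier work.", ["doc-7"]),
    Claim("This is widely accepted.", []),  # no citation, so not grounded
]
meter_value = grounded_percentage(response_claims)  # 3 of 4 claims are grounded
```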

Cross-encoder reranking A neural ranking technique where a model reads your query alongside each search result to produce a fine-grained relevance score. More accurate than keyword or vector matching alone.

D

Deep Research An iterative evidence retrieval mode that performs multiple rounds of search and analysis to build comprehensive answers from your library. Uses gap analysis to identify missing perspectives.

Discover A search tab that queries 600M+ papers from external academic databases (Semantic Scholar, OpenAlex, PubMed, arXiv, Crossref, Exa).

Document In Virza, a document is not just a file. It’s a multi-layered processed artifact with extracted text, metadata, sections, tables, figures, equations, citations, embeddings, and summaries.

Docling The primary AI document parser used by Virza. Extracts text with structure preservation, tables (97.9% accuracy via TableFormer), figures, and equations.

E

Embedding A vector representation of text that captures semantic meaning. Used for semantic search, which finds documents by concept rather than by keywords alone. Virza generates embeddings for both document sections and search queries.
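
To make "captures semantic meaning" concrete, the sketch below compares toy vectors with cosine similarity, a common way to compare embeddings. The three-dimensional vectors are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions, and the similarity measure Virza uses is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for model output.
query_vec = [0.9, 0.1, 0.0]          # "climate impact on migration"
climate_section = [0.8, 0.2, 0.1]    # a section about climate refugees
protein_section = [0.0, 0.1, 0.9]    # a section about protein folding

close = cosine_similarity(query_vec, climate_section)
far = cosine_similarity(query_vec, protein_section)
```

Here `close` is much larger than `far`, so the climate section would rank first even though it shares no required keyword with the query.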

Enrichment The AI processing phase that adds summaries, embeddings, vision descriptions, and other AI-generated artifacts to a document after core extraction is complete.

Evidence strength A tier (Strong / Good / Weak) assigned to each source cited in an AI response, based on how relevant the cited passage is to your question.

Extraction table A structured comparison table built by AI from multiple documents. Columns define what to extract; rows are populated by reading each document.

G

GROBID An open-source machine learning library used by Virza to parse bibliographic references from academic papers.

H

HyDE (Hypothetical Document Embeddings) A technique where the AI generates a hypothetical answer to your question, then uses that answer’s embedding to find similar content in your library. Used for question-type queries to improve semantic retrieval.
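
The flow can be sketched as follows. Everything here is a stand-in: `embed` is a toy bag-of-words vector rather than a real embedding model, and `generate_hypothetical_answer` returns a canned string where a real system would call a language model. Only the order of operations reflects the technique.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_hypothetical_answer(question: str) -> str:
    """Stand-in for the LLM call; returns a canned answer for illustration."""
    return "Drought and crop failure displace rural households, driving climate migration."


def hyde_search(question: str, library: list[str]) -> list[str]:
    hypothetical = generate_hypothetical_answer(question)
    query_vec = embed(hypothetical)  # embed the hypothetical answer, not the question
    return sorted(library, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)


library = [
    "Crop failure and drought displace rural households across the Sahel.",
    "Transformer models improve protein structure prediction.",
]
question = "Why do people move after long dry spells?"
results = hyde_search(question, library)
```

The question shares no words with the relevant passage, but the hypothetical answer does, which is why the answer's embedding retrieves it.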

K

Knowledge source badge A colored indicator on every AI response showing whether the answer is grounded in your documents (green), blends documents with general knowledge (violet), or uses only general knowledge (amber).

M

Metadata Bibliographic information about a document: title, authors, abstract, DOI, journal, publication year, and document type. Extracted automatically and editable manually.

Metadata only A document status indicating the paper was imported from a citation match without the actual PDF file. Upload the PDF to enable full processing.

P

Pipeline The multi-stage document processing system that transforms an uploaded file into a research-ready artifact. Includes security scanning, text extraction, metadata detection, section segmentation, table/figure detection, citation parsing, AI summarization, and search indexing.

Presigned URL A time-limited, cryptographically signed URL that allows your browser to upload files to, or download files from, cloud storage directly, without the data passing through Virza’s API server.

Q

Quarantine A separate storage zone where uploaded files are held during virus scanning before being moved to production storage.

R

Ready with warnings A document status indicating that core extraction succeeded but one or more optional enrichment stages (summary, embeddings, citations) failed. The document is fully usable for reading and searching.
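
The rule in this definition can be sketched as a small status function. The optional stage names come from the definition above; the core stage names and the `ready` and `failed` status strings are assumptions made for illustration.

```python
# Core stage names are hypothetical; optional stages are the ones named in the definition.
CORE_STAGES = {"text_extraction", "metadata_detection"}
OPTIONAL_STAGES = {"summary", "embeddings", "citations"}


def document_status(stage_results: dict[str, bool]) -> str:
    """Map per-stage success flags to a document status.

    A stage missing from stage_results is treated as not succeeded.
    """
    if not all(stage_results.get(stage, False) for stage in CORE_STAGES):
        return "failed"
    if not all(stage_results.get(stage, False) for stage in OPTIONAL_STAGES):
        return "ready_with_warnings"
    return "ready"


status = document_status({
    "text_extraction": True,
    "metadata_detection": True,
    "summary": False,       # optional enrichment failed
    "embeddings": True,
    "citations": True,
})
```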

Reciprocal Rank Fusion (RRF) A technique for merging results from multiple search systems (keyword and semantic) into a single ranked list. Used in Virza’s search pipeline.
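
In its standard form, RRF gives each document a score of 1 / (k + rank) from every list it appears in and sums those scores. The sketch below uses k = 60, a commonly used default; the value Virza uses is not stated here, and the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_results = ["doc_a", "doc_b", "doc_c"]
semantic_results = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([keyword_results, semantic_results])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Documents that appear in both lists (`doc_b`, `doc_a`) rise to the top. RRF uses only rank positions, so the keyword and semantic scores never need to be on the same scale.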

S

Semantic search Search by meaning rather than exact keywords. Uses AI embeddings to match the concept of your query against document content. For example, a search for “climate impact on migration” finds papers about climate refugees even when they don’t contain those exact words.

Share link A secure, unique URL that gives read access to a specific resource (document, session, or citation). Can be password-protected, time-limited, and access-limited.

W

Workspace The top-level container in Virza. All documents, collections, notes, conversations, and team members belong to a workspace. Workspaces are strictly isolated. Data in one workspace is invisible to all others.

Workspace fingerprint An embedding representation of your workspace’s research focus, built from the documents you’ve uploaded. Used to boost search results that align with your research trajectory.