How Parsing Works
When you upload a document, Virza runs it through a multi-stage AI pipeline that extracts text, metadata, citations, tables, figures, equations, and other structured content.
For the complete processing lifecycle, including security scanning, status tracking, failure handling, and troubleshooting, see How Processing Works.
Pipeline overview
Every uploaded document goes through these phases:
- Security: Virus scanning and file validation in quarantine storage
- Extraction: PDF parsing via Docling, metadata detection, section segmentation
- Enrichment: Tables, figures, equations, citations (GROBID), CrossRef metadata
- AI analysis: Summaries, embeddings, vision descriptions (plan-dependent)
- Indexing: Full-text and semantic search indexing
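As a rough mental model, the phases above can be written as an ordered list. This is only a sketch: the phase names and steps are taken from the list, but the data structure, the `phases_after` helper, and the assumption that phases run strictly in the order listed are illustrative and not part of any Virza API.

```python
# Illustrative only: phase names and steps are copied from the list above.
# PIPELINE and phases_after are hypothetical names, not Virza's API.
PIPELINE = [
    ("Security", ["virus scanning", "file validation"]),
    ("Extraction", ["PDF parsing", "metadata detection", "section segmentation"]),
    ("Enrichment", ["tables", "figures", "equations", "citations", "CrossRef metadata"]),
    ("AI analysis", ["summaries", "embeddings", "vision descriptions"]),
    ("Indexing", ["full-text indexing", "semantic indexing"]),
]


def phases_after(phase_name):
    """Return the names of the phases that follow `phase_name`."""
    names = [name for name, _ in PIPELINE]
    index = names.index(phase_name)  # raises ValueError for an unknown phase
    return names[index + 1:]


print(phases_after("Extraction"))  # ['Enrichment', 'AI analysis', 'Indexing']
```

Under this model, a document that has finished extraction still has enrichment, AI analysis, and indexing ahead of it.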
What each plan unlocks
All plans receive full text extraction, metadata, citations, tables, figures, equations, summaries, and search indexing.
Pro and above add:
- Vision descriptions of charts and diagrams
- Structured table data extraction
- Document structure analysis
- Academic embeddings (Specter2)
- Multi-level summaries
- QA precompute for rapid document comprehension
Enterprise adds:
- Claims extraction with typed evidence, p-values, and effect sizes
- Agentic retrieval for multi-hop reasoning
- Methodology scoring
- Citation verification
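The plan tiers above are cumulative, which can be sketched as sets that build on each other. The feature names come from the lists above; the tier key `"base"` (standing in for "all plans"), the `features_for` helper, and the assumption that Enterprise includes everything in Pro are hypothetical, chosen only to illustrate the layering.

```python
# Illustrative only: feature names are copied from the lists above.
# "base" is a hypothetical label for what every plan receives.
ALL_PLANS = frozenset({
    "full text extraction", "metadata", "citations", "tables",
    "figures", "equations", "summaries", "search indexing",
})
PRO_ADDS = frozenset({
    "vision descriptions", "structured table data extraction",
    "document structure analysis", "academic embeddings (Specter2)",
    "multi-level summaries", "QA precompute",
})
ENTERPRISE_ADDS = frozenset({
    "claims extraction", "agentic retrieval",
    "methodology scoring", "citation verification",
})

# Tiers from lowest to highest; each tier adds to everything below it.
TIER_ADDITIONS = [
    ("base", ALL_PLANS),
    ("pro", PRO_ADDS),
    ("enterprise", ENTERPRISE_ADDS),
]


def features_for(plan):
    """Return the cumulative feature set for `plan`."""
    features = set()
    for name, additions in TIER_ADDITIONS:
        features |= additions
        if name == plan:
            return frozenset(features)
    raise ValueError(f"unknown plan: {plan!r}")


print("citation verification" in features_for("pro"))         # False
print("citation verification" in features_for("enterprise"))  # True
```

Modeling each tier as additions on top of the previous one mirrors the "Pro and above add" wording: a higher plan never loses a feature that a lower plan has.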
Further reading
- How Processing Works for the full pipeline lifecycle with status tracking and error handling
- What Virza Extracts for a detailed breakdown of every extraction type
- Document Status for all 11 status states and what they mean
- Upload Limits for per-plan file size, page count, and storage limits