
How Parsing Works

When you upload a document, Virza runs it through a multi-stage AI pipeline that extracts text, metadata, citations, tables, figures, and equations, then summarizes and indexes the result for search.

For the complete processing lifecycle, including security scanning, status tracking, failure handling, and troubleshooting, see How Processing Works.

Pipeline overview

Every uploaded document goes through these phases:

  1. Security: Virus scanning and file validation in quarantine storage
  2. Extraction: PDF parsing via Docling, metadata detection, section segmentation
  3. Enrichment: Tables, figures, equations, citations (GROBID), CrossRef metadata
  4. AI analysis: Summaries, embeddings, vision descriptions (plan-dependent)
  5. Indexing: Full-text and semantic search indexing
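The ordering of the phases above can be sketched in a few lines of Python. This is only an illustrative model, not Virza's implementation: the `Phase` enum and the helpers `next_phase` and `remaining_phases` are hypothetical names invented for this example, and the sketch assumes the phases run strictly in the listed order.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The five phases in the order listed above (names are illustrative)."""
    SECURITY = 1
    EXTRACTION = 2
    ENRICHMENT = 3
    AI_ANALYSIS = 4
    INDEXING = 5


def next_phase(current):
    """Return the phase that follows `current`, or None after the last one."""
    if current is Phase.INDEXING:
        return None
    return Phase(current + 1)


def remaining_phases(current):
    """List the phases still to run after `current`, in pipeline order."""
    return [p for p in Phase if p > current]


if __name__ == "__main__":
    # A document that has just finished extraction still has three phases left.
    for phase in remaining_phases(Phase.EXTRACTION):
        print(phase.name)
```

Modeling the phases as an ordered enum keeps "what comes next" a simple comparison, which is convenient if you track a document's progress on the client side.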

What each plan unlocks

All plans receive full text extraction, metadata, citations, tables, figures, equations, summaries, and search indexing.

Pro and above add:

  • Vision descriptions of charts and diagrams
  • Structured table data extraction
  • Document structure analysis
  • Academic embeddings (Specter2)
  • Multi-level summaries
  • QA precompute for rapid document comprehension

Enterprise adds:

  • Claims extraction with typed evidence, p-values, and effect sizes
  • Agentic retrieval for multi-hop reasoning
  • Methodology scoring
  • Citation verification
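
The tiers above are cumulative, which can be expressed as a small lookup. The sketch below is a hypothetical illustration rather than a Virza API: the feature identifiers and the `features_for` helper are invented here, and it assumes that any plan name other than "pro" or "enterprise" receives only the features listed for all plans.

```python
# Features listed for all plans (identifiers are illustrative).
BASE_FEATURES = frozenset({
    "text_extraction", "metadata", "citations", "tables",
    "figures", "equations", "summaries", "search_indexing",
})

# Features added by Pro and above.
PRO_FEATURES = frozenset({
    "vision_descriptions", "structured_table_data", "document_structure",
    "academic_embeddings", "multi_level_summaries", "qa_precompute",
})

# Features added by Enterprise.
ENTERPRISE_FEATURES = frozenset({
    "claims_extraction", "agentic_retrieval",
    "methodology_scoring", "citation_verification",
})


def features_for(plan):
    """Return the cumulative feature set for a plan name (case-insensitive)."""
    plan = plan.lower()
    features = set(BASE_FEATURES)
    if plan in ("pro", "enterprise"):
        features |= PRO_FEATURES
    if plan == "enterprise":
        features |= ENTERPRISE_FEATURES
    return frozenset(features)


if __name__ == "__main__":
    print(sorted(features_for("pro") - BASE_FEATURES))
```

Keeping each tier as its own set makes the "Pro and above" rule explicit: Enterprise is the union of all three sets rather than a separately maintained list.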
