Extraction Tables
Extraction Tables let you build structured grids where AI reads each of your documents and fills in the cells. Define the columns you care about — sample size, methodology, primary outcome, confidence intervals — and Virza extracts that data from every paper in your table automatically.
Extraction Tables require a Pro plan. They are also gated behind the research_extraction_tables feature flag. If you don’t see the Extraction Tables option, check your plan in Billing & Plans.
What extraction tables are for
Extraction tables solve the problem of reading dozens of papers and manually copy-pasting data points into a spreadsheet. Common use cases:
| Use case | Example columns |
|---|---|
| Systematic reviews | Population, Intervention, Comparator, Outcome, Sample Size, Effect Size, p-value |
| Methodology comparison | Study design, Data collection method, Analysis technique, Limitations |
| Cohort or RCT summary | Trial phase, Randomisation method, Blinding, Drop-out rate, Primary endpoint |
| Technology evaluation | Programming language, Framework, Benchmark dataset, Reported accuracy, Hardware |
| Qualitative research | Research approach, Theoretical framework, Participant count, Data analysis method |
Each cell is extracted by AI from the paper’s full text — not from the abstract alone. The model reads the methods, results, and discussion sections to find the most accurate answer.
Requirements
Before running extraction on a document, it must meet both of these conditions:
- Fully processed: the document must have a Ready status. Documents that are still scanning or parsing do not have extracted text yet, so the AI has nothing to read.
- Has readable text: scanned images with no OCR, password-protected PDFs, and corrupt uploads cannot be extracted. Virza runs OCR automatically, but very low-quality scans may produce poor results.
If a document is still processing when you trigger extraction, those cells will be marked Not Found. You can remove and re-add the document after it finishes processing, then re-run extraction to fill those cells.
Creating a table
Open Extraction Tables
From the sidebar, navigate to Research → Extraction Tables. If this is your first table, you will see an empty state with a New Table button.
Give your table a name
Enter a descriptive title — for example, “RCT meta-analysis 2024” or “NLP benchmarking comparison”. Optionally add a description explaining the research question this table is answering.
Choose a template pack (optional)
If your documents belong to a recognised study type, select a pre-built template pack. Templates give you a head start with columns that are already worded correctly for that study type:
| Template pack | Best for |
|---|---|
| RCT (Randomised Controlled Trial) | Clinical trials, drug studies, intervention research |
| Cohort study | Observational epidemiology, longitudinal studies |
| Qualitative research | Interviews, ethnography, grounded theory, thematic analysis |
| Systematic review | Evidence synthesis, meta-analyses, PRISMA-style reviews |
| Technology / benchmark | ML papers, software comparisons, performance evaluations |
You can use a template pack as a starting point and add, edit, or remove columns after.
Add documents
Click Add Documents to pick papers from your library. You can:
- Select documents one by one from the document picker
- Filter by collection to quickly add a whole set of related papers
- Add up to 100 documents per table
Documents that are not yet fully processed are shown with a warning icon. You can still add them, but those rows will not extract until the documents finish processing.
Create your table
Click Create Table. Virza creates the table in Draft status with pending cells for every document × column combination.
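The grid created at this step can be pictured as the Cartesian product of documents and columns. The snippet below is a minimal sketch of that idea, not Virza's implementation; the function name `build_pending_grid` and the identifiers in the example are hypothetical.

```python
from itertools import product


def build_pending_grid(document_ids, column_names):
    """Return one 'Pending' cell per (document, column) pair.

    Hypothetical illustration only: Virza's internal data model is not
    described in this guide.
    """
    return {
        (doc_id, column): "Pending"
        for doc_id, column in product(document_ids, column_names)
    }


grid = build_pending_grid(
    ["doc-1", "doc-2", "doc-3"],
    ["Sample size", "Study design"],
)
print(len(grid))  # 3 documents x 2 columns = 6 cells
```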
Defining columns
Columns are the backbone of your table. Each column has:
- Name — a short label shown in the table header (e.g., “Sample size”)
- Prompt — the instruction the AI follows when reading each paper (e.g., “What is the total number of participants in the study? Return only the number.”)
- Data type — the expected format of the extracted value
Column data types
| Type | Use when | Example |
|---|---|---|
| Text | The value is a sentence, phrase, or description | Methodology description, study design |
| Number | The value is a numeric figure | Sample size, p-value, mean age |
| Boolean | The answer is yes/no or true/false | “Was the study double-blinded?”, “Was ethics approval obtained?” |
| List | The value is multiple items | Outcome measures, co-authors, interventions tested |
Choosing the right data type helps the AI format its answer correctly and makes the table easier to read at a glance.
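If you keep your column definitions in a script or notebook before entering them in the UI, the three parts of a column map naturally onto a small record type. This is a sketch for your own bookkeeping only; the `Column` and `DataType` names are assumptions and do not correspond to a documented Virza API.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The four column data types listed in the table above."""
    TEXT = "text"
    NUMBER = "number"
    BOOLEAN = "boolean"
    LIST = "list"


@dataclass(frozen=True)
class Column:
    """Hypothetical local record of a column: name, prompt, data type."""
    name: str
    prompt: str
    data_type: DataType


sample_size = Column(
    name="Sample size",
    prompt=(
        "What is the total number of participants in the study? "
        "Return only the number."
    ),
    data_type=DataType.NUMBER,
)
print(sample_size.name, sample_size.data_type.value)
```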
Writing effective prompts
The quality of your extraction depends almost entirely on your prompts. A vague prompt produces vague results.
Rules for good prompts:
- Be specific about what you want — instead of “Sample size”, write “What is the total number of participants enrolled in the study? Include dropouts. Return only the integer.”
- Specify the format — “Return only the number”, “Return yes or no”, “Return a comma-separated list”
- Tell the AI where to look — “As reported in the Methods section”, “As stated in Table 1 or the Results section”
- Handle missing data — “If not reported, return N/A”
- Avoid ambiguity — if a paper could have multiple answers (e.g., multiple arms with different sample sizes), specify which you want: “Total enrolled across all arms combined”
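One way to apply these rules consistently is to assemble each prompt from the same parts: the question, where to look, the output format, and the fallback for missing data. The helper below is a hypothetical sketch for drafting prompts before pasting them into a column; `build_prompt` and its parameters are not part of Virza.

```python
def build_prompt(question, output_format, where_to_look=None, missing_value="N/A"):
    """Join the prompt parts recommended above into one instruction.

    Hypothetical drafting helper; the wording of each part is up to you.
    """
    parts = [question.strip()]
    if where_to_look:
        parts.append(f"Look in {where_to_look}.")
    parts.append(output_format.strip())
    parts.append(f"If not reported, return {missing_value}.")
    return " ".join(parts)


prompt = build_prompt(
    question=(
        "What is the total number of participants enrolled across all "
        "arms combined? Include dropouts."
    ),
    output_format="Return only the integer.",
    where_to_look="the Methods section or Table 1",
)
print(prompt)
```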
Prompt examples by column type:
| Column name | Effective prompt |
|---|---|
| Sample size | What is the total number of participants enrolled in the study? Return only the number. If not reported, return N/A. |
| Study design | What type of study design is used? (e.g., randomised controlled trial, cohort study, case-control, cross-sectional). Return a single phrase. |
| Primary outcome | What is the primary outcome measure as stated by the authors? Return the exact outcome name as written in the paper. |
| Double-blinded | Was the study double-blinded? Answer yes, no, or unclear. |
| Key limitations | List the main limitations the authors acknowledge. Return each as a short phrase, comma-separated. |
| Follow-up duration | What is the follow-up duration? Include the time unit (weeks, months, years). Return the exact value as written. |
| Effect size | What is the reported effect size or main statistical result? Include the metric type (OR, HR, RR, Cohen’s d, etc.) and confidence interval if reported. |
| Country | In which country or countries was the study conducted? Return a comma-separated list. |
Batching: Virza sends all columns for one document in a single AI request. This means 20 columns on 50 documents = 50 AI calls, not 1,000. This is fast and cost-efficient.
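The arithmetic behind that claim, written out as a small sketch (the function name `ai_calls` is illustrative):

```python
def ai_calls(num_documents, num_columns, batched=True):
    """Number of AI requests with and without per-document batching."""
    if batched:
        # One request per document covers all of its columns.
        return num_documents
    # Without batching, every cell would need its own request.
    return num_documents * num_columns


print(ai_calls(50, 20))                 # 50
print(ai_calls(50, 20, batched=False))  # 1000
```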
Column limits
Each table supports up to 20 columns. If you need more, consider splitting your extraction into multiple focused tables (e.g., one table for study design, another for outcomes).
Using pre-built template columns
Template packs give you pre-written column definitions with proven prompts. To use them:
- Open your table and click Add Column
- Switch to the Templates tab
- Select a template pack and choose the columns you want to add
- Click Add Selected Columns
You can modify a template column’s name or prompt after adding it. The template key is retained for reference but does not affect extraction.
RCT template columns
| Column | What it extracts |
|---|---|
| Sample Size | Total enrolled participants |
| Randomisation Method | How participants were randomised (block, stratified, etc.) |
| Blinding | Single, double, or open-label |
| Primary Outcome | Stated primary endpoint |
| Follow-up Duration | Duration with unit |
| Drop-out Rate | Percentage or count of drop-outs |
| Intervention | What the treatment group received |
| Control / Comparator | What the control group received |
| Statistical Method | Primary analysis approach |
| Effect Measure | OR, RR, HR, mean difference, etc. with CI |
| p-value | Reported significance level for primary outcome |
| Ethics Approval | Whether ethics approval is stated |
Cohort template columns
| Column | What it extracts |
|---|---|
| Study Design | Prospective or retrospective cohort |
| Sample Size | Total cohort size |
| Exposure | Exposure or risk factor studied |
| Outcome | Primary outcome measured |
| Confounders Adjusted | Variables adjusted for in analysis |
| Follow-up Duration | Follow-up period with unit |
| Loss to Follow-up | Percentage lost |
| Association Measure | RR, HR, OR with confidence interval |
Qualitative template columns
| Column | What it extracts |
|---|---|
| Research Approach | Phenomenology, grounded theory, ethnography, etc. |
| Participant Count | Number of participants |
| Sampling Strategy | Purposive, snowball, theoretical, etc. |
| Data Collection | Interviews, focus groups, observations, documents |
| Analysis Method | Thematic, content, discourse analysis, etc. |
| Theoretical Framework | Underlying theory or paradigm |
| Saturation Reached | Whether data saturation is reported |
| Key Themes | Main themes or categories identified |
Adding and removing documents
Adding more documents after creation
Click Add Documents from the table view. Any new documents are added with Pending cells for all existing columns. Run extraction again to fill the new rows.
Removing documents
Click the row’s context menu and select Remove Document. This removes the document from the table but does not delete it from your library. Its cells are soft-deleted (retained internally for 30 days), and you can re-add the document to the table later.
Running extraction
Review pending cells
After adding documents and columns, the table shows cells in Pending status (shown as a grey dash). Pending cells are waiting to be extracted.
Click Extract
Click the Extract button (or Re-extract if the table has been run before). Virza enqueues the extraction job and the table status changes to Extracting.
Wait for results
Extraction processes documents in batches of 10. For a 50-document table with 10 columns, expect around 60–120 seconds in total. A progress indicator in the table header shows how many cells have been filled.
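To see how many batches a table will need, you can split the document list into groups of 10. The sketch below only illustrates the grouping; whether Virza runs batches one after another or in parallel is not stated here, and the `chunk` helper is hypothetical.

```python
BATCH_SIZE = 10  # batch size stated in this guide


def chunk(items, size=BATCH_SIZE):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


documents = [f"doc-{n}" for n in range(1, 51)]
batches = chunk(documents)
print(len(batches))  # 5 batches for 50 documents
```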
Review the results
Once complete, the table status changes to Ready and cells show their extracted values. Each cell is in one of four states:
| Cell status | Meaning |
|---|---|
| Done | AI found and extracted a value |
| Not Found | AI could not find the requested information in the document |
| Failed | An error occurred during extraction (try re-extracting) |
| Pending | Not yet processed (trigger extraction to fill) |
Table stuck in “Extracting”? If the table status is still “Extracting” after several minutes with no progress, the extraction may have failed silently. Refresh the page — if the table is still stuck, click Re-extract to re-queue only the remaining pending and failed cells.
Re-extracting cells
Extraction is not destructive. You can re-run extraction at any time:
- Re-extract all: click the Extract button again; Virza re-processes only cells that are still Pending or Failed, leaving already Done cells untouched.
- Re-extract a single cell: click the cell, then click Re-extract this cell from the cell detail panel.
- Re-extract after editing a prompt: if you update a column’s prompt, cells already marked Done are reset to Pending for that column only, then re-extraction fills them with the new prompt.
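These rules amount to a small state model. The sketch below mirrors them for illustration only; the function names are hypothetical, and the column reset follows the statement under “Editing column prompts” that all cells for the edited column return to Pending.

```python
def cells_to_extract(cells):
    """Return the cells an extraction run would pick up.

    `cells` maps (document, column) -> status. Only Pending and Failed
    cells are re-processed; Done and Not Found cells are left alone.
    """
    return [key for key, status in cells.items() if status in {"Pending", "Failed"}]


def reset_column(cells, column):
    """Model a prompt edit: every cell in that column goes back to Pending."""
    return {
        key: ("Pending" if key[1] == column else status)
        for key, status in cells.items()
    }


cells = {
    ("doc-1", "Sample size"): "Done",
    ("doc-1", "Country"): "Failed",
    ("doc-2", "Sample size"): "Not Found",
    ("doc-2", "Country"): "Done",
}
print(cells_to_extract(cells))  # only the Failed cell

edited = reset_column(cells, "Sample size")
print(cells_to_extract(edited))  # the Failed cell plus both "Sample size" cells
```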
Understanding “Not Found” results
A Not Found result means the AI read the full document and could not identify the information you asked for. Common causes:
| Cause | Solution |
|---|---|
| The document genuinely doesn’t report this data | Normal — indicates a gap in the literature |
| The prompt is too narrow or uses jargon not present in the paper | Broaden or rephrase the prompt |
| The document is a very short abstract or metadata-only entry | Check the document’s status; its full text may not have been processed |
| The relevant section uses different terminology | Add synonyms to your prompt: “What is the sample size, participant count, or cohort size?” |
| The information is in a table or figure but not in prose | For highly structured data, specify “including data reported in tables or figures” |
Not Found cells count as a valid result and can be sorted and filtered like any other.
Editing the table
Renaming the table
Click the table title to edit it inline. Press Enter or click away to save.
Editing column prompts
Click the column header, then Edit Column. Update the prompt and save. All cells for that column are reset to Pending and will be re-extracted on the next extraction run.
Reordering columns
Drag column headers left or right to reorder them. Column positions are saved automatically.
Deleting columns
Click the column header → Delete Column. Cells are soft-deleted and retained for 30 days, but you cannot undo the deletion from the UI.
Exporting to CSV
Click Export → Download CSV from the table toolbar. The CSV file contains:
- One row per document
- One column per extraction column, plus document metadata columns (title, authors, year, DOI)
- Cell values as plain text (lists are joined with semicolons)
- Empty cells for Not Found or pending values
The filename is extraction-table-{table-id}.csv. You can open it directly in Excel, Google Sheets, or import it into R/Python for statistical analysis.
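For Python users, the export can be read with the standard `csv` module. The sketch below assumes header names (`title`, `authors`, `year`, `doi`) and uses made-up sample rows in place of a real export; check the header row of your own file before relying on these names. It splits semicolon-joined list values and turns empty cells into `None`.

```python
import csv
import io

# Made-up stand-in for an exported file; real header names may differ.
SAMPLE_CSV = """title,authors,year,doi,Sample size,Country
Example paper A,"Doe, J.",2021,10.0000/example-a,120,Kenya; Uganda
Example paper B,"Roe, R.",2023,10.0000/example-b,,Brazil
"""


def load_rows(handle, list_columns=()):
    """Read export rows, mapping empty cells to None and lists to Python lists."""
    rows = []
    for row in csv.DictReader(handle):
        parsed = {}
        for key, value in row.items():
            if value == "":
                # Not Found and pending cells are exported as empty.
                parsed[key] = None
            elif key in list_columns:
                parsed[key] = [item.strip() for item in value.split(";")]
            else:
                parsed[key] = value
        rows.append(parsed)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV), list_columns={"Country"})
print(rows[0]["Country"])
print(rows[1]["Sample size"])
```

To read a real export, pass an open file instead of the `io.StringIO` object, for example `open(path, newline="", encoding="utf-8")`.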
Table limits
| Limit | Value |
|---|---|
| Documents per table | 100 |
| Columns per table | 20 |
| Tables per workspace | Unlimited |
| Extraction timeout per table | 10 minutes |
| Cell retention after deletion | 30 days |
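When planning a large review, it can help to check your intended table against these limits before you start. The following is a hypothetical planning helper, not a Virza feature; the constants come from the table above.

```python
MAX_DOCUMENTS = 100  # documents per table
MAX_COLUMNS = 20     # columns per table


def check_table_limits(num_documents, num_columns):
    """Return a list of human-readable problems; empty means within limits."""
    problems = []
    if num_documents > MAX_DOCUMENTS:
        problems.append(
            f"{num_documents} documents exceeds the limit of {MAX_DOCUMENTS}"
        )
    if num_columns > MAX_COLUMNS:
        problems.append(
            f"{num_columns} columns exceeds the limit of {MAX_COLUMNS}; "
            "consider splitting into multiple focused tables"
        )
    return problems


print(check_table_limits(80, 12))   # []
print(check_table_limits(130, 25))  # two problems reported
```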
Tips and best practices
Start narrow, then expand — begin with 5–10 high-priority columns on a small subset of papers (10–15). Review the quality before running on your full corpus.
Use collections to organise documents — create a collection for the papers relevant to your review before opening Extraction Tables. You can filter the document picker by collection to add them all at once.
Test prompts on a single paper first — before running on 80 documents, test your column prompt on one paper you know well. Verify the cell captures what you expect, then scale up.
Number columns are strict — if a paper reports a range (“100–150 participants”), the AI may return the range as text. Make your prompt explicit: “If a range is given, return the midpoint as an integer.”
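If ranges still slip through, you can also normalise them after export. The sketch below is one possible clean-up step for exported values, written under the assumption that ranges appear as two numbers separated by a hyphen, an en dash, or the word “to”; the helper name `to_integer` is hypothetical.

```python
import re

_NUMBER = r"\d+(?:\.\d+)?"
# Two numbers separated by "-", an en dash, or "to", e.g. "100–150 participants".
_RANGE = re.compile(rf"^\s*({_NUMBER})\s*(?:-|–|to)\s*({_NUMBER})")


def to_integer(value):
    """Return an int for a plain number, the midpoint for a range, else None."""
    text = value.strip()
    match = _RANGE.match(text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        return round((low + high) / 2)
    try:
        return round(float(text))
    except ValueError:
        # Values such as "N/A" cannot be converted.
        return None


print(to_integer("100–150 participants"))  # 125
print(to_integer("120"))                   # 120
print(to_integer("N/A"))                   # None
```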
Not all documents process equally — preprints, conference papers, and theses may use different section structures. If you get many Not Found cells for a specific paper, check its document status and consider whether it was fully parsed.
Re-extract is non-destructive — if you tweak a prompt and run extraction again, only the affected column’s cells are reset. All other cells remain intact.
Troubleshooting
The table is stuck in “Extracting” status
The extraction worker may have crashed or the job may have timed out. Refresh the page and click Re-extract. If it persists, check that your documents are all in Ready status.
All cells show “Not Found” for a specific document
The document likely failed to parse correctly. Go to your library, find the document, and check its status. If it shows a processing error, try re-uploading the original file.
Cells show “Failed” for many documents
This usually indicates a transient error in the AI service. Click Re-extract to retry. Failed cells are re-queued automatically on re-extraction.
The Extract button is disabled or missing
Check that you have Editor role or higher in the workspace. Viewers can read and export tables but cannot trigger extraction or add columns. See Roles & Permissions for details.
The Extraction Tables option is not visible in the sidebar
This feature requires a Pro plan and the research_extraction_tables feature flag to be enabled for your workspace. Contact your workspace owner or check Billing & Plans.
Privacy and data handling
- Document text is sent to Virza’s AI gateway (Virza Cortex) for extraction. No document content is used to train models.
- Extraction jobs are workspace-scoped — cells from one workspace are never visible to another.
- Extracted cell values are stored encrypted in Virza’s database.
- Deleting a table soft-deletes all cells. Hard deletion occurs automatically after 30 days.
For full privacy details, see Data Isolation.