Extraction Tables

Extraction Tables let you build structured grids where AI reads each of your documents and fills in the cells. Define the columns you care about — sample size, methodology, primary outcome, confidence intervals — and Virza extracts that data from every paper in your table automatically.

Extraction Tables require a Pro plan. They are also gated behind the research_extraction_tables feature flag. If you don’t see the Extraction Tables option, check your plan in Billing & Plans.

What extraction tables are for

Extraction tables solve the problem of reading dozens of papers and manually copy-pasting data points into a spreadsheet. Common use cases:

Use case	Example columns
Systematic reviews	Population, Intervention, Comparator, Outcome, Sample Size, Effect Size, p-value
Methodology comparison	Study design, Data collection method, Analysis technique, Limitations
Cohort or RCT summary	Trial phase, Randomisation method, Blinding, Drop-out rate, Primary endpoint
Technology evaluation	Programming language, Framework, Benchmark dataset, Reported accuracy, Hardware
Qualitative research	Research approach, Theoretical framework, Participant count, Data analysis method

Each cell is extracted by AI from the paper’s full text — not from the abstract alone. The model reads the methods, results, and discussion sections to find the most accurate answer.

Requirements

Before you run extraction on a document, the document must meet both of these conditions:

Fully processed — the document must have a Ready status. Documents that are still scanning or parsing do not have extracted text yet, so the AI has nothing to read.
Has readable text — scanned images with no OCR, password-protected PDFs, and corrupt uploads cannot be extracted. Virza runs OCR automatically, but very low-quality scans may produce poor results.

If a document is still processing when you trigger extraction, its cells are marked Not Found. To fill them, wait until the document finishes processing, remove and re-add it to the table, then re-run extraction.

Creating a table

Open Extraction Tables

From the sidebar, navigate to Research → Extraction Tables. If this is your first table, you will see an empty state with a New Table button.

Give your table a name

Enter a descriptive title — for example, “RCT meta-analysis 2024” or “NLP benchmarking comparison”. Optionally add a description explaining the research question this table is answering.

Choose a template pack (optional)

If your documents belong to a recognised study type, select a pre-built template pack. Templates give you a head start with columns that are already worded correctly for that study type:

Template pack	Best for
RCT (Randomised Controlled Trial)	Clinical trials, drug studies, intervention research
Cohort study	Observational epidemiology, longitudinal studies
Qualitative research	Interviews, ethnography, grounded theory, thematic analysis
Systematic review	Evidence synthesis, meta-analyses, PRISMA-style reviews
Technology / benchmark	ML papers, software comparisons, performance evaluations

You can use a template pack as a starting point and add, edit, or remove columns afterwards.

Add documents

Click Add Documents to pick papers from your library. You can:

Select documents one by one from the document picker
Filter by collection to quickly add a whole set of related papers
Add up to 100 documents per table

Documents that are not yet fully processed are shown with a warning icon. You can still add them, but those rows will not extract until the documents finish processing.

Create your table

Click Create Table. Virza creates the table in Draft status with pending cells for every document × column combination.
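
To make the “document × column” grid concrete, here is a minimal Python sketch. The document IDs and column names are made up, and the dictionary is only a mental model of the table, not a description of how Virza stores it.

```python
from itertools import product


def pending_grid(document_ids, column_names):
    """Build one Pending cell for every (document, column) pair."""
    return {
        (doc, col): "Pending"
        for doc, col in product(document_ids, column_names)
    }


grid = pending_grid(
    ["paper-a", "paper-b", "paper-c"],  # hypothetical document IDs
    ["Sample size", "Blinding"],        # hypothetical column names
)
print(len(grid))  # 6 cells: 3 documents x 2 columns
```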

Defining columns

Columns are the backbone of your table. Each column has:

Name — a short label shown in the table header (e.g., “Sample size”)
Prompt — the instruction the AI follows when reading each paper (e.g., “What is the total number of participants in the study? Return only the number.”)
Data type — the expected format of the extracted value

Column data types

Type	Use when	Example
Text	The value is a sentence, phrase, or description	Methodology description, study design
Number	The value is a numeric figure	Sample size, p-value, mean age
Boolean	The answer is yes/no or true/false	“Was the study double-blinded?”, “Was ethics approval obtained?”
List	The value is multiple items	Outcome measures, co-authors, interventions tested

Choosing the right data type helps the AI format its answer correctly and makes the table easier to read at a glance.
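
One way to picture the three parts of a column is as a small record. The sketch below is purely illustrative: `ColumnSpec`, `DataType`, and the field names are hypothetical and do not correspond to any documented Virza API.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """The four column data types described above."""
    TEXT = "text"
    NUMBER = "number"
    BOOLEAN = "boolean"
    LIST = "list"


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical model of one extraction column."""
    name: str            # short label shown in the table header
    prompt: str          # instruction the AI follows for each paper
    data_type: DataType  # expected format of the extracted value


sample_size = ColumnSpec(
    name="Sample size",
    prompt=(
        "What is the total number of participants in the study? "
        "Return only the number."
    ),
    data_type=DataType.NUMBER,
)
print(sample_size.name, sample_size.data_type.value)
```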

Writing effective prompts

The quality of your extraction depends almost entirely on your prompts. A vague prompt produces vague results.

Rules for good prompts:

Be specific about what you want — instead of “Sample size”, write “What is the total number of participants enrolled in the study? Include dropouts. Return only the integer.”
Specify the format — “Return only the number”, “Return yes or no”, “Return a comma-separated list”
Tell the AI where to look — “As reported in the Methods section”, “As stated in Table 1 or the Results section”
Handle missing data — “If not reported, return N/A”
Avoid ambiguity — if a paper could have multiple answers (e.g., multiple arms with different sample sizes), specify which you want: “Total enrolled across all arms combined”

Prompt examples by column type:

Column name	Effective prompt
Sample size	What is the total number of participants enrolled in the study? Return only the number. If not reported, return N/A.
Study design	What type of study design is used? (e.g., randomised controlled trial, cohort study, case-control, cross-sectional). Return a single phrase.
Primary outcome	What is the primary outcome measure as stated by the authors? Return the exact outcome name as written in the paper.
Double-blinded	Was the study double-blinded? Answer yes, no, or unclear.
Key limitations	List the main limitations the authors acknowledge. Return each as a short phrase, comma-separated.
Follow-up duration	What is the follow-up duration? Include the time unit (weeks, months, years). Return the exact value as written.
Effect size	What is the reported effect size or main statistical result? Include the metric type (OR, HR, RR, Cohen’s d, etc.) and confidence interval if reported.
Country	In which country or countries was the study conducted? Return a comma-separated list.
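
If you write many columns, it can help to assemble prompts from the same parts every time. The helper below is a small sketch of that idea, assuming you draft prompts outside Virza and paste them in. `build_prompt` and its parameters are invented for illustration.

```python
def build_prompt(question, answer_format, where=None, missing="N/A"):
    """Assemble a column prompt from the parts recommended above.

    question:      the specific, unambiguous question
    answer_format: how the answer should be returned
    where:         optional hint about where in the paper to look
    missing:       value to return when the paper does not report it
    """
    parts = [question.strip()]
    if where:
        parts.append(f"Look {where.strip()}.")
    parts.append(answer_format.strip())
    parts.append(f"If not reported, return {missing}.")
    return " ".join(parts)


prompt = build_prompt(
    question=(
        "What is the total number of participants enrolled "
        "across all arms combined?"
    ),
    answer_format="Return only the integer.",
    where="in the Methods section or Table 1",
)
print(prompt)
```

Keeping the format instruction and the missing-data instruction as separate arguments makes it harder to forget either one when adding a new column.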

Batching: Virza sends all columns for one document in a single AI request, so 20 columns on 50 documents take 50 AI calls, not 1,000. This keeps extraction fast and cost-efficient.
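
The arithmetic behind batching can be checked with a few lines of Python. This is only a worked example of the counts stated above. `ai_calls` is a hypothetical helper, not part of Virza.

```python
def ai_calls(documents: int, columns: int, batched: bool = True) -> int:
    """Number of AI requests needed to fill a table.

    With batching, all columns for one document share a single request,
    so the count depends only on the number of documents.
    """
    if documents < 0 or columns < 0:
        raise ValueError("counts must be non-negative")
    if columns == 0:
        return 0  # nothing to extract
    return documents if batched else documents * columns


print(ai_calls(50, 20))                 # 50
print(ai_calls(50, 20, batched=False))  # 1000
```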

Column limits

Each table supports up to 20 columns. If you need more, consider splitting your extraction into multiple focused tables (e.g., one table for study design, another for outcomes).

Using pre-built template columns

Template packs give you pre-written column definitions with proven prompts. To use them:

Open your table and click Add Column
Switch to the Templates tab
Select a template pack and choose the columns you want to add
Click Add Selected Columns

You can modify a template column’s name or prompt after adding it. The template key is retained for reference but does not affect extraction.

RCT template columns

Column	What it extracts
Sample Size	Total enrolled participants
Randomisation Method	How participants were randomised (block, stratified, etc.)
Blinding	Single, double, or open-label
Primary Outcome	Stated primary endpoint
Follow-up Duration	Duration with unit
Drop-out Rate	Percentage or count of drop-outs
Intervention	What the treatment group received
Control / Comparator	What the control group received
Statistical Method	Primary analysis approach
Effect Measure	OR, RR, HR, mean difference, etc. with CI
p-value	Reported significance level for primary outcome
Ethics Approval	Whether ethics approval is stated

Cohort template columns

Column	What it extracts
Study Design	Prospective or retrospective cohort
Sample Size	Total cohort size
Exposure	Exposure or risk factor studied
Outcome	Primary outcome measured
Confounders Adjusted	Variables adjusted for in analysis
Follow-up Duration	Follow-up period with unit
Loss to Follow-up	Percentage lost
Association Measure	RR, HR, OR with confidence interval

Qualitative template columns

Column	What it extracts
Research Approach	Phenomenology, grounded theory, ethnography, etc.
Participant Count	Number of participants
Sampling Strategy	Purposive, snowball, theoretical, etc.
Data Collection	Interviews, focus groups, observations, documents
Analysis Method	Thematic, content, discourse analysis, etc.
Theoretical Framework	Underlying theory or paradigm
Saturation Reached	Whether data saturation is reported
Key Themes	Main themes or categories identified

Adding and removing documents

Adding more documents after creation

Click Add Documents from the table view. Any new documents are added with Pending cells for all existing columns. Run extraction again to fill the new rows.

Removing documents

Click the row’s context menu and select Remove Document. This removes the document from the table but does not delete it from your library. The row’s cells are soft-deleted (retained internally for 30 days), and you can re-add the document later.

Running extraction

Review pending cells

After adding documents and columns, the table shows cells in Pending status (shown as a grey dash). Pending cells are waiting to be extracted.

Click Extract

Click the Extract button (or Re-extract if the table has been run before). Virza enqueues the extraction job and the table status changes to Extracting.

Wait for results

Extraction processes documents in batches of 10. For a 50-document table with 10 columns, expect around 60–120 seconds in total. A progress indicator in the table header shows how many cells have been filled.
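
As a rough illustration of how a table is worked through ten documents at a time, the sketch below splits a list of placeholder document IDs into groups. It shows the grouping only and assumes nothing about how Virza schedules the work internally.

```python
from math import ceil

BATCH_SIZE = 10  # documents handled at a time, as described above


def batches(document_ids, size=BATCH_SIZE):
    """Yield consecutive groups of at most `size` document IDs."""
    for start in range(0, len(document_ids), size):
        yield document_ids[start:start + size]


docs = [f"doc-{i}" for i in range(1, 51)]  # 50 placeholder IDs
groups = list(batches(docs))
print(len(groups), ceil(len(docs) / BATCH_SIZE))  # 5 5
```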

Review the results

Once complete, the table status changes to Ready and cells show their extracted values. Each cell shows one of four statuses:

Cell status	Meaning
Done	AI found and extracted a value
Not Found	AI could not find the requested information in the document
Failed	An error occurred during extraction (try re-extracting)
Pending	Not yet processed (trigger extraction to fill)

Table stuck in “Extracting”? If the table status is still “Extracting” after several minutes with no progress, the extraction may have failed silently. Refresh the page — if the table is still stuck, click Re-extract to re-queue only the remaining pending and failed cells.

Re-extracting cells

Extraction is not destructive. You can re-run extraction at any time:

Re-extract all: click the Extract button again. Virza re-processes only cells that are still Pending or Failed and leaves Done cells untouched.
Re-extract a single cell: click the cell, then click Re-extract this cell in the cell detail panel.
Re-extract after editing a prompt: when you update a column’s prompt, that column’s cells are reset to Pending, and the next extraction run fills them using the new prompt.
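
The re-extraction rules can be modelled in a few lines. The sketch below is a simplified mental model, not Virza’s implementation: the `CellStatus` enum, the dictionary layout, and both helper functions are hypothetical, and it assumes that editing a prompt resets every cell in that column.

```python
from enum import Enum


class CellStatus(Enum):
    DONE = "Done"
    NOT_FOUND = "Not Found"
    FAILED = "Failed"
    PENDING = "Pending"


# Statuses that a table-wide re-extraction picks up again.
REQUEUED = {CellStatus.PENDING, CellStatus.FAILED}


def cells_to_requeue(cells):
    """Return the (document, column) keys a re-extraction would process."""
    return sorted(key for key, status in cells.items() if status in REQUEUED)


def reset_column(cells, column):
    """Return a copy of `cells` with every cell in `column` set to Pending."""
    return {
        (doc, col): CellStatus.PENDING if col == column else status
        for (doc, col), status in cells.items()
    }


cells = {
    ("paper-a", "Sample size"): CellStatus.DONE,
    ("paper-a", "Blinding"): CellStatus.FAILED,
    ("paper-b", "Sample size"): CellStatus.NOT_FOUND,
    ("paper-b", "Blinding"): CellStatus.PENDING,
}

# Only the Failed and Pending cells are picked up.
print(cells_to_requeue(cells))

# After a prompt edit on "Sample size", that column is queued as well.
print(cells_to_requeue(reset_column(cells, "Sample size")))
```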

Understanding “Not Found” results

A Not Found result means the AI read the full document and could not identify the information you asked for. Common causes:

Cause	Solution
The document genuinely doesn’t report this data	Normal — indicates a gap in the literature
The prompt is too narrow or uses jargon not present in the paper	Broaden or rephrase the prompt
The document is a very short abstract or metadata-only entry	Check the document’s status; it may not have been fully processed
The relevant section uses different terminology	Add synonyms to your prompt: “What is the sample size, participant count, or cohort size?”
The information is in a table or figure but not in prose	For highly structured data, specify “including data reported in tables or figures”

Not Found cells count as a valid result and can be sorted and filtered like any other.

Editing the table

Renaming the table

Click the table title to edit it inline. Press Enter or click away to save.

Editing column prompts

Click the column header, then Edit Column. Update the prompt and save. All cells for that column are reset to Pending and will be re-extracted on the next extraction run.

Reordering columns

Drag column headers left or right to reorder them. Column positions are saved automatically.

Deleting columns

Click the column header → Delete Column. The column’s cells are soft-deleted (recoverable within 30 days), but the deletion cannot be undone from the UI within the same session.

Exporting to CSV

Click Export → Download CSV from the table toolbar. The CSV file contains:

One row per document
One column per extraction column, plus document metadata columns (title, authors, year, DOI)
Cell values as plain text (lists are joined with semicolons)
Empty cells for Not Found or pending values

The filename is extraction-table-{table-id}.csv. You can open it directly in Excel, Google Sheets, or import it into R/Python for statistical analysis.
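
For analysis in Python, the export can be read with the standard `csv` module. The sketch below uses a made-up sample in the shape described above. The header names, the sample values, and the `load_rows` helper are assumptions for illustration, so check them against your own file.

```python
import csv
import io

# Made-up sample; real header names may differ from these.
sample_csv = (
    "title,authors,year,doi,Sample size,Country\n"
    "Example trial A,Doe J,2021,10.0000/example-a,120,Kenya; Uganda\n"
    "Example trial B,Poe E,2019,10.0000/example-b,,Norway\n"
)


def load_rows(text, list_columns=()):
    """Parse exported CSV text into a list of dicts.

    Empty cells (Not Found or pending) become None, and columns named in
    `list_columns` are split on semicolons into Python lists.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {}
        for key, value in row.items():
            if value == "":
                parsed[key] = None
            elif key in list_columns:
                parsed[key] = [item.strip() for item in value.split(";")]
            else:
                parsed[key] = value
        rows.append(parsed)
    return rows


rows = load_rows(sample_csv, list_columns={"Country"})
print(rows[0]["Country"])      # ['Kenya', 'Uganda']
print(rows[1]["Sample size"])  # None
```

To read a downloaded file instead of the inline sample, pass the file’s contents, for example `open(path, newline="").read()`. An empty cell does not distinguish Not Found from pending, so check the table itself if that difference matters.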

Table limits

Limit	Value
Documents per table	100
Columns per table	20
Tables per workspace	Unlimited
Extraction timeout per table	10 minutes
Cell retention after deletion	30 days
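
When planning a large review, a quick check against these limits shows whether the work fits in one table. The constants below come from the table above, while `limit_problems` is a hypothetical planning helper.

```python
MAX_DOCUMENTS_PER_TABLE = 100
MAX_COLUMNS_PER_TABLE = 20


def limit_problems(n_documents, n_columns):
    """Return a message for each limit the planned table would exceed."""
    problems = []
    if n_documents > MAX_DOCUMENTS_PER_TABLE:
        problems.append(
            f"{n_documents} documents exceeds the limit of "
            f"{MAX_DOCUMENTS_PER_TABLE}"
        )
    if n_columns > MAX_COLUMNS_PER_TABLE:
        problems.append(
            f"{n_columns} columns exceeds the limit of "
            f"{MAX_COLUMNS_PER_TABLE}"
        )
    return problems


print(limit_problems(80, 12))   # []
print(limit_problems(120, 25))  # two messages, one per exceeded limit
```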

Tips and best practices

Start narrow, then expand — begin with 5–10 high-priority columns on a small subset of papers (10–15). Review the quality before running on your full corpus.

Use collections to organise documents — create a collection for the papers relevant to your review before opening Extraction Tables. You can filter the document picker by collection to add them all at once.

Test prompts on a single paper first — before running on 80 documents, test your column prompt on one paper you know well. Verify the cell captures what you expect, then scale up.

Number columns are strict — if a paper reports a range (“100–150 participants”), the AI may return the range as text. Make your prompt explicit: “If a range is given, return the midpoint as an integer.”

Not all documents process equally — preprints, conference papers, and theses may use different section structures. If you get many Not Found cells for a specific paper, check its document status and consider whether it was fully parsed.

Re-extract is non-destructive — if you tweak a prompt and run extraction again, only the affected column’s cells are reset. All other cells remain intact.
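
Relating to the tip on Number columns: if ranges still slip through as text, you can normalise them after export. The sketch below is one possible post-processing step for integer counts, not a Virza feature. `to_integer` is a hypothetical helper that rounds midpoints half up and returns None for anything it does not recognise, including decimals such as p-values.

```python
import re

# An integer, optionally written with thousands separators (e.g. 1,200).
_INT = r"(\d{1,3}(?:,\d{3})+|\d+)"
# The number must not continue with more digits or a decimal part.
_END = r"(?!\d|[.,]\d)"
# Hyphen, en dash, or em dash between the two ends of a range.
RANGE = re.compile(rf"\s*{_INT}\s*[-\u2013\u2014]\s*{_INT}{_END}")
SINGLE = re.compile(rf"\s*{_INT}{_END}")


def _as_int(text):
    return int(text.replace(",", ""))


def to_integer(value):
    """Convert an exported cell to an int, or None if it is not an integer.

    A range such as "100-150" becomes its midpoint, rounded half up.
    """
    if value is None:
        return None
    match = RANGE.match(value)
    if match:
        low, high = _as_int(match.group(1)), _as_int(match.group(2))
        return (low + high + 1) // 2
    match = SINGLE.match(value)
    return _as_int(match.group(1)) if match else None


print(to_integer("100\u2013150 participants"))  # 125
print(to_integer("1,200"))                      # 1200
print(to_integer("N/A"))                        # None
```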

Troubleshooting

The table is stuck in “Extracting” status

The extraction worker may have crashed or the job may have timed out. Refresh the page and click Re-extract. If it persists, check that your documents are all in Ready status.

All cells show “Not Found” for a specific document

The document likely failed to parse correctly. Go to your library, find the document, and check its status. If it shows a processing error, try re-uploading the original file.

Cells show “Failed” for many documents

This usually indicates a transient error in the AI service. Click Re-extract to retry. Failed cells are re-queued automatically on re-extraction.

The Extract button is disabled or missing

Check that you have the Editor role or higher in the workspace. Viewers can read and export tables but cannot trigger extraction or add columns. See Roles & Permissions for details.

The Extraction Tables option is not visible in the sidebar

This feature requires a Pro plan and the research_extraction_tables feature flag to be enabled for your workspace. Contact your workspace owner or check Billing & Plans.

Privacy and data handling

Document text is sent to Virza’s AI gateway (Virza Cortex) for extraction. No document content is used to train models.
Extraction jobs are workspace-scoped — cells from one workspace are never visible to another.
Extracted cell values are stored encrypted in Virza’s database.
Deleting a table soft-deletes all cells. Hard deletion occurs automatically after 30 days.

For full privacy details, see Data Isolation.