Analyzing research papers with PDF upload

Wu-Weism can analyze research papers directly. You can attach a PDF to any Causal Chat message — the extracted evidence then grounds the response in the paper’s actual data — or use the dedicated PDF Synthesis tool to cross-analyze up to six papers simultaneously. Both surfaces extract structured numeric evidence, generate governed claims, and record everything in the Claim Ledger.

Two ways to use PDFs

Causal Chat attachment
PDF Synthesis (/pdf-synthesis)

Attach a single PDF to a Causal Chat message. The PDF’s content is extracted and used to inform the causal response to your question. Best for:

Grounding a specific causal question in a paper’s reported data
Checking whether a paper’s findings support or contradict a claim
Extracting key statistics while also running causal analysis

Analyzing a single paper in Causal Chat

Open a Causal Chat conversation

Navigate to /chat and start a new conversation or continue an existing one.

Click the attachment button

Click the paperclip icon (attachment button) in the message input bar. A file picker will open. Select your PDF file.

The PDF name will appear as an attachment chip next to the message input. You can attach one PDF per message.

Type your question

Write your causal question in the message input as you normally would. The question will be analyzed in the context of the attached paper.

Example question with PDF attached

What causal claims about IL-6 and fibrosis progression does this paper support?

Another example

Does the methodology in this paper meet the identifiability requirements 
for a Rung 2 intervention claim?

Read the three-section output

When a PDF is attached, the response follows a structured three-section output contract:

Section 1: All explicit numbers with context

Every numeric value extracted from the paper, grouped by category:

Category	Examples
Potential metrics	Effect sizes, p-values, confidence intervals, odds ratios
Structural	Sample sizes, group counts, time points, dosages
Bibliographic	Publication year, journal impact factor if present
Citation years	Years of cited works
Reference indices	Reference numbers linked to cited claims

Section 2: Claim-eligible numerics

A filtered subset of Section 1 containing only the values that are precise enough and contextually clear enough to anchor a governed claim. Values with ambiguous units, missing context, or contradictory definitions elsewhere in the paper are excluded from this section.
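The Section 2 exclusion rules can be pictured as a simple filter over the Section 1 values. The sketch below is illustrative only: `ExtractedNumber`, its fields, and `claim_eligible` are hypothetical names, not Wu-Weism's actual data model, and the real eligibility check may weigh precision and context in ways not shown here.

```python
from dataclasses import dataclass


@dataclass
class ExtractedNumber:
    """Hypothetical record for one Section 1 value (not a Wu-Weism type)."""
    value: float
    category: str                          # e.g. "potential metric", "structural"
    context: str                           # surrounding text; empty when missing
    ambiguous_unit: bool = False           # unit could not be pinned down
    contradictory_definition: bool = False  # defined inconsistently in the paper


def is_claim_eligible(number: ExtractedNumber) -> bool:
    # Mirrors the three exclusion reasons named in the text above.
    return (
        not number.ambiguous_unit
        and bool(number.context.strip())
        and not number.contradictory_definition
    )


def claim_eligible(numbers: list[ExtractedNumber]) -> list[ExtractedNumber]:
    # Section 2 is a filtered subset of Section 1, in the original order.
    return [n for n in numbers if is_claim_eligible(n)]
```

A value with a clear unit and context passes through unchanged; anything tripping one of the three flags is dropped rather than repaired.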

Section 3: Three claims with uncertainty labels

Three governed claims derived from the paper’s evidence. Each claim carries an evidence class label:

Evidence class	Meaning
`bibliographic/structural only`	The claim is supported only by citation references or study design information — no direct numeric metrics
`mixed`	The claim combines structural evidence with some numeric data, but the metrics are indirect or aggregated
`metric-bearing`	The claim is directly supported by explicit numeric measurements from the paper

Example Section 3 output

[Claim 1 — metric-bearing]
Anti-inflammatory treatment reduced IL-6 levels by 38% (95% CI: 24–51%) 
in the intervention arm vs. control at 12 weeks.
Uncertainty: metric-bearing | Confidence: 0.79

[Claim 2 — mixed]
Fibrosis progression was attenuated in the treatment group, consistent 
with reduced IL-6 signaling, though direct fibrosis measurements were 
not stratified by IL-6 quartile.
Uncertainty: mixed | Confidence: 0.52

[Claim 3 — bibliographic/structural only]
The study design is consistent with prior RCT methodology for cytokine 
intervention trials (citations: [12], [14], [17]).
Uncertainty: bibliographic/structural only | Confidence: 0.41

All three claims are automatically recorded in the Claim Ledger.
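If you copy Section 3 output into your own notes or scripts, the text layout in the example above is regular enough to parse. The following is a minimal sketch that assumes the output always follows that exact layout (a bracketed header, free-text body, and an `Uncertainty: ... | Confidence: ...` footer). `ParsedClaim` and `parse_section3` are hypothetical helper names, not part of Wu-Weism.

```python
import re
from dataclasses import dataclass

# Evidence classes as listed in the evidence class table.
EVIDENCE_CLASSES = {"bibliographic/structural only", "mixed", "metric-bearing"}

# Header like "[Claim 1 <dash> metric-bearing]"; \S+ accepts whichever dash is used.
HEADER = re.compile(r"^\[Claim (\d+) \S+ ([^\]]+)\]$")
FOOTER = re.compile(r"^Uncertainty: (.+?) \| Confidence: ([0-9.]+)$")


@dataclass
class ParsedClaim:
    number: int
    evidence_class: str
    text: str
    confidence: float


def parse_section3(output: str) -> list[ParsedClaim]:
    """Parse claims from text laid out like the example Section 3 output."""
    claims: list[ParsedClaim] = []
    number, evidence_class, body = None, None, []
    for raw in output.splitlines():
        line = raw.strip()
        header, footer = HEADER.match(line), FOOTER.match(line)
        if header:
            number, evidence_class, body = int(header.group(1)), header.group(2), []
        elif footer and number is not None:
            if evidence_class not in EVIDENCE_CLASSES:
                raise ValueError(f"unknown evidence class: {evidence_class!r}")
            if footer.group(1) != evidence_class:
                raise ValueError(f"claim {number}: header and footer labels differ")
            claims.append(
                ParsedClaim(number, evidence_class, " ".join(body), float(footer.group(2)))
            )
            number = None  # wait for the next header
        elif line and number is not None:
            body.append(line)  # wrapped claim text is joined with spaces
    return claims
```

The parser fails loudly on an unknown or mismatched evidence class instead of guessing, which fits the idea that the label is part of the claim rather than decoration.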

Wu-Weism extracts text from PDFs using the document’s text layer. Scanned PDFs — images of printed pages with no embedded text — will have significantly lower extraction quality. Key statistics may be missed, misread, or absent from Sections 1 and 2. If you are working with scanned documents, use OCR software to create a searchable PDF before uploading. Claims derived from scanned PDFs will be labeled with lower confidence scores.
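Before uploading, you can roughly check whether a PDF has a usable text layer by extracting its text with any PDF library and seeing how much comes back per page. The sketch below is a heuristic under stated assumptions: `likely_has_text_layer` is a hypothetical helper, the per-page text is assumed to be supplied by your own extraction step, and both thresholds are arbitrary illustration values, not figures used by Wu-Weism.

```python
def likely_has_text_layer(
    page_texts: list[str],
    min_chars_per_page: int = 200,   # assumed threshold, tune for your documents
    min_fraction: float = 0.5,       # assumed share of pages that must have text
) -> bool:
    """Guess whether a PDF has an embedded text layer.

    page_texts holds the text extracted from each page, in page order.
    Scanned pages typically yield empty or near-empty strings.
    """
    if not page_texts:
        return False
    pages_with_text = sum(
        1 for text in page_texts if len(text.strip()) >= min_chars_per_page
    )
    return pages_with_text / len(page_texts) >= min_fraction
```

If the check returns `False`, running OCR to produce a searchable PDF first is likely to give better extraction results.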

Using PDF Synthesis for multi-paper analysis

PDF Synthesis is a dedicated research surface at /pdf-synthesis designed for analyzing multiple papers at once.

Navigate to /pdf-synthesis

Open PDF Synthesis from the workbench navigation.

Upload your PDFs

Drag and drop up to six PDF files into the upload area, or click Select files to use the file picker. Each file appears in the upload queue with its name and size.

Optionally specify a research focus

In the Research focus field, enter a brief description of what you are investigating. This helps Wu-Weism weight evidence and frame claims around your specific question rather than the papers’ full scope.

Example research focus

Effect of anti-inflammatory interventions on IL-6 in patients with 
chronic kidney disease

Providing a research focus consistently improves synthesis quality. Without it, Wu-Weism treats each paper equally and generates claims that reflect the papers’ own framings, which may not align with your specific research question. Even a one-sentence focus statement makes a meaningful difference.

Review synthesized output

The synthesis produces:

Numeric evidence summary: Aggregated metrics across all papers, including table counts, trusted table counts (tables where headers and data are reliably parsed), and total data point counts per paper.
Cross-paper claim comparison: Where multiple papers address the same causal relationship, Wu-Weism surfaces agreement and contradiction.
Governed claims set: Claims derived from the full corpus, each attributed to the source paper(s) and labeled with an evidence class.
Reconciliation notes: Where papers conflict on effect direction or magnitude, the synthesis explicitly flags the disagreement rather than averaging it away.
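To make the idea of flagging disagreement concrete, here is a minimal sketch of a direction check across papers. It is not Wu-Weism's reconciliation logic: `PaperEffect` and `reconcile` are hypothetical names, the sketch compares effect direction only (not magnitude), and it assumes each paper's finding has already been reduced to a single signed effect per relationship.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PaperEffect:
    paper: str
    relationship: str  # e.g. "anti-inflammatory treatment -> IL-6"
    effect: float      # signed estimate; the sign encodes direction


def reconcile(effects: list[PaperEffect]) -> dict[str, dict]:
    """Flag relationships where papers disagree on effect direction."""
    by_relationship: dict[str, list[PaperEffect]] = defaultdict(list)
    for item in effects:
        by_relationship[item.relationship].append(item)

    notes: dict[str, dict] = {}
    for relationship, group in by_relationship.items():
        if len(group) < 2:
            continue  # a single paper cannot agree or conflict with itself
        # Collect the distinct non-zero signs (+1 or -1) reported for this relationship.
        signs = {(e.effect > 0) - (e.effect < 0) for e in group if e.effect != 0}
        notes[relationship] = {
            "status": "contradiction" if len(signs) > 1 else "agreement",
            "papers": [e.paper for e in group],
        }
    return notes
```

Note that the conflicting estimates are reported side by side and never averaged, since a mean of opposing effects would hide exactly the disagreement the note is meant to surface.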

What gets extracted

When Wu-Weism processes a PDF — whether in Causal Chat or PDF Synthesis — it extracts:

Extracted field	Description
Numeric evidence	All numeric values with surrounding context
Table counts	Number of tables identified in the document
Trusted table counts	Tables where structure is reliably parsed (headers and rows matched)
Data point counts	Total discrete numeric data points extracted
Claims	Candidate claims derived from evidence
Bibliographic metadata	Where present: authors, year, journal
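One way to picture the count fields in this table is as a small per-paper record. The sketch below is an illustration, not Wu-Weism's schema: `ExtractionSummary` and `trusted_fraction` are hypothetical names, and the check that trusted tables cannot exceed total tables is an assumption that follows from trusted tables being a subset of the tables identified.

```python
from dataclasses import dataclass


@dataclass
class ExtractionSummary:
    """Hypothetical per-paper summary of the count fields in the table."""
    table_count: int          # tables identified in the document
    trusted_table_count: int  # tables whose headers and rows were matched
    data_point_count: int     # discrete numeric data points extracted

    def __post_init__(self) -> None:
        # Assumption: trusted tables are a subset of all identified tables.
        if not 0 <= self.trusted_table_count <= self.table_count:
            raise ValueError("trusted_table_count must be between 0 and table_count")

    def trusted_fraction(self) -> float:
        """Share of tables that parsed reliably (0.0 when there are no tables)."""
        if self.table_count == 0:
            return 0.0
        return self.trusted_table_count / self.table_count
```

A low trusted fraction is a hint that the paper uses complex table layouts, so its table-derived numbers deserve a closer manual look.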

Claims and the Claim Ledger

Every claim produced from a PDF analysis — whether in Causal Chat or PDF Synthesis — is automatically recorded in the Claim Ledger with:

Source paper name
Section of the paper the claim derives from
Evidence class label
Confidence score
Session and timestamp

You can review, annotate, challenge, and export these claims from /claims at any time.
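For working with ledger data in your own tooling, the fields listed above map naturally onto a flat record. This is a sketch under assumptions: `LedgerEntry`, its field names, and the JSON layout are hypothetical, and the actual export format available from /claims may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class LedgerEntry:
    """Hypothetical flat record mirroring the ledger fields listed above."""
    source_paper: str    # source paper name
    paper_section: str   # section of the paper the claim derives from
    evidence_class: str  # e.g. "metric-bearing"
    confidence: float    # confidence score, e.g. 0.79
    session_id: str      # session identifier (format assumed)
    timestamp: str       # ISO 8601 string (format assumed)


def to_json_line(entry: LedgerEntry) -> str:
    # One JSON object per line keeps the records easy to append and diff.
    return json.dumps(asdict(entry), sort_keys=True)


entry = LedgerEntry(
    source_paper="example-paper.pdf",
    paper_section="Results",
    evidence_class="metric-bearing",
    confidence=0.79,
    session_id="session-001",
    timestamp="2024-01-01T00:00:00+00:00",
)
line = to_json_line(entry)
```

Keeping the evidence class and confidence alongside the source and section means each exported record can be audited without going back to the original session.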

Limitations

Text layer required: PDFs must have an embedded text layer. Scanned-only documents have significantly degraded extraction.
Complex table layouts: Multi-level headers, merged cells, and sideways tables may not parse correctly into the trusted table count.
Supplementary materials: If supplementary data is in a separate file, upload it as a separate document in PDF Synthesis or attach it in a second Causal Chat message.
Non-English papers: Extraction and claim generation quality is highest for English-language documents.

Next steps

Combine PDF evidence with intervention questions: Running interventions
Understand how claims are governed and audited: Claim Ledger
Run a full multi-paper synthesis with the Hybrid Synthesis tool: Hybrid Synthesis
