PDF Synthesis - Wu-Weism

PDF Synthesis is a focused extraction surface: give it one document, and it returns a structured breakdown of the quantitative evidence inside it and the causal claims that evidence can support. Where Hybrid Synthesis is designed for cross-document conflict and novelty, PDF Synthesis is designed for depth: it extracts everything causally relevant from a single source and presents it in a form that feeds directly into the Claim Ledger. To open PDF Synthesis, select it in the sidebar or go to /pdf-synthesis directly.

Uploading a document

Open the dropzone

The PDF Synthesis surface opens on a dropzone, a large upload area in the center of the screen.

Drop or select your PDF

Drag your PDF file onto the dropzone, or click Choose file to browse your filesystem. Only .pdf files are accepted.

Wait for processing

PDF Synthesis runs extraction automatically after upload. A progress indicator shows parsing and analysis stages as they complete. Extraction time depends on document length and numeric density.

Review the three-section output

When extraction completes, the output panel populates with three structured sections (described below).

PDF Synthesis processes one document per run. If you need to synthesize across multiple documents simultaneously, use Hybrid Synthesis instead.

The three-section output

Every completed PDF Synthesis run returns the same three-section structure. This contract is fixed — you can rely on it for downstream processing or integration into reporting workflows.

Section 1 — All explicit numbers

Section 1 is an exhaustive inventory of every number that appears in the document in a causal or scientific context. This includes:

Measured quantities with units
Effect sizes and odds ratios
Sample sizes and study parameters
Percentages, rates, and proportions
Confidence intervals and standard errors
p-values and test statistics
Dates and durations relevant to causal timing

Each entry records the number, its unit (if applicable), the sentence it appeared in, and the page or section location in the source document. Section 1 is intentionally exhaustive and unfiltered. Its purpose is to ensure nothing is missed before the filtering step in Section 2.
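For downstream processing, a Section 1 entry can be modeled as a small record with the four fields listed above. This is a minimal sketch: the page does not document an export format, so the class name `NumberEntry`, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumberEntry:
    """One Section 1 row. Class and field names are assumptions."""
    value: str            # the number as printed in the document
    unit: Optional[str]   # None when the number carries no unit
    sentence: str         # the sentence the number appeared in
    location: str         # page or section location in the source


# Invented sample row, for illustration only.
entry = NumberEntry(
    value="0.62",
    unit=None,
    sentence="The adjusted odds ratio was 0.62.",
    location="p. 7",
)
```

Keeping `value` as the printed string rather than a float is a design choice in this sketch: it avoids losing formatting such as thousands separators or trailing zeros before the Section 2 filtering step.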

Section 2 — Claim-eligible numerics

Section 2 is a filtered subset of Section 1: only those numbers that could support a causal claim. A number is claim-eligible if it:

Represents an effect, association, or outcome (not merely a reference or label)
Has sufficient context to identify what was measured and under what conditions
Could, in combination with a causal model, support a directional assertion about a relationship between variables

Each entry in Section 2 carries an evidence class label:

Label	What it means for this number
`bibliographic/structural only`	The number appears in a citation, reference list, or structural element. It cannot be used as primary evidence for a claim.
`mixed`	The number is present in the body but lacks full measurement context (no reported conditions, missing units, unclear operationalization). Usable with caution at Rung 1.
`metric-bearing`	The number is a well-formed measurement with sufficient context to support Rung 2 claims and, with appropriate SCM framing, Rung 3 inference.

Use Section 2 to quickly assess whether a document’s evidence base can support the causal claims you want to make. A document that returns mostly `bibliographic/structural only` labels in Section 2 is a thin evidentiary source, however substantial it appears.
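One way to make the "thin source" check concrete is to compute the share of Section 2 entries that carry each evidence class label. The sketch below assumes the entries are available as a list of dicts with an `evidence_class` key. That layout is hypothetical, and so is reading "mostly" as "more than half". Only the three label strings come from the table above.

```python
from collections import Counter

# The three label strings come from the evidence class table.
LABELS = ("bibliographic/structural only", "mixed", "metric-bearing")


def label_shares(entries):
    """Return the fraction of entries carrying each evidence class label."""
    counts = Counter(e["evidence_class"] for e in entries)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected evidence class: {sorted(unknown)}")
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}


def is_thin_source(entries):
    """Assumption: 'mostly' means more than half of the entries."""
    return label_shares(entries)["bibliographic/structural only"] > 0.5
```

For example, a run with three `bibliographic/structural only` entries and one `metric-bearing` entry would count as thin under this threshold.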

Section 3 — Three claims with uncertainty labels

Section 3 is the synthesis output: three candidate causal claims derived from the Section 2 numerics, each labeled with an uncertainty level. Each claim includes:

Claim statement: a falsifiable causal assertion grounded in the document’s evidence
Supporting evidence: the specific Section 2 numerics that underpin the claim
Causal rung: Rung 1, 2, or 3, reflecting the epistemic type of the claim
Uncertainty label: one of low, moderate, or high, reflecting how confidently the evidence supports the claim

The three claims are not exhaustive — they represent the three best-supported and most causally specific claims the extraction engine can construct from the document. They are starting points for investigation, not final conclusions.

Uncertainty labels in Section 3 must be preserved when recording claims to the Claim Ledger. Stripping uncertainty labels is a governance failure and will cause the Epistemic Dashboard’s uncertainty disclosure benchmark to fail.
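If you handle Section 3 claims in your own tooling, a small validated record can help ensure the uncertainty label is never dropped along the way. The following is a sketch under stated assumptions: `CandidateClaim` and its field names are hypothetical, and only the allowed rung values (1, 2, 3) and uncertainty levels (low, moderate, high) are taken from this page.

```python
from dataclasses import dataclass
from typing import Tuple

UNCERTAINTY_LEVELS = ("low", "moderate", "high")
RUNGS = (1, 2, 3)


@dataclass(frozen=True)
class CandidateClaim:
    """One Section 3 claim. Class and field names are assumptions."""
    statement: str                       # falsifiable causal assertion
    supporting_evidence: Tuple[str, ...]  # the Section 2 numerics cited
    rung: int                            # causal rung: 1, 2, or 3
    uncertainty: str                     # low, moderate, or high

    def __post_init__(self):
        # Refuse to construct a claim with a missing or unknown label.
        if self.rung not in RUNGS:
            raise ValueError(f"rung must be one of {RUNGS}, got {self.rung!r}")
        if self.uncertainty not in UNCERTAINTY_LEVELS:
            raise ValueError(
                f"uncertainty must be one of {UNCERTAINTY_LEVELS}, "
                f"got {self.uncertainty!r}"
            )
```

Because the check runs at construction time, a claim whose label was stripped upstream fails immediately rather than at the point of recording.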

How results feed into the Claim Ledger

After Section 3 is generated, each claim can be recorded to the Claim Ledger by clicking Record claim next to it. The recorded entry carries:

The claim statement
The uncertainty label
The source document (filename and extraction timestamp)
The supporting Section 2 numerics as provenance

Claims recorded from PDF Synthesis are traceable to their source document at the numeric level — not just to the document as a whole. This fine-grained provenance is what distinguishes governed claims from informal notes.
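The four fields a recorded entry carries can be pictured as a record like the one below. This is an illustrative sketch only, not the Claim Ledger's actual schema: `LedgerEntry`, `make_ledger_entry`, the field names, and the JSON serialization are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical shape of a recorded claim; not the real schema."""
    claim_statement: str
    uncertainty: str              # low, moderate, or high
    source_filename: str
    extracted_at: str             # extraction timestamp, ISO 8601
    provenance: Tuple[str, ...]   # supporting Section 2 numerics


def make_ledger_entry(statement: str, uncertainty: str, filename: str,
                      numerics: Sequence[str],
                      extracted_at: Optional[str] = None) -> LedgerEntry:
    """Build an entry, rejecting claims whose uncertainty label is missing."""
    if uncertainty not in ("low", "moderate", "high"):
        raise ValueError("uncertainty label must be low, moderate, or high")
    # Default to the current UTC time when no timestamp is supplied.
    stamp = extracted_at or datetime.now(timezone.utc).isoformat()
    return LedgerEntry(statement, uncertainty, filename, stamp, tuple(numerics))


def to_json(entry: LedgerEntry) -> str:
    """Serialize the entry; the tuple of numerics becomes a JSON array."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Storing the numerics themselves, rather than only the filename, is what gives each entry provenance at the numeric level.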

Difference from Hybrid Synthesis

PDF Synthesis and Hybrid Synthesis share extraction infrastructure but serve different purposes:

Aspect	PDF Synthesis	Hybrid Synthesis
Input	One PDF	Up to 6 PDFs and/or 5 company names
Goal	Deep extraction from a single source	Cross-source conflict resolution and novelty
Output	Three-section structured extraction	Novel hypotheses with confidence scores
Claim recording	Manual (click to record)	Automatic for top claim; manual for others
Best for	Auditing a single document’s evidentiary content	Finding what multiple sources collectively imply

If you have a single paper and want to know what causal claims it can support, use PDF Synthesis. If you have a corpus and want to know what those sources collectively imply beyond their individual conclusions, use Hybrid Synthesis.

Related pages

Hybrid Synthesis

Multi-source synthesis for reconciling conflicting claims across a corpus.

Claim Ledger

Understand how recorded claims are governed and exported.

Epistemic Dashboard

Track evidence class distribution and uncertainty disclosure across your work.

Causal Workbench

Use extracted claims as starting points for SCM-grounded causal dialogue.
