The Epistemic Dashboard gives you a cross-session view of the causal quality of your work. Where individual surfaces (Causal Workbench, Hybrid Synthesis, Legal Causation) focus on producing outputs, the Epistemic Dashboard focuses on evaluating them: it tracks whether your reasoning, your evidence base, and your claims meet the standards of rigorous causal science. To open the dashboard, select Epistemic in the sidebar or go to /epistemic directly.

What it monitors

The dashboard aggregates signals from all your Wu-Weism sessions into four areas:
  1. Scientific evidence tracking: numeric and qualitative evidence surfaced across your sessions
  2. Alignment audit reports: how your causal reasoning aligns with best practices at each rung of Pearl’s ladder
  3. Spectral health monitor: a real-time view of causal health across your active and completed work
  4. Benchmark results: your performance against scientific integrity benchmarks
Each area draws from the same underlying provenance graph that governs your claims — so the dashboard reflects actual work, not a separate evaluation layer.

Scientific evidence tracking

The evidence panel aggregates all numeric and qualitative evidence extracted during your sessions from PDFs analyzed via the Causal Workbench, Hybrid Synthesis, or PDF Synthesis. For each piece of evidence, the panel shows:
  • Source: the document or session the evidence came from
  • Evidence class: one of three labels (see below)
  • Claim linkage: which claims in the Claim Ledger this evidence supports
  • Extraction timestamp: when the evidence entered the system

Evidence class labels

  • bibliographic/structural only: The source contains references or structural content but no extractable quantitative evidence. Claims derived from this source carry higher uncertainty.
  • mixed: The source contains some numeric content alongside qualitative discussion. Evidence may support claims at Rung 1 but may not be sufficient for Rung 2 assertions without additional validation.
  • metric-bearing: The source contains well-formed quantitative evidence, such as effect sizes, confidence intervals, p-values, and measured quantities. Sufficient to support Rung 2 claims and, with appropriate SCM specification, Rung 3 inference.
Filtering the evidence panel by class helps you quickly identify where your evidence base is thin and where it is robust.
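The relationship between evidence class and rung can be sketched in a few lines of Python. This is an illustrative sketch, not the dashboard's implementation: the names `EvidenceClass`, `max_supported_rung`, and `overreaching_claims`, and the claim dictionary layout, are all hypothetical. It also simplifies two points: mixed evidence is capped at Rung 1 (the text allows Rung 2 with additional validation), and bibliographic-only evidence is treated the same way, which is an assumption, since the text only says such claims carry higher uncertainty.

```python
from enum import Enum


class EvidenceClass(Enum):
    BIBLIOGRAPHIC = "bibliographic/structural only"
    MIXED = "mixed"
    METRIC_BEARING = "metric-bearing"


def max_supported_rung(evidence_class, scm_specified=False):
    """Highest rung this evidence class can ground without extra validation."""
    if evidence_class is EvidenceClass.METRIC_BEARING:
        # Rung 3 additionally requires an appropriate SCM specification.
        return 3 if scm_specified else 2
    # Assumption: mixed and bibliographic-only evidence are both capped at Rung 1.
    return 1


def overreaching_claims(claims):
    """Return (id, claimed rung, supported rung) for claims that outrun their evidence."""
    flagged = []
    for claim in claims:
        supported = max_supported_rung(
            claim["evidence_class"], claim.get("scm_specified", False)
        )
        if claim["rung"] > supported:
            flagged.append((claim["id"], claim["rung"], supported))
    return flagged


# Hypothetical claim records, for illustration only.
claims = [
    {"id": "c1", "rung": 2, "evidence_class": EvidenceClass.MIXED},
    {"id": "c2", "rung": 3, "evidence_class": EvidenceClass.METRIC_BEARING,
     "scm_specified": True},
]
print(overreaching_claims(claims))  # [('c1', 2, 1)]
```

Here `c1` is flagged because a Rung 2 claim rests on mixed evidence, while `c2` passes because metric-bearing evidence with an SCM specification can support Rung 3.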

Alignment audit reports

An alignment audit report is a structured evaluation of a session or synthesis run against causal best practices. Reports are generated automatically for each completed session and are available in the Audit Reports tab of the dashboard. Each report evaluates:
  • Rung consistency: whether Rung 2 and Rung 3 claims are supported by evidence of the appropriate class
  • SCM coverage: whether the causal variables asserted in responses are present in the loaded Truth Cartridge
  • Assumption explicitness: whether the assumptions underlying each causal claim were stated or remained implicit
  • Counterfactual validity: for Rung 3 claims, whether the counterfactual world was specified with sufficient precision
Reports are scored on a 0–100 alignment scale. A score below 70 typically indicates that claims were made at a higher rung than the evidence supports — a common issue when qualitative sources are used to ground quantitative causal assertions.
Alignment audit reports can be exported as PDF or JSON. The JSON format is structured for programmatic ingestion into review workflows or institutional reporting systems.
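As an example of programmatic ingestion, the sketch below filters exported reports by the 70-point threshold described above. The JSON schema is not documented on this page, so the field names `session_id` and `alignment_score`, and the assumption that an export is a JSON array of report objects, are hypothetical; adjust them to match a real export.

```python
import json

ALIGNMENT_THRESHOLD = 70  # scores below this typically signal over-claimed rungs


def low_alignment_reports(raw_json, threshold=ALIGNMENT_THRESHOLD):
    """Return the reports whose alignment score falls below the threshold."""
    reports = json.loads(raw_json)
    return [r for r in reports if r["alignment_score"] < threshold]


# Hypothetical export payload, for illustration only.
sample = json.dumps([
    {"session_id": "s-001", "alignment_score": 82},
    {"session_id": "s-002", "alignment_score": 64},
])

for report in low_alignment_reports(sample):
    print(report["session_id"], report["alignment_score"])  # s-002 64
```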

Spectral health monitor

The spectral health monitor provides a continuous, cross-session view of causal health as a set of metrics. Unlike the per-session audit report, the spectral monitor aggregates across all your work and updates as new sessions complete. Metrics tracked:
  • Rung distribution: The proportion of your claims at Rung 1, 2, and 3. A healthy distribution for most research contexts skews toward Rung 2 with a smaller Rung 3 component. Heavy Rung 1 concentration may indicate under-specified causal questions.
  • Evidence coverage ratio: The fraction of your Rung 2+ claims that are supported by metric-bearing evidence. Low coverage indicates claims that outrun the evidentiary base.
  • SCM coherence score: How consistently your session outputs stay within the constraints of the loaded SCMs. High coherence means your questions and the model’s responses are well-matched to the domain cartridge.
  • Claim stability: Whether recorded claims have been revised or retracted after initial recording. High instability may indicate poorly specified questions or volatile evidence sources.
The monitor displays each metric as a time-series over your session history, so you can observe trends rather than just point-in-time values.
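Two of these metrics follow directly from their definitions, and a small worked example may make them concrete. The sketch below assumes a hypothetical claim record with `rung` and `evidence_class` fields; it illustrates the arithmetic only and is not the monitor's actual computation.

```python
from collections import Counter


def rung_distribution(claims):
    """Proportion of claims at Rung 1, 2, and 3."""
    total = len(claims)
    if total == 0:
        return {1: 0.0, 2: 0.0, 3: 0.0}
    counts = Counter(claim["rung"] for claim in claims)
    return {rung: counts.get(rung, 0) / total for rung in (1, 2, 3)}


def evidence_coverage_ratio(claims):
    """Fraction of Rung 2+ claims backed by metric-bearing evidence."""
    upper = [claim for claim in claims if claim["rung"] >= 2]
    if not upper:
        return None  # undefined when there are no Rung 2+ claims
    covered = sum(1 for c in upper if c["evidence_class"] == "metric-bearing")
    return covered / len(upper)


# Hypothetical claim records, for illustration only.
claims = [
    {"rung": 1, "evidence_class": "mixed"},
    {"rung": 2, "evidence_class": "metric-bearing"},
    {"rung": 2, "evidence_class": "mixed"},
    {"rung": 3, "evidence_class": "metric-bearing"},
]

print(rung_distribution(claims))  # {1: 0.25, 2: 0.5, 3: 0.25}
print(evidence_coverage_ratio(claims))
```

In this sample, two of the three Rung 2+ claims rest on metric-bearing evidence, so the coverage ratio is 2/3; the remaining Rung 2 claim is the one that outruns its evidentiary base.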

Benchmark results

The benchmarks tab shows your performance against a set of scientific integrity reference points. These benchmarks test whether your causal work meets the standards of reproducible, falsifiable causal science. Benchmarks evaluated:
  • Evaluates whether your recorded claims are stated in a way that admits of empirical refutation. Claims expressed in vague or unfalsifiable language score low. A passing claim must specify: (1) the causal variable, (2) the outcome variable, (3) the direction of the effect, and (4) the conditions under which the claim holds.
  • Evaluates whether every recorded claim can be traced to a source: a session, a document, and an extraction event. Orphaned claims (present in the Claim Ledger without traceable provenance) fail this benchmark.
  • Evaluates whether claims that carry uncertainty labels (from PDF Synthesis or Hybrid Synthesis) have had those labels preserved in the Claim Ledger entry. Stripping uncertainty labels when recording claims is a governance failure.
  • Evaluates whether Rung 2 claims are clearly distinguished from Rung 1 claims in your session outputs and recorded claims. Conflating observational findings with interventional conclusions is the most common epistemic error in applied causal analysis.
Benchmark results are updated after each session completes. Historical benchmark scores are retained so you can track improvement over time.
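The first two benchmarks reduce to completeness checks on a claim record, which can be sketched as follows. The field names and the `falsifiability` and `provenance` keys are hypothetical labels chosen for this example; the actual Claim Ledger schema and benchmark scoring are not described on this page.

```python
# Hypothetical field names mirroring the four required elements of a passing claim.
FALSIFIABILITY_FIELDS = ("causal_variable", "outcome_variable", "direction", "conditions")
# Hypothetical field names mirroring the three provenance links.
PROVENANCE_FIELDS = ("session", "document", "extraction_event")


def missing_fields(claim, required):
    """Return the required fields that are absent or empty in the claim."""
    return [name for name in required if not claim.get(name)]


def check_claim(claim):
    """Report what a claim lacks for each of the two checks; empty lists mean it passes."""
    return {
        "falsifiability": missing_fields(claim, FALSIFIABILITY_FIELDS),
        "provenance": missing_fields(claim, PROVENANCE_FIELDS),
    }


# Hypothetical claim: fully specified, but orphaned from its extraction event.
claim = {
    "causal_variable": "dose",
    "outcome_variable": "recovery_time",
    "direction": "decreases",
    "conditions": "adults, 12-week follow-up",
    "session": "s-001",
    "document": "trial.pdf",
}

print(check_claim(claim))
# {'falsifiability': [], 'provenance': ['extraction_event']}
```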

Claim Ledger

The governed record of claims that feeds the Epistemic Dashboard.

PDF Synthesis

Understand how evidence class labels are assigned during document analysis.

Causal Ladder

The three-rung framework underpinning alignment audit scoring.

Hybrid Synthesis

Multi-source synthesis whose outputs feed alignment audit reports.