@ruvnet
Last active February 18, 2026 16:07

ADR-040: Causal Atlas RVF Runtime — Planet Detection & Life Candidate Scoring

  • Status: Proposed
  • Date: 2026-02-18
  • Author: System Architect (AgentDB v3)
  • Supersedes: None
  • Related: ADR-003 (RVF Format), ADR-006 (Unified Self-Learning RVF), ADR-007 (Full Capability Integration), ADR-008 (Chat UI RVF)
  • Package: @agentdb/causal-atlas

Context

ADR-008 demonstrated that a single RVF artifact can embed a minimal Linux userspace, an LLM inference engine, and a self-learning pipeline into one portable file. This ADR extends that pattern to scientific computing: a portable RVF runtime that ingests public astronomy and physics datasets, builds a multi-scale interaction graph, maintains a dynamic coherence field, and emits replayable witness logs for every derived claim.

The design draws engineering inspiration from causal sets, loop-gravity-style discretization, and holographic boundary encoding, but it is implemented as a practical data system, not a physics simulator. The holographic principle manifests as a concrete design choice: primarily store and index boundaries, and treat interior state as reconstructable from boundary witnesses and retained archetypes.

Existing Capabilities (ADR-003 through ADR-008)

| Component | Package | Relevant APIs |
|---|---|---|
| RVF segments | @ruvector/rvf, @ruvector/rvf-node | embedKernel, extractKernel, embedEbpf, segments, derive |
| HNSW indexing | @ruvector/rvf-node | ingestBatch, query, compact, HNSW with metadata filters |
| Witness chains | @ruvector/rvf-node, RvfSolver | verifyWitness, SHAKE-256 witness chains, signed root hash |
| Graph transactions | NativeAccelerator | graphTransaction, graphBatchInsert, Cypher queries |
| SIMD embeddings | @ruvector/ruvllm | 768-dim SIMD embed, cosine/dot/L2, HNSW memory search |
| SONA learning | SonaLearningBackend | Micro-LoRA, trajectory recording, EWC++ |
| Federated coordination | FederatedSessionManager | Cross-agent trajectories, warm-start patterns |
| Contrastive training | ContrastiveTrainer | InfoNCE, hard negative mining, 3-stage curriculum |
| Adaptive index | AdaptiveIndexTuner | 5-tier compression, Matryoshka truncation, health monitoring |
| Kernel embedding | KernelBuilder (ADR-008) | Minimal Linux boot from KERNEL_SEG + INITRD_SEG |
| Lazy model download | ChatInference (ADR-008) | Deferred GGUF load on first inference call |

What This ADR Adds

  1. Domain adapters for astronomy data (light curves, spectra, galaxy catalogs)
  2. Compressed causal atlas with partial-order event graph
  3. Coherence field index with cut pressure and partition entropy
  4. Multi-scale interaction memory with budget-controlled tiered retention
  5. Boundary evolution tracker with holographic-style boundary-first storage
  6. Planet detection pipeline (Kepler/TESS transit search)
  7. Life candidate scoring pipeline (spectral disequilibrium signatures)
  8. Progressive data download from public sources on first activation

Goal State

A single RVF artifact that boots a minimal Linux userspace, progressively downloads and ingests public astronomy and physics datasets on first activation (lazy, like ADR-008's GGUF model download), builds a multi-scale interaction graph, maintains a dynamic coherence field, and emits replayable witness logs for every derived claim.

Primary Outputs

| # | Output | Description |
|---|---|---|
| 1 | Atlas snapshots | Queryable causal partial order plus embeddings |
| 2 | Coherence field | Partition tree plus cut pressure signals over time |
| 3 | Multi-scale memory | Delta-encoded interaction history from seconds to months |
| 4 | Boundary tracker | Boundary changes, drift, and anomaly alerts |
| 5 | Planet candidates | Ranked list with traceable evidence |
| 6 | Life candidates | Ranked list of spectral disequilibrium signatures with traceable evidence |

Non-Goals

  1. Proving quantum gravity
  2. Replacing astrophysical pipelines end-to-end
  3. Claiming life detection without conventional follow-up observation

Public Data Sources

All data is progressively downloaded from public archives on first activation. The RVF artifact ships with download manifests and integrity hashes, not the raw data itself.

Planet Finding

| Source | Access | Reference |
|---|---|---|
| Kepler light curves and pixel files | MAST bulk and portal | archive.stsci.edu/kepler |
| TESS light curves and full-frame images | MAST portal | archive.stsci.edu/tess |

Life-Relevant Spectra

| Source | Access | Reference |
|---|---|---|
| JWST exoplanet spectra | exo.MAST and MAST holdings | archive.stsci.edu |
| NASA Exoplanet Archive parameters | Cross-linking to spectra and mission products | exoplanetarchive.ipac.caltech.edu |

Large-Scale Structure

| Source | Access | Reference |
|---|---|---|
| SDSS public catalogs (spectra, redshifts) | DR17 | sdss4.org/dr17 |

Progressive Download Strategy

Following the lazy-download pattern established in ADR-008 for GGUF models:

  1. Manifest-first: RVF ships with MANIFEST_SEG containing download URLs, SHA-256 hashes, expected sizes, and priority tiers
  2. Tier 0 (boot): Minimal curated dataset (~50 MB) for an offline demo: 100 Kepler targets with known confirmed planets, embedded in VEC_SEG
  3. Tier 1 (first run): Download 1,000 Kepler targets on first pipeline activation. The download runs in the background, with progress reported via CLI/HTTP
  4. Tier 2 (expansion): Full Kepler/TESS catalog download on an explicit rvf ingest --expand command
  5. Tier 3 (spectra): JWST and archive spectra downloaded when the life candidate pipeline is first activated
  6. Seal-on-complete: After download, data is ingested into VEC_SEG and INDEX_SEG, a new witness root is committed, and the RVF is sealed into a reproducible snapshot

Download state machine:

  [boot] ──first-inference──> [downloading-tier-1]
           │                        │
           │ (offline demo works)   │ (progress: 0-100%)
           │                        │
           ▼                        ▼
  [tier-0-only]              [tier-1-ready]
                                    │
                         rvf ingest --expand
                                    │
                                    ▼
                             [tier-2-ready]
                                    │
                         life pipeline activated
                                    │
                                    ▼
                             [tier-3-ready] ──seal──> [sealed-snapshot]
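The diagram can be read as a transition table. In that form, illegal jumps (for example, sealing before tier 3 is ready) are easy to reject. The sketch below is an illustration, not the runtime's implementation. The state names follow the diagram. The trigger names, the transition helper, and the edges the diagram leaves unlabeled (boot to tier-0-only, and tier-0-only to downloading-tier-1) are assumptions.

```typescript
// Hypothetical encoding of the download state machine diagram.
type DownloadState =
  | 'boot'
  | 'tier-0-only'
  | 'downloading-tier-1'
  | 'tier-1-ready'
  | 'tier-2-ready'
  | 'tier-3-ready'
  | 'sealed-snapshot';

type Trigger =
  | 'offline'          // assumed label for the unlabeled boot -> tier-0-only edge
  | 'first-inference'
  | 'tier-1-complete'  // download reached 100%
  | 'ingest-expand'    // rvf ingest --expand
  | 'life-pipeline'    // life pipeline activated
  | 'seal';

// Allowed transitions; any (state, trigger) pair not listed is rejected.
const TRANSITIONS: Record<DownloadState, Partial<Record<Trigger, DownloadState>>> = {
  'boot': { 'offline': 'tier-0-only', 'first-inference': 'downloading-tier-1' },
  'tier-0-only': { 'first-inference': 'downloading-tier-1' }, // assumption: demo can upgrade later
  'downloading-tier-1': { 'tier-1-complete': 'tier-1-ready' },
  'tier-1-ready': { 'ingest-expand': 'tier-2-ready' },
  'tier-2-ready': { 'life-pipeline': 'tier-3-ready' },
  'tier-3-ready': { 'seal': 'sealed-snapshot' },
  'sealed-snapshot': {},
};

function transition(state: DownloadState, trigger: Trigger): DownloadState {
  const next = TRANSITIONS[state][trigger];
  if (next === undefined) {
    throw new Error(`invalid transition: ${state} on ${trigger}`);
  }
  return next;
}
```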

Each tier download:

  • Resumes from last byte on interruption (HTTP Range headers)
  • Validates SHA-256 after download
  • Commits a witness record for the download event
  • Can be skipped with --offline flag (uses whatever is already present)
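The resume and verification rules come down to two small pieces of logic:

  • A Range request header that asks only for the bytes after what is already on disk
  • A SHA-256 comparison against the manifest entry

The sketch below uses Node's crypto module. The helper names are hypothetical. A real downloader would also stream to disk and checkpoint partial ingests.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helpers for the per-tier download rules; not an existing RVF API.

/** Headers for (re)starting a download when bytesOnDisk bytes were already written. */
function resumeHeaders(bytesOnDisk: number): Record<string, string> {
  // "bytes=N-" requests everything from offset N to the end of the file.
  return bytesOnDisk > 0 ? { Range: `bytes=${bytesOnDisk}-` } : {};
}

/** True if the response status fits the request: 206 when resuming, 200 when starting fresh. */
function responseMatchesRequest(status: number, bytesOnDisk: number): boolean {
  // A 200 reply to a Range request means the server ignored it and is resending
  // the whole file, so the partial file must be discarded instead of appended to.
  return bytesOnDisk > 0 ? status === 206 : status === 200;
}

/** Compare a completed file against the SHA-256 hex digest listed in MANIFEST_SEG. */
function verifySha256(data: Uint8Array, expectedHex: string): boolean {
  const actual = createHash('sha256').update(data).digest('hex');
  return actual === expectedHex.toLowerCase();
}
```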

RVF Artifact Layout

Extends the ADR-003 segment model with domain-specific segments.

| # | Segment | Contents |
|---|---|---|
| 1 | MANIFEST_SEG | Segment table, hashes, policy, budgets, version gates, download manifests |
| 2 | KERNEL_SEG | Minimal Linux kernel image for portable boot (reuse ADR-008) |
| 3 | INITRD_SEG | Minimal userspace: busybox, RuVector binaries, data ingest tools, query server |
| 4 | EBPF_SEG | Socket allow-list and syscall reduction. Default: local loopback + explicit download ports only |
| 5 | VEC_SEG | Embedding vectors: light-curve windows, spectrum windows, graph node descriptors, partition boundary descriptors |
| 6 | INDEX_SEG | HNSW unified attention index for vectors and boundary descriptors |
| 7 | GRAPH_SEG | Dynamic interaction graph: nodes, edges, timestamps, authority, provenance |
| 8 | DELTA_SEG | Append-only change log of graph updates and field updates |
| 9 | WITNESS_SEG | Deterministic witness chain: canonical serialization, signed root hash progression |
| 10 | POLICY_SEG | Data provenance requirements, candidate publishing thresholds, deny rules, confidence floors |
| 11 | DASHBOARD_SEG | Vite-bundled Three.js visualization app; static assets served by the runtime HTTP server |

Data Model

Core Entities

```typescript
interface Event {
  id: string;
  t_start: number;          // epoch seconds
  t_end: number;
  domain: 'kepler' | 'tess' | 'jwst' | 'sdss' | 'derived';
  payload_hash: string;      // SHA-256 of raw data window
  provenance: Provenance;
}

interface Observation {
  id: string;
  instrument: string;        // 'kepler-lc' | 'tess-ffi' | 'jwst-nirspec' | ...
  target_id: string;         // e.g., KIC or TIC identifier
  data_pointer: string;      // segment offset into VEC_SEG
  calibration_version: string;
  provenance: Provenance;
}

interface InteractionEdge {
  src_event_id: string;
  dst_event_id: string;
  type: 'causal' | 'periodicity' | 'shape_similarity' | 'co_occurrence' | 'spatial';
  weight: number;
  lag: number;               // temporal lag in seconds
  confidence: number;
  provenance: Provenance;
}

interface Boundary {
  boundary_id: string;
  partition_left_set_hash: string;
  partition_right_set_hash: string;
  cut_weight: number;
  cut_witness: string;       // witness chain reference
  stability_score: number;
}

interface Candidate {
  candidate_id: string;
  category: 'planet' | 'life';
  evidence_pointers: string[];   // event and edge IDs
  score: number;
  uncertainty: number;
  publishable: boolean;          // based on POLICY_SEG rules
  witness_trace: string;         // WITNESS_SEG reference for replay
}

interface Provenance {
  source: string;            // 'mast-kepler' | 'mast-tess' | 'mast-jwst' | ...
  download_witness: string;  // witness chain entry for the download
  transform_chain: string[]; // ordered list of transform IDs applied
  timestamp: string;         // ISO-8601
}
```

Domain Adapters

Planet Transit Adapter

Input:  flux time series + cadence metadata (Kepler/TESS FITS)
Output: Event nodes for windows
        InteractionEdges for periodicity hints and shape similarity
        Candidate nodes for dip detections

Spectrum Adapter

Input:  wavelength, flux, error arrays (JWST NIRSpec, etc.)
Output: Event nodes for band windows
        InteractionEdges for molecule feature co-occurrence
        Disequilibrium score components

Cosmic Web Adapter (optional, Phase 2+)

Input:  galaxy positions and redshifts (SDSS)
Output: Graph of spatial adjacency and filament membership

The Four System Constructs

1. Compressed Causal Atlas

Definition: A partial order of events plus minimal sufficient descriptors to reproduce derived edges.

Construction:

  1. Windowing — Light curves into overlapping windows at multiple scales

    • Scales: 2 hours, 12 hours, 3 days, 27 days
  2. Feature extraction — Robust features per window

    • Flux derivative statistics
    • Autocorrelation peaks
    • Wavelet energy bands
    • Transit-shaped matched filter response
  3. Embedding — RuVector SIMD embed per window, stored in VEC_SEG

  4. Causal edges — Add edge when window A precedes window B and improves predictability of B (conditional mutual information proxy or prediction gain, subject to POLICY_SEG constraints)

    • Edge weight: prediction gain magnitude
    • Provenance: exact windows, transform IDs, threshold used
  5. Atlas compression

    • Keep only top-k causal parents per node
    • Retain stable boundary witnesses
    • Delta-encode updates into DELTA_SEG
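Steps 1 and 5 can be sketched together. Overlapping windows are generated at each of the four scales. Pruning then keeps the k highest-gain parents per event. The ADR does not fix the 50% overlap, the ParentEdge shape, or the helper names; all three are assumptions here.

```typescript
// Hypothetical sketch of construction steps 1 and 5: multi-scale overlapping windows and
// top-k causal-parent pruning. The 50% overlap is an assumption.

const HOUR = 3600;
const DAY = 24 * HOUR;
const SCALES_SEC = [2 * HOUR, 12 * HOUR, 3 * DAY, 27 * DAY]; // 2 h, 12 h, 3 d, 27 d

interface TimeWindow { scaleSec: number; tStart: number; tEnd: number; }

/** Windows of length scaleSec covering [t0, t1), each shifted by scaleSec * (1 - overlap). */
function makeWindows(t0: number, t1: number, scaleSec: number, overlap = 0.5): TimeWindow[] {
  const step = scaleSec * (1 - overlap);
  if (step <= 0) throw new Error('overlap must be < 1');
  const out: TimeWindow[] = [];
  for (let s = t0; s + scaleSec <= t1; s += step) {
    out.push({ scaleSec, tStart: s, tEnd: s + scaleSec });
  }
  return out;
}

/** Windows at every scale for one light curve spanning [t0, t1). */
function multiScaleWindows(t0: number, t1: number): TimeWindow[] {
  const all: TimeWindow[] = [];
  for (const scale of SCALES_SEC) all.push(...makeWindows(t0, t1, scale));
  return all;
}

interface ParentEdge { src: string; dst: string; gain: number; } // gain = prediction gain

/** Keep at most k parents per child, ranked by prediction gain; gain <= 0 adds no edge. */
function topKParents(edges: ParentEdge[], k: number): ParentEdge[] {
  const byChild = new Map<string, ParentEdge[]>();
  for (const e of edges) {
    if (e.gain <= 0) continue; // A does not improve predictability of B
    const list = byChild.get(e.dst) ?? [];
    list.push(e);
    byChild.set(e.dst, list);
  }
  const kept: ParentEdge[] = [];
  for (const list of byChild.values()) {
    // Sort by gain, breaking ties by source id so the result is deterministic.
    list.sort((a, b) => b.gain - a.gain || (a.src < b.src ? -1 : a.src > b.src ? 1 : 0));
    kept.push(...list.slice(0, k));
  }
  return kept;
}
```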

Output API:

| Endpoint | Returns |
|---|---|
| atlas.query(event_id) | Parents, children, plus provenance |
| atlas.trace(candidate_id) | Minimal causal chain for a candidate |
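One way to read atlas.trace is as a breadth-first walk. It starts at a candidate's evidence events, follows parent links, and returns each ancestor once. Compression keeps only the top-k parents per node, so this closure stays small. Treating "minimal" as that closure is an interpretation, not the ADR's definition. The names are hypothetical.

```typescript
// Hypothetical sketch of atlas.trace as an ancestor closure over pruned parent links.

type ParentIndex = Map<string, string[]>; // event_id -> causal parent event_ids

function traceAncestors(evidence: string[], parents: ParentIndex): string[] {
  const seen = new Set<string>(evidence);
  const queue = [...seen]; // deduplicated evidence, in input order
  const order: string[] = [];
  for (let head = 0; head < queue.length; head++) {
    const id = queue[head];
    order.push(id);
    for (const p of parents.get(id) ?? []) {
      if (!seen.has(p)) {
        seen.add(p);
        queue.push(p);
      }
    }
  }
  return order; // evidence events first, then ancestors in breadth-first order
}
```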

2. Coherence Field Index

Definition: A field over the atlas graph that assigns coherence pressure and cut stability over time.

Signals:

| Signal | Description |
|---|---|
| Cut pressure | Minimum cut values over selected subgraphs |
| Partition entropy | Distribution of cluster sizes and churn rate |
| Disagreement | Cross-detector disagreement rate |
| Drift | Embedding distribution shift in sliding window |
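Partition entropy and churn can both be computed from cluster assignments. The sketch below rests on two assumptions the ADR does not fix:

  • Entropy is the Shannon entropy, in bits, of the cluster-size distribution
  • Cluster IDs are already matched across epochs, so churn is the fraction of surviving nodes whose cluster changed

```typescript
// Hypothetical partition-entropy and churn signals for the coherence field.

/** Shannon entropy, in bits, of the distribution of nodes over clusters. */
function partitionEntropy(clusterSizes: number[]): number {
  const total = clusterSizes.reduce((a, b) => a + b, 0);
  let h = 0;
  for (const size of clusterSizes) {
    if (size === 0) continue; // empty clusters contribute nothing (0 * log 0 = 0)
    const p = size / total;
    h -= p * Math.log2(p);
  }
  return h;
}

/** Fraction of nodes present in both epochs whose cluster label changed. */
function churnRate(prev: Map<string, number>, curr: Map<string, number>): number {
  let shared = 0;
  let changed = 0;
  for (const [node, cluster] of curr) {
    const before = prev.get(node);
    if (before === undefined) continue; // node is new this epoch
    shared++;
    if (before !== cluster) changed++;
  }
  return shared === 0 ? 0 : changed / shared;
}
```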

Algorithm:

  1. Maintain a partition tree. Update with dynamic min-cut on incremental graph changes
  2. For each update epoch:
    • Compute cut witnesses for top boundaries
    • Emit boundary events into GRAPH_SEG
    • Append witness record into WITNESS_SEG
  3. Index boundaries via descriptor vector:
    • Cut value, partition sizes, local graph curvature proxy, recent churn
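Cut pressure is derived from minimum cut values. As a testing reference, the sketch below computes the exact global minimum cut of a small weighted adjacency matrix using the Stoer-Wagner algorithm. This is a swapped-in static algorithm that recomputes from scratch in O(n^3). It is not the dynamic min-cut the partition tree would maintain. The function and type names are hypothetical.

```typescript
// Stoer-Wagner global minimum cut of an undirected weighted graph (adjacency matrix).
// Static O(n^3) recomputation; a stand-in for the ADR's dynamic min-cut.

interface MinCut { weight: number; side: number[]; } // side: original vertices on one side

function stoerWagner(w: number[][]): MinCut {
  const n = w.length;
  const g = w.map((row) => row.slice());                               // merged weights
  const groups: number[][] = Array.from({ length: n }, (_, i) => [i]); // members per super-vertex
  const active: number[] = Array.from({ length: n }, (_, i) => i);
  let best: MinCut = { weight: Infinity, side: [] };

  while (active.length > 1) {
    // One phase: grow A by always adding the vertex most tightly connected to A.
    // key[v] holds v's total edge weight into A.
    const key = new Map<number, number>(active.map((v): [number, number] => [v, 0]));
    const inA = new Set<number>();
    let prev = -1;
    let last = -1;
    for (let step = 0; step < active.length; step++) {
      let next = -1;
      for (const v of active) {
        if (!inA.has(v) && (next === -1 || key.get(v)! > key.get(next)!)) next = v;
      }
      inA.add(next);
      prev = last;
      last = next;
      for (const v of active) {
        if (!inA.has(v)) key.set(v, key.get(v)! + g[next][v]);
      }
    }
    // Cut of the phase: the last vertex added versus all the others.
    const cutWeight = key.get(last)!;
    if (cutWeight < best.weight) best = { weight: cutWeight, side: [...groups[last]] };
    // Merge `last` into `prev` and drop `last` from the active set.
    groups[prev].push(...groups[last]);
    for (const v of active) {
      g[prev][v] += g[last][v];
      g[v][prev] = g[prev][v];
    }
    g[prev][prev] = 0; // self-loops never affect a cut
    active.splice(active.indexOf(last), 1);
  }
  return best;
}
```

The test case is two dense triangles joined by a single weight-1 edge. The minimum cut should be exactly that bridge.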

Query API:

| Endpoint | Returns |
|---|---|
| coherence.get(target_id, epoch) | Field values for target at epoch |
| boundary.nearest(descriptor) | Similar historical boundary states via INDEX_SEG |
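boundary.nearest ranks stored boundary descriptors by similarity. The sketch below uses an exhaustive cosine-similarity scan as a stand-in for the HNSW lookup in INDEX_SEG. The descriptor layout follows step 3 of the algorithm. The sketch assumes the components were normalized to comparable scales beforehand; otherwise raw cut values and partition sizes would dominate.

```typescript
// Hypothetical brute-force stand-in for boundary.nearest (INDEX_SEG would use HNSW).

interface BoundaryDescriptor {
  boundary_id: string;
  // [cut value, left size, right size, curvature proxy, recent churn], pre-normalized
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

/** The k stored boundaries most similar to the query descriptor, best first. */
function nearestBoundaries(
  query: number[],
  stored: BoundaryDescriptor[],
  k: number,
): { boundary_id: string; similarity: number }[] {
  return stored
    .map((d) => ({ boundary_id: d.boundary_id, similarity: cosineSimilarity(query, d.vector) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```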

3. Multi-Scale Interaction Memory

Definition: A memory that retains interactions at multiple time resolutions with strict budget control.

Three tiers:

| Tier | Resolution | Content |
|---|---|---|
| S | Seconds to minutes | High-fidelity deltas |
| M | Hours to days | Aggregated deltas |
| L | Weeks to months | Boundary summaries and archetypes |

Retention rules:

  1. Preserve events that are boundary-critical
  2. Preserve events that are candidate evidence
  3. Compress everything else via archetype clustering in INDEX_SEG

Mechanism:

  • DELTA_SEG is append-only
  • Periodic compaction produces a new RVF root with a witness proof of preservation rules applied
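The retention rules can be written as a per-event decision. Pinned events (boundary-critical or candidate evidence) are always kept. Every other event is kept, aggregated, or folded into archetypes depending on its tier. The one-hour and one-week tier cutoffs and all names are assumptions, chosen to match the resolution column.

```typescript
// Hypothetical per-event retention decision for the S/M/L memory tiers.

type Tier = 'S' | 'M' | 'L';
type Fate = { tier: Tier; action: 'keep' | 'aggregate' | 'archetype' };

interface MemoryEvent {
  id: string;
  ageSec: number;
  boundaryCritical: boolean;
  candidateEvidence: boolean;
}

const HOUR_S = 3600;
const WEEK_S = 7 * 24 * HOUR_S;

function tierFor(ageSec: number): Tier {
  if (ageSec < HOUR_S) return 'S'; // assumed cutoff for "seconds to minutes"
  if (ageSec < WEEK_S) return 'M'; // assumed cutoff for "hours to days"
  return 'L';                      // weeks to months
}

function retention(e: MemoryEvent): Fate {
  const tier = tierFor(e.ageSec);
  if (e.boundaryCritical || e.candidateEvidence) return { tier, action: 'keep' }; // rules 1 and 2
  if (tier === 'S') return { tier, action: 'keep' };      // high-fidelity deltas
  if (tier === 'M') return { tier, action: 'aggregate' }; // aggregated deltas
  return { tier, action: 'archetype' };                   // rule 3: archetype clustering
}
```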

4. Boundary Evolution Tracker

Definition: A tracker that treats boundaries as primary objects that evolve over time.

This is where the holographic flavor is implemented. You primarily store and index boundaries, and treat interior state as reconstructable from boundary witnesses and retained archetypes.

Output API:

| Endpoint | Returns |
|---|---|
| boundary.timeline(target_id) | Boundary evolution over time |
| boundary.alerts | Alerts when cut pressure spikes, boundary identity flips, disagreement exceeds threshold, or drift persists beyond policy |
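The four alert conditions reduce to threshold checks on a boundary's recent history. The field and policy names below are hypothetical. In the runtime, the thresholds would come from POLICY_SEG. "Identity flip" is modeled here as a change in a hash of the boundary's partition membership.

```typescript
// Hypothetical evaluation of the boundary.alerts conditions for one boundary.

interface BoundarySample {
  cutPressure: number;
  identityHash: string; // e.g. hash of the partition membership on each side
  disagreement: number; // cross-detector disagreement rate in [0, 1]
  drift: number;        // embedding drift statistic for the epoch
}

interface AlertPolicy {
  pressureSpikeRatio: number; // alert if pressure grows by at least this factor in one epoch
  maxDisagreement: number;
  maxDrift: number;
  driftPersistEpochs: number; // drift above maxDrift for more than this many epochs alerts
}

type AlertKind = 'cut_pressure_spike' | 'identity_flip' | 'disagreement' | 'persistent_drift';

/** Alerts for the latest sample, given the boundary's samples in epoch order. */
function boundaryAlerts(samples: BoundarySample[], p: AlertPolicy): AlertKind[] {
  const alerts: AlertKind[] = [];
  const n = samples.length;
  if (n === 0) return alerts;
  const cur = samples[n - 1];
  if (n >= 2) {
    const prev = samples[n - 2];
    if (prev.cutPressure > 0 && cur.cutPressure / prev.cutPressure >= p.pressureSpikeRatio) {
      alerts.push('cut_pressure_spike');
    }
    if (prev.identityHash !== cur.identityHash) alerts.push('identity_flip');
  }
  if (cur.disagreement > p.maxDisagreement) alerts.push('disagreement');
  const recent = samples.slice(-(p.driftPersistEpochs + 1));
  if (recent.length === p.driftPersistEpochs + 1 && recent.every((s) => s.drift > p.maxDrift)) {
    alerts.push('persistent_drift');
  }
  return alerts;
}
```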

Planet Detection Pipeline

Stage P0: Ingest

Input: Kepler or TESS light curves from MAST (progressively downloaded)

  1. Normalize flux
  2. Remove obvious systematics (detrending)
  3. Segment into windows and store as Event nodes
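Steps 1 and 2 can be illustrated with median normalization followed by a running-median detrend. This is a deliberately simple stand-in for instrument-aware systematics removal. The window half-width must exceed the transit duration; otherwise the detrend flattens the dips. Function names are hypothetical.

```typescript
// Hypothetical P0 preprocessing: median normalization, then a running-median detrend.

function median(xs: number[]): number {
  const s = [...xs].sort((a, b) => a - b);
  const m = s.length >> 1;
  return s.length % 2 === 1 ? s[m] : (s[m - 1] + s[m]) / 2;
}

/** Step 1: scale flux so that its median is 1. */
function normalizeFlux(flux: number[]): number[] {
  const med = median(flux);
  return flux.map((f) => f / med);
}

/** Step 2: divide by a centered running median spanning halfWidth samples on each side. */
function detrendFlux(flux: number[], halfWidth: number): number[] {
  return flux.map((f, i) => {
    const lo = Math.max(0, i - halfWidth);
    const hi = Math.min(flux.length, i + halfWidth + 1);
    return f / median(flux.slice(lo, hi));
  });
}
```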

Stage P1: Candidate Generation

  1. Matched filter bank for transit-like dips
  2. Period search on candidate dip times (Box Least Squares, BLS, or similar)
  3. Create Candidate node per period hypothesis
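Step 2 can be sketched as a simplified Box Least Squares search, following Kovács, Zucker and Mazeh (2002) with equal weights:

  1. For each trial period, phase-fold the mean-subtracted flux into bins
  2. Score every contiguous box of bins with the signal residue SR = sqrt(s^2 / (r(1 - r)))
  3. Take the best box's period, phase, and depth

Here r is the fraction of points in the box and s is the box's share of the summed flux. The bin count, the maximum box width, and the names are assumptions. Production searches use finer grids and per-point uncertainties.

```typescript
// Simplified equal-weight Box Least Squares period search.

interface BlsResult { period: number; phase: number; widthFrac: number; depth: number; sr: number; }

function bls(t: number[], flux: number[], periods: number[], nBins = 200, maxWidthBins = 10): BlsResult {
  const n = t.length;
  const mean = flux.reduce((a, b) => a + b, 0) / n;
  const x = flux.map((f) => f - mean); // mean-subtracted flux, so sum(x) = 0
  let best: BlsResult = { period: NaN, phase: NaN, widthFrac: NaN, depth: 0, sr: -Infinity };

  for (const period of periods) {
    // Phase-fold: per-bin weight r (fraction of points) and weighted flux sum s.
    const wBin = new Array<number>(nBins).fill(0);
    const sBin = new Array<number>(nBins).fill(0);
    for (let i = 0; i < n; i++) {
      const ph = (((t[i] % period) + period) % period) / period; // in [0, 1)
      const b = Math.min(nBins - 1, Math.floor(ph * nBins));
      wBin[b] += 1 / n;
      sBin[b] += x[i] / n;
    }
    // Every contiguous box of 1..maxWidthBins bins, wrapping past phase 1 back to 0.
    for (let start = 0; start < nBins; start++) {
      let r = 0;
      let s = 0;
      for (let width = 1; width <= maxWidthBins; width++) {
        const b = (start + width - 1) % nBins;
        r += wBin[b];
        s += sBin[b];
        if (r <= 0 || r >= 1) continue;
        const sr = Math.sqrt((s * s) / (r * (1 - r)));
        if (s < 0 && sr > best.sr) { // s < 0: dips only, not brightenings
          // depth = out-of-box mean minus in-box mean = -s / (r (1 - r))
          best = { period, phase: start / nBins, widthFrac: width / nBins, depth: -s / (r * (1 - r)), sr };
        }
      }
    }
  }
  return best;
}
```

The test injects a 1% box-shaped transit every 3 days into synthetic data. It checks that BLS picks the 3-day period over the decoys.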

Stage P2: Coherence Gating

Candidate must pass all gates:

| Gate | Requirement |
|---|---|
| Multi-scale stability | Stable across multiple window scales |
| Boundary consistency | Consistent boundary signature around transit times |
| Low drift | Drift below threshold across adjacent windows |

Score components:

| Component | Description |
|---|---|
| SNR-like strength | Signal-to-noise ratio of the transit dip |
| Shape consistency | Cross-transit shape agreement |
| Period stability | Variance of period estimates |
| Coherence stability | Coherence field stability around candidate |

Emit: Candidate with evidence pointers + witness trace listing exact windows, transforms, and thresholds used.
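Stage P2 combines hard gates with a weighted score. In the sketch below, the gate thresholds, the SNR saturation at 20, and the weights are placeholders for values that would live in POLICY_SEG. Every component except SNR is assumed to be pre-scaled to [0, 1].

```typescript
// Hypothetical P2 gating and scoring; thresholds and weights are placeholders.

interface PlanetEvidence {
  scalesStable: number;       // number of window scales on which the dip is recovered
  boundaryConsistent: boolean;
  drift: number;
  snr: number;                // SNR-like strength
  shapeConsistency: number;   // [0, 1] cross-transit agreement
  periodStability: number;    // [0, 1], e.g. 1 / (1 + normalized period variance)
  coherenceStability: number; // [0, 1]
}

const GATES = { minScales: 2, maxDrift: 0.1 };
const WEIGHTS = { snr: 0.4, shape: 0.2, period: 0.2, coherence: 0.2 };

/** Returns null when any gate fails (no Candidate emitted), else a score in [0, 1]. */
function scorePlanet(e: PlanetEvidence): number | null {
  const passes = e.scalesStable >= GATES.minScales && e.boundaryConsistent && e.drift < GATES.maxDrift;
  if (!passes) return null;
  const snrTerm = Math.min(e.snr / 20, 1); // saturate SNR at 20 (assumed)
  return (
    WEIGHTS.snr * snrTerm +
    WEIGHTS.shape * e.shapeConsistency +
    WEIGHTS.period * e.periodStability +
    WEIGHTS.coherence * e.coherenceStability
  );
}
```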

Life Candidate Pipeline

Life detection here means pre-screening for non-equilibrium atmospheric chemistry signatures, not proof.

Stage L0: Ingest

Input: Published or mission spectra tied to targets via MAST and NASA Exoplanet Archive (progressively downloaded on first pipeline activation)

  1. Normalize and denoise within instrument error model
  2. Window spectra by wavelength bands
  3. Create band Event nodes

Stage L1: Feature Extraction

  1. Identify absorption features and confidence bands
  2. Encode presence vectors for key molecule families (H2O, CO2, CH4, O3, NH3, etc.)
  3. Build InteractionEdges between features that co-occur in physically meaningful patterns

Stage L2: Disequilibrium Scoring

Core concept: Life-like systems maintain chemical ratios that resist thermodynamic relaxation.

Implementation as graph scoring:

  1. Build a reaction plausibility graph (prior rule set in POLICY_SEG)
  2. Compute inconsistency score between observed co-occurrences and expected equilibrium patterns
  3. Track stability of that score across epochs and observation sets

Score components:

| Component | Description |
|---|---|
| Persistent multi-molecule imbalance | Proxy for non-equilibrium chemistry |
| Feature repeatability | Agreement across instruments or visits |
| Contamination risk penalty | Instrument artifact and stellar contamination |
| Stellar activity confound penalty | Host star variability coupling |

Output: Life candidate list with explicit uncertainty + required follow-up observations list generated by POLICY_SEG rules.
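The disequilibrium score can be sketched as a sum over molecule pairs. The pairs are those a reaction-plausibility prior says should not coexist at equilibrium. The sum is then scaled by repeatability and reduced by the two confound penalties. The prior shown is illustrative, not the POLICY_SEG rule set: it is the classic methane-with-O2-or-O3 pairing. The multiplicative combination is also an assumption.

```typescript
// Hypothetical L2 sketch. PRIOR lists molecule pairs that should not persist together at
// chemical equilibrium; it is illustrative, standing in for the POLICY_SEG rule set.

type Molecule = 'H2O' | 'CO2' | 'CH4' | 'O2' | 'O3' | 'NH3';

interface DisequilibriumPair { a: Molecule; b: Molecule; weight: number; }

const PRIOR: DisequilibriumPair[] = [
  { a: 'CH4', b: 'O2', weight: 1.0 }, // methane is rapidly oxidized in an O2-rich atmosphere
  { a: 'CH4', b: 'O3', weight: 1.0 }, // O3 used as a photochemical proxy for O2
];

function disequilibriumScore(
  presence: Partial<Record<Molecule, number>>, // detection confidence in [0, 1] per molecule
  repeatability: number,                       // [0, 1] agreement across visits or instruments
  contaminationPenalty: number,                // [0, 1]
  stellarActivityPenalty: number,              // [0, 1]
): { raw: number; score: number } {
  // Raw imbalance: joint confidence of every co-occurring pair the prior flags.
  let raw = 0;
  for (const { a, b, weight } of PRIOR) {
    raw += weight * (presence[a] ?? 0) * (presence[b] ?? 0);
  }
  // Confounds scale the score down multiplicatively, keeping it within [0, raw].
  const score = raw * repeatability * (1 - contaminationPenalty) * (1 - stellarActivityPenalty);
  return { raw, score };
}
```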

Runtime and Portability

Boot Sequence

  1. RVF boots minimal Linux from KERNEL_SEG and INITRD_SEG (reusing the ADR-008 KernelBuilder)
  2. Starts the rvf-runtime daemon, exposing a local HTTP server and CLI
  3. On the first inference or query, progressively downloads the required data tier

Local Interfaces

CLI:

```shell
rvf run artifact.rvf                    # boot the runtime
rvf query planet list                   # ranked planet candidates
rvf query life list                     # ranked life candidates
rvf trace <candidate_id>                # full witness trace for any candidate
rvf ingest --expand                     # download tier-2 full catalog
rvf status                              # download progress, segment sizes, witness count
```

HTTP:

```
GET /                                   # Three.js dashboard (served from DASHBOARD_SEG)
GET /assets/*                           # Dashboard static assets

GET /api/atlas/query?event_id=...       # causal parents/children
GET /api/atlas/trace?candidate_id=...   # minimal causal chain
GET /api/coherence?target_id=...&epoch=...  # field values
GET /api/boundary/timeline?target_id=...
GET /api/boundary/alerts
GET /api/candidates/planet              # ranked planet list
GET /api/candidates/life                # ranked life list
GET /api/candidates/:id/trace           # witness trace
GET /api/status                         # system health + download progress
GET /api/memory/tiers                   # tier S/M/L utilization

WS  /ws/live                            # real-time boundary alerts, pipeline progress, candidate updates
```

Determinism

  1. Fixed seeds for all stochastic operations
  2. Canonical serialization of every intermediate artifact
  3. Witness chain commits after each epoch
  4. Two-machine reproducibility: identical RVF root hash for identical input
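Canonical serialization is what makes identical inputs hash identically on two machines. The sketch below assumes JSON-shaped artifacts. It sorts keys recursively and emits no whitespace. That is close in spirit to RFC 8785 (JSON Canonicalization Scheme) but is not a full implementation of it. The names are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical canonical serialization: keys sorted recursively, no whitespace, so identical
// data always yields identical bytes and therefore identical hashes.

type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

function canonicalize(v: Json): string {
  if (v === null || typeof v !== 'object') {
    if (typeof v === 'number' && !Number.isFinite(v)) throw new Error('non-finite number');
    return JSON.stringify(v);
  }
  if (Array.isArray(v)) return `[${v.map((item) => canonicalize(item)).join(',')}]`;
  const keys = Object.keys(v).sort(); // UTF-16 code-unit order, identical on every machine
  return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalize(v[k])}`).join(',')}}`;
}

function contentHash(v: Json): string {
  return createHash('sha256').update(canonicalize(v), 'utf8').digest('hex');
}
```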

Security Defaults

  1. Network off by default
  2. If enabled, eBPF allow-list: MAST/archive download ports + local loopback only
  3. No remote writes without explicit policy toggle in POLICY_SEG
  4. Downloaded data verified against MANIFEST_SEG hashes before ingestion

Three.js Visualization Dashboard

The RVF embeds a Vite-bundled Three.js dashboard in DASHBOARD_SEG. The runtime HTTP server serves it at / (root). All visualizations are driven by the same API endpoints the CLI uses, so every rendered frame corresponds to queryable, witness-backed data.

Architecture

DASHBOARD_SEG (inside RVF)
  dist/
    index.html            # Vite SPA entry
    assets/
      main.[hash].js      # Three.js + D3 + app logic (tree-shaken)
      main.[hash].css     # Tailwind/minimal styles
      worker.js           # Web Worker for graph layout

Runtime serves:
  GET /                   -> DASHBOARD_SEG/dist/index.html
  GET /assets/*           -> DASHBOARD_SEG/dist/assets/*
  GET /api/*              -> JSON API (atlas, coherence, candidates, etc.)
  WS  /ws/live            -> Live streaming of boundary alerts and pipeline progress

Build pipeline: Vite builds the dashboard at package time into tree-shaken static assets (application code plus separate three and d3 vendor chunks, per the manualChunks setting in the build configuration). The output is embedded into DASHBOARD_SEG during RVF assembly. No Node.js is required at runtime; the dashboard is pure static assets served by the existing HTTP server.

Dashboard Views

V1: Causal Atlas Explorer (Three.js 3D)

Interactive 3D force-directed graph of the causal atlas.

| Feature | Implementation |
|---|---|
| Node rendering | THREE.InstancedMesh for events, colored by domain (Kepler=blue, TESS=cyan, JWST=gold, derived=white) |
| Edge rendering | THREE.LineSegments with opacity mapped to edge weight |
| Causal flow | Animated particles along causal edges showing temporal direction |
| Scale selector | Toggle between window scales (2h, 12h, 3d, 27d); switching scale re-runs the graph layout |
| Candidate highlight | Click a candidate in the sidebar to trace its causal chain in 3D, dimming unrelated nodes |
| Witness replay | Step through witness chain entries, animating graph state forward/backward |
| LOD | Level-of-detail: far=boundary nodes only, mid=top-k events, close=full subgraph |

Data source: GET /api/atlas/query, GET /api/atlas/trace

V2: Coherence Field Heatmap (Three.js + shader)

Real-time coherence field rendered as a colored surface over the atlas graph.

| Feature | Implementation |
|---|---|
| Field surface | THREE.PlaneGeometry subdivided grid, vertex colors from coherence values |
| Cut pressure | Red hotspots where cut pressure is high, cool blue where stable |
| Partition boundaries | Glowing wireframe lines at partition cuts |
| Time scrubber | Scrub through epochs to see coherence evolution |
| Drift overlay | Toggle to show embedding drift as animated vector arrows |
| Alert markers | Pulsing icons at boundary alert locations |

Data source: GET /api/coherence, GET /api/boundary/timeline, WS /ws/live

V3: Planet Candidate Dashboard (2D panels + 3D orbit)

Split view combining data panels with 3D orbital visualization.

| Panel | Content |
|---|---|
| Ranked list | Sortable table: candidate ID, score, uncertainty, period, SNR, publishable status |
| Light curve viewer | Interactive D3 chart: raw flux, detrended flux, transit model overlay, per-window score |
| Phase-folded plot | All transits folded at detected period, with confidence band |
| 3D orbit preview | THREE.Line showing inferred orbital path around host star, sized by uncertainty |
| Evidence trace | Expandable tree showing witness chain from raw data to final score |
| Score breakdown | Radar chart: SNR, shape consistency, period stability, coherence stability |

Data source: GET /api/candidates/planet, GET /api/candidates/:id/trace

V4: Life Candidate Dashboard (2D panels + 3D molecule)

Split view for spectral disequilibrium analysis.

| Panel | Content |
|---|---|
| Ranked list | Sortable table: candidate ID, disequilibrium score, uncertainty, molecule flags, publishable |
| Spectrum viewer | Interactive D3 chart: wavelength vs flux, molecule absorption bands highlighted |
| Molecule presence matrix | Heatmap of detected molecule families vs confidence |
| 3D molecule overlay | THREE.Sprite labels at absorption wavelengths in a 3D wavelength space |
| Reaction graph | Force-directed graph of molecule co-occurrences vs equilibrium expectations |
| Confound panel | Bar chart: stellar activity penalty, contamination risk, repeatability score |

Data source: GET /api/candidates/life, GET /api/candidates/:id/trace

V5: System Status Dashboard

Operational health and download progress.

| Panel | Content |
|---|---|
| Download progress | Per-tier progress bars with byte counts and ETA |
| Segment sizes | Stacked bar chart of RVF segment utilization |
| Memory tiers | S/M/L tier fill levels and compaction history |
| Witness chain | Scrolling log of recent witness entries with hash preview |
| Pipeline status | P0/P1/P2 and L0/L1/L2 stage indicators with event counts |
| Performance | Query latency histogram, events/second throughput |

Data source: GET /api/status, GET /api/memory/tiers, WS /ws/live

WebSocket Live Stream

```typescript
// WS /ws/live: server pushes events as they happen
interface LiveEvent {
  type: 'boundary_alert' | 'candidate_new' | 'candidate_update' |
        'download_progress' | 'witness_commit' | 'pipeline_stage' |
        'coherence_update';
  timestamp: string;
  data: Record<string, unknown>;
}
```

The dashboard subscribes on connect and updates all views in real-time as pipelines process data and boundaries evolve.
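On the client side, the dashboard has to route each pushed message to the views that use it. The sketch below is transport-agnostic and reuses the LiveEvent shape above. The class name and the drop-on-malformed behavior are assumptions. In the browser, handleMessage would be called from a WebSocket's onmessage handler with the message text.

```typescript
// Hypothetical client-side router for /ws/live messages.
//
// Browser wiring (illustrative):
//   const ws = new WebSocket(`ws://${location.host}/ws/live`);
//   ws.onmessage = (m) => dispatcher.handleMessage(String(m.data));

type LiveEventType =
  | 'boundary_alert' | 'candidate_new' | 'candidate_update'
  | 'download_progress' | 'witness_commit' | 'pipeline_stage' | 'coherence_update';

interface LiveEvent {
  type: LiveEventType;
  timestamp: string;
  data: Record<string, unknown>;
}

type LiveHandler = (e: LiveEvent) => void;

class LiveDispatcher {
  private handlers = new Map<LiveEventType, LiveHandler[]>();

  /** Register a view callback for one event type. */
  on(type: LiveEventType, h: LiveHandler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(h);
    this.handlers.set(type, list);
  }

  /** Parse one message and dispatch it; returns false for malformed or unhandled events. */
  handleMessage(raw: string): boolean {
    let msg: unknown;
    try {
      msg = JSON.parse(raw);
    } catch {
      return false;
    }
    if (typeof msg !== 'object' || msg === null) return false;
    const e = msg as LiveEvent;
    const list = this.handlers.get(e.type);
    if (!list || typeof e.timestamp !== 'string') return false;
    for (const h of list) h(e);
    return true;
  }
}
```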

Vite Build Configuration

```typescript
// vite.config.ts for dashboard build
import { defineConfig } from 'vite';

export default defineConfig({
  build: {
    outDir: 'dist/dashboard',
    assetsDir: 'assets',
    rollupOptions: {
      output: {
        manualChunks: {
          three: ['three'],         // ~150 KB gzipped
          d3: ['d3-scale', 'd3-axis', 'd3-shape', 'd3-selection'],
        },
      },
    },
  },
});
```

Bundle budget: < 500 KB gzipped total. The listed parts (Three.js ~150 KB, D3 subset ~30 KB, app logic ~50 KB, styles ~10 KB) add up to about 240 KB, leaving roughly half the budget as headroom, so the dashboard adds little to the RVF artifact size.

Design Decision: D5 — Dashboard Embedded in RVF

The Three.js dashboard is bundled at build time and embedded in DASHBOARD_SEG rather than served from an external CDN or requiring a separate install. This ensures:

  1. Fully offline: Works without network after boot
  2. Version-locked: Dashboard always matches the API version it queries
  3. Single artifact: One RVF file = runtime + data + visualization
  4. Witness-aligned: Dashboard renders exactly the data the witness chain can verify

Package Structure

packages/agentdb-causal-atlas/
  src/
    index.ts                    # createCausalAtlasServer() factory
    CausalAtlasServer.ts        # HTTP + CLI runtime + dashboard serving + WS
    CausalAtlasEngine.ts        # Core atlas, coherence, memory, boundary
    adapters/
      PlanetTransitAdapter.ts   # Kepler/TESS light curve ingestion
      SpectrumAdapter.ts        # JWST/archive spectral ingestion
      CosmicWebAdapter.ts       # SDSS spatial graph (Phase 2)
    pipelines/
      PlanetDetection.ts        # P0-P2 planet detection pipeline
      LifeCandidate.ts          # L0-L2 life candidate pipeline
    constructs/
      CausalAtlas.ts            # Compressed causal partial order
      CoherenceField.ts         # Partition tree + cut pressure
      MultiScaleMemory.ts       # Tiered S/M/L retention
      BoundaryTracker.ts        # Boundary evolution + alerts
    download/
      ProgressiveDownloader.ts  # Tiered lazy download with resume
      DataManifest.ts           # URL + hash + size manifests
    KernelBuilder.ts            # Reuse/extend from ADR-008
  dashboard/                    # Vite + Three.js visualization app
    vite.config.ts              # Build config — outputs to dist/dashboard/
    index.html                  # SPA entry point
    src/
      main.ts                   # App bootstrap, router, WS connection
      api.ts                    # Typed fetch wrappers for /api/* endpoints
      ws.ts                     # WebSocket client for /ws/live
      views/
        AtlasExplorer.ts        # V1: 3D causal atlas (Three.js force graph)
        CoherenceHeatmap.ts     # V2: Coherence field surface + cut pressure
        PlanetDashboard.ts      # V3: Planet candidates + light curves + 3D orbit
        LifeDashboard.ts        # V4: Life candidates + spectra + molecule graph
        StatusDashboard.ts      # V5: System health, downloads, witness log
      three/
        AtlasGraph.ts           # InstancedMesh nodes, LineSegments edges, particles
        CoherenceSurface.ts     # PlaneGeometry with vertex-colored field
        OrbitPreview.ts         # Orbital path visualization
        CausalFlow.ts           # Animated particles along causal edges
        LODController.ts        # Level-of-detail: boundary → top-k → full
      charts/
        LightCurveChart.ts      # D3 flux time series with transit overlay
        SpectrumChart.ts        # D3 wavelength vs flux with molecule bands
        RadarChart.ts           # Score breakdown radar
        MoleculeMatrix.ts       # Heatmap of molecule presence vs confidence
      components/
        Sidebar.ts              # Candidate list, filters, search
        TimeScrubber.ts         # Epoch scrubber for coherence replay
        WitnessLog.ts           # Scrolling witness chain entries
        DownloadProgress.ts     # Tier progress bars
      styles/
        main.css                # Minimal Tailwind or hand-rolled styles
  tests/
    causal-atlas.test.ts
    planet-detection.test.ts
    life-candidate.test.ts
    progressive-download.test.ts
    coherence-field.test.ts
    boundary-tracker.test.ts
    dashboard.test.ts           # Dashboard build + API integration tests

Implementation Phases

Phase 1: Core Atlas + Planet Detection + Dashboard Shell (v0.1)

Scope: Kepler and TESS only. No spectra. No life scoring.

  1. Implement ProgressiveDownloader with tier-0 curated dataset (100 Kepler targets)
  2. Implement PlanetTransitAdapter for FITS light curve ingestion
  3. Implement CausalAtlas with windowing, feature extraction, SIMD embedding
  4. Implement PlanetDetection pipeline (P0-P2)
  5. Implement WITNESS_SEG with SHAKE-256 chain
  6. CLI: rvf run, rvf query planet list, rvf trace
  7. HTTP: /api/candidates/planet, /api/atlas/trace
  8. Dashboard: Vite scaffold, V1 Atlas Explorer (Three.js 3D graph), V3 Planet Dashboard (ranked list + light curve chart), V5 Status Dashboard (download progress + witness log). Embedded in DASHBOARD_SEG, served at /
  9. WebSocket /ws/live for real-time pipeline progress

Acceptance: 1,000 Kepler targets, top-100 ranked list includes >= 80 confirmed planets, every item replays to same score and witness root on two machines. Dashboard renders atlas graph and candidate list in browser.

Phase 2: Coherence Field + Boundary Tracker + Dashboard V2 (v0.2)

  1. Implement CoherenceField with dynamic min-cut, partition entropy
  2. Implement BoundaryTracker with timeline and alerts
  3. Implement MultiScaleMemory with S/M/L tiers and budget control
  4. Add coherence gating to planet pipeline
  5. HTTP: /api/coherence, /api/boundary/*, /api/memory/tiers
  6. Dashboard: V2 Coherence Heatmap (Three.js field surface + cut pressure overlay + time scrubber), boundary alert markers via WebSocket

Phase 3: Life Candidate Pipeline + Dashboard V4 (v0.3)

  1. Implement SpectrumAdapter for JWST/archive spectral data
  2. Implement LifeCandidate pipeline (L0-L2)
  3. Implement disequilibrium scoring with reaction plausibility graph
  4. Tier-3 progressive download for spectral data
  5. CLI: rvf query life list
  6. HTTP: /api/candidates/life
  7. Dashboard: V4 Life Dashboard (spectrum viewer + molecule presence matrix + reaction graph + confound panel)

Acceptance: on published spectra with known atmospheric detections versus nulls, AUC > 0.8, and every score includes confound penalties and a provenance trace. Dashboard renders spectrum analysis in browser.

Phase 4: Cosmic Web + Full Integration (v0.4)

  1. CosmicWebAdapter for SDSS spatial graph
  2. Cross-domain coherence (planet candidates enriched by large-scale context)
  3. Dashboard: 3D cosmic web view, cross-domain candidate linking
  4. Full offline demo with sealed RVF snapshot
  5. rvf ingest --expand for tier-2 bulk download
  6. Dashboard polish: LOD optimization, mobile-responsive layout, dark/light theme

Evaluation Plan

Planet Detection Acceptance Test

| Metric | Requirement |
|---|---|
| Recall@100 | >= 80 confirmed planets in top 100 |
| False positives@100 | Documented with witness traces |
| Median time per star | Measured and reported |
| Reproducibility | Identical root hash on two machines |
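Recall@100 comes down to counting confirmed planets among the first 100 ranked candidates. The confirmed-planet set is an assumed external input, for example drawn from the NASA Exoplanet Archive. "False positive" here means only "not on that list", which can include genuine but unconfirmed planets. The function name is hypothetical.

```typescript
// Hypothetical acceptance-test helper for the planet ranking.

function topKReport(ranked: string[], confirmed: Set<string>, k = 100, minHits = 80) {
  const top = ranked.slice(0, k);
  const hits = top.filter((id) => confirmed.has(id)).length;
  // Top-k entries not on the confirmed list; each must be documented with a witness trace.
  const falsePositives = top.filter((id) => !confirmed.has(id));
  return { hits, falsePositives, pass: hits >= minHits };
}
```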

Life Candidate Acceptance Test

| Metric | Requirement |
|---|---|
| AUC (detected vs null) | > 0.8 |
| Confound penalties | Present on every score |
| Provenance trace | Complete for every score |
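The AUC criterion can be computed with the Mann-Whitney formulation. AUC is the probability that a spectrum with a known detection outscores a null spectrum, with ties counting one half. The pairwise loop below is O(n*m) for clarity; a rank-based version is preferable for large sets.

```typescript
// AUC via pairwise Mann-Whitney comparison; ties count as half a win.

function auc(positives: number[], negatives: number[]): number {
  if (positives.length === 0 || negatives.length === 0) throw new Error('need both classes');
  let wins = 0;
  for (const p of positives) {
    for (const q of negatives) {
      if (p > q) wins += 1;
      else if (p === q) wins += 0.5;
    }
  }
  return wins / (positives.length * negatives.length);
}
```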

System Acceptance Test

| Test | Requirement |
|---|---|
| Boot reproducibility | Identical root hash across two machines |
| Query determinism | Identical results for same dataset snapshot |
| Witness verification | verifyWitness passes for all chains |
| Progressive download | Resumes correctly after interruption |

Failure Modes and Fix Path

| Failure | Fix |
|---|---|
| Noise dominates coherence field | Strengthen policy priors, add confound penalties, enforce multi-epoch stability |
| Over-compression kills rare signals | Boundary-critical retention rules + candidate evidence pinning |
| Spurious life signals from stellar activity | Model stellar variability as its own interaction graph, penalize coupling |
| Compute blow-up | Strict budgets in POLICY_SEG, tiered memory, boundary-first indexing |
| Download interruption | HTTP Range resume, partial-ingest checkpoint, witness for partial state |

Design Decisions

D1: Kepler/TESS only in v1, spectra in v3

Phase 1 delivers a concrete, testable planet-detection system. Life scoring requires additional instrument-specific adapters and more nuanced policy rules. Separating them de-risks the schedule.

D2: Progressive download with embedded demo subset

The RVF artifact ships with a curated ~50 MB tier-0 dataset for fully offline demonstration. Full catalog data is downloaded lazily, following the pattern proven in ADR-008 for GGUF model files. This keeps the initial artifact small (< 100 MB without kernel) while supporting the full 1,000+ target benchmark.

D3: Boundary-first storage (holographic principle)

Boundaries are stored as first-class indexed objects. Interior state is reconstructed on demand from boundary witnesses and retained archetypes. This is expected to reduce storage by roughly 10-50x for large graphs while preserving queryability and reproducibility.

D4: Witness chain for every derived claim

Every candidate, every coherence measurement, and every boundary change is committed to the SHAKE-256 witness chain. This enables two-machine reproducibility verification and provides a complete audit trail from raw data to final score.
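The chaining property behind D4 takes only a few lines to illustrate. Each entry hashes the previous entry's hash together with its canonically serialized record. Altering any record therefore invalidates every later hash. The sketch uses Node's built-in SHAKE-256 with a 32-byte output. The entry layout and function names are hypothetical. Root-hash signing and the actual verifyWitness API are not reproduced.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical hash-chained witness log using SHAKE-256 (32-byte output).

interface WitnessEntry { record: string; prevHash: string; hash: string; }

const GENESIS = '00'.repeat(32); // placeholder predecessor for the first entry

function shake256Hex(data: string): string {
  return createHash('shake256', { outputLength: 32 }).update(data, 'utf8').digest('hex');
}

/** Append a record (already canonically serialized); returns a new chain. */
function appendWitness(chain: WitnessEntry[], record: string): WitnessEntry[] {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS;
  const hash = shake256Hex(prevHash + '\n' + record);
  return [...chain, { record, prevHash, hash }];
}

/** Recompute every link; any edited record or reordered entry breaks verification. */
function verifyChain(chain: WitnessEntry[]): boolean {
  let prev = GENESIS;
  for (const e of chain) {
    if (e.prevHash !== prev || e.hash !== shake256Hex(prev + '\n' + e.record)) return false;
    prev = e.hash;
  }
  return true;
}
```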

References

  1. MAST — Kepler
  2. MAST — TESS
  3. MAST Home
  4. NASA Exoplanet Archive
  5. SDSS DR17
  6. ADR-003: RVF Native Format Integration
  7. ADR-006: Unified Self-Learning RVF Integration
  8. ADR-007: RuVector Full Capability Integration
  9. ADR-008: Chat UI RVF Kernel Embedding