Model Card · Chainsight

Model Card

chainsight-local-v0

Model card per Mitchell et al. (2019). Educational research only: not designed for live trading and not financial advice.

Heads

Head	Spec	Implementation	Target
Directional	§6.1	LightGBM binary, isotonic-OOF calibration	P(label = +1) over triple-barrier; horizons {7d, 30d}
Volatility	§6.4	Corsi (2009) HAR-RV baseline + LightGBM residual	Forward log-realised vol over horizon
Quantile	§6.3	Three independent LightGBM-quantile regressors (row-wise rearrangement; CQR per Romano et al. (2019) applied to the acceptance-gate metric only, not yet to served bands)	Forward log-return quantiles {P10, P50, P90}
Regime	§6.2	Gaussian HMM (Viterbi) → LightGBM 4-class classifier	Regime ∈ {markdown, accumulation, markup, distribution}
Meta-blender	§6.5/§6.6	Logistic stacker on per-row OOF base predictions + isotonic	Calibrated blended P(up)
Tail risk	§6.5	Logistic regression on [p_up_dir, regime_probs, vix_z, dxy_z, real_yield_10y_z]	P(>20% drawdown within 30d)
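The quantile row combines two post-processing steps. A minimal NumPy sketch of both, assuming a held-out calibration split with lower/upper band predictions and realised forward log-returns; the function names are hypothetical, and this is the textbook split-conformal form of CQR rather than the project's own code:

```python
import numpy as np


def rearrange(q: np.ndarray) -> np.ndarray:
    """Row-wise rearrangement: sort each row of (P10, P50, P90) predictions so
    independently trained quantile regressors can never cross."""
    return np.sort(q, axis=1)


def cqr_offset(lo_cal: np.ndarray, hi_cal: np.ndarray, y_cal: np.ndarray,
               alpha: float = 0.2) -> float:
    """Split-conformal CQR offset for a (1 - alpha) band (alpha = 0.2 -> 80%).

    Conformity score per calibration row: max(lo - y, y - hi), which is positive
    when y falls outside the band. The offset is the ceil((n + 1)(1 - alpha))-th
    smallest score, capped at the largest score when n is too small.
    """
    scores = np.sort(np.maximum(lo_cal - y_cal, y_cal - hi_cal))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(scores[min(k, n) - 1])


# Conformalised band for new rows: [lo - offset, hi + offset].
```

A negative offset narrows a band that was too wide on the calibration split; a positive one widens it.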

Spec §7.4 acceptance gates · 4 / 6 evaluated gates green (live) · 1 pending

Gate	Target	OOF value	Status
Directional log-loss	< 0.69 (raw OOF)	0.691	fail
Directional ECE	< 0.05 (after isotonic)	0.022	pass
Meta-blender ECE	< 0.05 (after isotonic)	0.008	pass
Quantile 80% coverage	[0.75, 0.85] (raw OOF)	0.81	pass
Volatility QLIKE gain vs HAR-RV	≥ 10% (OOF)	6.5%	fail
Regime cross-entropy	< 1.1 (OOF)	0.712	pass
Tail-risk calibration	|pred − realised| < 0.05 (OOF)	n/a	pending
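The headline count can be reproduced from the gate table. A small sketch, assuming the volatility threshold "0.1" means a 10% relative QLIKE gain over HAR-RV and that a missing value stands for a degenerate OOF series; the `GATES` structure and names are hypothetical:

```python
# Hypothetical encoding of the seven §7.4 gates: name -> (OOF value or None, pass rule).
# None stands for a degenerate OOF series, which renders as "pending".
GATES = {
    "directional_log_loss": (0.691, lambda v: v < 0.69),
    "directional_ece": (0.022, lambda v: v < 0.05),
    "meta_blender_ece": (0.008, lambda v: v < 0.05),
    "quantile_coverage_80": (0.81, lambda v: 0.75 <= v <= 0.85),
    "vol_qlike_gain_vs_har": (0.065, lambda v: v >= 0.10),  # 6.5% gain vs >= 10% required
    "regime_cross_entropy": (0.712, lambda v: v < 1.1),
    "tail_risk_calibration": (None, lambda v: v < 0.05),
}


def summarize(gates):
    """Return (green count, evaluated count, names of pending gates)."""
    evaluated = {name: rule(v) for name, (v, rule) in gates.items() if v is not None}
    pending = [name for name, (v, _) in gates.items() if v is None]
    return sum(evaluated.values()), len(evaluated), pending


green, total, pending = summarize(GATES)
print(f"{green} / {total} green · {len(pending)} pending")  # 4 / 6 green · 1 pending
```

The two red gates are directional log-loss (0.691 misses < 0.69) and the volatility QLIKE gain (6.5% misses 10%).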

Model Health

Directional ECE

0.022 (target < 0.05)

Calibration of the directional head: the gap between predicted probabilities and observed outcomes. Lower is better: on days when it says "70% up", BTC should actually have risen about 70% of the time.
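The card does not state how ECE is binned. A minimal sketch, assuming 10 equal-width probability bins and binary labels (1 = up); the function name is hypothetical:

```python
import numpy as np


def expected_calibration_error(p, y, n_bins: int = 10) -> float:
    """Equal-width-bin ECE: sum over bins of (share of rows in bin) * |mean p - mean y|."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=float)
    # Bin index 0..n_bins-1; p == 1.0 is folded into the last bin.
    idx = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(p[mask].mean() - y[mask].mean())
    return float(ece)
```

For example, ten forecasts of 0.9 of which only five come true give an ECE of 0.4: the head claimed 90% but delivered 50%.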

OOF Log-Loss

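0.691 (target < 0.69)

How well the directional model scores on held-out data (lower is better). Always predicting 50% scores ln 2 ≈ 0.693, the coin-flip baseline; anything below that beats random, and the gate asks for a margin below 0.69. At 0.691 the head is only marginally better than a coin flip and misses the gate.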
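Why ln 2 is the baseline: a constant 0.5 forecast pays −ln 0.5 on every row, whatever the outcome. A plain-Python sketch of binary log-loss, assuming the triple-barrier label +1 is mapped to 1 and everything else to 0; the function name is hypothetical:

```python
import math


def log_loss(p, y, eps=1e-15):
    """Mean binary cross-entropy of predicted P(up) values p against labels y in {0, 1}."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1 - eps)  # clip so log() never sees 0
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(p)


# A constant 0.5 forecast scores ln 2 whatever the outcomes: the coin-flip baseline.
print(round(log_loss([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]), 4))  # 0.6931
print(round(math.log(2), 4))  # 0.6931
```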

Feature Drift

3 / 15 features with |z| > 2 today

Count of inputs whose current value is unusually extreme relative to their own history. A handful is normal; a third or more (5 of 15) is the cue to retrain.
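A minimal sketch of the drift count, assuming each feature's z-score uses the mean and sample standard deviation of its full history (the card does not say whether a rolling window is used). Function names are hypothetical, and the amber "watch" level is not modelled because its threshold is not stated:

```python
import numpy as np


def count_extreme(history: np.ndarray, today: np.ndarray, z_thresh: float = 2.0) -> int:
    """Count features whose value today is more than z_thresh standard deviations
    from that feature's historical mean.

    history: (n_days, n_features) past values; today: (n_features,) current values.
    """
    mu = history.mean(axis=0)
    sd = history.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, np.inf)  # constant features get z = 0 instead of dividing by 0
    z = (today - mu) / sd
    return int(np.sum(np.abs(z) > z_thresh))


def needs_retrain(n_extreme: int, n_features: int) -> bool:
    """Retrain cue from the card: a third or more of the inputs flagged."""
    return 3 * n_extreme >= n_features
```

With today's reading, `needs_retrain(3, 15)` is `False`; it would flip at 5 of 15.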

Coverage 80%

0.81 (target [0.75, 0.85])

Share of past days on which the realised price landed inside the 80% forecast cone (between the P10 and P90 forecasts). It should sit near 0.80: well below means the cone is too narrow, well above means it is too wide.
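Empirical coverage is just the hit rate of the band. Because the quantile head forecasts log-returns and log is monotonic, checking the realised log-return against [P10, P90] is equivalent to checking the price against the price cone. A minimal sketch with hypothetical names:

```python
import numpy as np


def interval_coverage(y, lo, hi) -> float:
    """Fraction of realised outcomes y falling inside [lo, hi] (inclusive)."""
    y, lo, hi = (np.asarray(a, dtype=float) for a in (y, lo, hi))
    return float(np.mean((y >= lo) & (y <= hi)))


def coverage_ok(cov: float, band=(0.75, 0.85)) -> bool:
    """Gate check for the nominal 80% band."""
    return band[0] <= cov <= band[1]
```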

What it means

Four quick self-checks on the model's quality. Calibration, log-loss, and coverage are measured on held-out (never-trained-on) data; for these gates, the accent color means the check passed and red means it failed. Feature drift compares today's inputs with their own history and is a monitoring signal, not a §7.4 gate: accent = healthy, amber = drift building (watch), red = time to retrain.

Why it matters

These tell you whether to trust the headline forecasts. If calibration or log-loss go red, the probabilities are unreliable; if coverage goes red, the forecast cone is mis-sized; if drift climbs, the inputs have shifted away from what the model was trained on.

Limitations

Off-chain price discovery (CEXes, derivatives) dominates short-term Bitcoin price formation. Expect a hard ceiling of around 55–57% directional accuracy on the 7d horizon; this system pursues calibration first and discrimination second.

All seven §7.4 gates are computed from purged-k-fold OOF arrays during the nightly training run and persisted to `model_health` (2026-06-29). A gate renders “pending” only when its source OOF series is degenerate, most commonly tail-risk calibration on the synthetic fixture, which contains no drawdowns of 20% or more.
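Purging matters here because triple-barrier and forward-vol labels look ahead, so a training row just before a test fold can leak test-period information. A minimal sketch of a purged k-fold splitter with an optional embargo, in the style popularised by López de Prado, assuming contiguous folds and a fixed label horizon in rows; the function name and parameters are hypothetical, not the project's implementation:

```python
import numpy as np


def purged_kfold(n: int, k: int, horizon: int, embargo: int = 0):
    """Yield (train_idx, test_idx) for k contiguous folds over n time-ordered rows.

    Row i's label spans rows [i, i + horizon]. Training rows before the test block
    whose label window reaches into it are purged; rows in the `embargo` window
    right after the test block are dropped as well.
    """
    bounds = np.linspace(0, n, k + 1, dtype=int)
    idx = np.arange(n)
    for a, b in zip(bounds[:-1], bounds[1:]):
        test = idx[a:b]
        # Keep rows whose labels end before the test block starts,
        # or rows that start after the test block plus embargo.
        keep = (idx + horizon < a) | (idx >= b + embargo)
        yield idx[keep], test
```

Predicting each test block with a model fit on its `train_idx` and stitching the blocks together yields the OOF arrays the gates are scored on.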