Blog
Evaluating LLM Quantitative Estimation Under Uncertainty
A study on LLM quantitative estimation under uncertainty.
FermiBench: Evaluating LLM Quantitative Reasoning with Distributional Fermi Estimation
A benchmark for how well frontier models decompose quantitative questions into factor-level distributions and compose them into calibrated estimates.
Offloading Sentinel's Analysis Pipeline to Local Inference — Conclusions
What works for what, what to buy, what to build next.
Offloading Sentinel's Analysis Pipeline to Local Inference — Part 3
threat-bench, take two: fine-tuning failures, a bag-of-words model that wasn't supposed to work, and an ensemble gain that mostly didn't survive holdout.
Offloading Sentinel's Analysis Pipeline to Local Inference — Part 2
Ei-bench: benchmarking GPT-5, GPT-5-mini, and open-weight models against clean human labels on existential importance classification.
Offloading Sentinel's Analysis Pipeline to Local Inference — Part 1
Isolating model intelligence: how good are open-weight models at understanding threats?
Offloading Sentinel's Analysis Pipeline to Local Inference — Part 0
Preliminary benchmarks of open-weight models for replacing GPT models in Sentinel's threat detection pipeline.
Potential Projects
A strategic evaluation of where Sentinel's time and resources are best spent.
Is It Worth Monitoring Reddit?
A strategic assessment of Reddit as an OSINT source for early-warning intelligence and risk detection.
Some Thoughts on Déjà Vu
The feeling of having already experienced the present moment.
Notes on Running a Medusa Backend on DigitalOcean App Platform
Quick notes on deploying a Medusa v2 e-commerce backend to DigitalOcean App Platform with managed Postgres, Valkey, and Spaces.
How Rare Was Houston’s 27-Miss Three-Point Streak? A Minimal Math Dive
May 28, 2018. Houston Rockets. Game 7. The team missed 27 consecutive three-point shots, a record-setting streak that left fans and analysts in disbelief. But how rare is such a streak?