Jorge Cambra

Software Engineer

Blog

Evaluating LLM Quantitative Estimation Under Uncertainty

A study on LLM quantitative estimation under uncertainty.

FermiBench: Evaluating LLM Quantitative Reasoning with Distributional Fermi Estimation

A benchmark for how well frontier models decompose quantitative questions into factor-level distributions and compose them into calibrated estimates.

Offloading Sentinel's Analysis Pipeline to Local Inference — Conclusions

What works for what, what to buy, what to build next.

Offloading Sentinel's Analysis Pipeline to Local Inference — Part 3

threat-bench, take two: fine-tuning failures, a bag-of-words model that wasn't supposed to work, and an ensemble gain that mostly didn't survive holdout.

Offloading Sentinel's Analysis Pipeline to Local Inference — Part 2

Ei-bench: benchmarking GPT-5, GPT-5-mini, and open-weight models against clean human labels on existential importance classification.

Offloading Sentinel's Analysis Pipeline to Local Inference — Part 1

Isolating model intelligence: how good are open-weight models at understanding threats?

Offloading Sentinel's Analysis Pipeline to Local Inference — Part 0

Preliminary benchmarks of open-weight models for replacing GPT models in Sentinel's threat detection pipeline.

Potential Projects

A strategic evaluation of where Sentinel's time and resources are best spent.

Is It Worth Monitoring Reddit?

A strategic assessment of Reddit as an OSINT source for early-warning intelligence and risk detection.

Some Thoughts on Déjà Vu

The feeling of having already experienced the present moment.

Notes on Running a Medusa Backend on DigitalOcean App Platform

Quick notes on deploying a Medusa v2 e-commerce backend to DigitalOcean App Platform with managed Postgres, Valkey, and Spaces.

How Rare Was Houston’s 27-Miss Three-Point Streak? A Minimal Math Dive

May 28, 2018. Houston Rockets. Game 7. The team missed 27 consecutive three-point shots, a record-setting streak that left fans and analysts in disbelief. But how rare is such a streak?