Anomaly Detection Belongs in Your Database: SIMD-accelerated isolation forests built into Stratum's SQL engine [P]

We added native anomaly detection to Stratum, our columnar analytics engine for the JVM. You can train and score isolation forest models entirely from SQL, with no Python and no export pipeline:

SELECT * FROM transactions WHERE ANOMALY_SCORE('fraud_model') > 0.7;
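In the standard isolation forest formulation from the original paper by Liu, Ting and Zhou, a point x gets the score s(x, n) = 2^(-E[h(x)] / c(n)). Here E[h(x)] is the mean path length needed to isolate x across all trees, and c(n) is the average path length of an unsuccessful binary search tree lookup over n points. Scores near 1 mark anomalies, and a point with exactly average path length scores 0.5. The post does not say how ANOMALY_SCORE normalizes its output. Assuming it follows this standard formula, the 0.7 threshold in the query keeps rows that are isolated much faster than average. The minimal Java sketch below computes the formula. The class and method names are hypothetical, not Stratum's API.

```java
/** Hypothetical sketch of the standard isolation forest score; not Stratum's code. */
public final class IsolationScore {
    private static final double EULER_GAMMA = 0.5772156649015329;

    /** c(n): average path length of an unsuccessful BST search over n points. */
    public static double averagePathLength(long n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;                           // convention used by scikit-learn
        double harmonic = Math.log(n - 1) + EULER_GAMMA;  // approximates H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / (double) n;
    }

    /** s(x, n) = 2^(-E[h(x)] / c(n)); exactly 0.5 when E[h(x)] == c(n). */
    public static double anomalyScore(double meanPathLength, long sampleSize) {
        double c = averagePathLength(sampleSize);
        return Math.pow(2.0, -meanPathLength / c);
    }

    public static void main(String[] args) {
        long psi = 256;  // subsample size suggested in the original paper
        double c = averagePathLength(psi);
        System.out.printf("c(256) = %.3f%n", c);
        System.out.printf("score at E[h] = c(n): %.3f%n", anomalyScore(c, psi));
        System.out.printf("score at E[h] = 3:    %.3f%n", anomalyScore(3.0, psi));
    }
}
```

With a subsample size of 256, c(256) is roughly 10.2. Under the assumption above, a score above 0.7 therefore corresponds to a mean path length below about 5.3 splits.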

Scoring takes about 6 microseconds per transaction, is SIMD-accelerated, and runs inside the query engine. The full write-up covers why we built it, how isolation forests work, and benchmarks against PyOD/scikit-learn:

https://datahike.io/notes/anomaly-detection-in-your-database/

Stratum is open source (Apache 2.0): https://github.com/replikativ/stratum

Happy to answer questions about the implementation. The isolation forest is pure Java and uses the Vector API for SIMD. Scoring is fused into the query execution pipeline, so it benefits from zone map pruning and chunked streaming.
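The Java sketch below shows what "scoring fused into chunked streaming" can look like. It trains a plain isolation forest on column arrays and scores one columnar chunk at a time. It is hypothetical in several ways. `IsolationForestSketch` and `scoreChunk` are invented names, not Stratum's API. The code is scalar and uses no Vector API. It leaves out zone map pruning. It also simplifies one detail: a node stops splitting when the randomly chosen feature is constant.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical scalar isolation forest sketch (not Stratum's implementation).
 * Data is column-major, double[feature][row], to mimic a columnar chunk.
 */
public final class IsolationForestSketch {
    private static final double EULER_GAMMA = 0.5772156649015329;

    /** Tree node; a leaf has feature == -1 and records how many samples reached it. */
    private static final class Node {
        int feature = -1;
        double split;
        int size;
        Node left, right;
    }

    private final Node[] trees;
    private final int sampleSize;

    public IsolationForestSketch(double[][] cols, int numTrees, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        int n = cols[0].length;
        this.sampleSize = Math.min(sampleSize, n);
        // Height limit ceil(log2(psi)) from the original algorithm.
        int maxDepth = (int) Math.ceil(Math.log(this.sampleSize) / Math.log(2));
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        trees = new Node[numTrees];
        for (int t = 0; t < numTrees; t++) {
            // Partial Fisher-Yates shuffle: draw a subsample without replacement.
            for (int i = 0; i < this.sampleSize; i++) {
                int j = i + rnd.nextInt(n - i);
                int tmp = perm[i]; perm[i] = perm[j]; perm[j] = tmp;
            }
            int[] idx = Arrays.copyOf(perm, this.sampleSize);
            trees[t] = build(cols, idx, 0, idx.length, 0, maxDepth, rnd);
        }
    }

    private static Node build(double[][] cols, int[] idx, int from, int to,
                              int depth, int maxDepth, Random rnd) {
        Node node = new Node();
        int size = to - from;
        if (size <= 1 || depth >= maxDepth) { node.size = size; return node; }
        int f = rnd.nextInt(cols.length);                    // random feature
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (int i = from; i < to; i++) {
            double v = cols[f][idx[i]];
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (min == max) { node.size = size; return node; }   // simplification: stop if constant
        node.feature = f;
        node.split = min + rnd.nextDouble() * (max - min);   // random split between min and max
        int mid = from;                                      // partition: values < split go left
        for (int i = from; i < to; i++) {
            if (cols[f][idx[i]] < node.split) {
                int tmp = idx[i]; idx[i] = idx[mid]; idx[mid++] = tmp;
            }
        }
        node.left = build(cols, idx, from, mid, depth + 1, maxDepth, rnd);
        node.right = build(cols, idx, mid, to, depth + 1, maxDepth, rnd);
        return node;
    }

    /** c(n): average path length of an unsuccessful BST search over n points. */
    static double c(int n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        return 2.0 * (Math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / (double) n;
    }

    private static double pathLength(Node node, double[][] chunk, int row) {
        int depth = 0;
        while (node.feature >= 0) {
            node = chunk[node.feature][row] < node.split ? node.left : node.right;
            depth++;
        }
        return depth + c(node.size);  // leaf adjustment for samples left unsplit
    }

    /** Scores every row of one column-major chunk; values near 1 are anomalous. */
    public double[] scoreChunk(double[][] chunk) {
        int rows = chunk[0].length;
        double[] out = new double[rows];
        double norm = c(sampleSize);
        for (int r = 0; r < rows; r++) {
            double sum = 0.0;
            for (Node tree : trees) sum += pathLength(tree, chunk, r);
            out[r] = Math.pow(2.0, -(sum / trees.length) / norm);
        }
        return out;
    }
}
```

Here is one example. Train on two columns of standard normal values with 100 trees and a subsample size of 256. Then score a chunk holding the rows (0, 0) and (8, 8). The far outlier should score clearly higher than the central row. The sketch stores trees as linked nodes to keep it short. An engine aiming for SIMD throughput would more plausibly flatten each tree into arrays and evaluate many rows of a chunk in lockstep. The post does not describe Stratum's actual layout, though.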

submitted by /u/flyingfruits