From Machine Learning

Anomaly Detection Belongs in Your Database — built SIMD-accelerated isolation forests into Stratum's SQL engine [P]

We added native anomaly detection to Stratum, our columnar analytics engine for the JVM. You can train and score isolation forest models entirely from SQL, with no Python and no export pipeline:

SELECT * FROM transactions WHERE ANOMALY_SCORE('fraud_model') > 0.7; 

Scoring takes 6 microseconds per transaction, is SIMD-accelerated, and runs inside the query engine. The full write-up covers why we built it, how isolation forests work, and benchmarks against PyOD/scikit-learn:

https://datahike.io/notes/anomaly-detection-in-your-database/

Stratum is open source (Apache 2.0): https://github.com/replikativ/stratum

Happy to answer questions about the implementation. The isolation forest is pure Java using the Vector API for SIMD, and scoring is fused into the query execution pipeline, so it benefits from zone map pruning and chunked streaming.
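
For readers new to the algorithm, here is a minimal scalar sketch of isolation forest training and scoring in plain Java. It is not Stratum's code: it leaves out the Vector API SIMD and the query-pipeline fusion, samples rows with replacement for brevity, and the class name `IsolationForestSketch`, its constructor parameters, and its methods are made up for illustration. The score follows the standard isolation forest definition, 2^(-E[h(x)] / c(n)), where h(x) is the path length of a point and c(n) normalizes by the average path length for n samples.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative scalar isolation forest; NOT Stratum's implementation. */
final class IsolationForestSketch {

    /** Tree node; a node whose left child is null is a leaf. */
    private static final class Node {
        int feature;       // column index used for the split
        double split;      // rows with value < split go left
        int size;          // number of training rows that reached this node
        Node left, right;
    }

    private final Node[] trees;
    private final int sampleSize;

    /** Hypothetical constructor: trains numTrees trees, each on sampleSize random rows. */
    IsolationForestSketch(double[][] data, int numTrees, int sampleSize, long seed) {
        Random rng = new Random(seed);
        this.sampleSize = Math.max(2, Math.min(sampleSize, data.length));
        // Depth limit ceil(log2(sampleSize)): anomalies tend to isolate well before it.
        int maxDepth = 32 - Integer.numberOfLeadingZeros(this.sampleSize - 1);
        this.trees = new Node[numTrees];
        for (int t = 0; t < numTrees; t++) {
            double[][] sample = new double[this.sampleSize][];
            for (int i = 0; i < sample.length; i++) {
                sample[i] = data[rng.nextInt(data.length)]; // with replacement, for brevity
            }
            trees[t] = build(sample, 0, maxDepth, rng);
        }
    }

    /** Recursively splits rows on a random feature at a random value in its range. */
    private static Node build(double[][] rows, int depth, int maxDepth, Random rng) {
        Node node = new Node();
        node.size = rows.length;
        if (depth >= maxDepth || rows.length <= 1) return node;
        int f = rng.nextInt(rows[0].length);
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] r : rows) {
            min = Math.min(min, r[f]);
            max = Math.max(max, r[f]);
        }
        if (min == max) return node; // constant column: nothing to split on
        double split = min + rng.nextDouble() * (max - min);
        node.feature = f;
        node.split = split;
        node.left = build(Arrays.stream(rows).filter(row -> row[f] < split)
                .toArray(double[][]::new), depth + 1, maxDepth, rng);
        node.right = build(Arrays.stream(rows).filter(row -> row[f] >= split)
                .toArray(double[][]::new), depth + 1, maxDepth, rng);
        return node;
    }

    /** Average path length of an unsuccessful binary-search-tree lookup over n items. */
    static double c(int n) {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        double harmonic = Math.log(n - 1) + 0.5772156649; // approximates H(n-1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n;
    }

    /** Depth at which x lands in a leaf, plus c(leaf size) for the unbuilt subtree. */
    private static double pathLength(double[] x, Node node) {
        int depth = 0;
        while (node.left != null) {
            node = x[node.feature] < node.split ? node.left : node.right;
            depth++;
        }
        return depth + c(node.size);
    }

    /** Anomaly score in (0, 1]; values closer to 1 mean x was isolated quickly. */
    double score(double[] x) {
        double total = 0.0;
        for (Node tree : trees) total += pathLength(x, tree);
        return Math.pow(2.0, -(total / trees.length) / c(sampleSize));
    }
}
```

To try it, train on rows of numeric features and call `score` on a new row; a point far from the training data should score higher than one near the bulk of it. A SIMD version would typically evaluate many rows against the same split at once rather than walking one row at a time as this sketch does.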

submitted by /u/flyingfruits

