1 min read · from Data Science
Best technique for training models on a sample of data?
Due to memory limits on my work computer, I can't train machine learning models on our entire analysis dataset. Because my data is highly imbalanced, I'm under-sampling the majority class of the binary outcome.
What is the proper way to train ML models on sampled data with cross-validation and a holdout set?
After training on the under-sampled data, should I run a final test on a portion of unsampled data to choose the best model?
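One common answer to this kind of question is to under-sample only inside the training folds, validate each fold on unsampled data, and keep a fully unsampled holdout for the final comparison. Below is a minimal sketch of that workflow using scikit-learn; the synthetic data, the `undersample` helper, and logistic regression as the model are all illustrative assumptions, not anything from the original post.

```python
# Sketch: under-sample the majority class ONLY inside each CV training fold,
# so validation folds and the final holdout keep the true class distribution.
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic imbalanced binary data (illustrative only).
n = 5000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 4.0).astype(int)  # rare positives

# Split off an untouched holdout FIRST, before any sampling.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

def undersample(X, y, rng):
    """Randomly drop majority-class rows to match the minority count."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    keep_neg = rng.choice(neg, size=len(pos), replace=False)
    idx = np.concatenate([pos, keep_neg])
    return X[idx], y[idx]

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, val_idx in cv.split(X_tr, y_tr):
    # Balance the training fold only.
    X_bal, y_bal = undersample(X_tr[train_idx], y_tr[train_idx], rng)
    model = LogisticRegression().fit(X_bal, y_bal)
    # Validate on the UNSAMPLED fold so the metric reflects real imbalance.
    aucs.append(roc_auc_score(y_tr[val_idx],
                              model.predict_proba(X_tr[val_idx])[:, 1]))

# Final model: refit on the full (under-sampled) training split,
# then score once on the untouched, unsampled holdout.
X_bal, y_bal = undersample(X_tr, y_tr, rng)
final = LogisticRegression().fit(X_bal, y_bal)
holdout_auc = roc_auc_score(y_te, final.predict_proba(X_te)[:, 1])
print(f"mean CV AUC: {np.mean(aucs):.3f}, holdout AUC: {holdout_auc:.3f}")
```

The key design choice here is that the sampling step lives inside the cross-validation loop rather than before it, which avoids leaking the resampling decision into the evaluation data; ranking metrics like AUC are used because accuracy is misleading on imbalanced classes.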