How do you experiment with a (very) large model architecture? [D]
I'm trying to reproduce a paper (a very particular kind of diffusion model), and their training regime is incredibly compute-heavy.
In general, how are quick experiments performed to validate hypotheses when the models are large and compute is expensive?
Some cursory browsing yields the following:

1. Using only 5-10% of the entire dataset.
2. Drastically reducing the batch size and compensating for it in the learning rate.
3. Reducing the number of epochs/iterations.
But I've had to infer these from online resources and what LLMs tell me. Is there anything in addition to, beyond, or contradicting these?
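For what it's worth, here is a minimal PyTorch sketch of how those three tricks combine into one cheap pilot run. The dataset, model, loss, and base hyperparameters below are toy stand-ins, not anything from the paper I'm reproducing:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy stand-ins so the sketch runs end to end; swap in the paper's
# real dataset and diffusion model here.
full_dataset = TensorDataset(torch.randn(10_000, 64))
model = nn.Sequential(nn.Linear(64, 128), nn.SiLU(), nn.Linear(128, 64))

base_lr, base_batch = 1e-4, 256  # hypothetical values from the paper

# 1) Keep only a random ~5% slice of the full dataset.
n = len(full_dataset)
subset = Subset(full_dataset, torch.randperm(n)[: n // 20].tolist())

# 2) Shrink the batch and scale the learning rate linearly with it
#    (the "linear scaling rule" of Goyal et al., 2017).
batch = 32
lr = base_lr * batch / base_batch
loader = DataLoader(subset, batch_size=batch, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

# 3) Cap the run at a small fixed number of optimizer steps instead
#    of training for full epochs.
max_steps, step = 500, 0
while step < max_steps:
    for (x,) in loader:
        # Stand-in denoising-style objective: predict the added noise.
        noise = torch.randn_like(x)
        loss = nn.functional.mse_loss(model(x + noise), noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= max_steps:
            break
```

The idea would be that a run like this only needs to rank hypotheses against each other, not reach the paper's final loss.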