How do you experiment with a (very) large model architecture? [D]
I'm trying to reproduce a paper (a very particular kind of diffusion model), and their training regime is incredibly compute-heavy.
In general, how are quick experiments performed to validate hypotheses when the models are large and compute is expensive?
Some cursory browsing yields the following:

1. Using only 5-10% of the entire dataset.
2. Drastically reducing the batch size and compensating for it in the learning rate.
3. Reducing the number of epochs/iterations.
But I've had to infer these from online resources and what LLMs tell me. Is there anything in addition to, beyond, or contradicting these?
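For what it's worth, here is a minimal PyTorch sketch of how those three tricks combine into one cheap pilot run. The dataset, model, loss, and base hyperparameters below are toy stand-ins, not anything from the paper I'm reproducing:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy stand-ins so the sketch runs end to end; swap in the paper's
# real dataset and diffusion model here.
full_dataset = TensorDataset(torch.randn(10_000, 64))
model = nn.Sequential(nn.Linear(64, 128), nn.SiLU(), nn.Linear(128, 64))

base_lr, base_batch = 1e-4, 256  # hypothetical values from the paper

# 1) Keep only a random ~5% slice of the full dataset.
n = len(full_dataset)
subset = Subset(full_dataset, torch.randperm(n)[: n // 20].tolist())

# 2) Shrink the batch and scale the learning rate linearly with it
#    (the "linear scaling rule" of Goyal et al., 2017).
batch = 32
lr = base_lr * batch / base_batch
loader = DataLoader(subset, batch_size=batch, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

# 3) Cap the run at a small fixed number of optimizer steps instead
#    of training for full epochs.
max_steps, step = 500, 0
while step < max_steps:
    for (x,) in loader:
        # Stand-in denoising-style objective: predict the added noise.
        noise = torch.randn_like(x)
        loss = nn.functional.mse_loss(model(x + noise), noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if step >= max_steps:
            break
```

The idea would be that a run like this only needs to rank hypotheses against each other, not reach the paper's final loss.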