From Machine Learning
What should a PyTorch training end-of-run performance summary show? [D]
For most slow PyTorch runs, the first question isn't "show me every trace event." It's simply: where do I even start? Where did step time go? I've been thinking about what a compact end-of-run summary would look like: lightweight enough to run on every job, not just on dedicated profiling runs. Curious how others are solving this today. What would make something like this useful? What is missing?
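The post's example output isn't reproduced here, so as a rough sketch of the idea: a lightweight collector that accumulates per-step phase timings (input pipeline vs. compute) and prints a one-screen verdict at the end of the run. All names and the report format below are hypothetical, not from the original post:

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Accumulates per-step phase timings and renders a compact
    end-of-run report answering "where did step time go?"."""
    data_s: float = 0.0     # seconds spent waiting on the input pipeline
    compute_s: float = 0.0  # seconds spent in forward/backward/optimizer
    steps: int = 0

    def record_step(self, data_s: float, compute_s: float) -> None:
        self.data_s += data_s
        self.compute_s += compute_s
        self.steps += 1

    def report(self) -> str:
        total = self.data_s + self.compute_s
        data_pct = 100.0 * self.data_s / total if total else 0.0
        # Simple heuristic: call the run input-bound when the input
        # pipeline dominates step time, otherwise compute-bound.
        verdict = "input-bound" if data_pct > 50.0 else "compute-bound"
        avg_ms = total / self.steps * 1e3 if self.steps else 0.0
        return (
            f"steps: {self.steps}\n"
            f"avg step time: {avg_ms:.1f} ms\n"
            f"input pipeline: {data_pct:.0f}% of step time ({verdict})"
        )


# Usage: wrap each training step's data-loading and compute phases
# with time.perf_counter() and feed the deltas in.
summary = RunSummary()
summary.record_step(data_s=0.080, compute_s=0.020)
print(summary.report())
```

Cheap enough to leave on for every job, since it only adds two timer reads per step; richer signals from the post's tags (wait-heavy collectives, imbalanced ranks, memory stability) would be extra counters in the same structure.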
Tagged with
#PyTorch
#training
#end-of-run
#performance summary
#input-bound
#compute-bound
#step time
#memory stable
#profiling runs
#wait-heavy
#compact summary
#ranks imbalanced
#slow runs