From r/MachineLearning

Dataset of 150k+ stool images and not sure how to fully use it [D]

I have a dataset of around 150k stool images, and I’m trying to better understand the “right” way to use it for training a computer vision model.

Right now, our process is pretty manual. We initially trained on about 5k images that were individually verified by a human. For every image, we checked/corrected the Bristol type, consistency, color, mucus/blood indicators, etc. Then we trained the model on those verified annotations.

As we continue training, we keep doing the same thing: manually reviewing and correcting images before feeding them back into the model.
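
The loop described above (model predicts, a human checks/corrects, the corrected labels go back into training) can be sketched roughly as below. This is a minimal illustration, not the poster's actual pipeline: the model stub, the `review_threshold` routing, and all function names are assumptions, and the threshold step in particular is one common scalable variant (only low-confidence predictions go to a human) rather than the review-everything process described in the post.

```python
import random

# Toy stand-in for the trained vision model: returns a predicted label
# with a confidence score. In practice this would be a CNN/ViT forward
# pass producing Bristol type, color, mucus/blood indicators, etc.
def model_predict(image_id):
    random.seed(image_id)
    bristol = random.randint(1, 7)   # Bristol scale is 1-7
    confidence = random.random()
    return {"bristol": bristol, "confidence": confidence}

def human_review(image_id, prediction):
    # Placeholder for the manual check/correct step described above.
    # A real reviewer would confirm or fix each annotation field.
    return {"bristol": prediction["bristol"]}

def labeling_round(unlabeled_ids, review_threshold=0.9):
    """One iteration of the pre-label -> verify -> retrain loop.

    Predictions below `review_threshold` are routed to a human;
    high-confidence ones are accepted as-is. (The threshold routing
    is an assumption for illustration; the post reviews every image.)
    """
    verified = []
    for image_id in unlabeled_ids:
        pred = model_predict(image_id)
        if pred["confidence"] < review_threshold:
            label = human_review(image_id, pred)   # human corrects
        else:
            label = {"bristol": pred["bristol"]}   # auto-accepted
        verified.append((image_id, label))
    # These verified labels would be appended to the training set
    # before the next retraining run.
    return verified

batch = labeling_round(range(10))
print(len(batch))
```

Lowering `review_threshold` trades annotation cost for label quality, which is essentially the scaling question the rest of the post is asking about.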

My question is basically: does this workflow make sense from an ML perspective? Is this how people normally approach building a solid vision dataset/model, especially in a domain where annotation quality matters a lot? Or is there a smarter/more scalable approach people usually move toward once they have a large dataset?

I’m mainly trying to understand best practices around dataset quality, human verification, iterative training, and scaling annotation without introducing bad labels.

submitted by /u/SamePersonality5183


Tagged with

#stool images
#computer vision model
#dataset quality
#annotation quality
#human verification
#best practices
#manual process
#iterative training
#workflow
#consistency
#verified annotations
#Bristol type
#scalable approach
#large dataset