Dataset of 150k+ stool images and not sure how to fully use it [D]
I have a dataset of around 150k stool images, and I’m trying to better understand the “right” way to use it for training a computer vision model.
Right now, our process is pretty manual. We initially trained on about 5k images that were individually verified by a human. For every image, we checked/corrected the Bristol type, consistency, color, mucus/blood indicators, etc. Then we trained the model on those verified annotations.
As we continue training, we keep doing the same thing: manually reviewing and correcting images before feeding them back into the model.
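The loop described above (train on verified labels, then manually review more images before the next round) is close to what the literature calls model-assisted labeling or active learning, where the model's own uncertainty decides which images get human review first. A minimal sketch of that idea, using synthetic feature vectors and a stand-in `human_verify` oracle in place of real image embeddings and annotators (all names here are hypothetical, not from the post):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in: feature vectors for "images" with a ground-truth label.
# In practice these would be CNN embeddings and Bristol-type annotations.
X = rng.normal(size=(2000, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

labeled = list(range(100))          # small human-verified seed set (the "5k")
unlabeled = list(range(100, 2000))  # large unreviewed pool (the "150k")

def human_verify(idx):
    """Stand-in for a human annotator confirming/correcting a label."""
    return y[idx]  # oracle here; in reality this is the manual review step

for round_ in range(3):
    model = LogisticRegression().fit(X[labeled], y[labeled])
    # Score the unreviewed pool by model confidence (max class probability).
    confidence = model.predict_proba(X[unlabeled]).max(axis=1)
    # Route only the least-confident images to human review, rather than
    # reviewing every image manually.
    order = np.argsort(confidence)
    batch = [unlabeled[i] for i in order[:100]]
    for idx in batch:
        _ = human_verify(idx)       # human fixes or confirms the label
        labeled.append(idx)
        unlabeled.remove(idx)
    acc = model.score(X[unlabeled], y[unlabeled])
    print(f"round {round_}: labeled={len(labeled)} pool_acc={acc:.3f}")
```

This is only a sketch of the selection strategy, not a claim about how the original 5k-image pipeline works; the point is that human effort concentrates on the images the model is least sure about, instead of being spread uniformly over the pool.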
My question is basically: does this workflow make sense from an ML perspective? Is this how people normally approach building a solid vision dataset/model, especially in a domain where annotation quality matters a lot? Or is there a smarter/more scalable approach people usually move toward once they have a large dataset?
I’m mainly trying to understand best practices around dataset quality, human verification, iterative training, and scaling annotation without introducing bad labels.