ML lead vs PM on eval-methodology layer independence. who's actually right here? [D]

got into an argument with our ML lead at 11pm yesterday about an eval methodology a PM had built off a framework she learned at an AI PM cohort. she's presenting it as a layered defense with independent layers, he's saying the layers are statistically conditioned on each other and her independence claim is wrong. they both have a point.

the framework as taught at the cohort (it was Product Faculty's, fwiw) is genuinely useful for non-eng PMs. it forces explicit thinking about behavioral checks vs adversarial probes vs traditional metrics. but the way it's taught in the abridged form makes the layers sound independent when statistically they aren't.
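rough sketch of the independence problem, with completely made-up numbers (the 10 failures and which layer caught what are hypothetical, not from our system). if the three layers were independent, the chance that all of them miss a failure would be the product of their individual miss rates. if the same hard cases slip past every layer, the real joint miss rate is much higher:

```python
from math import prod

def miss_rates(caught):
    """caught: one tuple per known failure, one bool per layer (True = that layer caught it)."""
    n = len(caught)
    n_layers = len(caught[0])
    # Per-layer miss rate, looking at each layer in isolation.
    marginal = [sum(1 for row in caught if not row[i]) / n for i in range(n_layers)]
    # What you'd predict if the layers were truly independent.
    independent_estimate = prod(marginal)
    # What actually happened: fraction of failures that no layer caught.
    observed = sum(1 for row in caught if not any(row)) / n
    return marginal, independent_estimate, observed

# Hypothetical data. Columns: (behavioral checks, adversarial probes, traditional metrics).
caught = [
    (False, False, False),  # hard case, missed by every layer
    (False, False, False),  # hard case, missed by every layer
    (False, True, True),
    (True, False, True),
    (True, True, False),
    (True, True, True),
    (True, True, True),
    (True, True, True),
    (True, True, True),
    (True, True, True),
]

marginal, independent_estimate, observed = miss_rates(caught)
print("per-layer miss rates:", marginal)
print("joint miss rate if independent:", round(independent_estimate, 3))
print("joint miss rate observed:", observed)
```

in this toy setup each layer misses 30% on its own, so the independence assumption predicts about 2.7% slipping through all three, while the observed number is 20%. that gap is what the "statistically conditioned" objection is about, as far as i understand it.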

for ML/AI engineers here who've worked with non-eng PMs on production eval: how do you handle the gap between the simplified eval frameworks PMs learn and the actual statistical interactions in production? specifically interested in how you've negotiated the conversation with a PM who's "done the cohort" and shows up with a framework that's solid in its public form but has subtle issues in its statistical foundations.

submitted by /u/Critical_Builder_902