5:["$","div",null,{"className":"min-h-screen bg-background","children":["$","div",null,{"className":"container mx-auto px-4 py-12 max-w-3xl","children":["$","article",null,{"children":[["$","header",null,{"className":"mb-10","children":[["$","div",null,{"className":"flex items-center gap-2 mb-6","children":["$","$L13",null,{"href":"/","className":"text-sm font-medium text-muted-foreground hover:text-foreground transition-colors inline-flex items-center gap-2","children":[["$","span",null,{"children":"←"}],["$","span",null,{"children":"Back to all posts"}]]}]}],["$","div",null,{"className":"flex flex-wrap items-center gap-3 text-sm font-medium text-muted-foreground mb-5","children":[["$","time",null,{"dateTime":"2026-05-30T17:07:42.318Z","className":"font-semibold","children":"May 30, 2026"}],["$","span",null,{"children":"•"}],["$","span",null,{"children":[1," min read"]}],["$","span",null,{"children":"•"}],["$","span",null,{"className":"italic","children":["from ","Machine Learning"]}],null]}],["$","h1",null,{"className":"text-4xl md:text-5xl font-bold mb-6 leading-tight","children":"How to fine-tune an LLM for open-ended problems? [P]"}],null]}],null,null,["$","div",null,{"className":"mb-10","children":["$","div",null,{"className":"prose prose-lg lg:prose-xl max-w-none prose-headings:font-bold prose-p:leading-relaxed prose-a:text-primary prose-a:no-underline hover:prose-a:underline prose-img:rounded-lg prose-blockquote:border-l-4 prose-blockquote:border-primary/30 prose-blockquote:italic","dangerouslySetInnerHTML":{"__html":"

I want to fine-tune an LLM to solve open-ended math problems, such as proof-only problems. For these, RLVR that uses only the final answer as the reward signal is not enough, because a proof has no single final answer to check. Since SFT seems useless here and GRPO/PPO methods would lack an appropriate reward function, what kind of fine-tuning can I do? For data, I plan to use the MathNet dataset.

submitted by /u/TechNerd10191
"}}]}],["$","div",null,{"className":"my-10 p-6 bg-muted/50 rounded-xl border-2 border-dashed","children":["$","div",null,{"className":"flex items-center justify-between flex-wrap gap-4","children":[["$","div",null,{"children":[["$","h3",null,{"className":"font-semibold mb-1","children":"Want to read more?"}],["$","p",null,{"className":"text-sm text-muted-foreground","children":"Check out the full article on the original site"}]]}],["$","a",null,{"href":"https://www.reddit.com/r/MachineLearning/comments/1ts1sl5/how_to_finetune_an_llm_for_openended_problems_p","target":"_blank","rel":"noopener noreferrer","className":"inline-flex items-center gap-2 px-6 py-3 bg-primary text-primary-foreground font-semibold rounded-lg hover:bg-primary/90 transition-colors","children":["View original article",["$","span",null,{"children":"→"}]]}]]}]}],false]}]}]}]