2 min readfrom Machine Learning

Anthropic's new model Fable will silently handicap work on LLMs [D]

Seems like they have engineered some specific limitations that are widely cited as follows:

In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations https://news.ycombinator.com/item?id=48464732

Other comments note how even using the word 'nuclear' in the context of scientific research elicits refusal behavior by the model: https://news.ycombinator.com/item?id=48473302

This makes it seem quite plausible that the model could subtly sabotage any machine learning work (even as false positive). Some suggest this has been happening behind the scenes for a while already, but can anyone confirm that?

submitted by /u/AccomplishedCat4770
[link] [comments]

Want to read more?

Check out the full article on the original site

View original article

Tagged with

#rows.com
#natural language processing for spreadsheets
#generative AI for data analysis
#Excel alternatives for data analysis
#machine learning in spreadsheet applications
#self-service analytics tools
#self-service analytics
#LLMs
#Claude
#Fable
#Anthropic
#Prompt Modification
#Steering Vectors
#Parameter-Efficient Fine-Tuning (PEFT)
#Frontier LLM Development
#Pretraining Pipelines
#Distributed Training
#ML Accelerator Design
#Safeguards
#Refusal Behavior