From Machine Learning

Best Text-to-Text Translation Model? [D]

I'm working on a project that translates any language into English.

So far, I've tried NMT models like NLLB, MADLAD, and SeamlessM4T v2.

The main issue is that they struggle with proper nouns and other named entities, such as:

- names
- places
- dates
- organizations

I also tried LLMs like Gemma 4, Qwen 3 4B, and Aya Tiny Global, but the problem persists: they sometimes partially translate or otherwise alter entity names.

I even tried NER-based masking (replacing entities with placeholders before translation and restoring them afterward), but multilingual NER itself becomes the bottleneck. Most NER models work reliably only for a limited set of languages, while my dataset covers 100+ languages, many of them low-resource.
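The placeholder approach can be sketched as follows. This is a minimal illustration under stated assumptions, not a production recipe: instead of a full NER model, entity spans come from a hypothetical user-supplied glossary plus a digit-based regex for dates and numbers (with `str` patterns, Python's `\d` matches any Unicode decimal digit, so that part is script-agnostic). `translate` stands in for whichever model is actually used, and `mask_entities`, `unmask`, `translate_with_placeholders`, and `toy_translate` are all hypothetical names.

```python
import re
from collections.abc import Callable

# Digit runs joined by common date/number separators, e.g. "03.05.2024" or "12:30".
# For str patterns, \d matches any Unicode decimal digit, not only ASCII 0-9.
NUMERIC_SPAN = r"\d+(?:[./:-]\d+)*"


def mask_entities(text: str, glossary: list[str]) -> tuple[str, dict[str, str]]:
    """Replace glossary terms and numeric spans with placeholders like __ENT0__."""
    mapping: dict[str, str] = {}
    # Longest terms first: regex alternation takes the first alternative that
    # matches, so "New York City" must be tried before "New York".
    terms = sorted({t for t in glossary if t}, key=len, reverse=True)
    pattern = re.compile("|".join([re.escape(t) for t in terms] + [NUMERIC_SPAN]))

    def to_placeholder(match: re.Match) -> str:
        key = f"__ENT{len(mapping)}__"
        mapping[key] = match.group(0)
        return key

    # One pass only, so digits inside freshly inserted placeholders are never re-masked.
    return pattern.sub(to_placeholder, text), mapping


def unmask(text: str, mapping: dict[str, str]) -> tuple[str, list[str]]:
    """Restore original spans and report placeholders the translator dropped or altered."""
    missing = [key for key in mapping if key not in text]
    for key, original in mapping.items():
        text = text.replace(key, original)
    return text, missing


def translate_with_placeholders(
    text: str, glossary: list[str], translate: Callable[[str], str]
) -> tuple[str, list[str]]:
    masked, mapping = mask_entities(text, glossary)
    return unmask(translate(masked), mapping)


# Toy stand-in for a real model: word-by-word German -> English lookup.
TOY_LEXICON = {"traf": "met", "am": "on"}


def toy_translate(text: str) -> str:
    # \w+ also matches placeholders such as __ENT0__; they are not in the
    # lexicon, so they pass through unchanged.
    return re.sub(r"\w+", lambda m: TOY_LEXICON.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    result, missing = translate_with_placeholders(
        "Anna traf Jürgen Weiß am 03.05.2024 in Köln.", ["Jürgen Weiß", "Köln"], toy_translate
    )
    print(result)   # Anna met Jürgen Weiß on 03.05.2024 in Köln.
    print(missing)  # []
```

Real models, LLMs in particular, do not always copy placeholders intact (they may drop underscores or translate "ENT"), which is why `unmask` returns the keys it could not find instead of assuming success. A non-empty `missing` list is a signal to retry or fall back to an unmasked translation.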

How do production systems usually handle this problem? Are there better multilingual translation models, multilingual NER approaches, or decoding techniques for preserving entities properly?
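On the decoding side, one option (offered here as an assumption, not a documented production recipe) is n-best reranking: ask the model for several candidate translations, for example via beam search with `num_return_sequences` in Hugging Face `transformers`' `generate()`, then keep the candidate that copies the most source spans verbatim. The sketch below covers only the scoring and selection step. The helper names `entity_coverage` and `pick_best_candidate` are hypothetical, and the list of "must-copy" spans (numbers, Latin-script personal names) still has to be supplied, for instance from a glossary or a digit regex.

```python
def entity_coverage(candidate: str, must_copy: list[str]) -> float:
    """Fraction of must-copy spans that appear verbatim in a candidate translation."""
    if not must_copy:
        return 1.0
    kept = sum(1 for span in must_copy if span in candidate)
    return kept / len(must_copy)


def pick_best_candidate(candidates: list[str], must_copy: list[str]) -> str:
    """Return the candidate that preserves the most must-copy spans.

    Candidates are assumed to arrive in model-score order (best first); max()
    returns the first of several equal maxima, so ties fall back to the model's ranking.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: entity_coverage(c, must_copy))


if __name__ == "__main__":
    # Hypothetical n-best list for the German sentence
    # "Anna traf Jürgen Weiß am 03.05.2024 in Köln."
    n_best = [
        "Anna met Juergen White on May 3 in Cologne.",
        "Anna met Jürgen Weiß in Cologne.",
        "Anna met Jürgen Weiß on 03.05.2024 in Cologne.",
    ]
    print(pick_best_candidate(n_best, ["Jürgen Weiß", "03.05.2024"]))
    # Anna met Jürgen Weiß on 03.05.2024 in Cologne.
```

Verbatim matching is deliberately strict, so the must-copy list should contain only spans that are expected to survive unchanged. Places with established English exonyms (Köln becomes Cologne) and names written in non-Latin scripts, which usually need transliteration, would be wrongly penalized if included.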

Requirements:

- Support for 100+ languages
- Runs locally on an RTX GPU
- Model size under 7B parameters
- English is always the target language

submitted by /u/Illustrious_Age_2792
