[P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2)
I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4).
The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can be useful for:
- adaptive language learning systems,
- placement testing,
- readability estimation,
- educational NLP applications.
Dataset
The dataset contains 1,785 English texts balanced across:
- 6 CEFR levels,
- 10 domains/topics.
The samples were synthetically generated using:
- Groq API
- Llama-3.3-70B
Generation constraints were designed to preserve:
- vocabulary complexity,
- grammatical progression,
- sentence structure variation,
- CEFR-specific linguistic patterns.
Training Setup
Base model:
- Qwen2.5-1.5B
Fine-tuning method:
- QLoRA
- 4-bit NF4 quantization
- LoRA adapters
Only ~0.28% of model parameters were trained.
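For a sense of where a figure like this comes from: a LoRA adapter of rank r on a weight matrix with d_in inputs and d_out outputs adds r × (d_in + d_out) trainable parameters. The sketch below counts them for one hypothetical configuration. The rank (16) and the target modules (attention projections only) are not stated above, so they are illustrative assumptions. The layer shapes and the ~1.54B total are assumed from the public Qwen2.5-1.5B config and should be checked against the model's `config.json`.

```python
# Sketch: trainable-parameter fraction for a HYPOTHETICAL LoRA setup.
# Assumptions (not stated in the post): rank 16, adapters on the four
# attention projections only, layer shapes from the public Qwen2.5-1.5B
# config (hidden size 1536, 2 KV heads of dim 128, 28 layers).

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(shapes, num_layers: int, rank: int, total_params: float) -> float:
    """Share of all parameters that the LoRA adapters account for."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes)
    return num_layers * per_layer / total_params


ATTENTION_SHAPES = [
    (1536, 1536),  # q_proj (assumed shape)
    (1536, 256),   # k_proj (assumed shape)
    (1536, 256),   # v_proj (assumed shape)
    (1536, 1536),  # o_proj (assumed shape)
]

fraction = trainable_fraction(
    ATTENTION_SHAPES, num_layers=28, rank=16, total_params=1.54e9
)
print(f"{fraction:.2%}")
```

Other rank and target-module combinations can give a similar fraction, so this does not pin down the actual configuration. A trained classification head would add comparatively few parameters on top.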
Results
Held-out test set:
- 179 samples
Metrics:
- Accuracy: 84.9%
- Macro F1: 84.9%
Per-level recall:
| Level | Recall |
|---|---|
| A1 | 96.6% |
| A2 | 90.0% |
| B1 | 90.0% |
| B2 | 86.7% |
| C1 | 86.7% |
| C2 | 60.0% |
Most errors come from C1/C2 confusion, which is expected due to the subtle linguistic boundary between those levels.
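As a quick consistency check, overall accuracy is the support-weighted average of per-level recall. The per-level test counts are not given above, so this sketch assumes a near-balanced split of the 179 samples (29 for A1, 30 for each other level). That split is a guess, not a reported figure.

```python
# Sketch: check that the per-level recalls are consistent with the
# reported overall accuracy. The per-level supports are ASSUMED
# (near-balanced split of 179 samples); they are not given in the post.

recall = {"A1": 0.966, "A2": 0.900, "B1": 0.900, "B2": 0.867, "C1": 0.867, "C2": 0.600}
support = {"A1": 29, "A2": 30, "B1": 30, "B2": 30, "C1": 30, "C2": 30}

# Recover whole-number correct counts from the rounded recall values.
correct = {level: round(recall[level] * support[level]) for level in recall}

total = sum(support.values())
accuracy = sum(correct.values()) / total

# Unweighted mean of the per-level recalls, for comparison.
macro_recall = sum(recall.values()) / len(recall)

print(f"correct: {sum(correct.values())}/{total}")
print(f"accuracy: {accuracy:.1%}")
print(f"macro recall: {macro_recall:.1%}")
```

Under this assumed split, the recalls correspond to 152 correct predictions out of 179, which agrees with the reported 84.9% accuracy. C2 alone would account for 12 of the 27 errors.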
Deployment
I also built:
- a FastAPI inference API,
- a Docker deployment setup.
Example Usage
    from transformers import AutoModelForSequenceClassification, AutoTokenizer
    import torch

    # Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
    model = AutoModelForSequenceClassification.from_pretrained(
        "yanou16/cefr-english-classifier"
    )
    tokenizer = AutoTokenizer.from_pretrained(
        "yanou16/cefr-english-classifier"
    )

    text = "Artificial intelligence is transforming many industries."
    inputs = tokenizer(text, return_tensors="pt")

    # Inference only, so skip gradient tracking
    with torch.no_grad():
        outputs = model(**inputs)

    # Index of the highest-scoring class (an integer, not a CEFR label)
    pred = outputs.logits.argmax(dim=-1).item()
    print(pred)

Feedback is welcome, especially regarding:
- evaluation methodology,
- synthetic data quality,
- improving C2 classification performance,
- better benchmarking approaches.
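A note on the usage example: `pred` there is a class index rather than a CEFR label. Below is a minimal sketch of turning logits into a label plus a confidence score. The index-to-label order (0 = A1 through 5 = C2) and the helper name `decode_prediction` are assumptions for illustration. The actual mapping should be read from `model.config.id2label` if the checkpoint defines it.

```python
import math

# ASSUMED label order for illustration only; prefer model.config.id2label.
ASSUMED_ID2LABEL = {0: "A1", 1: "A2", 2: "B1", 3: "B2", 4: "C1", 5: "C2"}


def decode_prediction(logits, id2label=ASSUMED_ID2LABEL):
    """Return (label, probability) for the highest-scoring class.

    `logits` is a flat list of floats, one per class.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Made-up logits, not real model output.
label, confidence = decode_prediction([0.1, 0.3, 2.2, 1.0, -0.5, -1.2])
print(label, round(confidence, 2))
```

With the real model, `outputs.logits[0].tolist()` could be passed in as `logits`.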