From Machine Learning

UK GDPR Small Business Q&A — 5,000 synthetic pairs with article-level citations [D]

Dataset for fine-tuning compliance assistants. Each pair includes:
- A practical SME-facing question ("Can I use pre-ticked consent boxes?")
- An answer with specific UK GDPR article references, ICO guidance by name, and actionable steps
- Source metadata: which GDPR concepts were used, which generation strategy, timestamp
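
To make the record layout above concrete, here is a minimal sketch of how one pair might be represented and how its article citations could be pulled out for checking. The field names (`question`, `answer`, `concepts`, `strategy`, `timestamp`) and the `cited_articles` helper are assumptions for illustration only; the published files may use a different schema.

```python
import re
from dataclasses import dataclass, field


@dataclass
class QAPair:
    """One Q&A record. Field names are hypothetical, not the dataset's confirmed schema."""
    question: str
    answer: str
    concepts: list[str] = field(default_factory=list)  # GDPR concepts used
    strategy: str = ""                                  # generation strategy label
    timestamp: str = ""                                 # ISO 8601 string


# Matches "Article 7", "Article 4(11)", "Article 6(1)(a)".
ARTICLE_RE = re.compile(r"\bArticle\s+(\d+)(?:\((\d+)\))?(?:\(([a-z])\))?")


def cited_articles(answer: str) -> list[str]:
    """Return distinct article references in order of first appearance."""
    refs: list[str] = []
    for match in ARTICLE_RE.finditer(answer):
        number, paragraph, letter = match.groups()
        ref = number
        if paragraph:
            ref += f"({paragraph})"
        if letter:
            ref += f"({letter})"
        if ref not in refs:
            refs.append(ref)
    return refs


# Illustrative record written for this sketch, not taken from the dataset.
example = QAPair(
    question="Can I use pre-ticked consent boxes?",
    answer=(
        "No. Article 4(11) requires a clear affirmative action, and "
        "Article 7 sets the conditions for consent."
    ),
    concepts=["consent"],
)
print(cited_articles(example.answer))
```

A check like this is useful for spotting answers that cite no article at all before they go into a fine-tuning set.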

Generation method: questions via local Qwen 14B from a curated term bank, answers via DeepSeek API for factual reliability. JSON + Parquet, MIT license for the 1K sample.
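
The two-stage flow described above could be sketched roughly as follows. This is not the author's pipeline: `generate_pair`, the `ask` and `answer` callables, and the output keys are all hypothetical stand-ins, with `ask` playing the role of the local question model and `answer` the role of the hosted answer model.

```python
import random
from datetime import datetime, timezone
from typing import Callable


def generate_pair(
    term_bank: list[str],
    ask: Callable[[str], str],      # stand-in for the local question model
    answer: Callable[[str], str],   # stand-in for the hosted answer model
    rng: random.Random,
    strategy: str = "term_seeded",  # hypothetical strategy label
) -> dict:
    """Pick a concept from the term bank, generate a question, then an answer."""
    concept = rng.choice(term_bank)
    question = ask(concept)
    return {
        "question": question,
        "answer": answer(question),
        "concepts": [concept],
        "strategy": strategy,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Stub models so the sketch runs without any model or API access.
pair = generate_pair(
    term_bank=["consent", "data retention"],
    ask=lambda concept: f"What do I need to know about {concept}?",
    answer=lambda question: f"(model answer to: {question})",
    rng=random.Random(0),
)
print(pair["concepts"], pair["strategy"])
```

Keeping the two models behind plain callables makes it easy to swap either stage or to test the record assembly on its own.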

This is a niche dataset: it's not a benchmark contender, it's for people building privacy tools for UK businesses. If you're doing legal NLP or compliance RAG, it might be useful.

Free sample: https://huggingface.co/datasets/Draeg82/uk-gdpr-small-business-qa

submitted by /u/a_serial_hobbyist_
