Good practices in data scripts
Hey guys! Hope youre having a great weekend. Need some help on advice or tips to build sustainable and scalable code, currently im working as a data analyst and tend to do some projects in the ML side, i use AI to help me handle the coding part while i manage the business side and logic, the way i use Claude or GPT is that i ask for specific snippets that handle what im building in the moment instead of asking for a full script, but tend to notice that AI always return a specifc function that handles multiple transformations and aggregations at once which later makes the whole thing hard to debbug in case anything changes, personally i tend to use only generic functions (like text normalization, handling null values, etc) that can be used across multiple scripts and leave all the transformations, business rules, agreggations like blocks outside functions. I was wondering if there are best practices to follow like a "standard" way to build data pipelines and follow best practices to keep it simple, scalable and debbugable.
Thanks for any advice or book/video recomendation!
[link] [comments]
Want to read more?
Check out the full article on the original site