New post at Data Duets: Data cleaning agent

We have released our newest skill for the Augmented Data Science framework: a data cleaning agent.

Why do we need a skill for data cleaning? Because data cleaning isn’t just an execution task; it requires explicit modeling decisions.

In this series, we are studying the best use of AI in data science. Along the way, we develop and test skills that we ultimately plan to combine using an orchestrator agent. The aim is not automation but a clear division of roles: data scientists set the intent, target, and method, while LLM agents execute.

Check out the post to see how we tested this skill on a large Amazon purchase dataset for customer behavior modeling (and how it successfully avoided data leakage).

Link to the post