Pedram Agand
Publication

DaTu: AI-Assisted Exploratory Data Analysis for Data Science Education

Software Impacts, Elsevier, 2024
P. Agand, et al. (2024). “DaTu: AI-Assisted Exploratory Data Analysis for Data Science Education.” Software Impacts, Elsevier.
Python, Data Analysis, Education, AI Tools

Software tool for AI-assisted exploratory data analysis, published in Software Impacts (Elsevier). Designed to help practitioners and teams quickly understand da

DaTu is a software tool that automates the tedious first steps of exploratory data analysis — understanding dataset structure, detecting quality issues, and generating a first-pass summary of what's actually in the data.

The Problem

Anyone who has worked on a real data science project knows the drill: you get a dataset, and before you can do anything meaningful with it, you spend hours just trying to understand what you have. What are the column types? How much is missing? Are there obvious outliers? What's the distribution?

This is necessary work, but it's also repetitive and mechanical. DaTu automates it.

What DaTu Does

The tool takes a raw dataset and produces a structured analysis covering:

  • Schema inference — column types, detected categories, numeric ranges
  • Quality assessment — missing value patterns, duplicate detection, outlier identification
  • Distribution summaries — histograms, correlation matrices, categorical value counts
  • Suggested next steps — based on the data profile, what preprocessing steps are likely needed
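To make the first two items concrete, here is a minimal sketch of a column profile built with only the Python standard library. It is not DaTu's implementation: the function names `infer_type` and `profile`, the report layout, and the 1.5 × IQR outlier rule are all assumptions chosen for illustration, and the input is assumed to be a non-empty list of dict rows such as `csv.DictReader` produces.

```python
from collections import Counter
from statistics import quantiles

MISSING = ("", None)  # assumption: empty string or None marks a missing cell


def infer_type(values):
    """Guess a column type from its non-missing values (hypothetical helper)."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "empty"
    try:
        for v in present:
            float(v)
        return "numeric"
    except (TypeError, ValueError):
        return "categorical"


def profile(rows):
    """Build a small per-column report from a non-empty list of dict rows."""
    columns = list(rows[0].keys())
    report = {"n_rows": len(rows), "columns": {}}

    # Duplicate detection: count rows that exactly repeat an earlier row.
    seen = Counter(tuple(r[c] for c in columns) for r in rows)
    report["duplicate_rows"] = sum(n - 1 for n in seen.values())

    for col in columns:
        values = [r[col] for r in rows]
        present = [v for v in values if v not in MISSING]
        kind = infer_type(values)
        info = {
            "type": kind,
            "missing_pct": 100 * (len(values) - len(present)) / len(values),
        }
        if kind == "numeric":
            nums = sorted(float(v) for v in present)
            info["range"] = (nums[0], nums[-1])
            info["outliers"] = []
            if len(nums) >= 4:
                # Illustrative rule: flag points beyond 1.5 * IQR of the quartiles.
                q1, _, q3 = quantiles(nums, n=4)
                low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
                info["outliers"] = [x for x in nums if x < low or x > high]
        elif kind == "categorical":
            info["value_counts"] = dict(Counter(present))
        report["columns"][col] = info
    return report


# Made-up sample data for the sketch.
rows = [
    {"age": "23", "city": "Oslo"},
    {"age": "25", "city": "Oslo"},
    {"age": "", "city": "Bergen"},
    {"age": "24", "city": "Oslo"},
    {"age": "26", "city": "Bergen"},
    {"age": "250", "city": "Oslo"},
    {"age": "23", "city": "Oslo"},
]
report = profile(rows)
print(report["duplicate_rows"], report["columns"]["age"])
```

A real tool would likely work on typed data frames and use more robust type and outlier heuristics. The sketch only shows how schema, missingness, duplicates, and outliers can be collected into one structured report.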

The output is structured to be actionable, not just descriptive. The goal is to cut the time between "received dataset" and "ready to model" from hours to minutes.
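One way to picture "actionable" output is a small rule table that turns a data profile into preprocessing hints. The sketch below is hypothetical: `suggest_steps`, the profile keys, and the 5% missing-value threshold are illustrative assumptions, not DaTu's actual rules.

```python
def suggest_steps(column_profiles, duplicate_rows=0, missing_threshold=5.0):
    """Map a per-column profile to plain-text preprocessing hints.

    The rules and the default 5% threshold are assumptions for illustration.
    """
    steps = []
    if duplicate_rows:
        steps.append(f"Drop or investigate {duplicate_rows} duplicate row(s).")
    for name, info in column_profiles.items():
        missing = info.get("missing_pct", 0.0)
        if missing > missing_threshold:
            steps.append(
                f"Impute or drop missing values in '{name}' ({missing:.1f}% missing)."
            )
        if info.get("outliers"):
            steps.append(
                f"Review {len(info['outliers'])} outlier(s) in '{name}' before scaling."
            )
        if info.get("type") == "categorical":
            steps.append(f"Encode categorical column '{name}' before modeling.")
    return steps


# Made-up profile for the sketch.
example_profile = {
    "age": {"type": "numeric", "missing_pct": 14.3, "outliers": [250.0]},
    "city": {"type": "categorical", "missing_pct": 0.0},
}
steps = suggest_steps(example_profile, duplicate_rows=1)
for step in steps:
    print("-", step)
```

Keeping the rules as data-driven checks over a profile, rather than over the raw dataset, means the same report can feed both the suggestions and any written summary.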

Design Decisions

The tool is designed for education as much as production. Data science students often skip EDA or do it superficially because it feels like overhead before the "real" work. DaTu makes the EDA artifact concrete and shareable — a document you can discuss with collaborators or submit alongside a project.

The AI-assisted components handle natural language summarization of the data profile: instead of just showing a table of missing value percentages, DaTu explains what the pattern suggests and why it matters.
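As a rough illustration of that hand-off, the sketch below formats missing-value percentages into a prompt that could be sent to a language model. The function `build_summary_prompt`, the prompt wording, and the sample numbers are hypothetical. DaTu's actual prompts and model are not described on this page, and the model call itself is left out.

```python
def build_summary_prompt(dataset_name, missing_pct):
    """Format missing-value percentages into a prompt string (hypothetical)."""
    # List only columns that have missing values, worst first.
    ranked = sorted(missing_pct.items(), key=lambda item: -item[1])
    lines = [f"- {col}: {pct:.1f}% missing" for col, pct in ranked if pct > 0]
    table = "\n".join(lines) if lines else "- no missing values"
    return (
        f"Dataset: {dataset_name}\n"
        "Missing values by column:\n"
        f"{table}\n\n"
        "Explain in two or three sentences what this pattern suggests "
        "and why it matters for modeling."
    )


# Made-up numbers for the sketch.
prompt = build_summary_prompt("survey", {"income": 32.0, "age": 1.5, "city": 0.0})
print(prompt)
```

Passing the model a compact, already computed profile rather than the raw data keeps the prompt small and makes the explanation traceable to specific numbers.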

Publication

Published in Software Impacts (Elsevier) as a software description paper — a format for tools with clear scientific utility that deserve documentation beyond a README.

Full paper: ScienceDirect
