
CLEAN_DATA

Cleans a CSV or Excel file. It handles missing values, removes exact duplicate rows, converts column names to snake_case, and normalizes text strings (trimmed whitespace, standardized casing). Returns the path to the cleaned file and a summary of the cleaning operations.
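To make a few of these operations concrete, here is a minimal, standard-library Python sketch of snake_case column names, whitespace trimming, and exact-duplicate removal applied to in-memory rows. It is an illustration only, not the tool's implementation: the helper names `to_snake_case` and `clean_rows` are hypothetical, and the exact rules CLEAN_DATA applies (for instance, whether duplicates are compared before or after trimming) are assumptions.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: convert a column label to snake_case."""
    # Insert an underscore at camelCase boundaries, e.g. orderTotal -> order_Total.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()


def clean_rows(header, rows):
    """Hypothetical helper: trim text cells and drop exact duplicate rows.

    Assumption: rows are compared after trimming, so " Ada " and "Ada"
    count as the same value. The real tool may compare them differently.
    """
    columns = [to_snake_case(h) for h in header]
    seen = set()
    cleaned = []
    for row in rows:
        normalized = tuple(v.strip() if isinstance(v, str) else v for v in row)
        if normalized in seen:
            continue  # exact duplicate of a row already kept
        seen.add(normalized)
        cleaned.append(list(normalized))
    return columns, cleaned


columns, cleaned = clean_rows(
    ["Customer Name", "orderTotal"],
    [[" Ada ", 10], ["Ada", 10], ["Bob", 5]],
)
```

Keeping the first occurrence of each duplicate preserves the original row order, which is a common convention but still an assumption about this tool.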

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| input_path | string | yes | Absolute path to the messy CSV or Excel file. |
| output_path | string | yes | Absolute path where the cleaned file will be saved. Must match input extension. |
| sheet_name | string | no | For Excel files only: specific sheet to clean. Default is the first sheet. |
| handle_missing | drop, mean, median, mode, or zero | no | Strategy for missing values. Default is 'mean' for numeric, 'empty' for text. |
| remove_duplicates | boolean | no | Whether to drop exact duplicate rows. Default is true. |
| normalize_text | boolean | no | Whether to trim whitespace and standardize casing on text columns. Default is true. |
| standardize_columns | boolean | no | Whether to convert column names to snake_case. Default is true. |
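The constraints in the parameters listed above (absolute paths, matching extensions, a fixed set of missing-value strategies, boolean flags) can be checked on the client before a call is made. The sketch below is one way to do that; `build_clean_data_args` is a hypothetical helper, not part of the plugin, and the server may validate differently or report errors in its own format.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Values listed for handle_missing in the parameter reference.
MISSING_STRATEGIES = {"drop", "mean", "median", "mode", "zero"}
OPTIONAL_KEYS = {
    "sheet_name", "handle_missing", "remove_duplicates",
    "normalize_text", "standardize_columns",
}
BOOLEAN_KEYS = ("remove_duplicates", "normalize_text", "standardize_columns")


def _is_absolute(path: str) -> bool:
    # Accept both POSIX-style and Windows-style absolute paths.
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def build_clean_data_args(input_path: str, output_path: str, **options) -> dict:
    """Hypothetical helper: validate and assemble the args object for CLEAN_DATA."""
    for label, path in (("input_path", input_path), ("output_path", output_path)):
        if not _is_absolute(path):
            raise ValueError(f"{label} must be an absolute path: {path!r}")

    in_ext = PurePosixPath(input_path).suffix.lower()
    out_ext = PurePosixPath(output_path).suffix.lower()
    if in_ext != out_ext:
        raise ValueError(f"output extension {out_ext!r} must match input {in_ext!r}")

    unknown = set(options) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    # sheet_name is documented as applying to Excel files only.
    if "sheet_name" in options and in_ext == ".csv":
        raise ValueError("sheet_name applies to Excel files only")

    strategy = options.get("handle_missing")
    if strategy is not None and strategy not in MISSING_STRATEGIES:
        raise ValueError(f"handle_missing must be one of {sorted(MISSING_STRATEGIES)}")

    for key in BOOLEAN_KEYS:
        if key in options and not isinstance(options[key], bool):
            raise TypeError(f"{key} must be a boolean")

    # Only explicitly set options are sent, so the tool's defaults apply otherwise.
    return {"input_path": input_path, "output_path": output_path, **options}


args = build_clean_data_args(
    "/data/in.csv", "/data/out.csv",
    handle_missing="median", remove_duplicates=False,
)
```

Omitting unset options, rather than sending explicit defaults, keeps the request in step with whatever defaults the tool itself applies.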

How to use it

You normally trigger this by describing what you want in chat; the agent selects CLEAN_DATA automatically. For example:

Try saying
"use analyst tools to clean …"

In a workflow

As a step in a multi-step workflow DAG:

json
{
  "id": "s1",
  "agent": "analyst_tools",
  "action": "CLEAN_DATA",
  "args": {
    "input_path": "/Users/me/Documents/file.csv",
    "output_path": "/Users/me/Documents/file_cleaned.csv"
  },
  "depends_on": [],
  "outputs": []
}

Direct call

For scripting, call it directly via POST /execute_tool. Every tool returns { success, message, data }.

bash
curl -X POST http://127.0.0.1:8000/execute_tool \
  -H "Content-Type: application/json" \
  -d '{"tool_name":"CLEAN_DATA","args":{"input_path":"/Users/me/Documents/file.csv","output_path":"/Users/me/Documents/file_cleaned.csv"}}'
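The same direct call can be made from Python. The sketch below uses only the standard library and mirrors the curl request above (same URL, header, and body shape). The helper names `build_request` and `execute_tool` are hypothetical, and the error handling assumes that a falsy `success` field signals failure, which the page implies but does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # host and port taken from the curl example


def build_request(tool_name: str, args: dict, base_url: str = BASE_URL):
    """Hypothetical helper: build the POST /execute_tool request."""
    body = json.dumps({"tool_name": tool_name, "args": args}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/execute_tool",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_tool(tool_name: str, args: dict, base_url: str = BASE_URL, timeout: float = 60):
    """Hypothetical helper: send the request and unpack { success, message, data }."""
    req = build_request(tool_name, args, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    # Assumption: success is falsy when the tool fails and message explains why.
    if not result.get("success"):
        raise RuntimeError(result.get("message", "tool call failed"))
    return result.get("data")
```

With the server running, a call would look like `execute_tool("CLEAN_DATA", {"input_path": ..., "output_path": ...})`, using the same absolute paths you would pass to curl. The shape of `data` for CLEAN_DATA is not specified beyond the cleaned file path and a summary, so inspect it before relying on particular keys.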

Part of the analyst_tools plugin.