CLEAN_DATA
Cleans a CSV or Excel file. It handles missing values, removes exact duplicate rows, standardizes column names to snake_case, and normalizes text strings by trimming whitespace and standardizing casing. Returns the path to the cleaned file and a summary of the cleaning operations.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| input_path | string | yes | Absolute path to the messy CSV or Excel file. |
| output_path | string | yes | Absolute path where the cleaned file will be saved. Must match input extension. |
| sheet_name | string | no | For Excel files only: specific sheet to clean. Default is the first sheet. |
| handle_missing | one of: drop, mean, median, mode, zero | no | Strategy for missing values. Default is 'mean' for numeric, 'empty' for text. |
| remove_duplicates | boolean | no | Whether to drop exact duplicate rows. Default is true. |
| normalize_text | boolean | no | Whether to trim whitespace and standardize casing on text columns. Default is true. |
| standardize_columns | boolean | no | Whether to convert column names to snake_case. Default is true. |
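The constraints in the table can be checked on the client side before a call is made. The following is a minimal Python sketch, not part of the plugin: `build_clean_data_args` is a hypothetical helper, and the accepted extension list (`.csv`, `.xls`, `.xlsx`) is an assumption, since the table only says "CSV or Excel". It does not check that the paths are absolute or that the files exist.

```python
import os

# Allowed strategies and option names are copied from the parameter table.
HANDLE_MISSING = {"drop", "mean", "median", "mode", "zero"}
BOOLEAN_FLAGS = {"remove_duplicates", "normalize_text", "standardize_columns"}
KNOWN_OPTIONS = BOOLEAN_FLAGS | {"sheet_name", "handle_missing"}

# Assumption: the docs say "CSV or Excel" without listing extensions.
SUPPORTED_EXTENSIONS = {".csv", ".xls", ".xlsx"}


def build_clean_data_args(input_path, output_path, **options):
    """Return an args dict for CLEAN_DATA, raising ValueError on obvious mistakes."""
    src_ext = os.path.splitext(input_path)[1].lower()
    dst_ext = os.path.splitext(output_path)[1].lower()

    if src_ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported input extension: {src_ext!r}")
    if src_ext != dst_ext:
        raise ValueError("output_path must use the same extension as input_path")

    unknown = set(options) - KNOWN_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    if "sheet_name" in options and src_ext == ".csv":
        raise ValueError("sheet_name applies to Excel files only")
    if "handle_missing" in options and options["handle_missing"] not in HANDLE_MISSING:
        raise ValueError(f"handle_missing must be one of {sorted(HANDLE_MISSING)}")
    for flag in BOOLEAN_FLAGS & set(options):
        if not isinstance(options[flag], bool):
            raise ValueError(f"{flag} must be a boolean")

    # Omitted options are left out so the tool applies its documented defaults.
    return {"input_path": input_path, "output_path": output_path, **options}
```

Leaving unspecified options out of the dict, rather than filling in defaults locally, keeps the documented defaults on the tool side as the single source of truth.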
How to use it
You normally trigger this by describing what you want in chat, and the agent selects CLEAN_DATA automatically. For example:
Try saying
"use analyst tools to clean …"
In a workflow
As a step in a multi-step workflow DAG:
```json
{
  "id": "s1",
  "agent": "analyst_tools",
  "action": "CLEAN_DATA",
  "args": {
    "input_path": "/Users/me/Documents/file.csv",
    "output_path": "/Users/me/Documents/file_cleaned.csv"
  },
  "depends_on": [],
  "outputs": []
}
```
Direct call
For scripting, call it directly via POST /execute_tool. Every tool returns { success, message, data }.
```bash
curl -X POST http://127.0.0.1:8000/execute_tool \
  -H "Content-Type: application/json" \
  -d '{"tool_name":"CLEAN_DATA","args":{"input_path":"/Users/me/Documents/file.csv","output_path":"/Users/me/Documents/file_cleaned.csv"}}'
```
Part of the analyst_tools plugin. Browse the full Plugin & Tool Catalog or the relevant feature guide.
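For scripting beyond a one-off curl command, the same direct call can be sketched in Python with only the standard library. This is an illustrative sketch, not an official client: the function names are hypothetical, the URL is taken from the curl example, and treating `success` as a boolean and raising on failure is an assumption about how you might want to handle the `{ success, message, data }` envelope. The shape of `data` is not documented here, so it is returned as-is.

```python
import json
import urllib.request

# Endpoint from the curl example; adjust host and port for your setup.
EXECUTE_TOOL_URL = "http://127.0.0.1:8000/execute_tool"


def build_request(tool_name, args, url=EXECUTE_TOOL_URL):
    """Build the POST request that the curl example sends."""
    body = json.dumps({"tool_name": tool_name, "args": args}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_result(raw):
    """Unpack { success, message, data }; raise if success is falsy (assumed boolean)."""
    result = json.loads(raw)
    if not result.get("success"):
        raise RuntimeError(result.get("message") or "tool call failed")
    return result.get("data")


def execute_tool(tool_name, args, timeout=60):
    """Send the request and return the `data` field of the response."""
    with urllib.request.urlopen(build_request(tool_name, args), timeout=timeout) as resp:
        return parse_result(resp.read())
```

A call would then look like `execute_tool("CLEAN_DATA", {"input_path": "...", "output_path": "..."})`, with the server from the curl example running locally.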