Auto ML - Automated Machine Learning Platform
An intelligent automated machine learning platform that provides comprehensive data analysis, preprocessing, model selection, and hyperparameter tuning capabilities through Model Context Protocol (MCP) tools.
Features
Data Analysis & Exploration
- Data Information: Get comprehensive dataset statistics including shape, memory usage, data types, and missing values
- CSV Reading: Efficient CSV file reading with pandas and pyarrow support
- Correlation Analysis: Visualize correlation matrices for numerical and categorical variables
- Outlier Detection: Identify and visualize outliers in your datasets
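The dataset statistics and outlier check described above can be approximated directly with pandas. This is a minimal sketch, not the code in utils/details.py: the helper names summarize and iqr_outliers are hypothetical, and the 1.5 * IQR rule is an assumed outlier criterion that the repository may not use.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect shape, memory usage, dtypes and missing-value counts."""
    return {
        "shape": df.shape,
        # deep=True counts the bytes actually held by object (string) columns
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


def iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> dict:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each numeric column."""
    counts = {}
    for col in df.select_dtypes(include=np.number).columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(mask.sum())  # NaN comparisons are False, so gaps are not counted
    return counts


df = pd.DataFrame({
    "age": [29, 35, 41, 38, 120],
    "chol": [210, None, 250, 230, 240],
    "sex": ["M", "F", "M", "F", "M"],
})
info = summarize(df)
print(info["shape"], info["missing"])
print(iqr_outliers(df))
```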
Data Preprocessing
- Automated Preprocessing: Handle missing values, encode categorical variables, and scale numerical features
- Feature Engineering: Prepare features for both regression and classification problems
- Data Validation: Check for duplicates and data quality issues
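The preprocessing steps listed above map naturally onto a scikit-learn ColumnTransformer. The sketch below assumes one reasonable configuration (median imputation and standard scaling for numeric columns, most-frequent imputation and one-hot encoding for categorical ones); utils/preprocessing.py may choose different strategies, and build_preprocessor is a hypothetical name.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(X: pd.DataFrame) -> ColumnTransformer:
    numeric = X.select_dtypes(include=np.number).columns.tolist()
    categorical = X.select_dtypes(exclude=np.number).columns.tolist()
    num_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="median")),  # fill numeric gaps with the median
        ("scale", StandardScaler()),                    # zero mean, unit variance
    ])
    cat_pipe = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),  # fill with the most common label
        ("encode", OneHotEncoder(handle_unknown="ignore")),   # one column per category
    ])
    return ColumnTransformer([
        ("num", num_pipe, numeric),
        ("cat", cat_pipe, categorical),
    ])


df = pd.DataFrame({
    "age": [29, 35, np.nan, 38, 35],
    "sex": ["M", "F", "M", np.nan, "F"],
    "target": [0, 1, 0, 1, 1],
})
df = df.drop_duplicates()  # data validation: drop exact duplicate rows
X, y = df.drop(columns="target"), df["target"]
X_t = build_preprocessor(X).fit_transform(X)
print(X_t.shape)  # 4 rows: 1 scaled numeric column + 2 one-hot columns
```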
Machine Learning Models
- Multiple Algorithms: Support for various ML algorithms, including:
  - Regression: Linear Regression, Ridge, Lasso, ElasticNet, Random Forest, XGBoost, SVR, KNN, CatBoost
  - Classification: Logistic Regression, Ridge Classifier, Random Forest, XGBoost, SVM, KNN, Decision Tree, Naive Bayes, CatBoost
- Ensemble Methods: Voting classifiers and regressors for improved performance
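A minimal sketch of a voting ensemble with scikit-learn's VotingClassifier on synthetic data. The choice of base models and of soft voting is an assumption, not necessarily what utils/model_selection.py combines; VotingRegressor follows the same pattern for regression and averages the base predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Soft voting averages the predicted class probabilities of the base models
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
accuracy = ensemble.score(X_test, y_test)
print(f"Voting ensemble accuracy: {accuracy:.3f}")
```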
Model Evaluation & Visualization
- Performance Metrics:
  - Regression: R², MAE, MSE
  - Classification: Accuracy, F1-Score
- Confusion Matrix Visualization: For classification problems
- Model Comparison: Compare multiple models side-by-side
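All of the metrics listed above are available in sklearn.metrics. This minimal sketch uses synthetic data and an assumed set of three regressors (not the repository's own model list) to show a side-by-side regression comparison, followed by the classification metrics on a tiny hand-checked example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)
from sklearn.model_selection import train_test_split

# Regression: compare several models on R2, MAE and MSE
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "RandomForest": RandomForestRegressor(n_estimators=100, random_state=0),
}
results = {}
for name, model in candidates.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = {
        "R2": r2_score(y_test, pred),
        "MAE": mean_absolute_error(y_test, pred),
        "MSE": mean_squared_error(y_test, pred),
    }

# Side-by-side table, best R2 first
for name, m in sorted(results.items(), key=lambda kv: kv[1]["R2"], reverse=True):
    print(f"{name:<18} R2={m['R2']:.3f}  MAE={m['MAE']:.2f}  MSE={m['MSE']:.2f}")

# Classification: accuracy, F1 and the confusion matrix
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
acc = accuracy_score(y_true, y_pred)   # 4 of 5 correct -> 0.8
f1 = f1_score(y_true, y_pred)          # precision 1.0, recall 2/3 -> 0.8
cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
print(acc, f1)
print(cm)
```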
Hyperparameter Tuning
- Automated Tuning: Optimize model hyperparameters using advanced search algorithms
- Customizable Scoring: Choose the evaluation metric through the scoring argument
- Trial Management: Control the number of optimization trials through the n_trials argument
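The n_trials and scoring parameters suggest a trial-based search, but this README does not name the search library the tool uses. As a swapped-in stand-in, the sketch below uses scikit-learn's RandomizedSearchCV, where n_iter plays the role of the number of trials; the search space is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Illustrative search space (an assumption, not the repository's)
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=10,           # number of sampled configurations, analogous to n_trials
    scoring="accuracy",  # any scikit-learn scoring string, e.g. "f1" or "r2"
    cv=3,                # 3-fold cross-validation per configuration
    random_state=42,     # makes the sampled configurations reproducible
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```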
Project Structure
```text
Auto/
├── data/                    # Sample datasets
│   ├── Ai.csv
│   ├── Calories.csv
│   ├── Cost.csv
│   ├── Digital.csv
│   ├── ford.csv
│   ├── Habits.csv
│   ├── heart.csv
│   ├── Lifestyle.csv
│   ├── Mobiles.csv
│   ├── Personality.csv
│   ├── Salaries.csv
│   └── Sleep.csv
├── tools/
│   └── all_tools.py         # MCP tool definitions
├── utils/
│   ├── before_model.py      # Feature preparation
│   ├── details.py           # Data information
│   ├── hyperparameter.py    # Hyperparameter tuning
│   ├── model_selection.py   # Model selection and evaluation
│   ├── preprocessing.py     # Data preprocessing
│   ├── read_csv_file.py     # CSV reading utilities
│   └── visualize_data.py    # Visualization functions
├── main.py                  # Application entry point
├── server.py                # MCP server configuration
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
Installation
Prerequisites
- Python 3.11 or higher (required by numpy>=2.3.1)
- pip or uv package manager
Setup
Clone the repository
```bash
git clone https://github.com/emircansoftware/MCP_Server_DataScience.git
cd MCP_Server_DataScience
```
Install dependencies
```bash
# Using pip
pip install -r requirements.txt

# Or using uv (recommended)
uv sync
```
Run the application
```bash
python main.py
```
Dependencies
- MCP Framework: mcp[cli]>=1.9.4 (Model Context Protocol for tool integration)
- Data Processing: pandas>=2.3.0, pyarrow>=20.0.0, numpy>=2.3.1
- Machine Learning: scikit-learn>=1.3.0, xgboost>=2.0.0, lightgbm>=4.3.0
- Additional ML: catboost (for CatBoost models)
Usage
Starting the MCP Server
```python
from server import mcp

# Run the server
mcp.run()
```
Available Tools
The platform provides the following MCP tools:
Data Analysis Tools
- information_about_data(file_name): Get comprehensive dataset information
- reading_csv(file_name): Read CSV files efficiently
- visualize_correlation_num(file_name): Visualize numerical correlations
- visualize_correlation_cat(file_name): Visualize categorical correlations
- visualize_correlation_final(file_name, target_column): Final correlation analysis
- visualize_outliers(file_name): Detect and visualize outliers
- visualize_outliers_final(file_name, target_column): Final outlier analysis
Preprocessing Tools
- preprocessing_data(file_name, target_column): Automated data preprocessing
- prepare_data(file_name, target_column, problem_type): Feature preparation
Model Training & Evaluation
- models(problem_type, file_name, target_column): Train and evaluate multiple models
- visualize_accuracy_matrix(file_name, target_column, problem_type): Confusion matrix visualization
- best_model_hyperparameter(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state): Hyperparameter tuning
Example Workflow
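```python
# Assumes the tool functions defined in tools/all_tools.py can also be
# imported and called directly from the repository root.
from tools.all_tools import (best_model_hyperparameter, information_about_data, models,
                             prepare_data, preprocessing_data, visualize_accuracy_matrix)

# 1. Analyze your data
info = information_about_data("data/heart.csv")
# 2. Preprocess the data
preprocessed = preprocessing_data("data/heart.csv", "target")
# 3. Prepare features for classification
features = prepare_data("data/heart.csv", "target", "classification")
# 4. Train and evaluate models
results = models("classification", "data/heart.csv", "target")
# 5. Visualize results
confusion_matrix = visualize_accuracy_matrix("data/heart.csv", "target", "classification")
# 6. Optimize the best model
best_model = best_model_hyperparameter(
    "RandomForestClassifier", "data/heart.csv", "target", "classification",
    n_trials=100, scoring="accuracy", random_state=42)
```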
Sample Datasets
The project includes various sample datasets for testing:
- heart.csv: Heart disease prediction dataset
- Salaries.csv: Salary prediction dataset
- Calories.csv: Calorie prediction dataset
- Personality.csv: Personality analysis dataset
- Digital.csv: Digital behavior dataset
- Lifestyle.csv: Lifestyle analysis dataset
- Mobiles.csv: Mobile phone dataset
- Habits.csv: Habit analysis dataset
- Sleep.csv: Sleep pattern dataset
- Cost.csv: Cost analysis dataset
- ford.csv: Ford car dataset
- Ai.csv: AI-related dataset
- cat.csv: Cat-related dataset
Configuration
Environment Variables
- Set your preferred random seed for reproducible results (for example, through the random_state argument of best_model_hyperparameter)
- Configure MCP server settings in server.py
Customization
- Add new ML algorithms in utils/model_selection.py
- Extend preprocessing steps in utils/preprocessing.py
- Create custom visualization functions in utils/visualize_data.py
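One way to keep new algorithms easy to add is a name-to-factory registry per problem type, so a model can be looked up by the same kind of name passed to best_model_hyperparameter. The sketch below is hypothetical: MODEL_REGISTRY, register_model and build_models are not names from the repository, and only scikit-learn estimators are used.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, LogisticRegression, Ridge
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: problem type -> {model name -> zero-argument factory}
MODEL_REGISTRY = {
    "regression": {
        "LinearRegression": LinearRegression,
        "Ridge": Ridge,
        "Lasso": Lasso,
        "RandomForestRegressor": RandomForestRegressor,
    },
    "classification": {
        "LogisticRegression": lambda: LogisticRegression(max_iter=1000),
        "DecisionTreeClassifier": DecisionTreeClassifier,
        "GaussianNB": GaussianNB,
        "RandomForestClassifier": RandomForestClassifier,
    },
}


def register_model(problem_type, name, factory):
    """Add a new algorithm so it is built alongside the existing ones."""
    MODEL_REGISTRY[problem_type][name] = factory


def build_models(problem_type):
    """Instantiate a fresh estimator for every registered model."""
    if problem_type not in MODEL_REGISTRY:
        raise ValueError(f"unknown problem_type: {problem_type!r}")
    return {name: factory() for name, factory in MODEL_REGISTRY[problem_type].items()}


# Example: plug in an extra classifier
register_model("classification", "KNeighborsClassifier", KNeighborsClassifier)
print(sorted(build_models("classification")))
```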
Contributing
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Contributing Guidelines
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Model Context Protocol for the MCP framework
- scikit-learn for machine learning algorithms
- XGBoost for gradient boosting
- CatBoost for categorical boosting
- pandas for data manipulation
Support
If you encounter any issues or have questions:
- Check the Issues page
- Create a new issue with detailed information
- Contact the maintainers
Made with ❤️ for the ML community