Auto ML - Automated Machine Learning Platform


An intelligent automated machine learning platform that provides comprehensive data analysis, preprocessing, model selection, and hyperparameter tuning capabilities through Model Context Protocol (MCP) tools.

🚀 Features

📊 Data Analysis & Exploration

  • Data Information: Get comprehensive dataset statistics including shape, memory usage, data types, and missing values
  • CSV Reading: Efficient CSV file reading with pandas and pyarrow support
  • Correlation Analysis: Visualize correlation matrices for numerical and categorical variables
  • Outlier Detection: Identify and visualize outliers in your datasets
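
The statistics and checks listed above can be sketched with plain pandas. This is a minimal illustration, not the project's own implementation: the helper names `summarize` and `iqr_outliers`, the in-memory CSV, and the choice of the 1.5 × IQR rule for outliers are all assumptions made for the example.

```python
import io

import pandas as pd

# Tiny in-memory CSV standing in for a real file such as data/heart.csv.
# For a file on disk, pd.read_csv(path, engine="pyarrow") uses the pyarrow parser.
CSV_TEXT = """age,chol,sex
63,233,M
37,250,F
41,,F
56,236,M
57,900,F
"""


def summarize(df: pd.DataFrame) -> dict:
    """Collect shape, memory usage, dtypes, and missing-value counts."""
    return {
        "shape": df.shape,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


df = pd.read_csv(io.StringIO(CSV_TEXT))
info = summarize(df)
numeric_corr = df.corr(numeric_only=True)  # Pearson correlation of numeric columns
outliers = iqr_outliers(df["chol"].dropna())

print(info)
print(numeric_corr)
print(outliers)
```

The correlation matrix here is only computed, not plotted; a heatmap of `numeric_corr` would be the visual counterpart.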

🔧 Data Preprocessing

  • Automated Preprocessing: Handle missing values, encode categorical variables, and scale numerical features
  • Feature Engineering: Prepare features for both regression and classification problems
  • Data Validation: Check for duplicates and data quality issues
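
One common way to combine these steps is a scikit-learn `ColumnTransformer`. The sketch below assumes median imputation, most-frequent imputation, one-hot encoding, and standard scaling; the README does not say which strategies `utils/preprocessing.py` actually uses, and the toy columns are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric and one categorical feature,
# with missing values and one exact duplicate row.
raw = pd.DataFrame(
    {
        "age": [25.0, 32.0, np.nan, 41.0, 41.0],
        "city": ["A", "B", "A", np.nan, np.nan],
    }
)

# Data validation: count and drop exact duplicate rows.
n_duplicates = int(raw.duplicated().sum())
clean = raw.drop_duplicates().reset_index(drop=True)

# Numeric columns: fill gaps with the median, then standardize.
numeric_pipe = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]
)

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_pipe = Pipeline(
    [
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]
)

preprocessor = ColumnTransformer(
    [("num", numeric_pipe, ["age"]), ("cat", categorical_pipe, ["city"])],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocessor.fit_transform(clean)
print(n_duplicates, features.shape)
```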

🤖 Machine Learning Models

  • Multiple Algorithms: Support for various ML algorithms including:
    • Regression: Linear Regression, Ridge, Lasso, ElasticNet, Random Forest, XGBoost, SVR, KNN, CatBoost
    • Classification: Logistic Regression, Ridge Classifier, Random Forest, XGBoost, SVM, KNN, Decision Tree, Naive Bayes, CatBoost
    • Ensemble Methods: Voting classifiers and regressors for improved performance
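
As an illustration of the ensemble idea, the following sketch builds a hard-voting classifier from three of the listed algorithm families using scikit-learn. The synthetic dataset, the chosen base models, and their settings are assumptions for the example; the project may combine different estimators.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data in place of a real CSV.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hard voting: each base model casts one vote and the majority class wins.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)

predictions = ensemble.predict(X_test)
accuracy = ensemble.score(X_test, y_test)
print(f"ensemble accuracy: {accuracy:.3f}")
```

For regression problems, `VotingRegressor` from the same module averages the base models' predictions instead of counting votes.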

📈 Model Evaluation & Visualization

  • Performance Metrics:
    • Regression: R², MAE, MSE
    • Classification: Accuracy, F1-Score
  • Confusion Matrix Visualization: For classification problems
  • Model Comparison: Compare multiple models side-by-side
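
For reference, the metrics named above have short textbook definitions. The sketch below spells them out in plain Python so the formulas are visible; the function names are chosen for this example, and in practice `sklearn.metrics` offers `r2_score`, `mean_absolute_error`, `mean_squared_error`, `accuracy_score`, and `f1_score`.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 score for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


reg_true, reg_pred = [3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]
clf_true, clf_pred = [1, 0, 1, 1, 0], [1, 0, 0, 1, 1]

print(mae(reg_true, reg_pred), mse(reg_true, reg_pred), r2(reg_true, reg_pred))
print(accuracy(clf_true, clf_pred), f1_binary(clf_true, clf_pred))
```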

⚙️ Hyperparameter Tuning

  • Automated Tuning: Optimize model hyperparameters using advanced search algorithms
  • Customizable Scoring: Choose from various evaluation metrics
  • Trial Management: Control the number of optimization trials
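
The README does not name the search library behind the tuning step. As a stand-in, the sketch below uses scikit-learn's `RandomizedSearchCV`, which exposes the same three knobs described above: a trial budget (`n_iter`), a scoring metric (`scoring`), and a seed (`random_state`). The search space and dataset are assumptions for illustration only.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data in place of a real CSV.
X, y = make_classification(n_samples=200, n_features=8, random_state=42)

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_distributions={
        "n_estimators": randint(10, 60),  # sampled uniformly from 10..59
        "max_depth": randint(2, 8),       # sampled uniformly from 2..7
    },
    n_iter=5,            # number of trials (kept small so the sketch runs quickly)
    scoring="accuracy",  # metric used to rank the trials
    cv=3,                # 3-fold cross-validation per trial
    random_state=42,     # makes the sampled configurations reproducible
)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Raising `n_iter` explores more configurations at a roughly proportional cost in training time.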

📁 Project Structure

Auto/
├── data/                   # Sample datasets
│   ├── Ai.csv
│   ├── Calories.csv
│   ├── Cost.csv
│   ├── Digital.csv
│   ├── ford.csv
│   ├── Habits.csv
│   ├── heart.csv
│   ├── Lifestyle.csv
│   ├── Mobiles.csv
│   ├── Personality.csv
│   ├── Salaries.csv
│   └── Sleep.csv
├── tools/
│   └── all_tools.py       # MCP tool definitions
├── utils/
│   ├── before_model.py    # Feature preparation
│   ├── details.py         # Data information
│   ├── hyperparameter.py  # Hyperparameter tuning
│   ├── model_selection.py # Model selection and evaluation
│   ├── preprocessing.py   # Data preprocessing
│   ├── read_csv_file.py   # CSV reading utilities
│   └── visualize_data.py  # Visualization functions
├── main.py                # Application entry point
├── server.py              # MCP server configuration
├── requirements.txt       # Python dependencies
└── README.md              # This file

🛠️ Installation

Prerequisites

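  • Python 3.8 or higher (note that the versions pinned under Dependencies need a newer interpreter: NumPy 2.3 supports Python 3.11 and later)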
  • pip or uv package manager

Setup

  1. Clone the repository

    git clone https://github.com/emircansoftware/MCP_Server_DataScience.git
    cd MCP_Server_DataScience
    
  2. Install dependencies

    # Using pip
    pip install -r requirements.txt
    
    # Or using uv (recommended)
    uv sync
    
  3. Run the application

    python main.py
    

📋 Dependencies

  • MCP Framework: mcp[cli]>=1.9.4 - Model Context Protocol for tool integration
  • Data Processing: pandas>=2.3.0, pyarrow>=20.0.0, numpy>=2.3.1
  • Machine Learning: scikit-learn>=1.3.0, xgboost>=2.0.0, lightgbm>=4.3.0
  • Additional ML: catboost (for CatBoost models)

🎯 Usage

Starting the MCP Server

from server import mcp

# Run the server
mcp.run()

Available Tools

The platform provides the following MCP tools:

Data Analysis Tools

  • information_about_data(file_name): Get comprehensive dataset information
  • reading_csv(file_name): Read CSV files efficiently
  • visualize_correlation_num(file_name): Visualize numerical correlations
  • visualize_correlation_cat(file_name): Visualize categorical correlations
  • visualize_correlation_final(file_name, target_column): Final correlation analysis
  • visualize_outliers(file_name): Detect and visualize outliers
  • visualize_outliers_final(file_name, target_column): Final outlier analysis

Preprocessing Tools

  • preprocessing_data(file_name, target_column): Automated data preprocessing
  • prepare_data(file_name, target_column, problem_type): Feature preparation

Model Training & Evaluation

  • models(problem_type, file_name, target_column): Train and evaluate multiple models
  • visualize_accuracy_matrix(file_name, target_column, problem_type): Confusion matrix visualization
  • best_model_hyperparameter(model_name, file_name, target_column, problem_type, n_trials, scoring, random_state): Hyperparameter tuning

Example Workflow

# 1. Analyze your data
info = information_about_data("data/heart.csv")

# 2. Preprocess the data
preprocessed = preprocessing_data("data/heart.csv", "target")

# 3. Prepare features for classification
features = prepare_data("data/heart.csv", "target", "classification")

# 4. Train and evaluate models
results = models("classification", "data/heart.csv", "target")

# 5. Visualize results
confusion_matrix = visualize_accuracy_matrix("data/heart.csv", "target", "classification")

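# 6. Optimize the best model (keyword arguments follow the tool signature above)
best_model = best_model_hyperparameter(
    model_name="RandomForestClassifier",
    file_name="data/heart.csv",
    target_column="target",
    problem_type="classification",
    n_trials=100,        # number of optimization trials
    scoring="accuracy",  # metric to optimize
    random_state=42,     # seed for reproducible results
)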

📊 Sample Datasets

The project includes various sample datasets for testing:

  • heart.csv: Heart disease prediction dataset
  • Salaries.csv: Salary prediction dataset
  • Calories.csv: Calorie prediction dataset
  • Personality.csv: Personality analysis dataset
  • Digital.csv: Digital behavior dataset
  • Lifestyle.csv: Lifestyle analysis dataset
  • Mobiles.csv: Mobile phone dataset
  • Habits.csv: Habit analysis dataset
  • Sleep.csv: Sleep pattern dataset
  • Cost.csv: Cost analysis dataset
  • ford.csv: Ford car dataset
  • Ai.csv: AI-related dataset
  • cat.csv: Cat-related dataset

🔧 Configuration

Environment Variables

  • Set your preferred random seed for reproducible results (for example, through the random_state argument of best_model_hyperparameter)
  • Configure MCP server settings in server.py
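
The README does not list concrete environment variable names. As a sketch of how a seed could be read from the environment and used for reproducible sampling, the example below invents the variable `AUTOML_RANDOM_SEED` and the helpers `get_seed` and `sample_indices`; none of these exist in the project as far as this document shows.

```python
import os
import random
from typing import List

# AUTOML_RANDOM_SEED is a hypothetical variable name used only for this sketch.
SEED_ENV_VAR = "AUTOML_RANDOM_SEED"


def get_seed(default: int = 42) -> int:
    """Read the seed from the environment, falling back to a default."""
    return int(os.environ.get(SEED_ENV_VAR, default))


def sample_indices(n_rows: int, k: int, seed: int) -> List[int]:
    """Pick k row indices with a private, seeded random generator."""
    rng = random.Random(seed)
    return rng.sample(range(n_rows), k)


seed = get_seed()
first = sample_indices(100, 5, seed)
second = sample_indices(100, 5, seed)

# The same seed yields the same sample on every run.
print(seed, first, first == second)
```

Using a private `random.Random` instance rather than the module-level generator keeps the seed from affecting unrelated code.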

Customization

  • Add new ML algorithms in utils/model_selection.py
  • Extend preprocessing steps in utils/preprocessing.py
  • Create custom visualization functions in utils/visualize_data.py
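
One way to make adding algorithms cheap is a name-to-estimator registry that a comparison loop iterates over. The sketch below is a guess at such a layout: `REGRESSORS` and `compare` are hypothetical names, and the real structure of utils/model_selection.py may differ.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical registry mapping display names to unfitted estimators.
REGRESSORS = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
}


def compare(models, X, y, scoring="r2"):
    """Return the mean 3-fold cross-validated score for each registered model."""
    return {
        name: float(cross_val_score(model, X, y, cv=3, scoring=scoring).mean())
        for name, model in models.items()
    }


# Adding a new algorithm is then a one-line change.
REGRESSORS["KNN"] = KNeighborsRegressor(n_neighbors=5)

# Synthetic, nearly linear data in place of a real CSV.
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)
scores = compare(REGRESSORS, X, y)

# Print the models side by side, best first.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R^2 = {score:.3f}")
```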

🤝 Contributing

We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Contributing Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

📞 Support

If you encounter any issues or have questions:

  1. Check the Issues page
  2. Create a new issue with detailed information
  3. Contact the maintainers

Made with ❤️ for the ML community
