embodied_arm_mcp

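A voice-driven, LLM-controlled robot arm project built on ROS2 Jazzy, faster-whisper, MiniMax M3, and the MCP protocol (Day 5, v0.1).

Day 5 completes the full voice + LLM chain; Day 6 adds the MCP Server and a 6DOF arm simulation; Day 7 covers the demo video and résumé v2.0.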

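🎯 Project goals

Build an end-to-end chain for controlling a robot arm by voice through an LLM: speech → Whisper speech recognition → MiniMax M3 semantic understanding → MCP protocol call → ROS2 arm execution. This is the core technical chain behind the 2026 hiring wave in embodied AI.

Why this is the flagship project:

Hits all five of the 2026 buzzwords: embodied intelligence (具身智能) / Embodied AI / VLA models / MCP protocol / digital twins
Combines LLMs with robotics, a crossover where skills are scarce
Realistic scenario: human → Agent → robot, full stack from end to end
Complements the Day 3-4 obstacle-avoidance project (perception and decision-making on a mobile base vs. a robot arm with LLM-level intelligence on top)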

🏗️ Architecture (Day 5 stage)

┌──────────────────────────────────────────────────────────┐
│                    Voice input layer                     │
│  ┌───────────────┐  ┌──────────────┐  ┌───────────────┐  │
│  │ Recording     │─►│ Whisper STT  │─►│ MiniMax M3    │  │
│  │ (pyaudio)     │  │ (faster-     │  │ (OpenAI-      │  │
│  │ .wav 16kHz    │  │  whisper)    │  │  compatible)  │  │
│  └───────────────┘  └──────────────┘  │ → tool_call   │  │
│                                       └───────────────┘  │
│                                               │          │
│  ┌───────────────┐                            │          │
│  │ Pyttsx3 /     │◄───────────────────────────┘          │
│  │ EdgeTTS       │                                       │
│  └───────────────┘                                       │
└──────────────────────────────────────────────────────────┘
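
The data flow in the diagram can be sketched as a small function that chains the stages. This is only an illustration, not the project's `voice_llm_pipeline.py`: the names `run_voice_pipeline`, `PipelineResult`, `fake_stt`, and `fake_llm` are hypothetical, and the stub stages stand in for faster-whisper, MiniMax M3, and the TTS engine so the sketch runs without a model or API key.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    transcript: str  # text produced by the STT stage
    reply: str       # text produced by the LLM stage
    spoken: bool     # whether a TTS stage was invoked


def run_voice_pipeline(
    audio_path: str,
    stt: Callable[[str], str],
    llm: Callable[[str], str],
    tts: Optional[Callable[[str], None]] = None,
) -> PipelineResult:
    """Run audio -> STT -> LLM -> (optional) TTS, in that order."""
    transcript = stt(audio_path)
    reply = llm(transcript)
    if tts is not None:
        tts(reply)
    return PipelineResult(transcript, reply, tts is not None)


# Stub stages (hypothetical) so the sketch runs without any model or network.
def fake_stt(path: str) -> str:
    return "move to 0.3 0 0.2"


def fake_llm(text: str) -> str:
    return f"ack: {text}"


result = run_voice_pipeline("/tmp/test.wav", fake_stt, fake_llm)
print(result)
```

Passing the stages in as callables keeps each one replaceable, which mirrors how Day 5 swaps live recording for a ready-made .wav and skips TTS with `--no-speak`.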

Day 6 + 7 extensions (planned):

MCP Server exposing 5 tools: move_to / gripper_open / gripper_close / record_motion / replay_motion
ROS2 Jazzy + MoveIt2 + 6DOF robot arm (simulated with the Panda model, which itself has 7 joints)
End to end: "move to (0.3, 0, 0.2)" → the arm moves
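
As a rough sketch of how a `tool_call` from the LLM could be routed to the five planned tools, the snippet below keeps a name-to-handler registry and decodes the JSON arguments. Everything beyond the tool names is an assumption: the handlers only return a description string instead of driving an arm, the `x`/`y`/`z` parameters are inferred from the "(0.3, 0, 0.2)" example, the `name` parameter of `record_motion`/`replay_motion` is invented for illustration, and the input dict follows the OpenAI-style function-call shape rather than the MCP wire format.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical handlers: each returns a description of the action
# instead of commanding a real or simulated arm.
def move_to(x: float, y: float, z: float) -> str:
    return f"move_to({x}, {y}, {z})"


def gripper_open() -> str:
    return "gripper_open()"


def gripper_close() -> str:
    return "gripper_close()"


def record_motion(name: str) -> str:  # 'name' is an assumed parameter
    return f"record_motion({name!r})"


def replay_motion(name: str) -> str:  # 'name' is an assumed parameter
    return f"replay_motion({name!r})"


TOOLS: Dict[str, Callable[..., str]] = {
    "move_to": move_to,
    "gripper_open": gripper_open,
    "gripper_close": gripper_close,
    "record_motion": record_motion,
    "replay_motion": replay_motion,
}


def dispatch_tool_call(tool_call: Dict[str, Any]) -> str:
    """Look up the named tool and call it with the JSON-decoded arguments."""
    function = tool_call["function"]
    name = function["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    arguments = json.loads(function.get("arguments") or "{}")
    return TOOLS[name](**arguments)


example_call = {
    "type": "function",
    "function": {
        "name": "move_to",
        "arguments": json.dumps({"x": 0.3, "y": 0, "z": 0.2}),
    },
}
print(dispatch_tool_call(example_call))
```

Rejecting unknown tool names before calling anything is a cheap guard against the model hallucinating a tool that the server does not expose.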

🛠️ Environment

OS: WSL2 Ubuntu 24.04
ROS2: Jazzy Jalisco
Python: 3.12 (system) + isolated venv
LLM: MiniMax M3 (OpenAI-compatible protocol)
STT: faster-whisper base model (real-time on CPU)

🚀 How to run

1. Clone and install dependencies

git clone https://github.com/JakLiao/embodied_arm_mcp.git
cd embodied_arm_mcp

# Install ROS2 Jazzy and the required system packages (Day 5 needs no sudo, but Day 6 does)
# sudo apt install ros-jazzy-moveit ros-jazzy-moveit-py portaudio19-dev

# Create the venv and install the Python dependencies
cd ..
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd src/embodied_arm_mcp

2. Configure the MiniMax API key

# Line 9 of ~/.bashrc: export MINIMAX_API_KEY=...
source ~/.bashrc
echo $MINIMAX_API_KEY  # should print a value
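
On the Python side, a module can fail early with a clear message when the key is missing, rather than failing later inside an HTTP call. The following is a minimal sketch under that assumption; the helper name `get_api_key` is hypothetical and is not taken from the project's `llm_agent.py`.

```python
import os
from typing import Mapping, Optional


def get_api_key(env: Optional[Mapping[str, str]] = None,
                name: str = "MINIMAX_API_KEY") -> str:
    """Return the API key from the environment, or raise with a hint."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it in the shell that launches the module"
        )
    return key


# Usage with an explicit mapping, so the sketch runs without a real key.
print(get_api_key({"MINIMAX_API_KEY": "sk-example"}))
```

Accepting the environment as a parameter keeps the check easy to exercise in tests without touching the real `os.environ`.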

3. Build the ROS2 package

cd ~/embodied_arm_mcp_ws
source /opt/ros/jazzy/setup.bash
colcon build --packages-select embodied_arm_mcp --merge-install
source install/setup.bash  # source the overlay only after the build has created it

4. Run each module

# 4.1 Whisper STT (using the JFK test audio)
wget https://github.com/openai/whisper/raw/main/tests/jfk.flac -O /tmp/jfk.flac
python -c "import soundfile; d,sr=soundfile.read('/tmp/jfk.flac'); soundfile.write('/tmp/test.wav',d,sr,subtype='PCM_16')"
whisper_stt --audio /tmp/test.wav --model base --language en
# Expected: And so my fellow Americans, ask not what your country can do for you...

# 4.2 TTS (Pyttsx3 / EdgeTTS)
tts --text "你好 MiniMax M3" --engine edge --output /tmp/test.mp3

# 4.3 LLM Agent (MiniMax M3); the prompt means "Introduce yourself in one sentence"
llm_agent --text "用一句话介绍你自己"

# 4.4 End-to-end voice → LLM pipeline
voice_pipeline --audio /tmp/test.wav --no-speak --language en
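
Step 4.1 feeds the STT module a 16-bit PCM WAV. Before running it, the file's header can be checked with Python's standard `wave` module. The sketch below is an assumption-laden illustration: `write_test_tone` and `describe_wav` are hypothetical helpers, and the generated file is a plain sine tone that only exercises file handling (it contains no speech, so it is not a substitute for the JFK clip).

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono, 16-bit PCM sine tone to `path`."""
    n_frames = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", int(0.3 * 32767 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)    # mono
        wf.setsampwidth(2)    # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(frames)


def describe_wav(path):
    """Read the WAV header and return its basic properties."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": wf.getframerate(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }


tone_path = os.path.join(tempfile.gettempdir(), "tone_16k.wav")
write_test_tone(tone_path)
print(describe_wav(tone_path))
```

Running `describe_wav("/tmp/test.wav")` on the converted JFK clip gives a quick check of its sample rate and bit depth before it goes to `whisper_stt`.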

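📊 Performance benchmarks (measured on Day 5)

Stage	Latency	Notes
Whisper base STT (11 s audio)	0.5 s	Real-time on CPU
MiniMax M3 single-turn chat	2.8-6.5 s	Includes network + thinking time
TTS (EdgeTTS, Chinese)	0.5 s	mp3 generation only, playback excluded
End to end (recording excluded)	~3-7 s	STT + LLM + TTS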
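
The arithmetic behind the table can be reproduced in a few lines. The sketch uses only the figures listed above; the helper names `real_time_factor` and `end_to_end_range` are hypothetical. Summing the three stages gives roughly 3.8 to 7.5 s, slightly above the rounded ~3-7 s in the last row.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    return processing_s / audio_s


def end_to_end_range(stages: dict) -> tuple:
    """Sum the (min, max) latency of each stage."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


# (min, max) latency per stage in seconds, copied from the table.
stages = {"stt": (0.5, 0.5), "llm": (2.8, 6.5), "tts": (0.5, 0.5)}

rtf = real_time_factor(0.5, 11.0)
low, high = end_to_end_range(stages)
print(f"RTF={rtf:.3f}, end-to-end={low:.1f}-{high:.1f}s")
```

The LLM call accounts for most of the range, so it is the first stage to look at when reducing end-to-end latency.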

📁 File structure

embodied_arm_mcp_ws/
├── README.md
├── .gitignore
├── venv/                                    # Python virtual environment
├── src/embodied_arm_mcp/
│   ├── package.xml
│   ├── setup.py
│   ├── setup.cfg
│   ├── resource/embodied_arm_mcp
│   └── embodied_arm_mcp/
│       ├── __init__.py
│       ├── audio_recorder.py                # pyaudio recording
│       ├── whisper_stt.py                   # faster-whisper STT
│       ├── tts.py                           # Pyttsx3 + EdgeTTS
│       ├── llm_agent.py                     # MiniMax M3 (OpenAI-compatible)
│       └── voice_llm_pipeline.py            # STT + LLM end to end
└── install/                                 # colcon build output

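⚠️ Pitfalls

No audio device in WSL2: pyaudio finds no device and arecord -l lists nothing. Day 5 takes option C (use a ready-made .wav). For the Day 7 video, record on the Windows side and share the file.
pyaudio fails to install: it needs sudo apt install portaudio19-dev (skipped on Day 5, may be needed on Day 6).
Pyttsx3 is missing eSpeak: WSL2 ships without eSpeak, so sudo apt install espeak-ng is required (skipped on Day 5 in favor of EdgeTTS).
MiniMax API key: the environment variable is only available after source ~/.bashrc (non-interactive shells do not read .bashrc).
Slow pip installs: the default PyPI index is very slow from mainland China (23 kB/s); use the Tsinghua mirror via pip install -i https://pypi.tuna.tsinghua.edu.cn/simple
ros2 run cannot find venv packages: the entry_points are installed against the system Python, so before running, export PYTHONPATH=$PWD/venv/lib/python3.12/site-packages:$PYTHONPATH (with $PWD being the workspace root that contains venv/).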
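
To confirm the last pitfall before launching a node, a short diagnostic can report which interpreter is running and which of the project's Python dependencies it cannot see. This is a sketch, not part of the package: `missing_modules` is a hypothetical helper, and the module list is inferred from the libraries named in this README.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the libraries mentioned in this README (assumed list).
wanted = ["faster_whisper", "edge_tts", "pyttsx3"]

print("interpreter:", sys.executable)
print("missing:", missing_modules(wanted))
```

If the list is non-empty under `ros2 run` but empty inside the activated venv, the PYTHONPATH export described above is the likely fix.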

📚 References

Resource	Link
Heima (黑马) course BV131ZuBdEMZ, P56-P63	https://www.bilibili.com/video/BV131ZuBdEMZ
faster-whisper	https://github.com/SYSTRAN/faster-whisper
MiniMax API	https://api.minimaxi.com
MCP protocol	https://modelcontextprotocol.io/
MoveIt2 documentation	https://moveit.picknik.ai/main/index.html

📝 License

MIT
