# Docker Compose Examples for Local AI Stack
This document provides production-ready `docker-compose.yml` examples for setting up the self-hosted AI services required by the Teto AI Companion bot. These services should be included in the same `docker-compose.yml` file as the `teto_ai` bot service itself to ensure proper network communication.
> [!IMPORTANT]
> These examples require a host machine with an NVIDIA GPU and properly installed drivers. They use CDI (Container Device Interface) for GPU reservations, which is the modern standard for Docker.
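Before bringing up the stack, you can confirm that CDI device specs have been generated for your GPUs. This is a minimal check using the NVIDIA Container Toolkit; the output path may differ on your distribution:

```bash
# Generate CDI specs for all NVIDIA devices (run once after installing the driver and toolkit).
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

# List the CDI device names Docker can reserve; you should see nvidia.com/gpu=all.
nvidia-ctk cdi list
```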
## 🤖 vLLM Service (Language & Vision Model)
This service uses vLLM to serve a powerful language model with an OpenAI-compatible API endpoint. This allows Teto to perform natural language understanding and generation locally. If you use a multi-modal model, this service will also provide vision capabilities.
```yaml
services:
  vllm-openai:
    image: vllm/vllm-openai:latest
    # Necessary for multi-GPU communication and performance.
    ipc: host
    # Map the container's port 8000 to a host port (e.g., 11434).
    # Your .env file should point to this host port.
    ports:
      - "11434:8000"
    environment:
      # (Optional) Add your Hugging Face token if needed for private models.
      - HUGGING_FACE_HUB_TOKEN=your_hf_token_here
      # Optimizes PyTorch memory allocation; can improve performance.
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512,garbage_collection_threshold:0.8
    # Mount local directories for model weights and cache.
    # This prevents re-downloading models on every container restart.
    volumes:
      - /path/to/your/llm_models/hf_cache:/root/.cache/huggingface
      - /path/to/your/llm_models:/root/LLM_models
    # This section reserves GPU resources for the container.
    # It ensures vLLM has exclusive access to the NVIDIA GPUs.
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
    # --- vLLM Command Line Arguments ---
    # These arguments configure how vLLM serves the model.
    # Adjust them based on your model and hardware (each flag is explained
    # in the notes below). Note that comments cannot be placed inside the
    # folded block itself, or they would be passed to vLLM as literal arguments.
    command: >
      --model jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym
      --tensor-parallel-size 2
      --max-model-len 32256
      --limit-mm-per-prompt image=4
      --enable-auto-tool-choice
      --tool-call-parser mistral
      --enable-chunked-prefill
      --disable-log-stats
      --gpu-memory-utilization 0.75
      --enable-prefix-caching
      --max-num-seqs 4
      --served-model-name Mistral-Small-3.2
```
### vLLM Configuration Notes
- `--model`: The Hugging Face model identifier you want to serve.
- `--tensor-parallel-size`: The number of GPUs to use for a single model. For a single GPU, set this to `1`.
- `--max-model-len`: The maximum context length, in tokens.
- `--limit-mm-per-prompt`: Caps the number of images per prompt for multi-modal models.
- `--enable-auto-tool-choice` / `--tool-call-parser`: For models that support tool use.
- `--gpu-memory-utilization`: Adjust this value based on your VRAM. `0.75` (75% of GPU VRAM) is a safe starting point.
- `--max-num-seqs`: The maximum number of concurrent sequences.
- Check the official vLLM documentation for the latest command-line arguments and supported models.
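Once the container is up, you can sanity-check the endpoint from the host. This is a minimal smoke test assuming the `11434` port mapping and the `--served-model-name` from the example above:

```bash
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Mistral-Small-3.2",
    "messages": [{"role": "user", "content": "Say hello."}],
    "max_tokens": 32
  }'
```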
## 🎤 Wyoming Voice Services (Piper TTS & Whisper STT)
These services provide Text-to-Speech (Piper) and Speech-to-Text (Whisper) capabilities over the Wyoming protocol. They run as separate containers but are managed within the same Docker Compose file.
```yaml
services:
  # --- Whisper STT Service ---
  # Converts speech from the voice channel into text for Teto to understand.
  wyoming-whisper:
    image: slackr31337/wyoming-whisper-gpu:latest
    container_name: wyoming-whisper
    environment:
      # Configure the Whisper model size and language.
      # Smaller models are faster but less accurate.
      - MODEL=base-int8
      - LANGUAGE=en
      - COMPUTE_TYPE=int8
      - BEAM_SIZE=5
    ports:
      # Exposes the Wyoming protocol port for Whisper.
      - "10300:10300"
    volumes:
      # Mount a volume to persist Whisper model data.
      - /path/to/your/whisper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']

  # --- Piper TTS Service ---
  # Converts Teto's text responses into speech.
  wyoming-piper:
    image: slackr31337/wyoming-piper-gpu:latest
    container_name: wyoming-piper
    environment:
      # Specify which Piper voice model to use.
      - PIPER_VOICE=en_US-amy-medium
    ports:
      # Exposes the Wyoming protocol port for Piper.
      - "10200:10200"
    volumes:
      # Mount a volume to persist Piper voice models.
      - /path/to/your/piper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
```
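To verify that either service is answering, you can send a Wyoming `describe` event over raw TCP and watch for the JSON `info` reply. A quick sketch using `nc` (netcat flags vary between implementations), assuming the services are reachable on the host ports mapped above:

```bash
# Ask Whisper to describe itself; expect a JSON "info" event in response.
echo '{"type": "describe"}' | nc -w 2 localhost 10300

# Same check for Piper.
echo '{"type": "describe"}' | nc -w 2 localhost 10200
```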
### Wyoming Configuration Notes
- **Multiple Ports:** Note that `Whisper` and `Piper` listen on different ports (`10300` and `10200` in this example). Your bot's configuration will need to point to the correct service and port.
- **Voice Models:** You can download different `Piper` voice models and place them in your persistent data directory to change Teto's voice.
- **GPU Usage:** These images are for GPU-accelerated voice processing. If your GPU is dedicated to `vLLM`, you may consider using CPU-based images for Wyoming to conserve VRAM (see the sketch below).
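For reference, the upstream `rhasspy/wyoming-whisper` and `rhasspy/wyoming-piper` images run on CPU. A minimal sketch of a drop-in replacement; the command-line flags here are assumptions, so check each image's documentation for the exact arguments:

```yaml
  # CPU-only alternatives: no deploy/GPU reservation section needed.
  wyoming-whisper:
    image: rhasspy/wyoming-whisper:latest
    command: --model base-int8 --language en
    ports:
      - "10300:10300"
    volumes:
      - /path/to/your/whisper_data:/data
    restart: unless-stopped

  wyoming-piper:
    image: rhasspy/wyoming-piper:latest
    command: --voice en_US-amy-medium
    ports:
      - "10200:10200"
    volumes:
      - /path/to/your/piper_data:/data
    restart: unless-stopped
```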
## 🌐 Networking
For the services to communicate with each other, they must share a Docker network. Using an external network is a good practice for managing complex applications.
```yaml
# Add this to the bottom of your docker-compose.yml file
networks:
  backend:
    external: true
```
Before starting your stack, create the network manually:
```bash
docker network create backend
```
Then, ensure each service in your `docker-compose.yml` (including the `teto_ai` bot) is attached to this network:
```yaml
services:
  teto_ai:
    # ... your bot's configuration
    networks:
      - backend

  vllm-openai:
    # ... vllm configuration
    networks:
      - backend

  wyoming-whisper:
    # ... whisper configuration
    networks:
      - backend

  wyoming-piper:
    # ... piper configuration
    networks:
      - backend
```
This allows the Teto bot to communicate with `vllm-openai`, `wyoming-whisper`, and `wyoming-piper` using their service names as hostnames.
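As a concrete illustration, the bot's environment could point at the services like this. The variable names below are hypothetical, so substitute the keys your bot's configuration actually expects. Note that on the shared network the bot reaches vLLM on its container port (`8000`); the `11434` mapping only applies from the host:

```bash
# Hypothetical .env entries -- substitute the keys your bot's config expects.
LLM_API_URL=http://vllm-openai:8000/v1   # container port, via the shared "backend" network
WHISPER_HOST=wyoming-whisper
WHISPER_PORT=10300
PIPER_HOST=wyoming-piper
PIPER_PORT=10200
```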