Updated the docs to focus on a local-only stack instead of one reliant on services like OpenAI, ElevenLabs, and so on.

This commit is contained in:
Mikolaj Wojciech Gorski 2025-07-26 14:26:18 +02:00
parent 44b45b7212
commit 2e94820164
7 changed files with 489 additions and 176 deletions

View file

@ -35,14 +35,18 @@ Kasane Teto is your server's AI companion who can:
## 🚀 Quick Start
> [!IMPORTANT]
> This project is designed to run exclusively within Docker containers. Bare-metal installation is not officially supported. All instructions assume a working Docker environment.
1. **Setup Environment**
```bash
git clone <repository-url>
cd discord_teto
# Configure AI and Discord credentials
# Configure Discord credentials & local AI endpoints
export USER_TOKEN="your_discord_token"
export OPENAI_API_KEY="your_openai_key" # or other AI provider
export VLLM_ENDPOINT="http://localhost:8000" # Or your vLLM server
export WYOMING_ENDPOINT="http://localhost:10300" # Or your Wyoming server
```
2. **Start Teto**
@ -106,10 +110,11 @@ src/
```
### AI Integration
- **Language Model**: GPT-4/Claude/Local LLM for conversation
- **Vision Model**: CLIP/GPT-4V for image understanding
- **Voice Synthesis**: Eleven Labs/Azure Speech for Teto's voice
- **Memory System**: Vector database for conversation history
- **Language Model**: Self-hosted LLM via `vLLM` (OpenAI compatible endpoint)
- **Vision Model**: Multi-modal models served through `vLLM`
- **Voice Synthesis**: `Piper` TTS via `Wyoming` protocol
- **Speech Recognition**: `Whisper` STT via `Wyoming` protocol
- **Memory System**: Local vector database for conversation history
- **Personality Engine**: Custom prompt engineering for character consistency
## 🎭 Teto's Personality
@ -157,21 +162,19 @@ src/
## 🔧 Configuration
### AI Provider Setup
### Local AI Provider Setup
```env
# OpenAI (recommended)
OPENAI_API_KEY=your_openai_key
OPENAI_MODEL=gpt-4-turbo-preview
# Local vLLM Server (OpenAI Compatible)
VLLM_ENDPOINT="http://localhost:8000/v1"
LOCAL_MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2" # Or your preferred model
# Alternative: Anthropic Claude
ANTHROPIC_API_KEY=your_claude_key
# Wyoming Protocol for Voice (Piper TTS / Whisper STT)
WYOMING_HOST="localhost"
WYOMING_PORT="10300"
PIPER_VOICE="en_US-lessac-medium"
# Voice Synthesis
ELEVENLABS_API_KEY=your_elevenlabs_key
TETO_VOICE_ID=kasane_teto_voice_clone
# Vision Capabilities
VISION_MODEL=gpt-4-vision-preview
# Vision Capabilities are enabled if the vLLM model is multi-modal
VISION_ENABLED=true
```
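At startup, the bot needs to read these values and fall back to sensible defaults. The following sketch is illustrative only (the helper name is not the project's actual API) and uses the same defaults as the examples above:

```javascript
// Illustrative sketch: reading the local AI settings above at startup.
// loadLocalAiConfig is a hypothetical helper, not the project's actual API.
function loadLocalAiConfig(env = process.env) {
  return {
    vllmEndpoint: env.VLLM_ENDPOINT ?? "http://localhost:8000/v1",
    modelName: env.LOCAL_MODEL_NAME ?? "mistralai/Mistral-7B-Instruct-v0.2",
    wyoming: {
      host: env.WYOMING_HOST ?? "localhost",
      port: Number(env.WYOMING_PORT ?? 10300),
      piperVoice: env.PIPER_VOICE ?? "en_US-lessac-medium",
    },
  };
}
```

Because every value has a default, a bare `loadLocalAiConfig()` call works for an all-localhost stack; only non-default ports or model names need to be set explicitly.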
### Personality Customization
@ -196,6 +199,8 @@ export const TETO_PERSONALITY = {
## 🐳 Docker Deployment
This project is officially supported for **Docker deployments only**. The container-first approach is critical for managing the complex local AI stack, ensuring that all services, dependencies, and configurations operate together consistently.
### Production Setup
```bash
# Start Teto with all AI capabilities
@ -206,10 +211,11 @@ docker compose logs -f teto_ai
```
### Resource Requirements
- **Memory**: 4GB+ recommended for AI processing
- **CPU**: Multi-core for real-time AI inference
- **Storage**: SSD recommended for fast model loading
- **Network**: Stable connection for AI API calls
- **VRAM**: 8GB+ for 7B models, 24GB+ for larger models
- **Memory**: 16GB+ RAM recommended
- **CPU**: Modern multi-core CPU
- **Storage**: Fast SSD for model weights (15GB+ per model)
- **Network**: Local network for inter-service communication
## 🔐 Privacy & Ethics
@ -292,7 +298,7 @@ This project is for educational and community use. Please ensure compliance with
---
**Version**: 3.0.0 (AI-Powered)
**AI Models**: GPT-4, CLIP, Eleven Labs
**AI Stack**: Local-First (vLLM, Piper, Whisper)
**Runtime**: Node.js 20+ with Docker
Bring Kasane Teto to life in your Discord server! 🎵✨

View file

@ -17,9 +17,9 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
## 📚 Documentation Structure
### 🚀 Getting Started
- **[Setup Guide](setup.md)** - Complete installation and AI configuration
- **[Setup Guide](setup.md)** - Complete installation and local AI stack configuration
- **[Quick Start](../README.md#quick-start)** - Get Teto running in 5 minutes
- **[Configuration](configuration.md)** - AI models, personality, and customization
- **[Configuration](configuration.md)** - Local models, personality, and customization
### 💬 Interacting with Teto
- **[Conversation Guide](interactions.md)** - How to chat naturally with Teto
@ -28,10 +28,10 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
- **[Voice Interaction](voice.md)** - Speaking with Teto in voice channels
### 🧠 AI Capabilities
- **[AI Architecture](ai-architecture.md)** - How Teto's AI systems work
- **[Vision System](vision.md)** - Image analysis and visual understanding
- **[Memory System](memory.md)** - How Teto remembers conversations
- **[Personality Engine](personality-engine.md)** - Character consistency and roleplay
- **[AI Architecture](ai-architecture.md)** - How Teto's local AI systems work
- **[Vision System](vision.md)** - Image analysis with local multi-modal models
- **[Memory System](memory.md)** - How Teto remembers conversations locally
- **[Personality Engine](personality-engine.md)** - Character consistency and roleplay
### 🔧 Technical Documentation
- **[Architecture Overview](architecture.md)** - System design and components
@ -41,15 +41,15 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
### 🛠️ Operations & Support
- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[Performance Tuning](performance.md)** - Optimization for your server
- **[Security & Privacy](security.md)** - Data handling and safety considerations
- **[Performance Tuning](performance.md)** - Optimizing your local AI stack
- **[Security & Privacy](security.md)** - Data handling and safety in a local-first setup
## 🎯 Quick Navigation by Use Case
### "I want to set up Teto for the first time"
1. [Setup Guide](setup.md) - Installation and configuration
2. [Configuration](configuration.md) - AI API keys and personality setup
3. [Docker Guide](docker.md) - Container deployment
1. [Setup Guide](setup.md) - Installation and local AI stack configuration
2. [Configuration](configuration.md) - vLLM, Piper, and Whisper setup
3. [Docker Guide](docker.md) - Multi-container deployment for AI services
### "I want to understand how to interact with Teto"
1. [Conversation Guide](interactions.md) - Natural chat examples
@ -58,7 +58,7 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
### "I want to understand Teto's capabilities"
1. [Personality Guide](personality.md) - Character traits and style
2. [Vision System](vision.md) - Image and video analysis
2. [Vision System](vision.md) - Image analysis with local models
3. [AI Architecture](ai-architecture.md) - Technical capabilities
### "I want to customize or develop features"
@ -68,8 +68,8 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
### "I'm having issues or want to optimize"
1. [Troubleshooting](troubleshooting.md) - Problem solving
2. [Performance Tuning](performance.md) - Optimization tips
3. [Security & Privacy](security.md) - Best practices
2. [Performance Tuning](performance.md) - Optimizing your local AI stack
3. [Security & Privacy](security.md) - Best practices for a local-first setup
## 🌟 Key Features Overview
@ -94,11 +94,12 @@ Carefully crafted personality engine ensures Teto maintains consistent character
## 🔧 Technical Architecture
```
Teto AI System
├── Language Model (GPT-4/Claude) # Natural conversation
├── Vision Model (GPT-4V/CLIP) # Image/video analysis
├── Voice Synthesis (ElevenLabs) # Speech generation
├── Memory System (Vector DB) # Conversation history
Teto Local AI System
├── Language Model (vLLM) # Self-hosted natural conversation
├── Vision Model (vLLM Multi-modal) # Self-hosted image/video analysis
├── Voice Synthesis (Piper TTS) # Local speech generation via Wyoming
├── Speech Recognition (Whisper STT) # Local speech recognition via Wyoming
├── Memory System (Local Vector DB) # Local conversation history
├── Personality Engine # Character consistency
└── Discord Integration # Platform interface
```
@ -106,23 +107,24 @@ Teto AI System
## 📋 System Requirements
### Minimum Requirements
- **RAM**: 4GB (AI model loading)
- **CPU**: Multi-core (real-time inference)
- **Storage**: 10GB (models and data)
- **Network**: Stable connection (AI API calls)
- **VRAM**: 8GB+ for 7B models (required for `vLLM`)
- **RAM**: 16GB+ (for models and system)
- **CPU**: Modern multi-core (for processing)
- **Storage**: 15GB+ SSD (for model weights)
- **Network**: Local network for inter-service communication
### Recommended Setup
- **RAM**: 8GB+ for optimal performance
- **CPU**: Modern multi-core processor
- **Storage**: SSD for fast model access
- **GPU**: Optional but beneficial for local inference
- **VRAM**: 24GB+ for larger models or concurrent tasks
- **RAM**: 32GB+ for smoother operation
- **Storage**: NVMe SSD for fast model loading
- **GPU**: Required for `vLLM` and `Whisper`
## 🚦 Getting Started Checklist
- [ ] Read the [Setup Guide](setup.md)
- [ ] Obtain necessary API keys (OpenAI, ElevenLabs, etc.)
- [ ] Configure Discord token and permissions
- [ ] Deploy using Docker or run locally
- [ ] Download required model weights (LLM, TTS, etc.)
- [ ] Configure local endpoints for `vLLM` and `Wyoming`
- [ ] Deploy multi-container stack using Docker
- [ ] Customize personality settings
- [ ] Test basic conversation features
- [ ] Explore voice and vision capabilities
@ -143,12 +145,12 @@ See the [Development Guide](development.md) for detailed contribution guidelines
- **Technical Issues**: Check [Troubleshooting](troubleshooting.md)
- **Setup Problems**: Review [Setup Guide](setup.md)
- **Feature Questions**: See [Commands Reference](commands.md)
- **AI Behavior**: Read [Personality Guide](personality.md)
### Best Practices
- **Privacy First**: Always respect user consent and data privacy
- **Privacy First**: All data is processed locally, ensuring maximum privacy
- **Appropriate Content**: Maintain family-friendly interactions
- **Resource Management**: Monitor AI API usage and costs
- **Resource Management**: Monitor local GPU and CPU usage
- **Community Guidelines**: Foster positive server environments
## 📊 Documentation Stats
@ -163,10 +165,10 @@ See the [Development Guide](development.md) for detailed contribution guidelines
The documentation will continue to evolve with new features:
- **Advanced Memory Systems** - Long-term relationship building
- **Custom Voice Training** - Personalized Teto voice models
- **Custom Voice Training** - Fine-tuning `Piper` for a unique Teto voice
- **Multi-Server Consistency** - Shared personality across servers
- **Game Integration** - Interactive gaming experiences
- **Creative Tools** - Music and art generation capabilities
- **Creative Tools** - Music and art generation with local models
---

View file

@ -26,34 +26,34 @@ This document provides a comprehensive overview of how Kasane Teto's AI systems
### Core Components
**1. AI Orchestration Layer**
- Coordinates between different AI services
- Coordinates between different local AI services
- Manages context flow and decision routing
- Handles multi-modal input integration
- Ensures personality consistency across modalities
**2. Language Model Integration**
- Primary conversational intelligence (GPT-4/Claude)
- Context-aware response generation
- Personality-guided prompt engineering
**2. Language Model Integration (vLLM)**
- Self-hosted conversational intelligence via `vLLM`
- Context-aware response generation through OpenAI-compatible API
- Personality-guided prompt engineering for local models
- Multi-turn conversation management
**3. Vision Processing System**
- Image analysis and understanding
**3. Vision Processing System (vLLM Multi-modal)**
- Image analysis using local multi-modal models
- Video frame processing for streams
- Visual context integration with conversations
- Automated response generation for visual content
**4. Voice Synthesis & Recognition**
- Text-to-speech with Teto's voice characteristics
- Speech-to-text for voice command processing
- Emotional tone and inflection control
**4. Voice Synthesis & Recognition (Wyoming Protocol)**
- Text-to-speech using `Piper` for Teto's voice characteristics
- Speech-to-text using `Whisper` for voice command processing
- Emotional tone and inflection control via TTS models
- Real-time voice conversation capabilities
**5. Memory & Context System**
- Long-term conversation history storage
**5. Memory & Context System (Local)**
- Local long-term conversation history storage (e.g., ChromaDB)
- User preference and relationship tracking
- Context retrieval for relevant conversations
- Semantic search across past interactions
- Local semantic search across past interactions
**6. Personality Engine**
- Character consistency enforcement
@ -138,24 +138,25 @@ Image Upload → Image Processing → Vision Model → Context Integration → R
### Voice Interaction Flow
```
Voice Channel Join → Audio Processing → Speech Recognition → Text Processing → Voice Synthesis → Audio Output
Noise Filtering → Intent Detection → LLM Response → Voice Cloning
Voice Channel Join → Audio Processing (Whisper) → Text Processing (vLLM) → Voice Synthesis (Piper) → Audio Output
Noise Filtering → Intent Detection → LLM Response → Voice Model
```
## 🧩 AI Service Integration
### Language Model Configuration
### Language Model Configuration (vLLM)
**Primary Model: GPT-4 Turbo**
**vLLM with OpenAI-Compatible Endpoint:**
```javascript
const LLM_CONFIG = {
model: "gpt-4-turbo-preview",
temperature: 0.8, // Creative but consistent
max_tokens: 1000, // Reasonable response length
top_p: 0.9, // Focused but diverse
frequency_penalty: 0.3, // Reduce repetition
presence_penalty: 0.2 // Encourage topic exploration
const VLLM_CONFIG = {
endpoint: "http://localhost:8000/v1", // Your vLLM server
model: "mistralai/Mistral-7B-Instruct-v0.2", // Or your preferred model
temperature: 0.7, // Creative yet grounded
max_tokens: 1500, // Max response length
top_p: 0.9, // Focused sampling
frequency_penalty: 0.2, // Reduce repetition
presence_penalty: 0.1 // Encourage topic exploration
};
```
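Because vLLM exposes an OpenAI-compatible endpoint, the config above maps directly onto a standard `/chat/completions` request. The sketch below shows that mapping; `buildChatRequest` is a hypothetical helper, not the project's actual API:

```javascript
// Sketch: turning VLLM_CONFIG into an OpenAI-compatible /chat/completions
// request. buildChatRequest is illustrative, not the project's actual API.
const VLLM_CONFIG = {
  endpoint: "http://localhost:8000/v1",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  temperature: 0.7,
  max_tokens: 1500,
  top_p: 0.9,
  frequency_penalty: 0.2,
  presence_penalty: 0.1
};

function buildChatRequest(config, systemPrompt, history, userMessage) {
  // The endpoint is part of the URL, not the request body.
  const { endpoint, ...sampling } = config;
  return {
    url: `${endpoint}/chat/completions`,
    body: {
      ...sampling,
      messages: [
        { role: "system", content: systemPrompt },
        ...history,
        { role: "user", content: userMessage }
      ]
    }
  };
}
```

The same request shape works against any OpenAI-compatible server, which is what makes swapping models under vLLM transparent to the bot.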
@ -166,45 +167,43 @@ USER: Conversation history + current message + visual context (if any)
ASSISTANT: Previous Teto responses for consistency
```
### Vision Model Integration
### Vision Model Integration (vLLM Multi-modal)
**Model Stack:**
- **GPT-4 Vision** - Primary image understanding
- **CLIP** - Image-text similarity for context matching
- **Custom Fine-tuning** - Teto-specific visual preferences
- **Local Multi-modal Model** - (e.g., LLaVA, Idefics) served via `vLLM`
- **CLIP** - Local image-text similarity for context matching
- **Custom Fine-tuning** - Potential for Teto-specific visual preferences
**Processing Pipeline:**
```javascript
const processImage = async (imageUrl, conversationContext) => {
// Multi-model analysis for comprehensive understanding
const gpt4Analysis = await analyzeWithGPT4V(imageUrl);
const clipEmbedding = await getCLIPEmbedding(imageUrl);
// Local multi-modal analysis
const localAnalysis = await analyzeWithVLLM(imageUrl);
const clipEmbedding = await getLocalCLIPEmbedding(imageUrl);
const contextMatch = await findSimilarImages(clipEmbedding);
return {
description: gpt4Analysis.description,
emotions: gpt4Analysis.emotions,
description: localAnalysis.description,
emotions: localAnalysis.emotions,
relevantMemories: contextMatch,
responseStyle: determineResponseStyle(gpt4Analysis, conversationContext)
responseStyle: determineResponseStyle(localAnalysis, conversationContext)
};
};
```
### Voice Synthesis Setup
### Voice I/O Setup (Wyoming Protocol)
**ElevenLabs Configuration:**
**Piper TTS and Whisper STT via Wyoming:**
```javascript
const VOICE_CONFIG = {
voice_id: "kasane_teto_voice_clone",
model_id: "eleven_multilingual_v2",
stability: 0.75, // Consistent voice characteristics
similarity_boost: 0.8, // Maintain Teto's voice signature
style: 0.6, // Moderate emotional expression
use_speaker_boost: true // Enhanced clarity
const WYOMING_CONFIG = {
host: "localhost",
port: 10300,
piper_voice: "en_US-lessac-medium", // Or a custom-trained Teto voice
whisper_model: "base.en" // Or larger model depending on resources
};
```
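Wyoming is a lightweight protocol built around newline-delimited JSON events. The sketch below is a simplified illustration of that framing (real clients should use a proper Wyoming library, and the actual protocol also carries binary audio payloads):

```javascript
// Simplified, illustrative sketch of Wyoming's newline-delimited JSON event
// framing. The exact field layout is an assumption for illustration; the
// real protocol also supports binary audio payloads after the header line.
function encodeSynthesizeEvent(text, voice) {
  const event = { type: "synthesize", data: { text, voice: { name: voice } } };
  return JSON.stringify(event) + "\n";
}

function decodeEvent(line) {
  return JSON.parse(line.trim());
}
```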
### Memory System Architecture
### Memory System Architecture (Local)
**Vector Database Structure:**
```javascript
@ -324,10 +323,10 @@ const safetyPipeline = async (content, context) => {
### Privacy Protection
**Data Handling Principles:**
- **Local Memory Storage** - Conversation history stored locally, not sent to external services
- **Anonymized Analytics** - Usage patterns tracked without personal identifiers
- **Selective Context** - Only relevant conversation context sent to AI models
- **User Consent** - Clear communication about data usage and AI processing
- **Complete Privacy** - All data, including conversations, images, and voice, is processed locally.
- **No External Data Transfer** - AI processing does not require sending data to third-party services.
- **Full User Control** - Users have complete control over their data and the AI models.
- **User Consent** - Clear communication that all processing is done on the user's own hardware.
## 📊 Performance Optimization
@ -385,21 +384,18 @@ const processMessageAsync = async (message) => {
### Resource Management
**Model Loading Strategy:**
**Model Loading Strategy (for vLLM):**
```javascript
const MODEL_LOADING = {
// Keep language model always loaded
language_model: "persistent",
// Load vision model on demand
vision_model: "on_demand",
// Pre-load voice synthesis during voice channel activity
voice_synthesis: "predictive",
// Cache embeddings for frequent users
user_embeddings: "lru_cache"
// This is typically managed by the vLLM server instance itself.
// The configuration would involve which models to load on startup.
const VLLM_SERVER_ARGS = {
model: "mistralai/Mistral-7B-Instruct-v0.2",
"tensor-parallel-size": 1, // Or more depending on GPU count
"gpu-memory-utilization": 0.9, // Use 90% of GPU memory
"max-model-len": 4096,
};
// Wyoming services for Piper/Whisper are typically persistent.
```
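Since these arguments are ultimately passed to the vLLM server as CLI flags, the object above can be flattened mechanically. A small sketch of that mapping (the helper is illustrative, not part of the project):

```javascript
// Sketch: flattening a VLLM_SERVER_ARGS-style object into the CLI flags
// passed to the vLLM server entrypoint. Illustrative helper only.
function toCliFlags(args) {
  return Object.entries(args).flatMap(
    ([key, value]) => [`--${key}`, String(value)]
  );
}
```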
## 🔧 Configuration & Customization
@ -443,14 +439,14 @@ const TUNABLE_PARAMETERS = {
const getModelConfig = (environment) => {
const configs = {
development: {
model: "gpt-3.5-turbo",
model: "local-dev-model/gguf", // Smaller model for dev
response_time_target: 3000,
logging_level: "debug",
cache_enabled: false
},
production: {
model: "gpt-4-turbo-preview",
model: "mistralai/Mistral-7B-Instruct-v0.2",
response_time_target: 1500,
logging_level: "info",
cache_enabled: true,

View file

@ -303,13 +303,12 @@ How long did this take you to create? I'm in awe! ✨"
**Example Response**:
```
🤖 **Teto Status Report**
💭 AI Systems: All operational!
🎤 Voice: Ready to chat in voice channels
👀 Vision: Image analysis active
🧠 Memory: 1,247 conversations remembered
💭 AI Systems: All local services operational!
🚀 vLLM: `mistralai/Mistral-7B-Instruct-v0.2` (Online)
🎤 Wyoming: Piper TTS & Whisper STT (Online)
🧠 Memory: Local Vector DB (1,247 conversations)
✨ Mood: Cheerful and energetic!
⏰ Been active for 3 hours today
🎵 Currently listening to: Lo-fi beats
```
---
@ -441,16 +440,16 @@ how you finally managed it!"
## ⚠️ Important Notes
### Privacy & Consent
- All interactions are processed through AI systems
- Conversation history is stored locally for continuity
- Visual content is analyzed but not permanently stored
- Voice interactions may be temporarily cached for processing
- All interactions are processed by your self-hosted AI stack. No data is sent to external third-party services.
- Conversation history is stored in your local vector database.
- Visual content is analyzed by your local multi-modal model and is not stored unless recorded.
- Voice is processed locally via the Wyoming protocol (Piper/Whisper).
### Limitations
- Response time varies with AI model load (typically 1-3 seconds)
- Complex image analysis may take slightly longer
- Voice synthesis has brief processing delay
- Memory system focuses on significant interactions
- Response time depends entirely on your local hardware (GPU, CPU, RAM).
- The quality and capabilities of Teto depend on the models you choose to run.
- Requires significant VRAM (8GB+ for basic models, 24GB+ for larger ones).
- Initial setup and configuration of the local AI stack can be complex.
### Ethics & Safety
- Teto is programmed to maintain appropriate, family-friendly interactions

View file

@ -0,0 +1,167 @@
# Docker Compose Examples for Local AI Stack
This document provides production-ready `docker-compose.yml` examples for setting up the self-hosted AI services required by the Teto AI Companion bot. These services should be included in the same `docker-compose.yml` file as the `teto_ai` bot service itself to ensure proper network communication.
> [!IMPORTANT]
> These examples require a host machine with an NVIDIA GPU and properly installed drivers. They use CDI (Container Device Interface) for GPU reservations, which is the modern standard for Docker.
## 🤖 vLLM Service (Language & Vision Model)
This service uses `vLLM` to serve a powerful language model with an OpenAI-compatible API endpoint. This allows Teto to perform natural language understanding and generation locally. If you use a multi-modal model, this service will also provide vision capabilities.
```yaml
services:
  vllm-openai:
    image: vllm/vllm-openai:latest
    # This section reserves GPU resources for the container.
    # It ensures vLLM has exclusive access to the NVIDIA GPUs.
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
    # Mount local directories for model weights and cache.
    # This prevents re-downloading models on every container restart.
    volumes:
      - /path/to/your/llm_models/hf_cache:/root/.cache/huggingface
      - /path/to/your/llm_models:/root/LLM_models
    # Map the container's port 8000 to a host port (e.g., 11434).
    # Your .env file should point to this host port.
    ports:
      - "11434:8000"
    environment:
      # (Optional) Add your Hugging Face token if needed for private models.
      - HUGGING_FACE_HUB_TOKEN=your_hf_token_here
      # Optimizes PyTorch memory allocation, can improve performance.
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512,garbage_collection_threshold:0.8
    # Necessary for multi-GPU communication and performance.
    ipc: host
    # --- vLLM Command Line Arguments ---
    # These arguments configure how vLLM serves the model; adjust them based
    # on your model and hardware. Note that "#" comments are not recognized
    # inside a YAML folded scalar (they would be passed to vLLM as literal
    # arguments), so the flags are explained in the notes below instead.
    command: >
      --model jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym
      --tensor-parallel-size 2
      --max-model-len 32256
      --limit-mm-per-prompt image=4
      --enable-auto-tool-choice
      --tool-call-parser mistral
      --enable-chunked-prefill
      --disable-log-stats
      --gpu-memory-utilization 0.75
      --enable-prefix-caching
      --max-num-seqs 4
      --served-model-name Mistral-Small-3.2
```
### vLLM Configuration Notes
- **`--model`**: Specify the Hugging Face model identifier you want to serve.
- **`--tensor-parallel-size`**: Set this to the number of GPUs you want to use for a single model. For a single GPU, this should be `1`.
- **`--gpu-memory-utilization`**: Adjust this value based on your VRAM. `0.75` (75%) is a safe starting point.
- Check the [official vLLM documentation](https://docs.vllm.ai/en/latest/) for the latest command-line arguments and supported models.
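To pick a `--gpu-memory-utilization` value, it helps to budget VRAM explicitly. The sketch below is rough, illustrative arithmetic only; real usage depends on the model, quantization, and context length:

```javascript
// Rough, illustrative VRAM budgeting for --gpu-memory-utilization.
// Actual usage depends on model size, quantization, and context length.
function vllmVramBudgetGb(totalVramGb, gpuMemoryUtilization) {
  const reservedByVllm = totalVramGb * gpuMemoryUtilization;
  return {
    reservedByVllm,                               // pre-allocated by vLLM
    leftForOtherServices: totalVramGb - reservedByVllm  // e.g., for Whisper
  };
}
```

On a 24 GB card at `0.75`, vLLM pre-allocates 18 GB, leaving roughly 6 GB for other GPU services such as the Wyoming containers.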
## 🎤 Wyoming Voice Services (Piper TTS & Whisper STT)
These services provide Text-to-Speech (`Piper`) and Speech-to-Text (`Whisper`) capabilities over the `Wyoming` protocol. They run as separate containers but are managed within the same Docker Compose file.
```yaml
services:
  # --- Whisper STT Service ---
  # Converts speech from the voice channel into text for Teto to understand.
  wyoming-whisper:
    image: slackr31337/wyoming-whisper-gpu:latest
    container_name: wyoming-whisper
    environment:
      # Configure the Whisper model size and language.
      # Smaller models are faster but less accurate.
      - MODEL=base-int8
      - LANGUAGE=en
      - COMPUTE_TYPE=int8
      - BEAM_SIZE=5
    ports:
      # Exposes the Wyoming protocol port for Whisper.
      - "10300:10300"
    volumes:
      # Mount a volume to persist Whisper model data.
      - /path/to/your/whisper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']

  # --- Piper TTS Service ---
  # Converts Teto's text responses into speech.
  wyoming-piper:
    image: slackr31337/wyoming-piper-gpu:latest
    container_name: wyoming-piper
    environment:
      # Specify which Piper voice model to use.
      - PIPER_VOICE=en_US-amy-medium
    ports:
      # Exposes the Wyoming protocol port for Piper.
      - "10200:10200"
    volumes:
      # Mount a volume to persist Piper voice models.
      - /path/to/your/piper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
```
### Wyoming Configuration Notes
- **Multiple Ports**: Note that `Whisper` and `Piper` listen on different ports (`10300` and `10200` in this example). Your bot's configuration will need to point to the correct service and port.
- **Voice Models**: You can download different `Piper` voice models and place them in your persistent data directory to change Teto's voice.
- **GPU Usage**: These images are for GPU-accelerated voice processing. If your GPU is dedicated to `vLLM`, you may consider using CPU-based images for Wyoming to conserve VRAM.
## 🌐 Networking
For the services to communicate with each other, they must share a Docker network. Using an external network is a good practice for managing complex applications.
```yaml
# Add this to the bottom of your docker-compose.yml file
networks:
  backend:
    external: true
```
Before starting your stack, create the network manually:
```bash
docker network create backend
```
Then, ensure each service in your `docker-compose.yml` (including the `teto_ai` bot) is attached to this network:
```yaml
services:
teto_ai:
# ... your bot's configuration
networks:
- backend
vllm-openai:
# ... vllm configuration
networks:
- backend
wyoming-whisper:
# ... whisper configuration
networks:
- backend
wyoming-piper:
# ... piper configuration
networks:
- backend
```
This allows the Teto bot to communicate with `vllm-openai`, `wyoming-whisper`, and `wyoming-piper` using their service names as hostnames.
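Inside the shared network, the bot should address the *container* ports (e.g., vLLM's 8000), not the host-mapped ports. A sketch of deriving in-network endpoints from the service names used in the examples above (the helper itself is illustrative):

```javascript
// Sketch: deriving in-network service URLs from Docker Compose service names.
// Service names and ports follow the examples above; the helper is illustrative.
function serviceEndpoints(names = {}) {
  const {
    vllm = "vllm-openai",
    whisper = "wyoming-whisper",
    piper = "wyoming-piper"
  } = names;
  return {
    llm: `http://${vllm}:8000/v1`,      // container port 8000, not host 11434
    stt: { host: whisper, port: 10300 },
    tts: { host: piper, port: 10200 }
  };
}
```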

View file

@ -5,16 +5,22 @@ This guide will walk you through setting up the Discord Teto Bot for video recor
## 📋 Prerequisites
### System Requirements
- **Operating System**: Linux, macOS, or Windows with WSL2
- **Docker**: Version 20.10+ and Docker Compose v2+
- **Disk Space**: Minimum 2GB for container, additional space for recordings
- **Memory**: 4GB RAM recommended (2GB minimum)
- **Network**: Stable internet connection for Discord API
- **Operating System**: Linux is strongly recommended for GPU support. Windows with WSL2 is possible.
- **GPU**: NVIDIA GPU with 8GB+ VRAM is required for local model hosting.
- **Docker**: Version 20.10+ and Docker Compose v2+.
- **Disk Space**: 20GB+ SSD for models and container images.
- **Memory**: 16GB+ RAM recommended.
- **Network**: Local network for inter-service communication.
### Discord Requirements
- Discord account with user token
- Server permissions to join voice channels
- Voice channel access where you want to record
- Discord account with user token.
- Server permissions to join voice channels.
- Voice channel access where you want to record.
### Local AI Requirements
- **LLM/VLM Model**: A downloaded language model compatible with `vLLM` (e.g., from Hugging Face).
- **TTS Voice Model**: A downloaded `Piper` voice model.
- **STT Model**: A downloaded `Whisper` model.
### Development Prerequisites (Optional)
- **Node.js**: Version 20+ for local development
@ -32,14 +38,20 @@ cd discord_teto
### Step 2: Environment Configuration
Create environment variables for your Discord token:
Create environment variables for your Discord token and local AI endpoints:
```bash
# Method 1: Export in terminal session
export USER_TOKEN="your_discord_user_token_here"
export VLLM_ENDPOINT="http://localhost:8000/v1"
export WYOMING_HOST="localhost"
export WYOMING_PORT="10300"
# Method 2: Create .env file (recommended)
echo "USER_TOKEN=your_discord_user_token_here" > .env
echo "VLLM_ENDPOINT=http://localhost:8000/v1" >> .env
echo "WYOMING_HOST=localhost" >> .env
echo "WYOMING_PORT=10300" >> .env
```
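A missing variable is easier to diagnose at startup than as a connection failure later, so it is worth failing fast. A hedged sketch (the helper is illustrative, not the project's actual API) using the variable names from the examples above:

```javascript
// Sketch: failing fast on missing settings at startup. The variable names
// match the examples above; validateEnv is an illustrative helper only.
function validateEnv(env = process.env) {
  const required = ["USER_TOKEN", "VLLM_ENDPOINT", "WYOMING_HOST", "WYOMING_PORT"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return true;
}
```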
**Getting Your Discord Token:**
@ -50,24 +62,38 @@ echo "USER_TOKEN=your_discord_user_token_here" > .env
5. Look for requests to `discord.com/api`
6. Find Authorization header starting with your token
⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control.
⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control. The bot operates on a user token and has the same permissions as your user.
### Step 3: Directory Setup
Create the output directory for recordings:
### Step 3: Model & Directory Setup
1. **Create Directories**
Create directories for recordings and for your AI models.
```bash
mkdir -p output
chmod 755 output
mkdir -p output models/piper models/whisper models/llm
chmod 755 output models
```
This `models` directory will be mounted into your AI service containers.
This directory will be mounted into the Docker container to persist recordings.
2. **Download AI Models**
- **Language Model**: Download your chosen GGUF or other `vLLM`-compatible model and place it in `models/llm`.
- **Voice Model (Piper)**: Download a `.onnx` and `.json` voice file for Piper and place them in `models/piper`.
- **Speech-to-Text Model (Whisper)**: The Whisper service will download its model on first run, or you can pre-download it.
### Step 4: Docker Container Setup
This directory will be mounted into the Docker container to persist recordings and provide models to the AI services.
### Step 4: Local AI Stack & Bot Setup
This project uses a multi-container Docker setup for the bot and its local AI services. Your `docker-compose.yml` file should define services for:
- `teto_ai`: The bot itself.
- `vllm-openai`: The language model server, providing an OpenAI-compatible endpoint.
- `wyoming-piper`: The Text-to-Speech (TTS) service.
- `wyoming-whisper`: The Speech-to-Text (STT) service.
Sanitized, production-ready examples for these services, together with full configuration details and explanations, are provided in the [Docker Compose Examples](docker-compose-examples.md) guide.
#### Production Setup
```bash
# Build and start the container
# Build and start all containers
docker compose up --build
# Or run in background
@ -110,16 +136,19 @@ docker compose -f docker-compose.dev.yml up --build --no-deps
### Environment Variables
Create a `.env` file in the project root:
Create a `.env` file in the project root to configure the bot and its connections to the local AI services:
```env
# Required
# Required: Discord Token
USER_TOKEN=your_discord_user_token
# Optional
BOT_CLIENT_ID=your_bot_application_id
BOT_CLIENT_SECRET=your_bot_secret
BOT_REDIRECT_URI=https://your-domain.com/auth/callback
# Required: Local AI Service Endpoints
VLLM_ENDPOINT="http://vllm:8000/v1" # Using Docker service name
VLLM_MODEL="mistralai/Mistral-7B-Instruct-v0.2" # Model served by vLLM
WYOMING_HOST="wyoming" # Using Docker service name
WYOMING_PORT="10300"
PIPER_VOICE="en_US-lessac-medium" # Voice model for Piper TTS
# Recording Settings (optional)
RECORDING_TIMEOUT=30000
@ -176,17 +205,14 @@ export const VIDEO_CONFIG = {
## 🔒 Security Considerations
### Data Privacy & Security
- **100% Local Processing**: All AI processing, including conversations, voice, and images, happens locally. No data is sent to external third-party services.
- **Token Security**: Your Discord token should still be kept secure in a `.env` file or Docker secrets. Never commit it to version control.
- **Network Isolation**: The AI services (`vLLM`, `Wyoming`) can be configured to only be accessible within the Docker network, preventing outside access.
### Container Security
- The bot and AI services run as non-root users inside their respective containers.
- Filesystem access is limited via specific volume mounts for models and output.
### File Permissions
```bash
chmod 644 ./output/*.mkv  # For recorded files
```
## 🐛 Troubleshooting Setup Issues
### Local AI Service Issues
**1. vLLM Container Fails to Start**
```bash
# Check vLLM logs for errors
docker compose logs vllm
# Common issues:
# - Insufficient GPU VRAM for the selected model.
# - Incorrect model path or name.
# - CUDA driver issues on the host machine.
# - Forgetting to build with --pull to get the latest base image.
```
**2. Wyoming Service Not Responding**
```bash
# Check Wyoming protocol server logs
docker compose logs wyoming
# Common issues:
# - Incorrect path to Piper voice models.
# - Port conflicts on the host (port 10300).
# - Whisper model download failure on first run.
```
**3. Teto Bot Can't Connect to AI Services**
- Verify service names in your `.env` file match the service names in `docker-compose.yml` (e.g., `http://vllm:8000/v1`).
- Ensure all containers are on the same Docker network.
- Use `docker compose ps` to see if all containers are running and healthy.
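These checks can be scripted. A small hypothetical probe (not part of the bot) that can be run from inside the `teto_ai` container or adapted for the host:

```python
import socket

def reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Inside the Docker network the compose service names resolve directly:
# reachable("vllm", 8000) and reachable("wyoming", 10300)
```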
### Common Installation Problems
**1. Docker not found**
- **Solution**: Install Docker Engine and the Docker Compose plugin for your platform, then verify with `docker --version` and `docker compose version`.
### Container Health
```bash
# Check status of all containers (bot, vllm, wyoming)
docker compose ps
# View resource usage for all services
docker stats
# Monitor logs for a specific service in real-time
docker compose logs -f vllm
docker compose logs -f wyoming
docker compose logs -f teto_ai
```
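`docker compose ps` can also report a `healthy` status, provided the services define healthchecks. An illustrative fragment (assumes `curl` is available inside the vLLM image; `/v1/models` is the model-listing route of the OpenAI-compatible API):

```yaml
services:
  vllm:
    healthcheck:
      test: ["CMD-SHELL", "curl -sf http://localhost:8000/v1/models || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
```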
### GPU Resource Monitoring
```bash
# Monitor GPU VRAM and utilization on the host machine
watch -n 1 nvidia-smi
```
### Recording Status


```bash
docker inspect teto_ai | grep -A 5 "Mounts"
df -h ./output/
```
## 🤖 Local AI Stack Issues
### vLLM Service Issues
**Problem**: The `vllm` container fails to start, crashes, or doesn't respond to requests.
**Diagnosis**:
```bash
# Check the vLLM container logs for CUDA errors, model loading issues, etc.
docker compose logs vllm
# Check GPU resource usage on the host
nvidia-smi
```
**Solutions**:
1. **Insufficient VRAM**:
- The most common issue. Check the model's VRAM requirements.
- **Solution**: Use a smaller model (e.g., a 7B model requires ~8-10GB VRAM) or upgrade your GPU.
2. **CUDA & Driver Mismatches**:
- The `vLLM` container requires a specific CUDA version on the host.
- **Solution**: Ensure your NVIDIA drivers are up-to-date and compatible with the CUDA version used in the `vLLM` Docker image.
3. **Incorrect Model Path or Name**:
- The container can't find the model weights.
- **Solution**: Verify the volume mount in `docker-compose.yml` points to the correct local directory containing your models. Double-check the model name in your `.env` file.
### Wyoming (Piper/Whisper) Service Issues
**Problem**: The `wyoming` container is running, but Teto cannot speak or understand voice commands.
**Diagnosis**:
```bash
# Check the Wyoming container logs for errors related to Piper or Whisper
docker compose logs wyoming
# Test the connection from another container
docker exec -it teto_ai nc -zv wyoming 10300
```
**Solutions**:
1. **Incorrect Piper Voice Model Path**:
- The service can't find the `.onnx` and `.json` files for the selected voice.
- **Solution**: Check your volume mounts and the voice name specified in your configuration.
2. **Whisper Model Download Failure**:
- On first run, the service may fail to download the Whisper model.
- **Solution**: Ensure the container has internet access for the initial download, or manually place the model in the correct volume.
3. **Port Conflict**:
- Another service on your host might be using port `10300`.
- **Solution**: Use `netstat -tulpn | grep 10300` to check for conflicts and remap the port in `docker-compose.yml` if needed.
### Bot Can't Connect to Local AI Services
**Problem**: The Teto bot is running but logs errors about being unable to reach `vllm` or `wyoming`.
**Diagnosis**:
```bash
# Check the Teto bot logs for connection refused errors
docker compose logs teto_ai
# Ensure all services are on the same Docker network
docker network inspect <your_network_name>
```
**Solutions**:
1. **Incorrect Endpoint Configuration**:
- The `.env` file points to the wrong service name or port.
- **Solution**: Ensure `VLLM_ENDPOINT` and `WYOMING_HOST` use the correct service names as defined in `docker-compose.yml` (e.g., `vllm`, `wyoming`).
2. **Docker Networking Issues**:
- The containers cannot resolve each other's service names.
- **Solution**: Ensure all services are defined within the same `docker-compose.yml` and share a common network.
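Compose places all services defined in one file on a shared default network automatically; declaring one explicitly makes the intent visible. An illustrative fragment:

```yaml
networks:
  teto_net: {}

services:
  teto_ai:
    networks: [teto_net]
  vllm:
    networks: [teto_net]
  wyoming:
    networks: [teto_net]
```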
## 🐳 General Docker Issues
### Container Won't Start