Updated the docs to focus on a local-only stack instead of one reliant on services like OpenAI, ElevenLabs, and so on.

Mikolaj Wojciech Gorski 2025-07-26 14:26:18 +02:00
parent 44b45b7212
commit 2e94820164
7 changed files with 489 additions and 176 deletions

View file

@ -35,14 +35,18 @@ Kasane Teto is your server's AI companion who can:
## 🚀 Quick Start ## 🚀 Quick Start
> [!IMPORTANT]
> This project is designed to run exclusively within Docker containers. Bare-metal installation is not officially supported. All instructions assume a working Docker environment.
1. **Setup Environment** 1. **Setup Environment**
```bash ```bash
git clone <repository-url> git clone <repository-url>
cd discord_teto cd discord_teto
# Configure AI and Discord credentials # Configure Discord credentials & local AI endpoints
export USER_TOKEN="your_discord_token" export USER_TOKEN="your_discord_token"
export OPENAI_API_KEY="your_openai_key" # or other AI provider export VLLM_ENDPOINT="http://localhost:8000/v1" # Or your vLLM server
export WYOMING_ENDPOINT="http://localhost:10300" # Or your Wyoming server
``` ```
2. **Start Teto** 2. **Start Teto**
@ -106,10 +110,11 @@ src/
``` ```
### AI Integration ### AI Integration
- **Language Model**: GPT-4/Claude/Local LLM for conversation - **Language Model**: Self-hosted LLM via `vLLM` (OpenAI compatible endpoint)
- **Vision Model**: CLIP/GPT-4V for image understanding - **Vision Model**: Multi-modal models served through `vLLM`
- **Voice Synthesis**: Eleven Labs/Azure Speech for Teto's voice - **Voice Synthesis**: `Piper` TTS via `Wyoming` protocol
- **Memory System**: Vector database for conversation history - **Speech Recognition**: `Whisper` STT via `Wyoming` protocol
- **Memory System**: Local vector database for conversation history
- **Personality Engine**: Custom prompt engineering for character consistency - **Personality Engine**: Custom prompt engineering for character consistency
## 🎭 Teto's Personality ## 🎭 Teto's Personality
@ -157,21 +162,19 @@ src/
## 🔧 Configuration ## 🔧 Configuration
### AI Provider Setup ### Local AI Provider Setup
```env ```env
# OpenAI (recommended) # Local vLLM Server (OpenAI Compatible)
OPENAI_API_KEY=your_openai_key VLLM_ENDPOINT="http://localhost:8000/v1"
OPENAI_MODEL=gpt-4-turbo-preview LOCAL_MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2" # Or your preferred model
# Alternative: Anthropic Claude # Wyoming Protocol for Voice (Piper TTS / Whisper STT)
ANTHROPIC_API_KEY=your_claude_key WYOMING_HOST="localhost"
WYOMING_PORT="10300"
PIPER_VOICE="en_US-lessac-medium"
# Voice Synthesis # Vision Capabilities are enabled if the vLLM model is multi-modal
ELEVENLABS_API_KEY=your_elevenlabs_key VISION_ENABLED=true
TETO_VOICE_ID=kasane_teto_voice_clone
# Vision Capabilities
VISION_MODEL=gpt-4-vision-preview
``` ```
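A minimal sketch of how the bot might read these settings at startup. The helper name `loadLocalAiConfig` and the fallback defaults are assumptions for illustration, not code from this repository; only the variable names come from the block above.

```javascript
// Hypothetical helper: reads the local AI settings from an env-like object.
// The defaults mirror the example values above and are assumptions.
function loadLocalAiConfig(env = process.env) {
  const port = Number.parseInt(env.WYOMING_PORT ?? "10300", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`WYOMING_PORT must be a valid TCP port, got "${env.WYOMING_PORT}"`);
  }
  return {
    // Strip trailing slashes so paths can be appended safely later.
    vllmEndpoint: (env.VLLM_ENDPOINT ?? "http://localhost:8000/v1").replace(/\/+$/, ""),
    modelName: env.LOCAL_MODEL_NAME ?? "mistralai/Mistral-7B-Instruct-v0.2",
    wyoming: { host: env.WYOMING_HOST ?? "localhost", port },
    piperVoice: env.PIPER_VOICE ?? "en_US-lessac-medium",
    visionEnabled: (env.VISION_ENABLED ?? "false").toLowerCase() === "true",
  };
}
```

Passing the environment in as a parameter keeps the function easy to test without touching `process.env`.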
### Personality Customization ### Personality Customization
@ -196,6 +199,8 @@ export const TETO_PERSONALITY = {
## 🐳 Docker Deployment ## 🐳 Docker Deployment
This project is officially supported for **Docker deployments only**. The container-first approach is critical for managing the complex local AI stack, ensuring that all services, dependencies, and configurations operate together consistently.
### Production Setup ### Production Setup
```bash ```bash
# Start Teto with all AI capabilities # Start Teto with all AI capabilities
@ -206,10 +211,11 @@ docker compose logs -f teto_ai
``` ```
### Resource Requirements ### Resource Requirements
- **Memory**: 4GB+ recommended for AI processing - **VRAM**: 8GB+ for 7B models, 24GB+ for larger models
- **CPU**: Multi-core for real-time AI inference - **Memory**: 16GB+ RAM recommended
- **Storage**: SSD recommended for fast model loading - **CPU**: Modern multi-core CPU
- **Network**: Stable connection for AI API calls - **Storage**: Fast SSD for model weights (15GB+ per model)
- **Network**: Local network for inter-service communication
## 🔐 Privacy & Ethics ## 🔐 Privacy & Ethics
@ -292,7 +298,7 @@ This project is for educational and community use. Please ensure compliance with
--- ---
**Version**: 3.0.0 (AI-Powered) **Version**: 3.0.0 (AI-Powered)
**AI Models**: GPT-4, CLIP, Eleven Labs **AI Stack**: Local-First (vLLM, Piper, Whisper)
**Runtime**: Node.js 20+ with Docker **Runtime**: Node.js 20+ with Docker
Bring Kasane Teto to life in your Discord server! 🎵✨ Bring Kasane Teto to life in your Discord server! 🎵✨

View file

@ -17,9 +17,9 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
## 📚 Documentation Structure ## 📚 Documentation Structure
### 🚀 Getting Started ### 🚀 Getting Started
- **[Setup Guide](setup.md)** - Complete installation and AI configuration - **[Setup Guide](setup.md)** - Complete installation and local AI stack configuration
- **[Quick Start](../README.md#quick-start)** - Get Teto running in 5 minutes - **[Quick Start](../README.md#quick-start)** - Get Teto running in 5 minutes
- **[Configuration](configuration.md)** - AI models, personality, and customization - **[Configuration](configuration.md)** - Local models, personality, and customization
### 💬 Interacting with Teto ### 💬 Interacting with Teto
- **[Conversation Guide](interactions.md)** - How to chat naturally with Teto - **[Conversation Guide](interactions.md)** - How to chat naturally with Teto
@ -28,10 +28,10 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
- **[Voice Interaction](voice.md)** - Speaking with Teto in voice channels - **[Voice Interaction](voice.md)** - Speaking with Teto in voice channels
### 🧠 AI Capabilities ### 🧠 AI Capabilities
- **[AI Architecture](ai-architecture.md)** - How Teto's AI systems work - **[AI Architecture](ai-architecture.md)** - How Teto's local AI systems work
- **[Vision System](vision.md)** - Image analysis and visual understanding - **[Vision System](vision.md)** - Image analysis with local multi-modal models
- **[Memory System](memory.md)** - How Teto remembers conversations - **[Memory System](memory.md)** - How Teto remembers conversations locally
- **[Personality Engine](personality-engine.md)** - Character consistency and roleplay - **[Personality Engine](personality-engine.md)** - Character consistency and roleplay
### 🔧 Technical Documentation ### 🔧 Technical Documentation
- **[Architecture Overview](architecture.md)** - System design and components - **[Architecture Overview](architecture.md)** - System design and components
@ -41,15 +41,15 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
### 🛠️ Operations & Support ### 🛠️ Operations & Support
- **[Troubleshooting](troubleshooting.md)** - Common issues and solutions - **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[Performance Tuning](performance.md)** - Optimization for your server - **[Performance Tuning](performance.md)** - Optimizing your local AI stack
- **[Security & Privacy](security.md)** - Data handling and safety considerations - **[Security & Privacy](security.md)** - Data handling and safety in a local-first setup
## 🎯 Quick Navigation by Use Case ## 🎯 Quick Navigation by Use Case
### "I want to set up Teto for the first time" ### "I want to set up Teto for the first time"
1. [Setup Guide](setup.md) - Installation and configuration 1. [Setup Guide](setup.md) - Installation and local AI stack configuration
2. [Configuration](configuration.md) - AI API keys and personality setup 2. [Configuration](configuration.md) - vLLM, Piper, and Whisper setup
3. [Docker Guide](docker.md) - Container deployment 3. [Docker Guide](docker.md) - Multi-container deployment for AI services
### "I want to understand how to interact with Teto" ### "I want to understand how to interact with Teto"
1. [Conversation Guide](interactions.md) - Natural chat examples 1. [Conversation Guide](interactions.md) - Natural chat examples
@ -58,7 +58,7 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
### "I want to understand Teto's capabilities" ### "I want to understand Teto's capabilities"
1. [Personality Guide](personality.md) - Character traits and style 1. [Personality Guide](personality.md) - Character traits and style
2. [Vision System](vision.md) - Image and video analysis 2. [Vision System](vision.md) - Image analysis with local models
3. [AI Architecture](ai-architecture.md) - Technical capabilities 3. [AI Architecture](ai-architecture.md) - Technical capabilities
### "I want to customize or develop features" ### "I want to customize or develop features"
@ -68,8 +68,8 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
### "I'm having issues or want to optimize" ### "I'm having issues or want to optimize"
1. [Troubleshooting](troubleshooting.md) - Problem solving 1. [Troubleshooting](troubleshooting.md) - Problem solving
2. [Performance Tuning](performance.md) - Optimization tips 2. [Performance Tuning](performance.md) - Optimizing your local AI stack
3. [Security & Privacy](security.md) - Best practices 3. [Security & Privacy](security.md) - Best practices for a local-first setup
## 🌟 Key Features Overview ## 🌟 Key Features Overview
@ -94,11 +94,12 @@ Carefully crafted personality engine ensures Teto maintains consistent character
## 🔧 Technical Architecture ## 🔧 Technical Architecture
``` ```
Teto AI System Teto Local AI System
├── Language Model (GPT-4/Claude) # Natural conversation ├── Language Model (vLLM) # Self-hosted natural conversation
├── Vision Model (GPT-4V/CLIP) # Image/video analysis ├── Vision Model (vLLM Multi-modal) # Self-hosted image/video analysis
├── Voice Synthesis (ElevenLabs) # Speech generation ├── Voice Synthesis (Piper TTS) # Local speech generation via Wyoming
├── Memory System (Vector DB) # Conversation history ├── Speech Recognition (Whisper STT) # Local speech recognition via Wyoming
├── Memory System (Local Vector DB) # Local conversation history
├── Personality Engine # Character consistency ├── Personality Engine # Character consistency
└── Discord Integration # Platform interface └── Discord Integration # Platform interface
``` ```
@ -106,23 +107,24 @@ Teto AI System
## 📋 System Requirements ## 📋 System Requirements
### Minimum Requirements ### Minimum Requirements
- **RAM**: 4GB (AI model loading) - **VRAM**: 8GB+ for 7B models (required for `vLLM`)
- **CPU**: Multi-core (real-time inference) - **RAM**: 16GB+ (for models and system)
- **Storage**: 10GB (models and data) - **CPU**: Modern multi-core (for processing)
- **Network**: Stable connection (AI API calls) - **Storage**: 15GB+ SSD (for model weights)
- **Network**: Local network for inter-service communication
### Recommended Setup ### Recommended Setup
- **RAM**: 8GB+ for optimal performance - **VRAM**: 24GB+ for larger models or concurrent tasks
- **CPU**: Modern multi-core processor - **RAM**: 32GB+ for smoother operation
- **Storage**: SSD for fast model access - **Storage**: NVMe SSD for fast model loading
- **GPU**: Optional but beneficial for local inference - **GPU**: Required for `vLLM` and `Whisper`
## 🚦 Getting Started Checklist ## 🚦 Getting Started Checklist
- [ ] Read the [Setup Guide](setup.md) - [ ] Read the [Setup Guide](setup.md)
- [ ] Obtain necessary API keys (OpenAI, ElevenLabs, etc.) - [ ] Download required model weights (LLM, TTS, etc.)
- [ ] Configure Discord token and permissions - [ ] Configure local endpoints for `vLLM` and `Wyoming`
- [ ] Deploy using Docker or run locally - [ ] Deploy multi-container stack using Docker
- [ ] Customize personality settings - [ ] Customize personality settings
- [ ] Test basic conversation features - [ ] Test basic conversation features
- [ ] Explore voice and vision capabilities - [ ] Explore voice and vision capabilities
@ -143,12 +145,12 @@ See the [Development Guide](development.md) for detailed contribution guidelines
- **Technical Issues**: Check [Troubleshooting](troubleshooting.md) - **Technical Issues**: Check [Troubleshooting](troubleshooting.md)
- **Setup Problems**: Review [Setup Guide](setup.md) - **Setup Problems**: Review [Setup Guide](setup.md)
- **Feature Questions**: See [Commands Reference](commands.md) - **Feature Questions**: See [Commands Reference](commands.md)
- **AI Behavior**: Read [Personality Guide](personality.md) - **AI Behavior**: Read [Personality Guide](personality.md)
### Best Practices ### Best Practices
- **Privacy First**: Always respect user consent and data privacy - **Privacy First**: All data is processed locally, ensuring maximum privacy
- **Appropriate Content**: Maintain family-friendly interactions - **Appropriate Content**: Maintain family-friendly interactions
- **Resource Management**: Monitor AI API usage and costs - **Resource Management**: Monitor local GPU and CPU usage
- **Community Guidelines**: Foster positive server environments - **Community Guidelines**: Foster positive server environments
## 📊 Documentation Stats ## 📊 Documentation Stats
@ -163,10 +165,10 @@ See the [Development Guide](development.md) for detailed contribution guidelines
The documentation will continue to evolve with new features: The documentation will continue to evolve with new features:
- **Advanced Memory Systems** - Long-term relationship building - **Advanced Memory Systems** - Long-term relationship building
- **Custom Voice Training** - Personalized Teto voice models - **Custom Voice Training** - Fine-tuning `Piper` for a unique Teto voice
- **Multi-Server Consistency** - Shared personality across servers - **Multi-Server Consistency** - Shared personality across servers
- **Game Integration** - Interactive gaming experiences - **Game Integration** - Interactive gaming experiences
- **Creative Tools** - Music and art generation capabilities - **Creative Tools** - Music and art generation with local models
--- ---

View file

@ -26,34 +26,34 @@ This document provides a comprehensive overview of how Kasane Teto's AI systems
### Core Components ### Core Components
**1. AI Orchestration Layer** **1. AI Orchestration Layer**
- Coordinates between different AI services - Coordinates between different local AI services
- Manages context flow and decision routing - Manages context flow and decision routing
- Handles multi-modal input integration - Handles multi-modal input integration
- Ensures personality consistency across modalities - Ensures personality consistency across modalities
**2. Language Model Integration** **2. Language Model Integration (vLLM)**
- Primary conversational intelligence (GPT-4/Claude) - Self-hosted conversational intelligence via `vLLM`
- Context-aware response generation - Context-aware response generation through OpenAI-compatible API
- Personality-guided prompt engineering - Personality-guided prompt engineering for local models
- Multi-turn conversation management - Multi-turn conversation management
**3. Vision Processing System** **3. Vision Processing System (vLLM Multi-modal)**
- Image analysis and understanding - Image analysis using local multi-modal models
- Video frame processing for streams - Video frame processing for streams
- Visual context integration with conversations - Visual context integration with conversations
- Automated response generation for visual content - Automated response generation for visual content
**4. Voice Synthesis & Recognition** **4. Voice Synthesis & Recognition (Wyoming Protocol)**
- Text-to-speech with Teto's voice characteristics - Text-to-speech using `Piper` for Teto's voice characteristics
- Speech-to-text for voice command processing - Speech-to-text using `Whisper` for voice command processing
- Emotional tone and inflection control - Emotional tone and inflection control via TTS models
- Real-time voice conversation capabilities - Real-time voice conversation capabilities
**5. Memory & Context System** **5. Memory & Context System (Local)**
- Long-term conversation history storage - Local long-term conversation history storage (e.g., ChromaDB)
- User preference and relationship tracking - User preference and relationship tracking
- Context retrieval for relevant conversations - Context retrieval for relevant conversations
- Semantic search across past interactions - Local semantic search across past interactions
**6. Personality Engine** **6. Personality Engine**
- Character consistency enforcement - Character consistency enforcement
@ -138,24 +138,25 @@ Image Upload → Image Processing → Vision Model → Context Integration → R
### Voice Interaction Flow ### Voice Interaction Flow
``` ```
Voice Channel Join → Audio Processing → Speech Recognition → Text Processing → Voice Synthesis → Audio Output Voice Channel Join → Audio Processing (Whisper) → Text Processing (vLLM) → Voice Synthesis (Piper) → Audio Output
↓ ↓
Noise Filtering → Intent Detection → LLM Response → Voice Cloning Noise Filtering → Intent Detection → LLM Response → Voice Model
``` ```
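The flow above can be sketched as three chained stages. This is an illustrative outline, not the project's implementation: `runVoiceTurn` and the `stages` object are hypothetical names, and the stage functions are injected so the flow can be exercised without running Whisper, vLLM, or Piper.

```javascript
// Sketch of one voice turn. Each stage is an async function supplied by the
// caller, so real services or test doubles can be plugged in.
async function runVoiceTurn(stages, audioChunk) {
  const heard = await stages.transcribe(audioChunk); // Whisper STT
  const text = (heard ?? "").trim();
  if (!text) return null;                            // nothing intelligible was said
  const reply = await stages.generate(text);         // vLLM text processing
  const audio = await stages.synthesize(reply);      // Piper TTS
  return { text, reply, audio };
}
```

Returning `null` for empty transcripts avoids sending silence or noise to the language model.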
## 🧩 AI Service Integration ## 🧩 AI Service Integration
### Language Model Configuration ### Language Model Configuration (vLLM)
**Primary Model: GPT-4 Turbo** **vLLM with OpenAI-Compatible Endpoint:**
```javascript ```javascript
const LLM_CONFIG = { const VLLM_CONFIG = {
model: "gpt-4-turbo-preview", endpoint: "http://localhost:8000/v1", // Your vLLM server
temperature: 0.8, // Creative but consistent model: "mistralai/Mistral-7B-Instruct-v0.2", // Or your preferred model
max_tokens: 1000, // Reasonable response length temperature: 0.7, // Creative yet grounded
top_p: 0.9, // Focused but diverse max_tokens: 1500, // Max response length
frequency_penalty: 0.3, // Reduce repetition top_p: 0.9, // Focused sampling
presence_penalty: 0.2 // Encourage topic exploration frequency_penalty: 0.2, // Reduce repetition
presence_penalty: 0.1 // Encourage topic exploration
}; };
``` ```
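A hedged sketch of how such a configuration could be turned into a request for the OpenAI-compatible `/chat/completions` route. `buildChatRequest` is a hypothetical helper; it only prepares the URL and `fetch()` options and sends nothing itself.

```javascript
// Builds the URL and fetch() options for a chat completion request against
// an OpenAI-compatible server. The `endpoint` key is used for the URL only;
// every other key is forwarded in the JSON body as a sampling parameter.
function buildChatRequest(config, messages) {
  const { endpoint, ...sampling } = config;
  return {
    url: `${endpoint.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ ...sampling, messages }),
    },
  };
}

// Usage (requires a running server):
//   const { url, init } = buildChatRequest(config, messages);
//   const res = await fetch(url, init);
//   const reply = (await res.json()).choices[0].message.content;
```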
@ -166,45 +167,43 @@ USER: Conversation history + current message + visual context (if any)
ASSISTANT: Previous Teto responses for consistency ASSISTANT: Previous Teto responses for consistency
``` ```
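One possible way to assemble that structure into a chat message list. The helper name `buildMessages` and the `maxHistory` cap are assumptions for illustration; the source does not specify how history is truncated.

```javascript
// Assembles a chat message list: system prompt first, then the most recent
// prior turns (user and assistant), then the current user message.
function buildMessages(systemPrompt, history, userMessage, maxHistory = 10) {
  // slice(-0) would return the whole array, so handle a zero cap explicitly.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  return [
    { role: "system", content: systemPrompt },
    ...recent.map(({ role, content }) => ({ role, content })),
    { role: "user", content: userMessage },
  ];
}
```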
### Vision Model Integration ### Vision Model Integration (vLLM Multi-modal)
**Model Stack:** **Model Stack:**
- **GPT-4 Vision** - Primary image understanding - **Local Multi-modal Model** - (e.g., LLaVA, Idefics) served via `vLLM`
- **CLIP** - Image-text similarity for context matching - **CLIP** - Local image-text similarity for context matching
- **Custom Fine-tuning** - Teto-specific visual preferences - **Custom Fine-tuning** - Potential for Teto-specific visual preferences
**Processing Pipeline:** **Processing Pipeline:**
```javascript ```javascript
const processImage = async (imageUrl, conversationContext) => { const processImage = async (imageUrl, conversationContext) => {
// Multi-model analysis for comprehensive understanding // Local multi-modal analysis
const gpt4Analysis = await analyzeWithGPT4V(imageUrl); const localAnalysis = await analyzeWithVLLM(imageUrl);
const clipEmbedding = await getCLIPEmbedding(imageUrl); const clipEmbedding = await getLocalCLIPEmbedding(imageUrl);
const contextMatch = await findSimilarImages(clipEmbedding); const contextMatch = await findSimilarImages(clipEmbedding);
return { return {
description: gpt4Analysis.description, description: localAnalysis.description,
emotions: gpt4Analysis.emotions, emotions: localAnalysis.emotions,
relevantMemories: contextMatch, relevantMemories: contextMatch,
responseStyle: determineResponseStyle(gpt4Analysis, conversationContext) responseStyle: determineResponseStyle(localAnalysis, conversationContext)
}; };
}; };
``` ```
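For reference, a sketch of the message shape an image request could take when sent through an OpenAI-compatible chat endpoint. `buildVisionMessage` is a hypothetical helper; whether a given model accepts image parts depends on the model and vLLM version in use.

```javascript
// Builds a user message that pairs a text prompt with one image, using the
// OpenAI chat "content parts" layout (a text part plus an image_url part).
function buildVisionMessage(prompt, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}
```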
### Voice Synthesis Setup ### Voice I/O Setup (Wyoming Protocol)
**ElevenLabs Configuration:** **Piper TTS and Whisper STT via Wyoming:**
```javascript ```javascript
const VOICE_CONFIG = { const WYOMING_CONFIG = {
voice_id: "kasane_teto_voice_clone", host: "localhost",
model_id: "eleven_multilingual_v2", port: 10300,
stability: 0.75, // Consistent voice characteristics piper_voice: "en_US-lessac-medium", // Or a custom-trained Teto voice
similarity_boost: 0.8, // Maintain Teto's voice signature whisper_model: "base.en" // Or larger model depending on resources
style: 0.6, // Moderate emotional expression
use_speaker_boost: true // Enhanced clarity
}; };
``` ```
### Memory System Architecture ### Memory System Architecture (Local)
**Vector Database Structure:** **Vector Database Structure:**
```javascript ```javascript
@ -324,10 +323,10 @@ const safetyPipeline = async (content, context) => {
### Privacy Protection ### Privacy Protection
**Data Handling Principles:** **Data Handling Principles:**
- **Local Memory Storage** - Conversation history stored locally, not sent to external services - **Complete Privacy** - All data, including conversations, images, and voice, is processed locally.
- **Anonymized Analytics** - Usage patterns tracked without personal identifiers - **No External Data Transfer** - AI processing does not require sending data to third-party services.
- **Selective Context** - Only relevant conversation context sent to AI models - **Full User Control** - Users have complete control over their data and the AI models.
- **User Consent** - Clear communication about data usage and AI processing - **User Consent** - Clear communication that all processing is done on the user's own hardware.
## 📊 Performance Optimization ## 📊 Performance Optimization
@ -385,21 +384,18 @@ const processMessageAsync = async (message) => {
### Resource Management ### Resource Management
**Model Loading Strategy:** **Model Loading Strategy (for vLLM):**
```javascript ```javascript
const MODEL_LOADING = { // This is typically managed by the vLLM server instance itself.
// Keep language model always loaded // The configuration would involve which models to load on startup.
language_model: "persistent", const VLLM_SERVER_ARGS = {
model: "mistralai/Mistral-7B-Instruct-v0.2",
// Load vision model on demand "tensor-parallel-size": 1, // Or more depending on GPU count
vision_model: "on_demand", "gpu-memory-utilization": 0.9, // Use 90% of GPU memory
"max-model-len": 4096,
// Pre-load voice synthesis during voice channel activity
voice_synthesis: "predictive",
// Cache embeddings for frequent users
user_embeddings: "lru_cache"
}; };
// Wyoming services for Piper/Whisper are typically persistent.
``` ```
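If the server arguments are kept in an object like this, they still have to be flattened into command-line flags before vLLM can use them. A small sketch, where `toCliArgs` is a hypothetical helper:

```javascript
// Converts an options object into a flat argv array, e.g.
// { "max-model-len": 4096 } becomes ["--max-model-len", "4096"].
// Boolean true yields a bare flag; false, null, and undefined are skipped.
function toCliArgs(options) {
  const args = [];
  for (const [key, value] of Object.entries(options)) {
    if (value === false || value === null || value === undefined) continue;
    args.push(`--${key}`);
    if (value !== true) args.push(String(value));
  }
  return args;
}
```

The resulting array can be joined with spaces for a Compose `command:` entry or passed directly to a process spawner.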
## 🔧 Configuration & Customization ## 🔧 Configuration & Customization
@ -443,14 +439,14 @@ const TUNABLE_PARAMETERS = {
const getModelConfig = (environment) => { const getModelConfig = (environment) => {
const configs = { const configs = {
development: { development: {
model: "gpt-3.5-turbo", model: "local-dev-model/gguf", // Smaller model for dev
response_time_target: 3000, response_time_target: 3000,
logging_level: "debug", logging_level: "debug",
cache_enabled: false cache_enabled: false
}, },
production: { production: {
model: "gpt-4-turbo-preview", model: "mistralai/Mistral-7B-Instruct-v0.2",
response_time_target: 1500, response_time_target: 1500,
logging_level: "info", logging_level: "info",
cache_enabled: true, cache_enabled: true,

View file

@ -303,13 +303,12 @@ How long did this take you to create? I'm in awe! ✨"
**Example Response**: **Example Response**:
``` ```
🤖 **Teto Status Report** 🤖 **Teto Status Report**
💭 AI Systems: All operational! 💭 AI Systems: All local services operational!
🎤 Voice: Ready to chat in voice channels 🚀 vLLM: `mistralai/Mistral-7B-Instruct-v0.2` (Online)
👀 Vision: Image analysis active 🎤 Wyoming: Piper TTS & Whisper STT (Online)
🧠 Memory: 1,247 conversations remembered 🧠 Memory: Local Vector DB (1,247 conversations)
✨ Mood: Cheerful and energetic! ✨ Mood: Cheerful and energetic!
⏰ Been active for 3 hours today ⏰ Been active for 3 hours today
🎵 Currently listening to: Lo-fi beats
``` ```
--- ---
@ -441,16 +440,16 @@ how you finally managed it!"
## ⚠️ Important Notes ## ⚠️ Important Notes
### Privacy & Consent ### Privacy & Consent
- All interactions are processed through AI systems - All interactions are processed by your self-hosted AI stack. No data is sent to external third-party services.
- Conversation history is stored locally for continuity - Conversation history is stored in your local vector database.
- Visual content is analyzed but not permanently stored - Visual content is analyzed by your local multi-modal model and is not stored unless recorded.
- Voice interactions may be temporarily cached for processing - Voice is processed locally via the Wyoming protocol (Piper/Whisper).
### Limitations ### Limitations
- Response time varies with AI model load (typically 1-3 seconds) - Response time depends entirely on your local hardware (GPU, CPU, RAM).
- Complex image analysis may take slightly longer - The quality and capabilities of Teto depend on the models you choose to run.
- Voice synthesis has brief processing delay - Requires significant VRAM (8GB+ for basic models, 24GB+ for larger ones).
- Memory system focuses on significant interactions - Initial setup and configuration of the local AI stack can be complex.
### Ethics & Safety ### Ethics & Safety
- Teto is programmed to maintain appropriate, family-friendly interactions - Teto is programmed to maintain appropriate, family-friendly interactions

View file

@ -0,0 +1,167 @@
# Docker Compose Examples for Local AI Stack
This document provides production-ready `docker-compose.yml` examples for setting up the self-hosted AI services required by the Teto AI Companion bot. These services should be included in the same `docker-compose.yml` file as the `teto_ai` bot service itself to ensure proper network communication.
> [!IMPORTANT]
> These examples require a host machine with an NVIDIA GPU and properly installed drivers. They reserve the GPU through CDI (Container Device Interface), which needs the NVIDIA Container Toolkit to generate a CDI specification on the host and a Docker version with CDI support enabled.
## 🤖 vLLM Service (Language & Vision Model)
This service uses `vLLM` to serve a powerful language model with an OpenAI-compatible API endpoint. This allows Teto to perform natural language understanding and generation locally. If you use a multi-modal model, this service will also provide vision capabilities.
```yaml
services:
  vllm-openai:
    # This section reserves GPU resources for the container,
    # making the host's NVIDIA GPUs available to vLLM.
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
    # Mount local directories for model weights and cache.
    # This prevents re-downloading models on every container restart.
    volumes:
      - /path/to/your/llm_models/hf_cache:/root/.cache/huggingface
      - /path/to/your/llm_models:/root/LLM_models
    # Map the container's port 8000 to a host port (e.g., 11434).
    # Your .env file should point to this host port.
    ports:
      - "11434:8000"
    environment:
      # (Optional) Add your Hugging Face token if needed for private models.
      - HUGGING_FACE_HUB_TOKEN=your_hf_token_here
      # Tunes PyTorch CUDA memory allocation; can improve performance.
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512,garbage_collection_threshold:0.8
    # Necessary for multi-GPU communication and performance.
    ipc: host
    image: vllm/vllm-openai:latest
    # --- vLLM Command Line Arguments ---
    # These arguments configure how vLLM serves the model.
    # Adjust them based on your model and hardware.
    # Comments are kept out of the folded block below, because text after
    # "#" inside it would be passed to vLLM as arguments:
    #   --tensor-parallel-size     Number of GPUs to use.
    #   --max-model-len            Maximum context length.
    #   --limit-mm-per-prompt      For multi-modal models.
    #   --enable-auto-tool-choice  For models that support tool use.
    #   --gpu-memory-utilization   Fraction of GPU VRAM to use (0.75 = 75%).
    #   --max-num-seqs             Max concurrent sequences.
    command: >
      --model jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym
      --tensor-parallel-size 2
      --max-model-len 32256
      --limit-mm-per-prompt image=4
      --enable-auto-tool-choice
      --tool-call-parser mistral
      --enable-chunked-prefill
      --disable-log-stats
      --gpu-memory-utilization 0.75
      --enable-prefix-caching
      --max-num-seqs 4
      --served-model-name Mistral-Small-3.2
```
### vLLM Configuration Notes
- **`--model`**: Specify the Hugging Face model identifier you want to serve.
- **`--tensor-parallel-size`**: Set this to the number of GPUs you want to use for a single model. For a single GPU, this should be `1`.
- **`--gpu-memory-utilization`**: Adjust this value based on your VRAM. `0.75` (75%) is a safe starting point.
- Check the [official vLLM documentation](https://docs.vllm.ai/en/latest/) for the latest command-line arguments and supported models.
## 🎤 Wyoming Voice Services (Piper TTS & Whisper STT)
These services provide Text-to-Speech (`Piper`) and Speech-to-Text (`Whisper`) capabilities over the `Wyoming` protocol. They run as separate containers but are managed within the same Docker Compose file.
```yaml
services:
  # --- Whisper STT Service ---
  # Converts speech from the voice channel into text for Teto to understand.
  wyoming-whisper:
    image: slackr31337/wyoming-whisper-gpu:latest
    container_name: wyoming-whisper
    environment:
      # Configure the Whisper model size and language.
      # Smaller models are faster but less accurate.
      - MODEL=base-int8
      - LANGUAGE=en
      - COMPUTE_TYPE=int8
      - BEAM_SIZE=5
    ports:
      # Exposes the Wyoming protocol port for Whisper.
      - "10300:10300"
    volumes:
      # Mount a volume to persist Whisper model data.
      - /path/to/your/whisper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
  # --- Piper TTS Service ---
  # Converts Teto's text responses into speech.
  wyoming-piper:
    image: slackr31337/wyoming-piper-gpu:latest
    container_name: wyoming-piper
    environment:
      # Specify which Piper voice model to use.
      - PIPER_VOICE=en_US-amy-medium
    ports:
      # Exposes the Wyoming protocol port for Piper.
      - "10200:10200"
    volumes:
      # Mount a volume to persist Piper voice models.
      - /path/to/your/piper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
```
### Wyoming Configuration Notes
- **Multiple Ports**: Note that `Whisper` and `Piper` listen on different ports (`10300` and `10200` in this example). Your bot's configuration will need to point to the correct service and port.
- **Voice Models**: You can download different `Piper` voice models and place them in your persistent data directory to change Teto's voice.
- **GPU Usage**: These images are for GPU-accelerated voice processing. If your GPU is dedicated to `vLLM`, you may consider using CPU-based images for Wyoming to conserve VRAM.
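Because the two services listen on different ports, the bot needs one host/port pair per service rather than a single Wyoming endpoint. A minimal sketch, assuming hypothetical environment variable names (`WYOMING_WHISPER_*` and `WYOMING_PIPER_*` are not defined anywhere in this project) and the ports published in the example above:

```javascript
// Resolves separate host/port pairs for the STT and TTS services.
// The env variable names are hypothetical; the default ports match the
// ones published in the Compose example.
function resolveWyomingTargets(env = process.env) {
  const target = (prefix, defaultPort) => ({
    host: env[`${prefix}_HOST`] ?? "localhost",
    port: Number.parseInt(env[`${prefix}_PORT`] ?? String(defaultPort), 10),
  });
  return {
    stt: target("WYOMING_WHISPER", 10300), // Whisper speech-to-text
    tts: target("WYOMING_PIPER", 10200),   // Piper text-to-speech
  };
}
```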
## 🌐 Networking
For the services to communicate with each other, they must share a Docker network. Using an external network is a good practice for managing complex applications.
```yaml
# Add this to the bottom of your docker-compose.yml file
networks:
  backend:
    external: true
```
Before starting your stack, create the network manually:
```bash
docker network create backend
```
Then, ensure each service in your `docker-compose.yml` (including the `teto_ai` bot) is attached to this network:
```yaml
services:
  teto_ai:
    # ... your bot's configuration
    networks:
      - backend
  vllm-openai:
    # ... vllm configuration
    networks:
      - backend
  wyoming-whisper:
    # ... whisper configuration
    networks:
      - backend
  wyoming-piper:
    # ... piper configuration
    networks:
      - backend
```
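This allows the Teto bot to communicate with `vllm-openai`, `wyoming-whisper`, and `wyoming-piper` using their service names as hostnames. Inside the shared network, connect to each container's own port (for example `http://vllm-openai:8000/v1`), not the host-mapped port.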

View file

@ -5,16 +5,22 @@ This guide will walk you through setting up the Discord Teto Bot for video recor
## 📋 Prerequisites ## 📋 Prerequisites
### System Requirements ### System Requirements
- **Operating System**: Linux, macOS, or Windows with WSL2 - **Operating System**: Linux is strongly recommended for GPU support. Windows with WSL2 is possible.
- **Docker**: Version 20.10+ and Docker Compose v2+ - **GPU**: NVIDIA GPU with 8GB+ VRAM is required for local model hosting.
- **Disk Space**: Minimum 2GB for container, additional space for recordings - **Docker**: Version 20.10+ and Docker Compose v2+.
- **Memory**: 4GB RAM recommended (2GB minimum) - **Disk Space**: 20GB+ SSD for models and container images.
- **Network**: Stable internet connection for Discord API - **Memory**: 16GB+ RAM recommended.
- **Network**: Local network for inter-service communication.
### Discord Requirements ### Discord Requirements
- Discord account with user token - Discord account with user token.
- Server permissions to join voice channels - Server permissions to join voice channels.
- Voice channel access where you want to record - Voice channel access where you want to record.
### Local AI Requirements
- **LLM/VLM Model**: A downloaded language model compatible with `vLLM` (e.g., from Hugging Face).
- **TTS Voice Model**: A downloaded `Piper` voice model.
- **STT Model**: A downloaded `Whisper` model.
### Development Prerequisites (Optional) ### Development Prerequisites (Optional)
- **Node.js**: Version 20+ for local development - **Node.js**: Version 20+ for local development
@ -32,14 +38,20 @@ cd discord_teto
### Step 2: Environment Configuration ### Step 2: Environment Configuration
Create environment variables for your Discord token: Create environment variables for your Discord token and local AI endpoints:
```bash ```bash
# Method 1: Export in terminal session # Method 1: Export in terminal session
export USER_TOKEN="your_discord_user_token_here" export USER_TOKEN="your_discord_user_token_here"
export VLLM_ENDPOINT="http://localhost:8000/v1"
export WYOMING_HOST="localhost"
export WYOMING_PORT="10300"
# Method 2: Create .env file (recommended) # Method 2: Create .env file (recommended)
echo "USER_TOKEN=your_discord_user_token_here" > .env echo "USER_TOKEN=your_discord_user_token_here" > .env
echo "VLLM_ENDPOINT=http://localhost:8000/v1" >> .env
echo "WYOMING_HOST=localhost" >> .env
echo "WYOMING_PORT=10300" >> .env
``` ```
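A typo in one of these variables tends to surface later as a confusing connection error, so it can help to validate them up front. The sketch below is illustrative only: `load_ai_config` is a hypothetical helper, not something the bot ships, and it assumes exactly the four variable names shown above.

```python
import os
from urllib.parse import urlparse

REQUIRED = ("USER_TOKEN", "VLLM_ENDPOINT", "WYOMING_HOST", "WYOMING_PORT")


def load_ai_config(env=None):
    """Read and validate the endpoint variables; raise ValueError on problems."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise ValueError("missing variables: " + ", ".join(missing))

    endpoint = env["VLLM_ENDPOINT"]
    url = urlparse(endpoint)
    if url.scheme not in ("http", "https") or not url.hostname:
        raise ValueError(f"VLLM_ENDPOINT is not a valid URL: {endpoint!r}")

    port_text = env["WYOMING_PORT"]
    if not port_text.isdigit() or not 0 < int(port_text) < 65536:
        raise ValueError(f"WYOMING_PORT is not a valid port: {port_text!r}")

    return {
        "vllm_url": endpoint.rstrip("/"),
        "wyoming": (env["WYOMING_HOST"], int(port_text)),
    }
```

Calling `load_ai_config()` with no arguments reads the current process environment; passing a plain dictionary makes the same checks easy to exercise in isolation.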
**Getting Your Discord Token:** **Getting Your Discord Token:**
@ -50,24 +62,38 @@ echo "USER_TOKEN=your_discord_user_token_here" > .env
5. Look for requests to `discord.com/api` 5. Look for requests to `discord.com/api`
6. Find Authorization header starting with your token 6. Find Authorization header starting with your token
⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control. ⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control. The bot operates on a user token and has the same permissions as your user.
### Step 3: Directory Setup ### Step 3: Model & Directory Setup
Create the output directory for recordings: 1. **Create Directories**
Create directories for recordings and for your AI models.
```bash
mkdir -p output models/piper models/whisper models/llm
chmod 755 output models
```
This `models` directory will be mounted into your AI service containers.
```bash 2. **Download AI Models**
mkdir -p output - **Language Model**: Download your chosen GGUF or other `vLLM`-compatible model and place it in `models/llm`.
chmod 755 output - **Voice Model (Piper)**: Download a `.onnx` and `.json` voice file for Piper and place them in `models/piper`.
``` - **Speech-to-Text Model (Whisper)**: The Whisper service will download its model on first run, or you can pre-download it.
This directory will be mounted into the Docker container to persist recordings. This directory will be mounted into the Docker container to persist recordings and provide models to the AI services.
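A Piper voice needs both its model and its config file, and a missing config is easy to overlook. As a rough sketch, the hypothetical helper below lists voices in `models/piper` whose config is absent. It assumes the common Piper naming scheme in which `<voice>.onnx` is paired with `<voice>.onnx.json`; if your files are named differently, adapt the check.

```python
from pathlib import Path


def incomplete_piper_voices(voice_dir):
    """Return names of .onnx files in voice_dir that lack a '<name>.onnx.json'."""
    missing = []
    for model in sorted(Path(voice_dir).glob("*.onnx")):
        # Assumed naming scheme: the config sits next to the model with '.json' appended.
        config = model.with_name(model.name + ".json")
        if not config.is_file():
            missing.append(model.name)
    return missing
```

For example, `incomplete_piper_voices("models/piper")` returns an empty list when every voice is complete.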
### Step 4: Docker Container Setup ### Step 4: Local AI Stack & Bot Setup
This project uses a multi-container Docker setup for the bot and its local AI services. Your `docker-compose.yml` file should define services for:
- `teto_ai`: The bot itself.
- `vllm-openai`: The language model server, providing an OpenAI-compatible endpoint.
- `wyoming-piper`: The Text-to-Speech (TTS) service.
- `wyoming-whisper`: The Speech-to-Text (STT) service.
Below are sanitized, production-ready examples for these services. For full configuration details and explanations, please see the [Docker Compose Examples](docker-compose-examples.md) guide.
#### Production Setup #### Production Setup
```bash ```bash
# Build and start the container # Build and start all containers
docker compose up --build docker compose up --build
# Or run in background # Or run in background
@ -110,16 +136,19 @@ docker compose -f docker-compose.dev.yml up --build --no-deps
### Environment Variables ### Environment Variables
Create a `.env` file in the project root: Create a `.env` file in the project root to configure the bot and its connections to the local AI services:
```env ```env
# Required # Required: Discord Token
USER_TOKEN=your_discord_user_token USER_TOKEN=your_discord_user_token
# Optional # Required: Local AI Service Endpoints
BOT_CLIENT_ID=your_bot_application_id VLLM_ENDPOINT="http://vllm:8000/v1" # Using Docker service name
BOT_CLIENT_SECRET=your_bot_secret VLLM_MODEL="mistralai/Mistral-7B-Instruct-v0.2" # Model served by vLLM
BOT_REDIRECT_URI=https://your-domain.com/auth/callback
WYOMING_HOST="wyoming" # Using Docker service name
WYOMING_PORT="10300"
PIPER_VOICE="en_US-lessac-medium" # Voice model for Piper TTS
# Recording Settings (optional) # Recording Settings (optional)
RECORDING_TIMEOUT=30000 RECORDING_TIMEOUT=30000
@ -176,17 +205,14 @@ export const VIDEO_CONFIG = {
## 🔒 Security Considerations ## 🔒 Security Considerations
### Token Security ### Data Privacy & Security
- Store tokens in environment variables, never in code - **100% Local Processing**: All AI processing, including conversations, voice, and images, happens locally. No data is sent to external third-party services.
- Use `.env` files for local development (add to `.gitignore`) - **Token Security**: Your Discord token should still be kept secure in a `.env` file or Docker secrets. Never commit it to version control.
- Consider using Docker secrets for production deployments - **Network Isolation**: The AI services (`vLLM`, `Wyoming`) can be configured to only be accessible within the Docker network, preventing outside access.
- Rotate tokens regularly
### Container Security ### Container Security
- Bot runs as non-root user inside container - The bot and AI services run as non-root users inside their respective containers.
- Limited system capabilities (only SYS_ADMIN for Discord GUI) - Filesystem access is limited via specific volume mounts for models and output.
- Isolated filesystem with specific volume mounts
- No network access beyond Discord API requirements
### File Permissions ### File Permissions
```bash ```bash
@ -200,6 +226,36 @@ chmod 644 ./output/*.mkv # For recorded files
## 🐛 Troubleshooting Setup Issues ## 🐛 Troubleshooting Setup Issues
### Local AI Service Issues
**1. vLLM Container Fails to Start**
```bash
# Check vLLM logs for errors
docker compose logs vllm
# Common issues:
# - Insufficient GPU VRAM for the selected model.
# - Incorrect model path or name.
# - CUDA driver issues on the host machine.
# - Forgetting to build with --pull to get the latest base image.
```
**2. Wyoming Service Not Responding**
```bash
# Check Wyoming protocol server logs
docker compose logs wyoming
# Common issues:
# - Incorrect path to Piper voice models.
# - Port conflicts on the host (port 10300).
# - Whisper model download failure on first run.
```
**3. Teto Bot Can't Connect to AI Services**
- Verify service names in your `.env` file match the service names in `docker-compose.yml` (e.g., `http://vllm:8000/v1`).
- Ensure all containers are on the same Docker network.
- Use `docker compose ps` to see if all containers are running and healthy.
### Common Installation Problems ### Common Installation Problems
**1. Docker not found** **1. Docker not found**
@ -273,14 +329,22 @@ npm install
### Container Health ### Container Health
```bash ```bash
# Check container status # Check status of all containers (bot, vllm, wyoming)
docker compose ps docker compose ps
# View resource usage # View resource usage for all services
docker stats teto_ai docker stats
# Monitor logs in real-time # Monitor logs for a specific service in real-time
docker compose logs -f docker compose logs -f vllm
docker compose logs -f wyoming
docker compose logs -f teto_ai
```
### GPU Resource Monitoring
```bash
# Monitor GPU VRAM and utilization on the host machine
watch -n 1 nvidia-smi
``` ```
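For logging or alerting, the same numbers can be read programmatically through `nvidia-smi`'s CSV query mode. This is a minimal sketch with hypothetical helper names; it assumes `nvidia-smi` is on the host's `PATH` and reports memory in MiB, one row per GPU.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' rows (MiB) into a list of (used, total) integer tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Run nvidia-smi on the host and return per-GPU (used, total) memory in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Keeping the parsing separate from the subprocess call means the parser can be checked on a machine without a GPU.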
### Recording Status ### Recording Status


@ -28,7 +28,86 @@ docker inspect teto_ai | grep -A 5 "Mounts"
df -h ./output/ df -h ./output/
``` ```
## 🐳 Docker Issues ## 🤖 Local AI Stack Issues
### vLLM Service Issues
**Problem**: The `vllm` container fails to start, crashes, or doesn't respond to requests.
**Diagnosis**:
```bash
# Check the vLLM container logs for CUDA errors, model loading issues, etc.
docker compose logs vllm
# Check GPU resource usage on the host
nvidia-smi
```
**Solutions**:
1. **Insufficient VRAM**:
- The most common issue. Check the model's VRAM requirements.
- **Solution**: Use a smaller model (e.g., a 7B model requires ~8-10GB VRAM) or upgrade your GPU.
2. **CUDA & Driver Mismatches**:
- The CUDA runtime is bundled in the `vLLM` image; what the host must provide is an NVIDIA driver new enough to support that CUDA version.
- **Solution**: Ensure your NVIDIA drivers are up-to-date and compatible with the CUDA version used in the `vLLM` Docker image.
3. **Incorrect Model Path or Name**:
- The container can't find the model weights.
- **Solution**: Verify the volume mount in `docker-compose.yml` points to the correct local directory containing your models. Double-check the model name in your `.env` file.
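One way to confirm the model name is to ask the server which models it is actually serving. vLLM's OpenAI-compatible server exposes a model list under `/v1/models`; the sketch below queries it and extracts the ids. The helper names are hypothetical, and `endpoint` is assumed to already include the `/v1` suffix, as in the `VLLM_ENDPOINT` examples.

```python
import json
import urllib.request


def served_model_ids(payload):
    """Extract model ids from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_models(endpoint, timeout=5.0):
    """GET <endpoint>/models and return the ids the server reports."""
    url = endpoint.rstrip("/") + "/models"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return served_model_ids(json.load(response))
```

If the value of `VLLM_MODEL` in your `.env` file is not among the ids returned by `fetch_served_models("http://vllm:8000/v1")`, requests from the bot are likely to be rejected as referring to an unknown model.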
### Wyoming (Piper/Whisper) Service Issues
**Problem**: The `wyoming` container is running, but Teto cannot speak or understand voice commands.
**Diagnosis**:
```bash
# Check the Wyoming container logs for errors related to Piper or Whisper
docker compose logs wyoming
# Test the connection from another container
docker exec -it teto_ai nc -zv wyoming 10300
```
**Solutions**:
1. **Incorrect Piper Voice Model Path**:
- The service can't find the `.onnx` and `.json` files for the selected voice.
- **Solution**: Check your volume mounts and the voice name specified in your configuration.
2. **Whisper Model Download Failure**:
- On first run, the service may fail to download the Whisper model.
- **Solution**: Ensure the container has internet access for the initial download, or manually place the model in the correct volume.
3. **Port Conflict**:
- Another service on your host might be using port `10300`.
- **Solution**: Use `netstat -tulpn | grep 10300` to check for conflicts and remap the port in `docker-compose.yml` if needed.
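Where `netstat` is not installed, the port conflict check can be approximated by trying to bind the port directly. This is a minimal sketch with a hypothetical `port_is_free` helper; it only tests TCP on IPv4, and it must run on the host, not inside a container.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Address already in use, or the port needs elevated privileges.
            return False
    return True
```

Run `port_is_free(10300)` before starting the stack. A `False` result suggests another process already holds the port, so remap it in `docker-compose.yml`.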
### Bot Can't Connect to Local AI Services
**Problem**: The Teto bot is running but logs errors about being unable to reach `vllm` or `wyoming`.
**Diagnosis**:
```bash
# Check the Teto bot logs for connection refused errors
docker compose logs teto_ai
# Ensure all services are on the same Docker network
docker network inspect <your_network_name>
```
**Solutions**:
1. **Incorrect Endpoint Configuration**:
- The `.env` file points to the wrong service name or port.
- **Solution**: Ensure `VLLM_ENDPOINT` and `WYOMING_HOST` use the correct service names as defined in `docker-compose.yml` (e.g., `vllm`, `wyoming`).
2. **Docker Networking Issues**:
- The containers cannot resolve each other's service names.
- **Solution**: Ensure all services are defined within the same `docker-compose.yml` and share a common network.
## 🐳 General Docker Issues
### Container Won't Start ### Container Won't Start