Updated the docs to focus on a local-only stack instead of one reliant on services like OpenAI, ElevenLabs, and so on.

2025-07-26 14:26:18 +02:00
commit 2e94820164
parent 44b45b7212
7 changed files with 489 additions and 176 deletions
--- a/README.md
+++ b/README.md
@@ -35,14 +35,18 @@ Kasane Teto is your server's AI companion who can:
 ## 🚀 Quick Start
 > [!IMPORTANT]
 > This project is designed to run exclusively within Docker containers. Bare-metal installation is not officially supported. All instructions assume a working Docker environment.
 1. **Setup Environment**
   ```bash
   git clone <repository-url>
   cd discord_teto
-   # Configure AI and Discord credentials
+   # Configure Discord credentials & local AI endpoints
   export USER_TOKEN="your_discord_token"
-   export OPENAI_API_KEY="your_openai_key"  # or other AI provider
+   export VLLM_ENDPOINT="http://localhost:8000/v1" # Or your vLLM server
   export WYOMING_ENDPOINT="http://localhost:10300" # Or your Wyoming server
   ```
 2. **Start Teto**
@@ -106,10 +110,11 @@ src/
 ```
 ### AI Integration
- **Language Model**: GPT-4/Claude/Local LLM for conversation
+- **Language Model**: Self-hosted LLM via `vLLM` (OpenAI compatible endpoint)
- **Vision Model**: CLIP/GPT-4V for image understanding
+- **Vision Model**: Multi-modal models served through `vLLM`
- **Voice Synthesis**: Eleven Labs/Azure Speech for Teto's voice
+- **Voice Synthesis**: `Piper` TTS via `Wyoming` protocol
- **Memory System**: Vector database for conversation history
+- **Speech Recognition**: `Whisper` STT via `Wyoming` protocol
 - **Memory System**: Local vector database for conversation history
 - **Personality Engine**: Custom prompt engineering for character consistency
 ## 🎭 Teto's Personality
@@ -157,21 +162,19 @@ src/
 ## 🔧 Configuration
-### AI Provider Setup
+### Local AI Provider Setup
 ```env
-# OpenAI (recommended)
+# Local vLLM Server (OpenAI Compatible)
-OPENAI_API_KEY=your_openai_key
+VLLM_ENDPOINT="http://localhost:8000/v1"
-OPENAI_MODEL=gpt-4-turbo-preview
+LOCAL_MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2" # Or your preferred model
-# Alternative: Anthropic Claude
+# Wyoming Protocol for Voice (Piper TTS / Whisper STT)
-ANTHROPIC_API_KEY=your_claude_key
+WYOMING_HOST="localhost"
 WYOMING_PORT="10300"
 PIPER_VOICE="en_US-lessac-medium"
-# Voice Synthesis
+# Vision Capabilities are enabled if the vLLM model is multi-modal
-ELEVENLABS_API_KEY=your_elevenlabs_key
+VISION_ENABLED=true
 TETO_VOICE_ID=kasane_teto_voice_clone
 # Vision Capabilities  
 VISION_MODEL=gpt-4-vision-preview
 ```
 ### Personality Customization
@@ -196,6 +199,8 @@ export const TETO_PERSONALITY = {
 ## 🐳 Docker Deployment
 This project is officially supported for **Docker deployments only**. The container-first approach is critical for managing the complex local AI stack, ensuring that all services, dependencies, and configurations operate together consistently.
 ### Production Setup
 ```bash
 # Start Teto with all AI capabilities
@@ -206,10 +211,11 @@ docker compose logs -f teto_ai
 ```
 ### Resource Requirements
- **Memory**: 4GB+ recommended for AI processing
+- **VRAM**: 8GB+ for 7B models, 24GB+ for larger models
- **CPU**: Multi-core for real-time AI inference
+- **Memory**: 16GB+ RAM recommended
- **Storage**: SSD recommended for fast model loading
+- **CPU**: Modern multi-core CPU
- **Network**: Stable connection for AI API calls
+- **Storage**: Fast SSD for model weights (15GB+ per model)
 - **Network**: Local network for inter-service communication
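The VRAM figures above can be sanity-checked with a rough weights-only estimate: parameter count times bytes per parameter. The sketch below is an approximation under stated assumptions, not a sizing tool. It ignores the KV cache, activations, and runtime overhead, and the helper name `estimateWeightGiB` and the precision table are illustrative.

```javascript
// Rough weights-only memory estimate: parameters x bytes per parameter.
// Ignores KV cache, activations, and framework overhead, so real usage is higher.
const BYTES_PER_PARAM = { fp16: 2, int8: 1, int4: 0.5 };

// Hypothetical helper: returns the estimated weight size in GiB.
function estimateWeightGiB(paramsBillions, precision) {
  const bytesPerParam = BYTES_PER_PARAM[precision];
  if (bytesPerParam === undefined) {
    throw new Error(`Unknown precision: ${precision}`);
  }
  const bytes = paramsBillions * 1e9 * bytesPerParam;
  return bytes / 1024 ** 3;
}

for (const precision of ["fp16", "int8", "int4"]) {
  const gib = estimateWeightGiB(7, precision).toFixed(1);
  console.log(`7B @ ${precision}: ~${gib} GiB`);
}
```

By this estimate a 7B model in fp16 needs roughly 13 GiB for weights alone, so running one on an 8GB card generally implies a quantized model.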
 ## 🔐 Privacy & Ethics
@@ -292,7 +298,7 @@ This project is for educational and community use. Please ensure compliance with
 ---
 **Version**: 3.0.0 (AI-Powered)  
-**AI Models**: GPT-4, CLIP, Eleven Labs  
+**AI Stack**: Local-First (vLLM, Piper, Whisper)
 **Runtime**: Node.js 20+ with Docker  
 Bring Kasane Teto to life in your Discord server! 🎵✨
--- a/docs/README.md
+++ b/docs/README.md
@@ -17,9 +17,9 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 ## 📚 Documentation Structure
 ### 🚀 Getting Started
- **[Setup Guide](setup.md)** - Complete installation and AI configuration
+- **[Setup Guide](setup.md)** - Complete installation and local AI stack configuration
 - **[Quick Start](../README.md#quick-start)** - Get Teto running in 5 minutes
- **[Configuration](configuration.md)** - AI models, personality, and customization
+- **[Configuration](configuration.md)** - Local models, personality, and customization
 ### 💬 Interacting with Teto
 - **[Conversation Guide](interactions.md)** - How to chat naturally with Teto
@@ -28,10 +28,10 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 - **[Voice Interaction](voice.md)** - Speaking with Teto in voice channels
 ### 🧠 AI Capabilities
- **[AI Architecture](ai-architecture.md)** - How Teto's AI systems work
+- **[AI Architecture](ai-architecture.md)** - How Teto's local AI systems work
- **[Vision System](vision.md)** - Image analysis and visual understanding
+- **[Vision System](vision.md)** - Image analysis with local multi-modal models
- **[Memory System](memory.md)** - How Teto remembers conversations
+- **[Memory System](memory.md)** - How Teto remembers conversations locally
- **[Personality Engine](personality-engine.md)** - Character consistency and roleplay
+- **[Personality Engine](personality-engine.md)** - Character consistency and roleplay
 ### 🔧 Technical Documentation
 - **[Architecture Overview](architecture.md)** - System design and components
@@ -41,15 +41,15 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 ### 🛠️ Operations & Support
 - **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
- **[Performance Tuning](performance.md)** - Optimization for your server
+- **[Performance Tuning](performance.md)** - Optimizing your local AI stack
- **[Security & Privacy](security.md)** - Data handling and safety considerations
+- **[Security & Privacy](security.md)** - Data handling and safety in a local-first setup
 ## 🎯 Quick Navigation by Use Case
 ### "I want to set up Teto for the first time"
-1. [Setup Guide](setup.md) - Installation and configuration
+1. [Setup Guide](setup.md) - Installation and local AI stack configuration
-2. [Configuration](configuration.md) - AI API keys and personality setup
+2. [Configuration](configuration.md) - vLLM, Piper, and Whisper setup
-3. [Docker Guide](docker.md) - Container deployment
+3. [Docker Guide](docker.md) - Multi-container deployment for AI services
 ### "I want to understand how to interact with Teto"
 1. [Conversation Guide](interactions.md) - Natural chat examples
@@ -58,7 +58,7 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 ### "I want to understand Teto's capabilities"
 1. [Personality Guide](personality.md) - Character traits and style
-2. [Vision System](vision.md) - Image and video analysis
+2. [Vision System](vision.md) - Image analysis with local models
 3. [AI Architecture](ai-architecture.md) - Technical capabilities
 ### "I want to customize or develop features"
@@ -68,8 +68,8 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 ### "I'm having issues or want to optimize"
 1. [Troubleshooting](troubleshooting.md) - Problem solving
-2. [Performance Tuning](performance.md) - Optimization tips
+2. [Performance Tuning](performance.md) - Optimizing your local AI stack
-3. [Security & Privacy](security.md) - Best practices
+3. [Security & Privacy](security.md) - Best practices for a local-first setup
 ## 🌟 Key Features Overview
@@ -94,11 +94,12 @@ Carefully crafted personality engine ensures Teto maintains consistent character
 ## 🔧 Technical Architecture
 ```
-Teto AI System
+Teto Local AI System
-├── Language Model (GPT-4/Claude)    # Natural conversation
+├── Language Model (vLLM)            # Self-hosted natural conversation
-├── Vision Model (GPT-4V/CLIP)       # Image/video analysis  
+├── Vision Model (vLLM Multi-modal)  # Self-hosted image/video analysis  
-├── Voice Synthesis (ElevenLabs)     # Speech generation
+├── Voice Synthesis (Piper TTS)      # Local speech generation via Wyoming
-├── Memory System (Vector DB)        # Conversation history
+├── Speech Recognition (Whisper STT) # Local speech recognition via Wyoming
 ├── Memory System (Local Vector DB)  # Local conversation history
 ├── Personality Engine               # Character consistency
 └── Discord Integration              # Platform interface
 ```
@@ -106,23 +107,24 @@ Teto AI System
 ## 📋 System Requirements
 ### Minimum Requirements
- **RAM**: 4GB (AI model loading)
+- **VRAM**: 8GB+ for 7B models (required for `vLLM`)
- **CPU**: Multi-core (real-time inference)
+- **RAM**: 16GB+ (for models and system)
- **Storage**: 10GB (models and data)
+- **CPU**: Modern multi-core (for processing)
- **Network**: Stable connection (AI API calls)
+- **Storage**: 15GB+ SSD (for model weights)
 - **Network**: Local network for inter-service communication
 ### Recommended Setup
- **RAM**: 8GB+ for optimal performance
+- **VRAM**: 24GB+ for larger models or concurrent tasks
- **CPU**: Modern multi-core processor
+- **RAM**: 32GB+ for smoother operation
- **Storage**: SSD for fast model access
+- **Storage**: NVMe SSD for fast model loading
- **GPU**: Optional but beneficial for local inference
+- **GPU**: Required for `vLLM` and `Whisper`
 ## 🚦 Getting Started Checklist
 - [ ] Read the [Setup Guide](setup.md)
- [ ] Obtain necessary API keys (OpenAI, ElevenLabs, etc.)
+- [ ] Download required model weights (LLM, TTS, etc.)
- [ ] Configure Discord token and permissions
+- [ ] Configure local endpoints for `vLLM` and `Wyoming`
- [ ] Deploy using Docker or run locally
+- [ ] Deploy multi-container stack using Docker
 - [ ] Customize personality settings
 - [ ] Test basic conversation features
 - [ ] Explore voice and vision capabilities
@@ -143,12 +145,12 @@ See the [Development Guide](development.md) for detailed contribution guidelines
 - **Technical Issues**: Check [Troubleshooting](troubleshooting.md)
 - **Setup Problems**: Review [Setup Guide](setup.md)
 - **Feature Questions**: See [Commands Reference](commands.md)
- **AI Behavior**: Read [Personality Guide](personality.md)
+- **AI Behavior**: Read [Personality Guide](personality.md)
 ### Best Practices
- **Privacy First**: Always respect user consent and data privacy
+- **Privacy First**: All data is processed locally, ensuring maximum privacy
 - **Appropriate Content**: Maintain family-friendly interactions
- **Resource Management**: Monitor AI API usage and costs
+- **Resource Management**: Monitor local GPU and CPU usage
 - **Community Guidelines**: Foster positive server environments
 ## 📊 Documentation Stats
@@ -163,10 +165,10 @@ See the [Development Guide](development.md) for detailed contribution guidelines
 The documentation will continue to evolve with new features:
 - **Advanced Memory Systems** - Long-term relationship building
- **Custom Voice Training** - Personalized Teto voice models  
+- **Custom Voice Training** - Fine-tuning `Piper` for a unique Teto voice
 - **Multi-Server Consistency** - Shared personality across servers
 - **Game Integration** - Interactive gaming experiences
- **Creative Tools** - Music and art generation capabilities
+- **Creative Tools** - Music and art generation with local models
 ---
--- a/docs/ai-architecture.md
+++ b/docs/ai-architecture.md
@@ -26,34 +26,34 @@ This document provides a comprehensive overview of how Kasane Teto's AI systems
 ### Core Components
 **1. AI Orchestration Layer**
- Coordinates between different AI services
+- Coordinates between different local AI services
 - Manages context flow and decision routing
 - Handles multi-modal input integration
 - Ensures personality consistency across modalities
-**2. Language Model Integration**
+**2. Language Model Integration (vLLM)**
- Primary conversational intelligence (GPT-4/Claude)
+- Self-hosted conversational intelligence via `vLLM`
- Context-aware response generation
+- Context-aware response generation through OpenAI-compatible API
- Personality-guided prompt engineering
+- Personality-guided prompt engineering for local models
 - Multi-turn conversation management
-**3. Vision Processing System**
+**3. Vision Processing System (vLLM Multi-modal)**
- Image analysis and understanding
+- Image analysis using local multi-modal models
 - Video frame processing for streams
 - Visual context integration with conversations
 - Automated response generation for visual content
-**4. Voice Synthesis & Recognition**
+**4. Voice Synthesis & Recognition (Wyoming Protocol)**
- Text-to-speech with Teto's voice characteristics
+- Text-to-speech using `Piper` for Teto's voice characteristics
- Speech-to-text for voice command processing
+- Speech-to-text using `Whisper` for voice command processing
- Emotional tone and inflection control
+- Emotional tone and inflection control via TTS models
 - Real-time voice conversation capabilities
-**5. Memory & Context System**
+**5. Memory & Context System (Local)**
- Long-term conversation history storage
+- Local long-term conversation history storage (e.g., ChromaDB)
 - User preference and relationship tracking
 - Context retrieval for relevant conversations
- Semantic search across past interactions
+- Local semantic search across past interactions
 **6. Personality Engine**
 - Character consistency enforcement
@@ -138,24 +138,25 @@ Image Upload → Image Processing → Vision Model → Context Integration → R
 ### Voice Interaction Flow
 ```
-Voice Channel Join → Audio Processing → Speech Recognition → Text Processing → Voice Synthesis → Audio Output
+Voice Channel Join → Audio Processing (Whisper) → Text Processing (vLLM) → Voice Synthesis (Piper) → Audio Output
-                           ↓                  ↓                    ↓               ↓
+                           ↓                        ↓                        ↓
-                    Noise Filtering → Intent Detection → LLM Response → Voice Cloning
+                    Noise Filtering →         Intent Detection →      LLM Response →        Voice Model
 ```
 ## 🧩 AI Service Integration
-### Language Model Configuration
+### Language Model Configuration (vLLM)
-**Primary Model: GPT-4 Turbo**
+**vLLM with OpenAI-Compatible Endpoint:**
 ```javascript
-const LLM_CONFIG = {
+const VLLM_CONFIG = {
-  model: "gpt-4-turbo-preview",
+  endpoint: "http://localhost:8000/v1", // Your vLLM server
-  temperature: 0.8,        // Creative but consistent
+  model: "mistralai/Mistral-7B-Instruct-v0.2", // Or your preferred model
-  max_tokens: 1000,        // Reasonable response length
+  temperature: 0.7,        // Creative yet grounded
-  top_p: 0.9,             // Focused but diverse
+  max_tokens: 1500,        // Max response length
-  frequency_penalty: 0.3,  // Reduce repetition
+  top_p: 0.9,             // Focused sampling
-  presence_penalty: 0.2    // Encourage topic exploration
+  frequency_penalty: 0.2,  // Reduce repetition
  presence_penalty: 0.1    // Encourage topic exploration
 };
 ```
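As a minimal sketch of how these settings could be turned into a request, the block below builds the URL and body for an OpenAI-compatible `/chat/completions` call. It assumes the vLLM server exposes that route under the configured endpoint. `buildChatRequest` is a hypothetical helper, not part of the bot's code, and the system prompt text is a placeholder.

```javascript
// Values mirror the VLLM_CONFIG shown above.
const VLLM_CONFIG = {
  endpoint: "http://localhost:8000/v1",
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  temperature: 0.7,
  max_tokens: 1500,
  top_p: 0.9,
  frequency_penalty: 0.2,
  presence_penalty: 0.1,
};

// Hypothetical helper: builds the URL and fetch options for a chat completion.
function buildChatRequest(config, systemPrompt, history, userMessage) {
  // The endpoint is transport config; everything else goes into the JSON body.
  const { endpoint, ...sampling } = config;
  return {
    url: `${endpoint}/chat/completions`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        ...sampling,
        messages: [
          { role: "system", content: systemPrompt },
          ...history,
          { role: "user", content: userMessage },
        ],
      }),
    },
  };
}

const request = buildChatRequest(VLLM_CONFIG, "You are Kasane Teto.", [], "Hello!");
console.log(request.url);
// To send it (Node.js 18+ ships a global fetch):
//   const res = await fetch(request.url, request.options);
//   const reply = (await res.json()).choices[0].message.content;
```

Some model chat templates do not accept a `system` message. If the server rejects the request for that reason, one option is to fold the personality prompt into the first user message instead.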
@@ -166,45 +167,43 @@ USER: Conversation history + current message + visual context (if any)
 ASSISTANT: Previous Teto responses for consistency
 ```
-### Vision Model Integration
+### Vision Model Integration (vLLM Multi-modal)
 **Model Stack:**
- **GPT-4 Vision** - Primary image understanding
+- **Local Multi-modal Model** - (e.g., LLaVA, Idefics) served via `vLLM`
- **CLIP** - Image-text similarity for context matching
+- **CLIP** - Local image-text similarity for context matching
- **Custom Fine-tuning** - Teto-specific visual preferences
+- **Custom Fine-tuning** - Potential for Teto-specific visual preferences
 **Processing Pipeline:**
 ```javascript
 const processImage = async (imageUrl, conversationContext) => {
-  // Multi-model analysis for comprehensive understanding
+  // Local multi-modal analysis
-  const gpt4Analysis = await analyzeWithGPT4V(imageUrl);
+  const localAnalysis = await analyzeWithVLLM(imageUrl);
-  const clipEmbedding = await getCLIPEmbedding(imageUrl);
+  const clipEmbedding = await getLocalCLIPEmbedding(imageUrl);
  const contextMatch = await findSimilarImages(clipEmbedding);
  return {
-    description: gpt4Analysis.description,
+    description: localAnalysis.description,
-    emotions: gpt4Analysis.emotions,
+    emotions: localAnalysis.emotions,
    relevantMemories: contextMatch,
-    responseStyle: determineResponseStyle(gpt4Analysis, conversationContext)
+    responseStyle: determineResponseStyle(localAnalysis, conversationContext)
  };
 };
 ```
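The pipeline above leaves `analyzeWithVLLM` undefined. One plausible building block, sketched here under the assumption that the served model is multi-modal and that the server accepts OpenAI-style `image_url` content parts, is a user message that pairs a text prompt with an image. `buildVisionMessage` is a hypothetical helper name and the image URL is a placeholder.

```javascript
// Hypothetical helper: one user message with a text part and an image part,
// in the OpenAI-style chat format.
function buildVisionMessage(promptText, imageUrl) {
  return {
    role: "user",
    content: [
      { type: "text", text: promptText },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

const message = buildVisionMessage(
  "Describe this image in one sentence.",
  "https://example.com/teto.png" // Placeholder URL
);
console.log(JSON.stringify(message, null, 2));
```

The resulting object would go into the `messages` array of a chat completion request in place of a plain-text user message.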
-### Voice Synthesis Setup
+### Voice I/O Setup (Wyoming Protocol)
-**ElevenLabs Configuration:**
+**Piper TTS and Whisper STT via Wyoming:**
 ```javascript
-const VOICE_CONFIG = {
+const WYOMING_CONFIG = {
-  voice_id: "kasane_teto_voice_clone",
+  host: "localhost",
-  model_id: "eleven_multilingual_v2",
+  port: 10300,
-  stability: 0.75,         // Consistent voice characteristics
+  piper_voice: "en_US-lessac-medium", // Or a custom-trained Teto voice
-  similarity_boost: 0.8,   // Maintain Teto's voice signature
+  whisper_model: "base.en" // Or larger model depending on resources
  style: 0.6,             // Moderate emotional expression
  use_speaker_boost: true  // Enhanced clarity
 };
 ```
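For orientation, a Wyoming event is framed as a single line of UTF-8 JSON, optionally followed by a binary payload. The sketch below encodes a `synthesize` event for a TTS request. Treat it as an illustration of the framing only: the helper name `encodeWyomingEvent` is hypothetical, and the exact fields your Piper service expects should be checked against the Wyoming protocol documentation.

```javascript
// Hypothetical helper: encode one Wyoming event as a JSON header line
// followed by an optional binary payload.
function encodeWyomingEvent(type, data = {}, payload = Buffer.alloc(0)) {
  const header = { type, data };
  if (payload.length > 0) {
    // Tells the receiver how many payload bytes follow the header line.
    header.payload_length = payload.length;
  }
  const headerLine = Buffer.from(JSON.stringify(header) + "\n", "utf8");
  return Buffer.concat([headerLine, payload]);
}

const event = encodeWyomingEvent("synthesize", {
  text: "Hello, I'm Teto!",
  voice: { name: "en_US-lessac-medium" },
});
console.log(event.toString("utf8"));
// To send it over TCP, connect to the host and port of your Piper service:
//   const net = require("node:net");
//   const socket = net.connect(port, host, () => socket.write(event));
```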
-### Memory System Architecture
+### Memory System Architecture (Local)
 **Vector Database Structure:**
 ```javascript
@@ -324,10 +323,10 @@ const safetyPipeline = async (content, context) => {
 ### Privacy Protection
 **Data Handling Principles:**
- **Local Memory Storage** - Conversation history stored locally, not sent to external services
+- **Complete Privacy** - All data, including conversations, images, and voice, is processed locally.
- **Anonymized Analytics** - Usage patterns tracked without personal identifiers
+- **No External Data Transfer** - AI processing does not require sending data to third-party services.
- **Selective Context** - Only relevant conversation context sent to AI models
+- **Full User Control** - Users have complete control over their data and the AI models.
- **User Consent** - Clear communication about data usage and AI processing
+- **User Consent** - Clear communication that all processing is done on the user's own hardware.
 ## 📊 Performance Optimization
@@ -385,21 +384,18 @@ const processMessageAsync = async (message) => {
 ### Resource Management
-**Model Loading Strategy:**
+**Model Loading Strategy (for vLLM):**
 ```javascript
-const MODEL_LOADING = {
+// This is typically managed by the vLLM server instance itself.
-  // Keep language model always loaded
+// The configuration would involve which models to load on startup.
-  language_model: "persistent",
+const VLLM_SERVER_ARGS = {
-  
+  model: "mistralai/Mistral-7B-Instruct-v0.2",
-  // Load vision model on demand
+  "tensor-parallel-size": 1, // Or more depending on GPU count
-  vision_model: "on_demand",
+  "gpu-memory-utilization": 0.9, // Use 90% of GPU memory
-  
+  "max-model-len": 4096,
  // Pre-load voice synthesis during voice channel activity
  voice_synthesis: "predictive",
  // Cache embeddings for frequent users
  user_embeddings: "lru_cache"
 };
 // Wyoming services for Piper/Whisper are typically persistent.
 ```
 ## 🔧 Configuration & Customization
@@ -443,14 +439,14 @@ const TUNABLE_PARAMETERS = {
 const getModelConfig = (environment) => {
  const configs = {
    development: {
-      model: "gpt-3.5-turbo",
+      model: "local-dev-model/gguf", // Smaller model for dev
      response_time_target: 3000,
      logging_level: "debug",
      cache_enabled: false
    },
    production: {
-      model: "gpt-4-turbo-preview",
+      model: "mistralai/Mistral-7B-Instruct-v0.2",
      response_time_target: 1500,
      logging_level: "info",
      cache_enabled: true,
--- a/docs/commands.md
+++ b/docs/commands.md
@@ -303,13 +303,12 @@ How long did this take you to create? I'm in awe! ✨"
 **Example Response**:
 ```
 🤖 **Teto Status Report**
-💭 AI Systems: All operational! 
+💭 AI Systems: All local services operational!
-🎤 Voice: Ready to chat in voice channels
+🚀 vLLM: `mistralai/Mistral-7B-Instruct-v0.2` (Online)
-👀 Vision: Image analysis active
+🎤 Wyoming: Piper TTS & Whisper STT (Online)
-🧠 Memory: 1,247 conversations remembered
+🧠 Memory: Local Vector DB (1,247 conversations)
 ✨ Mood: Cheerful and energetic!
 ⏰ Been active for 3 hours today
 🎵 Currently listening to: Lo-fi beats
 ```
 ---
@@ -441,16 +440,16 @@ how you finally managed it!"
 ## ⚠️ Important Notes
 ### Privacy & Consent
- All interactions are processed through AI systems
+- All interactions are processed by your self-hosted AI stack. No data is sent to external third-party services.
- Conversation history is stored locally for continuity
+- Conversation history is stored in your local vector database.
- Visual content is analyzed but not permanently stored
+- Visual content is analyzed by your local multi-modal model and is not stored unless recorded.
- Voice interactions may be temporarily cached for processing
+- Voice is processed locally via the Wyoming protocol (Piper/Whisper).
 ### Limitations
- Response time varies with AI model load (typically 1-3 seconds)
+- Response time depends entirely on your local hardware (GPU, CPU, RAM).
- Complex image analysis may take slightly longer
+- The quality and capabilities of Teto depend on the models you choose to run.
- Voice synthesis has brief processing delay
+- Requires significant VRAM (8GB+ for basic models, 24GB+ for larger ones).
- Memory system focuses on significant interactions
+- Initial setup and configuration of the local AI stack can be complex.
 ### Ethics & Safety
 - Teto is programmed to maintain appropriate, family-friendly interactions
--- a/docs/docker-compose-examples.md
+++ b/docs/docker-compose-examples.md
@@ -0,0 +1,167 @@
 # Docker Compose Examples for Local AI Stack
 This document provides production-ready `docker-compose.yml` examples for setting up the self-hosted AI services required by the Teto AI Companion bot. These services should be included in the same `docker-compose.yml` file as the `teto_ai` bot service itself to ensure proper network communication.
 > [!IMPORTANT]
 > These examples require a host machine with an NVIDIA GPU and properly installed drivers. They use CDI (Container Device Interface) for GPU reservations, which is the modern standard for Docker.
 ## 🤖 vLLM Service (Language & Vision Model)
 This service uses `vLLM` to serve a powerful language model with an OpenAI-compatible API endpoint. This allows Teto to perform natural language understanding and generation locally. If you use a multi-modal model, this service will also provide vision capabilities.
 ```yaml
 services:
  vllm-openai:
    # This section reserves GPU resources for the container.
    # It gives the vLLM container access to the host's NVIDIA GPUs.
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
    # Mount local directories for model weights and cache.
    # This prevents re-downloading models on every container restart.
    volumes:
      - /path/to/your/llm_models/hf_cache:/root/.cache/huggingface
      - /path/to/your/llm_models:/root/LLM_models
    # Map the container's port 8000 to a host port (e.g., 11434).
    # Your .env file should point to this host port.
    ports:
      - "11434:8000"
    environment:
      # (Optional) Add your Hugging Face token if needed for private models.
      - HUGGING_FACE_HUB_TOKEN=your_hf_token_here
      # Optimizes PyTorch memory allocation, can improve performance.
      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512,garbage_collection_threshold:0.8
    # Necessary for multi-GPU communication and performance.
    ipc: host
    image: vllm/vllm-openai:latest
    # --- vLLM Command Line Arguments ---
    # These arguments configure how vLLM serves the model.
    # Adjust them based on your model and hardware.
    # Comments must stay outside the folded scalar below: inside it, `#` is
    # literal text and would be passed to vLLM as extra arguments.
    #   --tensor-parallel-size: number of GPUs to use.
    #   --max-model-len: maximum context length.
    #   --limit-mm-per-prompt: for multi-modal models.
    #   --enable-auto-tool-choice: for models that support tool use.
    #   --gpu-memory-utilization: fraction of GPU VRAM to use (0.75 = 75%).
    #   --max-num-seqs: max concurrent sequences.
    command: >
      --model jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym
      --tensor-parallel-size 2
      --max-model-len 32256
      --limit-mm-per-prompt image=4
      --enable-auto-tool-choice
      --tool-call-parser mistral
      --enable-chunked-prefill
      --disable-log-stats
      --gpu-memory-utilization 0.75
      --enable-prefix-caching
      --max-num-seqs 4
      --served-model-name Mistral-Small-3.2
 ```
 ### vLLM Configuration Notes
 -   **`--model`**: Specify the Hugging Face model identifier you want to serve.
 -   **`--tensor-parallel-size`**: Set this to the number of GPUs you want to use for a single model. For a single GPU, this should be `1`.
 -   **`--gpu-memory-utilization`**: Adjust this value based on your VRAM. `0.75` (75%) is a safe starting point.
 -   Check the [official vLLM documentation](https://docs.vllm.ai/en/latest/) for the latest command-line arguments and supported models.
 ## 🎤 Wyoming Voice Services (Piper TTS & Whisper STT)
 These services provide Text-to-Speech (`Piper`) and Speech-to-Text (`Whisper`) capabilities over the `Wyoming` protocol. They run as separate containers but are managed within the same Docker Compose file.
 ```yaml
 services:
  # --- Whisper STT Service ---
  # Converts speech from the voice channel into text for Teto to understand.
  wyoming-whisper:
    image: slackr31337/wyoming-whisper-gpu:latest
    container_name: wyoming-whisper
    environment:
      # Configure the Whisper model size and language.
      # Smaller models are faster but less accurate.
      - MODEL=base-int8
      - LANGUAGE=en
      - COMPUTE_TYPE=int8
      - BEAM_SIZE=5
    ports:
      # Exposes the Wyoming protocol port for Whisper.
      - "10300:10300"
    volumes:
      # Mount a volume to persist Whisper model data.
      - /path/to/your/whisper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
  # --- Piper TTS Service ---
  # Converts Teto's text responses into speech.
  wyoming-piper:
    image: slackr31337/wyoming-piper-gpu:latest
    container_name: wyoming-piper
    environment:
      # Specify which Piper voice model to use.
      - PIPER_VOICE=en_US-amy-medium
    ports:
      # Exposes the Wyoming protocol port for Piper.
      - "10200:10200"
    volumes:
      # Mount a volume to persist Piper voice models.
      - /path/to/your/piper_data:/data
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - driver: cdi
              device_ids: ['nvidia.com/gpu=all']
              capabilities: ['gpu']
 ```
 ### Wyoming Configuration Notes
 -   **Multiple Ports**: Note that `Whisper` and `Piper` listen on different ports (`10300` and `10200` in this example). Your bot's configuration will need to point to the correct service and port.
 -   **Voice Models**: You can download different `Piper` voice models and place them in your persistent data directory to change Teto's voice.
 -   **GPU Usage**: These images are for GPU-accelerated voice processing. If your GPU is dedicated to `vLLM`, you may consider using CPU-based images for Wyoming to conserve VRAM.
 ## 🌐 Networking
 For the services to communicate with each other, they must share a Docker network. Using an external network is a good practice for managing complex applications.
 ```yaml
 # Add this to the bottom of your docker-compose.yml file
 networks:
  backend:
    external: true
 ```
 Before starting your stack, create the network manually:
 ```bash
 docker network create backend
 ```
 Then, ensure each service in your `docker-compose.yml` (including the `teto_ai` bot) is attached to this network:
 ```yaml
 services:
  teto_ai:
    # ... your bot's configuration
    networks:
      - backend
  vllm-openai:
    # ... vllm configuration
    networks:
      - backend
  wyoming-whisper:
    # ... whisper configuration
    networks:
      - backend
  wyoming-piper:
    # ... piper configuration
    networks:
      - backend
 ```
 This allows the Teto bot to communicate with `vllm-openai`, `wyoming-whisper`, and `wyoming-piper` using their service names as hostnames.
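A minimal sketch of how the bot might derive in-network addresses from those service names. The names and ports come from the Compose examples above. The `SERVICES` table and `serviceAddress` helper are hypothetical. Note that inside the Docker network the bot talks to the container port (`8000` for vLLM), not the published host port (`11434`).

```javascript
// Compose service names double as hostnames on the shared Docker network.
// Ports are container ports, not the host ports published under `ports:`.
const SERVICES = {
  vllm: { host: "vllm-openai", port: 8000 },
  whisper: { host: "wyoming-whisper", port: 10300 },
  piper: { host: "wyoming-piper", port: 10200 },
};

// Hypothetical helper: returns "host:port" for a known service.
function serviceAddress(name) {
  const service = SERVICES[name];
  if (!service) {
    throw new Error(`Unknown service: ${name}`);
  }
  return `${service.host}:${service.port}`;
}

const vllmEndpoint = `http://${serviceAddress("vllm")}/v1`;
console.log(vllmEndpoint);
console.log(serviceAddress("whisper"), serviceAddress("piper"));
```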
--- a/docs/setup.md
+++ b/docs/setup.md
@@ -5,16 +5,22 @@ This guide will walk you through setting up the Discord Teto Bot for video recor
 ## 📋 Prerequisites
 ### System Requirements
- **Operating System**: Linux, macOS, or Windows with WSL2
+- **Operating System**: Linux is strongly recommended for GPU support. Windows with WSL2 is possible.
- **Docker**: Version 20.10+ and Docker Compose v2+
+- **GPU**: NVIDIA GPU with 8GB+ VRAM is required for local model hosting.
- **Disk Space**: Minimum 2GB for container, additional space for recordings
+- **Docker**: Version 20.10+ and Docker Compose v2+.
- **Memory**: 4GB RAM recommended (2GB minimum)
+- **Disk Space**: 20GB+ SSD for models and container images.
- **Network**: Stable internet connection for Discord API
+- **Memory**: 16GB+ RAM recommended.
 - **Network**: Local network for inter-service communication.
 ### Discord Requirements
- Discord account with user token
+- Discord account with user token.
- Server permissions to join voice channels
+- Server permissions to join voice channels.
- Voice channel access where you want to record
+- Voice channel access where you want to record.
 ### Local AI Requirements
 - **LLM/VLM Model**: A downloaded language model compatible with `vLLM` (e.g., from Hugging Face).
 - **TTS Voice Model**: A downloaded `Piper` voice model.
 - **STT Model**: A downloaded `Whisper` model.
 ### Development Prerequisites (Optional)
 - **Node.js**: Version 20+ for local development
@@ -32,14 +38,20 @@ cd discord_teto
 ### Step 2: Environment Configuration
-Create environment variables for your Discord token:
+Create environment variables for your Discord token and local AI endpoints:
 ```bash
 # Method 1: Export in terminal session
 export USER_TOKEN="your_discord_user_token_here"
 export VLLM_ENDPOINT="http://localhost:8000/v1"
 export WYOMING_HOST="localhost"
 export WYOMING_PORT="10300"
 # Method 2: Create .env file (recommended)
 echo "USER_TOKEN=your_discord_user_token_here" > .env
 echo "VLLM_ENDPOINT=http://localhost:8000/v1" >> .env
 echo "WYOMING_HOST=localhost" >> .env
 echo "WYOMING_PORT=10300" >> .env
 ```
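A small startup check can catch a missing or malformed variable before the bot tries to reach the AI services. This is a sketch under the assumption that the bot reads the four variables above from `process.env`. `loadConfig` is a hypothetical helper, not part of the project.

```javascript
// Hypothetical helper: read and validate the variables from Step 2.
function loadConfig(env = process.env) {
  const required = ["USER_TOKEN", "VLLM_ENDPOINT", "WYOMING_HOST", "WYOMING_PORT"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number.parseInt(env.WYOMING_PORT, 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`WYOMING_PORT is not a valid port: ${env.WYOMING_PORT}`);
  }

  return {
    userToken: env.USER_TOKEN,
    // new URL() throws on malformed endpoints; strip any trailing slash.
    vllmEndpoint: new URL(env.VLLM_ENDPOINT).toString().replace(/\/$/, ""),
    wyoming: { host: env.WYOMING_HOST, port },
  };
}

const config = loadConfig({
  USER_TOKEN: "example-token",
  VLLM_ENDPOINT: "http://localhost:8000/v1",
  WYOMING_HOST: "localhost",
  WYOMING_PORT: "10300",
});
console.log(config.vllmEndpoint, config.wyoming);
```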
 **Getting Your Discord Token:**
@@ -50,24 +62,38 @@ echo "USER_TOKEN=your_discord_user_token_here" > .env
 5. Look for requests to `discord.com/api`
 6. Find Authorization header starting with your token
-⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control.
+⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control. The bot operates on a user token and has the same permissions as your user.
-### Step 3: Directory Setup
+### Step 3: Model & Directory Setup
-Create the output directory for recordings:
+1. **Create Directories**
   Create directories for recordings and for your AI models.
   ```bash
   mkdir -p output models/piper models/whisper models/llm
   chmod 755 output models
   ```
   This `models` directory will be mounted into your AI service containers.
-```bash
+2. **Download AI Models**
-mkdir -p output
+   - **Language Model**: Download your chosen GGUF or other `vLLM`-compatible model and place it in `models/llm`.
-chmod 755 output
+   - **Voice Model (Piper)**: Download a `.onnx` and `.json` voice file for Piper and place them in `models/piper`.
-```
+   - **Speech-to-Text Model (Whisper)**: The Whisper service will download its model on first run, or you can pre-download it.
-This directory will be mounted into the Docker container to persist recordings.
+The `output` directory is mounted into the bot container to persist recordings, and the `models` directory provides models to the AI services.
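A quick way to catch an incomplete Piper download is to check that every `.onnx` model has its config file beside it. This is a sketch under an assumption: it expects the config to be named `<voice>.onnx.json`, which is how Piper voices are commonly distributed. The function name `find_incomplete_voices` is hypothetical.

```bash
# Hypothetical helper: list Piper voices in a directory that lack a config file.
# Assumes each voice is a <name>.onnx model with a <name>.onnx.json config next to it.
find_incomplete_voices() {
  voice_dir="${1:-models/piper}"
  for model in "$voice_dir"/*.onnx; do
    # Skip the unexpanded glob when the directory holds no .onnx files.
    [ -e "$model" ] || continue
    if [ ! -f "$model.json" ]; then
      echo "missing config for: $(basename "$model")"
    fi
  done
}
```

Calling `find_incomplete_voices models/piper` prints nothing when every voice is complete.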
-### Step 4: Docker Container Setup
+### Step 4: Local AI Stack & Bot Setup
 This project uses a multi-container Docker setup for the bot and its local AI services. Your `docker-compose.yml` file should define services for:
 - `teto_ai`: The bot itself.
 - `vllm-openai`: The language model server, providing an OpenAI-compatible endpoint.
 - `wyoming-piper`: The Text-to-Speech (TTS) service.
 - `wyoming-whisper`: The Speech-to-Text (STT) service.
 Below are sanitized, production-ready examples for these services. For full configuration details and explanations, please see the [Docker Compose Examples](docker-compose-examples.md) guide.
 #### Production Setup
 ```bash
-# Build and start the container
+# Build and start all containers
 docker compose up --build
 # Or run in background
@ -110,16 +136,19 @@ docker compose -f docker-compose.dev.yml up --build --no-deps
 ### Environment Variables
-Create a `.env` file in the project root:
+Create a `.env` file in the project root to configure the bot and its connections to the local AI services:
 ```env
-# Required
+# Required: Discord Token
 USER_TOKEN=your_discord_user_token
-# Optional
+# Required: Local AI Service Endpoints
-BOT_CLIENT_ID=your_bot_application_id
+VLLM_ENDPOINT="http://vllm:8000/v1" # Using Docker service name
-BOT_CLIENT_SECRET=your_bot_secret
+VLLM_MODEL="mistralai/Mistral-7B-Instruct-v0.2" # Model served by vLLM
-BOT_REDIRECT_URI=https://your-domain.com/auth/callback
+
 WYOMING_HOST="wyoming" # Using Docker service name
 WYOMING_PORT="10300"
 PIPER_VOICE="en_US-lessac-medium" # Voice model for Piper TTS
 # Recording Settings (optional)
 RECORDING_TIMEOUT=30000
@ -176,17 +205,14 @@ export const VIDEO_CONFIG = {
 ## 🔒 Security Considerations
-### Token Security
+### Data Privacy & Security
- Store tokens in environment variables, never in code
+- **100% Local Processing**: All AI processing, including conversations, voice, and images, happens locally. No data is sent to external AI providers; messages still pass through Discord itself.
- Use `.env` files for local development (add to `.gitignore`)
+- **Token Security**: Your Discord token should still be kept secure in a `.env` file or Docker secrets. Never commit it to version control.
- Consider using Docker secrets for production deployments
+- **Network Isolation**: The AI services (`vLLM`, `Wyoming`) can be configured to only be accessible within the Docker network, preventing outside access.
 - Rotate tokens regularly
 ### Container Security
- Bot runs as non-root user inside container
+- The bot and AI services run as non-root users inside their respective containers.
- Limited system capabilities (only SYS_ADMIN for Discord GUI)
+- Filesystem access is limited via specific volume mounts for models and output.
 - Isolated filesystem with specific volume mounts
 - No network access beyond Discord API requirements
 ### File Permissions
 ```bash
@ -200,6 +226,36 @@ chmod 644 ./output/*.mkv  # For recorded files
 ## 🐛 Troubleshooting Setup Issues
 ### Local AI Service Issues
 **1. vLLM Container Fails to Start**
 ```bash
 # Check vLLM logs for errors
 docker compose logs vllm
 # Common issues:
 # - Insufficient GPU VRAM for the selected model.
 # - Incorrect model path or name.
 # - CUDA driver issues on the host machine.
 # - Forgetting to build with --pull to get the latest base image.
 ```
 **2. Wyoming Service Not Responding**
 ```bash
 # Check Wyoming protocol server logs
 docker compose logs wyoming
 # Common issues:
 # - Incorrect path to Piper voice models.
 # - Port conflicts on the host (port 10300).
 # - Whisper model download failure on first run.
 ```
 **3. Teto Bot Can't Connect to AI Services**
 - Verify service names in your `.env` file match the service names in `docker-compose.yml` (e.g., `http://vllm:8000/v1`).
 - Ensure all containers are on the same Docker network.
 - Use `docker compose ps` to see if all containers are running and healthy.
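To test reachability, the host and port first have to be pulled out of an endpoint URL such as `VLLM_ENDPOINT`. The sketch below does this with plain shell parameter expansion. The function name `endpoint_host_port` is hypothetical, and the default ports for URLs without an explicit port are an assumption.

```bash
# Hypothetical helper: split an endpoint URL into "host port" for a reachability test.
endpoint_host_port() {
  url="$1"
  rest="${url#*://}"        # drop the scheme (http:// or https://)
  hostport="${rest%%/*}"    # drop any path such as /v1
  host="${hostport%%:*}"
  case "$hostport" in
    *:*) port="${hostport##*:}" ;;
    *)   case "$url" in https://*) port=443 ;; *) port=80 ;; esac ;;
  esac
  printf '%s %s\n' "$host" "$port"
}

# Example (run where nc is available, e.g. inside the bot container):
#   set -- $(endpoint_host_port "$VLLM_ENDPOINT")
#   nc -zv "$1" "$2"
```

Parsing the value from `.env` rather than retyping the host and port means the check uses exactly what the bot is configured with.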
 ### Common Installation Problems
 **1. Docker not found**
@ -273,14 +329,22 @@ npm install
 ### Container Health
 ```bash
-# Check container status
+# Check status of all containers (bot, vllm, wyoming)
 docker compose ps
-# View resource usage
+# View resource usage for all services
-docker stats teto_ai
+docker stats
-# Monitor logs in real-time
+# Monitor logs for a specific service in real-time
-docker compose logs -f
+docker compose logs -f vllm
 docker compose logs -f wyoming
 docker compose logs -f teto_ai
 ```
 ### GPU Resource Monitoring
 ```bash
 # Monitor GPU VRAM and utilization on the host machine
 watch -n 1 nvidia-smi
 ```
 ### Recording Status
--- a/docs/troubleshooting.md
+++ b/docs/troubleshooting.md
@ -28,7 +28,86 @@ docker inspect teto_ai | grep -A 5 "Mounts"
 df -h ./output/
 ```
-## 🐳 Docker Issues
+## 🤖 Local AI Stack Issues
 ### vLLM Service Issues
 **Problem**: The `vllm` container fails to start, crashes, or doesn't respond to requests.
 **Diagnosis**:
 ```bash
 # Check the vLLM container logs for CUDA errors, model loading issues, etc.
 docker compose logs vllm
 # Check GPU resource usage on the host
 nvidia-smi
 ```
 **Solutions**:
 1. **Insufficient VRAM**:
   - The most common issue. Check the model's VRAM requirements.
   - **Solution**: Use a smaller or quantized model (e.g., a 7B model needs roughly 14GB of VRAM for FP16 weights alone, and less when quantized) or upgrade your GPU.
 2. **CUDA & Driver Mismatches**:
   - The `vLLM` image bundles its own CUDA runtime, which requires a sufficiently recent NVIDIA driver on the host.
   - **Solution**: Ensure your NVIDIA drivers are up-to-date and compatible with the CUDA version used in the `vLLM` Docker image.
 3. **Incorrect Model Path or Name**:
   - The container can't find the model weights.
   - **Solution**: Verify the volume mount in `docker-compose.yml` points to the correct local directory containing your models. Double-check the model name in your `.env` file.
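For the insufficient-VRAM case above, a rough rule of thumb is that the weights alone take about (parameters in billions) × (bytes per parameter) gigabytes. The sketch below only does that arithmetic. The function name `estimate_weight_gb` is hypothetical, and the result excludes the KV cache and runtime overhead, so real usage will be higher.

```bash
# Rough estimate: weight memory in GB ≈ parameters (billions) × bytes per parameter.
# This covers weights only; the KV cache and runtime overhead need additional VRAM.
estimate_weight_gb() {
  params_billion="$1"   # e.g. 7 for a 7B model
  bytes_per_param="$2"  # 2 for FP16/BF16, 1 for 8-bit
  echo $(( params_billion * bytes_per_param ))
}

estimate_weight_gb 7 2   # prints 14
estimate_weight_gb 7 1   # prints 7
```

Comparing this number with the free memory reported by `nvidia-smi` gives a first indication of whether a model can fit at a given precision.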
 ### Wyoming (Piper/Whisper) Service Issues
 **Problem**: The `wyoming` container is running, but Teto cannot speak or understand voice commands.
 **Diagnosis**:
 ```bash
 # Check the Wyoming container logs for errors related to Piper or Whisper
 docker compose logs wyoming
 # Test the connection from another container
 docker exec -it teto_ai nc -zv wyoming 10300
 ```
 **Solutions**:
 1. **Incorrect Piper Voice Model Path**:
   - The service can't find the `.onnx` and `.json` files for the selected voice.
   - **Solution**: Check your volume mounts and the voice name specified in your configuration.
 2. **Whisper Model Download Failure**:
   - On first run, the service may fail to download the Whisper model.
   - **Solution**: Ensure the container has internet access for the initial download, or manually place the model in the correct volume.
 3. **Port Conflict**:
   - Another service on your host might be using port `10300`.
   - **Solution**: Use `netstat -tulpn | grep 10300` to check for conflicts and remap the port in `docker-compose.yml` if needed.
 ### Bot Can't Connect to Local AI Services
 **Problem**: The Teto bot is running but logs errors about being unable to reach `vllm` or `wyoming`.
 **Diagnosis**:
 ```bash
 # Check the Teto bot logs for connection refused errors
 docker compose logs teto_ai
 # Ensure all services are on the same Docker network
 docker network inspect <your_network_name>
 ```
 **Solutions**:
 1. **Incorrect Endpoint Configuration**:
   - The `.env` file points to the wrong service name or port.
   - **Solution**: Ensure `VLLM_ENDPOINT` and `WYOMING_HOST` use the correct service names as defined in `docker-compose.yml` (e.g., `vllm`, `wyoming`).
 2. **Docker Networking Issues**:
   - The containers cannot resolve each other's service names.
   - **Solution**: Ensure all services are defined within the same `docker-compose.yml` and share a common network.
 ## 🐳 General Docker Issues
 ### Container Won't Start