Updated the docs to focus on a local-only stack instead of one reliant on services like OpenAI, ElevenLabs, and so on.
parent 44b45b7212 · commit 2e94820164
7 changed files with 489 additions and 176 deletions
README.md (52 lines changed)
@@ -35,14 +35,18 @@ Kasane Teto is your server's AI companion who can:
 ## 🚀 Quick Start
 
+> [!IMPORTANT]
+> This project is designed to run exclusively within Docker containers. Bare-metal installation is not officially supported. All instructions assume a working Docker environment.
+
 1. **Setup Environment**
 ```bash
 git clone <repository-url>
 cd discord_teto
 
-# Configure AI and Discord credentials
+# Configure Discord credentials & local AI endpoints
 export USER_TOKEN="your_discord_token"
-export OPENAI_API_KEY="your_openai_key" # or other AI provider
+export VLLM_ENDPOINT="http://localhost:8000" # Or your vLLM server
+export WYOMING_ENDPOINT="http://localhost:10300" # Or your Wyoming server
 ```
 
 2. **Start Teto**
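The two endpoints configured above are easy to sanity-check before starting Teto. The following is an editorial sketch rather than part of the committed README; it assumes the default ports from the example (`8000` for vLLM, `10300` for Wyoming) and that your vLLM image exposes its `/health` route.

```bash
# Confirm the vLLM OpenAI-compatible server is up (expects HTTP 200)
curl -sf "${VLLM_ENDPOINT:-http://localhost:8000}/health" && echo "vLLM reachable"

# Confirm something is listening on the Wyoming port
nc -z localhost 10300 && echo "Wyoming reachable"
```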
@@ -106,10 +110,11 @@ src/
 ```
 
 ### AI Integration
-- **Language Model**: GPT-4/Claude/Local LLM for conversation
+- **Language Model**: Self-hosted LLM via `vLLM` (OpenAI compatible endpoint)
-- **Vision Model**: CLIP/GPT-4V for image understanding
+- **Vision Model**: Multi-modal models served through `vLLM`
-- **Voice Synthesis**: Eleven Labs/Azure Speech for Teto's voice
+- **Voice Synthesis**: `Piper` TTS via `Wyoming` protocol
-- **Memory System**: Vector database for conversation history
+- **Speech Recognition**: `Whisper` STT via `Wyoming` protocol
+- **Memory System**: Local vector database for conversation history
 - **Personality Engine**: Custom prompt engineering for character consistency
 
 ## 🎭 Teto's Personality
@@ -157,21 +162,19 @@ src/
 
 ## 🔧 Configuration
 
-### AI Provider Setup
+### Local AI Provider Setup
 ```env
-# OpenAI (recommended)
+# Local vLLM Server (OpenAI Compatible)
-OPENAI_API_KEY=your_openai_key
+VLLM_ENDPOINT="http://localhost:8000/v1"
-OPENAI_MODEL=gpt-4-turbo-preview
+LOCAL_MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2" # Or your preferred model
 
-# Alternative: Anthropic Claude
+# Wyoming Protocol for Voice (Piper TTS / Whisper STT)
-ANTHROPIC_API_KEY=your_claude_key
+WYOMING_HOST="localhost"
+WYOMING_PORT="10300"
+PIPER_VOICE="en_US-lessac-medium"
 
-# Voice Synthesis
+# Vision Capabilities are enabled if the vLLM model is multi-modal
-ELEVENLABS_API_KEY=your_elevenlabs_key
+VISION_ENABLED=true
-TETO_VOICE_ID=kasane_teto_voice_clone
 
-# Vision Capabilities
-VISION_MODEL=gpt-4-vision-preview
 ```
 
 ### Personality Customization
@@ -196,6 +199,8 @@ export const TETO_PERSONALITY = {
 
 ## 🐳 Docker Deployment
 
+This project is officially supported for **Docker deployments only**. The container-first approach is critical for managing the complex local AI stack, ensuring that all services, dependencies, and configurations operate together consistently.
+
 ### Production Setup
 ```bash
 # Start Teto with all AI capabilities
@@ -206,10 +211,11 @@ docker compose logs -f teto_ai
 ```
 
 ### Resource Requirements
-- **Memory**: 4GB+ recommended for AI processing
+- **VRAM**: 8GB+ for 7B models, 24GB+ for larger models
-- **CPU**: Multi-core for real-time AI inference
+- **Memory**: 16GB+ RAM recommended
-- **Storage**: SSD recommended for fast model loading
+- **CPU**: Modern multi-core CPU
-- **Network**: Stable connection for AI API calls
+- **Storage**: Fast SSD for model weights (15GB+ per model)
+- **Network**: Local network for inter-service communication
 
 ## 🔐 Privacy & Ethics
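If you are unsure which bracket your GPU falls into, a quick query on the host helps match the VRAM numbers above to your hardware. This is an illustrative sketch and assumes the NVIDIA driver utilities are installed:

```bash
# Report GPU name and total VRAM to compare against the 8GB+/24GB+ guidance
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
```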
@@ -292,7 +298,7 @@ This project is for educational and community use. Please ensure compliance with
 ---
 
 **Version**: 3.0.0 (AI-Powered)
-**AI Models**: GPT-4, CLIP, Eleven Labs
+**AI Stack**: Local-First (vLLM, Piper, Whisper)
 **Runtime**: Node.js 20+ with Docker
 
 Bring Kasane Teto to life in your Discord server! 🎵✨
@@ -17,9 +17,9 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 ## 📚 Documentation Structure
 
 ### 🚀 Getting Started
-- **[Setup Guide](setup.md)** - Complete installation and AI configuration
+- **[Setup Guide](setup.md)** - Complete installation and local AI stack configuration
 - **[Quick Start](../README.md#quick-start)** - Get Teto running in 5 minutes
-- **[Configuration](configuration.md)** - AI models, personality, and customization
+- **[Configuration](configuration.md)** - Local models, personality, and customization
 
 ### 💬 Interacting with Teto
 - **[Conversation Guide](interactions.md)** - How to chat naturally with Teto
@@ -28,10 +28,10 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 - **[Voice Interaction](voice.md)** - Speaking with Teto in voice channels
 
 ### 🧠 AI Capabilities
-- **[AI Architecture](ai-architecture.md)** - How Teto's AI systems work
+- **[AI Architecture](ai-architecture.md)** - How Teto's local AI systems work
-- **[Vision System](vision.md)** - Image analysis and visual understanding
+- **[Vision System](vision.md)** - Image analysis with local multi-modal models
-- **[Memory System](memory.md)** - How Teto remembers conversations
+- **[Memory System](memory.md)** - How Teto remembers conversations locally
 - **[Personality Engine](personality-engine.md)** - Character consistency and roleplay
 
 ### 🔧 Technical Documentation
 - **[Architecture Overview](architecture.md)** - System design and components
@@ -41,15 +41,15 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 
 ### 🛠️ Operations & Support
 - **[Troubleshooting](troubleshooting.md)** - Common issues and solutions
-- **[Performance Tuning](performance.md)** - Optimization for your server
+- **[Performance Tuning](performance.md)** - Optimizing your local AI stack
-- **[Security & Privacy](security.md)** - Data handling and safety considerations
+- **[Security & Privacy](security.md)** - Data handling and safety in a local-first setup
 
 ## 🎯 Quick Navigation by Use Case
 
 ### "I want to set up Teto for the first time"
-1. [Setup Guide](setup.md) - Installation and configuration
+1. [Setup Guide](setup.md) - Installation and local AI stack configuration
-2. [Configuration](configuration.md) - AI API keys and personality setup
+2. [Configuration](configuration.md) - vLLM, Piper, and Whisper setup
-3. [Docker Guide](docker.md) - Container deployment
+3. [Docker Guide](docker.md) - Multi-container deployment for AI services
 
 ### "I want to understand how to interact with Teto"
 1. [Conversation Guide](interactions.md) - Natural chat examples
@@ -58,7 +58,7 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 
 ### "I want to understand Teto's capabilities"
 1. [Personality Guide](personality.md) - Character traits and style
-2. [Vision System](vision.md) - Image and video analysis
+2. [Vision System](vision.md) - Image analysis with local models
 3. [AI Architecture](ai-architecture.md) - Technical capabilities
 
 ### "I want to customize or develop features"
@@ -68,8 +68,8 @@ Unlike simple command bots, Teto engages in genuine conversations, remembers pas
 
 ### "I'm having issues or want to optimize"
 1. [Troubleshooting](troubleshooting.md) - Problem solving
-2. [Performance Tuning](performance.md) - Optimization tips
+2. [Performance Tuning](performance.md) - Optimizing your local AI stack
-3. [Security & Privacy](security.md) - Best practices
+3. [Security & Privacy](security.md) - Best practices for a local-first setup
 
 ## 🌟 Key Features Overview
@@ -94,11 +94,12 @@ Carefully crafted personality engine ensures Teto maintains consistent character
 ## 🔧 Technical Architecture
 
 ```
-Teto AI System
+Teto Local AI System
-├── Language Model (GPT-4/Claude)    # Natural conversation
+├── Language Model (vLLM)            # Self-hosted natural conversation
-├── Vision Model (GPT-4V/CLIP)       # Image/video analysis
+├── Vision Model (vLLM Multi-modal)  # Self-hosted image/video analysis
-├── Voice Synthesis (ElevenLabs)     # Speech generation
+├── Voice Synthesis (Piper TTS)      # Local speech generation via Wyoming
-├── Memory System (Vector DB)        # Conversation history
+├── Speech Recognition (Whisper STT) # Local speech recognition via Wyoming
+├── Memory System (Local Vector DB)  # Local conversation history
 ├── Personality Engine               # Character consistency
 └── Discord Integration              # Platform interface
 ```
@@ -106,23 +107,24 @@ Teto AI System
 
 ## 📋 System Requirements
 
 ### Minimum Requirements
-- **RAM**: 4GB (AI model loading)
+- **VRAM**: 8GB+ for 7B models (required for `vLLM`)
-- **CPU**: Multi-core (real-time inference)
+- **RAM**: 16GB+ (for models and system)
-- **Storage**: 10GB (models and data)
+- **CPU**: Modern multi-core (for processing)
-- **Network**: Stable connection (AI API calls)
+- **Storage**: 15GB+ SSD (for model weights)
+- **Network**: Local network for inter-service communication
 
 ### Recommended Setup
-- **RAM**: 8GB+ for optimal performance
+- **VRAM**: 24GB+ for larger models or concurrent tasks
-- **CPU**: Modern multi-core processor
+- **RAM**: 32GB+ for smoother operation
-- **Storage**: SSD for fast model access
+- **Storage**: NVMe SSD for fast model loading
-- **GPU**: Optional but beneficial for local inference
+- **GPU**: Required for `vLLM` and `Whisper`
 
 ## 🚦 Getting Started Checklist
 
 - [ ] Read the [Setup Guide](setup.md)
-- [ ] Obtain necessary API keys (OpenAI, ElevenLabs, etc.)
+- [ ] Download required model weights (LLM, TTS, etc.)
-- [ ] Configure Discord token and permissions
+- [ ] Configure local endpoints for `vLLM` and `Wyoming`
-- [ ] Deploy using Docker or run locally
+- [ ] Deploy multi-container stack using Docker
 - [ ] Customize personality settings
 - [ ] Test basic conversation features
 - [ ] Explore voice and vision capabilities
@@ -143,12 +145,12 @@ See the [Development Guide](development.md) for detailed contribution guidelines
 - **Technical Issues**: Check [Troubleshooting](troubleshooting.md)
 - **Setup Problems**: Review [Setup Guide](setup.md)
 - **Feature Questions**: See [Commands Reference](commands.md)
 - **AI Behavior**: Read [Personality Guide](personality.md)
 
 ### Best Practices
-- **Privacy First**: Always respect user consent and data privacy
+- **Privacy First**: All data is processed locally, ensuring maximum privacy
 - **Appropriate Content**: Maintain family-friendly interactions
-- **Resource Management**: Monitor AI API usage and costs
+- **Resource Management**: Monitor local GPU and CPU usage
 - **Community Guidelines**: Foster positive server environments
 
 ## 📊 Documentation Stats
@@ -163,10 +165,10 @@ See the [Development Guide](development.md) for detailed contribution guidelines
 
 The documentation will continue to evolve with new features:
 - **Advanced Memory Systems** - Long-term relationship building
-- **Custom Voice Training** - Personalized Teto voice models
+- **Custom Voice Training** - Fine-tuning `Piper` for a unique Teto voice
 - **Multi-Server Consistency** - Shared personality across servers
 - **Game Integration** - Interactive gaming experiences
-- **Creative Tools** - Music and art generation capabilities
+- **Creative Tools** - Music and art generation with local models
 
 ---
@@ -26,34 +26,34 @@ This document provides a comprehensive overview of how Kasane Teto's AI systems
 ### Core Components
 
 **1. AI Orchestration Layer**
-- Coordinates between different AI services
+- Coordinates between different local AI services
 - Manages context flow and decision routing
 - Handles multi-modal input integration
 - Ensures personality consistency across modalities
 
-**2. Language Model Integration**
+**2. Language Model Integration (vLLM)**
-- Primary conversational intelligence (GPT-4/Claude)
+- Self-hosted conversational intelligence via `vLLM`
-- Context-aware response generation
+- Context-aware response generation through OpenAI-compatible API
-- Personality-guided prompt engineering
+- Personality-guided prompt engineering for local models
 - Multi-turn conversation management
 
-**3. Vision Processing System**
+**3. Vision Processing System (vLLM Multi-modal)**
-- Image analysis and understanding
+- Image analysis using local multi-modal models
 - Video frame processing for streams
 - Visual context integration with conversations
 - Automated response generation for visual content
 
-**4. Voice Synthesis & Recognition**
+**4. Voice Synthesis & Recognition (Wyoming Protocol)**
-- Text-to-speech with Teto's voice characteristics
+- Text-to-speech using `Piper` for Teto's voice characteristics
-- Speech-to-text for voice command processing
+- Speech-to-text using `Whisper` for voice command processing
-- Emotional tone and inflection control
+- Emotional tone and inflection control via TTS models
 - Real-time voice conversation capabilities
 
-**5. Memory & Context System**
+**5. Memory & Context System (Local)**
-- Long-term conversation history storage
+- Local long-term conversation history storage (e.g., ChromaDB)
 - User preference and relationship tracking
 - Context retrieval for relevant conversations
-- Semantic search across past interactions
+- Local semantic search across past interactions
 
 **6. Personality Engine**
 - Character consistency enforcement
@@ -138,24 +138,25 @@ Image Upload → Image Processing → Vision Model → Context Integration → R
 ### Voice Interaction Flow
 
 ```
-Voice Channel Join → Audio Processing → Speech Recognition → Text Processing → Voice Synthesis → Audio Output
+Voice Channel Join → Audio Processing (Whisper) → Text Processing (vLLM) → Voice Synthesis (Piper) → Audio Output
-        ↓                    ↓                   ↓                  ↓
+        ↓                    ↓                   ↓
-Noise Filtering → Intent Detection → LLM Response → Voice Cloning
+Noise Filtering → Intent Detection → LLM Response → Voice Model
 ```
 
 ## 🧩 AI Service Integration
 
-### Language Model Configuration
+### Language Model Configuration (vLLM)
 
-**Primary Model: GPT-4 Turbo**
+**vLLM with OpenAI-Compatible Endpoint:**
 ```javascript
-const LLM_CONFIG = {
+const VLLM_CONFIG = {
-  model: "gpt-4-turbo-preview",
+  endpoint: "http://localhost:8000/v1",         // Your vLLM server
-  temperature: 0.8,        // Creative but consistent
+  model: "mistralai/Mistral-7B-Instruct-v0.2",  // Or your preferred model
-  max_tokens: 1000,        // Reasonable response length
+  temperature: 0.7,        // Creative yet grounded
-  top_p: 0.9,              // Focused but diverse
+  max_tokens: 1500,        // Max response length
-  frequency_penalty: 0.3,  // Reduce repetition
+  top_p: 0.9,              // Focused sampling
-  presence_penalty: 0.2    // Encourage topic exploration
+  frequency_penalty: 0.2,  // Reduce repetition
+  presence_penalty: 0.1    // Encourage topic exploration
 };
 ```
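Because vLLM exposes the standard OpenAI-compatible API, the values in `VLLM_CONFIG` map directly onto a `/v1/chat/completions` request. A hedged sketch (not part of the commit) using the same endpoint, model name, and sampling parameters as the example above:

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-7B-Instruct-v0.2",
        "messages": [{"role": "user", "content": "Introduce yourself as Kasane Teto."}],
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 1500
      }'
```

The `frequency_penalty` and `presence_penalty` fields are passed in the same request body if your model benefits from them.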
@@ -166,45 +167,43 @@ USER: Conversation history + current message + visual context (if any)
 ASSISTANT: Previous Teto responses for consistency
 ```
 
-### Vision Model Integration
+### Vision Model Integration (vLLM Multi-modal)
 
 **Model Stack:**
-- **GPT-4 Vision** - Primary image understanding
+- **Local Multi-modal Model** - (e.g., LLaVA, Idefics) served via `vLLM`
-- **CLIP** - Image-text similarity for context matching
+- **CLIP** - Local image-text similarity for context matching
-- **Custom Fine-tuning** - Teto-specific visual preferences
+- **Custom Fine-tuning** - Potential for Teto-specific visual preferences
 
 **Processing Pipeline:**
 ```javascript
 const processImage = async (imageUrl, conversationContext) => {
-  // Multi-model analysis for comprehensive understanding
+  // Local multi-modal analysis
-  const gpt4Analysis = await analyzeWithGPT4V(imageUrl);
+  const localAnalysis = await analyzeWithVLLM(imageUrl);
-  const clipEmbedding = await getCLIPEmbedding(imageUrl);
+  const clipEmbedding = await getLocalCLIPEmbedding(imageUrl);
   const contextMatch = await findSimilarImages(clipEmbedding);
 
   return {
-    description: gpt4Analysis.description,
+    description: localAnalysis.description,
-    emotions: gpt4Analysis.emotions,
+    emotions: localAnalysis.emotions,
     relevantMemories: contextMatch,
-    responseStyle: determineResponseStyle(gpt4Analysis, conversationContext)
+    responseStyle: determineResponseStyle(localAnalysis, conversationContext)
   };
 };
 ```
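When the model served by vLLM is multi-modal, the same OpenAI-compatible endpoint accepts images as `image_url` content parts, which is how a pipeline like `processImage` above would reach the model. A hedged sketch; the model name is a placeholder for whatever vision-capable model you serve, and the image URL must be reachable by the server:

```bash
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "Describe this image for the chat."},
            {"type": "image_url", "image_url": {"url": "https://example.com/fanart.png"}}
          ]
        }],
        "max_tokens": 300
      }'
```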
-### Voice Synthesis Setup
+### Voice I/O Setup (Wyoming Protocol)
 
-**ElevenLabs Configuration:**
+**Piper TTS and Whisper STT via Wyoming:**
 ```javascript
-const VOICE_CONFIG = {
+const WYOMING_CONFIG = {
-  voice_id: "kasane_teto_voice_clone",
+  host: "localhost",
-  model_id: "eleven_multilingual_v2",
+  port: 10300,
-  stability: 0.75,         // Consistent voice characteristics
+  piper_voice: "en_US-lessac-medium",  // Or a custom-trained Teto voice
-  similarity_boost: 0.8,   // Maintain Teto's voice signature
+  whisper_model: "base.en"             // Or larger model depending on resources
-  style: 0.6,              // Moderate emotional expression
-  use_speaker_boost: true  // Enhanced clarity
 };
 ```
 
-### Memory System Architecture
+### Memory System Architecture (Local)
 
 **Vector Database Structure:**
 ```javascript
@@ -324,10 +323,10 @@ const safetyPipeline = async (content, context) => {
 ### Privacy Protection
 
 **Data Handling Principles:**
-- **Local Memory Storage** - Conversation history stored locally, not sent to external services
+- **Complete Privacy** - All data, including conversations, images, and voice, is processed locally.
-- **Anonymized Analytics** - Usage patterns tracked without personal identifiers
+- **No External Data Transfer** - AI processing does not require sending data to third-party services.
-- **Selective Context** - Only relevant conversation context sent to AI models
+- **Full User Control** - Users have complete control over their data and the AI models.
-- **User Consent** - Clear communication about data usage and AI processing
+- **User Consent** - Clear communication that all processing is done on the user's own hardware.
 
 ## 📊 Performance Optimization
@@ -385,21 +384,18 @@ const processMessageAsync = async (message) => {
 
 ### Resource Management
 
-**Model Loading Strategy:**
+**Model Loading Strategy (for vLLM):**
 ```javascript
-const MODEL_LOADING = {
+// This is typically managed by the vLLM server instance itself.
-  // Keep language model always loaded
+// The configuration would involve which models to load on startup.
-  language_model: "persistent",
+const VLLM_SERVER_ARGS = {
+  model: "mistralai/Mistral-7B-Instruct-v0.2",
-  // Load vision model on demand
+  "tensor-parallel-size": 1,      // Or more depending on GPU count
-  vision_model: "on_demand",
+  "gpu-memory-utilization": 0.9,  // Use 90% of GPU memory
+  "max-model-len": 4096,
-  // Pre-load voice synthesis during voice channel activity
-  voice_synthesis: "predictive",
 
-  // Cache embeddings for frequent users
-  user_embeddings: "lru_cache"
 };
 
+// Wyoming services for Piper/Whisper are typically persistent.
 ```
 
 ## 🔧 Configuration & Customization
@@ -443,14 +439,14 @@ const TUNABLE_PARAMETERS = {
 const getModelConfig = (environment) => {
   const configs = {
     development: {
-      model: "gpt-3.5-turbo",
+      model: "local-dev-model/gguf",  // Smaller model for dev
      response_time_target: 3000,
       logging_level: "debug",
       cache_enabled: false
     },
 
     production: {
-      model: "gpt-4-turbo-preview",
+      model: "mistralai/Mistral-7B-Instruct-v0.2",
       response_time_target: 1500,
       logging_level: "info",
       cache_enabled: true,
@@ -303,13 +303,12 @@ How long did this take you to create? I'm in awe! ✨"
 **Example Response**:
 ```
 🤖 **Teto Status Report**
-💭 AI Systems: All operational!
+💭 AI Systems: All local services operational!
-🎤 Voice: Ready to chat in voice channels
+🚀 vLLM: `mistralai/Mistral-7B-Instruct-v0.2` (Online)
-👀 Vision: Image analysis active
+🎤 Wyoming: Piper TTS & Whisper STT (Online)
-🧠 Memory: 1,247 conversations remembered
+🧠 Memory: Local Vector DB (1,247 conversations)
 ✨ Mood: Cheerful and energetic!
 ⏰ Been active for 3 hours today
-🎵 Currently listening to: Lo-fi beats
 ```
 
 ---
@@ -441,16 +440,16 @@ how you finally managed it!"
 ## ⚠️ Important Notes
 
 ### Privacy & Consent
-- All interactions are processed through AI systems
+- All interactions are processed by your self-hosted AI stack. No data is sent to external third-party services.
-- Conversation history is stored locally for continuity
+- Conversation history is stored in your local vector database.
-- Visual content is analyzed but not permanently stored
+- Visual content is analyzed by your local multi-modal model and is not stored unless recorded.
-- Voice interactions may be temporarily cached for processing
+- Voice is processed locally via the Wyoming protocol (Piper/Whisper).
 
 ### Limitations
-- Response time varies with AI model load (typically 1-3 seconds)
+- Response time depends entirely on your local hardware (GPU, CPU, RAM).
-- Complex image analysis may take slightly longer
+- The quality and capabilities of Teto depend on the models you choose to run.
-- Voice synthesis has brief processing delay
+- Requires significant VRAM (8GB+ for basic models, 24GB+ for larger ones).
-- Memory system focuses on significant interactions
+- Initial setup and configuration of the local AI stack can be complex.
 
 ### Ethics & Safety
 - Teto is programmed to maintain appropriate, family-friendly interactions
docs/docker-compose-examples.md (new file, 167 lines)

@@ -0,0 +1,167 @@
+# Docker Compose Examples for Local AI Stack
+
+This document provides production-ready `docker-compose.yml` examples for setting up the self-hosted AI services required by the Teto AI Companion bot. These services should be included in the same `docker-compose.yml` file as the `teto_ai` bot service itself to ensure proper network communication.
+
+> [!IMPORTANT]
+> These examples require a host machine with an NVIDIA GPU and properly installed drivers. They use CDI (Container Device Interface) for GPU reservations, which is the modern standard for Docker.
+
+## 🤖 vLLM Service (Language & Vision Model)
+
+This service uses `vLLM` to serve a powerful language model with an OpenAI-compatible API endpoint. This allows Teto to perform natural language understanding and generation locally. If you use a multi-modal model, this service will also provide vision capabilities.
+
+```yaml
+services:
+  vllm-openai:
+    # This section reserves GPU resources for the container.
+    # It ensures vLLM has exclusive access to the NVIDIA GPUs.
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: cdi
+              device_ids: ['nvidia.com/gpu=all']
+              capabilities: ['gpu']
+    # Mount local directories for model weights and cache.
+    # This prevents re-downloading models on every container restart.
+    volumes:
+      - /path/to/your/llm_models/hf_cache:/root/.cache/huggingface
+      - /path/to/your/llm_models:/root/LLM_models
+    # Map the container's port 8000 to a host port (e.g., 11434).
+    # Your .env file should point to this host port.
+    ports:
+      - "11434:8000"
+    environment:
+      # (Optional) Add your Hugging Face token if needed for private models.
+      - HUGGING_FACE_HUB_TOKEN=your_hf_token_here
+      # Optimizes PyTorch memory allocation, can improve performance.
+      - PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512,garbage_collection_threshold:0.8
+    # Necessary for multi-GPU communication and performance.
+    ipc: host
+    image: vllm/vllm-openai:latest
+    # --- vLLM Command Line Arguments ---
+    # These arguments configure how vLLM serves the model.
+    # Adjust them based on your model and hardware.
+    command: >
+      --model jeffcookio/Mistral-Small-3.2-24B-Instruct-2506-awq-sym
+      --tensor-parallel-size 2       # Number of GPUs to use.
+      --max-model-len 32256          # Maximum context length.
+      --limit-mm-per-prompt image=4  # For multi-modal models.
+      --enable-auto-tool-choice      # For models that support tool use.
+      --tool-call-parser mistral
+      --enable-chunked-prefill
+      --disable-log-stats
+      --gpu-memory-utilization 0.75  # Use 75% of GPU VRAM.
+      --enable-prefix-caching
+      --max-num-seqs 4               # Max concurrent sequences.
+      --served-model-name Mistral-Small-3.2
+```
+
+### vLLM Configuration Notes
+- **`--model`**: Specify the Hugging Face model identifier you want to serve.
+- **`--tensor-parallel-size`**: Set this to the number of GPUs you want to use for a single model. For a single GPU, this should be `1`.
+- **`--gpu-memory-utilization`**: Adjust this value based on your VRAM. `0.75` (75%) is a safe starting point.
+- Check the [official vLLM documentation](https://docs.vllm.ai/en/latest/) for the latest command-line arguments and supported models.
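With the port mapping above (`11434:8000`), the OpenAI-compatible API is reachable on the host at port `11434` rather than `8000`. As an illustrative check (not part of the committed file) that the container is serving the model you expect:

```bash
# List the models the vLLM container is actually serving
curl -s http://localhost:11434/v1/models
```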
+## 🎤 Wyoming Voice Services (Piper TTS & Whisper STT)
+
+These services provide Text-to-Speech (`Piper`) and Speech-to-Text (`Whisper`) capabilities over the `Wyoming` protocol. They run as separate containers but are managed within the same Docker Compose file.
+
+```yaml
+services:
+  # --- Whisper STT Service ---
+  # Converts speech from the voice channel into text for Teto to understand.
+  wyoming-whisper:
+    image: slackr31337/wyoming-whisper-gpu:latest
+    container_name: wyoming-whisper
+    environment:
+      # Configure the Whisper model size and language.
+      # Smaller models are faster but less accurate.
+      - MODEL=base-int8
+      - LANGUAGE=en
+      - COMPUTE_TYPE=int8
+      - BEAM_SIZE=5
+    ports:
+      # Exposes the Wyoming protocol port for Whisper.
+      - "10300:10300"
+    volumes:
+      # Mount a volume to persist Whisper model data.
+      - /path/to/your/whisper_data:/data
+    restart: unless-stopped
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: cdi
+              device_ids: ['nvidia.com/gpu=all']
+              capabilities: ['gpu']
+
+  # --- Piper TTS Service ---
+  # Converts Teto's text responses into speech.
+  wyoming-piper:
+    image: slackr31337/wyoming-piper-gpu:latest
+    container_name: wyoming-piper
+    environment:
+      # Specify which Piper voice model to use.
+      - PIPER_VOICE=en_US-amy-medium
+    ports:
+      # Exposes the Wyoming protocol port for Piper.
+      - "10200:10200"
+    volumes:
+      # Mount a volume to persist Piper voice models.
+      - /path/to/your/piper_data:/data
+    restart: unless-stopped
+    deploy:
+      resources:
+        reservations:
+          devices:
+            - driver: cdi
+              device_ids: ['nvidia.com/gpu=all']
+              capabilities: ['gpu']
+```
+
+### Wyoming Configuration Notes
+- **Multiple Ports**: Note that `Whisper` and `Piper` listen on different ports (`10300` and `10200` in this example). Your bot's configuration will need to point to the correct service and port.
+- **Voice Models**: You can download different `Piper` voice models and place them in your persistent data directory to change Teto's voice.
+- **GPU Usage**: These images are for GPU-accelerated voice processing. If your GPU is dedicated to `vLLM`, you may consider using CPU-based images for Wyoming to conserve VRAM.
+
+## 🌐 Networking
+
+For the services to communicate with each other, they must share a Docker network. Using an external network is a good practice for managing complex applications.
+
+```yaml
+# Add this to the bottom of your docker-compose.yml file
+networks:
+  backend:
+    external: true
+```
+
+Before starting your stack, create the network manually:
+```bash
+docker network create backend
+```
+
+Then, ensure each service in your `docker-compose.yml` (including the `teto_ai` bot) is attached to this network:
+
+```yaml
+services:
+  teto_ai:
+    # ... your bot's configuration
+    networks:
+      - backend
+
+  vllm-openai:
+    # ... vllm configuration
+    networks:
+      - backend
+
+  wyoming-whisper:
+    # ... whisper configuration
+    networks:
+      - backend
+
+  wyoming-piper:
+    # ... piper configuration
+    networks:
+      - backend
+```
+
+This allows the Teto bot to communicate with `vllm-openai`, `wyoming-whisper`, and `wyoming-piper` using their service names as hostnames.
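Once every service is attached to the `backend` network, name resolution can be verified from inside the bot container. An editorial sketch using the service names and container ports from the examples above, assuming `nc` is available in the bot image (the troubleshooting guide makes the same assumption):

```bash
# From the bot container, reach each AI service by its compose service name
docker compose exec teto_ai nc -zv vllm-openai 8000
docker compose exec teto_ai nc -zv wyoming-whisper 10300
docker compose exec teto_ai nc -zv wyoming-piper 10200
```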
docs/setup.md (140 lines changed)
@@ -5,16 +5,22 @@ This guide will walk you through setting up the Discord Teto Bot for video recor
 ## 📋 Prerequisites
 
 ### System Requirements
-- **Operating System**: Linux, macOS, or Windows with WSL2
+- **Operating System**: Linux is strongly recommended for GPU support. Windows with WSL2 is possible.
-- **Docker**: Version 20.10+ and Docker Compose v2+
+- **GPU**: NVIDIA GPU with 8GB+ VRAM is required for local model hosting.
-- **Disk Space**: Minimum 2GB for container, additional space for recordings
+- **Docker**: Version 20.10+ and Docker Compose v2+.
-- **Memory**: 4GB RAM recommended (2GB minimum)
+- **Disk Space**: 20GB+ SSD for models and container images.
-- **Network**: Stable internet connection for Discord API
+- **Memory**: 16GB+ RAM recommended.
+- **Network**: Local network for inter-service communication.
 
 ### Discord Requirements
-- Discord account with user token
+- Discord account with user token.
-- Server permissions to join voice channels
+- Server permissions to join voice channels.
-- Voice channel access where you want to record
+- Voice channel access where you want to record.
 
+### Local AI Requirements
+- **LLM/VLM Model**: A downloaded language model compatible with `vLLM` (e.g., from Hugging Face).
+- **TTS Voice Model**: A downloaded `Piper` voice model.
+- **STT Model**: A downloaded `Whisper` model.
+
 ### Development Prerequisites (Optional)
 - **Node.js**: Version 20+ for local development
@@ -32,14 +38,20 @@ cd discord_teto
 
 ### Step 2: Environment Configuration
 
-Create environment variables for your Discord token:
+Create environment variables for your Discord token and local AI endpoints:
 
 ```bash
 # Method 1: Export in terminal session
 export USER_TOKEN="your_discord_user_token_here"
+export VLLM_ENDPOINT="http://localhost:8000/v1"
+export WYOMING_HOST="localhost"
+export WYOMING_PORT="10300"
 
 # Method 2: Create .env file (recommended)
 echo "USER_TOKEN=your_discord_user_token_here" > .env
+echo "VLLM_ENDPOINT=http://localhost:8000/v1" >> .env
+echo "WYOMING_HOST=localhost" >> .env
+echo "WYOMING_PORT=10300" >> .env
 ```
 
 **Getting Your Discord Token:**
@@ -50,24 +62,38 @@ echo "USER_TOKEN=your_discord_user_token_here" > .env
 5. Look for requests to `discord.com/api`
 6. Find Authorization header starting with your token
 
-⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control.
+⚠️ **Security Warning**: Never share your Discord token publicly or commit it to version control. The bot operates on a user token and has the same permissions as your user.
 
-### Step 3: Directory Setup
+### Step 3: Model & Directory Setup
 
-Create the output directory for recordings:
 
+1. **Create Directories**
+   Create directories for recordings and for your AI models.
 ```bash
-mkdir -p output
+mkdir -p output models/piper models/whisper models/llm
-chmod 755 output
+chmod 755 output models
 ```
+   This `models` directory will be mounted into your AI service containers.
 
-This directory will be mounted into the Docker container to persist recordings.
+2. **Download AI Models**
+   - **Language Model**: Download your chosen GGUF or other `vLLM`-compatible model and place it in `models/llm`.
+   - **Voice Model (Piper)**: Download a `.onnx` and `.json` voice file for Piper and place them in `models/piper`.
+   - **Speech-to-Text Model (Whisper)**: The Whisper service will download its model on first run, or you can pre-download it.
 
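The exact download method is up to you. As one hedged example, assuming the `huggingface_hub` CLI is installed and using placeholder repository names (gated models may additionally require a Hugging Face token, and the Piper file paths shown are indicative only — check the repository for its exact layout), the weights can be pulled straight into the directories created above:

```bash
# Language model for vLLM (replace with your chosen repository)
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir models/llm

# Piper voice: the .onnx file and its matching .json config
huggingface-cli download rhasspy/piper-voices \
  en/en_US/lessac/medium/en_US-lessac-medium.onnx \
  en/en_US/lessac/medium/en_US-lessac-medium.onnx.json \
  --local-dir models/piper
```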
-### Step 4: Docker Container Setup
+This directory will be mounted into the Docker container to persist recordings and provide models to the AI services.
 
+### Step 4: Local AI Stack & Bot Setup
+
+This project uses a multi-container Docker setup for the bot and its local AI services. Your `docker-compose.yml` file should define services for:
+- `teto_ai`: The bot itself.
+- `vllm-openai`: The language model server, providing an OpenAI-compatible endpoint.
+- `wyoming-piper`: The Text-to-Speech (TTS) service.
+- `wyoming-whisper`: The Speech-to-Text (STT) service.
+
+Below are sanitized, production-ready examples for these services. For full configuration details and explanations, please see the [Docker Compose Examples](docker-compose-examples.md) guide.
+
 #### Production Setup
 ```bash
-# Build and start the container
+# Build and start all containers
 docker compose up --build
 
 # Or run in background
|
||||||
|
|
||||||
### Environment Variables
|
### Environment Variables
|
||||||
|
|
||||||
Create a `.env` file in the project root:
|
Create a `.env` file in the project root to configure the bot and its connections to the local AI services:
|
||||||
|
|
||||||
```env
|
```env
|
||||||
# Required
|
# Required: Discord Token
|
||||||
USER_TOKEN=your_discord_user_token
|
USER_TOKEN=your_discord_user_token
|
||||||
|
|
||||||
# Optional
|
# Required: Local AI Service Endpoints
|
||||||
BOT_CLIENT_ID=your_bot_application_id
|
VLLM_ENDPOINT="http://vllm:8000/v1" # Using Docker service name
|
||||||
BOT_CLIENT_SECRET=your_bot_secret
|
VLLM_MODEL="mistralai/Mistral-7B-Instruct-v0.2" # Model served by vLLM
|
||||||
BOT_REDIRECT_URI=https://your-domain.com/auth/callback
|
|
||||||
|
WYOMING_HOST="wyoming" # Using Docker service name
|
||||||
|
WYOMING_PORT="10300"
|
||||||
|
PIPER_VOICE="en_US-lessac-medium" # Voice model for Piper TTS
|
||||||
|
|
||||||
# Recording Settings (optional)
|
# Recording Settings (optional)
|
||||||
RECORDING_TIMEOUT=30000
|
RECORDING_TIMEOUT=30000
|
||||||
|
|
@ -176,17 +205,14 @@ export const VIDEO_CONFIG = {
|
||||||
|
|
||||||
## 🔒 Security Considerations
|
## 🔒 Security Considerations
|
||||||
|
|
||||||
### Token Security
|
### Data Privacy & Security
|
||||||
- Store tokens in environment variables, never in code
|
- **100% Local Processing**: All AI processing, including conversations, voice, and images, happens locally. No data is sent to external third-party services.
|
||||||
- Use `.env` files for local development (add to `.gitignore`)
|
- **Token Security**: Your Discord token should still be kept secure in a `.env` file or Docker secrets. Never commit it to version control.
|
||||||
- Consider using Docker secrets for production deployments
|
- **Network Isolation**: The AI services (`vLLM`, `Wyoming`) can be configured to only be accessible within the Docker network, preventing outside access.
|
||||||
- Rotate tokens regularly
|
|
||||||
|
|
||||||
### Container Security
|
### Container Security
|
||||||
- Bot runs as non-root user inside container
|
- The bot and AI services run as non-root users inside their respective containers.
|
||||||
- Limited system capabilities (only SYS_ADMIN for Discord GUI)
|
- Filesystem access is limited via specific volume mounts for models and output.
|
||||||
- Isolated filesystem with specific volume mounts
|
|
||||||
- No network access beyond Discord API requirements
|
|
||||||
|
|
||||||
### File Permissions
|
### File Permissions
|
||||||
```bash
|
```bash
|
||||||
|
|
@@ -200,6 +226,36 @@ chmod 644 ./output/*.mkv # For recorded files
 
 ## 🐛 Troubleshooting Setup Issues
 
+### Local AI Service Issues
+
+**1. vLLM Container Fails to Start**
+```bash
+# Check vLLM logs for errors
+docker compose logs vllm
+
+# Common issues:
+# - Insufficient GPU VRAM for the selected model.
+# - Incorrect model path or name.
+# - CUDA driver issues on the host machine.
+# - Forgetting to build with --pull to get the latest base image.
+```
+
+**2. Wyoming Service Not Responding**
+```bash
+# Check Wyoming protocol server logs
+docker compose logs wyoming
+
+# Common issues:
+# - Incorrect path to Piper voice models.
+# - Port conflicts on the host (port 10300).
+# - Whisper model download failure on first run.
+```
+
+**3. Teto Bot Can't Connect to AI Services**
+- Verify service names in your `.env` file match the service names in `docker-compose.yml` (e.g., `http://vllm:8000/v1`).
+- Ensure all containers are on the same Docker network.
+- Use `docker compose ps` to see if all containers are running and healthy.
+
 ### Common Installation Problems
 
 **1. Docker not found**
|
||||||
|
|
||||||
### Container Health
|
### Container Health
|
||||||
```bash
|
```bash
|
||||||
# Check container status
|
# Check status of all containers (bot, vllm, wyoming)
|
||||||
docker compose ps
|
docker compose ps
|
||||||
|
|
||||||
# View resource usage
|
# View resource usage for all services
|
||||||
docker stats teto_ai
|
docker stats
|
||||||
|
|
||||||
# Monitor logs in real-time
|
# Monitor logs for a specific service in real-time
|
||||||
docker compose logs -f
|
docker compose logs -f vllm
|
||||||
|
docker compose logs -f wyoming
|
||||||
|
docker compose logs -f teto_ai
|
||||||
|
```
|
||||||
|
|
||||||
|
### GPU Resource Monitoring
|
||||||
|
```bash
|
||||||
|
# Monitor GPU VRAM and utilization on the host machine
|
||||||
|
watch -n 1 nvidia-smi
|
||||||
```
|
```
|
||||||
|
|
||||||
### Recording Status
|
### Recording Status
|
||||||
|
|
|
||||||
|
|
@@ -28,7 +28,86 @@ docker inspect teto_ai | grep -A 5 "Mounts"
 df -h ./output/
 ```
 
-## 🐳 Docker Issues
+## 🤖 Local AI Stack Issues
 
+### vLLM Service Issues
+
+**Problem**: The `vllm` container fails to start, crashes, or doesn't respond to requests.
+
+**Diagnosis**:
+```bash
+# Check the vLLM container logs for CUDA errors, model loading issues, etc.
+docker compose logs vllm
+
+# Check GPU resource usage on the host
+nvidia-smi
+```
+
+**Solutions**:
+
+1. **Insufficient VRAM**:
+   - The most common issue. Check the model's VRAM requirements.
+   - **Solution**: Use a smaller model (e.g., a 7B model requires ~8-10GB VRAM) or upgrade your GPU.
+
+2. **CUDA & Driver Mismatches**:
+   - The `vLLM` container requires a specific CUDA version on the host.
+   - **Solution**: Ensure your NVIDIA drivers are up-to-date and compatible with the CUDA version used in the `vLLM` Docker image.
+
+3. **Incorrect Model Path or Name**:
+   - The container can't find the model weights.
+   - **Solution**: Verify the volume mount in `docker-compose.yml` points to the correct local directory containing your models. Double-check the model name in your `.env` file.
+
+### Wyoming (Piper/Whisper) Service Issues
+
+**Problem**: The `wyoming` container is running, but Teto cannot speak or understand voice commands.
+
+**Diagnosis**:
+```bash
+# Check the Wyoming container logs for errors related to Piper or Whisper
+docker compose logs wyoming
+
+# Test the connection from another container
+docker exec -it teto_ai nc -zv wyoming 10300
+```
+
+**Solutions**:
+
+1. **Incorrect Piper Voice Model Path**:
+   - The service can't find the `.onnx` and `.json` files for the selected voice.
+   - **Solution**: Check your volume mounts and the voice name specified in your configuration.
+
+2. **Whisper Model Download Failure**:
+   - On first run, the service may fail to download the Whisper model.
+   - **Solution**: Ensure the container has internet access for the initial download, or manually place the model in the correct volume.
+
+3. **Port Conflict**:
+   - Another service on your host might be using port `10300`.
+   - **Solution**: Use `netstat -tulpn | grep 10300` to check for conflicts and remap the port in `docker-compose.yml` if needed.
+
+### Bot Can't Connect to Local AI Services
+
+**Problem**: The Teto bot is running but logs errors about being unable to reach `vllm` or `wyoming`.
+
+**Diagnosis**:
+```bash
+# Check the Teto bot logs for connection refused errors
+docker compose logs teto_ai
+
+# Ensure all services are on the same Docker network
+docker network inspect <your_network_name>
+```
+
+**Solutions**:
+
+1. **Incorrect Endpoint Configuration**:
+   - The `.env` file points to the wrong service name or port.
+   - **Solution**: Ensure `VLLM_ENDPOINT` and `WYOMING_HOST` use the correct service names as defined in `docker-compose.yml` (e.g., `vllm`, `wyoming`).
+
+2. **Docker Networking Issues**:
+   - The containers cannot resolve each other's service names.
+   - **Solution**: Ensure all services are defined within the same `docker-compose.yml` and share a common network.
+
+## 🐳 General Docker Issues
+
 ### Container Won't Start