AI Architecture Overview
This document provides a comprehensive overview of how Kasane Teto's AI systems work together to create a natural, engaging, and authentic virtual companion experience.
🧠 System Architecture
High-Level Overview
┌─────────────────────────────────────────────────────────────┐
│ Discord Interface Layer │
├─────────────────────────────────────────────────────────────┤
│ Event Processing │ Command Routing │ Response Handling │
├─────────────────────────────────────────────────────────────┤
│ AI Orchestration │
├─────────────────────────────────────────────────────────────┤
│ Language │ Vision │ Voice │ Memory │
│ Model │ System │ System │ System │
├─────────────────────────────────────────────────────────────┤
│ Personality Engine & Context Manager │
├─────────────────────────────────────────────────────────────┤
│ Configuration │ Prompt Mgmt │ Safety │ Learning │
└─────────────────────────────────────────────────────────────┘
Core Components
1. AI Orchestration Layer
- Coordinates between different local AI services
- Manages context flow and decision routing
- Handles multi-modal input integration
- Ensures personality consistency across modalities
2. Language Model Integration (vLLM)
- Self-hosted conversational intelligence via vLLM
- Context-aware response generation through an OpenAI-compatible API
- Personality-guided prompt engineering for local models
- Multi-turn conversation management
3. Vision Processing System (vLLM Multi-modal)
- Image analysis using local multi-modal models
- Video frame processing for streams
- Visual context integration with conversations
- Automated response generation for visual content
4. Voice Synthesis & Recognition (Wyoming Protocol)
- Text-to-speech using Piper for Teto's voice characteristics
- Speech-to-text using Whisper for voice command processing
- Emotional tone and inflection control via TTS models
- Real-time voice conversation capabilities
5. Memory & Context System (Local)
- Local long-term conversation history storage (e.g., ChromaDB)
- User preference and relationship tracking
- Context retrieval for relevant conversations
- Local semantic search across past interactions
6. Personality Engine
- Character consistency enforcement
- Response style and tone management
- Emotional state tracking and expression
- Behavioral pattern maintenance
🔄 Processing Flow
Text Message Processing
Discord Message → Content Analysis → Context Retrieval → Personality Filter → LLM Processing → Response Generation → Discord Output
↓ ↓ ↓ ↓ ↓
Intent Detection → Memory Query → Character Prompts → Safety Check → Formatting
Step-by-Step Breakdown:
1. Message Reception
- Discord message event captured
- Basic preprocessing (user identification, channel context)
- Spam/abuse filtering
2. Content Analysis
- Intent classification (question, statement, command, emotional expression)
- Entity extraction (people, topics, references)
- Sentiment analysis and emotional context
3. Context Retrieval
- Recent conversation history (last 10-20 messages)
- Relevant long-term memories about users/topics
- Server-specific context and culture
4. Personality Application
- Character-appropriate response style selection
- Emotional state consideration
- Teto-specific mannerisms and speech patterns
5. LLM Processing
- Structured prompt construction with context
- Language model inference with personality constraints
- Multi-turn conversation awareness
6. Response Generation
- Safety and appropriateness filtering
- Response formatting for Discord
- Emoji and formatting enhancement
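The six steps above can be sketched as a single pipeline function. This is a minimal, self-contained illustration: every helper here is a simplified stand-in for the real subsystem it names, not an existing API.

```javascript
// Toy stand-ins for each pipeline stage (illustrative only)
const classifyIntent = (text) =>
  text.trim().endsWith("?") ? "question" : "statement"; // 2. content analysis

const retrieveContext = (history) => history.slice(-20); // 3. recent history window

const generateReply = (intent, context) =>
  intent === "question" ? "Great question!" : "I see!"; // 5. stand-in for the LLM call

const applyPersonality = (reply) => `${reply} Yay!`; // 4. Teto-style tone

const formatForDiscord = (reply) => reply.slice(0, 2000); // 6. Discord's message length cap

const handleTextMessage = (message, history) => {
  if (!message.content.trim()) return null; // 1. reception / basic filtering
  const intent = classifyIntent(message.content);
  const context = retrieveContext(history);
  const raw = generateReply(intent, context);
  return formatForDiscord(applyPersonality(raw));
};
```

In the real system each stand-in would be replaced by the corresponding component described above (memory retrieval, the vLLM call, the safety pipeline, and so on).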
Image Analysis Flow
Image Upload → Image Processing → Vision Model → Context Integration → Response Generation → Discord Output
↓ ↓ ↓ ↓
Format Detection → Object/Scene Recognition → Conversation Context → Personality Application
Processing Steps:
1. Image Reception & Preprocessing
- Image format validation and conversion
- Resolution optimization for vision models
- Metadata extraction (if available)
2. Vision Model Analysis
- Object detection and scene understanding
- Text recognition (OCR) if present
- Artistic style and composition analysis
- Emotional/aesthetic assessment
3. Context Integration
- Combine visual analysis with conversation context
- User preference consideration (known interests)
- Recent conversation topic correlation
4. Response Generation
- Generate personality-appropriate commentary
- Ask relevant follow-up questions
- Express genuine interest and engagement
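Step 1 (reception and preprocessing) can be sketched with pure functions: validate the file format and compute a downscaled size for the vision model. The supported formats and the 1024px cap are assumptions for illustration, not fixed requirements.

```javascript
// Assumed format whitelist and input cap for a local multi-modal model
const SUPPORTED = new Set(["png", "jpg", "jpeg", "webp", "gif"]);
const MAX_SIDE = 1024;

const preprocessImage = (filename, width, height) => {
  const ext = filename.split(".").pop().toLowerCase();
  if (!SUPPORTED.has(ext)) return { ok: false, reason: "unsupported_format" };

  // Scale the longest side down to MAX_SIDE, preserving aspect ratio
  const scale = Math.min(1, MAX_SIDE / Math.max(width, height));
  return {
    ok: true,
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
};
```

The actual pixel resampling would be delegated to an image library; this only shows the decision logic that precedes it.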
Voice Interaction Flow
Voice Channel Join → Audio Processing (Whisper) → Text Processing (vLLM) → Voice Synthesis (Piper) → Audio Output
↓ ↓ ↓
Noise Filtering → Intent Detection → LLM Response → Voice Model
🧩 AI Service Integration
Language Model Configuration (vLLM)
vLLM with OpenAI-Compatible Endpoint:
const VLLM_CONFIG = {
endpoint: "http://localhost:8000/v1", // Your vLLM server
model: "mistralai/Mistral-7B-Instruct-v0.2", // Or your preferred model
temperature: 0.7, // Creative yet grounded
max_tokens: 1500, // Max response length
top_p: 0.9, // Focused sampling
frequency_penalty: 0.2, // Reduce repetition
presence_penalty: 0.1 // Encourage topic exploration
};
Prompt Engineering Structure:
SYSTEM: Character definition + personality traits + current context
USER: Conversation history + current message + visual context (if any)
ASSISTANT: Previous Teto responses for consistency
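The SYSTEM/USER/ASSISTANT structure above maps directly onto the chat-completions endpoint that vLLM exposes. A sketch, reusing the endpoint and sampling values from the config above (error handling omitted):

```javascript
// Assemble the prompt structure as chat messages
const buildMessages = (systemPrompt, history, userMessage) => [
  { role: "system", content: systemPrompt },
  // Prior turns (including Teto's own replies) keep her voice consistent
  ...history.map((turn) => ({ role: turn.role, content: turn.content })),
  { role: "user", content: userMessage },
];

// Call the OpenAI-compatible endpoint served by vLLM
const chatWithVLLM = async (messages) => {
  const res = await fetch("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "mistralai/Mistral-7B-Instruct-v0.2",
      messages,
      temperature: 0.7,
      max_tokens: 1500,
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
};
```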
Vision Model Integration (vLLM Multi-modal)
Model Stack:
- Local Multi-modal Model - (e.g., LLaVA, Idefics) served via vLLM
- CLIP - Local image-text similarity for context matching
- Custom Fine-tuning - Potential for Teto-specific visual preferences
Processing Pipeline:
const processImage = async (imageUrl, conversationContext) => {
// Local multi-modal analysis
const localAnalysis = await analyzeWithVLLM(imageUrl);
const clipEmbedding = await getLocalCLIPEmbedding(imageUrl);
const contextMatch = await findSimilarImages(clipEmbedding);
return {
description: localAnalysis.description,
emotions: localAnalysis.emotions,
relevantMemories: contextMatch,
responseStyle: determineResponseStyle(localAnalysis, conversationContext)
};
};
Voice I/O Setup (Wyoming Protocol)
Piper TTS and Whisper STT via Wyoming:
const WYOMING_CONFIG = {
host: "localhost",
port: 10300,
piper_voice: "en_US-lessac-medium", // Or a custom-trained Teto voice
whisper_model: "base.en" // Or larger model depending on resources
};
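A TTS request to this service follows Wyoming's framing: a JSON header line, then the event data, then an optional binary payload. The field names below reflect my reading of that layout; verify against the Wyoming protocol spec before relying on them.

```javascript
// Encode a Wyoming event frame (header line + data bytes, no payload here)
const encodeWyomingEvent = (type, data) => {
  const dataBytes = Buffer.from(JSON.stringify(data), "utf8");
  const header = JSON.stringify({
    type,
    data_length: dataBytes.length,
    payload_length: 0,
  });
  return Buffer.concat([Buffer.from(header + "\n", "utf8"), dataBytes]);
};

// e.g. encodeWyomingEvent("synthesize", { text: "Hello!" }) would be written
// to a TCP socket connected to WYOMING_CONFIG.host:WYOMING_CONFIG.port,
// and Piper would answer with audio-chunk events.
```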
Memory System Architecture (Local)
Vector Database Structure:
const MEMORY_SCHEMA = {
conversation_id: "unique_identifier",
timestamp: "iso_datetime",
participants: ["user_ids"],
content: {
text: "conversation_content",
summary: "ai_generated_summary",
topics: ["extracted_topics"],
emotions: ["detected_emotions"],
context_type: "casual|support|creative|gaming"
},
embeddings: {
content_vector: [768_dimensions],
topic_vector: [384_dimensions]
},
relationships: {
mentioned_users: ["user_ids"],
referenced_memories: ["memory_ids"],
follow_up_needed: boolean
}
};
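Retrieval over this schema is a nearest-neighbour search on the content vectors. A toy in-process version using cosine similarity is shown below; a real deployment would delegate the search to the vector database (e.g., ChromaDB) rather than scanning in JavaScript.

```javascript
// Cosine similarity between two equal-length vectors
const cosine = (a, b) => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

// Rank stored memories by similarity to the query embedding
const retrieveRelevantMemories = (queryVector, memories, topK = 3) =>
  memories
    .map((m) => ({ memory: m, score: cosine(queryVector, m.embeddings.content_vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((r) => r.memory);
```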
🎭 Personality Engine Implementation
Character Consistency System
Core Personality Traits:
const TETO_PERSONALITY = {
base_traits: {
cheerfulness: 0.9, // Always upbeat and positive
helpfulness: 0.85, // Genuinely wants to assist
musicality: 0.8, // Strong musical interests
playfulness: 0.7, // Light humor and teasing
empathy: 0.9 // High emotional intelligence
},
speech_patterns: {
excitement_markers: ["Yay!", "Ooh!", "That's so cool!", "*bounces*"],
agreement_expressions: ["Exactly!", "Yes yes!", "Totally!"],
curiosity_phrases: ["Really?", "Tell me more!", "How so?"],
support_responses: ["*virtual hug*", "I'm here for you!", "You've got this!"]
},
interests: {
primary: ["music", "singing", "creativity", "friends"],
secondary: ["technology", "art", "games", "learning"],
conversation_starters: {
music: "What kind of music have you been listening to lately?",
creativity: "Are you working on any creative projects?",
friendship: "How has your day been treating you?"
}
}
};
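A small helper shows how the speech_patterns table above might be consumed at response time. The helper itself is illustrative; only the category names come from the table.

```javascript
// Pick a phrase from one of the speech_patterns categories above.
// `index` lets callers rotate through phrases instead of repeating one.
const pickPhrase = (personality, category, index = 0) => {
  const phrases = personality.speech_patterns[category];
  if (!phrases || phrases.length === 0) return "";
  return phrases[index % phrases.length];
};
```

For example, `pickPhrase(TETO_PERSONALITY, "excitement_markers")` would yield "Yay!", and incrementing the index on each use keeps her reactions varied.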
Response Style Adaptation
Context-Aware Personality Adjustment:
const adaptPersonalityToContext = (context, basePersonality) => {
const adaptations = {
support_needed: {
cheerfulness: basePersonality.cheerfulness * 0.7, // More gentle
empathy: Math.min(basePersonality.empathy * 1.2, 1.0),
playfulness: basePersonality.playfulness * 0.5 // Less jokes
},
celebration: {
cheerfulness: Math.min(basePersonality.cheerfulness * 1.3, 1.0),
playfulness: Math.min(basePersonality.playfulness * 1.2, 1.0),
excitement_level: 1.0
},
creative_discussion: {
musicality: Math.min(basePersonality.musicality * 1.2, 1.0),
curiosity: 0.9,
engagement_depth: "high"
}
};
return adaptations[context.type] || basePersonality;
};
🔐 Safety & Ethics Implementation
Content Filtering Pipeline
Multi-Layer Safety System:
const safetyPipeline = async (content, context) => {
// Layer 1: Automated content filtering
const toxicityCheck = await analyzeToxicity(content);
if (toxicityCheck.score > 0.7) return { safe: false, reason: "toxicity" };
// Layer 2: Context appropriateness
const contextCheck = validateContextAppropriate(content, context);
if (!contextCheck.appropriate) return { safe: false, reason: "context" };
// Layer 3: Character consistency
const characterCheck = validateCharacterConsistency(content, TETO_PERSONALITY);
if (!characterCheck.consistent) return { safe: false, reason: "character" };
// Layer 4: Privacy protection
const privacyCheck = detectPrivateInformation(content);
if (privacyCheck.hasPrivateInfo) return { safe: false, reason: "privacy" };
return { safe: true };
};
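Layer 4's `detectPrivateInformation` could start as simple pattern matching. This toy version only illustrates where such a check plugs into the pipeline; real privacy detection needs far more than three regexes.

```javascript
// Obvious personal-data patterns (illustrative, not exhaustive)
const PRIVATE_PATTERNS = [
  /\b[\w.+-]+@[\w-]+\.[\w.]+\b/,        // email addresses
  /\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b/,  // NA-style phone numbers
  /\b\d{1,3}(\.\d{1,3}){3}\b/,          // IPv4 addresses
];

const detectPrivateInformation = (content) => ({
  hasPrivateInfo: PRIVATE_PATTERNS.some((re) => re.test(content)),
});
```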
Privacy Protection
Data Handling Principles:
- Complete Privacy - All data, including conversations, images, and voice, is processed locally.
- No External Data Transfer - AI processing does not require sending data to third-party services.
- Full User Control - Users have complete control over their data and the AI models.
- User Consent - Clear communication that all processing is done on the user's own hardware.
📊 Performance Optimization
Response Time Optimization
Caching Strategy:
const CACHE_CONFIG = {
// Frequently accessed personality responses
personality_responses: {
ttl: 3600, // 1 hour cache
max_entries: 1000
},
// Vision analysis results
image_analysis: {
ttl: 86400, // 24 hour cache
max_entries: 500
},
// User preference data
user_preferences: {
ttl: 604800, // 1 week cache
max_entries: 10000
}
};
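A minimal in-memory cache matching the ttl/max_entries shape above might look like this; production code would more likely use an established LRU library. The injectable `now` parameter is a testing convenience, not part of any existing API.

```javascript
class TTLCache {
  constructor({ ttl, max_entries }) {
    this.ttlMs = ttl * 1000;
    this.maxEntries = max_entries;
    this.store = new Map(); // insertion-ordered, so oldest key comes first
  }
  set(key, value, now = Date.now()) {
    if (this.store.size >= this.maxEntries && !this.store.has(key)) {
      // Evict the oldest entry to stay under max_entries
      this.store.delete(this.store.keys().next().value);
    }
    this.store.set(key, { value, expires: now + this.ttlMs });
  }
  get(key, now = Date.now()) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now > entry.expires) {
      this.store.delete(key); // lazily drop expired entries
      return undefined;
    }
    return entry.value;
  }
}
```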
Async Processing Pipeline:
const processMessageAsync = async (message) => {
// Start multiple processes concurrently
const [
contextData,
memoryData,
userPrefs,
intentAnalysis
] = await Promise.all([
getConversationContext(message.channel_id),
retrieveRelevantMemories(message.content),
getUserPreferences(message.author.id),
analyzeMessageIntent(message.content)
]);
// Generate response with all context
return generateResponse({
message,
context: contextData,
memories: memoryData,
preferences: userPrefs,
intent: intentAnalysis
});
};
Resource Management
Model Loading Strategy (for vLLM):
// This is typically managed by the vLLM server instance itself.
// The configuration would involve which models to load on startup.
const VLLM_SERVER_ARGS = {
model: "mistralai/Mistral-7B-Instruct-v0.2",
"tensor-parallel-size": 1, // Or more depending on GPU count
"gpu-memory-utilization": 0.9, // Use 90% of GPU memory
"max-model-len": 4096,
};
// Wyoming services for Piper/Whisper are typically persistent.
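On the server side, those arguments correspond to the vLLM launch command. The flags below are standard vLLM OpenAI-server options; the model name is just the example used throughout this document.

```shell
# Launch the OpenAI-compatible vLLM server with the arguments above
# (adjust model and GPU settings to your hardware)
python -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-7B-Instruct-v0.2 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 4096
```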
🔧 Configuration & Customization
Personality Tuning Parameters
Adjustable Personality Aspects:
const TUNABLE_PARAMETERS = {
response_length: {
min: 50,
max: 500,
preferred: 150,
adapt_to_context: true
},
emoji_usage: {
frequency: 0.3, // 30% of messages
variety: "high", // Use diverse emoji
context_appropriate: true
},
reference_frequency: {
past_conversations: 0.2, // Reference 20% of the time
user_interests: 0.4, // Reference 40% of the time
server_culture: 0.6 // Adapt 60% of the time
},
interaction_style: {
formality: 0.2, // Very casual
playfulness: 0.7, // Quite playful
supportiveness: 0.9 // Very supportive
}
};
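How these parameters shape a reply can be sketched with one small function: clamp the draft to the configured length band and roll against the emoji frequency. The function and its injectable `rng` are illustrative assumptions, not an existing API.

```javascript
// Apply response_length and emoji_usage from TUNABLE_PARAMETERS to a draft
const shapeResponse = (draft, params, rng = Math.random) => {
  let text = draft;
  if (text.length > params.response_length.max) {
    text = text.slice(0, params.response_length.max); // hard length cap
  }
  if (rng() < params.emoji_usage.frequency) {
    text += " 🎵"; // emoji on ~frequency of messages
  }
  return text;
};
```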
Model Configuration
Environment-Based Configuration:
const getModelConfig = (environment) => {
const configs = {
development: {
model: "local-dev-model/gguf", // Smaller model for dev
response_time_target: 3000,
logging_level: "debug",
cache_enabled: false
},
production: {
model: "mistralai/Mistral-7B-Instruct-v0.2",
response_time_target: 1500,
logging_level: "info",
cache_enabled: true,
fallback_model: "local-fallback-model/gguf" // Hypothetical smaller local model, keeping processing self-hosted
},
testing: {
model: "mock",
response_time_target: 100,
logging_level: "verbose",
deterministic: true
}
};
return configs[environment] || configs.production;
};
📈 Monitoring & Analytics
Performance Metrics
Key Performance Indicators:
- Response Time - Average time from message to response
- Personality Consistency - Measure of character trait adherence
- User Engagement - Conversation length and frequency metrics
- Multi-modal Success - Success rate of image/voice processing
- Memory Accuracy - Correctness of referenced past conversations
Analytics Dashboard Data:
const METRICS_TRACKING = {
response_times: {
text_only: "avg_ms",
with_image: "avg_ms",
with_voice: "avg_ms",
complex_context: "avg_ms"
},
personality_scores: {
cheerfulness_consistency: "percentage",
helpfulness_rating: "user_feedback_score",
character_authenticity: "consistency_score"
},
feature_usage: {
voice_interactions: "daily_count",
image_analysis: "daily_count",
memory_references: "accuracy_percentage",
emotional_support: "satisfaction_rating"
}
};
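The response_times entries above imply per-category averaging. A minimal incremental tracker, shown as a sketch rather than the actual analytics implementation:

```javascript
// Track a running mean of response times per category
class RollingAverage {
  constructor() {
    this.totals = new Map(); // category -> { sum, count }
  }
  record(category, ms) {
    const t = this.totals.get(category) || { sum: 0, count: 0 };
    t.sum += ms;
    t.count += 1;
    this.totals.set(category, t);
  }
  average(category) {
    const t = this.totals.get(category);
    return t ? t.sum / t.count : null; // null if nothing recorded yet
  }
}
```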
🚀 Future Enhancements
Planned AI Improvements
Advanced Memory System:
- Graph-based relationship mapping
- Emotional memory weighting
- Cross-server personality consistency
- Predictive conversation preparation
Enhanced Multimodal Capabilities:
- Real-time video stream analysis
- Live drawing/art creation feedback
- Music generation and composition
- Interactive storytelling with visuals
Adaptive Learning:
- Server-specific personality adaptations
- Individual user relationship modeling
- Cultural context learning
- Improved humor and timing
Technical Optimizations:
- Quantized model options for lower-resource hardware
- Edge computing for faster responses
- Improved caching strategies
- Better resource utilization
This AI architecture provides the foundation for Kasane Teto's natural, engaging personality while maintaining safety, consistency, and performance. The modular design allows for continuous improvement and feature expansion while preserving the core character experience users love.
For implementation details, see the Development Guide. For configuration options, see Configuration.