# Kasane Teto AI Companion Bot
An AI-powered Discord bot that roleplays as Kasane Teto, providing natural conversation, voice interaction, image analysis, and multimedia engagement for your Discord server. It runs on a self-hosted, local-first AI stack (vLLM, Piper, Whisper) with a modular service architecture.
## 🎭 Meet Teto
Kasane Teto is your server's AI companion who can:
- 💬 **Chat naturally** in text channels with Teto's personality
- 🎤 **Join voice channels** and speak with voice synthesis
- 👀 **Analyze images** and visual content you share
- 📹 **Watch video streams** and provide commentary
- 🎥 **Record memorable moments** for later review
- 🤖 **Roleplay authentically** as the beloved virtual singer
## ✨ Core Features
### 🧠 AI-Powered Interaction
- **Natural Language Processing** - Understands context and maintains conversations
- **Character Roleplay** - Authentic Kasane Teto personality and mannerisms
- **Memory System** - Remembers past interactions and user preferences
- **Contextual Responses** - Adapts to server culture and ongoing conversations
### 🎥 Multimedia Capabilities
- **Image Recognition** - Analyzes and comments on shared images
- **Video Stream Watching** - Observes and reacts to Discord streams (planned; see Roadmap)
- **Webcam Integration** - Potential to interact with video feeds
- **Screen Recording** - Capture and save interesting moments
- **Voice Synthesis** - Speaks in voice channels as Teto (integration in progress; see Roadmap)
### 🎵 Teto-Specific Features
- **Character Consistency** - Maintains Teto's cheerful, energetic personality
- **Music Knowledge** - Discusses Vocaloid, UTAU, and music topics
- **Community Integration** - Learns your friend group's dynamics
- **Emotional Intelligence** - Responds appropriately to mood and context
## 🚀 Quick Start
> [!IMPORTANT]
> This project is designed to run exclusively within Docker containers. Bare-metal installation is not officially supported. All instructions assume a working Docker environment.
1. **Setup Environment**
```bash
git clone <repository-url>
cd discord_teto
# Configure Discord credentials & local AI endpoints
# (variable names match the Configuration section below)
export USER_TOKEN="your_discord_token"
export VLLM_ENDPOINT="http://localhost:8000/v1" # Or your vLLM server
export WYOMING_HOST="localhost" # Or your Wyoming server
export WYOMING_PORT="10300"
```
2. **Start Teto**
```bash
docker compose up --build
```
3. **Invite Teto to interact**
```
# In text chat
"Hey Teto, how are you today?"
# In voice channel
"teto join" - Teto joins and can start talking
# Share an image
Teto will automatically analyze and comment on images
```
## 🎯 Interaction Examples
### Text Chat Personality
```
User: "Teto, what do you think of this song?"
Teto: "Ooh! *listens intently* That's such a catchy melody! It reminds me of some of the UTAU songs I've heard. The harmonies in the chorus are really well done! 🎵 Does the composer have other works like this?"
User: "I'm feeling down today..."
Teto: "Aww, I'm sorry you're not feeling great! *virtual hug* Want to talk about it? Or maybe I could sing something cheerful to help brighten your day? I'm here for you! 💙"
```
### Voice Channel Interaction
- Joins voice channels when requested
- Provides commentary on ongoing conversations
- Can sing or hum when appropriate
- Reacts to what's happening in real-time
### Visual Analysis
```
User: *shares screenshot of game*
Teto: "Oh wow, you're playing that new RPG! I love the art style - those character designs are so colorful! 🎮 How are you finding the story so far? That boss in the background looks pretty intimidating!"
```
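
When the model served by vLLM is multi-modal, an image can be sent through the same OpenAI-compatible chat endpoint as text. The following is a minimal sketch of how a shared image might be packaged into a chat message. The helper name `buildVisionMessage`, the fallback prompt, and the example URL are illustrative assumptions, not the project's actual code.

```javascript
// Hypothetical helper: wraps an image URL (for example, a Discord attachment
// URL) and the user's text into one OpenAI-style multi-modal chat message.
function buildVisionMessage(userText, imageUrl) {
  // Only accept http(s) URLs so local file paths are never forwarded.
  if (!/^https?:\/\//.test(imageUrl)) {
    throw new Error(`Unsupported image URL: ${imageUrl}`);
  }
  return {
    role: "user",
    content: [
      // Fall back to a generic prompt when the image is posted without text.
      { type: "text", text: userText || "What do you see in this image?" },
      { type: "image_url", image_url: { url: imageUrl } },
    ],
  };
}

const message = buildVisionMessage(
  "Check out my new game!",
  "https://example.com/screenshot.png"
);
console.log(JSON.stringify(message, null, 2));
```

The resulting object can be appended to the `messages` array of a chat completion request in place of a plain text message.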
## 🛠️ AI Architecture
### Core AI Services
```
src/
├── ai/
│   ├── personality/        # Teto's character traits and responses
│   ├── vision/             # Image and video analysis
│   ├── voice/              # Speech synthesis and recognition
│   ├── memory/             # Conversation and user memory
│   └── llm/                # Language model integration
├── services/
│   ├── chatHandler.js      # Text conversation management
│   ├── voiceHandler.js     # Voice channel interaction
│   ├── visionHandler.js    # Image/video processing
│   └── recordingService.js # Video recording capabilities
└── config/
    └── tetoPersonality.js  # Character configuration
```
### AI Integration
- **Language Model**: Self-hosted LLM via `vLLM` (OpenAI compatible endpoint)
- **Vision Model**: Multi-modal models served through `vLLM`
- **Voice Synthesis**: `Piper` TTS via `Wyoming` protocol
- **Speech Recognition**: `Whisper` STT via `Wyoming` protocol
- **Memory System**: Local vector database for conversation history
- **Personality Engine**: Custom prompt engineering for character consistency
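
Because vLLM exposes an OpenAI-compatible API, a conversation turn boils down to one POST to `/chat/completions`. The sketch below shows one way this could look. The function names `buildChatRequest` and `askTeto`, the sampling values, and the history limit are assumptions for illustration, and the endpoint is assumed to already end in `/v1`.

```javascript
// Hypothetical helper: assembles the JSON body for an OpenAI-compatible
// /chat/completions call. Parameter names and defaults are illustrative.
function buildChatRequest({ model, systemPrompt, history = [], userMessage, maxHistory = 10 }) {
  // Keep only the most recent turns so the prompt stays within context limits.
  const recent = maxHistory > 0 ? history.slice(-maxHistory) : [];
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt },
      ...recent,
      { role: "user", content: userMessage },
    ],
    temperature: 0.8, // assumed value
    max_tokens: 256,  // assumed value
  };
}

// Sends the request to the vLLM server using the fetch API built into Node.js 18+.
async function askTeto(endpoint, request) {
  const response = await fetch(`${endpoint}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`vLLM request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

const request = buildChatRequest({
  model: "mistralai/Mistral-7B-Instruct-v0.2",
  systemPrompt: "You are Kasane Teto, a cheerful virtual singer.",
  history: [
    { role: "user", content: "Hi Teto!" },
    { role: "assistant", content: "Yay, hello!" },
  ],
  userMessage: "What's your favorite song?",
});
console.log(request.messages.length); // 4
```

With a running server, `askTeto("http://localhost:8000/v1", request)` would resolve to the reply text.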
## 🎭 Teto's Personality
### Character Traits
- **Cheerful & Energetic** - Always upbeat and enthusiastic
- **Helpful & Caring** - Genuinely interested in helping friends
- **Musically Inclined** - Loves discussing and creating music
- **Slightly Mischievous** - Playful sense of humor
- **Community-Focused** - Values friendships and group dynamics
### Conversation Style
- Uses casual, friendly language
- Includes emoji and expressions naturally
- References UTAU/Vocaloid culture appropriately
- Maintains consistency across interactions
- Adapts to server's communication style
## 📋 Available Commands
### AI Interaction
| Command | Description | Example |
|---------|-------------|---------|
| `@Teto` or `teto` | Natural conversation | `@Teto what's your favorite song?` |
| `teto join` | Join voice channel | Teto joins and can start talking |
| `teto leave` | Leave voice channel | Teto says goodbye and leaves |
| `teto sing [song]` | Sing or hum | `teto sing happy birthday` |
| `teto analyze` | Analyze shared image | Automatically triggers on image uploads |
### Utility Commands
| Command | Description | Usage |
|---------|-------------|-------|
| `teto record` | Start recording moments | Records current activity |
| `teto stop` | Stop recording | Ends current recording |
| `teto status` | Show Teto's current state | Health and activity check |
| `teto memory` | Check conversation history | Shows recent interactions |
### Fun Commands
| Command | Description | Usage |
|---------|-------------|-------|
| `teto mood` | Check/set Teto's mood | `teto mood excited` |
| `teto story` | Tell a random story | Creative storytelling |
| `teto joke` | Tell a joke | Light humor |
| `teto compliment @user` | Compliment someone | Spread positivity |
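
The command tables above suggest a simple dispatch rule: a message that starts with `teto` or `@Teto` is either a known command or free-form conversation. A minimal sketch of such a parser follows. The function name `parseCommand` and the returned shape are hypothetical, and a real Discord mention arrives as `<@user_id>` in the message content, which this sketch does not handle.

```javascript
// Command names taken from the tables above.
const COMMANDS = [
  "join", "leave", "sing", "analyze",
  "record", "stop", "status", "memory",
  "mood", "story", "joke", "compliment",
];

// Hypothetical parser: returns null when the message is not addressed to Teto,
// a command descriptor for known commands, or a chat descriptor otherwise.
function parseCommand(content) {
  // Accept "teto ..." or "@Teto ...", case-insensitive, with an optional comma.
  const match = content.trim().match(/^@?teto\b[\s,]*([\s\S]*)$/i);
  if (!match) return null;

  const rest = match[1].trim();
  const [first = "", ...args] = rest.split(/\s+/);
  const name = first.toLowerCase();

  if (COMMANDS.includes(name)) {
    return { type: "command", name, args: args.join(" ") };
  }
  // Anything else is treated as natural conversation.
  return { type: "chat", text: rest };
}

console.log(parseCommand("teto sing happy birthday"));
// { type: 'command', name: 'sing', args: 'happy birthday' }
console.log(parseCommand("@Teto what's your favorite song?"));
// { type: 'chat', text: "what's your favorite song?" }
console.log(parseCommand("hello everyone"));
// null
```

This sketch only recognizes the prefix form, so a message such as "Hey Teto, how are you today?" would need additional matching logic.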
## 🔧 Configuration
### Local AI Provider Setup
```env
# Local vLLM Server (OpenAI Compatible)
VLLM_ENDPOINT="http://localhost:8000/v1"
LOCAL_MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2" # Or your preferred model
# Wyoming Protocol for Voice (Piper TTS / Whisper STT)
WYOMING_HOST="localhost"
WYOMING_PORT="10300"
PIPER_VOICE="en_US-lessac-medium"
# Vision Capabilities are enabled if the vLLM model is multi-modal
VISION_ENABLED=true
```
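
As an illustration of how these variables might be consumed at startup, here is a minimal sketch of a config loader. The function name `loadConfig`, the returned shape, and the fallback defaults (copied from the example values above) are assumptions, not the project's actual implementation.

```javascript
// Hypothetical loader: reads the variables shown above from an env-like object
// and fails fast on missing or invalid values.
function loadConfig(env) {
  if (!env.VLLM_ENDPOINT) {
    throw new Error("VLLM_ENDPOINT is required");
  }

  const port = Number.parseInt(env.WYOMING_PORT ?? "10300", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid WYOMING_PORT: ${env.WYOMING_PORT}`);
  }

  return {
    // Strip trailing slashes so request paths can be appended safely.
    vllmEndpoint: env.VLLM_ENDPOINT.replace(/\/+$/, ""),
    modelName: env.LOCAL_MODEL_NAME ?? "mistralai/Mistral-7B-Instruct-v0.2",
    wyoming: { host: env.WYOMING_HOST ?? "localhost", port },
    piperVoice: env.PIPER_VOICE ?? "en_US-lessac-medium",
    // Environment values are strings, so compare explicitly.
    visionEnabled: (env.VISION_ENABLED ?? "false").toLowerCase() === "true",
  };
}

const config = loadConfig({
  VLLM_ENDPOINT: "http://localhost:8000/v1/",
  WYOMING_PORT: "10300",
  VISION_ENABLED: "true",
});
console.log(config);
```

In the bot itself, the call would typically be `loadConfig(process.env)`, which makes a misconfigured container fail at startup rather than on the first request.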
### Personality Customization
```javascript
// config/tetoPersonality.js
export const TETO_PERSONALITY = {
  core_traits: [
    "cheerful", "energetic", "helpful", "musical", "friendly"
  ],
  speech_patterns: {
    excitement: ["Yay!", "Ooh!", "That's so cool!", "Amazing!"],
    agreement: ["Exactly!", "Yes yes!", "I totally agree!", "For sure!"],
    curiosity: ["Really?", "Tell me more!", "That's interesting!", "Ooh, how so?"]
  },
  interests: [
    "music", "singing", "UTAU", "Vocaloid", "friends", "creativity", "technology"
  ]
};
```
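
One plausible way to use this configuration is to render it into the system prompt sent to the language model. The sketch below assumes that approach. The function name `buildSystemPrompt` and the prompt wording are illustrative, and the object is a shortened local copy so the example runs on its own.

```javascript
// Shortened local copy of the personality config, for a self-contained example.
const personality = {
  core_traits: ["cheerful", "energetic", "helpful", "musical", "friendly"],
  speech_patterns: {
    excitement: ["Yay!", "Ooh!"],
    agreement: ["Exactly!", "For sure!"],
  },
  interests: ["music", "singing", "UTAU", "Vocaloid"],
};

// Hypothetical helper: turns the personality object into a system prompt.
function buildSystemPrompt(p) {
  // One line per mood, listing the example phrases for that mood.
  const phrases = Object.entries(p.speech_patterns)
    .map(([mood, examples]) => `- ${mood}: ${examples.join(" / ")}`)
    .join("\n");

  return [
    "You are Kasane Teto, a virtual singer chatting with friends on Discord.",
    `Personality: ${p.core_traits.join(", ")}.`,
    `Favorite topics: ${p.interests.join(", ")}.`,
    "Typical phrases by mood:",
    phrases,
    "Stay in character, but be clear that you are an AI if asked.",
  ].join("\n");
}

const systemPrompt = buildSystemPrompt(personality);
console.log(systemPrompt);
```

Keeping the traits in data rather than in a hand-written prompt means a change to `tetoPersonality.js` flows into every conversation without editing prompt text.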
## 🐳 Docker Deployment
This project is officially supported for **Docker deployments only**. The container-first approach is critical for managing the complex local AI stack, ensuring that all services, dependencies, and configurations operate together consistently.
### Production Setup
```bash
# Start Teto with all AI capabilities
docker compose up -d --build
# Monitor Teto's activity
docker compose logs -f teto_ai
```
### Resource Requirements
- **VRAM**: 8GB+ for 7B models with quantized weights (unquantized FP16 weights for a 7B model take about 14GB on their own), 24GB+ for larger models
- **Memory**: 16GB+ RAM recommended
- **CPU**: Modern multi-core CPU
- **Storage**: Fast SSD for model weights (15GB+ per model)
- **Network**: Local network for inter-service communication
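
The VRAM figures follow from simple arithmetic on model weights: parameter count times bytes per parameter. The sketch below is a rough estimate under that assumption. The function name is hypothetical, and the result covers weights only, not the KV cache or activations, so real usage is higher.

```javascript
// Rough weight-memory estimate in GB (10^9 bytes):
// parameters (in billions) x bits per parameter / 8 bits per byte.
function estimateWeightGB(paramsBillions, bitsPerParam) {
  return (paramsBillions * bitsPerParam) / 8;
}

console.log(estimateWeightGB(7, 16)); // 14  (FP16)
console.log(estimateWeightGB(7, 8));  // 7   (8-bit quantization)
console.log(estimateWeightGB(7, 4));  // 3.5 (4-bit quantization)
```

By this estimate, a 7B model fits in 8GB of VRAM only with quantized weights, while the roughly 14GB FP16 figure lines up with the 15GB+ per-model storage guideline.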
## 🔐 Privacy & Ethics
### Data Handling
- **Conversation Memory**: Stored locally, not shared externally
- **Image Analysis**: Processed securely, no permanent storage
- **Voice Data**: Synthesized locally when possible
- **User Consent**: Respects privacy preferences
### AI Safety
- **Content Filtering**: Appropriate responses only
- **Bias Mitigation**: Regular personality consistency checks
- **User Boundaries**: Respects individual preferences
- **Transparency**: Clear about AI nature when asked
## 📚 Documentation
### User Guides
- **[Setup Guide](docs/setup.md)** - Installation and AI configuration
- **[Interaction Guide](docs/interactions.md)** - How to talk with Teto
- **[Personality Guide](docs/personality.md)** - Understanding Teto's character
### Technical Documentation
- **[AI Architecture](docs/ai-architecture.md)** - AI system design
- **[Vision System](docs/vision.md)** - Image and video processing
- **[Voice System](docs/voice.md)** - Speech synthesis and recognition
- **[Memory System](docs/memory.md)** - Conversation persistence
### Development
- **[Contributing](docs/development.md)** - How to extend Teto's capabilities
- **[API Reference](docs/api.md)** - Service interfaces
- **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions
## 🌟 Roadmap
### Phase 1 (Current)
- [x] Basic AI conversation
- [x] Image analysis
- [x] Voice channel joining
- [x] Recording capabilities
- [ ] Voice synthesis integration
### Phase 2 (Planned)
- [ ] Advanced memory system
- [ ] Custom voice training
- [ ] Stream watching capabilities
- [ ] Personality learning/adaptation
- [ ] Multi-modal conversation
### Phase 3 (Future)
- [ ] Webcam interaction
- [ ] Game integration
- [ ] Music generation
- [ ] Advanced emotional intelligence
- [ ] Cross-server personality consistency
## 🤝 Community
### Contributing
We welcome contributions to make Teto even better:
- **AI Personality** - Help refine Teto's character
- **New Capabilities** - Add multimedia features
- **Quality Improvements** - Better responses and interactions
- **Documentation** - Help others understand Teto
### Ethics & Guidelines
- Respect user privacy and boundaries
- Maintain appropriate content standards
- Preserve Teto's positive, helpful personality
- Consider accessibility in all features
## 📄 License
This project is for educational and community use. Please ensure compliance with:
- Discord Terms of Service
- AI provider terms and conditions
- Local privacy and data protection laws
- Intellectual property rights for Kasane Teto character
---
- **Version**: 3.0.0 (AI-Powered)
- **AI Stack**: Local-First (vLLM, Piper, Whisper)
- **Runtime**: Node.js 20+ with Docker

Bring Kasane Teto to life in your Discord server! 🎵✨

For detailed setup and interaction guides, visit the [`./docs/`](docs/) directory.