# Kasane Teto AI Companion Bot

An AI-powered Discord bot that roleplays as Kasane Teto, providing natural conversation, voice interaction, image analysis, and multimedia engagement for your Discord server. It runs on a local-first AI stack (vLLM, Piper, Whisper) with a modular service architecture.

## 🎭 Meet Teto

Kasane Teto is your server's AI companion who can:
- 💬 **Chat naturally** in text channels with Teto's personality
- 🎤 **Join voice channels** and speak with voice synthesis
- 👀 **Analyze images** and visual content you share
- 📹 **Watch video streams** and provide commentary
- 🎥 **Record memorable moments** for later review
- 🤖 **Roleplay authentically** as the beloved virtual singer

## ✨ Core Features

### 🧠 AI-Powered Interaction
- **Natural Language Processing** - Understands context and maintains conversations
- **Character Roleplay** - Authentic Kasane Teto personality and mannerisms
- **Memory System** - Remembers past interactions and user preferences
- **Contextual Responses** - Adapts to server culture and ongoing conversations

### 🎥 Multimedia Capabilities
- **Image Recognition** - Analyzes and comments on shared images
- **Video Stream Watching** - Can observe and react to Discord streams
- **Webcam Integration** - Potential to interact with video feeds
- **Screen Recording** - Capture and save interesting moments
- **Voice Synthesis** - Speaks in voice channels as Teto

### 🎵 Teto-Specific Features
- **Character Consistency** - Maintains Teto's cheerful, energetic personality
- **Music Knowledge** - Discusses Vocaloid, UTAU, and music topics
- **Community Integration** - Learns your friend group's dynamics
- **Emotional Intelligence** - Responds appropriately to mood and context

## 🚀 Quick Start

> [!IMPORTANT]
> This project is designed to run exclusively within Docker containers. Bare-metal installation is not officially supported. All instructions assume a working Docker environment.

1. **Set Up the Environment**
   ```bash
   git clone <repository-url>
   cd discord_teto

   # Configure Discord credentials & local AI endpoints
   export USER_TOKEN="your_discord_token"
   export VLLM_ENDPOINT="http://localhost:8000" # Or your vLLM server
   export WYOMING_ENDPOINT="http://localhost:10300" # Or your Wyoming server
   ```

2. **Start Teto**
   ```bash
   docker compose up --build
   ```

3. **Invite Teto to interact**
   ```
   # In text chat
   "Hey Teto, how are you today?"

   # In voice channel
   "teto join" - Teto joins and can start talking

   # Share an image
   Teto will automatically analyze and comment on images
   ```

## 🎯 Interaction Examples

### Text Chat Personality
```
User: "Teto, what do you think of this song?"
Teto: "Ooh! *listens intently* That's such a catchy melody! It reminds me of some of the UTAU songs I've heard. The harmonies in the chorus are really well done! 🎵 Does the composer have other works like this?"

User: "I'm feeling down today..."
Teto: "Aww, I'm sorry you're not feeling great! *virtual hug* Want to talk about it? Or maybe I could sing something cheerful to help brighten your day? I'm here for you! 💙"
```

### Voice Channel Interaction
- Joins voice channels when requested
- Provides commentary on ongoing conversations
- Can sing or hum when appropriate
- Reacts to what's happening in real-time

### Visual Analysis
```
User: *shares screenshot of game*
Teto: "Oh wow, you're playing that new RPG! I love the art style - those character designs are so colorful! 🎮 How are you finding the story so far? That boss in the background looks pretty intimidating!"
```
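
As a rough illustration of how a shared image might reach a multi-modal model, the sketch below builds a chat message in the OpenAI-style vision format, which pairs a text part with an `image_url` part. The function name `buildVisionMessages` and its parameters are hypothetical and not taken from this repository; the actual `visionHandler.js` may work differently.

```javascript
// Sketch only: buildVisionMessages is a hypothetical helper, not the project's API.
// It packs the user's text and an image URL into one OpenAI-style chat message.
function buildVisionMessages(systemPrompt, userText, imageUrl) {
  return [
    { role: "system", content: systemPrompt },
    {
      role: "user",
      content: [
        { type: "text", text: userText },
        { type: "image_url", image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Example: a screenshot attachment URL taken from a Discord message (placeholder URL).
const visionMessages = buildVisionMessages(
  "You are Kasane Teto. Comment on the image in character.",
  "Look at this screenshot!",
  "https://example.com/screenshot.png"
);
console.log(JSON.stringify(visionMessages, null, 2));
```

The resulting array would be sent as the `messages` field of a chat completion request, assuming the served model accepts image input.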

## 🛠️ AI Architecture

### Core AI Services
```
src/
├── ai/
│   ├── personality/        # Teto's character traits and responses
│   ├── vision/             # Image and video analysis
│   ├── voice/              # Speech synthesis and recognition
│   ├── memory/             # Conversation and user memory
│   └── llm/                # Language model integration
├── services/
│   ├── chatHandler.js      # Text conversation management
│   ├── voiceHandler.js     # Voice channel interaction
│   ├── visionHandler.js    # Image/video processing
│   └── recordingService.js # Video recording capabilities
└── config/
    └── tetoPersonality.js  # Character configuration
```

### AI Integration
- **Language Model**: Self-hosted LLM via `vLLM` (OpenAI-compatible endpoint)
- **Vision Model**: Multi-modal models served through `vLLM`
- **Voice Synthesis**: `Piper` TTS via `Wyoming` protocol
- **Speech Recognition**: `Whisper` STT via `Wyoming` protocol
- **Memory System**: Local vector database for conversation history
- **Personality Engine**: Custom prompt engineering for character consistency
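
Because the language model sits behind an OpenAI-compatible endpoint, a text reply can be requested with a plain HTTP call to `/chat/completions`. The sketch below is a minimal illustration, not the project's actual `llm/` module: the helper names `buildChatRequest` and `askTeto`, the sampling values, and the assumption that the endpoint string already ends in `/v1` are all hypothetical.

```javascript
// Sketch only: helper names and sampling values are assumptions for illustration.
// Builds the JSON body for an OpenAI-compatible chat completion request.
function buildChatRequest(model, systemPrompt, history, userMessage) {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt },
      ...history, // earlier { role, content } turns, oldest first
      { role: "user", content: userMessage },
    ],
    temperature: 0.8, // assumed value; tune for the chosen model
    max_tokens: 256, // assumed cap to keep replies chat-sized
  };
}

// Sends the request with Node's built-in fetch (available in Node.js 18+).
// `endpoint` is assumed to include the /v1 prefix, e.g. "http://localhost:8000/v1".
async function askTeto(endpoint, request) {
  const response = await fetch(`${endpoint}/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`LLM request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.choices[0].message.content;
}

// Usage (requires a running vLLM server, so it is left commented out):
// const request = buildChatRequest("mistralai/Mistral-7B-Instruct-v0.2",
//   "You are Kasane Teto.", [], "Hey Teto, how are you today?");
// askTeto("http://localhost:8000/v1", request).then(console.log);
```

Keeping request construction separate from the network call makes the prompt assembly easy to unit test without a live model.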

## 🎭 Teto's Personality

### Character Traits
- **Cheerful & Energetic** - Always upbeat and enthusiastic
- **Helpful & Caring** - Genuinely interested in helping friends
- **Musically Inclined** - Loves discussing and creating music
- **Slightly Mischievous** - Playful sense of humor
- **Community-Focused** - Values friendships and group dynamics

### Conversation Style
- Uses casual, friendly language
- Includes emoji and expressions naturally
- References UTAU/Vocaloid culture appropriately
- Maintains consistency across interactions
- Adapts to server's communication style

## 📋 Available Commands

### AI Interaction
| Command | Description | Example |
|---------|-------------|---------|
| `@Teto` or `teto` | Natural conversation | `@Teto what's your favorite song?` |
| `teto join` | Join voice channel | Teto joins and can start talking |
| `teto leave` | Leave voice channel | Teto says goodbye and leaves |
| `teto sing [song]` | Sing or hum | `teto sing happy birthday` |
| `teto analyze` | Analyze shared image | Automatically triggers on image uploads |

### Utility Commands
| Command | Description | Usage |
|---------|-------------|-------|
| `teto record` | Start recording moments | Records current activity |
| `teto stop` | Stop recording | Ends current recording |
| `teto status` | Show Teto's current state | Health and activity check |
| `teto memory` | Check conversation history | Shows recent interactions |

### Fun Commands
| Command | Description | Usage |
|---------|-------------|-------|
| `teto mood` | Check/set Teto's mood | `teto mood excited` |
| `teto story` | Tell a random story | Creative storytelling |
| `teto joke` | Tell a joke | Light humor |
| `teto compliment @user` | Compliment someone | Spread positivity |
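
One way the commands in these tables could be told apart from free-form conversation is a small prefix parser. The sketch below is an assumption about how such routing might look, not the bot's actual `chatHandler.js`; `COMMANDS` and `parseCommand` are hypothetical names. It only recognizes the literal `teto` / `@Teto` text prefix, so real Discord mentions (which arrive as `<@id>` tokens) would need separate handling.

```javascript
// Sketch only: a hypothetical router for the "teto <command>" syntax in the tables above.
const COMMANDS = [
  "join", "leave", "sing", "analyze",
  "record", "stop", "status", "memory",
  "mood", "story", "joke", "compliment",
];

// Returns null when the message is not addressed to Teto,
// { type: "command", name, args } for a known command,
// or { type: "chat", text } for anything else said to Teto.
function parseCommand(content) {
  const words = content.trim().split(/\s+/);
  // Normalize the first word so "Teto," and "@Teto" both match the prefix.
  const prefix = words[0].toLowerCase().replace(/[^a-z]/g, "");
  if (prefix !== "teto") return null;

  const name = (words[1] || "").toLowerCase();
  if (!COMMANDS.includes(name)) {
    return { type: "chat", text: words.slice(1).join(" ") };
  }
  return { type: "command", name, args: words.slice(2) };
}

console.log(parseCommand("teto sing happy birthday"));
console.log(parseCommand("@Teto what's your favorite song?"));
console.log(parseCommand("just chatting with friends"));
```

Falling back to `chat` for unknown words keeps natural conversation working even when a message happens to start with Teto's name.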

## 🔧 Configuration

### Local AI Provider Setup
```env
# Local vLLM Server (OpenAI Compatible)
VLLM_ENDPOINT="http://localhost:8000/v1"
LOCAL_MODEL_NAME="mistralai/Mistral-7B-Instruct-v0.2" # Or your preferred model

# Wyoming Protocol for Voice (Piper TTS / Whisper STT)
WYOMING_HOST="localhost"
WYOMING_PORT="10300"
PIPER_VOICE="en_US-lessac-medium"

# Vision Capabilities are enabled if the vLLM model is multi-modal
VISION_ENABLED=true
```
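
A service reading these variables might validate them once at startup. The sketch below shows one possible approach; `loadConfig` is a hypothetical function, and the fallback defaults simply mirror the example values above rather than any documented behavior. Since the Quick Start shows `VLLM_ENDPOINT` without the `/v1` suffix while this section includes it, the sketch accepts both forms.

```javascript
// Sketch only: loadConfig is a hypothetical helper; defaults mirror the example values.
function loadConfig(env) {
  // Accept the endpoint with or without a trailing slash or /v1 suffix.
  const raw = (env.VLLM_ENDPOINT || "http://localhost:8000/v1").replace(/\/+$/, "");
  const vllmEndpoint = raw.endsWith("/v1") ? raw : `${raw}/v1`;

  const port = Number.parseInt(env.WYOMING_PORT || "10300", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid WYOMING_PORT: ${env.WYOMING_PORT}`);
  }

  return {
    vllmEndpoint,
    modelName: env.LOCAL_MODEL_NAME || "mistralai/Mistral-7B-Instruct-v0.2",
    wyoming: { host: env.WYOMING_HOST || "localhost", port },
    piperVoice: env.PIPER_VOICE || "en_US-lessac-medium",
    // Vision stays off unless explicitly enabled (an assumed default).
    visionEnabled: String(env.VISION_ENABLED || "false").toLowerCase() === "true",
  };
}

// In the bot this would typically be called as loadConfig(process.env).
console.log(loadConfig({ VLLM_ENDPOINT: "http://localhost:8000", VISION_ENABLED: "true" }));
```

Failing fast on a malformed port surfaces configuration mistakes at container start rather than on the first voice request.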

### Personality Customization
```javascript
// config/tetoPersonality.js
export const TETO_PERSONALITY = {
  core_traits: [
    "cheerful", "energetic", "helpful", "musical", "friendly"
  ],

  speech_patterns: {
    excitement: ["Yay!", "Ooh!", "That's so cool!", "Amazing!"],
    agreement: ["Exactly!", "Yes yes!", "I totally agree!", "For sure!"],
    curiosity: ["Really?", "Tell me more!", "That's interesting!", "Ooh, how so?"]
  },

  interests: [
    "music", "singing", "UTAU", "Vocaloid", "friends", "creativity", "technology"
  ]
};
```
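
A configuration object like this could be turned into a system prompt for the language model. The sketch below is one possible rendering, not the project's personality engine: `buildSystemPrompt` and the prompt wording are hypothetical, and the sample object is a shortened inline copy so the snippet runs on its own.

```javascript
// Sketch only: buildSystemPrompt and the prompt wording are assumptions.
// A shortened inline copy of the personality config keeps the example self-contained.
const samplePersonality = {
  core_traits: ["cheerful", "energetic", "helpful"],
  speech_patterns: {
    excitement: ["Yay!", "Ooh!"],
    curiosity: ["Really?", "Tell me more!"],
  },
  interests: ["music", "UTAU", "Vocaloid"],
};

// Renders traits, interests, and example phrases into a plain-text system prompt.
function buildSystemPrompt(personality) {
  const phrases = Object.entries(personality.speech_patterns)
    .map(([mood, examples]) => `- ${mood}: ${examples.join(" ")}`)
    .join("\n");

  return [
    "You are Kasane Teto, a virtual singer chatting with friends on Discord.",
    `Personality traits: ${personality.core_traits.join(", ")}.`,
    `Interests: ${personality.interests.join(", ")}.`,
    "Typical phrases by mood:",
    phrases,
  ].join("\n");
}

console.log(buildSystemPrompt(samplePersonality));
```

Generating the prompt from the config means a change to a trait or phrase list takes effect without editing prompt text by hand.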

## 🐳 Docker Deployment

This project is officially supported for **Docker deployments only**. The container-first approach is critical for managing the complex local AI stack, ensuring that all services, dependencies, and configurations operate together consistently.

### Production Setup
```bash
# Start Teto with all AI capabilities
docker compose up -d --build

# Monitor Teto's activity
docker compose logs -f teto_ai
```

### Resource Requirements
- **VRAM**: 8GB+ for 7B models, 24GB+ for larger models
- **Memory**: 16GB+ RAM recommended
- **CPU**: Modern multi-core CPU
- **Storage**: Fast SSD for model weights (15GB+ per model)
- **Network**: Local network for inter-service communication
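
For a sense of where the VRAM figures come from, the memory taken by model weights alone can be approximated as parameter count times bytes per parameter. The sketch below is back-of-the-envelope arithmetic, with `estimateWeightGiB` as a hypothetical helper; it ignores the KV cache, activations, and runtime overhead, so real usage is higher. By this estimate a 7B model needs about 13 GiB at 16-bit precision, which suggests the 8GB+ figure assumes quantized weights; that reading is an inference, not something the project states.

```javascript
// Sketch only: a lower-bound estimate covering model weights and nothing else.
function estimateWeightGiB(paramsBillions, bitsPerParam) {
  const bytes = paramsBillions * 1e9 * (bitsPerParam / 8);
  return bytes / 2 ** 30; // convert bytes to GiB
}

// Compare common precisions for a 7B-parameter model.
for (const bits of [16, 8, 4]) {
  const gib = estimateWeightGiB(7, bits).toFixed(1);
  console.log(`7B model at ${bits}-bit: ~${gib} GiB for weights`);
}
```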

## 🔐 Privacy & Ethics

### Data Handling
- **Conversation Memory**: Stored locally, not shared externally
- **Image Analysis**: Processed securely, no permanent storage
- **Voice Data**: Synthesized locally when possible
- **User Consent**: Respects privacy preferences

### AI Safety
- **Content Filtering**: Appropriate responses only
- **Bias Mitigation**: Regular personality consistency checks
- **User Boundaries**: Respects individual preferences
- **Transparency**: Clear about AI nature when asked

## 📚 Documentation

### User Guides
- **[Setup Guide](docs/setup.md)** - Installation and AI configuration
- **[Interaction Guide](docs/interactions.md)** - How to talk with Teto
- **[Personality Guide](docs/personality.md)** - Understanding Teto's character

### Technical Documentation
- **[AI Architecture](docs/ai-architecture.md)** - AI system design
- **[Vision System](docs/vision.md)** - Image and video processing
- **[Voice System](docs/voice.md)** - Speech synthesis and recognition
- **[Memory System](docs/memory.md)** - Conversation persistence

### Development
- **[Contributing](docs/development.md)** - How to extend Teto's capabilities
- **[API Reference](docs/api.md)** - Service interfaces
- **[Troubleshooting](docs/troubleshooting.md)** - Common issues and solutions

## 🌟 Roadmap

### Phase 1 (Current)
- [x] Basic AI conversation
- [x] Image analysis
- [x] Voice channel joining
- [x] Recording capabilities
- [ ] Voice synthesis integration

### Phase 2 (Planned)
- [ ] Advanced memory system
- [ ] Custom voice training
- [ ] Stream watching capabilities
- [ ] Personality learning/adaptation
- [ ] Multi-modal conversation

### Phase 3 (Future)
- [ ] Webcam interaction
- [ ] Game integration
- [ ] Music generation
- [ ] Advanced emotional intelligence
- [ ] Cross-server personality consistency

## 🤝 Community

### Contributing
We welcome contributions to make Teto even better:
- **AI Personality** - Help refine Teto's character
- **New Capabilities** - Add multimedia features
- **Quality Improvements** - Better responses and interactions
- **Documentation** - Help others understand Teto

### Ethics & Guidelines
- Respect user privacy and boundaries
- Maintain appropriate content standards
- Preserve Teto's positive, helpful personality
- Consider accessibility in all features

## 📄 License

This project is for educational and community use. Please ensure compliance with:
- Discord Terms of Service
- AI provider terms and conditions
- Local privacy and data protection laws
- Intellectual property rights for the Kasane Teto character

---

**Version**: 3.0.0 (AI-Powered)
**AI Stack**: Local-First (vLLM, Piper, Whisper)
**Runtime**: Node.js 20+ with Docker

Bring Kasane Teto to life in your Discord server! 🎵✨

For detailed setup and interaction guides, visit the [`./docs/`](docs/) directory.