# Troubleshooting Guide
This guide helps you diagnose and resolve common issues with the Discord Teto Bot. Issues are organized by category with step-by-step solutions.
## 🔍 Quick Diagnostics
### Basic Health Check
```bash
# Check if container is running
docker compose ps
# Check recent logs
docker compose logs --tail=50
# Check bot status in Discord
# Send: teto status
```
### File System Check
```bash
# Verify output directory
ls -la ./output/
# Check container volume mount
docker inspect teto_ai | grep -A 5 "Mounts"
# Check disk space
df -h ./output/
```
## 🤖 Local AI Stack Issues
### vLLM Service Issues
**Problem**: The `vllm` container fails to start, crashes, or doesn't respond to requests.
**Diagnosis**:
```bash
# Check the vLLM container logs for CUDA errors, model loading issues, etc.
docker compose logs vllm
# Check GPU resource usage on the host
nvidia-smi
```
**Solutions**:
1. **Insufficient VRAM**:
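- This is the most common cause. vLLM has to fit the model weights plus the KV cache in GPU memory, so compare the model's requirements with the free memory reported by `nvidia-smi`.
- **Solution**: Use a smaller or quantized model (a 7B model needs roughly 14GB for FP16 weights alone; 4-bit quantized variants typically fit in ~8-10GB) or upgrade your GPU.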
2. **CUDA & Driver Mismatches**:
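- The `vLLM` image ships its own CUDA runtime, so the host needs an NVIDIA driver recent enough to support that CUDA version, plus the NVIDIA Container Toolkit for GPU access.
- **Solution**: Update your NVIDIA driver so that it supports the CUDA version used in the `vLLM` Docker image. The header of the `nvidia-smi` output shows the highest CUDA version the installed driver supports.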
3. **Incorrect Model Path or Name**:
- The container can't find the model weights.
- **Solution**: Verify the volume mount in `docker-compose.yml` points to the correct local directory containing your models. Double-check the model name in your `.env` file.
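As a rough cross-check for the VRAM item above, weight memory can be estimated as parameter count times bytes per parameter. The sketch below is only that arithmetic; the helper name `estimate_weights_gib` is hypothetical, and the result excludes the KV cache and runtime overhead that vLLM also needs.

```bash
# Hypothetical helper: estimate the GPU memory needed for model weights alone.
# $1 = parameters in billions, $2 = bits per parameter (16 for FP16, 4 for 4-bit quantization)
estimate_weights_gib() {
  awk -v params="$1" -v bits="$2" \
    'BEGIN { printf "%.1f\n", params * 1e9 * bits / 8 / (1024 ^ 3) }'
}

estimate_weights_gib 7 16   # FP16 weights for a 7B model
estimate_weights_gib 7 4    # 4-bit quantized weights for the same model
```

The two calls print `13.0` and `3.3` (GiB). Treat these as lower bounds and leave several GiB of headroom on top.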
### Wyoming (Piper/Whisper) Service Issues
**Problem**: The `wyoming` container is running, but Teto cannot speak or understand voice commands.
**Diagnosis**:
```bash
# Check the Wyoming container logs for errors related to Piper or Whisper
docker compose logs wyoming
# Test the connection from another container
docker exec -it teto_ai nc -zv wyoming 10300
```
**Solutions**:
1. **Incorrect Piper Voice Model Path**:
- The service can't find the `.onnx` and `.json` files for the selected voice.
- **Solution**: Check your volume mounts and the voice name specified in your configuration.
2. **Whisper Model Download Failure**:
- On first run, the service may fail to download the Whisper model.
- **Solution**: Ensure the container has internet access for the initial download, or manually place the model in the correct volume.
3. **Port Conflict**:
- Another service on your host might be using port `10300`.
- **Solution**: Use `netstat -tulpn | grep 10300` to check for conflicts and remap the port in `docker-compose.yml` if needed.
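A plain `grep 10300` can also match a PID or a longer port number, so it may help to compare the port column exactly. The following is a minimal sketch; the helper name `port_listed` is hypothetical, and it assumes the local address is in the fourth column, as in `ss -tln` and `netstat -tln` output.

```bash
# Hypothetical helper: succeed if a listening socket on port $1 appears in the
# `ss -tln` / `netstat -tln` style output read from stdin.
port_listed() {
  awk -v port="$1" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }
  '
}
```

For example, `ss -tln | port_listed 10300 && echo "port 10300 is already in use"`.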
### Bot Can't Connect to Local AI Services
**Problem**: The Teto bot is running but logs errors about being unable to reach `vllm` or `wyoming`.
**Diagnosis**:
```bash
# Check the Teto bot logs for connection refused errors
docker compose logs teto_ai
# Ensure all services are on the same Docker network
docker network inspect <your_network_name>
```
**Solutions**:
1. **Incorrect Endpoint Configuration**:
- The `.env` file points to the wrong service name or port.
- **Solution**: Ensure `VLLM_ENDPOINT` and `WYOMING_HOST` use the correct service names as defined in `docker-compose.yml` (e.g., `vllm`, `wyoming`).
2. **Docker Networking Issues**:
- The containers cannot resolve each other's service names.
- **Solution**: Ensure all services are defined within the same `docker-compose.yml` and share a common network.
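One way to catch a mistyped endpoint is to compare its host name with the service names Compose knows about. This is a sketch under assumptions: both helper names are hypothetical, and `VLLM_ENDPOINT` is assumed to hold a URL such as `http://vllm:8000/v1` (the port and path are illustrative).

```bash
# Hypothetical helper: extract the host name from an endpoint URL or host:port string.
endpoint_host() {
  rest="${1#*://}"                  # drop the scheme, if any
  hostport="${rest%%/*}"            # drop the path
  printf '%s\n' "${hostport%%:*}"   # drop the port
}

# Hypothetical helper: succeed if $1 exactly matches a line of the service list on stdin.
is_compose_service() {
  grep -Fqx -- "$1"
}
```

Usage: `docker compose config --services | is_compose_service "$(endpoint_host "$VLLM_ENDPOINT")" || echo "VLLM_ENDPOINT does not name a Compose service"`.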
## 🐳 General Docker Issues
### Container Won't Start
**Problem**: Container fails to start or exits immediately.
**Diagnosis**:
```bash
# Check container logs
docker compose logs teto_ai
# Check if ports are in use
netstat -tulpn | grep -E '(5901|3000)'
# Verify environment variables
docker compose config
```
**Solutions**:
1. **Missing Environment Variables**:
```bash
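# Ensure USER_TOKEN is set (in the shell or in .env) without printing the token
[ -n "$USER_TOKEN" ] || grep -q '^USER_TOKEN=.' .env 2>/dev/null && echo "USER_TOKEN is set"
# Create .env file if missing (an existing file is left untouched)
[ -f .env ] || echo "USER_TOKEN=your_token_here" > .env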
```
2. **Port Conflicts**:
```bash
# Stop processes using required ports (plain kill first; use kill -9 only if they ignore it)
sudo lsof -ti:5901 | xargs sudo kill
sudo lsof -ti:3000 | xargs sudo kill
# Or modify docker-compose.yml to use different ports
```
3. **Insufficient Resources**:
```bash
# Check available memory
free -h
# Check disk space
df -h
# Increase Docker memory limits if needed
```
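The environment-variable check above can be made repeatable with a small helper that tests whether a key has a non-empty value in an env file. This is a sketch only; `env_has_key` is a hypothetical name, and it assumes simple `KEY=value` lines.

```bash
# Hypothetical helper: succeed if variable $2 has a non-empty value in env file $1.
env_has_key() {
  grep -Eq "^[[:space:]]*(export[[:space:]]+)?$2=.+" "$1"
}
```

Usage: `env_has_key .env USER_TOKEN || echo "USER_TOKEN is missing or empty in .env"`.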
### Container Crashes During Runtime
**Problem**: Container starts but crashes during operation.
**Common Causes**:
- Out of memory during video processing
- Discord API rate limiting
- Audio/video dependency issues
- Volume mount problems
**Solutions**:
1. **Memory Issues**:
```bash
# Monitor container memory usage
docker stats teto_ai
# Increase container memory in docker-compose.yml:
#   services:
#     teto_ai:
#       mem_limit: 2g
#       memswap_limit: 2g
```
2. **Audio/Video Dependencies**:
```bash
# Rebuild container with fresh dependencies
docker compose down
docker compose build --no-cache
docker compose up
```
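To turn the memory check above into a yes/no signal, the percentage reported by `docker stats` can be compared with a threshold. The sketch below assumes the `{{.MemPerc}}` format placeholder of `docker stats`; the helper name `mem_over` and the 80% threshold are illustrative.

```bash
# Hypothetical helper: succeed if the percentage string $1 (e.g. "85.30%")
# is greater than the threshold $2 (in percent).
mem_over() {
  awk -v pct="${1%\%}" -v limit="$2" 'BEGIN { exit !(pct + 0 > limit + 0) }'
}
```

Usage: `mem_over "$(docker stats --no-stream --format '{{.MemPerc}}' teto_ai)" 80 && echo "teto_ai is close to its memory limit"`.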
### Volume Mount Issues
**Problem**: Recordings not appearing in `./output/` directory.
**Diagnosis**:
```bash
# Check if volume is mounted
docker inspect teto_ai | grep -A 10 "Mounts"
# Check container can write to volume
docker exec -it teto_ai touch /tmp/output/test_file
ls -la ./output/test_file
```
**Solutions**:
1. **Permission Issues**:
```bash
# Fix ownership
sudo chown -R $(id -u):$(id -g) ./output/
# Set proper permissions
chmod 755 ./output/
```
2. **Volume Not Mounted**:
```bash
# Verify docker-compose.yml has the correct volume mapping:
#   volumes:
#     - ./output:/tmp/output
# Recreate container
docker compose down
docker compose up --build
```
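Grepping the raw `docker inspect` JSON works, but a format template gives output that is easier to check. The sketch below assumes the mounts are printed as `SOURCE -> DESTINATION` lines; the helper name `mount_source_for` is hypothetical.

```bash
# Hypothetical helper: print the host path mounted at container path $1.
# stdin: one "SOURCE -> DESTINATION" line per mount; fails if $1 is not mounted.
mount_source_for() {
  awk -F' -> ' -v dest="$1" '
    $2 == dest { print $1; found = 1 }
    END { exit !found }
  '
}
```

Usage: `docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' teto_ai | mount_source_for /tmp/output`. A non-zero exit status means nothing is mounted at `/tmp/output`.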
## 🎥 Recording Issues
### Bot Won't Join Voice Channel
**Problem**: `xbox record that` fails with voice connection errors.
**Error Messages**:
- "Failed to join voice channel"
- "You need to be in a voice channel"
- No response from bot
**Solutions**:
1. **User Not in Voice Channel**:
- Ensure you're connected to a voice channel before sending the command
- Try leaving and rejoining the voice channel
- Check if voice channel has user limits
2. **Permission Issues**:
```bash
# Verify bot has required permissions:
# - Connect to voice channels
# - Use voice activity
# - View channels
```
3. **Discord API Issues**:
```bash
# Check Discord status
curl -s https://discordstatus.com/api/v2/status.json
# Restart container to refresh connections
docker compose restart teto_ai
```
4. **Voice Dependencies**:
```bash
# Check voice libraries in container
docker exec -it teto_ai npm list | grep -E "(opus|sodium)"
# Verify FFmpeg installation
docker exec -it teto_ai which ffmpeg
```
### No Audio/Video Captured
**Problem**: Recording starts but produces empty or no files.
**Diagnosis**:
```bash
# Check if file was created
ls -la ./output/recording-*.mkv
# Check file size
du -h ./output/recording-*.mkv
# Check container logs during recording
docker compose logs -f teto_ai
```
**Solutions**:
1. **User Has No Video/Audio**:
- Ensure the recorded user has their camera on
- Verify user is speaking/has audio activity
- Check Discord's "Camera" and "Go Live" permissions
2. **FFmpeg Issues**:
```bash
# Test FFmpeg in container
docker exec -it teto_ai ffmpeg -version
# Check codec support
docker exec -it teto_ai ffmpeg -codecs | grep -E "(opus|h264|vp8)"
```
3. **Stream Connection Problems**:
```bash
# Look for stream errors in logs
docker compose logs teto_ai | grep -i "stream\|ffmpeg\|error"
# Restart and try again
docker compose restart teto_ai
```
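Empty or truncated recordings can be listed directly instead of reading `du` output by eye. This is a minimal sketch: the helper name `small_recordings` and the size threshold are hypothetical, and the `recording-*.mkv` pattern follows the file names used in the diagnosis above.

```bash
# Hypothetical helper: list recordings in directory $1 that are smaller than $2 bytes.
small_recordings() {
  find "$1" -type f -name 'recording-*.mkv' -size "-${2}c"
}
```

Usage: `small_recordings ./output 1024` lists recordings under 1 KiB, which are unlikely to contain usable audio or video.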
### Recording Stops Immediately
**Problem**: Recording starts but stops within seconds.
**Common Causes**:
- Target user leaves voice channel
- Voice connection drops
- Stream connection fails
- Container resource limits
**Solutions**:
1. **User Connectivity**:
- Ensure target user stays in voice channel
- Check user's internet connection stability
- Verify user hasn't muted/disabled video
2. **Container Resources**:
```bash
# Check resource usage during recording
docker stats teto_ai
# Increase limits if needed
# In docker-compose.yml:
#   mem_limit: 2g
```
## 🔌 Discord API Issues
### Bot Not Responding
**Problem**: Bot doesn't respond to any commands.
**Diagnosis**:
```bash
# Check if bot is online in Discord
# Check container logs
docker compose logs --tail=100 teto_ai
# Check Discord connection
docker compose logs teto_ai | grep -i "discord\|login\|ready"
```
**Solutions**:
1. **Invalid Token**:
```bash
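# Confirm a token is set (prints only a prefix; compare it with your real token)
echo "$USER_TOKEN" | cut -c1-20
# Update the token if needed, then recreate the container.
# A plain restart does not pick up changed environment variables.
export USER_TOKEN="new_token_here"
docker compose up -d --force-recreate teto_ai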
```
2. **Rate Limiting**:
```bash
# Check for rate limit messages in logs
docker compose logs teto_ai | grep -i "rate\|limit"
# Wait and restart if rate limited
sleep 300
docker compose restart teto_ai
```
3. **Discord Outage**:
```bash
# Check Discord status
curl -s https://discordstatus.com/api/v2/status.json | jq '.status.description'
```
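Instead of a fixed `sleep 300`, a check can be retried with a growing delay. The sketch below shows generic exponential backoff; the function name `retry_with_backoff` and the example delays are hypothetical, and matching on "ready" in the logs is an assumption about what the bot prints.

```bash
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds after the
# first failure and doubling the delay after each further failure.
retry_with_backoff() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$max_attempts" ] && return 1
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

Usage: `retry_with_backoff 4 60 sh -c 'docker compose logs --tail=100 teto_ai | grep -qi ready'` checks up to four times, waiting 60, 120, and 240 seconds between attempts.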
### Commands Not Working
**Problem**: Bot responds but specific commands don't work.
**Diagnosis**:
```bash
# Check command handler logs
docker compose logs teto_ai | grep -i "command\|execute\|error"
# Test each command individually
# 1. hello teto
# 2. teto status
# 3. xbox record that (in voice channel)
```
**Solutions**:
1. **Case Sensitivity**:
- Commands should be case-insensitive; if one is ignored, retry it in lowercase exactly as documented
- Ensure no extra spaces or characters
2. **Permission Context**:
- `teto` (DM pickup) only works in Direct Messages
- Recording commands require voice channel membership
- Some commands may require specific server permissions
3. **Command Conflicts**:
```bash
# Check if multiple bots are responding
# Ensure command syntax is exact
# Verify bot has message read permissions
```
## 🔧 Development Issues
### Local Development Setup
**Problem**: Issues running bot locally without Docker.
**Solutions**:
1. **Node.js Version**:
```bash
# Ensure Node.js 20+
node --version
# Install correct version with nvm
nvm install 20
nvm use 20
```
2. **Native Dependencies**:
```bash
# Install build tools (Ubuntu/Debian)
sudo apt install build-essential python3-dev
# Install build tools (macOS)
xcode-select --install
# Rebuild native modules
npm rebuild
```
3. **Audio Dependencies**:
```bash
# Install system audio libraries
# Ubuntu/Debian:
sudo apt install libopus-dev libsodium-dev
# macOS:
brew install opus libsodium
```
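The Node.js version requirement above can be checked in a script rather than by eye. This is a small sketch; `node_at_least` is a hypothetical helper that compares only the major version.

```bash
# Hypothetical helper: succeed if version string $1 (e.g. "v20.11.0")
# has a major version of at least $2.
node_at_least() {
  major="${1#v}"          # drop the leading "v"
  major="${major%%.*}"    # keep only the major component
  [ "$major" -ge "$2" ]
}
```

Usage: `node_at_least "$(node --version)" 20 || echo "Node.js 20 or newer is required"`.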
### Module Import Errors
**Problem**: ES6 import/export errors.
**Solutions**:
1. **Package.json Configuration**:
```json
{
  "type": "module",
  "engines": {
    "node": ">=20.0.0"
  }
}
```
2. **File Extensions**:
```javascript
// Always use .js extension in imports
import videoService from './services/videoRecording.js';
```
## 📊 Performance Issues
### High CPU Usage
**Problem**: Container uses excessive CPU during recording.
**Solutions**:
1. **FFmpeg Optimization**:
```javascript
// In videoConfig.js, optimize encoding settings
FFMPEG_OPTIONS: {
  video: {
    preset: "ultrafast", // Faster encoding
    crf: 28, // Lower quality = less CPU
  }
}
```
2. **Resource Limits**:
```yaml
# In docker-compose.yml
services:
  teto_ai:
    cpus: '2.0'
    mem_limit: 2g
```
### Large File Sizes
**Problem**: Recording files are too large.
**Solutions**:
1. **Encoding Settings**:
```javascript
// Adjust quality settings in videoConfig.js
FFMPEG_OPTIONS: {
  video: {
    crf: 28, // Higher = smaller files
    preset: "medium", // Better compression
  },
  audio: {
    bitrate: "96k" // Lower bitrate
  }
}
```
2. **Recording Duration**:
```javascript
// Reduce auto-stop timeout
AUTO_STOP_TIMEOUT: 15_000 // 15 seconds instead of 30
```
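For the audio track, the effect of bitrate and duration on size is simple arithmetic: size is roughly bitrate times duration divided by 8. The sketch below assumes a constant bitrate; the helper name `stream_size_kb` is hypothetical, and CRF-encoded video has a variable bitrate, so it cannot be predicted this way.

```bash
# Hypothetical helper: approximate size in kilobytes of a constant-bitrate stream.
# $1 = bitrate in kbit/s, $2 = duration in seconds
stream_size_kb() {
  echo $(( $1 * $2 / 8 ))
}

stream_size_kb 96 30   # 96k audio over a 30-second recording
stream_size_kb 96 15   # the same audio over 15 seconds
```

The two calls print `360` and `180`, so halving the duration halves the audio size, as expected.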
## 🆘 Emergency Recovery
### Complete System Reset
If all else fails, perform a complete reset:
```bash
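# 1. Stop all containers
docker compose down
# 2. Remove unused containers and images (WARNING: affects every project on this host)
docker system prune -a
# 3. Remove unused volumes (WARNING: deletes any data stored in them)
docker volume prune
# 4. Reset repository (WARNING: discards local changes; git clean also deletes
#    untracked files, including ./output unless it is gitignored, so back it up first)
git reset --hard HEAD
git clean -fd
# 5. Rebuild from scratch
docker compose up --build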
```
### Backup Recovery
```bash
# Backup current recordings before reset
cp -r ./output ./output_backup_$(date +%Y%m%d_%H%M%S)
# Restore after reset
cp -r ./output_backup_*/* ./output/
```
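Before a destructive reset it can be worth confirming that the backup actually contains every file. The following sketch wraps the copy in a hypothetical `backup_dir` helper that compares file counts and prints the backup path.

```bash
# Hypothetical helper: copy directory $1 to a timestamped sibling, verify that
# the file counts match, and print the backup path.
backup_dir() {
  src="${1%/}"
  dest="${src}_backup_$(date +%Y%m%d_%H%M%S)"
  cp -r "$src" "$dest" || return 1
  # Unquoted on purpose so that any padding in the wc output is stripped
  [ $(find "$src" -type f | wc -l) -eq $(find "$dest" -type f | wc -l) ] || return 1
  printf '%s\n' "$dest"
}
```

Usage: `backup=$(backup_dir ./output) && echo "Recordings saved to $backup"`. Restoring from "$backup" afterwards avoids the wildcard matching more than one backup directory.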
## 📞 Getting Further Help
### Information to Gather
When seeking help, provide:
1. **System Information**:
```bash
docker --version
docker compose version
uname -a
```
2. **Container Logs**:
```bash
docker compose logs --tail=200 teto_ai > bot_logs.txt
```
3. **Configuration**:
```bash
docker compose config > compose_config.yml
```
4. **Error Context**:
- Exact command that failed
- Error messages received
- Steps leading to the issue
- Expected vs actual behavior
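Because `docker compose config` prints resolved environment values, the captured files may contain your `USER_TOKEN`. A minimal sketch for masking it before sharing is shown below; the helper name `redact_secrets` is hypothetical, and it only handles `USER_TOKEN=value` and `USER_TOKEN: value` lines.

```bash
# Hypothetical helper: mask the USER_TOKEN value in text read from stdin.
redact_secrets() {
  sed -E 's/(USER_TOKEN[[:space:]]*[:=][[:space:]]*).*/\1[REDACTED]/'
}
```

Usage: `docker compose config | redact_secrets > compose_config.yml`. Skim the result for any other secrets before posting it.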
### Debug Mode
Enable verbose logging:
```bash
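# Set debug environment variable (quoted so the shell does not expand the globs)
export DEBUG='discord*,bot*'
# Recreate the container so it picks up the variable
# (assumes docker-compose.yml passes DEBUG through to the service)
docker compose up -d --force-recreate teto_ai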
# Monitor detailed logs
docker compose logs -f teto_ai
```
## ✅ Verification Checklist
After resolving issues, verify everything works:
- [ ] Container starts without errors
- [ ] Bot appears online in Discord
- [ ] `hello teto` responds correctly
- [ ] `teto status` shows system information
- [ ] Can join voice channel with `xbox record that`
- [ ] Recording files appear in `./output/`
- [ ] Can stop recording manually
- [ ] DM pickup works (`teto` in DMs)
- [ ] Logs show no persistent errors
---
If problems persist after following this guide, consider reviewing the [architecture documentation](architecture.md) for deeper system understanding or checking the [setup guide](setup.md) for potential configuration issues.