Powered by Ollama local models
XTTS-v2 — Multi-language voice synthesis
Faster-Whisper Large-v3 — Accurate speech recognition
YOLOv11x — Real-time object detection
Florence-2 — Image captioning, OCR, and more
FLUX.1 Dev — via ComfyUI (must be running on port 8188)
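
The port requirement above can be checked before image generation is attempted. The following is a minimal sketch, assuming ComfyUI runs on the same machine (127.0.0.1); the helper name `is_port_open` is hypothetical and not part of ComfyUI or this application. It only confirms that something accepts TCP connections on the port, not that the listener is actually ComfyUI.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8188, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: ComfyUI is expected on localhost at port 8188.
    if is_port_open():
        print("A service is listening on port 8188.")
    else:
        print("Nothing is listening on port 8188; start ComfyUI first.")
```

A plain socket check keeps the sketch free of third-party dependencies; a stricter check would query an HTTP endpoint of the running server.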