Switching the Text-to-Speech (TTS) Provider

GhostBrain uses OpenAI's tts-1 model by default. If you want more realistic voices (for example, ElevenLabs) or very low-latency models (for example, Cartesia), you can swap the provider in three steps. This guide uses ElevenLabs as the example.

Step 1: Update Dependencies

Install the necessary Pipecat extra. For ElevenLabs:

hatch run pip install "pipecat-ai[elevenlabs]"

Step 2: Add the API Key

In .env:

GHOST_BRAIN_ELEVENLABS_API_KEY=your_key_here
GHOST_BRAIN_ELEVENLABS_VOICE_ID=your_voice_id_here

In src/ghost_brain/config.py:

class Settings(BaseSettings):
    # ...
    elevenlabs_api_key: str = Field(default="")
    elevenlabs_voice_id: str = Field(default="21m00Tcm4TlvDq8ikWAM")  # Example voice ID
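
Because elevenlabs_api_key defaults to an empty string, a missing key does not fail at startup. The following is a minimal, standard-library-only sketch of a fail-fast check. It assumes the GHOST_BRAIN_ prefix shown in .env; check_elevenlabs_env is a hypothetical helper for illustration, not part of GhostBrain or Pipecat.

```python
import os
from typing import Mapping, Optional

# Hypothetical helper: the names below mirror the .env entries above.
PREFIX = "GHOST_BRAIN_"
DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"  # example voice from config.py


def check_elevenlabs_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the ElevenLabs settings found in env; raise if the API key is missing."""
    env = os.environ if env is None else env
    api_key = env.get(PREFIX + "ELEVENLABS_API_KEY", "").strip()
    # Fall back to the example voice when no voice ID is configured.
    voice_id = env.get(PREFIX + "ELEVENLABS_VOICE_ID", "").strip() or DEFAULT_VOICE_ID
    if not api_key:
        raise ValueError(f"{PREFIX}ELEVENLABS_API_KEY is not set")
    return {"elevenlabs_api_key": api_key, "elevenlabs_voice_id": voice_id}
```

Calling check_elevenlabs_env() once at startup surfaces a missing key immediately, before the first synthesis request is sent.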

Step 3: Modify the TTS Factory

Open src/ghost_brain/services/tts.py. Swap OpenAITTSService for ElevenLabsTTSService.

# from pipecat.services.openai import OpenAITTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from ghost_brain.config import Settings

def create_tts(settings: Settings) -> ElevenLabsTTSService:
    """
    Create ElevenLabs TTS service.
    """
    return ElevenLabsTTSService(
        api_key=settings.elevenlabs_api_key,
        voice_id=settings.elevenlabs_voice_id,
        # ElevenLabs needs the output format to name the exact sample rate.
        # It must match the sample rate your pipeline is configured for.
        output_format="pcm_16000",
    )
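
To keep the format string and the pipeline sample rate from drifting apart, the string can be derived from a single value. This is a minimal sketch: elevenlabs_pcm_format is a hypothetical helper, and the set of rates is an assumption (a conservative subset of raw-PCM formats), so check it against the ElevenLabs documentation for your plan.

```python
# Assumed subset of raw-PCM sample rates; verify against the ElevenLabs docs.
SUPPORTED_PCM_RATES = (16000, 22050, 24000, 44100)


def elevenlabs_pcm_format(pipeline_sample_rate: int) -> str:
    """Build the 'pcm_<rate>' format string for a given pipeline sample rate."""
    if pipeline_sample_rate not in SUPPORTED_PCM_RATES:
        raise ValueError(
            f"Unsupported sample rate {pipeline_sample_rate}; "
            f"expected one of {SUPPORTED_PCM_RATES}"
        )
    return f"pcm_{pipeline_sample_rate}"
```

With this in place, create_tts could pass output_format=elevenlabs_pcm_format(rate), where rate is whatever sample rate your pipeline is configured to use.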

The rest of the Pipecat pipeline needs no changes: it consumes the audio frames generated by ElevenLabs and streams them back to the user through Twilio or your local audio output.