
Switching the Text-to-Speech (TTS)

GhostBrain uses OpenAI's tts-1 model by default. If you want hyper-realistic voices (such as ElevenLabs) or very low-latency models (such as Cartesia), you can swap the provider in three steps.

Step 1: Update Dependencies

Install the necessary Pipecat extra. For ElevenLabs:

hatch run pip install "pipecat-ai[elevenlabs]"
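
Before moving on, it can help to confirm that the extra is importable from the same environment. The snippet below is a minimal sketch, not part of GhostBrain or Pipecat: `can_import` is a hypothetical helper, and it assumes that importing `pipecat.services.elevenlabs` (the module path used in Step 3) fails when the extra's dependencies are missing. It catches a broad `Exception` because some packages raise a generic error rather than `ImportError` in that case.

```python
import importlib


def can_import(name: str) -> bool:
    """Return True if the module `name` imports without raising."""
    try:
        importlib.import_module(name)
    except Exception:
        # Covers ImportError as well as generic errors that a package may
        # raise when one of its optional dependencies is not installed.
        return False
    return True


if __name__ == "__main__":
    # Module path taken from the import used in Step 3.
    module = "pipecat.services.elevenlabs"
    status = "available" if can_import(module) else "missing"
    print(f"{module}: {status}")
```

Run it with the same interpreter that Hatch uses for the project, otherwise the result reflects a different environment.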

Step 2: Add the API Key

  1. In .env:

    GHOST_BRAIN_ELEVENLABS_API_KEY=your_key_here
    GHOST_BRAIN_ELEVENLABS_VOICE_ID=your_voice_id_here
    

  2. In src/ghost_brain/config.py:

    class Settings(BaseSettings):
        # ...
        elevenlabs_api_key: str = Field(default="")
        elevenlabs_voice_id: str = Field(default="21m00Tcm4TlvDq8ikWAM") # Example voice
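
Because `elevenlabs_api_key` defaults to an empty string, a missing key does not raise an error when the settings load. The following sketch shows one way to fail early instead. `require_elevenlabs_key` is a hypothetical helper, not existing GhostBrain code, and it only assumes that the settings object exposes the `elevenlabs_api_key` attribute defined above.

```python
from types import SimpleNamespace


def require_elevenlabs_key(settings) -> str:
    """Return the stripped ElevenLabs API key, or raise if it is empty."""
    key = (settings.elevenlabs_api_key or "").strip()
    if not key:
        raise ValueError(
            "elevenlabs_api_key is empty; "
            "set GHOST_BRAIN_ELEVENLABS_API_KEY in .env"
        )
    return key


if __name__ == "__main__":
    # SimpleNamespace stands in for the real Settings object (demo only).
    demo = SimpleNamespace(elevenlabs_api_key="your_key_here")
    print(require_elevenlabs_key(demo))
```

Calling a check like this at startup, before the pipeline is built, surfaces a configuration mistake as a clear error rather than a failed request at the first utterance.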
    

Step 3: Modify the TTS Factory

Open src/ghost_brain/services/tts.py. Swap OpenAITTSService for ElevenLabsTTSService.

# from pipecat.services.openai import OpenAITTSService
from pipecat.services.elevenlabs import ElevenLabsTTSService
from ghost_brain.config import Settings

def create_tts(settings: Settings) -> ElevenLabsTTSService:
    """
    Create ElevenLabs TTS service.
    """
    return ElevenLabsTTSService(
        api_key=settings.elevenlabs_api_key,
        voice_id=settings.elevenlabs_voice_id,
        # The output format encodes the sample rate (here: 16 kHz PCM).
        # Make sure it matches your pipeline configuration!
        output_format="pcm_16000"
    )
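
The sample-rate warning in the comment above can be turned into an explicit check. This is a minimal sketch under stated assumptions: both function names are hypothetical, the format string is assumed to follow the `pcm_<rate>` shape used in this guide, and `pipeline_rate` stands for whatever sample rate your pipeline is configured with.

```python
def pcm_sample_rate(output_format: str) -> int:
    """Extract the sample rate in Hz from a string such as 'pcm_16000'."""
    codec, _, rate = output_format.partition("_")
    if codec != "pcm" or not (rate.isascii() and rate.isdigit()):
        raise ValueError(f"unsupported output format: {output_format!r}")
    return int(rate)


def check_sample_rate(output_format: str, pipeline_rate: int) -> None:
    """Raise ValueError if the TTS rate differs from the pipeline rate."""
    tts_rate = pcm_sample_rate(output_format)
    if tts_rate != pipeline_rate:
        raise ValueError(
            f"TTS produces {tts_rate} Hz audio, "
            f"but the pipeline expects {pipeline_rate} Hz"
        )


if __name__ == "__main__":
    # Passes silently because both sides use 16 kHz.
    check_sample_rate("pcm_16000", 16000)
```

A mismatch usually shows up as audio that plays too fast or too slow, so checking it once at startup is cheaper than debugging it by ear.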

The rest of the Pipecat pipeline consumes the audio frames generated by ElevenLabs unchanged and streams them back to the user via Twilio or your local audio output.