Architecture

System Overview

AI Music Studio is a multi-agent system that generates MIDI backing tracks by simulating a session band. LLM-powered musician agents collaborate under a session orchestrator to produce genre-appropriate parts, which are then processed by audio engineering agents for mixing and mastering.


High-Level Architecture

```mermaid
graph TB
    subgraph "Client Layer"
        CLI[CLI Tool]
        API_CLIENT[REST Client / curl]
    end

    subgraph "API Layer"
        API[FastAPI REST API]
    end

    subgraph "Orchestration Layer"
        SO[SessionOrchestrator]
        SC[SessionContext]
    end

    subgraph "Musician Agents"
        DR[DrummerAgent]
        BA[BassistAgent]
        GU[GuitaristAgent]
        KY[KeyboardistAgent]
    end

    subgraph "Engineering Agents"
        MX[MixerAgent]
        MA[MasteringAgent]
    end

    subgraph "Core Engine"
        ME[MidiEngine]
        MT[MusicTheory]
        DM[Pydantic Models]
        PR[Pattern Library]
    end

    subgraph "DAW Integration"
        T1[FluidSynth / TiMidity]
        T2[GarageBand / Logic Pro]
        T3[MIDI / WAV Export]
    end

    subgraph "LLM Layer"
        LLM[OpenAI / Anthropic / Local]
    end

    CLI --> SO
    API_CLIENT --> API
    API --> SO
    SO --> DR
    SO --> BA
    SO --> GU
    SO --> KY
    SO --> MX
    SO --> MA
    DR --> ME
    BA --> ME
    GU --> ME
    KY --> ME
    ME --> MT
    ME --> PR
    ME --> DM
    MX --> T3
    MA --> T3
    T3 --> T1
    T3 --> T2
    DR -.->|optional| LLM
    BA -.->|optional| LLM
    GU -.->|optional| LLM
    KY -.->|optional| LLM
```

Orchestration Order

Agents run sequentially so each can react to what came before:

1. DrummerAgent    → establishes the groove
2. BassistAgent    → locks to the kick drum + chord root
3. GuitaristAgent  → fills harmonic space around bass
4. KeyboardistAgent (optional) → adds pads/voicings in remaining space
5. MixerAgent      → assigns levels, pan, EQ to all tracks
6. MasteringAgent  → applies final loudness metadata

This order is intentional and stable. Do not change it unless the task explicitly requires it.
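The fixed ordering above amounts to a simple sequential loop: each agent reads the shared session context that earlier agents have already written into, then appends its own part. A minimal sketch (class and method names here are illustrative, not the real API):

```python
from dataclasses import dataclass, field


@dataclass
class SessionContext:
    """Shared state each agent reads before playing (fields are illustrative)."""
    genre: str
    tempo: int
    tracks: dict = field(default_factory=dict)  # agent name -> generated part


def run_session(context: SessionContext, agents: list) -> SessionContext:
    # Fixed order: each agent sees everything generated before it,
    # so the bassist can inspect context.tracks["drums"], and so on.
    for agent in agents:
        context.tracks[agent.name] = agent.generate(context)
    return context
```

Because the loop mutates one shared context in a fixed order, later agents can depend on earlier parts without any back-and-forth negotiation.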


Data Flow

```mermaid
sequenceDiagram
    participant Client
    participant Orchestrator
    participant Agents
    participant MidiEngine
    participant DAW

    Client->>Orchestrator: SessionRequest(genre, key, mode, tempo)
    Orchestrator->>Orchestrator: build chord_progression per section
    loop For each agent
        Orchestrator->>Agents: generate(context, chord_progression)
        Agents->>MidiEngine: build_track(notes, velocities, durations)
        MidiEngine-->>Agents: MidiTrackData
        Agents-->>Orchestrator: MidiTrackData
    end
    Orchestrator->>MidiEngine: merge_tracks()
    MidiEngine-->>Orchestrator: combined MidiFile
    Orchestrator->>DAW: export(session_id, midi_file)
    DAW-->>Client: file paths
```
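The `SessionRequest` contract that starts the flow might look like the following Pydantic sketch. The field names come from the sequence diagram; the validation constraints are assumptions added for illustration:

```python
from pydantic import BaseModel, Field


class SessionRequest(BaseModel):
    """Entry-point request (constraints are illustrative, not the real schema)."""
    genre: str
    key: str = Field(pattern=r"^[A-G][#b]?$")  # e.g. "C", "F#", "Bb"
    mode: str = "major"
    tempo: int = Field(ge=40, le=240)  # BPM, clamped to a plausible range
```

A request like `SessionRequest(genre="funk", key="H", tempo=104)` is rejected at the API boundary before any agent runs.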

Module Map

```
src/audio_engineer/
├── agents/
│   ├── base.py              BaseMusician, BaseEngineer, SessionContext
│   ├── orchestrator.py      SessionOrchestrator
│   ├── musician/
│   │   ├── drummer.py       DrummerAgent
│   │   ├── bassist.py       BassistAgent
│   │   ├── guitarist.py     GuitaristAgent
│   │   └── keyboardist.py   KeyboardistAgent
│   └── engineer/
│       ├── mixer.py         MixerAgent
│       └── mastering.py     MasteringAgent
├── core/
│   ├── models.py            Pydantic models
│   ├── music_theory.py      Scales, chords, progressions
│   ├── midi_engine.py       MIDI file construction (mido)
│   ├── patterns.py          Genre-specific pattern library
│   ├── rhythm.py            Rhythmic utilities
│   └── constants.py         TICKS_PER_BEAT, MIDI note maps
├── daw/
│   ├── base.py              AbstractDAWBackend
│   ├── export.py            Raw MIDI / WAV file export
│   ├── fluidsynth.py        FluidSynth rendering
│   ├── timidity.py          TiMidity rendering
│   ├── garageband.py        GarageBand AppleScript integration
│   └── logic_pro.py         Logic Pro AppleScript integration
├── api/
│   ├── app.py               FastAPI application factory
│   └── routes/              Sessions, tracks, exports
└── config/
    ├── settings.py          pydantic-settings configuration
    └── logging.py           Logging setup
```

Key Design Decisions

Why Sequential Generation?

Each instrument in a real session band listens to what has already been played. The drummer sets the groove; the bassist locks to the kick; the guitarist fills harmonic space around the bass. Sequential generation naturally models this dependency chain.

Why Pydantic Models?

All external boundaries (API requests/responses, agent outputs, config) use Pydantic v2 models. This gives us:

- Validated inputs at runtime
- Clear, serializable data contracts between agents
- An auto-generated OpenAPI schema for the REST API
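The serializable-contract point is what lets track data cross the agent and REST boundaries losslessly. A minimal sketch (these model shapes are assumptions, not the real `core/models.py`):

```python
from pydantic import BaseModel


class MidiNote(BaseModel):
    pitch: int           # MIDI note number, 0-127
    velocity: int        # 0-127
    start_tick: int
    duration_ticks: int


class MidiTrackData(BaseModel):
    """Illustrative shape of the payload agents hand back to the orchestrator."""
    instrument: str
    channel: int
    notes: list[MidiNote]
```

A track round-trips through JSON unchanged (`MidiTrackData.model_validate_json(track.model_dump_json()) == track`), which is exactly the property a cross-agent contract needs.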

Why mido?

mido is a lightweight, pure-Python MIDI library. It provides direct control over MIDI message construction without heavyweight abstractions, which keeps the MIDI engine deterministic and testable.

DAW Integration Tiers

| Tier | Backends | Method |
|------|----------|--------|
| 1 | FluidSynth, TiMidity | Subprocess call — automated, cross-platform |
| 2 | GarageBand, Logic Pro | AppleScript / OSA — macOS only, semi-automated |
| 3 | MIDI export, WAV export | Manual import — universal fallback |
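A Tier-1 backend reduces to building a command line and shelling out. A sketch using the standard fluidsynth CLI flags (the function names and paths are placeholders):

```python
import subprocess
from pathlib import Path


def fluidsynth_cmd(midi_path: Path, soundfont: Path, wav_path: Path,
                   sample_rate: int = 44100) -> list[str]:
    """Build the fluidsynth fast-render command line."""
    return [
        "fluidsynth",
        "-ni",                  # no shell, no MIDI driver
        str(soundfont),         # SoundFont (.sf2) used for rendering
        str(midi_path),
        "-F", str(wav_path),    # render to a WAV file instead of audio output
        "-r", str(sample_rate), # output sample rate in Hz
    ]


def render_wav(midi_path: Path, soundfont: Path, wav_path: Path) -> None:
    # check=True raises CalledProcessError if fluidsynth exits non-zero.
    subprocess.run(fluidsynth_cmd(midi_path, soundfont, wav_path), check=True)
```

Splitting command construction from execution keeps the backend testable without fluidsynth installed.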