Abstract
Modern AI assistants like ChatGPT have fundamentally changed user expectations around conversational interfaces. Users now expect to have coherent, multi-turn conversations where the AI remembers what was said earlier in the discussion. However, when building AI-powered bots on top of messaging platforms like Signal, Telegram, or SMS, developers face a fundamental architectural challenge: these platforms are inherently stateless. Each message arrives as an independent event with no built-in mechanism for maintaining conversational context.
This paper examines a production implementation that bridges this gap, enabling persistent multi-turn AI conversations over Signal's stateless messaging protocol. We explore the database schema design, the command parsing architecture, and a novel inline image reference system that allows users to incorporate visual context into ongoing conversations.
1. Introduction
1.1 The Statefulness Problem
Large Language Models (LLMs) like GPT-4 and GPT-5 are stateless by design. Each API call is independent—the model has no memory of previous interactions unless the developer explicitly includes conversation history in each request. Services like ChatGPT create the illusion of memory by maintaining conversation state server-side and replaying the full message history with each new user input.
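For example, a client maintaining its own history replays it on every call. A minimal sketch using the official openai Python client (names and model choice are illustrative):

from openai import OpenAI

client = OpenAI()
history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
]
# Each new turn must re-send the full history for the model to "remember" it
history.append({"role": "user", "content": "What's my name?"})
response = client.chat.completions.create(model="gpt-4o", messages=history)
history.append({"role": "assistant", "content": response.choices[0].message.content})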
When building a bot on a messaging platform, developers must solve this same problem, but with additional constraints:
- Message Independence: Each incoming message from Signal (or similar platforms) arrives as a discrete event with no connection to previous messages.
- Multi-User Environments: In group chats, multiple users may be conducting separate conversations with the bot simultaneously.
- Asynchronous Delivery: Messages may arrive out of order or with significant delays.
- Platform Limitations: Most messaging APIs provide no native support for threading or conversation tracking.
- Resource Constraints: Storing complete conversation histories for every interaction can become expensive, both in terms of storage and API costs (since longer histories mean more tokens per request).
1.2 Design Goals
Our implementation targets the following objectives:
- Conversation Continuity: Users should be able to continue previous conversations by referencing a conversation ID.
- New Conversation Simplicity: Starting a fresh conversation should require no special syntax—just send a message.
- Multi-Modal Support: Users should be able to reference images stored in the system within their conversational context.
- Cost Transparency: Each response should report the API cost and attribute it correctly for multi-user billing scenarios.
- Thread Safety: The system must handle concurrent conversations from multiple users without data corruption.
2. Database Schema Design
2.1 Conversation Tables
The persistence layer uses SQLite with a straightforward two-table design:
CREATE TABLE gpt_conversations (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT NOT NULL
);

CREATE TABLE gpt_conversation_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    conversation_id INTEGER NOT NULL,
    created_at TEXT NOT NULL,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    FOREIGN KEY (conversation_id) REFERENCES gpt_conversations(id)
);
The gpt_conversations table serves as a lightweight header, storing only the conversation ID and creation timestamp. The actual message content lives in gpt_conversation_messages, which maintains the full history of each conversation.
2.2 Schema Rationale
Several design decisions merit explanation:
Minimal Conversation Metadata: The gpt_conversations table intentionally stores minimal information. We considered adding fields like user_id, title, or summary, but found these complicated the implementation without providing sufficient value. The conversation ID alone is enough to retrieve and continue any conversation.
Text Storage for Timestamps: Rather than using SQLite's native datetime types, we store ISO 8601 formatted strings. This provides timezone awareness (critical for a system serving users across time zones) and human readability when debugging.
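As a quick illustration (the timezone matches the one used in the persistence code below):

import pendulum

ts = pendulum.now("America/Chicago").isoformat()
# e.g. "2024-06-01T14:32:07.123456-05:00" — timezone-aware and readable in a DB browser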
Content as Plain Text: The content field stores the raw message text, not a structured format. This keeps the schema simple and avoids premature optimization. When multi-modal content (like inline images) is needed, we resolve references at query time rather than storing binary data in the conversation history.
Foreign Key Constraints: The foreign key relationship between messages and conversations ensures referential integrity, and declaring it with ON DELETE CASCADE would enable automatic cleanup of messages when a conversation is deleted. Note that SQLite disables foreign key enforcement by default; it must be switched on per connection with PRAGMA foreign_keys = ON.
3. Conversation Management API
3.1 Core Operations
The database abstraction layer exposes three primary operations:
def create_gpt_conversation(first_message: GPTMessage) -> int:
    """Create a new conversation and return its ID."""
    with get_db_connection() as conn:
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO gpt_conversations (created_at) VALUES (?)",
            (pendulum.now("America/Chicago").isoformat(),),
        )
        new_id = cur.lastrowid
        conn.commit()
    add_message_to_conversation(new_id, first_message)
    return new_id
The create_gpt_conversation function creates the conversation record and immediately appends its first message, ensuring that no conversation exists without at least one message. (The two inserts run in separate transactions, so this is sequential rather than strictly atomic.)
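Typical usage of these three operations together (a brief sketch; the prompts are illustrative):

# Start a new conversation with the user's first prompt
conv_id = create_gpt_conversation(GPTMessage(role="user", content="Hello!"))

# Later turns append to the same conversation and replay its history
add_message_to_conversation(conv_id, GPTMessage(role="assistant", content="Hi there!"))
history = get_messages_for_conversation(conv_id)  # chronological list of GPTMessage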
def add_message_to_conversation(conversation_id: int, message: GPTMessage):
    """Append a message to an existing conversation."""
    with get_db_connection() as conn:
        cur = conn.cursor()
        cur.execute(
            """INSERT INTO gpt_conversation_messages
               (conversation_id, created_at, role, content)
               VALUES (?, ?, ?, ?)""",
            (conversation_id, pendulum.now().isoformat(), message.role, message.content),
        )
        conn.commit()
def get_messages_for_conversation(conversation_id: int) -> List[GPTMessage]:
    """Retrieve all messages in chronological order."""
    with get_db_connection() as conn:
        cur = conn.cursor()
        cur.execute(
            """SELECT created_at, role, content
               FROM gpt_conversation_messages
               WHERE conversation_id = ?
               ORDER BY created_at ASC, id ASC""",  # id breaks ties for identical timestamps
            (conversation_id,),
        )
        rows = cur.fetchall()
    return [GPTMessage(role=row[1], content=row[2]) for row in rows]
3.2 The GPTMessage Data Class
Messages are represented using a simple data class that mirrors the OpenAI API's message format:
from dataclasses import dataclass

@dataclass
class GPTMessage:
    role: str     # "user", "assistant", or "system"
    content: str  # The message text (or structured content for multi-modal)
This alignment with the OpenAI API structure means messages can be retrieved from the database and passed directly to the API without transformation, reducing complexity and potential for bugs.
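For example, stored messages can be converted with dataclasses.asdict and sent straight to the chat completions endpoint. A sketch assuming the official openai client (gpt_api.gpt_completion presumably wraps something similar):

from dataclasses import asdict
from openai import OpenAI

client = OpenAI()
messages = get_messages_for_conversation(conv_id)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[asdict(m) for m in messages],  # [{"role": ..., "content": ...}, ...]
)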
4. Command Parsing and Conversation Flow
4.1 Command Syntax
The bot supports an optional conversation ID in its command syntax:
gpt <prompt>                    # Start new conversation
gpt <conversation_id> <prompt>  # Continue existing conversation
This is implemented via a regex pattern that makes the conversation ID optional:
def _process_gpt_command(text: str, command: str, model: GPTModel) -> bool:
    pat = rf"^{command} (\d+ )?\s?(.*)"
    m = re.search(pat, text, flags=re.IGNORECASE | re.DOTALL)
    if not m:
        return False
    conversation_id = m.groups()[0]  # None if not provided
    if conversation_id:
        conversation_id = conversation_id.strip()  # the capture group includes its trailing space
    prompt = m.groups()[1]
4.2 Conversation Branching Logic
The command handler implements distinct paths for new versus continued conversations:
if conversation_id:
    # Continue existing conversation
    signal_archive_db.add_message_to_conversation(
        conversation_id, GPTMessage(role="user", content=prompt)
    )
    messages = signal_archive_db.get_messages_for_conversation(conversation_id)
    conv_id = conversation_id
else:
    # Start new conversation
    first_message = GPTMessage(role="user", content=prompt)
    conv_id = signal_archive_db.create_gpt_conversation(first_message)
    messages = signal_archive_db.get_messages_for_conversation(conv_id)
For continued conversations, we first persist the new user message, then retrieve the complete history. For new conversations, we create the conversation record (which automatically adds the first message), then retrieve it back. This ensures consistency—what we send to the API exactly matches what's stored in the database.
4.3 Response Handling and Storage
After receiving the AI's response, we store it as an assistant message:
gpt_response = gpt_api.gpt_completion(api_messages, model=model)
response_text = gpt_response.get("text", "Error: No text in response")
bot_message = GPTMessage(role="assistant", content=response_text)
signal_archive_db.add_message_to_conversation(conv_id, bot_message)
# cost and payer are computed elsewhere from the API response and user context
send_message(
    f"[conversation {conv_id}] {response_text}\n"
    f"cost: ${cost:.4f}, payer: {payer}"
)
The response always includes the conversation ID, making it easy for users to continue the conversation later. Including cost and payer information provides transparency in multi-user environments where API expenses are shared or attributed.
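How the cost figure is computed is not shown in this excerpt; one common approach is to derive it from the token usage reported by the API. A sketch along those lines (the per-1K-token prices below are placeholders, not current rates):

# Illustrative pricing table — values are assumptions, not real rates
PRICES_PER_1K = {"gpt-4o": {"input": 0.0025, "output": 0.0100}}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    p = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * p["input"] + (completion_tokens / 1000) * p["output"]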
5. Multi-Modal Conversations: Inline Image References
5.1 The Challenge
Signal allows sending images as attachments, but these are ephemeral—they arrive with the message and aren't easily referenced later. For AI conversations, users often want to ask follow-up questions about an image discussed earlier, or reference images from the bot's archive in new conversations.
5.2 The imageid= Syntax
We implemented a lightweight markup syntax that lets users embed image references in their prompts:
gpt imageid=123 What's happening in this image?
gpt 42 imageid=123 imageid=456 Compare these two images
The syntax is intentionally simple—imageid= followed by a numeric ID. Multiple images can be included in a single prompt.
5.3 Implementation
Image references are resolved at request time through a two-stage process:
IMAGE_ID_REGEX = re.compile(r"imageid=(\d+)", re.IGNORECASE)

def _build_inline_image_content(prompt: str) -> tuple[list | str, list[int]]:
    """Convert imageid= references to OpenAI API image payloads."""
    image_ids = IMAGE_ID_REGEX.findall(prompt)
    if not image_ids:
        return prompt, []

    contents: list[dict] = []
    cleaned_prompt = IMAGE_ID_REGEX.sub("", prompt).strip()
    contents.append({"type": "text", "text": cleaned_prompt})

    embedded_ids: list[int] = []
    for raw_id in image_ids:
        image_id = int(raw_id)
        image_result = image_manager.get_image_by_id(image_id)
        if not image_result:
            raise ValueError(f"Image ID {image_id} not found")
        _, image_path = image_result
        image_bytes = image_manager.read_image_bytes(image_path)
        image_b64 = base64.b64encode(image_bytes).decode("utf-8")
        contents.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
        })
        embedded_ids.append(image_id)

    return contents, embedded_ids
The function extracts image IDs from the prompt, removes the imageid= markers from the text, loads each referenced image from disk, base64-encodes it, and constructs the multi-modal content structure expected by the OpenAI API.
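For example, a prompt with one reference resolves to a two-element content list (the base64 payload is elided here):

contents, ids = _build_inline_image_content("imageid=123 What's happening in this image?")
# contents == [
#     {"type": "text", "text": "What's happening in this image?"},
#     {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
# ]
# ids == [123]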
5.4 Applying to Full Conversations
Since conversations may span multiple messages with image references, we apply this transformation to the entire message history:
def _prepare_messages_with_inline_images(
    messages: list[GPTMessage],
) -> tuple[list[GPTMessage], list[int]]:
    """Transform all messages, resolving image references."""
    prepared: list[GPTMessage] = []
    referenced_image_ids: list[int] = []
    for message in messages:
        content = message.content
        if message.role == "user" and isinstance(content, str):
            content, ids = _build_inline_image_content(content)
            referenced_image_ids.extend(ids)
        prepared.append(GPTMessage(role=message.role, content=content))
    return prepared, referenced_image_ids
This approach means the database stores the original imageid= references as plain text, while the actual image data is resolved fresh for each API call. This has several advantages:
- Storage Efficiency: We don't duplicate image data in conversation history.
- Image Updates: If an image is re-processed or corrected, subsequent conversation continuations automatically use the updated version.
- Auditability: The stored conversation clearly shows which images were referenced.
6. Concurrency and Thread Safety
6.1 Threading Model
Each command runs in its own daemon thread to avoid blocking the main message processing loop:
def _process_gpt_command(text: str, command: str, model: GPTModel) -> bool:
    # ... validation ...
    current_user_context = gpt_api.get_user_context()

    def my_func():
        try:
            gpt_api.set_user_context(current_user_context)
            # ... conversation processing ...
        finally:
            gpt_api.clear_user_context()

    thread = threading.Thread(target=my_func)
    thread.daemon = True
    thread.start()
    return True
6.2 User Context Propagation
The system tracks which user initiated each request for cost attribution. Since this context is stored in thread-local storage, we must capture it before spawning the worker thread and restore it inside the thread:
current_user_context = gpt_api.get_user_context()

def my_func():
    try:
        gpt_api.set_user_context(current_user_context)
        # ... API calls use this context for billing ...
    finally:
        gpt_api.clear_user_context()
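The gpt_api context helpers themselves are not shown in this paper; a minimal thread-local implementation consistent with this usage might look like:

import threading

_context = threading.local()

def set_user_context(user: str) -> None:
    _context.user = user

def get_user_context() -> str | None:
    # Returns None when no context has been set on this thread
    return getattr(_context, "user", None)

def clear_user_context() -> None:
    _context.__dict__.clear()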
6.3 Database Connection Safety
SQLite connections are managed via context managers, ensuring proper cleanup even if exceptions occur:
with get_db_connection() as conn:
    cur = conn.cursor()
    # ... operations ...
    conn.commit()
Each database operation acquires its own connection, avoiding issues with SQLite's threading limitations while maintaining data consistency.
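The get_db_connection helper is likewise not shown here; a plausible minimal implementation consistent with the usage above (the database path is illustrative):

import sqlite3
from contextlib import contextmanager

DB_PATH = "signal_archive.db"  # illustrative; the real path is configuration-specific

@contextmanager
def get_db_connection():
    """Open a fresh connection per operation and always close it."""
    conn = sqlite3.connect(DB_PATH)
    conn.execute("PRAGMA foreign_keys = ON")  # see Section 2.2
    try:
        yield conn
    finally:
        conn.close()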
7. Practical Considerations
7.1 Conversation Length and Token Limits
As conversations grow, they consume more tokens per API call. The current implementation sends the complete history with each request, which can become expensive for long conversations. Production deployments might consider:
- Summarization: Periodically summarizing older messages to reduce token count.
- Windowing: Only sending the N most recent messages (a sketch follows this list).
- Smart Truncation: Using the model to identify and retain the most relevant context.
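Windowing, for instance, is only a few lines (a sketch; the cutoff of 20 messages is arbitrary):

def windowed_history(messages: list[GPTMessage], max_messages: int = 20) -> list[GPTMessage]:
    """Keep any leading system prompt plus the most recent turns."""
    system = [m for m in messages if m.role == "system"]
    rest = [m for m in messages if m.role != "system"]
    return system[:1] + rest[-max_messages:]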
7.2 Error Handling
The implementation includes robust error handling for common failure modes:
try:
    api_messages, embedded_images = _prepare_messages_with_inline_images(messages)
except ValueError as e:
    logger.error(f"Failed to attach images for GPT request: {e}")
    send_message(str(e))
    return
Invalid image references fail fast with clear error messages rather than sending malformed requests to the API.
7.3 User Experience
The response format provides all information users need to continue conversations:
[conversation 42] Here's my analysis of the image...
cost: $0.0234, payer: jon
Users can immediately reference conversation 42 in their next message to continue the discussion.
8. Conclusion
Building persistent conversational AI over stateless messaging platforms requires careful consideration of data modeling, state management, and user experience. Our implementation demonstrates that a relatively simple database schema combined with thoughtful command parsing can provide a seamless multi-turn conversation experience.
The inline image reference system shows how platform limitations can be overcome through creative syntax design, allowing users to build rich multi-modal conversations without the messaging platform's native support.
This architecture has proven robust in production, handling concurrent users, long-running conversations, and multi-modal content while maintaining data consistency and providing transparency into API costs. The patterns described here are applicable beyond Signal to any stateless messaging platform where persistent AI conversations are desired.
