What This Does

This server lets you send requests to OpenAI’s models (like gpt-3.5-turbo) and get back streamed responses in real time. It’s perfect if you want to build a chatbot, automate customer support, or add AI to your app.

The whole point? Take a structured request, process it with OpenAI’s API, and send back the result in bite-sized chunks so you can display or process it as it’s happening.


The Code (Core of Everything)

Here’s the main piece of code:

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


# ChatRequest is the Pydantic request model (see "What to Send" below).
@app.post("/chat/completions")
async def chat_completion_stream(vapi_payload: ChatRequest):
    try:
        # Ask OpenAI for a streamed completion.
        response = await client.chat.completions.create(
            model=vapi_payload.model,
            messages=vapi_payload.messages,
            temperature=vapi_payload.temperature,
            tools=vapi_payload.tools,
            stream=True,
        )

        async def event_stream():
            try:
                # Forward each chunk as a Server-Sent Event.
                async for chunk in response:
                    yield f"data: {json.dumps(chunk.model_dump())}\n\n"
                yield "data: [DONE]\n\n"
            except Exception as e:
                print(f"Error during response streaming: {e}")
                yield f"data: {json.dumps({'error': str(e)})}\n\n"

        return StreamingResponse(event_stream(), media_type="text/event-stream")

    except Exception as e:
        # Wrap the error in an iterator: a bare string would be
        # streamed character by character.
        return StreamingResponse(
            iter([f"data: {json.dumps({'error': str(e)})}\n\n"]),
            media_type="text/event-stream",
        )

This code sets up a FastAPI endpoint (/chat/completions). It streams responses back to the client using OpenAI’s API and wraps everything nicely in Server-Sent Events (SSE). If anything goes wrong, it catches the error and sends it back too.
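
On the wire, the client sees plain SSE: each chunk is a data: line followed by a blank line, and the stream ends with a [DONE] sentinel (payloads truncated here for readability):

data: {"id": "chatcmpl-1234", "object": "chat.completion.chunk", ...}

data: {"id": "chatcmpl-1234", "object": "chat.completion.chunk", ...}

data: [DONE]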


How to Use It

Sending a Request

Send a POST request to /chat/completions with a JSON payload. Here’s the structure of what you need to send:

What to Send (Input)

Field         Description
model         The OpenAI model you want to use (e.g., gpt-3.5-turbo).
messages      A list of conversation messages (user messages, assistant replies, etc.).
temperature   Controls creativity (higher = more random, lower = more focused).
tools         Optional. Functions the AI can use during the chat.
stream        Set to true for streamed responses.
call          Metadata about the call (session details).
phoneNumber   Optional. Phone metadata, if relevant.
customer      Optional. Info about the customer.
metadata      Any extra context you want to attach.
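
The ChatRequest model itself isn’t shown above, but given these fields it would look roughly like this. This is a sketch assuming Pydantic (which FastAPI uses for request validation); the exact types of call, phoneNumber, customer, and metadata are guesses:

from typing import Any, Dict, List, Optional

from pydantic import BaseModel


class ChatRequest(BaseModel):
    model: str
    messages: List[Dict[str, Any]]  # e.g., [{"role": "user", "content": "..."}]
    temperature: Optional[float] = None
    tools: Optional[List[Dict[str, Any]]] = None
    stream: Optional[bool] = True
    call: Optional[Dict[str, Any]] = None         # session metadata
    phoneNumber: Optional[Dict[str, Any]] = None  # phone metadata, if relevant
    customer: Optional[Dict[str, Any]] = None     # customer info
    metadata: Optional[Dict[str, Any]] = None     # any extra context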

Here’s a concrete example:

{
  "model": "gpt-3.5-turbo",
  "messages": [{"role": "user", "content": "Tell me a joke"}],
  "temperature": 0.7,
  "tools": [],
  "stream": true,
  "call": {
    "id": "12345",
    "orgId": "org001",
    "createdAt": "2024-11-29T10:00:00Z",
    "updatedAt": "2024-11-29T10:00:00Z",
    "type": "query",
    "status": "pending",
    "assistantId": "assist001"
  },
  "metadata": {
    "context": "chat_query"
  }
}
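
To fire that request from Python, any HTTP client that supports streaming will do. Here’s a quick sketch with the requests library, assuming the server is running locally on port 8000 (the URL and port are placeholders):

import requests

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "temperature": 0.7,
    "stream": True,
}

# stream=True keeps the connection open so chunks can be read as they arrive.
with requests.post(
    "http://localhost:8000/chat/completions", json=payload, stream=True
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line:  # skip the blank lines that separate SSE events
            print(line)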

What You’ll Get (Output)

The response comes back in chunks (real-time streaming). Each data: event carries a chat.completion.chunk object whose delta holds the next slice of the reply; finish_reason stays null until the final chunk, which closes with "stop" and is followed by data: [DONE]. A chunk looks like this:

{
  "id": "chatcmpl-1234",
  "object": "chat.completion.chunk",
  "created": 1732291256,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "index": 0,
      "delta": {"content": "Why did the chicken cross the road?"},
      "finish_reason": null
    }
  ]
}
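
On the client, you rebuild the full reply by stripping the data: prefix, stopping at the [DONE] sentinel, and concatenating each delta’s content. A minimal sketch (the collect_content helper is our own, not part of the server):

import json

def collect_content(sse_lines):
    """Accumulate the assistant's reply from raw SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)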