ChatGPT Guide for Developers

ChatGPT is a versatile language model that developers can embed in apps, bots, and data pipelines. This guide walks you through the core concepts, how to set up the OpenAI API, typical request‑response flows, advanced patterns like function calling, and the pitfalls that trip up most newcomers.

Conceptual Overview
Setup & Installation
Core Workflows
Advanced Patterns
Common Mistakes
FAQ

Conceptual Overview

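ChatGPT is built on OpenAI's GPT family of large language models, such as GPT-4. The model generates text by repeatedly predicting the next token given the prompt you send. A token is roughly 4 characters of English text, or about three-quarters of a word, so 100 tokens cover about 75 words and a 100-word passage uses about 130 tokens.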
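
As a rough illustration of the four-characters-per-token rule of thumb, the sketch below estimates a token count from character length. The helper name estimate_tokens is hypothetical and not part of the OpenAI SDK, and the result is only an approximation; a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

CHARS_PER_TOKEN = 4  # rule of thumb from the text above; an approximation only


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based on character count."""
    if not text:
        return 0
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

An estimate like this is good enough for budgeting and UI limits, but not for checks that must match the API's own count exactly.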

Stateless vs. Stateful Calls

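Stateless calls treat each request as independent: you include the entire conversation in the messages array every time. The Chat Completions API itself works this way and keeps no memory between requests. A stateful design moves that bookkeeping to your backend: the client holds a short-lived session ID and sends only the new user message, and the server looks up the stored history and assembles the full messages array. This reduces the client payload, but the request to the API still carries the whole conversation.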

Pricing Snapshot (April 2024)

Model	Input ($/1k tokens)	Output ($/1k tokens)
gpt-4o	0.005	0.015
gpt-4-turbo	0.01	0.03
gpt-3.5-turbo	0.0005	0.0015

Choosing the right model means balancing cost against capability. At the rates above, gpt-4o is cheaper than gpt-4-turbo for both input and output, so compare the two on your own workload before picking a default for dev tools.
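
To make the per-1k-token pricing concrete, here is a small sketch that estimates the cost of a single request. The rates for gpt-4o and gpt-3.5-turbo are copied from the snapshot above and will go out of date; estimate_cost and PRICES_PER_1K are hypothetical names used only for illustration.

```python
# USD per 1,000 tokens as (input, output), copied from the pricing snapshot above.
PRICES_PER_1K = {
    "gpt-4o": (0.005, 0.015),
    "gpt-3.5-turbo": (0.0005, 0.0015),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    input_rate, output_rate = PRICES_PER_1K[model]
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate
```

For example, a request with 1,200 input tokens and 300 output tokens comes to about $0.0105 on gpt-4o and about $0.00105 on gpt-3.5-turbo at these rates.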

Setup & Installation

1. Obtain an API Key

Log in to platform.openai.com, generate a key, and store it in .env as OPENAI_API_KEY. Never commit the key to source control.

2. Install the Official SDK

# Python
pip install openai

# Node.js
npm install openai

3. Verify Connectivity

Run a quick test to ensure the key works.

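# Python (openai package v1 or later, which is what pip installs today)
import os
from openai import OpenAI

# os.getenv reads the process environment; export the variable or load .env first.
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)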

Core Workflows

Simple Completion

Send a single user message and receive a reply.

# Node.js
import OpenAI from "openai";
const client = new OpenAI({apiKey: process.env.OPENAI_API_KEY});
const response = await client.chat.completions.create({
  model: "gpt-4-turbo",
  messages: [{role:"user",content:"Explain HTTP status 418"}],
  max_tokens: 120
});
console.log(response.choices[0].message.content);

Multi‑turn Conversation

Maintain context by appending prior messages.

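// Reuses the client created in the Simple Completion example.
let history = [
  {role:"system",content:"You are a helpful coding assistant."},
  {role:"user",content:"How do I sort an array in JavaScript?"}
];
// Append each assistant reply and follow-up question so the model sees the full exchange.
history.push({role:"assistant",content:"Use Array.prototype.sort()."});
history.push({role:"user",content:"Show me an example with numbers."});
const result = await client.chat.completions.create({
  model:"gpt-4-turbo",
  messages:history,
  max_tokens:150
});
// Store the new reply so the next turn includes it.
history.push({role:"assistant",content:result.choices[0].message.content});
console.log(result.choices[0].message.content);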

Streaming Responses

Enable real‑time UI updates by setting stream:true. The SDK returns an async iterator.

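// create() returns a promise; await it to get the async-iterable stream.
const stream = await client.chat.completions.create({
  model:"gpt-4-turbo",
  messages:[{role:"user",content:"Write a haiku about code."}],
  stream:true
});
for await (const chunk of stream) {
  // Each chunk carries an incremental delta; some chunks have no content.
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}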

Advanced Patterns

Function Calling (Structured Output)

Define a JSON schema and let the model return data that matches it.

// The functions / function_call parameters are deprecated; tools / tool_choice replace them.
const tools = [{
  type:"function",
  function:{
    name:"get_weather",
    description:"Fetch current weather for a city",
    parameters:{
      type:"object",
      properties:{
        city:{type:"string",description:"City name"},
        unit:{type:"string",enum:["celsius","fahrenheit"]}
      },
      required:["city"]
    }
  }
}];
const resp = await client.chat.completions.create({
  model:"gpt-4o",
  messages:[{role:"user",content:"What’s the weather in Berlin?"}],
  tools,
  tool_choice:"auto"
});

When the model decides to call the function, resp.choices[0].message.tool_calls contains the function name and its arguments as a JSON string. Parse and validate that string before passing it to your backend.
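
The arguments in a function call arrive as a JSON-encoded string produced by the model, so it is worth parsing and validating them before acting on them. The sketch below shows one way to do that for the get_weather schema above; dispatch_call, HANDLERS, and the placeholder get_weather body are hypothetical and stand in for your own backend code.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Placeholder backend; a real one would query a weather service."""
    return {"city": city, "unit": unit, "temperature": None}


# Maps function names the model may request to local callables.
HANDLERS = {"get_weather": get_weather}


def dispatch_call(name: str, arguments_json: str) -> dict:
    """Parse model-supplied arguments, validate them, and call the handler."""
    if name not in HANDLERS:
        raise ValueError(f"Unknown function: {name}")
    args = json.loads(arguments_json)  # raises json.JSONDecodeError on bad JSON
    if not isinstance(args, dict) or not isinstance(args.get("city"), str):
        raise ValueError("'city' is required and must be a string")
    unit = args.get("unit", "celsius")
    if unit not in ("celsius", "fahrenheit"):
        raise ValueError(f"Unsupported unit: {unit}")
    return HANDLERS[name](city=args["city"], unit=unit)
```

Validating by hand keeps the example dependency-free; a schema validation library would be a reasonable alternative for larger schemas.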

Tool Use with Retrieval Augmented Generation (RAG)

Combine vector search (e.g., Pinecone) with ChatGPT to answer domain‑specific questions.

Embed documents with text-embedding-3-large.
Store embeddings in a vector DB.
When a query arrives, retrieve top‑k chunks.
Pass retrieved text in the system prompt: “Use only the following excerpts …”.
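
The retrieval and prompt-assembly steps above can be sketched without any external service. The example below assumes the embeddings have already been computed and are held in memory as plain lists of floats; in practice a vector database would do the ranking. The function names are hypothetical, and cosine similarity is one common choice of metric, not the only one.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, indexed_chunks, k=2):
    """indexed_chunks is a list of (text, embedding); return the k most similar texts."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_system_prompt(chunks):
    """Number the retrieved excerpts and prepend the grounding instruction."""
    excerpts = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return "Use only the following excerpts to answer.\n\n" + excerpts
```

Numbering the excerpts makes it easier to ask the model to cite which one it used.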

Parallel Batch Requests

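When generating many short completions (e.g., bulk summarization), use Promise.all in Node or asyncio.gather in Python to send requests concurrently. Cap the number in flight (20 is a reasonable starting point) and tune it to your account: rate limits depend on your usage tier and model, so concurrency alone does not guarantee you stay under them.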
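
A minimal sketch of capped concurrency with asyncio.gather follows. The worker argument is a placeholder for whatever coroutine sends one request; gather_bounded is a hypothetical helper name, and the default limit of 20 is simply the figure used in the text, not a value guaranteed to fit any particular rate limit.

```python
import asyncio


async def gather_bounded(items, worker, limit=20):
    """Run worker(item) for every item with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))
```

It would be called as asyncio.run(gather_bounded(documents, summarize)), where summarize is your own coroutine wrapping the API call.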

Cost‑Control Strategies

Set max_tokens based on UI constraints.
Use gpt-3.5-turbo for drafts and upgrade to gpt-4-turbo only for final polishing.
Log the usage field of each response (prompt_tokens, completion_tokens) and review it regularly to monitor token consumption.

Common Mistakes

Hard‑coding the API Key

Embedding the key in source files leads to accidental leaks. Always read from environment variables or secret managers.
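
A small sketch of reading the key from the environment and failing early when it is missing. The helper name get_api_key is hypothetical; the variable name OPENAI_API_KEY is the one used in the setup section.

```python
import os


def get_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment or fail with a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it or load it from your secret manager")
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error on the first request.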

Ignoring Token Limits

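Requests that exceed the model’s context window (e.g., 128k tokens for gpt-4o) are rejected with an HTTP 400 error rather than silently truncated, and careless client-side trimming can drop important context. Trim older messages deliberately or summarize them before sending.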
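
One simple trimming strategy is to keep the system prompt and then as many of the most recent messages as fit a token budget. The sketch below uses a characters-divided-by-four estimate as the default counter, which is only an approximation; trim_history and rough_token_count are hypothetical names, and you can pass a real tokenizer-based counter instead.

```python
import math


def rough_token_count(message: dict) -> int:
    """Approximate tokens as characters / 4; real tokenizers will differ."""
    return math.ceil(len(message["content"]) / 4)


def trim_history(messages: list[dict], budget: int, count=rough_token_count) -> list[dict]:
    """Keep system messages plus the newest other messages that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    remaining = budget - sum(count(m) for m in system)
    kept = []
    for message in reversed(others):  # walk from newest to oldest
        cost = count(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    return system + list(reversed(kept))
```

Leave headroom in the budget for the reply itself, since the context window covers both the prompt and the generated tokens.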

Over‑relying on Temperature=1 for factual output

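Higher temperature yields more varied, creative output but reduces consistency. For factual answers, set temperature:0 and leave top_p at its default of 1. Output becomes far more repeatable, though it is not strictly guaranteed to be identical across runs.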

Missing Error Handling

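The API returns HTTP 429 when you hit a rate limit and 5xx codes such as 500 for server-side errors. Retry these with exponential backoff and a cap on the number of attempts. Other 4xx errors indicate a problem with the request itself and should not be retried.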
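
A minimal sketch of exponential backoff with jitter follows. RetryableError is a hypothetical stand-in for whatever exception your client raises on a 429 or 5xx response, and the default delays are illustrative assumptions rather than recommended values.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a 429 or 5xx response from the API."""


def with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call `call()` and retry on RetryableError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The sleep parameter is injectable so the retry logic can be tested without actually waiting.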

Neglecting System Prompt Design

A vague system prompt leads to unpredictable tone. Example of a good prompt: “You are a concise, friendly developer assistant. Answer in plain JavaScript unless otherwise requested.”

FAQ

Do I need an OpenAI API key to use ChatGPT locally?

Yes. The API key authenticates every request. You can create one in the OpenAI dashboard and store it securely as an environment variable.

Which programming languages have official OpenAI client libraries?

Python, Node.js, Java, .NET, Go, and Ruby have first‑party libraries. Community SDKs exist for Rust, PHP, and Swift.

How do I limit token usage per request?

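Set the max_tokens parameter in the request payload. For example, max_tokens: 150 caps the reply at 150 tokens. If the model reaches the cap, the output is cut off mid-thought and finish_reason is "length", so check for that value when completeness matters.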

Can I stream responses instead of waiting for the full completion?

Yes. Include stream: true in the request body. The API returns a Server‑Sent Events (SSE) stream you can read line‑by‑line.
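
If you read the stream yourself rather than through an SDK, each event is a line beginning with "data:" followed by a JSON chunk, and the stream ends with "data: [DONE]". The sketch below parses already-decoded lines and yields the text deltas; iter_sse_content is a hypothetical helper, and fetching the lines over HTTP is left to whichever client you use.

```python
import json


def iter_sse_content(lines):
    """Yield text deltas from an iterable of decoded SSE lines."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel marking the end of the stream
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text
```

Joining the yielded pieces in order reconstructs the full reply.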

What are the most common reasons for “Rate limit exceeded” errors?

The usual causes are exceeding the per-minute request quota, exceeding the per-minute token quota, or sharing one API key across many clients on a low usage tier.

With this guide you can start building with ChatGPT today, avoid typical pitfalls, and scale your AI features responsibly.