I wanted the members of an online membership club to be able to ask the instructor a question right inside the course, without opening another tab or paying for a separate SaaS. That’s where the job came from: integrate AI into WordPress so the assistant lived inside the theme, with the instructor’s identity and the course context.
Google Gemini supplies the model. I supplied everything else. And as usual, the weight wasn’t in the features but in the decisions: what to send the model, what to do when it fails, where to store the history, and how much personality to give it.
That’s what this article is about: the decisions I made building the assistant, and the piece of architecture each one touches.
The problem
The instructor can’t be live in every unit and every session. A student has a question at eleven at night and there’s nobody on the other end. That’s the gap I wanted to close.
The easy way out would have been to bolt on an Intercom, a Crisp, or any third-party chatbot. I ruled that out fast. It means pulling student data outside the system, depending on an external service that can change its pricing or its policy, and paying a monthly fee for something I only need inside my own platform.
The assistant had to work in two places: group sessions and course units, which in the theme are two separate custom post types. Same logic, two contexts. That alone pushed me to build it inside the theme rather than slapping an external widget on top.
The architecture in one sentence
In one sentence: the browser talks to a PHP proxy, and the proxy talks to the Gemini API.
The browser never touches the API directly. It sends its message to the theme’s own endpoint (api/chat.php), and that PHP is what builds the request, adds the key, and calls Google. That way the key never reaches the frontend, and I have a single place to validate who’s asking.
Three pieces. The first is the proxy itself, which validates identity: every request requires a WordPress nonce and an authenticated user. Without both, the endpoint doesn’t respond. The second is a custom MySQL table where I store the conversation history. The third is the system prompt, the text that gives the model its role as instructor; more on that below, because that’s where there was a decision to make.
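To make the identity check concrete, here is a minimal sketch of the gate the proxy could run before it does anything else. It is not the theme's actual code: the function name, the `course_chat` nonce action, and the 401/403 status codes are assumptions for illustration. The two callables stand in for WordPress's `is_user_logged_in()` and `wp_verify_nonce()` so the logic can run outside WordPress.

```php
<?php
// Hypothetical sketch: decide the HTTP status for a chat request before any
// call to Gemini is made. Names and status codes are assumptions.

function chat_gate_status(callable $isLoggedIn, callable $verifyNonce, string $nonce): int
{
    if (!$isLoggedIn()) {
        return 401; // no authenticated user
    }
    // wp_verify_nonce() returns 1 or 2 for a valid nonce and false otherwise.
    if ($verifyNonce($nonce, 'course_chat') === false) {
        return 403; // missing, expired, or forged nonce
    }
    return 200; // both checks passed; the proxy may continue
}

// Inside the theme, the wiring might look like this (assumed):
// $status = chat_gate_status('is_user_logged_in', 'wp_verify_nonce', $_POST['nonce'] ?? '');
```

Keeping the decision in one small function means there is a single spot to audit when the question is "who is allowed to reach the model?".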
The assistant shows up in the interface with an avatar and the instructor’s identity, not as an anonymous bot. I also gave it a voice: speech input and output go through the browser’s Web Speech API, which is free, and the design leaves room to plug in paid voices later if needed.
The decisions that weren’t obvious
Sending only the last 20 messages
Gemini charges per input token, and a conversation just keeps growing. If I send the full history with every question, cost and latency climb with each turn without the answer getting any better.
So I only pass it the last 20 messages as context. The full history stays in the database (I keep up to 100 messages per user and post), but only the recent window travels to the model. For a question about the unit you’re looking at, 20 messages is plenty. On top of that I set a rate limit of one request every 2 seconds per user, using WordPress transients, so nobody hammers the API by accident or on purpose.
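The window and the rate limit can be sketched as two small functions. This is a sketch under assumptions, not the theme's code: the function names, the `role`/`text` array shape, and the `assistant` label for stored replies are hypothetical. The output shape follows the `contents` array of Gemini's generateContent request, where roles are `user` and `model`. In the theme, the timestamp of the last request would presumably live in a transient read with `get_transient()` and written with `set_transient()`.

```php
<?php
// Hypothetical helpers; names and array shapes are assumptions.

const CONTEXT_WINDOW = 20;        // messages sent to the model
const MIN_INTERVAL_SECONDS = 2.0; // one request per user every 2 seconds

/**
 * Keep only the most recent messages and map them to the "contents" shape
 * used by Gemini's generateContent request (roles "user" and "model").
 *
 * @param array<int, array{role: string, text: string}> $history oldest first
 */
function build_context(array $history, int $window = CONTEXT_WINDOW): array
{
    $recent = array_slice($history, -$window);
    return array_map(
        fn(array $m): array => [
            'role'  => $m['role'] === 'assistant' ? 'model' : 'user',
            'parts' => [['text' => $m['text']]],
        ],
        $recent
    );
}

/**
 * True when enough time has passed since the user's previous request.
 * $lastRequestAt is null when no earlier request is on record.
 */
function rate_limit_allows(?float $lastRequestAt, float $now, float $minInterval = MIN_INTERVAL_SECONDS): bool
{
    return $lastRequestAt === null || ($now - $lastRequestAt) >= $minInterval;
}
```

The database can keep its 100 messages untouched; only the slice produced by `build_context()` ever leaves the server.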
Assuming Gemini would fail
Model APIs return 503 and 429 more often than you’d like: the service is saturated, your quota is maxed out. If I treat that as an error and show the student a “something went wrong,” the assistant looks unreliable when the problem is just temporary.
So I treated failure as the normal case. Each request gets up to 10 attempts with exponential backoff: after a failed attempt it waits 300 ms and, if the next one fails too, doubles the wait (0.6 s, 1.2 s…) up to a cap of 8 s between attempts. Most 503s clear up on the second or third try without the user noticing a thing.
set_time_limit(120); // leave headroom for the retries and their waits

$maxAttempts = 10;
for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
    $ch = curl_init($apiUrl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode($data));
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);

    $response  = curl_exec($ch);
    $curlError = curl_error($ch);
    $httpCode  = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Transport-level failure (DNS, TLS, connection): stop without retrying.
    if ($curlError) break;

    $decoded = json_decode($response, true);

    // Only 503 (overloaded) and 429 (quota) are retried; any other status is final.
    if ($httpCode !== 503 && $httpCode !== 429) break;

    // Back off before the next attempt; there is no wait after the last one.
    if ($attempt < $maxAttempts) {
        $wait = min(300000 * (2 ** ($attempt - 1)), 8000000); // microseconds: 0.3s → 0.6s → 1.2s → ... → 8s
        usleep($wait);
    }
}
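Working the schedule out makes it easier to see how the retry count, the cap, and the 120-second limit fit together. The sketch below reuses the same formula as the loop; the function names are mine, added only for illustration. It sums the waits after attempts 1 through 9, since the tenth attempt is not followed by a sleep.

```php
<?php
// Same formula as the retry loop: 300 ms doubled per attempt, capped at 8 s.
function backoff_wait_us(int $attempt): int
{
    return (int) min(300000 * (2 ** ($attempt - 1)), 8000000);
}

// Waits happen after attempts 1..9; the last attempt is not followed by a sleep.
function total_backoff_us(int $maxAttempts = 10): int
{
    $total = 0;
    for ($attempt = 1; $attempt < $maxAttempts; $attempt++) {
        $total += backoff_wait_us($attempt);
    }
    return $total;
}

printf("%.1f s\n", total_backoff_us() / 1e6); // prints "41.3 s"
```

So in the worst case the waits alone add up to 41.3 seconds (0.3 + 0.6 + 1.2 + 2.4 + 4.8, then four waits of 8). That figure does not include the time each request itself takes, and the snippet above sets no cURL timeout, so the real worst case is longer.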
An inheritable system prompt
The system prompt is what turns Gemini into “the instructor of this course.” I made it configurable from the WordPress panel, per session or per unit. But filling in a prompt by hand for every unit of a long course is tedious and easy to forget.
The rule: if a unit has no system prompt of its own, it inherits the one from the parent course. So you define the tone and context once at the course level, and only write a specific prompt where something actually changes. Fewer fields to maintain and fewer ways for something to end up empty by oversight.
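The inheritance rule fits in a few lines. This is a hypothetical resolver, not the theme's implementation: the function name and the optional fallback are assumptions. In WordPress the two inputs would likely come from `get_post_meta()` on the unit and on its parent course, but the meta key and the way a unit is linked to its course are not described here, so they are left out.

```php
<?php
// Hypothetical resolver: the unit's own prompt wins; otherwise the course's applies.
function resolve_system_prompt(?string $unitPrompt, ?string $coursePrompt, string $fallback = ''): string
{
    foreach ([$unitPrompt, $coursePrompt] as $candidate) {
        // Treat null, empty, and whitespace-only values as "not set".
        if ($candidate !== null && trim($candidate) !== '') {
            return trim($candidate);
        }
    }
    return $fallback;
}
```

Treating whitespace-only fields as empty matters in practice: a stray space saved in the admin panel should not silently override the course prompt.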
Forcing plain text
By default, Gemini answers in Markdown: asterisks for bold, hashes for headings, dashes for lists. In a chat inside the course that renders as noise: either the symbols show up raw, or I have to build a parser just to strip them.
The fix was one of the cheapest in the project: an instruction at the end of the system prompt that forces it to always answer in plain text, no Markdown. One line. It saved me all the sanitizing logic on the frontend.
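As an illustration of how that line could be attached, here is a small sketch. The wording of the rule and the function name are assumptions; all that is known about the real instruction is that it is a single line at the end of the system prompt.

```php
<?php
// The exact wording is an assumption made for this example.
const PLAIN_TEXT_RULE = 'Always answer in plain text. Do not use Markdown: no asterisks, hashes, or list dashes.';

function with_plain_text_rule(string $systemPrompt): string
{
    $systemPrompt = rtrim($systemPrompt);
    // Avoid appending the rule twice if the prompt already ends with it.
    if (substr($systemPrompt, -strlen(PLAIN_TEXT_RULE)) === PLAIN_TEXT_RULE) {
        return $systemPrompt;
    }
    return $systemPrompt === '' ? PLAIN_TEXT_RULE : $systemPrompt . "\n\n" . PLAIN_TEXT_RULE;
}
```

Appending the rule in code rather than typing it into each prompt field means an inherited or hand-written prompt gets it either way.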
Claude Code’s role
I built a good chunk of this with Claude Code from the terminal. That didn’t mean asking it to “write me an AI assistant for WordPress” and pasting whatever came out. It meant working in short cycles: I’d lay out a piece, review what it proposed, adjust it, and move on.
Where it really helps is the scaffolding: setting up the curl call, laying out the table structure, wiring up the rate-limit transients. It handles that fast and saves me the mechanical part.
Where I had to steer was the architecture decisions. The 20-message window, the retry cap, inheriting the prompt from the parent course: the tool doesn’t decide that, I do, based on what I know about the project. Claude Code tends to give you the correct, generic solution; the right one for this club came out of my head, and it implemented it. That division of labor is what makes the CLI useful in real web development and not just for toy prototypes.
Build it yourself or reach for a plugin?
Is it worth building this by hand? It depends, and I’ll take a side.
If what you want is a generic support chatbot on some run-of-the-mill site, no. There are plugins that do it in an afternoon, and maintaining them isn’t your problem. Reinventing it from scratch there is programmer ego, not good engineering.
I built it custom because I needed things a plugin wouldn’t give me without a fight: the instructor’s identity, the inheritable per-course system prompt, the history in my own database, and fine control over what I send the model and how much. When the assistant is part of the product and not an add-on, owning the whole piece is worth the work.
The useful question isn’t “can I build it?” You almost always can. It’s “is this piece central to what I offer?” If it is, build it and understand it from the inside. If it isn’t, pay for the plugin and spend the time on what actually matters.