Conversational AI · System Design

Building Multi‑Mode Chatbots

How I built a portfolio assistant that changes how it answers you, instead of spitting the same generic paragraph for every question.

Most chatbots feel the same after five messages: long, rambling answers when you just wanted something quick, or shallow replies when you were actually asking for a proper deep dive. I wanted my portfolio assistant to behave more like a focused collaborator than a random “chat with my portfolio” gimmick.

This post walks through how I designed a multi‑mode chatbot for my portfolio: one brain, but four different “personalities” depending on what you ask it.

Why multiple response modes?

When people land on a portfolio, they don’t all want the same thing. Some want:

  • Fast, straight answers about skills and stack.
  • Deep technical breakdowns of a project.
  • Story‑style context: why I built something, what I learned.
  • General guidance about my research interests and direction.

For that reason, the assistant supports four modes (there's a small type sketch after the list):

  • Quick Answer – short, direct responses when the question is simple.
  • Deep‑Dive – detailed breakdowns with structure when the question is technical.
  • Story Mode – more narrative answers for background, motivation, and personal journey.
  • Default – a balanced mode when the intent isn’t very clear.
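To make that concrete, here's a minimal way to represent the modes in TypeScript. The names are my labels for this post, not the exact identifiers in the real code:

```typescript
// The four response modes as a simple union type. Names are
// illustrative, not the exact identifiers in the codebase.
type ResponseMode = "quick" | "deep-dive" | "story" | "default";
```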

How mode detection actually works

I didn’t want a huge pile of hard‑coded if/else checks trying to guess what the user wanted. Instead, I combine a few simple signals (roughly sketched after the list):

  • Obvious keywords like “explain”, “how does”, “architecture” lean towards deep‑dive.
  • Short, fact‑style questions like “what stack did you use?” lean towards quick answers.
  • Questions that sound like “why”, “journey”, “how did you get into” are good candidates for story mode.
  • If none of that is clear, it falls back to the default mode.
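Here's a rough sketch of that signal check. The cue lists and the word-count threshold are illustrative; the real ones are longer and tuned by hand:

```typescript
type ResponseMode = "quick" | "deep-dive" | "story" | "default"; // from the earlier sketch

// Illustrative cue lists; the real ones are longer and hand-tuned.
const DEEP_DIVE_CUES = ["explain", "how does", "architecture"];
const STORY_CUES = ["why", "journey", "how did you get into"];

function detectMode(question: string): ResponseMode {
  const q = question.toLowerCase();

  // Technical cues are checked first, so a question containing both
  // kinds of cue ("explain why you built this") counts as deep-dive.
  if (DEEP_DIVE_CUES.some((cue) => q.includes(cue))) return "deep-dive";
  if (STORY_CUES.some((cue) => q.includes(cue))) return "story";

  // Short, fact-style questions ("what stack did you use?") lean quick.
  if (q.split(/\s+/).length <= 8) return "quick";

  return "default";
}
```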

The detected mode is then folded into the system prompt sent to the model. Each mode has its own short set of instructions that shapes the answer, without changing the underlying model.
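As a sketch, the per‑mode instructions can live in a simple lookup that gets appended to a shared base prompt. The wording below is a stand‑in for the real prompts:

```typescript
type ResponseMode = "quick" | "deep-dive" | "story" | "default"; // from the earlier sketch

// Stand-in instructions; the real prompts are longer and more specific.
const MODE_INSTRUCTIONS: Record<ResponseMode, string> = {
  quick: "Answer in one or two sentences. No preamble.",
  "deep-dive": "Give a structured, technical answer with clear sections.",
  story: "Answer in a narrative voice: motivation, decisions, lessons learned.",
  default: "Give a balanced answer: concise, with detail where it helps.",
};

function buildSystemPrompt(basePrompt: string, mode: ResponseMode): string {
  return `${basePrompt}\n\n${MODE_INSTRUCTIONS[mode]}`;
}
```

Because only the instructions change, switching modes is cheap: same model, same context, different shape.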

Keeping conversations coherent

A separate problem is context. If you refresh the page or navigate to a different project, you shouldn’t lose the entire conversation.

On the frontend, I store the conversation in localStorage and keep a clean array of messages. On each new request, the backend receives a trimmed version of that history: the last few turns in full plus a short summary of what has happened so far. That keeps things efficient while still letting the assistant “remember” what you were talking about.
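A sketch of the storage and trimming logic, assuming a hypothetical storage key and a running summary produced by an earlier step that I'm not showing here:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const STORAGE_KEY = "portfolio-chat-history"; // hypothetical key

function loadHistory(): ChatMessage[] {
  try {
    return JSON.parse(localStorage.getItem(STORAGE_KEY) ?? "[]");
  } catch {
    return []; // corrupted storage: start fresh rather than crash
  }
}

function saveHistory(messages: ChatMessage[]): void {
  localStorage.setItem(STORAGE_KEY, JSON.stringify(messages));
}

// What actually goes to the backend: a short running summary plus the
// last few turns in full. The summary comes from an earlier
// summarisation step not shown here.
function trimForRequest(
  messages: ChatMessage[],
  summary: string,
  keepTurns = 6
): ChatMessage[] {
  const recent = messages.slice(-keepTurns);
  return summary
    ? [{ role: "system", content: `Conversation so far: ${summary}` }, ...recent]
    : recent;
}
```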

Making it feel fast

Raw model latency can easily be a few seconds, especially with longer context. To make the experience feel fast, I lean on a few frontend tricks (the first two are sketched after the list):

  • Show a typing indicator immediately.
  • Render the user’s message instantly in the chat window.
  • Keep prompts lean and avoid unnecessary verbosity.
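Here's a hypothetical React handler showing the first two points. setMessages and setIsTyping are assumed useState setters, and fetchAssistantReply is a stand‑in for the backend call:

```typescript
// Assumed to exist elsewhere in the component:
//   setMessages / setIsTyping - React useState setters
//   fetchAssistantReply       - wrapper around the backend chat endpoint
async function sendMessage(text: string) {
  // Render the user's message instantly, before any network round trip.
  setMessages((prev) => [...prev, { role: "user", content: text }]);
  setIsTyping(true); // typing indicator appears immediately

  try {
    const reply = await fetchAssistantReply(text);
    setMessages((prev) => [...prev, { role: "assistant", content: reply }]);
  } finally {
    setIsTyping(false);
  }
}
```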

In practice, the average response time sits at a little over a second, which feels comfortable for a portfolio setting.

What I’d like to add next

There are a few directions I’d like to take this further:

  • Adding retrieval so the assistant can quote from real project docs.
  • Letting users explicitly switch modes when they know what they want.
  • A small analytics dashboard to see what questions people ask the most.

The main takeaway for me: the model is only half the story. The rest is UX, context management, and being intentional about how you want the assistant to behave.