💬

AI Portfolio Assistant

Production-Grade Conversational AI with Memory and Intelligent Response Modes

A full-stack AI chatbot featuring conversation memory, automatic mode detection, real-time logging, and analytics. Built to provide a personalized portfolio Q&A experience with enterprise-grade reliability.

View Code
Next.js · FastAPI · Google Gemini · Python · Docker · Cloud Run
  • Real-Time: 1.2s average response time
  • Auto-Detection: 4 intelligent response modes
  • Cross-Page Memory: 100% session persistence
🎯

The Challenge

Building an AI assistant isn't just about connecting to an API. A production-ready chatbot must handle context, provide varied responses, persist state, and offer observability—all while maintaining sub-2-second response times.

Generic responses feel robotic and unhelpful
Conversation context lost on page navigation
No visibility into user questions and response quality
Slow API calls create poor user experience

Design Goal:

Create a conversational AI that adapts its response style based on question type, remembers context across interactions, and provides production-grade logging—all while feeling instant and natural.

🏗️

System Architecture

Full-Stack Data Flow

1. Next.js Frontend: React chat widget with state management
2. FastAPI Backend: Python API with async handling
3. Gemini AI: Mode detection + generation
4. Logging System: Analytics & monitoring

Frontend (Next.js)

  • React component with smooth animations (Framer Motion)
  • Conversation persistence via localStorage
  • Optimistic UI updates for instant feedback
  • Error handling with graceful fallbacks

🐍 Backend (FastAPI)

  • Async Python for concurrent request handling
  • Mode detection logic (deep-dive, quick, story)
  • Conversation history management
  • Structured logging with timestamps and metadata

Intelligent Features

01

Multi-Mode Response System

The AI automatically detects question intent and adapts its response style for optimal user experience.

Deep-Dive

Trigger: Technical questions, "how does", "explain"

Detailed explanations with code snippets and architecture details

Quick Answer

Trigger: Simple facts, "what is", "when"

Concise 1-2 sentence responses for fast answers

Story Mode

Trigger: Personal questions, "why", "journey"

Engaging narrative responses about experiences and motivations

Default

Trigger: General queries

Balanced informative responses

Implementation:

Mode detection uses Gemini's understanding of question patterns, combined with keyword analysis and conversation context. Each mode has custom system prompts optimized for that response style.
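
A minimal sketch of the keyword half of that detection, assuming helpers named detect_mode and get_system_prompt like those referenced in the endpoint code below; the keyword lists and prompt wording are illustrative placeholders, not the production values.

# Illustrative mode detection helpers (keyword lists and prompts are assumptions)
MODE_KEYWORDS = {
    "deep-dive": ["how does", "explain", "architecture", "implementation"],
    "quick": ["what is", "when", "which", "how many"],
    "story": ["why", "journey", "motivation", "experience"],
}

SYSTEM_PROMPTS = {
    "deep-dive": "Answer with technical depth: code snippets, trade-offs, architecture.",
    "quick": "Answer in one or two concise sentences.",
    "story": "Answer as a short first-person narrative about the experience.",
    "default": "Answer in a balanced, informative tone.",
}


def detect_mode(message: str) -> str:
    """Pick a response mode from simple keyword patterns in the question."""
    text = message.lower()
    for mode, keywords in MODE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return mode
    return "default"


def get_system_prompt(mode: str) -> str:
    """Return the custom system prompt for the detected mode."""
    return SYSTEM_PROMPTS.get(mode, SYSTEM_PROMPTS["default"])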

02

Session Persistence & Memory

Conversations persist across page reloads and navigation, creating a continuous dialogue experience.

Browser Storage

Session state saved in localStorage with conversation history

Context Injection

Previous messages sent with each API call for continuity

Smart Summarization

Long conversations auto-summarized to stay within token limits

Implementation:

Frontend maintains conversation array in localStorage. Backend receives full history and uses sliding window approach to keep recent context while respecting Gemini's token limits.
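
A rough sketch of that sliding-window trim on the backend side, assuming a build_context helper like the one referenced in the endpoint code; the window size and the one-line summarization shortcut are illustrative.

# Illustrative sliding-window context builder (window size is an assumption)
RECENT_PAIRS = 5  # keep the last 5 user/assistant pairs verbatim


def build_context(history: list[dict]) -> str:
    """Summarize older turns briefly and keep recent turns in full."""
    recent = history[-RECENT_PAIRS * 2:]   # last N message pairs
    older = history[:-RECENT_PAIRS * 2]    # everything before that

    lines = []
    if older:
        # In production this summary could itself come from a cheap LLM call;
        # here older user turns are just compressed into a single topic line.
        topics = ", ".join(m["content"][:40] for m in older if m["role"] == "user")
        lines.append(f"Earlier in this conversation the user asked about: {topics}")

    for message in recent:
        lines.append(f'{message["role"]}: {message["content"]}')

    return "\n".join(lines) + "\n\n"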

03

Production Logging & Analytics

Every interaction is logged with rich metadata for monitoring, debugging, and improvement.

Request Logging

Timestamp, user query, detected mode, session ID

Response Metrics

Generation time, token count, response length

Error Tracking

API failures, timeout events, rate limits

Analytics Dashboard

Query patterns, popular questions, performance trends

Implementation:

Structured JSON logs written to file system with rotation. FastAPI middleware captures timing metrics. Future: Integration with Grafana/Prometheus for real-time monitoring.
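
A minimal sketch of what that could look like, assuming Python's logging module with a rotating file handler plus a FastAPI timing middleware; the file name, rotation size, and header name are illustrative, and log_interaction mirrors the call in the endpoint code below.

# Illustrative structured JSON logging with rotation (paths and sizes are assumptions)
import json
import logging
import time
from logging.handlers import RotatingFileHandler

from fastapi import FastAPI, Request

app = FastAPI()

handler = RotatingFileHandler("chat_log.jsonl", maxBytes=5_000_000, backupCount=5)
logger = logging.getLogger("chat")
logger.setLevel(logging.INFO)
logger.addHandler(handler)


def log_interaction(**fields):
    """Write one interaction as a single JSON line with a timestamp."""
    fields["timestamp"] = time.time()
    logger.info(json.dumps(fields))


@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    """Capture wall-clock time for every request and expose it as a header."""
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Process-Time"] = f"{time.perf_counter() - start:.3f}s"
    return response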

🔧

Technical Implementation

Mode Detection Logic (FastAPI Backend)

Smart prompt engineering determines optimal response style

# FastAPI endpoint with mode detection
import time

import google.generativeai as genai  # genai.configure(api_key=...) is assumed at startup
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    message: str
    history: list[dict] = []


@app.post("/api/chat")
async def chat(request: ChatRequest):
    user_message = request.message
    history = request.history

    # Detect response mode based on question patterns
    mode = detect_mode(user_message)

    # Build context-aware prompt
    system_prompt = get_system_prompt(mode)
    context = build_context(history)

    # Call Gemini API (system prompt is attached to the model, generation is async)
    start_time = time.time()
    model = genai.GenerativeModel("gemini-1.5-flash", system_instruction=system_prompt)
    response = await model.generate_content_async(context + user_message)

    # Log interaction
    log_interaction(
        user_message=user_message,
        mode=mode,
        response_time=time.time() - start_time,
        tokens=response.usage_metadata.total_token_count,
    )

    return {"response": response.text, "mode": mode}

Session Persistence (Next.js Frontend)

Conversation state management with localStorage

// ChatWidget component with session persistence
import { useState, useEffect } from 'react';

const [messages, setMessages] = useState([]);

// Load conversation on mount
useEffect(() => {
  const savedMessages = localStorage.getItem('chat_history');
  if (savedMessages) {
    setMessages(JSON.parse(savedMessages));
  }
}, []);

// Save after each message
useEffect(() => {
  localStorage.setItem('chat_history', JSON.stringify(messages));
}, [messages]);

// Send message with full context
const sendMessage = async (userMessage) => {
  const newMessage = { role: 'user', content: userMessage };
  setMessages(prev => [...prev, newMessage]);

  // Include conversation history for context
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: userMessage,
      history: messages // Full context sent to backend
    })
  });

  const { response: aiResponse, mode } = await response.json();
  setMessages(prev => [...prev, {
    role: 'assistant',
    content: aiResponse,
    mode
  }]);
};

Technical Challenges

01

Achieving Sub-2-Second Response Times

Challenge:

Gemini API calls can take 2-5 seconds for complex prompts. Users expect instant responses. Network latency and cold starts add additional delays.

Solution:

Implemented optimistic UI (show typing indicator immediately), used Gemini 1.5 Flash (faster variant), reduced prompt length through smart summarization, added response streaming for progressive display, and cached common questions.
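
A sketch of the streaming piece, under the assumption that the backend uses the google-generativeai SDK's streaming mode; the /api/chat/stream route and the request model shown here are illustrative, not the exact production wiring.

# Illustrative streaming endpoint (route name and request model are assumptions)
import google.generativeai as genai  # genai.configure(api_key=...) is assumed at startup
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


@app.post("/api/chat/stream")
async def chat_stream(request: ChatRequest):
    model = genai.GenerativeModel("gemini-1.5-flash")
    stream = await model.generate_content_async(request.message, stream=True)

    async def token_stream():
        # Forward text chunks to the client as soon as the model emits them
        async for chunk in stream:
            try:
                if chunk.text:
                    yield chunk.text
            except ValueError:
                continue  # chunk had no text parts (e.g. safety-filtered)

    return StreamingResponse(token_stream(), media_type="text/plain")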

Impact:

Average response time reduced from 3.8s to 1.2s. User-perceived latency feels instant due to optimistic UI and streaming.

02

Context Window Management

Challenge:

Long conversations exceed Gemini's token limits. Sending full history with each request is expensive and slow.

Solution:

Implemented sliding window approach: keep last 5 message pairs in full detail, summarize older messages into brief context, and use compression techniques for repetitive information.
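
One way to sanity-check that budget, assuming the google-generativeai count_tokens call; the 8,000-token threshold is an illustrative figure, not the production value.

# Illustrative token-budget check before sending context (threshold is an assumption)
import google.generativeai as genai  # genai.configure(api_key=...) is assumed at startup

MAX_CONTEXT_TOKENS = 8_000  # illustrative budget, well under the model's limit

model = genai.GenerativeModel("gemini-1.5-flash")


def fits_budget(context: str, user_message: str) -> bool:
    """Count tokens for the assembled prompt and compare against the budget."""
    total = model.count_tokens(context + user_message).total_tokens
    return total <= MAX_CONTEXT_TOKENS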

Impact:

99% of conversations stay within token limits without losing critical context. API costs reduced by 60%.

03

Production Deployment & Observability

Challenge:

Need to monitor real-world usage, debug failures, and iterate based on actual user questions. Simple console.log isn't sufficient.

Solution:

Built structured logging system with JSON output, implemented request/response tracking with unique session IDs, added error categorization (API failures vs. user errors), created log analysis scripts for insights.
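
A small sketch of such an analysis script, assuming JSON-lines logs with the fields shown in the logging sketch earlier; the log path and field names are assumptions.

# Illustrative log analysis script (log path and field names are assumptions)
import json
from collections import Counter
from pathlib import Path

LOG_FILE = Path("chat_log.jsonl")

modes = Counter()
questions = Counter()
response_times = []

for line in LOG_FILE.read_text().splitlines():
    entry = json.loads(line)
    modes[entry["mode"]] += 1
    questions[entry["user_message"].strip().lower()] += 1
    response_times.append(entry["response_time"])

print("Mode distribution:", dict(modes))
print("Top questions:", questions.most_common(5))
if response_times:
    print(f"Avg response time: {sum(response_times) / len(response_times):.2f}s")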

Impact:

Can identify popular questions, debug issues retroactively, and measure real response times in production. Informed 3 major feature improvements based on log analysis.

🚀

Deployment Architecture

🐳 Containerization

  • Docker: FastAPI backend containerized for consistent deployment
  • Multi-stage builds: Optimized image size (1.2 GB → 180 MB)
  • Environment configs: API keys managed via secrets

☁️ Cloud Run Deployment

  • Serverless scaling: Auto-scales from 0 to N instances
  • Cost-effective: Pay only for actual request processing time
  • Global CDN: Low-latency responses worldwide

🎓

Key Learnings & Future Work

What I Learned

  • User experience is 50% about actual speed, 50% about perceived speed (optimistic UI matters!)
  • Production logging is not optional—you're blind without it
  • Context management is the hardest part of conversational AI, not the LLM itself
  • Docker + Cloud Run makes deployment trivially easy compared to traditional servers

Future Enhancements

  • Add RAG (Retrieval-Augmented Generation) to ground responses in actual project documentation
  • Implement analytics dashboard with Grafana for real-time monitoring
  • Add voice input/output for accessibility and mobile experience
  • A/B test different system prompts to optimize response quality

Try the AI Assistant Now

The chat widget you see in the bottom-right corner is this exact system in action. Ask it anything about my projects!

View More Projects