Production-Grade Conversational AI with Memory and Intelligent Response Modes
A full-stack AI chatbot featuring conversation memory, automatic mode detection, real-time logging, and analytics. Built to provide a personalized portfolio Q&A experience with enterprise-grade reliability.
Building an AI assistant isn't just about connecting to an API. A production-ready chatbot must handle context, provide varied responses, persist state, and offer observability—all while maintaining sub-2-second response times.
Design Goal:
Create a conversational AI that adapts its response style based on question type, remembers context across interactions, and provides production-grade logging—all while feeling instant and natural.
The AI automatically detects question intent and adapts its response style for optimal user experience.
• Deep-Dive
Trigger: Technical questions, "how does", "explain"
Returns detailed explanations with code snippets and architecture details
• Quick Answer
Trigger: Simple facts, "what is", "when"
Concise 1-2 sentence responses for fast answers
• Story Mode
Trigger: Personal questions, "why", "journey"
Engaging narrative responses about experiences and motivations
• Default
Trigger: General queries
Balanced informative responses
Implementation:
Mode detection uses Gemini's understanding of question patterns, combined with keyword analysis and conversation context. Each mode has custom system prompts optimized for that response style.
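The keyword half of that detection can be sketched as follows; the helpers reuse the detect_mode and get_system_prompt names from the endpoint code further down, but the cue lists and prompt wording are illustrative assumptions rather than the production rules.

# Sketch: keyword-based first pass for mode detection.
# Cue lists and prompt wording are illustrative assumptions.
DEEP_DIVE_CUES = ("how does", "explain", "architecture")
QUICK_ANSWER_CUES = ("what is", "when")
STORY_CUES = ("why", "journey", "story")

def detect_mode(message: str) -> str:
    """Return 'deep_dive', 'quick_answer', 'story', or 'default'."""
    text = message.lower()
    if any(cue in text for cue in DEEP_DIVE_CUES):
        return "deep_dive"
    if any(cue in text for cue in STORY_CUES):
        return "story"
    if any(cue in text for cue in QUICK_ANSWER_CUES):
        return "quick_answer"
    return "default"

def get_system_prompt(mode: str) -> str:
    """Each mode maps to a system prompt tuned for that response style."""
    prompts = {
        "deep_dive": "Answer in depth, with code snippets and architecture details.",
        "quick_answer": "Answer in one or two concise sentences.",
        "story": "Answer as an engaging first-person narrative about the journey.",
        "default": "Give a balanced, informative answer.",
    }
    return prompts[mode]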
Conversations persist across page reloads and navigation, creating a continuous dialogue experience.
• Browser Storage
Session state saved in localStorage with conversation history
• Context Injection
Previous messages sent with each API call for continuity
• Smart Summarization
Long conversations auto-summarized to stay within token limits
Implementation:
The frontend maintains the conversation array in localStorage. The backend receives the full history and uses a sliding-window approach to keep recent context while respecting Gemini's token limits.
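A minimal sketch of what that build_context helper could look like, assuming the {role, content} message shape the frontend sends; the window size and formatting are assumptions.

# Sketch: sliding-window context builder (window size and formatting are
# illustrative; message dicts match the frontend's {role, content} shape).
WINDOW_PAIRS = 5

def build_context(history: list[dict]) -> str:
    """Format the most recent messages into a prompt prefix, oldest first."""
    recent = history[-WINDOW_PAIRS * 2:]  # last N user/assistant pairs
    if not recent:
        return ""
    lines = [f"{m['role']}: {m['content']}" for m in recent]
    if len(history) > len(recent):
        lines.insert(0, "(Earlier conversation summarized separately.)")
    return "\n".join(lines) + "\n\n"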
Every interaction is logged with rich metadata for monitoring, debugging, and improvement.
• Request Logging
Timestamp, user query, detected mode, session ID
• Response Metrics
Generation time, token count, response length
• Error Tracking
API failures, timeout events, rate limits
• Analytics Dashboard
Query patterns, popular questions, performance trends
Implementation:
Structured JSON logs written to file system with rotation. FastAPI middleware captures timing metrics. Future: Integration with Grafana/Prometheus for real-time monitoring.
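A plausible shape for that layer, assuming Python's standard logging module with a RotatingFileHandler and FastAPI's http middleware hook for timing; the log path, field names, and the reuse of the app instance from the endpoint below are assumptions.

# Sketch: structured JSON logs with rotation plus a timing middleware.
# Log path and field names are illustrative; `app` is the FastAPI instance
# used by the /api/chat endpoint shown below.
import json
import logging
import time
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("chatbot")
logger.setLevel(logging.INFO)
logger.addHandler(RotatingFileHandler("logs/chat.jsonl", maxBytes=5_000_000, backupCount=5))

def log_interaction(**fields) -> None:
    """Write one JSON line per interaction: query, mode, timing, tokens."""
    fields.setdefault("timestamp", time.time())
    logger.info(json.dumps(fields))

@app.middleware("http")
async def add_timing_header(request, call_next):
    """Capture per-request latency and expose it as a response header."""
    start = time.perf_counter()
    response = await call_next(request)
    response.headers["X-Response-Time"] = f"{time.perf_counter() - start:.3f}s"
    return response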
Smart prompt engineering determines optimal response style
# FastAPI endpoint with mode detection
@app.post("/api/chat")
async def chat(request: ChatRequest):
    user_message = request.message
    history = request.history

    # Detect response mode based on question patterns
    mode = detect_mode(user_message)

    # Build context-aware prompt
    system_prompt = get_system_prompt(mode)
    context = build_context(history)

    # Call Gemini API
    start_time = time.time()
    response = await gemini_client.generate_content(
        model="gemini-1.5-flash",
        contents=[
            {"role": "system", "parts": [system_prompt]},
            {"role": "user", "parts": [context + user_message]}
        ]
    )

    # Log interaction
    log_interaction(
        user_message=user_message,
        mode=mode,
        response_time=time.time() - start_time,
        tokens=response.usage_metadata.total_tokens
    )

    return {"response": response.text, "mode": mode}

Conversation state management with localStorage
// ChatWidget component with session persistence
const [messages, setMessages] = useState([]);

// Load conversation on mount
useEffect(() => {
  const savedMessages = localStorage.getItem('chat_history');
  if (savedMessages) {
    setMessages(JSON.parse(savedMessages));
  }
}, []);

// Save after each message
useEffect(() => {
  localStorage.setItem('chat_history', JSON.stringify(messages));
}, [messages]);

// Send message with full context
const sendMessage = async (userMessage) => {
  const newMessage = { role: 'user', content: userMessage };
  setMessages(prev => [...prev, newMessage]);

  // Include conversation history for context
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      message: userMessage,
      history: messages // Full context sent to backend
    })
  });

  const { response: aiResponse, mode } = await response.json();
  setMessages(prev => [...prev, {
    role: 'assistant',
    content: aiResponse,
    mode
  }]);
};

Challenge:
Gemini API calls can take 2-5 seconds for complex prompts. Users expect instant responses, and network latency and cold starts add further delay.
Solution:
Implemented an optimistic UI (a typing indicator appears immediately), switched to Gemini 1.5 Flash (the faster variant), reduced prompt length through smart summarization, added response streaming for progressive display, and cached common questions (sketched below).
Result:
Average response time dropped from 3.8s to 1.2s. User-perceived latency feels instant thanks to the optimistic UI and streaming.
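The caching piece is the simplest of those optimizations; a rough in-memory version might look like the sketch below, where the normalization rule and one-hour TTL are assumptions.

# Sketch: in-memory cache for frequently asked questions.
# Normalization rule and TTL are illustrative choices.
import time

CACHE_TTL_SECONDS = 3600
_answer_cache: dict[str, tuple[float, str]] = {}

def _normalize(question: str) -> str:
    return " ".join(question.lower().split())

def get_cached_answer(question: str) -> str | None:
    """Return a cached answer if it exists and hasn't expired."""
    entry = _answer_cache.get(_normalize(question))
    if entry and time.time() - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]
    return None

def cache_answer(question: str, answer: str) -> None:
    _answer_cache[_normalize(question)] = (time.time(), answer)

The endpoint would consult get_cached_answer before calling Gemini and call cache_answer after a successful generation.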
Challenge:
Long conversations exceed Gemini's token limits, and sending the full history with each request is expensive and slow.
Solution:
Implemented a sliding-window approach: keep the last 5 message pairs in full detail, summarize older messages into a brief context block, and compress repetitive information (sketched below).
Result:
99% of conversations stay within token limits without losing critical context, and API costs dropped by 60%.
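One way the window-plus-summary split could be enforced is with a crude characters-per-token estimate; the 4-characters-per-token heuristic and the summarize_older helper below are hypothetical, not the production tokenizer.

# Sketch: keep the last 5 pairs verbatim, collapse older messages into a
# summary, and trim until a rough token budget is met.
# The heuristic and summarize_older helper are hypothetical.
MAX_CONTEXT_TOKENS = 4000
CHARS_PER_TOKEN = 4  # rough estimate, not Gemini's real tokenizer

def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN

def trim_history(history: list[dict]) -> list[dict]:
    recent = history[-10:]   # last 5 user/assistant pairs
    older = history[:-10]
    trimmed = list(recent)
    if older:
        summary = summarize_older(older)  # hypothetical summarization helper
        trimmed.insert(0, {"role": "user", "content": f"Earlier conversation summary: {summary}"})
    # Drop the oldest verbatim messages until the budget fits.
    while len(trimmed) > 1 and sum(estimate_tokens(m["content"]) for m in trimmed) > MAX_CONTEXT_TOKENS:
        trimmed.pop(1 if older else 0)
    return trimmed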
Challenge:
Real-world usage needs to be monitored, failures debugged, and features iterated based on actual user questions. Simple console.log output isn't sufficient.
Solution:
Built a structured logging system with JSON output, implemented request/response tracking with unique session IDs, added error categorization (API failures vs. user errors), and created log-analysis scripts for insights (sketched below).
Result:
Popular questions can be identified, issues debugged retroactively, and real response times measured in production. Log analysis has informed 3 major feature improvements.
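The log-analysis scripts can stay very small; a sketch that groups the JSON lines by question and mode follows, with the log path and field names matching the logging sketch earlier (both assumptions).

# Sketch: offline log analysis, most common questions and per-mode latency.
# Log path and field names match the logging sketch above (assumptions).
import json
from collections import Counter, defaultdict

questions = Counter()
latencies = defaultdict(list)

with open("logs/chat.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        questions[entry["user_message"].lower()] += 1
        latencies[entry["mode"]].append(entry["response_time"])

print("Top questions:", questions.most_common(10))
for mode, times in latencies.items():
    print(f"{mode}: avg {sum(times) / len(times):.2f}s over {len(times)} requests")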
The chat widget you see in the bottom-right corner is this exact system in action. Ask it anything about my projects!