DevOps · Deployment

Deploying to Cloud Run: A Practical Flow for Small AI Services

How I ship a FastAPI + LLM backend without babysitting servers or fighting with infrastructure every weekend.

For the portfolio assistant, I wanted something in between “just run it locally” and “spin up a full Kubernetes cluster”. Cloud Run hits a nice middle ground: you get containerised deployment, autoscaling, and a simple pricing model without having to live inside YAML.

This post walks through the deployment flow I use for small AI backends: build a Docker image, push it, and let Cloud Run take care of the rest.
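
In command form, the whole flow is roughly the following sketch. The project, region, and repository names are placeholders, and it assumes an Artifact Registry repository already exists:

```bash
IMAGE=europe-west1-docker.pkg.dev/my-project/backends/portfolio-assistant:latest

# Build the image locally and push it to Artifact Registry.
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Deploy (or redeploy) the service from that image.
gcloud run deploy portfolio-assistant \
  --image="$IMAGE" \
  --region=europe-west1
```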

What I’m deploying

The service is a FastAPI backend that:

  • Receives chat messages and conversation history.
  • Performs mode detection and prompt construction.
  • Calls an external LLM API.
  • Logs each interaction with timing and metadata.

It’s a classic “thin” API: the heavy lifting is done by the model provider, but I still care about robustness, latency, and observability.
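
To make that shape concrete, here is a minimal sketch of the endpoint. The request model and helpers (`detect_mode`, `call_llm`) are stand-ins for illustration, not the actual implementation:

```python
import logging
import time

from fastapi import FastAPI
from pydantic import BaseModel

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("assistant")

app = FastAPI()


class ChatRequest(BaseModel):
    message: str
    history: list[dict] = []  # prior turns, passed along by the frontend


def detect_mode(message: str) -> str:
    # Placeholder: the real mode detection is more involved.
    return "projects" if "project" in message.lower() else "general"


async def call_llm(prompt: str) -> str:
    # Placeholder for the external LLM API call.
    return f"(model reply to: {prompt[:40]})"


@app.post("/chat")
async def chat(req: ChatRequest) -> dict:
    start = time.perf_counter()
    mode = detect_mode(req.message)
    prompt = f"[{mode}] {req.message}"  # prompt construction, heavily simplified
    reply = await call_llm(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info("mode=%s latency_ms=%.0f", mode, latency_ms)
    return {"reply": reply, "mode": mode}
```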

Containerising the backend

The first step is putting the backend into a container so it runs the same way locally and in the cloud. The Dockerfile is intentionally simple: start from a slim Python base image, install only what’s needed, and use a production‑grade ASGI server.

I prefer multi‑stage builds so the final image stays small. A smaller image means faster deployments and a smaller attack surface.
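
As a sketch of what that looks like in practice (the module path `main:app` and the requirements file name are assumptions, not the project’s exact layout):

```dockerfile
# Build stage: install dependencies into a virtual environment.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN python -m venv /opt/venv && \
    /opt/venv/bin/pip install --no-cache-dir -r requirements.txt

# Runtime stage: copy only the virtual environment and the application code.
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /opt/venv /opt/venv
COPY . .
ENV PATH="/opt/venv/bin:$PATH"

# Cloud Run tells the container which port to listen on via $PORT (8080 by default).
CMD ["sh", "-c", "uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}"]
```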

Why Cloud Run works well for this use case

Cloud Run fits this kind of service nicely for a few reasons:

  • It scales to zero, which means you’re not paying when there’s no traffic.
  • You deploy containers, not “apps”, so you keep full control over the runtime.
  • Concurrency per container is configurable, so you can balance latency vs. cost.

For a portfolio assistant, traffic is bursty and fairly light, so this model is more than enough.
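
The scaling and concurrency settings can be tuned after the fact with a single command. The values here are illustrative, not what I actually run with:

```bash
gcloud run services update portfolio-assistant \
  --region=europe-west1 \
  --concurrency=20 \
  --min-instances=0 \
  --max-instances=3
```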

Handling secrets and configuration

One thing I’m careful about is never hard‑coding API keys or credentials into the image. Configuration lives in environment variables, and secrets are injected via the platform.

Locally, I use a simple `.env` file. In the cloud, Cloud Run gets its environment from the console or from a CI pipeline. The image itself stays generic and reusable.
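
On the application side the pattern is simple: read everything from the environment and fail fast if something is missing. A rough sketch, assuming python-dotenv for local development and a hypothetical `LLM_API_KEY` variable:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Loads .env if present (local development); effectively a no-op on Cloud Run,
# where the variables are injected by the platform instead.
load_dotenv()

LLM_API_KEY = os.environ["LLM_API_KEY"]  # required: fail fast if missing
LLM_MODEL = os.getenv("LLM_MODEL", "some-default-model")  # optional, with a default
```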

Observability: not optional

Once something is deployed, the way you debug it changes. Print statements don’t help if you can’t see them. For this service, I log:

  • Request and response times.
  • Detected mode for each message.
  • High‑level error categories (timeouts, API failures, etc.).

Over time, those logs turn into a feedback loop: you see what users actually ask, where latency spikes, and which parts of the system misbehave under load.
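
A sketch of that logging pattern: Cloud Run forwards anything written to stdout to Cloud Logging, and single-line JSON entries are parsed as structured logs, so fields like `severity` become queryable. The field names here are my own choice, not the service’s actual schema:

```python
import json
import sys
import time


def log_interaction(mode: str, latency_ms: float, error_category: str | None = None) -> None:
    entry = {
        "severity": "ERROR" if error_category else "INFO",
        "message": "chat_interaction",
        "mode": mode,
        "latency_ms": round(latency_ms, 1),
        "error_category": error_category,  # e.g. "timeout" or "api_failure"
        "unix_time": time.time(),
    }
    # One JSON object per line so Cloud Logging can parse it as a structured entry.
    print(json.dumps(entry), file=sys.stdout, flush=True)
```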

Takeaways

Deploying to Cloud Run isn’t magical, but it removes most of the infrastructure friction for small AI backends. You focus on designing the service and logging it properly, and let the platform handle scaling and rollout.

For me, the main win is mental: I can iterate on the assistant’s behaviour without worrying that a forgotten server will fall over quietly in the background.