DevOps · Deployment
Deploying to Cloud Run: A Practical Flow for Small AI Services
How I ship a FastAPI + LLM backend without babysitting servers or fighting with infrastructure every weekend.
For the portfolio assistant, I wanted something in between "just run it locally" and "spin up a full Kubernetes cluster". Cloud Run hits a nice middle ground: you get containerised deployment, autoscaling, and a simple pricing model without having to live inside YAML.
This post walks through the deployment flow I use for small AI backends: build a Docker image, push it, and let Cloud Run take care of the rest.
What I'm deploying
The service is a FastAPI backend that:
- Receives chat messages and conversation history.
- Performs mode detection and prompt construction.
- Calls an external LLM API.
- Logs each interaction with timing and metadata.
It's a classic "thin" API: the heavy lifting is done by the model provider, but I still care about robustness, latency, and observability.
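To make that shape concrete, here is a minimal sketch of such an endpoint. The names (`ChatRequest`, `detect_mode`, `call_llm`) and the stub bodies are illustrative stand-ins, not the actual implementation:

```python
# Minimal sketch of the "thin API" shape; names and stubs are illustrative.
import logging
import time

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
logger = logging.getLogger("assistant")


class ChatRequest(BaseModel):
    message: str
    history: list[dict] = []  # prior turns, e.g. {"role": "user", "content": "..."}


def detect_mode(message: str) -> str:
    # Placeholder for the real mode detection (keyword rules, a classifier, etc.).
    return "qa"


async def call_llm(prompt: str) -> str:
    # Placeholder for the call to the external LLM provider.
    return f"(model reply to: {prompt[:40]}...)"


@app.post("/chat")
async def chat(req: ChatRequest):
    started = time.perf_counter()
    mode = detect_mode(req.message)
    prompt = f"[{mode}] {req.message}"  # stands in for real prompt construction
    reply = await call_llm(prompt)
    elapsed_ms = (time.perf_counter() - started) * 1000
    logger.info("mode=%s latency_ms=%.0f", mode, elapsed_ms)
    return {"reply": reply, "mode": mode}
```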
Containerising the backend
The first step is putting the backend into a container so it runs the same way locally and in the cloud. The Dockerfile is intentionally simple: start from a slim Python base image, install only what's needed, and use a production-grade ASGI server.
I prefer multi-stage builds so the final image stays small. It makes deployments faster and keeps the attack surface smaller.
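Here is a minimal sketch of that kind of Dockerfile, assuming uvicorn as the ASGI server and a `main:app` entry point; pin versions to taste:

```dockerfile
# Build stage: install dependencies into an isolated prefix.
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Final stage: copy only the installed packages and the application code.
FROM python:3.12-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Cloud Run tells the container which port to listen on via $PORT (8080 by default).
CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}
```

The first stage can pull in compilers if a dependency needs building; none of that ends up in the image Cloud Run actually runs.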
Why Cloud Run works well for this use case
Cloud Run fits this kind of service nicely for a few reasons:
- It scales to zero, which means you're not paying when there's no traffic.
- You deploy containers, not "apps", so you keep full control over the runtime.
- Concurrency per container is configurable, so you can balance latency vs. cost.
For a portfolio assistant, traffic is bursty and fairly light, so this model is more than enough.
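As a rough sketch of what the flow looks like in practice (the service name, registry path, region, and concurrency numbers are placeholders, and building remotely with Cloud Build works just as well as a local `docker build`):

```bash
IMAGE=europe-west1-docker.pkg.dev/MY_PROJECT/containers/assistant-backend

# Build and push the image.
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Deploy: scale to zero when idle, cap instances, tune per-instance concurrency.
gcloud run deploy assistant-backend \
  --image "$IMAGE" \
  --region europe-west1 \
  --min-instances 0 \
  --max-instances 3 \
  --concurrency 40
```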
Handling secrets and configuration
One thing I'm careful about is never hard-coding API keys or credentials into the image. Configuration lives in environment variables, and secrets are injected via the platform.
Locally, I use a simple `.env` file. In the cloud, Cloud Run gets its environment from the console or from a CI pipeline. The image itself stays generic and reusable.
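On the application side, the code just reads its environment. A small sketch, assuming python-dotenv for local development and illustrative variable names:

```python
import os

from dotenv import load_dotenv

# Locally this picks up a .env file; on Cloud Run no such file exists and the
# call is a no-op, so the same code runs unchanged in both environments.
load_dotenv()

LLM_API_KEY = os.environ["LLM_API_KEY"]  # injected by the platform, never baked into the image
LLM_API_URL = os.getenv("LLM_API_URL", "https://api.example.com/v1/chat")
REQUEST_TIMEOUT_S = float(os.getenv("REQUEST_TIMEOUT_S", "30"))
```

The required secret fails fast with a `KeyError` at startup; everything else falls back to a sensible default.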
Observability: not optional
Once something is deployed, the way you debug it changes. Print statements don't help if you can't see them. For this service, I log the following (there's a small sketch of the record format after the list):
- Request and response times.
- Detected mode for each message.
- High-level error categories (timeouts, API failures, etc.).
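A rough sketch of what one of those log records might look like; the field names and error taxonomy are illustrative:

```python
import json
import logging
import time
from typing import Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("assistant.requests")


def log_interaction(mode: str, started: float, error: Optional[Exception] = None) -> None:
    # Collapse exceptions into coarse categories rather than logging raw tracebacks.
    if error is None:
        outcome = "ok"
    elif isinstance(error, TimeoutError):
        outcome = "timeout"
    else:
        outcome = "api_failure"

    record = {
        "mode": mode,
        "latency_ms": round((time.perf_counter() - started) * 1000),
        "outcome": outcome,
    }
    # One JSON object per line: Cloud Run forwards stdout/stderr to Cloud Logging,
    # where structured entries are easy to filter and chart.
    logger.info(json.dumps(record))
```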
Over time, those logs turn into a feedback loop: you see what users actually ask, where latency spikes, and which parts of the system misbehave under load.
Takeaways
Deploying to Cloud Run isn't magical, but it removes most of the infrastructure friction for small AI backends. You focus on designing the service and logging it properly, and let the platform handle scaling and rollout.
For me, the main win is mental: I can iterate on the assistant's behaviour without worrying that a forgotten server will fall over quietly in the background.