Why this matters
As a Platform Engineer, you remove bottlenecks by letting developers provision services, environments, and cloud resources safely without waiting on tickets. Good self-serve workflows reduce lead time, errors, and cognitive load. Typical tasks include creating new microservices from templates, requesting databases or queues, spinning up preview environments, and ensuring everything is compliant, observable, and cost-aware.
Concept explained simply
Self-serve provisioning is a guided, guardrailed path where developers request an outcome (like “new service with a database”), and the platform turns that request into versioned changes, automated plans, approvals, and deployments.
Mental model
- Catalog: discoverable menu of safe options (service templates, DBs, environments).
- Templates: golden paths that encode best practices.
- Orchestrator: pipelines that run plans/applies and post results back to developers.
- GitOps backbone: everything is code; changes flow via pull requests.
- Guardrails: policies, RBAC, quotas, and cost checks baked in.
Core components
- Service catalog and request form (or CLI prompts) for consistent inputs.
- Scaffolding templates (e.g., repo boilerplates, CI/CD, Helm charts).
- Infrastructure modules (e.g., Terraform/Cloud templates) with safe defaults.
- Git-based change flow: PR with plan output, review, and merge to apply.
- Policy engine: enforce naming, tags, regions, size limits, TTL.
- Secrets delivery: keep secrets in a secure store, rotate them, and inject them into runtimes.
- Observability hooks: logs, metrics, tracing added by default.
- Cost visibility: estimates in PR comments; budgets and alerts.
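The policy engine component can be sketched as a function that takes a request and returns denial messages. This is a minimal illustration, not a real policy tool: the name pattern, regions, tag keys, size tiers, and TTL limit below are all hypothetical values chosen for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy values; a real platform would load these from versioned config.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,29}$")
ALLOWED_REGIONS = {"eu-west-1", "us-east-1"}
REQUIRED_TAGS = {"owner", "cost-center"}
SIZE_ORDER = ["small", "medium", "large"]  # smallest to largest
MAX_SIZE = "medium"
MAX_TTL_DAYS = 30


@dataclass
class ResourceRequest:
    name: str
    region: str
    size: str
    ttl_days: int
    tags: dict = field(default_factory=dict)


def evaluate(req: ResourceRequest) -> list[str]:
    """Return denial messages; an empty list means the request passes."""
    denials = []
    if not NAME_PATTERN.fullmatch(req.name):
        denials.append(f"name '{req.name}' must match {NAME_PATTERN.pattern}")
    if req.region not in ALLOWED_REGIONS:
        denials.append(f"region '{req.region}' is not allowed; use one of {sorted(ALLOWED_REGIONS)}")
    missing = sorted(REQUIRED_TAGS - set(req.tags))
    if missing:
        denials.append(f"missing required tags: {missing}")
    if req.size not in SIZE_ORDER:
        denials.append(f"unknown size '{req.size}'; use one of {SIZE_ORDER}")
    elif SIZE_ORDER.index(req.size) > SIZE_ORDER.index(MAX_SIZE):
        denials.append(f"size '{req.size}' exceeds the maximum tier '{MAX_SIZE}'")
    if not 1 <= req.ttl_days <= MAX_TTL_DAYS:
        denials.append(f"ttl_days must be between 1 and {MAX_TTL_DAYS}, got {req.ttl_days}")
    return denials
```

Returning every denial at once, rather than stopping at the first, lets a bot post one complete comment on the PR instead of forcing several fix-and-retry rounds.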
Worked examples
Example 1: New microservice golden path
- Developer submits: name=my-orders, runtime=Node.js, needs=HTTP+metrics.
- Platform scaffolds a repo from a template with health checks, linting, tests, Dockerfile, Helm chart, and CI pipeline.
- GitOps repo receives a namespace entry for dev and a Helm release for the service.
- The pipeline posts a dry-run result and a link to the generated artifacts; when the PR is merged, the service is deployed.
What can go wrong?
- Missing naming conventions cause name conflicts: guard with a naming policy.
- CI secrets are not set: add pre-flight checks.
- Runtime mismatch: restrict the runtime options offered in the form.
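A pre-flight check for this workflow might look like the sketch below. It assumes the platform can list existing service names and the CI secrets already configured; the allowed runtimes and the secret names are hypothetical.

```python
# Hypothetical options and secret names, for illustration only.
ALLOWED_RUNTIMES = {"Node.js", "Python", "Go"}
REQUIRED_CI_SECRETS = {"REGISTRY_TOKEN", "DEPLOY_KEY"}


def preflight(name, runtime, existing_services, configured_secrets):
    """Return actionable error messages; an empty list means scaffolding can start."""
    errors = []
    if name in existing_services:
        errors.append(f"service name '{name}' is already taken; choose another name")
    if runtime not in ALLOWED_RUNTIMES:
        errors.append(f"runtime '{runtime}' is not supported; pick one of {sorted(ALLOWED_RUNTIMES)}")
    missing = sorted(REQUIRED_CI_SECRETS - set(configured_secrets))
    if missing:
        errors.append(f"CI secrets not configured: {missing}")
    return errors
```

Running this before the repo is scaffolded means a failed request leaves nothing behind to clean up.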
Example 2: Request a managed Postgres database
- Developer fills request: service=my-orders, env=dev, size=small, TTL=14d.
- Workflow creates a PR changing infra code using a hardened Postgres module.
- Bot runs terraform plan and cost estimate, posting results to the PR.
- After approval, the merge applies the infra change and stores credentials in the secret manager; the app namespace gets a read-only secret reference.
- A TTL controller sets an auto-expiry tag on the database; a cleanup job destroys the database after 14 days.
What can go wrong?
- Orphaned DBs remain after branches are deleted: add TTL and a scheduled cleanup.
- Over-sized instances: enforce size tiers.
- Secrets linger after the database is gone: rotate or revoke them on destroy.
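The TTL and cleanup steps can be sketched as two small functions: one turns a TTL such as 14d into an expiry timestamp for the tag, the other selects resources whose expiry has passed. The tag key `expires-at` and the resource dictionary shape are assumptions made for this example.

```python
import re
from datetime import datetime, timedelta, timezone

_TTL_RE = re.compile(r"^(\d+)([dh])$")  # e.g. "14d" or "12h"


def parse_ttl(ttl: str) -> timedelta:
    """Convert a TTL string like '14d' or '12h' into a timedelta."""
    match = _TTL_RE.fullmatch(ttl)
    if not match:
        raise ValueError(f"invalid TTL '{ttl}'; expected forms like '14d' or '12h'")
    value, unit = int(match.group(1)), match.group(2)
    return timedelta(days=value) if unit == "d" else timedelta(hours=value)


def expires_at(created_at: datetime, ttl: str) -> datetime:
    """Compute the timestamp to store in the auto-expiry tag."""
    return created_at + parse_ttl(ttl)


def expired(resources, now: datetime):
    """Return the names of resources whose 'expires-at' tag is not in the future."""
    return [
        r["name"]
        for r in resources
        if datetime.fromisoformat(r["tags"]["expires-at"]) <= now
    ]


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
tag_value = expires_at(created, "14d").isoformat()
inventory = [{"name": "my-orders-dev-db", "tags": {"expires-at": tag_value}}]
```

A scheduled job would call `expired(inventory, datetime.now(timezone.utc))` and open a destroy change for each returned name, so cleanup goes through the same reviewed path as creation.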
Example 3: Ephemeral preview environments per PR
- On PR open, pipeline provisions a preview namespace and deploys the branch with a temporary URL.
- Optionally, a lightweight DB sandbox is created with seed data.
- Status checks report readiness; on PR close/merge, resources are destroyed.
What can go wrong?
- Resource sprawl: enforce per-user and per-repo quotas and teardown hooks.
- Long-lived previews: add TTL.
- Race conditions between deploys: use deploy locks per environment.
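Two of these safeguards are easy to sketch: a deterministic namespace name per PR, so re-runs target the same environment, and a per-repo quota check. The `preview-` prefix, the quota value, and the shape of the active-preview records are hypothetical; the 63-character limit reflects the DNS label rule that Kubernetes namespace names must follow.

```python
import re

MAX_PREVIEWS_PER_REPO = 3  # hypothetical quota


def preview_namespace(repo: str, pr_number: int) -> str:
    """Build a deterministic, DNS-label-safe namespace name for a PR preview."""
    slug = re.sub(r"[^a-z0-9]+", "-", repo.lower()).strip("-")
    suffix = f"-pr-{pr_number}"
    # Truncate the prefix part so the full name stays within 63 characters.
    return f"preview-{slug}"[: 63 - len(suffix)].rstrip("-") + suffix


def can_create_preview(repo: str, active_previews) -> bool:
    """Allow a new preview only while the repo is under its quota."""
    in_use = sum(1 for p in active_previews if p["repo"] == repo)
    return in_use < MAX_PREVIEWS_PER_REPO
```

Because the name depends only on the repo and PR number, the teardown hook on PR close can recompute it without looking up any stored state.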
Design blueprint (start small)
- Phase 0: Pick one high-value workflow (e.g., new service) and define the minimal inputs and outputs.
- Phase 1: Template + GitOps PR flow + basic policy checks + observability defaults.
- Phase 2: Add cost estimates, TTL, secrets automation, and RBAC.
- Phase 3: Expand catalog (databases, queues, preview envs) and measure lead time and failure rates.
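Measuring lead time and failure rates, as Phase 3 suggests, can start from a simple log of requests. The sketch below assumes each record carries a `status`, a `requested` timestamp, and a `finished` timestamp for completed requests; these field names are illustrative, not a prescribed schema.

```python
from datetime import datetime, timedelta
from statistics import median


def summarize(requests):
    """Compute median lead time (minutes) and failure rate over finished requests."""
    succeeded = [r for r in requests if r["status"] == "succeeded"]
    failed = [r for r in requests if r["status"] == "failed"]
    lead_times = [
        (r["finished"] - r["requested"]).total_seconds() / 60 for r in succeeded
    ]
    finished_count = len(succeeded) + len(failed)
    return {
        "median_lead_time_minutes": median(lead_times) if lead_times else None,
        "failure_rate": len(failed) / finished_count if finished_count else None,
    }


t0 = datetime(2024, 1, 1, 9, 0)
log = [
    {"status": "succeeded", "requested": t0, "finished": t0 + timedelta(minutes=10)},
    {"status": "succeeded", "requested": t0, "finished": t0 + timedelta(minutes=30)},
    {"status": "failed", "requested": t0, "finished": t0 + timedelta(minutes=5)},
    {"status": "pending", "requested": t0},  # still in flight; excluded from both metrics
]
report = summarize(log)
```

The median is used instead of the mean so that one stuck approval does not dominate the number.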
Checklist: Definition of Done for a self-serve workflow
- Inputs are validated with clear error messages.
- Outcome is represented as code and reviewed via PR.
- Plan/apply results are visible to the requester.
- Guardrails: policies, quotas, and RBAC enforced.
- Secrets handled via a secure manager; no plaintext.
- Observability and cost hooks included by default.
- Idempotent and safe re-runs; rollback path documented.
- Automatic cleanup for ephemeral resources.
Exercises
Exercise 1: Draft a PR-driven service + DB workflow request
Create a single YAML request manifest that a bot could consume to generate a repo, a Helm release, and a small Postgres instance for the dev environment.
Fields to consider
- service_name, owner, runtime, env, database, size_tier, ttl, tags, compliance
Exercise 2: Write guardrail policies (pseudocode)
Write simple policy rules to enforce: name pattern, allowed regions, required tags, and a max instance size. Use clear denial messages.
Common mistakes and self-check
- Skipping policy-as-code: leads to inconsistent reviews. Self-check: is every rule codified and tested?
- Over-customizable templates: increases drift. Self-check: do templates enforce sensible defaults?
- No cleanup path: causes cost sprawl. Self-check: do all ephemeral resources have TTL and teardown?
- Opaque errors: developers file tickets. Self-check: do bots post actionable errors and links to logs?
- Secrets leakage in logs: exposes credentials to anyone with log access. Self-check: are logs scrubbed and secrets masked?
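For the last self-check, a minimal masking step might look like the sketch below. It assumes the pipeline knows the secret values it injected and scrubs each log line before posting it; the function name and the `***` placeholder are illustrative choices.

```python
def mask_secrets(line: str, secrets) -> str:
    """Replace every known secret value in a log line with a placeholder."""
    # Mask longer values first so a secret that contains another is fully hidden.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # skip empty strings, which would match everywhere
            line = line.replace(secret, "***")
    return line


masked = mask_secrets(
    "password=hunter2 token=hunter2abc",
    ["hunter2", "hunter2abc"],
)
```

Exact-value masking only catches secrets the pipeline already knows about, so it complements, rather than replaces, keeping secrets out of log statements in the first place.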
Practical projects
- Golden path template: Scaffold a containerized service with CI, Helm, metrics, and health endpoints.
- DB request workflow: A PR-based Terraform module request with plan and cost estimate comments.
- Preview environments: Namespace-per-PR with auto-destroy on PR close and a daily TTL sweep.
Mini challenge
Design a single form (or YAML) that, when submitted, creates a new service, a preview environment for its first PR, and an optional small cache. Add at least three guardrails and specify how cost is surfaced to the user.
Learning path
- Foundations: GitOps basics, templates, and pipelines.
- Pilot workflow: new service golden path with enforced defaults.
- Guardrails: add policies, RBAC, quotas, and cost checks.
- Environments: add preview and sandbox data strategies.
- Operate: metrics, SLOs, runbooks, and periodic audits.
Who this is for
- Platform Engineers enabling internal developer platforms.
- Backend Engineers contributing to golden paths and modules.
- Tech Leads and SREs standardizing infra and deployments.
Prerequisites
- Comfort with Git and pull request workflows.
- Basic CI/CD concepts and containerization.
- Intro-level infrastructure-as-code (e.g., Terraform concepts).
Next steps
- Implement a minimum viable workflow for one team; iterate with feedback.
- Add policy-as-code and cost checks; measure lead time improvements.
- Scale with catalogs, templates, and clear runbooks.