Our team's Scrum process was at one point bottlenecked by database provisioning. Every Sprint, developing and testing new features meant creating several isolated database instances. The original workflow relied on manually running gcloud sql instances create or on maintaining an ever more bloated Terraform module. The process was not only slow but highly error-prone: one wrong parameter could mean hours of troubleshooting. With three or four feature branches in flight in a single Sprint, the DBA and the DevOps engineers became the bottleneck for the whole team. That kind of friction runs counter to the agility Scrum is supposed to deliver.
What we needed was a "database as a service" capability: developers should be able to get an isolated, correctly configured database cluster within minutes from a single command or a YAML file, and tear it down easily once the feature is merged. The existing Helm charts only solve day-one deployment; for Day-2 operations such as scaling, configuration changes, or even a simple password rotation they fall short. What we really needed was an automation controller that understands the database lifecycle. This is exactly where the Kubernetes Operator pattern comes in. We decided to build a lean but practical PostgreSQL Operator for the team on Google Kubernetes Engine (GKE).
Technical Pain Points and Initial Design
The pain points were clear:
- Slow provisioning: creating a database instance by hand or through IaC tooling typically took 5 to 15 minutes, which is unacceptable in a development environment that needs to iterate quickly.
- Inconsistent configuration: manual operations meant that database settings (such as max_connections and shared_buffers) could differ subtly between environments, laying the groundwork for future production incidents.
- Difficult resource management: databases in dev and test environments were routinely forgotten, wasting cloud spend. Tearing them down was just as manual.
Our idea was to create a Kubernetes Custom Resource Definition (CRD) named PostgresCluster. Developers would only need to declare the desired state of their database, for example:
# config/samples/db_v1alpha1_postgrescluster.yaml
apiVersion: db.my.domain/v1alpha1
kind: PostgresCluster
metadata:
name: feature-branch-xyz
spec:
# PostgreSQL major version
version: "14"
# Number of instances in the cluster (1 primary, N-1 standbys)
instances: 3
# Storage configuration
storage:
size: "10Gi"
storageClassName: "premium-rwo" # GCP Persistent Disk storage class
# Compute resources for each instance
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1"
memory: "2Gi"
Our Operator continuously watches these PostgresCluster resources and automatically creates, configures, or deletes the underlying Kubernetes objects in GKE (StatefulSet, Service, Secret, PersistentVolumeClaim) so that the actual state converges on the desired state. This process is the heart of every Operator: the reconciliation loop.
Core Implementation: From CRD Definition to Reconciliation Loop
We chose the Kubebuilder framework to scaffold the Operator quickly. It generates the CRD YAML definitions, the Go API types, and the basic controller skeleton for us.
1. API Definition (CRD)
First, we define the PostgresCluster data structures. This is the contract between the Operator and its users.
api/v1alpha1/postgrescluster_types.go:
package v1alpha1
import (
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
// PostgresClusterSpec defines the desired state of PostgresCluster
type PostgresClusterSpec struct {
// PostgreSQL major version, e.g., "14"
// +kubebuilder:validation:Required
// +kubebuilder:validation:Pattern=`^\d+$`
Version string `json:"version"`
// Number of instances in the cluster. Must be 1 or greater.
// +kubebuilder:validation:Required
// +kubebuilder:validation:Minimum=1
Instances int32 `json:"instances"`
// Storage configuration for each instance
// +kubebuilder:validation:Required
Storage PostgresStorageSpec `json:"storage"`
// Compute resource requirements for each instance
// +kubebuilder:validation:Optional
Resources corev1.ResourceRequirements `json:"resources,omitempty"`
}
// PostgresStorageSpec defines the storage configuration
type PostgresStorageSpec struct {
// PVC size, e.g., "10Gi"
// +kubebuilder:validation:Required
Size string `json:"size"`
// Storage class to use for the PVCs
// +kubebuilder:validation:Required
StorageClassName string `json:"storageClassName"`
}
// PostgresClusterStatus defines the observed state of PostgresCluster
type PostgresClusterStatus struct {
// Phase indicates the current state of the cluster.
// E.g., Creating, Ready, Failed.
Phase string `json:"phase,omitempty"`
// ReadyInstances is the number of database instances that are ready.
ReadyInstances int32 `json:"readyInstances,omitempty"`
// Conditions represent the latest available observations of an object's state
Conditions []metav1.Condition `json:"conditions,omitempty"`
}
//+kubebuilder:object:root=true
//+kubebuilder:subresource:status
//+kubebuilder:printcolumn:name="Version",type="string",JSONPath=".spec.version"
//+kubebuilder:printcolumn:name="Instances",type="integer",JSONPath=".spec.instances"
//+kubebuilder:printcolumn:name="Ready",type="integer",JSONPath=".status.readyInstances"
//+kubebuilder:printcolumn:name="Phase",type="string",JSONPath=".status.phase"
//+kubebuilder:printcolumn:name="Age",type="date",JSONPath=".metadata.creationTimestamp"
// PostgresCluster is the Schema for the postgresclusters API
type PostgresCluster struct {
metav1.TypeMeta `json:",inline"`
metav1.ObjectMeta `json:"metadata,omitempty"`
Spec PostgresClusterSpec `json:"spec,omitempty"`
Status PostgresClusterStatus `json:"status,omitempty"`
}
//+kubebuilder:object:root=true
// PostgresClusterList contains a list of PostgresCluster
type PostgresClusterList struct {
metav1.TypeMeta `json:",inline"`
metav1.ListMeta `json:"metadata,omitempty"`
Items []PostgresCluster `json:"items"`
}
func init() {
SchemeBuilder.Register(&PostgresCluster{}, &PostgresClusterList{})
}
These +kubebuilder markers are crucial: they drive the generation of the CRD YAML, defining the validation rules, the status subresource, and the extra printer columns shown by kubectl get.
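For context, the SchemeBuilder referenced in the init() function above comes from the group/version registration file that Kubebuilder scaffolds alongside the types. A minimal sketch, assuming the default project layout:
// api/v1alpha1/groupversion_info.go (standard Kubebuilder scaffold, shown for context)
package v1alpha1
import (
    "k8s.io/apimachinery/pkg/runtime/schema"
    "sigs.k8s.io/controller-runtime/pkg/scheme"
)
var (
    // GroupVersion identifies the API group and version served by our CRD.
    GroupVersion = schema.GroupVersion{Group: "db.my.domain", Version: "v1alpha1"}
    // SchemeBuilder is used to add the Go types above to the scheme.
    SchemeBuilder = &scheme.Builder{GroupVersion: GroupVersion}
    // AddToScheme adds the types in this group-version to a given scheme.
    AddToScheme = SchemeBuilder.AddToScheme
)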
2. Controller Core Logic (Reconciliation Loop)
This is the Operator's brain. The Reconcile method is called whenever a PostgresCluster resource changes (creation, update, or deletion). Our job inside this method is to talk to the Kubernetes API server and bring the real world (the state of the resources in GKE) in line with the state the user declared in the PostgresCluster spec.
Below is a simplified but complete skeleton of the Reconcile method that shows the core reconciliation logic.
controllers/postgrescluster_controller.go:
package controllers
import (
"context"
"fmt"
"time"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/errors"
"k8s.io/apimachinery/pkg/api/resource"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/apimachinery/pkg/types"
"k8s.io/apimachinery/pkg/util/intstr"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
dbv1alpha1 "github.com/your-repo/postgres-operator/api/v1alpha1"
)
// ... (Reconciler struct definition)
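// A minimal sketch of the elided reconciler struct, matching what Kubebuilder
// scaffolds by default (an assumption, not part of the original listing);
// Reconcile and the helpers rely on the embedded client and the Scheme.
type PostgresClusterReconciler struct {
    client.Client
    Scheme *runtime.Scheme
}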
const finalizerName = "db.my.domain/finalizer"
func (r *PostgresClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
logger := log.FromContext(ctx)
// 1. Fetch the PostgresCluster instance
instance := &dbv1alpha1.PostgresCluster{}
err := r.Get(ctx, req.NamespacedName, instance)
if err != nil {
if errors.IsNotFound(err) {
logger.Info("PostgresCluster resource not found. Ignoring since object must be deleted.")
return ctrl.Result{}, nil
}
logger.Error(err, "Failed to get PostgresCluster")
return ctrl.Result{}, err
}
// --- Finalizer Logic for graceful deletion ---
if instance.ObjectMeta.DeletionTimestamp.IsZero() {
// The object is not being deleted, so if it does not have our finalizer,
// let's add it and update the object.
if !controllerutil.ContainsFinalizer(instance, finalizerName) {
controllerutil.AddFinalizer(instance, finalizerName)
if err := r.Update(ctx, instance); err != nil {
return ctrl.Result{}, err
}
}
} else {
// The object is being deleted
if controllerutil.ContainsFinalizer(instance, finalizerName) {
// Our finalizer is present, so let's handle any external dependency
logger.Info("Performing finalizer operations for PostgresCluster")
// In a real operator, you might perform actions like taking a final backup.
// Here, we just log it.
// Remove our finalizer from the list and update it.
controllerutil.RemoveFinalizer(instance, finalizerName)
if err := r.Update(ctx, instance); err != nil {
return ctrl.Result{}, err
}
}
// Stop reconciliation as the item is being deleted
return ctrl.Result{}, nil
}
// 2. Reconcile the admin password Secret
// In a real project, password should be generated and stored securely.
// For simplicity, we create a secret if it does not exist.
secret := &corev1.Secret{}
err = r.Get(ctx, types.NamespacedName{Name: instance.Name + "-pg-secret", Namespace: instance.Namespace}, secret)
if err != nil && errors.IsNotFound(err) {
logger.Info("Creating a new Secret for admin credentials")
newSecret := r.secretForPostgres(instance)
if err := r.Create(ctx, newSecret); err != nil {
logger.Error(err, "Failed to create new Secret")
return ctrl.Result{}, err
}
// Requeue to ensure the secret is available for next steps
return ctrl.Result{Requeue: true}, nil
} else if err != nil {
logger.Error(err, "Failed to get Secret")
return ctrl.Result{}, err
}
// 3. Reconcile the Headless Service for StatefulSet discovery
headlessSvc := &corev1.Service{}
err = r.Get(ctx, types.NamespacedName{Name: instance.Name + "-headless", Namespace: instance.Namespace}, headlessSvc)
if err != nil && errors.IsNotFound(err) {
logger.Info("Creating a new Headless Service")
svc := r.headlessServiceForPostgres(instance)
if err := r.Create(ctx, svc); err != nil {
logger.Error(err, "Failed to create Headless Service")
return ctrl.Result{}, err
}
return ctrl.Result{Requeue: true}, nil
} else if err != nil {
logger.Error(err, "Failed to get Headless Service")
return ctrl.Result{}, err
}
// 4. Reconcile the StatefulSet
foundSts := &appsv1.StatefulSet{}
err = r.Get(ctx, types.NamespacedName{Name: instance.Name, Namespace: instance.Namespace}, foundSts)
if err != nil && errors.IsNotFound(err) {
// Define a new statefulset
sts := r.statefulSetForPostgres(instance)
logger.Info("Creating a new StatefulSet", "StatefulSet.Namespace", sts.Namespace, "StatefulSet.Name", sts.Name)
err = r.Create(ctx, sts)
if err != nil {
logger.Error(err, "Failed to create new StatefulSet", "StatefulSet.Namespace", sts.Namespace, "StatefulSet.Name", sts.Name)
return ctrl.Result{}, err
}
// StatefulSet created successfully - return and requeue
instance.Status.Phase = "Creating"
if err := r.Status().Update(ctx, instance); err != nil {
logger.Error(err, "Failed to update PostgresCluster status")
}
return ctrl.Result{RequeueAfter: time.Second * 5}, nil
} else if err != nil {
logger.Error(err, "Failed to get StatefulSet")
return ctrl.Result{}, err
}
// 5. Ensure the StatefulSet size is the same as the spec
size := instance.Spec.Instances
if *foundSts.Spec.Replicas != size {
logger.Info("Updating StatefulSet replica count", "current", *foundSts.Spec.Replicas, "desired", size)
foundSts.Spec.Replicas = &size
err = r.Update(ctx, foundSts)
if err != nil {
logger.Error(err, "Failed to update StatefulSet", "StatefulSet.Namespace", foundSts.Namespace, "StatefulSet.Name", foundSts.Name)
return ctrl.Result{}, err
}
// Spec updated - return and requeue
return ctrl.Result{Requeue: true}, nil
}
// 6. Update the PostgresCluster status
// In a production operator, we would check pod statuses, primary election, etc.
// For this example, we'll just check the ready replicas of the StatefulSet.
readyReplicas := foundSts.Status.ReadyReplicas
if instance.Status.ReadyInstances != readyReplicas {
instance.Status.ReadyInstances = readyReplicas
}
if readyReplicas == instance.Spec.Instances {
instance.Status.Phase = "Ready"
} else {
instance.Status.Phase = "Reconciling"
}
err = r.Status().Update(ctx, instance)
if err != nil {
logger.Error(err, "Failed to update PostgresCluster status")
return ctrl.Result{}, err
}
return ctrl.Result{}, nil
}
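For the controller to actually receive these events, it also has to be registered with the manager. Below is a minimal SetupWithManager, assuming the default Kubebuilder layout; the Owns calls make the controller re-reconcile whenever a StatefulSet, Service, or Secret it created changes or drifts:
// SetupWithManager wires the controller into the manager. Watching owned
// objects via Owns means that a change to the StatefulSet (for example a
// manual scale) triggers reconciliation even if the PostgresCluster itself
// is untouched.
func (r *PostgresClusterReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&dbv1alpha1.PostgresCluster{}).
        Owns(&appsv1.StatefulSet{}).
        Owns(&corev1.Service{}).
        Owns(&corev1.Secret{}).
        Complete(r)
}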
3. Helper Functions: Building the Kubernetes Objects
The Reconcile method relies on a set of helper functions that construct the required Kubernetes objects on the fly. The central one is statefulSetForPostgres.
controllers/postgrescluster_helpers.go:
package controllers
import (
// ... imports
)
// statefulSetForPostgres returns a postgres StatefulSet object
func (r *PostgresClusterReconciler) statefulSetForPostgres(p *dbv1alpha1.PostgresCluster) *appsv1.StatefulSet {
ls := labelsForPostgres(p.Name)
replicas := p.Spec.Instances
storageSize := resource.MustParse(p.Spec.Storage.Size)
// A common pitfall is not setting the controller reference.
// Without it, deleting the PostgresCluster object will not delete the StatefulSet.
// This is known as garbage collection in Kubernetes.
dep := &appsv1.StatefulSet{
ObjectMeta: metav1.ObjectMeta{
Name: p.Name,
Namespace: p.Namespace,
},
Spec: appsv1.StatefulSetSpec{
Replicas: &replicas,
Selector: &metav1.LabelSelector{
MatchLabels: ls,
},
ServiceName: p.Name + "-headless",
Template: corev1.PodTemplateSpec{
ObjectMeta: metav1.ObjectMeta{
Labels: ls,
},
Spec: corev1.PodSpec{
// Use a proper termination grace period to allow Postgres to shut down cleanly.
TerminationGracePeriodSeconds: func(i int64) *int64 { return &i }(30),
Containers: []corev1.Container{{
Image: fmt.Sprintf("postgres:%s", p.Spec.Version),
Name: "postgres",
Env: []corev1.EnvVar{
{
Name: "POSTGRES_USER",
Value: "admin",
},
{
Name: "POSTGRES_PASSWORD",
ValueFrom: &corev1.EnvVarSource{
SecretKeyRef: &corev1.SecretKeySelector{
LocalObjectReference: corev1.LocalObjectReference{Name: p.Name + "-pg-secret"},
Key: "password",
},
},
},
{
Name: "PGDATA",
Value: "/var/lib/postgresql/data/pgdata",
},
},
Ports: []corev1.ContainerPort{{
ContainerPort: 5432,
Name: "postgres",
}},
VolumeMounts: []corev1.VolumeMount{{
Name: "pgdata",
MountPath: "/var/lib/postgresql/data",
}},
Resources: p.Spec.Resources,
// A robust liveness probe is critical for stateful applications.
// It should verify the database is actually responsive.
LivenessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
Exec: &corev1.ExecAction{
Command: []string{"pg_isready", "-U", "admin"},
},
},
InitialDelaySeconds: 30,
TimeoutSeconds: 5,
PeriodSeconds: 10,
FailureThreshold: 6,
},
// Readiness probe ensures the pod only receives traffic when it's ready.
ReadinessProbe: &corev1.Probe{
ProbeHandler: corev1.ProbeHandler{
Exec: &corev1.ExecAction{
Command: []string{"pg_isready", "-U", "admin"},
},
},
InitialDelaySeconds: 5,
TimeoutSeconds: 3,
PeriodSeconds: 5,
},
}},
},
},
// VolumeClaimTemplates are the key to persistent storage in StatefulSets.
// Each pod gets its own persistent volume.
VolumeClaimTemplates: []corev1.PersistentVolumeClaim{{
ObjectMeta: metav1.ObjectMeta{
Name: "pgdata",
},
Spec: corev1.PersistentVolumeClaimSpec{
AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
StorageClassName: &p.Spec.Storage.StorageClassName,
Resources: corev1.ResourceRequirements{
Requests: corev1.ResourceList{
corev1.ResourceStorage: storageSize,
},
},
},
}},
},
}
// Set PostgresCluster instance as the owner and controller
ctrl.SetControllerReference(p, dep, r.Scheme)
return dep
}
// ... (other helper functions for Service, Secret, labels)
The most important parts of this code are:
- ctrl.SetControllerReference: this call establishes the parent-child relationship between the PostgresCluster and the StatefulSet. When the PostgresCluster is deleted, Kubernetes garbage collection automatically removes the StatefulSet it owns. This is the key to avoiding leaked resources.
- VolumeClaimTemplates: this is the core of how a StatefulSet manages stateful workloads. It creates a dedicated PersistentVolumeClaim for each pod replica, guaranteeing persistence and a distinct identity for each instance's data.
- Probes: carefully designed liveness and readiness probes are essential for a database. A failed liveness probe restarts the container, whereas a failed readiness probe only removes the pod from the Service endpoints temporarily so it stops receiving traffic, without killing it.
The helper functions elided above (for the Secret, the headless Service, and the labels) are sketched below.
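For completeness, here is a minimal sketch of those elided helpers. The function names match the ones referenced earlier (secretForPostgres, headlessServiceForPostgres, labelsForPostgres), but the bodies are assumptions on my part; in particular the password value is a placeholder, and a real operator should generate a proper random password.
// labelsForPostgres returns the selector labels shared by all objects of a cluster.
func labelsForPostgres(name string) map[string]string {
    return map[string]string{
        "app.kubernetes.io/name":     "postgres",
        "app.kubernetes.io/instance": name,
    }
}
// secretForPostgres builds the admin-credentials Secret. The hard-coded password
// is for illustration only.
func (r *PostgresClusterReconciler) secretForPostgres(p *dbv1alpha1.PostgresCluster) *corev1.Secret {
    secret := &corev1.Secret{
        ObjectMeta: metav1.ObjectMeta{
            Name:      p.Name + "-pg-secret",
            Namespace: p.Namespace,
            Labels:    labelsForPostgres(p.Name),
        },
        StringData: map[string]string{
            "password": "change-me", // placeholder; generate a random value in practice
        },
    }
    ctrl.SetControllerReference(p, secret, r.Scheme)
    return secret
}
// headlessServiceForPostgres builds the headless Service that gives each
// StatefulSet pod a stable DNS name.
func (r *PostgresClusterReconciler) headlessServiceForPostgres(p *dbv1alpha1.PostgresCluster) *corev1.Service {
    svc := &corev1.Service{
        ObjectMeta: metav1.ObjectMeta{
            Name:      p.Name + "-headless",
            Namespace: p.Namespace,
            Labels:    labelsForPostgres(p.Name),
        },
        Spec: corev1.ServiceSpec{
            ClusterIP: corev1.ClusterIPNone,
            Selector:  labelsForPostgres(p.Name),
            Ports: []corev1.ServicePort{{
                Name: "postgres",
                Port: 5432,
            }},
        },
    }
    ctrl.SetControllerReference(p, svc, r.Scheme)
    return svc
}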
Visualizing the Reconciliation Flow
The whole reconciliation flow is easiest to follow in the Mermaid diagram below:
sequenceDiagram
    participant Dev as Developer
    participant K8s as Kubernetes API Server
    participant Op as Postgres Operator
    participant STS as StatefulSet
    participant Pod as Postgres Pods
    Dev->>+K8s: kubectl apply -f postgres-cluster.yaml
    K8s-->>+Op: Notify change for PostgresCluster 'feature-branch-xyz'
    Op->>K8s: Get PostgresCluster 'feature-branch-xyz'
    K8s-->>Op: Return object details
    Op->>K8s: Does Secret 'feature-branch-xyz-pg-secret' exist?
    K8s-->>Op: No, Not Found
    Op->>+K8s: Create Secret
    K8s-->>-Op: Secret created
    Op->>K8s: Does Service 'feature-branch-xyz-headless' exist?
    K8s-->>Op: No, Not Found
    Op->>+K8s: Create Headless Service
    K8s-->>-Op: Service created
    Op->>K8s: Does StatefulSet 'feature-branch-xyz' exist?
    K8s-->>Op: No, Not Found
    Op->>+K8s: Create StatefulSet (replicas=3)
    Note over K8s, STS: K8s Controller Manager sees new StatefulSet
    K8s-->>-Op: StatefulSet created
    STS->>+Pod: Create Pod-0, Pod-1, Pod-2
    Pod-->>-STS: Pods become Ready
    Op->>K8s: Reconcile again (requeued)
    Op->>K8s: Get StatefulSet 'feature-branch-xyz'
    K8s-->>Op: Return StatefulSet with Status (readyReplicas=3)
    Op->>K8s: Update PostgresCluster 'feature-branch-xyz' Status (Phase=Ready, ReadyInstances=3)
    K8s-->>Op: Status updated
Results and Remaining Gaps
With the Operator deployed, developers can now manage their database environments entirely self-service. Creating a database cluster for a feature branch, from submitting the YAML file to the database being usable, now takes under three minutes. The team's iteration speed has improved markedly, and the DevOps engineers have been freed from tedious repetitive work to focus on higher-value platform engineering.
That said, the Operator as implemented so far is only a starting point, and it has clear limitations for production use:
- High availability and failover: the current version does not implement true high availability. A StatefulSet will recreate a pod elsewhere after a node failure, but that does not handle PostgreSQL primary/standby failover. A production-grade Operator needs to integrate Patroni or a similar tool to manage leader election and failover.
- Backup and restore: database lifecycle management is incomplete without backup and restore. The next iteration is planned to add a backup spec to the PostgresCluster CRD and introduce a separate PostgresBackup CRD, with the Operator orchestrating pg_dump or pg_basebackup runs and uploading the backups to GCP Cloud Storage (see the sketch after this list).
- Version upgrades: a seamless major-version upgrade is a complex process that requires precise control over the pod rollout order, running pg_upgrade, and dealing with potential on-disk format incompatibilities. That calls for considerably more sophisticated reconciliation logic.
- Configuration management: right now the configuration is hard-coded in the container image or in environment variables. A more flexible design would let users manage postgresql.conf through a ConfigMap, with the Operator triggering a rolling restart to apply configuration changes.
Despite this backlog, the project has already demonstrated how powerful the Operator pattern is for automating complex stateful applications. It turns operational knowledge into code and gives our Scrum team a cloud-native database platform that genuinely lives up to the spirit of agile.