How to configure GKE Autopilot with Envoy and gRPC-Web

Asked by Ren*_*vis · Tags: google-kubernetes-engine, grpc-web, autopilot

I have an application running on my local machine that uses React -> gRPC-Web -> Envoy -> a Go app, and everything works without issue. I'm now trying to deploy it with GKE Autopilot and can't get the configuration right. I'm new to all of GCP/GKE, so I'm looking for help figuring out what's wrong.

I originally followed this document, even though I only have a single gRPC service: https://cloud.google.com/architecture/exposing-grpc-services-on-gke-using-envoy-proxy

As far as I can tell, GKE Autopilot mode requires external HTTP(S) load balancing rather than the network load balancing described in that solution, so that is what I've been trying to make work. After various attempts, my current strategy uses an Ingress, a BackendConfig, a Service, and a Deployment. The Deployment contains three containers: my application, an Envoy sidecar that translates the gRPC-Web requests and responses, and a Cloud SQL proxy sidecar. I eventually want to use TLS, but I've left it out for now so things don't get any more complicated.

When I apply all of this, the backend service shows one backend in one zone, and the health check is failing. The health check is set to port 8080 and path /healthz, which is what I specified in the deployment config, but I have my doubts, because when I look at the details of the envoy-sidecar container, it shows the readiness probe as: http-get HTTP://:0/healthz headers=x-envoy-livenessprobe:healthz. Does ":0" just mean it uses the container's default address and port, or does it indicate a configuration problem?

I've been reading through all kinds of documentation but can't piece it together. Is there an example somewhere of how to do this? I've been looking but haven't found one.

My current configuration is:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grammar-games-ingress
  #annotations:
    # If the class annotation is not specified it defaults to "gce".
    # kubernetes.io/ingress.class: "gce"
    # kubernetes.io/ingress.global-static-ip-name: <IP addr>
spec:
  defaultBackend:
    service:
      name: grammar-games-core
      port:
        number: 80
---
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: grammar-games-bec
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  sessionAffinity:
    affinityType: "CLIENT_IP"  
  healthCheck:
    checkIntervalSec: 15
    port: 8080
    type: HTTP
    requestPath: /healthz
  timeoutSec: 60
---
apiVersion: v1
kind: Service
metadata:
  name: grammar-games-core
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/app-protocols: '{"http":"HTTP"}'
    cloud.google.com/backend-config: '{"default": "grammar-games-bec"}'
spec:
  type: ClusterIP
  selector:
    app: grammar-games-core
  ports:
  - name: http
    protocol: TCP
    port: 80
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grammar-games-core
spec:
  # Two replicas for right now, just so I can see how RPC calls get directed.
  # replicas: 2
  selector:
    matchLabels:
      app: grammar-games-core
  template:
    metadata:
      labels:
        app: grammar-games-core
    spec:
      serviceAccountName: grammar-games-core-k8sa
      containers:
      - name: grammar-games-core
        image: gcr.io/grammar-games/grammar-games-core:1.1.2
        command:
          - "/bin/grammar-games-core"
        ports:
        - containerPort: 52001
        env:
        - name: GAMESDB_USER
          valueFrom:
            secretKeyRef:
              name: gamesdb-config
              key: username
        - name: GAMESDB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: gamesdb-config
              key: password
        - name: GAMESDB_DB_NAME
          valueFrom:
            secretKeyRef:
              name: gamesdb-config
              key: db-name 
        - name: GRPC_SERVER_PORT
          value: '52001'
        - name: GAMES_LOG_FILE_PATH
          value: ''
        - name: GAMESDB_LOG_LEVEL
          value: 'debug'
        resources:
          requests:
            # The proxy's memory use scales linearly with the number of active
            # connections. Fewer open connections will use less memory. Adjust
            # this value based on your application's requirements.
            memory: "2Gi"
            # The proxy's CPU use scales linearly with the amount of IO between
            # the database and the application. Adjust this value based on your
            # application's requirements.
            cpu:    "1"
        readinessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:52001"]
          initialDelaySeconds: 5
      - name: cloud-sql-proxy
        # It is recommended to use the latest version of the Cloud SQL proxy
        # Make sure to update on a regular schedule!
        image: gcr.io/cloudsql-docker/gce-proxy:1.24.0
        command:
          - "/cloud_sql_proxy"

          # If connecting from a VPC-native GKE cluster, you can use the
          # following flag to have the proxy connect over private IP
          # - "-ip_address_types=PRIVATE"

          # Replace DB_PORT with the port the proxy should listen on
          # Defaults: MySQL: 3306, Postgres: 5432, SQLServer: 1433
          - "-instances=grammar-games:us-east1:grammar-games-db=tcp:3306"
        securityContext:
          # The default Cloud SQL proxy image runs as the
          # "nonroot" user and group (uid: 65532) by default.
          runAsNonRoot: true
        # Resource configuration depends on an application's requirements. You
        # should adjust the following values based on what your application
        # needs. For details, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        resources:
          requests:
            # The proxy's memory use scales linearly with the number of active
            # connections. Fewer open connections will use less memory. Adjust
            # this value based on your application's requirements.
            memory: "2Gi"
            # The proxy's CPU use scales linearly with the amount of IO between
            # the database and the application. Adjust this value based on your
            # application's requirements.
            cpu:    "1"
      - name: envoy-sidecar
        image: envoyproxy/envoy:v1.20-latest
        ports:
        - name: http
          containerPort: 8080
        resources:
          requests:
            cpu: 10m
            ephemeral-storage: 256Mi
            memory: 256Mi
        volumeMounts:
        - name: config
          mountPath: /etc/envoy
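        # This probe targets the Envoy health_check HTTP filter defined in the
        # ConfigMap below; the x-envoy-livenessprobe header has to match the
        # filter's header match for /healthz to return 200.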
        readinessProbe:
          httpGet:
            port: http
            httpHeaders:
            - name: x-envoy-livenessprobe
              value: healthz
            path: /healthz
            scheme: HTTP
      volumes:
      - name: config
        configMap:
          name: envoy-sidecar-conf      
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-sidecar-conf
data:
  envoy.yaml: |
    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 8080
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
              codec_type: AUTO
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: http
                  domains:
                  - "*"
                  routes:
                  - match:
                      prefix: "/grammar_games_protos.GrammarGames/"
                    route:
                      cluster: grammar-games-core-grpc
                  cors:
                    allow_origin_string_match:
                    - prefix: "*"
                    allow_methods: GET, PUT, DELETE, POST, OPTIONS
                    allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                    max_age: "1728000"
                    expose_headers: custom-header-1,grpc-status,grpc-message
              http_filters:
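              # With pass_through_mode: false, Envoy answers /healthz itself
              # instead of forwarding the check to the gRPC backend.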
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.grpc_web
              - name: envoy.filters.http.cors
              - name: envoy.filters.http.router
                typed_config: {}
      clusters:
      - name: grammar-games-core-grpc
        connect_timeout: 0.5s
        type: logical_dns
        lb_policy: ROUND_ROBIN
        http2_protocol_options: {}
        load_assignment:
          cluster_name: grammar-games-core-grpc
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: 0.0.0.0
                    port_value: 52001
        health_checks:
        - timeout: 1s
          interval: 10s
          unhealthy_threshold: 2
          healthy_threshold: 2
          grpc_health_check: {}
    admin:
      access_log_path: /dev/stdout
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 8090


Answer by Ren*_*vis:

I finally worked this out, so I'm posting my answer here for reference.

It turns out the solution in this document works:

https://cloud.google.com/architecture/exposing-grpc-services-on-gke-using-envoy-proxy#introduction

One of the documents on GKE Autopilot mode had given me the impression that you cannot use a network load balancer and instead need HTTP(S) load balancing through an Ingress. That's why I went looking for a different approach, but even after several weeks working with Google support, with a configuration that looked correct, the load balancer's health checks never worked. That's when we discovered that this solution with a network load balancer actually does work.

I also ran into some issues configuring HTTPS/TLS. That turned out to be a problem in my Envoy configuration file.
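(The envoy-certs secret mounted by the Envoy deployment below is not part of these manifests. A minimal sketch of what it can look like, assuming you already have a PEM certificate chain and private key — for example created with kubectl create secret tls envoy-certs --cert=cert.pem --key=key.pem — is:

apiVersion: v1
kind: Secret
metadata:
  name: envoy-certs
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded certificate chain>  # placeholder
  tls.key: <base64-encoded private key>        # placeholder

The file names and placeholder values are illustrative only.)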

I still have an issue with pod stability, but that's a separate matter that I'll take up in another post/question. As long as I only ask for one replica, this solution is stable and works well, and Autopilot should scale the pods as needed.

I know configuring all of this can be quite tricky, so I'm including everything here for reference (with my-app as a placeholder). Hopefully it helps someone else get there faster than I did! I think gRPC-Web is a great solution once it's working. You'll also notice that I use a cloud-sql-proxy sidecar to talk to a Cloud SQL database, authenticating with a GKE service account.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      serviceAccountName: my-app-k8sa
      terminationGracePeriodSeconds: 30
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:1.1.0
        command:
          - "/bin/my-app"
        ports:
        - containerPort: 52001
        env:
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: db-config
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-config
              key: password
        - name: DB_NAME
          valueFrom:
            secretKeyRef:
              name: db-config
              key: db-name 
        - name: GRPC_SERVER_PORT
          value: '52001'
        readinessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:52001"]
          initialDelaySeconds: 10
        livenessProbe:
          exec:
            command: ["/bin/grpc_health_probe", "-addr=:52001"]
          initialDelaySeconds: 15
      - name: cloud-sql-proxy
        # It is recommended to use the latest version of the Cloud SQL proxy
        # Make sure to update on a regular schedule!
        image: gcr.io/cloudsql-docker/gce-proxy:1.27.1
        command:
          - "/cloud_sql_proxy"

          # If connecting from a VPC-native GKE cluster, you can use the
          # following flag to have the proxy connect over private IP
          # - "-ip_address_types=PRIVATE"

          # Replace DB_PORT with the port the proxy should listen on
          # Defaults: MySQL: 3306, Postgres: 5432, SQLServer: 1433
          - "-instances=my-project:us-east1:my-app-db=tcp:3306"
        securityContext:
          # The default Cloud SQL proxy image runs as the
          # "nonroot" user and group (uid: 65532) by default.
          runAsNonRoot: true

---
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
  - name: my-app-port
    protocol: TCP
    port: 52001
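  # Headless service (clusterIP: None): Envoy's STRICT_DNS cluster resolves the
  # service DNS name to the individual pod IPs and round-robins across them.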
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: envoy
spec:
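  # type: LoadBalancer provisions an external passthrough Network Load Balancer
  # on GKE, which is the approach from the architecture doc linked above.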
  type: LoadBalancer
  selector:
    app: envoy
  ports:
  - name: https
    protocol: TCP
    port: 443
    targetPort: 8443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: envoy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: envoy
  template:
    metadata:
      labels:
        app: envoy
    spec:
      containers:
      - name: envoy
        image: envoyproxy/envoy:v1.20-latest
        ports:
        - name: https
          containerPort: 8443
        resources:
          requests:
            cpu: 10m
            ephemeral-storage: 256Mi
            memory: 256Mi
        volumeMounts:
        - name: config
          mountPath: /etc/envoy
        - name: certs
          mountPath: /etc/ssl/envoy
        readinessProbe:
          httpGet:
            port: https
            httpHeaders:
            - name: x-envoy-livenessprobe
              value: healthz
            path: /healthz
            scheme: HTTPS
      volumes:
      - name: config
        configMap:
          name: envoy-conf
      - name: certs
        secret:
          secretName: envoy-certs
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: envoy-conf
data:
  envoy.yaml: |
    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 8443
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              access_log:
              - name: envoy.access_loggers.stdout
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
              codec_type: AUTO
              stat_prefix: ingress_https
              route_config:
                name: local_route
                virtual_hosts:
                - name: https
                  domains:
                  - "*"
                  routes:
                  - match:
                      prefix: "/my_app_protos.MyService/"
                    route:
                      cluster: my-app-cluster
                  cors:
                    allow_origin_string_match:
                    - prefix: "*"
                    allow_methods: GET, PUT, DELETE, POST, OPTIONS
                    allow_headers: keep-alive,user-agent,cache-control,content-type,content-transfer-encoding,custom-header-1,x-accept-content-transfer-encoding,x-accept-response-streaming,x-user-agent,x-grpc-web,grpc-timeout
                    max_age: "1728000"
                    expose_headers: custom-header-1,grpc-status,grpc-message
              http_filters:
              - name: envoy.filters.http.health_check
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
                  pass_through_mode: false
                  headers:
                  - name: ":path"
                    exact_match: "/healthz"
                  - name: "x-envoy-livenessprobe"
                    exact_match: "healthz"
              - name: envoy.filters.http.grpc_web
              - name: envoy.filters.http.cors
              - name: envoy.filters.http.router
                typed_config: {}
          transport_socket:
            name: envoy.transport_sockets.tls
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
              require_client_certificate: false
              common_tls_context:
                tls_certificates:
                - certificate_chain:
                    filename: /etc/ssl/envoy/tls.crt
                  private_key:
                    filename: /etc/ssl/envoy/tls.key
      clusters:
      - name: my-app-cluster
        connect_timeout: 0.5s
        type: STRICT_DNS
        dns_lookup_family: V4_ONLY
        lb_policy: ROUND_ROBIN
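        # Use HTTP/2 toward the upstream pods, since the backend speaks gRPC.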
        http2_protocol_options: {}
        load_assignment:
          cluster_name: my-app-cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: my-app.default.svc.cluster.local
                    port_value: 52001
        health_checks:
        - timeout: 1s
          interval: 10s
          unhealthy_threshold: 2
          healthy_threshold: 2
          grpc_health_check: {}
    admin:
      access_log_path: /dev/stdout
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 8090

I'm still not sure about the resource requests and replica counts for the two containers in the deployment, but the solution is working.
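One piece not shown above is the Kubernetes service account (my-app-k8sa) that the app deployment references for cloud-sql-proxy authentication. Assuming Workload Identity is what binds it to a Google service account, a rough sketch would be something like this (the Google service account and project names are placeholders, and the bound account needs the Cloud SQL Client role):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-k8sa
  annotations:
    # Placeholder: replace with the Google service account bound via Workload Identity.
    iam.gke.io/gcp-service-account: my-app-gsa@my-project.iam.gserviceaccount.com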