Unable to export to Monitering service because: GaxError RPC failed, caused by 3

Question

Shb*_*Shb 7 google-app-engine google-cloud-platform google-cloud-stackdriver

I have a Java application running in App Engine, and recently I started getting the following error:

Unable to export to Monitering service because: GaxError RPC failed, caused by 3:One or more TimeSeries could not be written: Metrics cannot be written to gae_app. See https://cloud.google.com/monitoring/custom-metrics/creating-metrics#which-resource for a list of writable resource types.: timeSeries[0]


It happens every time right after a health check log entry:

Health checks: instance=instanceName start=2020-01-14T14:28:07+00:00 end=2020-01-14T14:28:53+00:00 total=18 unhealthy=0 healthy=18


After a while my instance restarts, and the same thing starts happening again.

app.yaml:

#https://cloud.google.com/appengine/docs/flexible/java/reference/app-yaml

#General settings
runtime: java
api_version: '1.0'
env: flex
runtime_config:
  jdk: openjdk8
#service: service_name #Required if creating a service. Optional for the default service.

#https://cloud.google.com/compute/docs/machine-types
#Resource settings
resources:
  cpu: 2
  memory_gb: 6 #memory_gb = cpu * [0.9 - 6.5] - 0.4
#  disk_size_gb: 10 #default

##Liveness checks - Liveness checks confirm that the VM and the Docker container are running. Instances that are deemed unhealthy are restarted.
liveness_check:
  path: "/liveness_check"
  timeout_sec: 20         #1-300   Timeout interval for each request, in seconds.
  check_interval_sec: 30 #1-300   Time interval between checks, in seconds.
  failure_threshold: 6   #1-10    An instance is unhealthy after failing this number of consecutive checks.
  success_threshold: 2   #1-10    An unhealthy instance becomes healthy again after successfully responding to this number of consecutive checks.
  initial_delay_sec: 300 #0-3600  The delay, in seconds, after the instance starts during which health check responses are ignored. This setting can allow an instance more time at deployment to get up and running.

##Readiness checks - Readiness checks confirm that an instance can accept incoming requests. Instances that don't pass the readiness check are not added to the pool of available instances.
readiness_check:
  path: "/readiness_check"
  timeout_sec: 10             #1-300      Timeout interval for each request, in seconds.
  check_interval_sec: 15      #1-300      Time interval between checks, in seconds.
  failure_threshold: 4       #1-10    An instance is unhealthy after failing this number of consecutive checks.
  success_threshold: 2       #1-10    An unhealthy instance becomes healthy after successfully responding to this number of consecutive checks.
  app_start_timeout_sec: 300 #1-3600  The maximum time, in seconds, an instance has to become ready after the VM and other infrastructure are provisioned. After this period, the deployment fails and is rolled back. You might want to increase this setting if your application requires significant initialization tasks, such as downloading a large file, before it is ready to serve.

#Service scaling settings
automatic_scaling:
  min_num_instances: 2
  max_num_instances: 3
  cpu_utilization:
    target_utilization: 0.7
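
The comments in the configuration above carry two pieces of arithmetic that are easy to check by hand or in code: the allowed memory range for a given CPU count, and roughly how long an instance can fail checks before it is marked unhealthy. The following is a minimal sketch, not part of the original post. The class and method names are hypothetical, the memory bounds are taken only from the `memory_gb = cpu * [0.9 - 6.5] - 0.4` comment, and the time-to-unhealthy figure assumes it is simply `check_interval_sec * failure_threshold`, ignoring request timeouts.

```java
// Hypothetical helper; names are illustrative and not part of any Google API.
class FlexResourceCheck {

    // Lower bound from the app.yaml comment: memory_gb = cpu * 0.9 - 0.4
    static double minMemoryGb(int cpu) {
        return cpu * 0.9 - 0.4;
    }

    // Upper bound from the app.yaml comment: memory_gb = cpu * 6.5 - 0.4
    static double maxMemoryGb(int cpu) {
        return cpu * 6.5 - 0.4;
    }

    // True when memoryGb lies inside the range implied by the comment.
    static boolean isMemoryValid(int cpu, double memoryGb) {
        return memoryGb >= minMemoryGb(cpu) && memoryGb <= maxMemoryGb(cpu);
    }

    // Assumption: consecutive failures are spaced one check interval apart.
    static int secondsUntilUnhealthy(int checkIntervalSec, int failureThreshold) {
        return checkIntervalSec * failureThreshold;
    }

    public static void main(String[] args) {
        int cpu = 2;           // resources.cpu
        double memoryGb = 6;   // resources.memory_gb

        System.out.println("memory_gb valid for cpu=" + cpu + ": "
                + isMemoryValid(cpu, memoryGb));
        // liveness_check: check_interval_sec=30, failure_threshold=6
        System.out.println("liveness seconds until unhealthy: "
                + secondsUntilUnhealthy(30, 6));
        // readiness_check: check_interval_sec=15, failure_threshold=4
        System.out.println("readiness seconds until unhealthy: "
                + secondsUntilUnhealthy(15, 4));
    }
}
```

Under these assumptions, `cpu: 2` with `memory_gb: 6` falls inside the allowed range, so the resource settings themselves are not an obvious cause of the restarts.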


Answer 1

小智 2

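The error is caused by the upgrade of the stackdriver logging sidecar to version 1.6.25, which started pushing FluentD metrics to Stackdriver Monitoring via OpenCensus. However, the integration with App Engine Flex does not work yet.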

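These errors should be nothing more than log noise. They are unrelated to the health check logs and should not cause VM restarts. If your VM instances restart frequently, something else is probably the cause. In the Stackdriver Logging UI, you can search for "Free disk space" under the vm.syslog stream and for "unhealthy sidecars" under the vm.events stream. If any logs show up, your instance restarts may be caused by low free disk space or by an unhealthy sidecar container.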
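
The two searches above can also be written as Logging filter strings. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, `gae_app` is taken from the error message in the question, and the filter relies on the `:` substring operator against `logName` plus a bare quoted phrase for a search across all fields. The exact log names in your project may differ, so treat the output as a starting point.

```java
// Hypothetical helper that only builds filter strings; it calls no Google API.
class RestartLogFilters {

    // Builds a filter that matches entries from a gae_app resource whose
    // log name contains `stream` and whose content contains `phrase`.
    static String filterFor(String stream, String phrase) {
        return String.format(
                "resource.type=\"gae_app\" AND logName:\"%s\" AND \"%s\"",
                stream, phrase);
    }

    public static void main(String[] args) {
        // Low disk space warnings, expected under the vm.syslog stream.
        System.out.println(filterFor("vm.syslog", "Free disk space"));
        // Sidecar health events, expected under the vm.events stream.
        System.out.println(filterFor("vm.events", "unhealthy sidecars"));
    }
}
```

Each printed line can be pasted into the log viewer's query box or passed as the filter argument to `gcloud logging read`.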
