GCP stackdriver-agent installed on a VM sends strange logs every minute

ESC*_*TOR 10 google-cloud-monitoring google-cloud-stackdriver

Could you please help me with the following issue?

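I have a Node.js backend service deployed on a GCE VM. It works fine, but after installing the Logging and Monitoring agents I see very strange logs in the Logs Viewer. I checked the payload of these entries to find what produces them: it is the stackdriver-agent.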

Here they are:

A 2020-05-15T22:45:26Z write_gcm: can not take infinite value
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:26Z write_gcm: can not take infinite value 
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:26Z write_gcm: can not take infinite value 
A 2020-05-15T22:45:26Z write_gcm: wg_typed_value_create_from_value_t_inline failed for swap/percent/value! Continuing. 
A 2020-05-15T22:45:28Z write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012  "payloadErrors": [#012    {#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 5,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 10,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 15,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 20,#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    },#012    {#012      "index": 25 
A 2020-05-15T22:45:29Z write_gcm: Server response (CollectdTimeseriesRequest) contains errors:#012{#012  "payloadErrors": [#012    {#012      "error": {#012        "code": 3,#012        "message": "Unsupported collectd plugin/type combination: plugin: \"processes\" type: \"io_octets\""#012      }#012    }#012  ]#012} 
A 2020-05-15T22:45:29Z write_gcm: Unsuccessful HTTP request 400: {#012  "error": {#012    "code": 400,#012    "message": "Field timeSeries[3].points[0].interval.start_time had an invalid value of \"2020-05-15T15:45:27.348251-07:00\": The start time must be before the end time (2020-05-15T15:45:27.348251-07:00) for the non-gauge metric 'agent.googleapis.com/agent/api_request_count'.",#012    "status": "INVALID_ARGUMENT"#012  }#012} 
A 2020-05-15T22:45:29Z write_gcm: Error talking to the endpoint. 
A 2020-05-15T22:45:29Z write_gcm: wg_transmit_unique_segment failed. 
A 2020-05-15T22:45:29Z write_gcm: wg_transmit_unique_segments failed. Flushing. 
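The `#012` sequences in these lines look like rsyslog's escaping of control characters: `#` followed by the character's three-digit octal code, so `#012` is a newline. Below is a minimal Python sketch, assuming the JSON body follows the text `contains errors:` and is complete (the first such line above is truncated, so it would not parse), that decodes one line and lists the payload error messages. The function names are hypothetical helpers, not part of the agent.

```python
import json
import re

# Assumed rsyslog convention: '#' + three octal digits encodes one control character.
_OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_syslog_escapes(line: str) -> str:
    """Turn escapes such as '#012' back into the characters they encode ('#012' -> newline)."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), line)


def payload_error_messages(line: str) -> list[str]:
    """Return the 'message' of every payloadErrors entry in one write_gcm line.

    Returns an empty list for lines that carry no 'contains errors:' JSON body.
    """
    decoded = decode_syslog_escapes(line)
    _, sep, body = decoded.partition("contains errors:")
    if not sep:
        return []
    data = json.loads(body)  # raises ValueError if the logged JSON was truncated
    return [entry["error"]["message"] for entry in data.get("payloadErrors", [])]
```

Run against the complete `CollectdTimeseriesRequest` line, this yields the single `Unsupported collectd plugin/type combination` message for `processes`/`io_octets`, which makes it easy to count which plugin/type pairs are being rejected.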

So I see logs like these every minute. When I stop the stackdriver-agent service, they disappear. There are 4 VMs in my project, and only two of them have this problem: a CentOS 7 VM and an Ubuntu 18 VM.

gav*_*koa 3

So far there are 2 PITs (Public Issue Tracker entries) for this:

The last one has an explanation of the 400 errors from a Google engineer:

These messages are annoying but harmless. You are not losing any metrics, and you can safely ignore these logs.

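The root cause is a server-side configuration change that affects all agents. The change only affects the verbosity of the response, not how requests are processed. Some incoming metrics were already being dropped silently before that change; now they are dropped loudly.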
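To make "silently dropped" versus "loudly dropped" concrete, here is a minimal Python sketch. It is an illustration, not the Cloud Monitoring server: it models only the one rule quoted in the 400 error above (for a non-gauge metric the start time must be before the end time), and the `Point` type and `accept` function are hypothetical. The accepted points are identical in both modes; only the logging differs.

```python
import logging
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Point:
    metric: str
    kind: str  # "GAUGE", "DELTA" or "CUMULATIVE"
    start: datetime
    end: datetime


def is_valid(p: Point) -> bool:
    # Only the rule from the 400 message is modeled; gauge rules are not checked here.
    if p.kind == "GAUGE":
        return True
    return p.start < p.end


def accept(points: list[Point], loud: bool) -> tuple[list[Point], list[Point]]:
    """Split points into (accepted, rejected); optionally log each rejection."""
    accepted = [p for p in points if is_valid(p)]
    rejected = [p for p in points if not is_valid(p)]
    if loud:
        for p in rejected:
            logging.warning("dropped %s: start %s is not before end %s",
                            p.metric, p.start.isoformat(), p.end.isoformat())
    return accepted, rejected
```

Calling `accept(points, loud=False)` and `accept(points, loud=True)` on the same batch returns the same result; the "loud" mode just reports what was already being discarded, which matches the explanation that no metrics are lost.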

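By default these metrics are sent by the upstream collectd plugins, and we do not have any controls to completely stop them from being sent. The log spam is produced by collectd's internal handling of these metrics.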

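If you want to filter out all the noisy logs you see, you can create a log exclusion [1][2] or a log sink [3][4]. A log exclusion matches logs against a filter you specify and removes them before they reach the Logs Viewer, while a log sink takes logs and routes them to a Cloud Storage bucket, a BigQuery table, or a Pub/Sub topic.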

There is also a blog post about the swap errors:

This error occurs because the VM instance has no swap memory, so this metric plugin tries to divide by 0.

To fix this, remove this configuration and restart stackdriver-agent.
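The divide-by-zero explanation can be illustrated with a minimal Python sketch. It assumes the swap percentage is computed as used / total * 100, which is an illustration rather than collectd's actual code: with no swap, total is 0, and in C the naive division produces a non-finite value (NaN or infinity), which the agent then reports as "can not take infinite value". The function names are hypothetical; the guard shows the check a reporter would need before sending the value.

```python
import math
from typing import Optional


def swap_used_percent(used_bytes: float, total_bytes: float) -> Optional[float]:
    """Percentage of swap in use, or None when the VM has no swap configured."""
    if total_bytes == 0:
        # A naive used / total * 100 would divide by zero here. Python raises
        # ZeroDivisionError; C floating-point division yields a non-finite value instead.
        return None
    return used_bytes / total_bytes * 100.0


def reportable(value: Optional[float]) -> bool:
    """Only finite numbers can be sent as metric values."""
    return value is not None and math.isfinite(value)
```

On a VM without swap, `swap_used_percent(0, 0)` returns `None` and `reportable` rejects it. That is why disabling the percentage reporting for swap, as the blog post suggests, stops the messages.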