Intermittent DNS lookup failures in a Docker container

Mos*_*atz 6 dns docker docker-compose

I have a Node.js application running in a Docker container, set up with the following (simplified) Docker Compose configuration:

x-restart-policy: &restart_policy
  restart: unless-stopped
x-sails-app-defaults: &sails_defaults
  << : *restart_policy
  image: registry.example.com/group/webapp/snapshots
  depends_on:
    - postgres

services:

  postgres:
    << : *restart_policy
    image: postgis/postgis:11-3.1-alpine
    environment:
      POSTGRES_PASSWORD_FILE: /postgres-password
    volumes:
      - postgres:/var/lib/postgresql/data
      - /root/postgres-setup-password:/postgres-password:ro
      - ./postgres/setup.sh:/docker-entrypoint-initdb-resources/001-setup.sh:ro

  webapp:
    << : *sails_defaults
    volumes:
      - ./apps/webapp.sailsrc:/usr/src/app/.sailsrc:ro
    labels:
      - traefik.enable=true
      - traefik.http.routers.sbdev.entrypoints=https
      - traefik.http.routers.sbdev.rule=Host(`app.example.com`)
      - traefik.http.routers.sbdev.tls=true
      - traefik.http.routers.sbdev.tls.certresolver=letsencrypt

The Dockerfile for the application looks like this:

FROM node:fermium-alpine3.12 AS builderbase

# Need git and some other tools for npm install
RUN apk add --no-cache \
    git \
    python3 \
    make \
    openssh-client \
    g++

WORKDIR /usr/src/app

COPY package*.json ./

# Don't download Chromium for Puppeteer,
# since we will install it ourselves
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

FROM builderbase AS builderdev

# Build development dependencies
RUN npm set progress=false && \
    npm config set depth 0 && \
    npm install

FROM node:fermium-alpine3.12 AS base

# Tell Puppeteer to use the Chromium binary installed below
ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser

# Install Chrome for Puppeteer.
# See https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#running-on-alpine
RUN apk add --no-cache \
    # Make sure version here is compatible with version of puppeteer
    chromium=86.0.4240.111-r0 \
    nss \
    freetype \
    freetype-dev \
    harfbuzz \
    ca-certificates \
    ttf-freefont

# Install tools we need
RUN apk add --no-cache \
    bash \
    jq \
    postgresql-client \
    su-exec \
    tini

COPY ./lib/docker/entrypoint.sh /docker-entrypoint.sh

ENTRYPOINT ["/sbin/tini", "--", "/docker-entrypoint.sh"]

EXPOSE 1337
WORKDIR /usr/src/app
COPY --chown=node . .

FROM base AS development

COPY --chown=node --from=builderdev /usr/src/app/node_modules node_modules

The application runs in Docker 20.10.1 on Ubuntu 20.04.

The application uses the Sails.js framework. It also uses the sails-hook-cron package to run scheduled jobs.

For a sails-hook-cron job that runs every minute, I get the following error in the logs every few minutes:

 2021-02-12T17:02:00.021Z error: Error sending Notifier health-check ping Exception: `getConnection` failed ("failed").  Could not acquire a connection to the database using the specified manager.
 Additional data:

 {
   error: Error: getaddrinfo ENOTFOUND postgres
       at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26) {
     errno: -3008,
     code: 'ENOTFOUND',
     syscall: 'getaddrinfo',
     hostname: 'postgres'
   },
   meta: undefined
 }
     at flaverr (/usr/src/app/node_modules/flaverr/index.js:94:15)
     at Function.handlerCbs.<computed> [as failed] (/usr/src/app/node_modules/machine/lib/private/help-build-machine.js:879:31)
     at PendingItem.cb [as callback] (/usr/src/app/node_modules/machinepack-postgresql/machines/get-connection.js:76:22)
     at /usr/src/app/node_modules/pg-pool/index.js:237:23
     at Connection.connectingErrorHandler (/usr/src/app/node_modules/machinepack-postgresql/node_modules/pg/lib/client.js:213:14)
     at Connection.emit (events.js:315:20)
     at Connection.EventEmitter.emit (domain.js:467:12)
     at Socket.reportStreamError (/usr/src/app/node_modules/machinepack-postgresql/node_modules/pg/lib/connection.js:57:10)
     at Socket.emit (events.js:315:20)
     at Socket.EventEmitter.emit (domain.js:467:12)
     at emitErrorNT (internal/streams/destroy.js:106:8)
     at emitErrorCloseNT (internal/streams/destroy.js:74:3)
     at processTicksAndRejections (internal/process/task_queues.js:80:21)

The code for this job is very simple: it sends the Notifier health-check ping named in the log above, which needs a connection to the database.

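Thinking this might be a problem with how Docker Compose generates container hostnames, I added hostname: postgres in the appropriate place in docker-compose.yml, but that did not seem to help.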

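I then realized that this is probably not the problem, since the vast majority of the time it works fine (both in the main application code and in these scheduled jobs).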
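
To measure how often the lookup fails independently of Sails and the pg driver, a probe along the following lines could be run inside the webapp container. This is only a sketch: the helper name probeLookups, its parameters, and the one-second default delay are assumptions made for illustration and are not part of the application. It uses dns.promises.lookup, which goes through getaddrinfo, the same syscall reported in the error above.

```javascript
const dns = require('dns');

// Hypothetical diagnostic helper (not part of the application):
// resolve `hostname` `attempts` times and record every failed attempt.
// `lookup` is injectable so the helper can be exercised without a network.
async function probeLookups(
  hostname,
  attempts,
  delayMs = 1000,
  lookup = (host) => dns.promises.lookup(host)
) {
  const failures = [];
  for (let i = 0; i < attempts; i++) {
    try {
      await lookup(hostname);
    } catch (err) {
      // Keep the error code (e.g. ENOTFOUND) and a timestamp for correlation
      // with the application log.
      failures.push({ attempt: i, code: err.code, at: new Date().toISOString() });
    }
    // Pause between attempts, but not after the last one.
    if (delayMs > 0 && i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return { attempts, failures };
}

// Example use inside the container (assumed invocation):
// probeLookups('postgres', 300).then((result) => console.log(result));
```

If the probe also reports ENOTFOUND for postgres now and then, the failures come from name resolution in the container rather than from the cron hook or the connection pool.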

Why am I getting these intermittent name lookup failures, and how can I fix them?