Kafka integration tests run with Gradle fail on GitHub Actions

BPa*_*ini 5 apache-kafka docker docker-compose github-actions

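We have been migrating our company's applications from CircleCI to GitHub Actions, and we ran into a strange situation.

Nothing changed in the project code, but our Kafka integration tests started failing on the GH Actions machines. Everything works on CircleCI and locally (macOS and Fedora Linux machines).

Both the CircleCI and GH Actions machines run Ubuntu (tested with versions 18.04 and 20.04). macOS was not tested on GH Actions because it does not have Docker there.

Below are the docker-compose and workflow files used by the build and the integration tests: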

  • docker-compose.yml
version: '2.1'

services:
  postgres:
    container_name: listings-postgres
    image: postgres:10-alpine
    mem_limit: 500m
    networks:
      - listings-stack
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: listings
      POSTGRES_PASSWORD: listings
      POSTGRES_USER: listings
      PGUSER: listings
    healthcheck:
      test: ["CMD", "pg_isready"]
      interval: 1s
      timeout: 3s
      retries: 30

  listings-zookeeper:
    container_name: listings-zookeeper
    image: confluentinc/cp-zookeeper:6.2.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    networks:
      - listings-stack
    ports:
      - "2181:2181"
    healthcheck:
      test: nc -z localhost 2181 || exit 1
      interval: 10s
      timeout: 5s
      retries: 10

  listings-kafka:
    container_name: listings-kafka
    image: confluentinc/cp-kafka:6.2.0
    depends_on:
      listings-zookeeper:
        condition: service_healthy
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://listings-kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_ZOOKEEPER_CONNECT: listings-zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - listings-stack
    ports:
      - "29092:29092"
    healthcheck:
      test: kafka-topics --bootstrap-server 127.0.0.1:9092 --list
      interval: 10s
      timeout: 10s
      retries: 50

networks: {listings-stack: {}}
  • build.yml
name: Build

on: [ pull_request ]

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.TUNNEL_AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.TUNNEL_AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: 'us-east-1'
  CIRCLECI_KEY_TUNNEL: ${{ secrets.ID_RSA_CIRCLECI_TUNNEL }}

jobs:
  build:
    name: Listings-API Build
    runs-on: [ self-hosted, zap ]

    steps:
      - uses: actions/checkout@v2
        with:
          token: ${{ secrets.GH_OLXBR_PAT }}
          submodules: recursive
          path: ./repo
          fetch-depth: 0

      - name: Set up JDK 11
        uses: actions/setup-java@v2
        with:
          distribution: 'adopt'
          java-version: '11'
          architecture: x64
          cache: 'gradle'

      - name: Docker up
        working-directory: ./repo
        run: docker-compose up -d

      - name: Build with Gradle
        working-directory: ./repo
        run: ./gradlew build -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -x integrationTest

      - name: Integration tests with Gradle
        working-directory: ./repo
        run: ./gradlew integrationTest -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2

      - name: Sonarqube
        working-directory: ./repo
        env:
          GITHUB_TOKEN: ${{ secrets.GH_OLXBR_PAT }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
        run: ./gradlew sonarqube --info -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2

      - name: Docker down
        if: always()
        working-directory: ./repo
        run: docker-compose down --remove-orphans

      - name: Cleanup Gradle Cache
        # Remove some files from the Gradle cache, so they aren't cached by GitHub Actions.
        # Restoring these files from a GitHub Actions cache might cause problems for future builds.
        run: |
          rm -f ~/.gradle/caches/modules-2/modules-2.lock
          rm -f ~/.gradle/caches/modules-2/gc.properties


The integration tests are written with the Spock framework. The part that fails is the following:

  boolean compareRecordSend(String topicName, int expected) {
    def condition = new PollingConditions()
    condition.within(kafkaProperties.listener.pollTimeout.getSeconds() * 5) {
      assert expected == getRecordSendTotal(topicName)
    }
    return true
  }

  int getRecordSendTotal(String topicName) {
    kafkaTemplate.flush()
    return kafkaTemplate.metrics().find {
      it.key.name() == "record-send-total" && it.key.tags().get("topic") == topicName
    }?.value?.metricValue() ?: 0
  }

The error we get is:

Condition not satisfied after 50.00 seconds and 496 attempts
    at spock.util.concurrent.PollingConditions.within(PollingConditions.java:185)
    at com.company.listings.KafkaAwareBaseSpec.compareRecordSend(KafkaAwareBaseSpec.groovy:31)
    at com.company.listings.application.worker.listener.notifier.ListingNotifierITSpec.should notify listings(ListingNotifierITSpec.groovy:44)

    Caused by:
    Condition not satisfied:

    expected == getRecordSendTotal(topicName)
    |        |  |                  |
    10       |  0                  v4
                false

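We debugged on the GH Actions machine (connected via SSH) and ran the tests manually. The error still occurs on the first run, but if the integration tests are run a second time (and on every run after that), everything passes.

We also tried creating all the required topics up front and sending a few messages to them beforehand, but the behavior was the same.

Our questions are:

  • Are there any known issues with running dockerized Kafka on Ubuntu machines (the error also occurred on a colleague's Ubuntu machine)?
  • Any ideas about why this is happening?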

Edit

  • application.yml (Kafka-related configuration)
spring:
  kafka:
    bootstrap-servers: localhost:29092
    producer:
      batch-size: 262144
      buffer-memory: 536870912
      retries: 1
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.ByteArraySerializer
      acks: all
      properties:
        linger.ms: 0

BPa*_*ini 0

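We identified some test-order dependencies among the Kafka tests.

We updated Gradle to version 7.3-rc-3, which has a more deterministic test scanning approach. This update "solved" our problem while we prepare to fix the dependencies between the tests.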
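The exact dependency between the tests is not described here, so the following is only a sketch of one general way to make a counter-based check independent of execution order. It assumes that a cumulative metric such as record-send-total may already hold a value left by an earlier test that shared the same producer. Instead of asserting on the absolute total, the check takes a baseline before the action and asserts on the growth since then. The names `RecordSendDelta`, `CounterDelta` and `sinceBaseline` are hypothetical, and the counter is simulated with a plain array rather than a real `KafkaTemplate`.

```java
import java.util.function.DoubleSupplier;

public class RecordSendDelta {

    /** Hypothetical helper: remembers a counter's value at creation and reports its growth. */
    static final class CounterDelta {
        private final DoubleSupplier counter;
        private final double baseline;

        CounterDelta(DoubleSupplier counter) {
            this.counter = counter;
            // Snapshot taken before the test performs its own sends.
            this.baseline = counter.getAsDouble();
        }

        /** Records counted since the baseline, regardless of what ran earlier. */
        long sinceBaseline() {
            return Math.round(counter.getAsDouble() - baseline);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a cumulative metric; 7 simulates records sent by an earlier test.
        double[] total = {7};

        CounterDelta delta = new CounterDelta(() -> total[0]);

        // Simulates the test under way sending 10 records.
        total[0] += 10;

        System.out.println(delta.sinceBaseline());
    }
}
```

In a Spock spec, the supplier would wrap a metric lookup like the `getRecordSendTotal(topicName)` method from the question, and the baseline would be taken in the `given:` block. Whether this addresses the dependency found in this project is an open assumption.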