AWS DMS SQL Server to S3 Parquet - change-data-type transformation rule and "Parquet type not supported: INT32 (UINT_8)"

Ant*_*ton 6 amazon-s3 apache-spark aws-dms

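We use AWS DMS to dump SQL Server databases into S3 as Parquet files. The idea is to run some analytics with Spark. After the full load completes, the Parquet files cannot be read because they have UINT fields in their schema. Spark refuses to read them with Parquet type not supported: INT32 (UINT_8). We use transformation rules to override the data type of the UINT columns, but it looks like the DMS engine does not pick them up. Why?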

There are a number of rules like "convert uint to int", see below (note that UINT1 is the 1-byte unsigned DMS data type):

{
  "rule-type": "transformation",
  "rule-id": "7",
  "rule-name": "uintToInt",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "uint1"
  },
  "data-type": {
    "type": "int4"
  }
}

The S3 endpoint settings are S3DataFormat=parquet;ParquetVersion=parquet_2_0 and the DMS engine version is 3.3.2.

However, the resulting Parquet schema still contains uint. See below:

id: int32
name: string
value: string
status: uint8

Trying to read such Parquet files with Spark gives me

org.apache.spark.sql.AnalysisException: Parquet type not supported: INT32 (UINT_8);
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotSupported$1(ParquetSchemaConverter.scala:100)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:136)

Why are the DMS transformation rules not triggered?
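
To check whether a given output file still carries unsigned columns before handing it to Spark, the schema can be scanned for type names that start with "uint". The following is a minimal sketch, not part of the original question: `unsigned_columns` and `fields_from_parquet` are hypothetical helper names, and the second one assumes the third-party pyarrow package is installed.

```python
def unsigned_columns(fields):
    """Return the names of columns whose type name starts with 'uint'.

    `fields` is an iterable of (column_name, type_name) pairs,
    for example ("status", "uint8").
    """
    return [name for name, type_name in fields
            if type_name.lower().startswith("uint")]


def fields_from_parquet(path):
    """Read (name, type) pairs from a Parquet file footer.

    Assumption: pyarrow is available; it is imported lazily so that
    unsigned_columns() stays usable without it.
    """
    import pyarrow.parquet as pq  # third-party dependency

    return [(field.name, str(field.type)) for field in pq.read_schema(path)]


if __name__ == "__main__":
    # Schema copied from the listing in the question.
    question_schema = [
        ("id", "int32"),
        ("name", "string"),
        ("value", "string"),
        ("status", "uint8"),
    ]
    print(unsigned_columns(question_schema))
```

For a real file, `unsigned_columns(fields_from_parquet(path))` should return an empty list once the transformation rules have taken effect.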

jcu*_*nte 4

Converting the data directly from UINT to the corresponding INT type in DMS solves this problem. Your mapping rules should look like this:

{
"rules": [
    ...
    {
        "rule-type": "transformation",
        "rule-id": "2",
        "rule-name": "uint1-to-int1",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint1"
        },
        "data-type": {
            "type": "int1"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "3",
        "rule-name": "uint2-to-int2",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint2"
        },
        "data-type": {
            "type": "int2"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "4",
        "rule-name": "uint4-to-int4",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint4"
        },
        "data-type": {
            "type": "int4"
        }
    },
    {
        "rule-type": "transformation",
        "rule-id": "5",
        "rule-name": "uint8-to-int8",
        "rule-action": "change-data-type",
        "rule-target": "column",
        "object-locator": {
            "schema-name": "schema",
            "table-name": "%",
            "column-name": "%",
            "data-type": "uint8"
        },
        "data-type": {
            "type": "int8"
        }
    }
]}

Documentation: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.CustomizingTasks.TableMapping.html#CHAP_Tasks.CustomizingTasks.TableMapping.SelectionTransformation.Transformations
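
The four rules in this answer differ only in the integer width, so they can be generated rather than written by hand. Below is a minimal sketch under that assumption; `uint_to_int_rules` is a hypothetical helper name, and the default rule IDs simply mirror the ones used above. The selection rules elided as "..." in the answer are not generated and still have to be added separately.

```python
import json

# DMS integer widths in bytes. Each unsigned type is mapped to the signed
# type of the same width, mirroring the hand-written rules in the answer.
WIDTHS = (1, 2, 4, 8)


def uint_to_int_rules(schema_name="%", first_rule_id=2):
    """Build one change-data-type rule per unsigned width (hypothetical helper)."""
    rules = []
    for offset, width in enumerate(WIDTHS):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_rule_id + offset),
            "rule-name": f"uint{width}-to-int{width}",
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": "%",
                "column-name": "%",
                "data-type": f"uint{width}",
            },
            "data-type": {"type": f"int{width}"},
        })
    return rules


if __name__ == "__main__":
    # Prints only the transformation rules; merge them with your selection rules.
    print(json.dumps({"rules": uint_to_int_rules("schema")}, indent=4))
```

A signed type of the same width covers a smaller positive range than its unsigned counterpart (for example 0 to 255 versus -128 to 127 for one byte), so it may be worth checking on real data that large values come through as expected.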