Ant*_*ton 6 amazon-s3 apache-spark aws-dms
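We use AWS DMS to dump a SQL Server database to S3 as parquet files. The idea is to run some analytics on them with Spark. Once the full load completes, the parquet files cannot be read because their schema contains UINT fields. Spark refuses to read them with Parquet type not supported: INT32 (UINT_8). We use transformation rules to override the data type of the UINT columns, but they do not seem to be picked up by the DMS engine. Why?
There are a number of rules like "convert uint to int", see below (note that UINT1 is the 1-byte unsigned DMS data type):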
{
  "rule-type": "transformation",
  "rule-id": "7",
  "rule-name": "uintToInt",
  "rule-action": "change-data-type",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "%",
    "table-name": "%",
    "column-name": "%",
    "data-type": "uint1"
  },
  "data-type": {
    "type": "int4"
  }
}
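To make the type names in that rule concrete, here is a minimal sketch of the value ranges involved. It assumes only what the names encode: the DMS names (uint1, int4) count bytes, whereas the parquet and Spark names in the error (uint8, UINT_8) count bits. The helpers int_range and fits are hypothetical names for illustration and are not part of DMS or Spark.

```python
# Value ranges of DMS-style integer types, derived only from the byte width
# and signedness encoded in the name ("uint1" = 1-byte unsigned,
# "int4" = 4-byte signed). Illustrative helpers, not a DMS API.

def int_range(type_name):
    """Return (min, max) for names like 'uint1' or 'int4'."""
    unsigned = type_name.startswith("uint")
    width_bytes = int(type_name[4:] if unsigned else type_name[3:])
    bits = 8 * width_bytes
    if unsigned:
        return 0, 2**bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(source, target):
    """True if every value of `source` is representable in `target`."""
    src_min, src_max = int_range(source)
    dst_min, dst_max = int_range(target)
    return dst_min <= src_min and src_max <= dst_max


if __name__ == "__main__":
    print(int_range("uint1"))     # (0, 255)
    print(int_range("int4"))      # (-2147483648, 2147483647)
    print(fits("uint1", "int4"))  # True
```

So the uint1 to int4 conversion requested by the rule is a widening one: every value a uint1 column can hold also fits in int4.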
The S3 settings are S3DataFormat=parquet;ParquetVersion=parquet_2_0 and the DMS engine version is 3.3.2.
However, the resulting parquet schema still contains a uint type. See below:
id: int32
name: string
value: string
status: uint8
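A schema listing in this "name: type" form can be scanned mechanically for the columns that would need a rule. The sketch below is a rough illustration under two assumptions: the listing uses exactly the format shown above, and a parquet uintN column (N bits) corresponds to the DMS type uint(N/8) (bytes). The function name unsigned_columns is hypothetical.

```python
import re

# Schema listing copied from the question, one "name: type" pair per line.
SCHEMA = """\
id: int32
name: string
value: string
status: uint8
"""


def unsigned_columns(schema_text):
    """Return (column, parquet_type, assumed_dms_type) for unsigned columns.

    The DMS type is derived purely from the bit width (uint8 -> uint1);
    that mapping is an assumption, not something confirmed by the DMS docs
    quoted in this thread.
    """
    flagged = []
    for line in schema_text.splitlines():
        match = re.fullmatch(r"\s*(\w+)\s*:\s*uint(\d+)\s*", line)
        if match:
            column, bits = match.group(1), int(match.group(2))
            flagged.append((column, f"uint{bits}", f"uint{bits // 8}"))
    return flagged


if __name__ == "__main__":
    print(unsigned_columns(SCHEMA))  # [('status', 'uint8', 'uint1')]
```

Here only the status column is flagged, which matches the UINT_8 that Spark complains about.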
Trying to read such a parquet file with Spark gives me
org.apache.spark.sql.AnalysisException: Parquet type not supported: INT32 (UINT_8);
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.typeNotSupported$1(ParquetSchemaConverter.scala:100)
at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:136)
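Why are the DMS transformation rules not being triggered?
Converting the data directly from UINT to INT in DMS resolves this. Your mapping rules should look like this: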
{
  "rules": [
    ...
    {
      "rule-type": "transformation",
      "rule-id": "2",
      "rule-name": "unit1-to-int1",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "schema",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint1"
      },
      "data-type": {
        "type": "int1"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "3",
      "rule-name": "unit2-to-int2",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "schema",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint2"
      },
      "data-type": {
        "type": "int2"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "4",
      "rule-name": "unit4-to-int4",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "schema",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint4"
      },
      "data-type": {
        "type": "int4"
      }
    },
    {
      "rule-type": "transformation",
      "rule-id": "5",
      "rule-name": "unit8-to-int8",
      "rule-action": "change-data-type",
      "rule-target": "column",
      "object-locator": {
        "schema-name": "schema",
        "table-name": "%",
        "column-name": "%",
        "data-type": "uint8"
      },
      "data-type": {
        "type": "int8"
      }
    }
  ]
}
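Since the four rules differ only in the byte width, they can also be generated rather than written by hand. The following is a minimal sketch, not an official DMS tool: uint_to_int_rules is a hypothetical helper, "schema" is the placeholder schema name from the rules above, and the rule names are arbitrary labels chosen here. The output contains only these transformation rules; whatever the "..." in the mapping stands for is not reproduced and would have to be merged in separately.

```python
import json


def uint_to_int_rules(schema_name, first_rule_id=2, widths=(1, 2, 4, 8)):
    """Build one change-data-type rule per unsigned width (in bytes).

    Each rule maps uintN to intN for every table and column in
    `schema_name`, mirroring the structure of the hand-written rules.
    """
    rules = []
    for offset, width in enumerate(widths):
        rules.append({
            "rule-type": "transformation",
            "rule-id": str(first_rule_id + offset),
            "rule-name": f"uint{width}-to-int{width}",  # label only
            "rule-action": "change-data-type",
            "rule-target": "column",
            "object-locator": {
                "schema-name": schema_name,
                "table-name": "%",
                "column-name": "%",
                "data-type": f"uint{width}",
            },
            "data-type": {"type": f"int{width}"},
        })
    return rules


if __name__ == "__main__":
    # "schema" is the placeholder used in the answer; substitute your own.
    print(json.dumps({"rules": uint_to_int_rules("schema")}, indent=2))
```

One thing to keep in mind: a signed type of the same width cannot represent the upper half of the unsigned range (200 fits in uint1 but not in int1). This thread does not say how DMS treats such values, so if a column may hold them, a wider target such as the int4 used in the question's rule may be the safer choice.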