Ath*_*kur 1 scala apache-spark apache-spark-sql
我有以下格式的数据框
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I|!| |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+
Run Code Online (Sandbox Code Playgroud)
我想在下面的列中添加带有条件的额外列
IdentifierValue_identifierEntityTypeId
Run Code Online (Sandbox Code Playgroud)
添加具有以下条件的额外列分区
如果 IdentifierValue_identifierEntityTypeId =1001371402 那么分区 =Repno2FundamentalSeries 否则如果 IdentifierValue_identifierEntityTypeId404010 那么分区 = Repno2Organization
这就是我正在努力实现的目标
val temp = temp1.withColumn("Partition", when($"IdentifierValue_identifierEntityTypeId" === "404010", 0).otherwise("Repno2FundamentalSeries"))
temp.show(false)
Run Code Online (Sandbox Code Playgroud)
我的输出低于输出,但得到的值为零
IdentifierValue_identifierEntityTypeId
Run Code Online (Sandbox Code Playgroud)
我是 Scala 的新手,所以提出了基本问题
对于列上的多个条件如何写 when 和 else 。这对我不起作用
线程“main”中的异常 java.lang.IllegalArgumentException:otherwise() 只能在先前由 when() 生成的列上应用一次
val dataMain = dataMain1.withColumn(
"Partition",
when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Instrument2Fundamental")
.otherwise(when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Instrument2FundamentalSeries"))
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Organization2Fundamental"))
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Organization2FundamentalSeries"))
)
Run Code Online (Sandbox Code Playgroud)
根据您提供的条件,您应该如下更改 when 条件。
如果 IdentifierValue_identifierEntityTypeId =1001371402 那么分区 =Repno2FundamentalSeries 否则如果 IdentifierValue_identifierEntityTypeId404010 那么分区 = Repno2Organization
df1.withColumn("Partition",
when($"IdentifierValue_identifierEntityTypeId" === "1001371402", "Repno2FundamentalSeries")
.otherwise("Repno2Organization")
)
Run Code Online (Sandbox Code Playgroud)
输出:
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+-----------------------+
|DataPartition |TimeStamp |FFAction|!||IdentifierValue_effectiveFrom|IdentifierValue_effectiveTo|IdentifierValue_identifierEntityId|IdentifierValue_identifierEntityTypeId|IdentifierValue_identifierTypeId|Partition |
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+-----------------------+
|SelfSourcedPublic|2018-03-05T11:54:18+00:00|I||! |1900-01-01T00:00:00+00:00 |9999-12-31T00:00:00+00:00 |4295903126 |404010 |320150 |Repno2FundamentalSeries|
+-----------------+-------------------------+-----------+-----------------------------+---------------------------+----------------------------------+--------------------------------------+--------------------------------+-----------------------+
Run Code Online (Sandbox Code Playgroud)
编辑:
这是您编写嵌套的方式 When
val dataMain = df.withColumn(
"Partition",
when(($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "Fundamental"), "Instrument2Fundamental")
.otherwise(
when($"RelationObjectId_relatedObjectType" === "EDInstrument" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Instrument2FundamentalSeries")
.otherwise(
when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "Fundamental", "Organization2Fundamental")
.otherwise(when($"RelationObjectId_relatedObjectType" === "Organization" && $"RelationObjectId_relatedObjectType" === "FundamentalSeries", "Organization2FundamentalSeries")
)
)
)
Run Code Online (Sandbox Code Playgroud)
)
希望这可以帮助