IndexToString transformation with labels from StringIndexer

Aji*_* Kb 3 python machine-learning apache-spark pyspark

How can I transform with IndexToString using the labels taken from labelIndexer?

labelIndexer = StringIndexer(inputCol="shutdown_reason", outputCol="label")

idx_to_string = IndexToString(inputCol="prediction", outputCol="predictedValue")

hi-*_*zir 8

How can I transform with IndexToString using the labels taken from labelIndexer?

You can't. labelIndexer is a StringIndexer, and to get the labels you need a StringIndexerModel. Fit the model:

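from pyspark.ml.feature import IndexToString, StringIndexer

# Assumes an active SparkSession named `spark`.
df = spark.createDataFrame([
    ("foo", ), ("bar", )
]).toDF("shutdown_reason")

# fit() returns a StringIndexerModel, which holds the learned labels.
labelIndexerModel = labelIndexer.fit(df)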
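To illustrate why the labels only exist after fitting, here is a rough plain-Python sketch of what fitting does conceptually: count the values, order them by descending frequency, and keep that ordered list as the labels. This is not Spark's implementation. The name `fit_labels` is hypothetical, and the alphabetical tie-break is an assumption made only to keep the sketch deterministic.

```python
from collections import Counter


def fit_labels(values):
    """Hypothetical stand-in for StringIndexer.fit: learn labels from data."""
    counts = Counter(values)
    # Most frequent value first; ties broken alphabetically (an assumption).
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ordered]


# The labels depend on the data, so an unfitted indexer cannot know them.
labels = fit_labels(["foo", "bar", "foo", "baz", "foo", "bar"])

# A label's position in the list is its index (Spark stores it as a double).
index_of = {label: float(i) for i, label in enumerate(labels)}
print(labels)
print(index_of)
```

The estimator only stores configuration (input and output column names); the list of labels is a product of the data it is fitted on.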

Set the labels:

idx_to_string.setLabels(labelIndexerModel.labels)
idx_to_string.getLabels()
# ['foo', 'bar']  (with equal frequencies, the order may vary by Spark version)

And transform:

df_with_prediction = labelIndexerModel.transform(df).withColumnRenamed(
    "label", "prediction"
)

idx_to_string.transform(df_with_prediction).show()
# +---------------+----------+--------------+
# |shutdown_reason|prediction|predictedValue|
# +---------------+----------+--------------+
# |            foo|       0.0|           foo|
# |            bar|       1.0|           bar|
# +---------------+----------+--------------+
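The reverse mapping shown in the table can be sketched in plain Python as a list lookup: each numeric prediction is used as a position in the labels list. This is only an illustration of the idea, not how Spark implements it, and `index_to_string` is a hypothetical helper name.

```python
# The labels as returned by labelIndexerModel.labels in the answer above.
labels = ["foo", "bar"]


def index_to_string(indices, labels):
    """Hypothetical stand-in for IndexToString on a single column."""
    # Predictions arrive as doubles (0.0, 1.0, ...), so cast before indexing.
    return [labels[int(i)] for i in indices]


predictions = [0.0, 1.0, 1.0, 0.0]
decoded = index_to_string(predictions, labels)
print(decoded)
```

Because decoding is purely positional, the labels given to IndexToString must come from the same fitted model that produced the indices; labels from a model fitted on different data could map indices to the wrong strings.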