Posted by jst*_*j14

Error: "java.lang.UnsupportedOperationException" when running the pandas_udf code from the PySpark documentation

I cannot reproduce the Spark code from the PySpark documentation provided here.

For example, when I try the following Grouped Map code:

import numpy as np
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql import SparkSession

spark.stop()

spark = SparkSession.builder.appName("New_App_grouped_map").getOrCreate()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
    ("id", "v"))


@pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
    v = pdf.v
    return pdf.assign(v=v - v.mean())

df.groupby("id").apply(subtract_mean).show()
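For reference, the result the Grouped Map example is meant to produce can be checked without Spark or Arrow. The following is only a plain-pandas sketch of the same per-group computation on the same five rows; the names `group_mean` and `expected` are illustrative and do not come from the documentation.

```python
import pandas as pd

# Same rows as the Spark DataFrame in the question.
pdf = pd.DataFrame({"id": [1, 1, 2, 2, 2], "v": [1.0, 2.0, 3.0, 5.0, 10.0]})

# Per-row mean of v within each id group, aligned to the original index.
group_mean = pdf.groupby("id")["v"].transform("mean")

# Subtract the group mean from v, which is what subtract_mean does per group.
expected = pdf.assign(v=pdf["v"] - group_mean)
print(expected)
```

If the Spark job runs successfully, `df.groupby("id").apply(subtract_mean)` should contain the same rows, possibly in a different order.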

I get the following error log.

Main error:

ERROR ArrowPythonRunner: Python worker exited unexpectedly (crashed)
Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.Direct …
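The truncated log cannot confirm the cause on its own. One possibility worth ruling out is running Arrow-based UDFs on Java 11 or newer, for which the Spark documentation mentions the JVM flag `-Dio.netty.tryReflectionSetAccessible=true`. The sketch below assumes that is the situation here; the helper names `with_netty_flag` and `arrow_java11_conf` are hypothetical and are not part of PySpark.

```python
# JVM flag that the Spark documentation mentions for Arrow on Java 11.
NETTY_FLAG = "-Dio.netty.tryReflectionSetAccessible=true"


def with_netty_flag(existing_options=""):
    """Return existing_options with NETTY_FLAG appended once (hypothetical helper)."""
    parts = existing_options.split()
    if NETTY_FLAG not in parts:
        parts.append(NETTY_FLAG)
    return " ".join(parts)


def arrow_java11_conf(driver_options="", executor_options=""):
    """Build the two Spark conf entries that carry extra JVM options."""
    return {
        "spark.driver.extraJavaOptions": with_netty_flag(driver_options),
        "spark.executor.extraJavaOptions": with_netty_flag(executor_options),
    }


# Assumed usage: apply the options before any JVM has been started,
# i.e. in a fresh Python process rather than after spark.stop().
#
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("New_App_grouped_map")
# for key, value in arrow_java11_conf().items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Whether this applies depends on the Java and Spark versions in use, which the question does not state; checking `java -version` first would narrow it down.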

apache-spark apache-spark-sql pyspark pyspark-dataframes
