I cannot reproduce the Spark code from the PySpark documentation. For example, when I try the following Grouped Map example:
import numpy as np
import pandas as pd
from pyspark.sql.functions import pandas_udf, PandasUDFType
from pyspark.sql import SparkSession

# Stop the session the pyspark shell provides, then rebuild it with Arrow enabled.
spark.stop()
spark = SparkSession.builder.appName("New_App_grouped_map").getOrCreate()
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

df = spark.createDataFrame(
    [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
    ("id", "v"))

# Grouped-map pandas UDF: each group arrives as a pandas DataFrame,
# and the UDF returns it with the group mean subtracted from v.
@pandas_udf("id long, v double", PandasUDFType.GROUPED_MAP)
def subtract_mean(pdf):
    v = pdf.v
    return pdf.assign(v=v - v.mean())

df.groupby("id").apply(subtract_mean).show()
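According to the documentation, this example subtracts each group's mean from v (group 1 has mean 1.5, group 2 has mean 6.0), so the expected output should look like:

+---+----+
| id|   v|
+---+----+
|  1|-0.5|
|  1| 0.5|
|  2|-3.0|
|  2|-1.0|
|  2| 4.0|
+---+----+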
Instead, I get the following error log.

The main error:
ERROR ArrowPythonRunner: Python worker exited unexpectedly (crashed)
Caused by: java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.Direct …
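For context, this UnsupportedOperationException is the known Netty/Arrow direct-buffer restriction on Java 9 and later; Spark's JDK 11 notes say that -Dio.netty.tryReflectionSetAccessible=true must be passed to the driver and executor JVMs for Arrow-based features. A minimal sketch of how that flag could be set when building the session (assuming a newer JDK is the cause here, which I have not confirmed):

from pyspark.sql import SparkSession

# Sketch: pass the Netty reflection flag to both JVMs so Arrow can
# allocate direct buffers on JDK 9+ (per Spark's JDK 11 notes).
spark = (
    SparkSession.builder
    .appName("New_App_grouped_map")
    .config("spark.driver.extraJavaOptions",
            "-Dio.netty.tryReflectionSetAccessible=true")
    .config("spark.executor.extraJavaOptions",
            "-Dio.netty.tryReflectionSetAccessible=true")
    .getOrCreate()
)
spark.conf.set("spark.sql.execution.arrow.enabled", "true")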