I have a PySpark dataframe with the following schema:
root
|-- epoch: double (nullable = true)
|-- var1: double (nullable = true)
|-- var2: double (nullable = true)
The epoch column is in seconds and should be converted to a datetime. To do that, I defined a user-defined function (udf) as follows:
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
import time

def epoch_to_datetime(x):
    return time.localtime(x)
    # return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(x))
    # return x * 0 + 1

epoch_to_datetime_udf = udf(epoch_to_datetime, DoubleType())
df.withColumn("datetime", epoch_to_datetime(df.epoch)).show()
I get this error:
---> 21 return time.localtime(x)
22 # return x * 0 + 1
23
TypeError: a float is required
If I only return x + 1 from the function, it works. Trying float(x) or float(str(x)) or numpy.float(x) inside time.localtime(x) …
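
For reference, here is a minimal sketch of the direction I expect to work, assuming the udf should return a formatted string (so it is registered with StringType() rather than DoubleType()) and be applied through epoch_to_datetime_udf instead of the plain Python function; I have not verified that this is the recommended approach:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import time

def epoch_to_datetime(x):
    # inside the udf, x arrives as a plain Python float (seconds since the epoch)
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(x))

# StringType() matches the string returned by time.strftime
epoch_to_datetime_udf = udf(epoch_to_datetime, StringType())

df.withColumn("datetime", epoch_to_datetime_udf(df.epoch)).show()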