Posts by Una*_*ipg

Error when converting a Spark DataFrame with date columns to a pandas DataFrame

I have a Spark DataFrame with this schema:

root
 |-- product_id: integer (nullable = true)
 |-- stock: integer (nullable = true)
 |-- start_date: date (nullable = true)
 |-- end_date: date (nullable = true)

When I try to pass it to a pandas_udf or convert it to a pandas DataFrame:

pandas_df = spark_df.toPandas()

It raises this error:

AttributeError        Traceback (most recent call last)
<ipython-input-86-4bccc6e8422d> in <module>()
     10 # spark_df.printSchema()
     11 
---> 12 pandas_df = spark_df.toPandas()

/home/.../lib/python2.7/site-packages/pyspark/sql/dataframe.pyc in toPandas(self)
   2123                         table = pyarrow.Table.from_batches(batches)
   2124                         pdf = table.to_pandas()
-> 2125                         pdf = _check_dataframe_convert_date(pdf, self.schema)
   2126                         return _check_dataframe_localize_timestamps(pdf, timezone)
   2127                     else:

/home.../lib/python2.7/site-packages/pyspark/sql/types.pyc in _check_dataframe_convert_date(pdf, …
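
Since the traceback ends inside `_check_dataframe_convert_date`, one way to narrow the problem down is to make sure no date-typed column reaches that step. The sketch below is a possible workaround, not a confirmed fix: the truncated traceback does not show the actual error message. The helper names `date_column_names` and `cast_dates_to_string` are hypothetical; they rely only on `schema.fields`, `dataType.typeName()`, `withColumn`, and `Column.cast`, and assume `spark_df` is the DataFrame from the question.

```python
def date_column_names(schema):
    """Return the names of columns whose Spark type is DateType.

    `schema` is expected to look like `spark_df.schema` (a StructType):
    each field has a `.name` and a `.dataType` with `typeName()`.
    """
    return [f.name for f in schema.fields if f.dataType.typeName() == "date"]


def cast_dates_to_string(spark_df):
    """Return a copy of `spark_df` with every date column cast to string.

    Hypothetical helper: with no DateType columns left in the schema,
    the date-conversion step seen in the traceback has nothing to convert.
    """
    for name in date_column_names(spark_df.schema):
        spark_df = spark_df.withColumn(name, spark_df[name].cast("string"))
    return spark_df


# Usage, assuming `spark_df` already exists as in the question:
# pandas_df = cast_dates_to_string(spark_df).toPandas()
```

If this conversion succeeds, the string columns (`start_date` and `end_date` here) can be parsed back into dates on the pandas side. If it still fails, the date columns are probably not the cause.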

dataframe pandas apache-spark pyspark

Score: 8
Answers: 1
Views: 2308
