I have a Spark DataFrame with this schema:
root
|-- product_id: integer (nullable = true)
|-- stock: integer (nullable = true)
|-- start_date: date (nullable = true)
|-- end_date: date (nullable = true)
When I try to pass it to a pandas_udf, or convert it to a Pandas DataFrame:
pandas_df = spark_df.toPandas()
it raises this error:
AttributeError Traceback (most recent call last)
<ipython-input-86-4bccc6e8422d> in <module>()
10 # spark_df.printSchema()
11
---> 12 pandas_df = spark_df.toPandas()
/home/.../lib/python2.7/site-packages/pyspark/sql/dataframe.pyc in toPandas(self)
2123 table = pyarrow.Table.from_batches(batches)
2124 pdf = table.to_pandas()
-> 2125 pdf = _check_dataframe_convert_date(pdf, self.schema)
2126 return _check_dataframe_localize_timestamps(pdf, timezone)
2127 else:
/home.../lib/python2.7/site-packages/pyspark/sql/types.pyc in _check_dataframe_convert_date(pdf, …