Post by rme*_*ves

PySpark: pyarrow.lib.ArrowTypeError: an integer is required (got type Timestamp)

I am writing a Spark DataFrame to a BigQuery table. This worked fine, but I have now added a call to a pandas UDF before writing the data to BigQuery. For some reason, when I call the pandas UDF before writing the Spark DataFrame to BigQuery, I now see the following error:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1579619644892_0001/container_1579619644892_0001_01_000002/pyspark.zip/pyspark/worker.py", line 377, in main
    process()
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1579619644892_0001/container_1579619644892_0001_01_000002/pyspark.zip/pyspark/worker.py", line 372, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1579619644892_0001/container_1579619644892_0001_01_000002/pyspark.zip/pyspark/serializers.py", line 287, in dump_stream
    batch = _create_batch(series, self._timezone)
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1579619644892_0001/container_1579619644892_0001_01_000002/pyspark.zip/pyspark/serializers.py", line 256, in _create_batch
    arrs = [create_array(s, t) for s, t in series]
  File "/mnt1/yarn/usercache/hadoop/appcache/application_1579619644892_0001/container_1579619644892_0001_01_000002/pyspark.zip/pyspark/serializers.py", line 256, in <listcomp>
    arrs = [create_array(s, t) for s, t in series] …
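This error typically means the pandas Series a UDF returns has a dtype that does not match the UDF's declared Spark return type: when Spark serializes the result through Arrow, a `datetime64` Series sent where an integer type was declared raises "an integer is required (got type Timestamp)". Below is a minimal, hedged sketch (pure pandas, hypothetical values, no Spark session) of the cast that makes the Series dtype match an integer return type; the alternative fix is to declare the UDF's return type as `TimestampType()` instead.

```python
import pandas as pd

# A datetime64 Series, as a pandas UDF might accidentally return
# while its Spark return type is declared as LongType (sample values).
ts = pd.Series(pd.to_datetime(["2020-01-21 00:00:00", "2020-01-22 12:30:00"]))

# Fix: cast explicitly so the dtype matches the declared integer type,
# e.g. epoch seconds for a LongType column. datetime64[ns] stores
# nanoseconds since the epoch, hence the division by 10**9.
epoch_seconds = ts.astype("int64") // 10**9

print(epoch_seconds.tolist())  # plain integers, safe for a LongType UDF
```

If the column should stay a timestamp, keep the Series as `datetime64` and change the declared return type to `TimestampType()` so the declared schema and the actual dtype agree.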

apache-spark pyspark

Score: 5 · Solutions: 1 · Views: 8896
