Joh*_*odd 7 python timestamp pandas apache-spark
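How do you round-trip timestamp data from Spark (Python) to Pandas and back? I read data from a Hive table in Spark, want to do some calculations in Pandas, and then write the results back to Hive. Only the last part fails: converting a Pandas timestamp back into a Spark DataFrame timestamp.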
import datetime
import pandas as pd

# sc and sqlContext are the SparkContext and SQLContext provided by the pyspark shell.
dates = [
    ('today', '2017-03-03 11:30:00'),
    ('tomorrow', '2017-03-04 08:00:00'),
    ('next Thursday', '2017-03-09 20:00:00'),
]
string_date_rdd = sc.parallelize(dates)
# Parse the string column into Python datetime objects.
timestamp_date_rdd = string_date_rdd.map(lambda t: (t[0], datetime.datetime.strptime(t[1], "%Y-%m-%d %H:%M:%S")))
timestamp_df = sqlContext.createDataFrame(timestamp_date_rdd, ['Day', 'Date'])
# Spark -> Pandas: the Date column becomes datetime64[ns].
timestamp_pandas_df = timestamp_df.toPandas()
# Pandas -> Spark: this is the step that loses the timestamp type.
roundtrip_df = sqlContext.createDataFrame(timestamp_pandas_df)
roundtrip_df.printSchema()
roundtrip_df.show()
root
 |-- Day: string (nullable = true)
 |-- Date: long (nullable = true)

+-------------+-------------------+
|          Day|               Date|
+-------------+-------------------+
|        today|1488540600000000000|
|     tomorrow|1488614400000000000|
|next Thursday|1489089600000000000|
+-------------+-------------------+
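At this point, the Date column of the round-trip Spark DataFrame has data type long. In PySpark a single value can easily be converted back to a datetime object, e.g. datetime.datetime.fromtimestamp(1489089600000000000 / 1000000000), although the time of day is then off by a few hours. How do I do this to convert the data type of the column in the Spark DataFrame?
Python 3.4.5, Spark 1.6.0
Thanks, John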
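The long values in the output above are nanoseconds since the Unix epoch. The few-hour offset mentioned in the question is most likely because datetime.datetime.fromtimestamp converts to the machine's local time zone when no tz argument is given. A minimal standard-library sketch of decoding one such value without that shift follows; the helper name nanos_to_utc_datetime is hypothetical and not part of Spark or Pandas.

```python
import datetime

NANOS_PER_SECOND = 1000000000


def nanos_to_utc_datetime(nanos):
    """Convert integer nanoseconds since the Unix epoch to a naive UTC datetime.

    Hypothetical helper for illustration; sub-microsecond digits are dropped
    because datetime.datetime only stores microseconds.
    """
    seconds, remainder = divmod(nanos, NANOS_PER_SECOND)
    epoch = datetime.datetime(1970, 1, 1)
    # Adding a timedelta to the epoch avoids any local time zone conversion.
    return epoch + datetime.timedelta(seconds=seconds, microseconds=remainder // 1000)


# Value taken from the 'next Thursday' row of the round-trip output.
print(nanos_to_utc_datetime(1489089600000000000))  # prints 2017-03-09 20:00:00
```

This only decodes individual values on the Python side; it does not change the column type of the Spark DataFrame.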
Converting the datetime64 columns to Python datetime objects worked for me.
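from pandas import Series

def convert_to_python_datetime(df):
    # Work on a copy so the caller's DataFrame is left untouched.
    df_copy = df.copy()
    # items() is used because iteritems() was removed in pandas 2.0.
    for column_name, column in df_copy.items():
        # dtype kind 'M' marks datetime64 columns.
        if column.dtype.kind == 'M':
            # Store plain datetime.datetime objects (dtype=object) so that Spark infers a timestamp column.
            df_copy[column_name] = Series(column.dt.to_pydatetime(), index=column.index, dtype=object)
    return df_copy

tmp = convert_to_python_datetime(timestamp_pandas_df)
roundtrip_df = sqlContext.createDataFrame(tmp)
roundtrip_df.printSchema()
roundtrip_df.show()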
Output:
root
 |-- Day: string (nullable = true)
 |-- Date: timestamp (nullable = true)

+-------------+--------------------+
|          Day|                Date|
+-------------+--------------------+
|        today|2017-03-03 11:30:...|
|     tomorrow|2017-03-04 08:00:...|
|next Thursday|2017-03-09 20:00:...|
+-------------+--------------------+