use*_*224 4 python azure python-3.x azure-databricks
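I am working with Azure Databricks. I run a Python script in a notebook and fetch data from SQL. I am trying to split a datetime column into separate date and time columns. Here is the Python code: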
pushdown_query = "(SELECT * FROM STAGE.OutagesAndInterruptions) int_alias"
df = spark.read.jdbc(url=jdbcUrl, table=pushdown_query, properties=connectionProperties)
df['INTERRUPTION_DATE']=df['INTERRUPTION_TIME'].dt.date
df['INTERRUPTION_TIME'] looks like this:
+-------------------+
| INTERRUPTION_TIME|
+-------------------+
|1997-05-12 09:57:00|
|1998-03-08 13:00:00|
|1998-02-26 13:00:00|
|1998-02-26 13:00:00|
|1998-03-03 10:04:00|
|1998-05-20 09:27:00|
|1998-11-21 08:51:00|
|1998-11-27 08:44:00|
|1998-10-19 01:19:00|
|1998-10-19 01:44:00|
|2000-03-13 07:00:00|
|2000-03-19 07:30:00|
|2000-08-04 12:55:00|
|2002-09-30 18:11:00|
|2002-09-30 18:11:00|
|2002-05-06 09:22:00|
|2002-01-16 13:15:00|
|2003-01-08 15:46:00|
|2003-02-04 10:25:00|
|2003-02-04 10:25:00|
+-------------------+
When I run the code, it throws an error:
TypeError: 'DataFrame' object does not support item assignment
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<command-2244924718685919> in <module>
----> 1 df['INTERRUPTION_DATE']=df['INTERRUPTION_TIME'].dt.date
TypeError: 'DataFrame' object does not support item assignment
Can we create a new column on a DataFrame? How do we create a new column on a DataFrame in Azure Databricks?
This should work:
from pyspark.sql.types import DateType
df2 = df.withColumn('INTERRUPTION_DATE', df['INTERRUPTION_TIME'].cast(DateType()))
Edit after comments:
from pyspark.sql.functions import date_format
df.select(date_format('INTERRUPTION_TIME', 'M/d/yyyy').alias('INTERRUPTION_DATE'),
date_format('INTERRUPTION_TIME', 'h:m:s a').alias('TIME'))
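To get a feel for what the two patterns in the edit produce, here is a plain-Python sketch that mimics `'M/d/yyyy'` and `'h:m:s a'` by hand for the first sample row. It is not Spark's formatter, only an illustration of the pattern letters (unpadded month, day, hour, minute, and second, plus an AM/PM marker); the function name `format_like_answer` is made up.

```python
from datetime import datetime
from typing import Tuple


def format_like_answer(ts: datetime) -> Tuple[str, str]:
    """Hand-rolled approximation of the 'M/d/yyyy' and 'h:m:s a' patterns."""
    # M, d: month and day without zero padding; yyyy: four-digit year.
    date_part = f"{ts.month}/{ts.day}/{ts.year:04d}"
    # h: hour on a 12-hour clock (1-12); m, s: unpadded minute and second.
    hour12 = ts.hour % 12 or 12
    am_pm = "AM" if ts.hour < 12 else "PM"
    time_part = f"{hour12}:{ts.minute}:{ts.second} {am_pm}"
    return date_part, time_part


sample = datetime(1997, 5, 12, 9, 57, 0)
date_part, time_part = format_like_answer(sample)
print(date_part, "|", time_part)  # 5/12/1997 | 9:57:0 AM
```

Note that both columns produced by `date_format` are strings. If zero-padded output is preferred, patterns such as `'MM/dd/yyyy'` and `'hh:mm:ss a'` can be used instead.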