Tags: pyspark, sparkr, spark-dataframe
I have a Spark DataFrame that looks like this:
# Create a SparkR DataFrame (assumes SparkR is loaded and a Spark session is active)
df <- data.frame(name = c("Thomas", "William", "Bill", "John"),
                 dates = c('2017-01-05', '2017-02-23', '2017-03-16', '2017-04-08'))
df <- createDataFrame(df)
# Make sure the df$dates column is of 'date' type
df <- withColumn(df, 'dates', cast(df$dates, 'date'))
name    | dates
--------------------
Thomas  | 2017-01-05
William | 2017-02-23
Bill    | 2017-03-16
John    | 2017-04-08
I want to change the dates to the corresponding month-end dates, so that they look like the output below. How can I do this? Either SparkR or PySpark code is fine.
name    | dates
--------------------
Thomas  | 2017-01-31
William | 2017-02-28
Bill    | 2017-03-31
John    | 2017-04-30
You can use the following (PySpark):
from pyspark.sql.functions import last_day
df.select('name', last_day(df.dates).alias('dates')).show()
To clarify, last_day(date) returns the last day of the month to which the given date belongs.
I'm fairly sure SparkR has an equivalent function: https://spark.apache.org/docs/1.6.2/api/R/last_day.html
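For completeness, a minimal SparkR sketch using that last_day column function, applied to the df built in the question, could look like this:

# SparkR: replace each date with the last day of its month
df <- withColumn(df, 'dates', last_day(df$dates))
showDF(df)

On the sample data this should produce the month-end dates shown in the question (e.g. 2017-01-05 becomes 2017-01-31).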