为什么列1st_from_end包含null:
from pyspark.sql.functions import split
df = sqlContext.createDataFrame([('a b c d',)], ['s',])
df.select( split(df.s, ' ')[0].alias('0th'),
split(df.s, ' ')[3].alias('3rd'),
split(df.s, ' ')[-1].alias('1st_from_end')
).show()
Run Code Online (Sandbox Code Playgroud)