I'm trying to split a column in an R data frame. The values contain several spaces, but I want to split only on the first one. Example data frame:
df <- data.frame(game = c(1, 2, 3, 4, 5, 6), date = c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5", "Thursday Apr 6", "Friday Apr 7", "Saturday Apr 8"))
I'm trying to use tidyr to split the 'date' column of df at the first space, so that the day of the week ends up in its own column:
  game       day  date
1    1    Monday Apr 3
2    2   Tuesday Apr 4
3    3 Wednesday Apr 5
4    4  Thursday Apr 6
5    5    Friday Apr 7
6    6  Saturday Apr 8
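That's the problem. Below is what I've tried and what went wrong.
According to the tidyr documentation, the default value of 'sep' is 'a regular expression that matches any sequence of non-alphanumeric values.' So if I do this: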
df %>% separate(date, c("day", "date"))
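it does split on the space, but it splits on both spaces (e.g. in 'Monday Apr 3', the space after 'Monday' and the space after 'Apr'), and the third piece is dropped. The result is: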
  game       day date
1    1    Monday  Apr
2    2   Tuesday  Apr
3    3 Wednesday  Apr
4    4  Thursday  Apr
5    5    Friday  Apr
6    6  Saturday  Apr
Warning message:
Too many values at 6 locations: 1, 2, 3, 4, 5, 6
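To see why the default separator produces three pieces per row, here is a small base R sketch. It assumes the default 'sep' corresponds to the pattern "[^[:alnum:]]+" shown in the separate() signature, and it uses strsplit() rather than tidyr itself, so it only illustrates the regex behaviour, not tidyr's internals:

```r
# Two sample values from the 'date' column
x <- c("Monday Apr 3", "Tuesday Apr 4")

# Split on every run of non-alphanumeric characters (assumed default pattern)
pieces <- strsplit(x, "[^[:alnum:]]+")

pieces[[1]]      # "Monday" "Apr" "3"
lengths(pieces)  # 3 3
```

Each value yields three pieces, but only two column names were supplied, which matches the 'Too many values' warning above.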
I can supply a regex that selects only the first space (I checked that this regex works in Sublime Text):
df %>% separate(date, c("day", "date"), sep='^[^\\s]*\\K\\s')
But that gives me:
  game             day date
1    1    Monday Apr 3 <NA>
2    2   Tuesday Apr 4 <NA>
3    3 Wednesday Apr 5 <NA>
4    4  Thursday Apr 6 <NA>
5    5    Friday Apr 7 <NA>
6    6  Saturday Apr 8 <NA>
Warning message:
Too few values at 6 locations: 1, 2, 3, 4, 5, 6
What is going wrong? How can I make this work? Or what am I misunderstanding?
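You need to set the extra argument to "merge". With extra = "merge", separate() splits at most length(into) times, so everything after the first separator stays together in the last column: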
library(tidyr)
df %>% separate(date, c("day", "date"), extra = "merge")
#  game       day  date
#1    1    Monday Apr 3
#2    2   Tuesday Apr 4
#3    3 Wednesday Apr 5
#4    4  Thursday Apr 6
#5    5    Friday Apr 7
#6    6  Saturday Apr 8
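For comparison, the same first-space split can be sketched in base R without tidyr. This is not part of the answer above, just an alternative under the assumption that every value contains at least one space; the variable name 'original' is only illustrative:

```r
# Same example data as in the question
df <- data.frame(
  game = c(1, 2, 3, 4, 5, 6),
  date = c("Monday Apr 3", "Tuesday Apr 4", "Wednesday Apr 5",
           "Thursday Apr 6", "Friday Apr 7", "Saturday Apr 8"),
  stringsAsFactors = FALSE
)

original <- df$date

# Drop everything from the first space onward to get the day
df$day <- sub(" .*$", "", original)

# Drop everything up to and including the first space to get the rest
df$date <- sub("^[^ ]* ", "", original)

# Reorder columns to match the desired layout
df <- df[, c("game", "day", "date")]
df
```

Because sub() replaces only the first match, the second space inside "Apr 3" is left untouched.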