iNy*_*yar 4 r multiple-columns strsplit
I am trying to split a character vector inside a data frame into three separate vectors.
My data looks like this:
> df <- data.frame(filename = c("Author1 (2010) Title of paper",
"Author2 et al (2009) Title of paper",
"Author3 & Author4 (2004) Title of paper"),
stringsAsFactors = FALSE)
I would like to split these three pieces of information (authors, year, title) into three separate columns, so that the result would be:
> df
                         filename            author year  title
1           Author1 (2010) Title1           Author1 2010 Title1
2     Author2 et al (2009) Title2     Author2 et al 2009 Title2
3 Author3 & Author4 (2004) Title3 Author3 & Author4 2004 Title3
I used strsplit to split each filename element into a vector of 3 elements:
df$temp <- strsplit(df$filename, " \\(|\\) ")
But now I can't find a way to put each element into a separate column. I can access a specific piece of information like this:
> df$temp[[2]][1]
[1] "Author2 et al"
But I can't work out how to put it into a new column:
> df$author <- df$temp[[]][1]
Error
You can try tstrsplit from the devel version of data.table:
library(data.table) # v1.9.5+
setDT(df)[, c('author', 'year', 'title') := tstrsplit(filename, ' \\(|\\) ')]
df
# filename author year
#1: Author1 (2010) Title of paper Author1 2010
#2: Author2 et al (2009) Title of paper Author2 et al 2009
#3: Author3 & Author4 (2004) Title of paper Author3 & Author4 2004
# title
#1: Title of paper
#2: Title of paper
#3: Title of paper
Edit: included the OP's split pattern to remove the spaces.
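To see why the spaces belong in the split pattern, here is a minimal base R sketch that compares the two patterns on a single string. The variable names loose and tight are only illustrative.

```r
x <- "Author2 et al (2009) Title of paper"

# Splitting on the parentheses alone leaves stray spaces on the pieces
loose <- strsplit(x, "\\(|\\)")[[1]]

# Including the surrounding spaces in the pattern consumes them
tight <- strsplit(x, " \\(|\\) ")[[1]]

print(loose)
print(tight)
```

With the looser pattern you would need an extra trimws() call on each piece to get the same result.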
With the tidyr package, here is a solution using separate:
library(tidyr)
separate(df, "filename", c("Author", "Year", "Title"), sep = " \\(|\\) ", remove = FALSE)
# filename Author
# 1 Author1 (2010) Title of paper Author1
# 2 Author2 et al (2009) Title of paper Author2 et al
# 3 Author3 & Author4 (2004) Title of paper Author3 & Author4
# Year Title
# 1 2010 Title of paper
# 2 2009 Title of paper
# 3 2004 Title of paper
Leading and trailing spaces have been taken into account.
result <- cbind(df, do.call("rbind", strsplit(df$filename, " \\(|\\) ")))
colnames(result)[2:4] <- c("author", "year", "title")
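If you would rather keep working from the list that strsplit returns, as in the question, a possible base R variant is to pull out the i-th piece of every element with vapply. This is only a sketch: it assumes every filename splits into exactly three pieces, and the name parts is illustrative. Converting year to integer is an optional extra that the question did not ask for.

```r
df <- data.frame(filename = c("Author1 (2010) Title of paper",
                              "Author2 et al (2009) Title of paper",
                              "Author3 & Author4 (2004) Title of paper"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$filename, " \\(|\\) ")

# Assumption: every filename yields exactly three pieces
stopifnot(all(lengths(parts) == 3))

# `[` extracts the i-th piece from each list element
df$author <- vapply(parts, `[`, character(1), 1)
df$year   <- as.integer(vapply(parts, `[`, character(1), 2))
df$title  <- vapply(parts, `[`, character(1), 3)

str(df)
```

The failing line in the question, df$temp[[]][1], tries to index every list element at once. vapply does that element by element and checks that each result is a single string.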