use*_*195 4 split r strsplit dataframe
I have a dataset like the following:
Country Region Molecule Item Code
IND NA PB102 FR206985511
THAI AP PB103 BA-107603 / F000113361 / 107603
LUXE NA PB105 1012701 / SGP-1012701 / F041701000
IND AP PB106 AU206985211 / CA-F206985211
THAI HP PB107 F034702000 / 1010701 / SGP-1010701
BANG NA PB108 F000007970/25781/20009021
I want to split the string values in the Item Code column on "/" and create a new row for each entry.
For example, the desired output would be:
Country Region Molecule Item.Code
IND NA PB102 FR206985511
THAI AP PB103 BA-107603
THAI AP PB103 F000113361
THAI AP PB103 107603
LUXE NA PB105 1012701
LUXE NA PB105 SGP-1012701
LUXE NA PB105 F041701000
IND AP PB106 AU206985211
IND AP PB106 CA-F206985211
THAI HP PB107 F034702000
THAI HP PB107 1010701
THAI HP PB107 SGP-1010701
BANG NA PB108 F000007970
BANG NA PB108 25781
BANG NA PB108 20009021
I tried the code below:
library(splitstackshape)
df2=concat.split.multiple(df1,"Plant.Item.Code","/", direction="long")
but got the error:
"Error: memory exhausted (limit reached?)"
When I tried strsplit(), I got the following error message:
Error in strsplit(df1$Plant.Item.Code, "/") : non-character argument
Dav*_*urg 16
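Try the cSplit function (since you are already using @Ananda's package). Note that it returns a data.table object, so make sure that package is installed. You can convert the result back to a data.frame (if you wish) by doing something like setDF(df2).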
library(splitstackshape)
df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
df2
# Country Region Molecule Item.Code
# 1: IND NA PB102 FR206985511
# 2: THAI AP PB103 BA-107603
# 3: THAI AP PB103 F000113361
# 4: THAI AP PB103 107603
# 5: LUXE NA PB105 1012701
# 6: LUXE NA PB105 SGP-1012701
# 7: LUXE NA PB105 F041701000
# 8: IND AP PB106 AU206985211
# 9: IND AP PB106 CA-F206985211
# 10: THAI HP PB107 F034702000
# 11: THAI HP PB107 1010701
# 12: THAI HP PB107 SGP-1010701
# 13: BANG NA PB108 F000007970
# 14: BANG NA PB108 25781
# 15: BANG NA PB108 20009021
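As a minimal sketch of the setDF() step mentioned above: the two-row df1 below is only a stand-in built from the question's data (an assumption for illustration), and both the splitstackshape and data.table packages are assumed to be installed.

```r
library(splitstackshape)  # provides cSplit()
library(data.table)       # provides setDF()

# Hypothetical stand-in for df1: two rows taken from the question's data
df1 <- data.frame(
  Country   = c("IND", "THAI"),
  Region    = c(NA, "AP"),
  Molecule  = c("PB102", "PB103"),
  Item.Code = c("FR206985511", "BA-107603 / F000113361 / 107603"),
  stringsAsFactors = FALSE
)

# Split Item.Code on "/" and stack the pieces into rows
df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
class(df2)  # the result is a data.table

# setDF() converts in place (by reference), so no reassignment is needed
setDF(df2)
class(df2)  # now a plain data.frame
```

Because setDF() modifies df2 by reference, writing df2 <- setDF(df2) is not required, although it does no harm.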
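Another approach in base R:
# For each row, split every field on "/" (with optional spaces around it),
# expand the pieces into rows, then stack the per-row results.
as.data.frame(do.call(rbind, apply(df1, 1, function(x) {
  do.call(expand.grid, strsplit(x, " */ *"))
})))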
Result:
Country Region Molecule Item.Code
1 IND <NA> PB102 FR206985511
2 THAI AP PB103 BA-107603
3 THAI AP PB103 F000113361
4 THAI AP PB103 107603
5 LUXE <NA> PB105 1012701
6 LUXE <NA> PB105 SGP-1012701
7 LUXE <NA> PB105 F041701000
8 IND AP PB106 AU206985211
9 IND AP PB106 CA-F206985211
10 THAI HP PB107 F034702000
11 THAI HP PB107 1010701
12 THAI HP PB107 SGP-1010701
13 BANG <NA> PB108 F000007970
14 BANG <NA> PB108 25781
15 BANG <NA> PB108 20009021
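On the "non-character argument" error from strsplit() in the question: strsplit() only accepts character vectors, so this error typically appears when the column is a factor. That is an assumption here, since the question does not show str(df1). A minimal base R sketch under that assumption, using a three-row stand-in for df1, converts the column with as.character() and then repeats each row once per piece:

```r
# Hypothetical stand-in for df1; stringsAsFactors = TRUE mimics the
# assumed situation where Item.Code is a factor
df1 <- data.frame(
  Country   = c("IND", "THAI", "BANG"),
  Region    = c(NA, "AP", NA),
  Molecule  = c("PB102", "PB103", "PB108"),
  Item.Code = c("FR206985511",
                "BA-107603 / F000113361 / 107603",
                "F000007970/25781/20009021"),
  stringsAsFactors = TRUE
)

# Convert to character first, then split on "/" with optional spaces
parts <- strsplit(as.character(df1$Item.Code), " */ *")

# Repeat each original row once per piece, then overwrite the split column
out <- df1[rep(seq_len(nrow(df1)), lengths(parts)), ]
out$Item.Code <- unlist(parts)
rownames(out) <- NULL
out
```

Unlike the apply() version, this only splits the one column and leaves the other columns' types untouched.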