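My data has a single column, and I am trying to create additional columns from whatever follows each "/" in a row. Here are the first few rows of the data: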
> dput(mydata)
structure(list(ALL = structure(c(1L, 4L, 4L, 3L, 2L), .Label = c("/ca/put/sent_1/fe.gr/eq2_on/eq2_off",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cbr_LBL", "/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cni_at.p3x.4",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov"), class = "factor")), .Names = "ALL", class = "data.frame", row.names = c(NA,
-5L))
The result should be a data frame like the one below, with "1" in a new column if that variable appears in the row and "0" otherwise:
> dput(Result)
structure(list(ALL = structure(c(1L, 4L, 5L, 3L, 2L), .Label = c("/ca/put/sent_1/fe.gr/eq2_on/eq2_off",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cbr_LBL", "/ca/put/sent_1/fe.gr/eq2_on/eq2_off/cni_at.p3x.4",
"/ca/put/sent_1/fe.gr/eq2_on/eq2_off/hi.on/hi.ov", "/ca/put/sent_1fe.gr/eq2_on/eq2_off/hi.on/hi.ov"
), class = "factor"), ca = c(1L, 1L, 1L, 1L, 1L), put = c(1L,
1L, 1L, 1L, 1L), sent_1 = c(1L, 1L, 1L, 1L, 1L), fe.gr = c(1L, …
I have a data.frame column whose values are pipe-delimited, and I would like to create dummy variables from those pipe-delimited values.
For example:
Say we start with:
df = data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim|", "Jim|Steve|Ben"))
> df
              a
1 Ben|Chris|Jim
2  Ben|Greg|Jim
3 Jim|Steve|Ben
I would like to end up with:
df2 = data.frame(Ben = c(1, 1, 1), Chris = c(1, 0, 0), Jim = c(1, 1, 1), Greg = c(0, 1, 0),
Steve = c(0, 0, 1))
> df2
  Ben Chris Jim Greg Steve
1   1     1   1    0     0
2   1     0   1    1     0
3   1     0   1    0     1
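I do not know in advance how many potential values the field has. In the example above, variable "a" could contain 1 value or 10 values. Assume it is a reasonable number (i.e. fewer than 100 possible values).
Is there a good way to do this?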
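One possible approach, sketched in base R only: split each row on the literal pipe, collect the distinct values, and test membership row by row. The names `parts` and `values` are illustrative and not from the question, and this is a starting point under those assumptions rather than the only way to do it.

```r
# Sample data from the question (the trailing "|" in row 2 is kept as posted)
df <- data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim|", "Jim|Steve|Ben"),
                 stringsAsFactors = FALSE)

# Split on a literal pipe; fixed = TRUE stops "|" being read as regex alternation
parts <- strsplit(df$a, "|", fixed = TRUE)

# Every distinct value, in order of first appearance
values <- unique(unlist(parts))

# One row per input row, one column per value: 1 if present, 0 otherwise
df2 <- as.data.frame(do.call(rbind, lapply(parts, function(p) as.integer(values %in% p))))
names(df2) <- values
df2
```

Because `values` is derived from the data, the number of distinct values does not need to be known in advance.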
I have a data frame with the following structure:
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'))
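Now I want to create a data frame from it with one named column for each unique value in the test data frame. A unique value is one that ends with the ';' character and starts after a space, not including the space. Then, for each row in the column, I want the dummy columns filled with 1 or 0, as shown below: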
data.frame(a = c(1,1), ff = c(1,0), cc = c(1,1), rr = c(1,1), e = c(0,1))
  a ff cc rr e
1 1  1  1  1 0
2 1  0  1  1 1
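I tried building a df with a for loop over the unique values in the column, but it got messy. I already have a vector containing the column's unique values. The problem is how to generate the 1s and 0s. I tried mutate_all() together with grep(), but that did not work.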
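A minimal sketch of one way to get the 1s and 0s without a loop, assuming base R only: split on ";", trim the spaces, reshape to a long (row, value) table and cross-tabulate it. The names `tokens`, `long` and `dummies` are illustrative, not from the question.

```r
test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;'),
                   stringsAsFactors = FALSE)

# Split on ";", strip surrounding spaces, drop any empty pieces
tokens <- lapply(strsplit(test$col, ";", fixed = TRUE), trimws)
tokens <- lapply(tokens, function(x) x[nzchar(x)])

# Long format: one (row, value) pair per token; factor levels fix the column order
# and keep rows that happen to have no tokens
long <- data.frame(
  row   = factor(rep(seq_along(tokens), lengths(tokens)), levels = seq_along(tokens)),
  value = factor(unlist(tokens), levels = unique(unlist(tokens)))
)

# Cross-tabulate, then cap counts at 1 so repeated tokens still give a 0/1 flag
dummies <- as.data.frame.matrix(table(long$row, long$value))
dummies[] <- lapply(dummies, function(x) as.integer(x > 0))
dummies
```

If a vector of the unique values is already available, it could be passed as `levels` for `value` instead of being recomputed.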
I have a data frame in which one column holds multiple values (comma-separated):
mydf <- structure(list(Age = c(99L, 10L, 40L, 15L),
Info = c("good, bad, sad", "nice, happy, joy", "NULL", "okay, nice, fun, wild, go"),
Target = c("Boy", "Girl", "Boy", "Boy")),
.Names = c("Age", "Info", "Target"),
row.names = c(NA, 4L),
class = "data.frame")
> mydf
  Age                      Info Target
1  99            good, bad, sad    Boy
2  10          nice, happy, joy   Girl
3  40                      NULL    Boy
4  15 okay, nice, fun, wild, go    Boy
I want to split the Info column into one-hot encoded columns and append the result after the Target column, for example:
Age Info Target good bad …
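A rough base R sketch of one way this might be done. It assumes that the literal string "NULL" in Info means "no values" (so that row gets all zeros) and that the new columns should appear in order of first appearance; the question's expected output is truncated, so neither assumption is confirmed. The names `info`, `labels` and `onehot` are illustrative.

```r
mydf <- data.frame(
  Age    = c(99L, 10L, 40L, 15L),
  Info   = c("good, bad, sad", "nice, happy, joy", "NULL", "okay, nice, fun, wild, go"),
  Target = c("Boy", "Girl", "Boy", "Boy"),
  stringsAsFactors = FALSE
)

# Split on a comma followed by optional whitespace
info <- strsplit(mydf$Info, ",\\s*")

# Assumption: the string "NULL" marks a row with no values
info <- lapply(info, function(x) x[x != "NULL"])

# Distinct labels in order of first appearance
labels <- unique(unlist(info))

# For each label, flag the rows that contain it; the result is a rows x labels matrix
onehot <- vapply(
  labels,
  function(l) as.integer(vapply(info, function(x) l %in% x, logical(1))),
  integer(nrow(mydf))
)

# Append the one-hot columns after Target
result <- cbind(mydf, onehot)
result
```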