I have a CSV file that looks like this:
Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol
I want a unique ID for each distinct value of the LocationNormalized variable, so that my output looks like this:
Id,Title,FullDescription,LocationRaw,LocationNormalized,ID
1,hi,abc,def,Bristol,1
1,yo,abc,def,Bristol,1
1,was,abc,def,England,2
1,up,abc,def,India,3
1,yoh,abc,def,Nepal,4
1,home,abc,def,Bristol,1
I am new to R. I tried a few scripts using as.factor, but they failed.
df <- data.table::fread("Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol")
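library(dplyr)

# group_indices() numbers the groups in sorted order of LocationNormalized.
# Note: dplyr >= 1.0.0 warns that this form is deprecated; the replacement is
# group_by(LocationNormalized) %>% mutate(new_ID = cur_group_id())
df %>%
  mutate(new_ID = group_indices(., LocationNormalized))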
  Id Title FullDescription LocationRaw LocationNormalized new_ID
1  1    hi             abc         def            Bristol      1
2  1    yo             abc         def            Bristol      1
3  1   was             abc         def            England      2
4  1    up             abc         def              India      3
5  1   yoh             abc         def              Nepal      4
6  1  home             abc         def            Bristol      1
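If you would rather not depend on dplyr, the same IDs can probably be produced with base R alone. The following is only a sketch under a few assumptions: it rebuilds the sample data with read.csv(text = ...), and the column names ID and ID_sorted are illustrative choices, not something from the question. match() against unique() numbers the values in order of first appearance, while as.integer(factor()) numbers them in sorted order; for this sample both give the same result because the locations already appear in alphabetical order.

```r
# Rebuild the example data with base R only (no packages needed).
df <- read.csv(text = "Id,Title,FullDescription,LocationRaw,LocationNormalized
1,hi,abc,def,Bristol
1,yo,abc,def,Bristol
1,was,abc,def,England
1,up,abc,def,India
1,yoh,abc,def,Nepal
1,home,abc,def,Bristol", stringsAsFactors = FALSE)

# IDs numbered by order of first appearance in the column.
# (ID is an illustrative column name.)
df$ID <- match(df$LocationNormalized, unique(df$LocationNormalized))

# IDs numbered by sorted order of the distinct values, which is how
# factor() orders its levels. (ID_sorted is an illustrative column name.)
df$ID_sorted <- as.integer(factor(df$LocationNormalized))

print(df)
```

The as.integer(factor()) line may also explain the earlier as.factor attempts: a factor only becomes a numeric ID once it is converted with as.integer().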