Data restructuring using R

wax*_*tax 1 r split-apply-combine

I have a dataset (dat) that looks like this:

 Person       IPaddress
36598035    222.999.22.99
36598035    222.999.22.99
36598035    222.999.22.99
36598035    222.999.22.99
36598035    222.999.22.99
36598035    444.666.44.66
37811171    111.88.111.88
37811171    111.88.111.88
37811171    111.88.111.88
37811171    111.88.111.88
37811171    111.88.111.88

It reflects instances of individuals logging in to a website over a period of time. I need the data to look like this:

Person        IPaddress      Number of Logins
36598035    222.999.22.99           6
37811171    111.88.111.88           5

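So instead of multiple entries for the same person, there is only one row per person, with a count of the number of times they logged in.

Also, you'll notice in my example that person 36598035 logged in from more than one IP address. When this happens, I want the IP address in the final dataset to reflect the mode IP address, in other words the IP address from which the individual logged in most frequently.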

jaz*_*rro 5

Here is one way.

library(dplyr)

mydf %>%
    group_by(Person, IPaddress) %>% # For each combination of person and IP address
    summarize(freq = n()) %>% # Count the log-ins from that IP address
    arrange(Person, desc(freq)) %>% # Put each person's most frequent IP address first
    group_by(Person) %>% # For each person
    mutate(total = sum(freq)) %>% # Get the total number of log-ins
    select(-freq) %>% # Remove the per-IP count
    do(head(.,1)) # Take the first row for each person

#    Person     IPaddress total
#1 36598035 222.999.22.99     6
#2 37811171 111.88.111.88     5
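
To try this on the question's data, here is a minimal reproducible sketch. The data frame is rebuilt by hand from the sample in the question, and the function name summarize_logins is a hypothetical wrapper, not something from dplyr. It also assumes a dplyr version that provides slice() (0.3 or later), used here in place of do(head(., 1)).

```r
library(dplyr)

# Sample data rebuilt from the question (11 log-in records)
mydf <- data.frame(
  Person = c(rep(36598035L, 6), rep(37811171L, 5)),
  IPaddress = c(rep("222.999.22.99", 5), "444.666.44.66",
                rep("111.88.111.88", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: one row per person, holding the most frequent
# IP address and the total number of log-ins across all IP addresses
summarize_logins <- function(df) {
  df %>%
    group_by(Person, IPaddress) %>%
    summarize(freq = n()) %>%        # log-ins per person and IP address
    arrange(Person, desc(freq)) %>%  # most frequent IP first within a person
    group_by(Person) %>%
    mutate(total = sum(freq)) %>%    # total log-ins per person
    select(-freq) %>%
    slice(1) %>%                     # keep the first row of each group
    ungroup()
}

print(summarize_logins(mydf))
```

For the sample data this should give one row for 36598035 (222.999.22.99, 6 log-ins) and one for 37811171 (111.88.111.88, 5 log-ins). If two IP addresses tie for a person, only one of them is kept, decided by row order after arrange.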

UPDATE

dplyr 0.3 is out now, so you can also do the following. Using count makes it one line shorter. I also used slice, as @aosmith recommended.

mydf %>%
    count(Person, IPaddress) %>%
    arrange(Person, desc(n)) %>%
    group_by(Person) %>%
    mutate(total = sum(n)) %>%
    select(-n) %>%
    slice(1)
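
For comparison, the same split-apply-combine idea can be sketched in base R without dplyr. This is an alternative technique, not the approach of the answer above: mode_value is a hypothetical helper, dat is rebuilt by hand from the question's sample, and ties are resolved by which.max, which picks the first IP address in sorted order.

```r
# Sample data rebuilt from the question
dat <- data.frame(
  Person = c(rep(36598035L, 6), rep(37811171L, 5)),
  IPaddress = c(rep("222.999.22.99", 5), "444.666.44.66",
                rep("111.88.111.88", 5)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent value of a vector
mode_value <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Split by person, apply a function to each person's IP addresses
ip_mode <- tapply(dat$IPaddress, dat$Person, mode_value)
logins  <- tapply(dat$IPaddress, dat$Person, length)

# Combine into one row per person (Person comes back as character here)
base_result <- data.frame(
  Person = names(ip_mode),
  IPaddress = as.vector(ip_mode),
  total = as.vector(logins),
  stringsAsFactors = FALSE
)

print(base_result)
```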