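I am trying to get the cumulative sum of a variable (v) within the groups ("a" and "b") of a data frame. How can I put the result into the cs column of my data frame, with the values on the correct rows?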
> library(nlme)
> g <- factor(c("a","b","a","b","a","b","a","b","a","b","a","b"))
> v <- c(1,4,1,4,1,4,2,8,2,8,2,8)
> cs <- rep(0,12)
> d <- data.frame(g,v,cs)
> d
g v cs
1 a 1 0
2 b 4 0
3 a 1 0
4 b 4 0
5 a 1 0
6 b 4 0
7 a 2 0
8 b 8 0
9 a 2 0
10 b 8 0
11 a 2 0
12 b 8 0
> r=gapply(d,FUN="cumsum",form=~g, which="v")
> r …
I am looking up the country for IP ranges across tens of millions of rows, and I am looking for a faster way to do the lookup.
I have 180K tuples of this form:
>>> data = ((0, 16777215, 'ZZ'),
... (1000013824, 1000079359, 'CN'),
... (1000079360, 1000210431, 'JP'),
... (1000210432, 1000341503, 'JP'),
... (1000341504, 1000603647, 'IN'))
(The integers are IP addresses converted to plain numbers.)
This gives the correct result, but it simply takes too long:
>>> ip_to_lookup = 999
>>> country_result = [country
... for (ip_from, ip_to, country) in data
... if (ip_to_lookup >= ip_from) and
... (ip_to_lookup <= ip_to)][0]
>>> print(country_result)
ZZ
Can anyone point me in the right direction for making this lookup faster? With the approach above, 100 lookups take 3 seconds, so I estimate that 10M rows would take days.
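One possible direction, offered as a sketch rather than a definitive answer: if the tuples are sorted by their starting value and the ranges do not overlap (both are assumptions; the sample data satisfies them), a binary search over the range starts replaces the linear scan. The helper name `lookup_country` and the list `starts` are made up for illustration; `bisect` is in the Python standard library.

```python
import bisect

# Sample ranges from the question: (first_ip, last_ip, country).
# Assumed to be sorted by first_ip and non-overlapping.
data = (
    (0, 16777215, 'ZZ'),
    (1000013824, 1000079359, 'CN'),
    (1000079360, 1000210431, 'JP'),
    (1000210432, 1000341503, 'JP'),
    (1000341504, 1000603647, 'IN'),
)

# Build the list of range starts once, before doing any lookups.
starts = [first for first, last, country in data]

def lookup_country(ip):
    """Return the country whose range contains ip, or None if none does."""
    # Index of the last range whose start is <= ip.
    i = bisect.bisect_right(starts, ip) - 1
    if i < 0:
        return None
    first, last, country = data[i]
    # The ip may fall in a gap between two ranges.
    return country if ip <= last else None

print(lookup_country(999))
```

Each lookup then costs on the order of log2(180K), roughly 18 comparisons, instead of a scan over all 180K tuples. If the real data contains overlapping ranges, this sketch would need to be adapted.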