Posts by kma*_*gyo

Calculate the difference between values in consecutive rows, by group

This is my df (data.frame):

group value
1     10
1     20
1     25
2     5
2     10
2     15


I need to calculate the difference between values in consecutive rows, by group.

So I need a result like this:

group value diff
1     10    NA # because there is no previous value
1     20    10 # value[2] - value[1]
1     25    5  # value[3] - value[2]
2     5     NA # because the group changed
2     10    5  # value[5] - value[4]
2     15    5  # value[6] - value[5]


I can handle this with ddply, but it takes too much time, because my df contains a very large number of groups (over 1,000,000).

Is there a more efficient way to handle this?
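One vectorized way to approach this in base R is sketched below. It assumes that rows belonging to the same group are already adjacent, as in the sample data (otherwise, sort by group first). The names `d` and `first_in_group` are illustrative and not from the original post.

```r
# Sample data from the question
df <- data.frame(group = c(1, 1, 1, 2, 2, 2),
                 value = c(10, 20, 25, 5, 10, 15))

# Difference between each row and the previous row, ignoring groups for now
d <- c(NA, diff(df$value))

# TRUE for the first row of each group (assumes groups are contiguous)
first_in_group <- c(TRUE, df$group[-1] != df$group[-nrow(df)])

# The first row of a group has no previous value within that group
d[first_in_group] <- NA
df$diff <- d
```

Because this does not call a function once per group, it should scale better than a split-apply approach when there are very many groups. Without the contiguity assumption, `ave(df$value, df$group, FUN = function(x) c(NA, diff(x)))` gives the same result.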

kma*_*gyo

2018 06-14

53
Score

2
Answers

40k
Views

Change the color of nodes in an rCharts sankey diagram in R

I made a sankey chart using rCharts. Here is my code sample. The data is based on this URL (http://timelyportfolio.github.io/rCharts_d3_sankey/example_build_network_sankey.html)

library(devtools)
library(rjson)
library(igraph)

devtools::install_github("ramnathv/rCharts")

library(rCharts)

 g2 <- graph.tree(40, children=4)
 V(g2)$weight = 0
 V(g2)[degree(g2,mode="out")==0]$weight <- runif(n=length(V(g2)[degree(g2,mode="out")==0]),min=0,max=100)
 E(g2)[to(V(g2)$weight>0)]$weight <- V(g2)[V(g2)$weight>0]$weight

while(max(is.na(E(g2)$weight))) {
  df <- get.data.frame(g2)
  for (i in 1:nrow(df)) {
    x = df[i,]
    if(max(df$from==x$to)) {
      E(g2)[from(x$from) & to(x$to)]$weight = sum(E(g2)[from(x$to)]$weight)
    }
  }
}

edgelistWeight <- get.data.frame(g2)
colnames(edgelistWeight) <- c("source","target","value")
edgelistWeight$source <- as.character(edgelistWeight$source)
edgelistWeight$target <- as.character(edgelistWeight$target)

sankeyPlot2 <- rCharts$new()
sankeyPlot2$setLib('http://timelyportfolio.github.io/rCharts_d3_sankey')
sankeyPlot2$set(
     data = edgelistWeight,
     nodeWidth = 15,
     nodePadding = 10,
     layout = 32,
     width = 960,
     height = 500
 ) …


r colors rcharts sankey-diagram

kma*_*gyo

2015 02-14

6
Score

1
Answers

2543
Views

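rjags error message: Dimension mismatch

I am trying to learn Bayesian analysis from the book "Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan" (2015).

The book contains several examples, so I tried to replicate one of them in R. However, I got an error message for this example.

Specifically, here is the example data.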

data
   y        s
1  1 Reginald
2  0 Reginald
3  1 Reginald
4  1 Reginald
5  1 Reginald
6  1 Reginald
7  1 Reginald
8  0 Reginald
9  0     Tony
10 0     Tony
11 1     Tony
12 0     Tony
13 0     Tony
14 1     Tony
15 0     Tony

y<-data$y
s<-as.numeric(data$s)
Ntotal=length(y)
Nsubj=length(unique(s))

dataList=list(y=y, s=s, Ntotal=Ntotal, Nsubj=Nsubj)


Also, here is my model.

modelString=" 
model{
  for(i in 1:Ntotal){
    y[i] ~ dbern(theta[s[i]])
  }
  for(s in 1:Nsubj){
    theta[s] ~ dbeta(2,2)
  }
}
"
writeLines(modelString, con="TEMPmodel.txt") …

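One likely cause, offered as an assumption because the full error output is not shown: the model uses `s` both as the data vector of subject indices and as the index of the second loop. A sketch that renames the loop index (here `sIdx`, a name chosen for illustration) avoids that clash. The temporary file path is also an illustrative choice.

```r
# Same model as above, but the second loop uses its own index name (sIdx)
# so that it no longer collides with the data vector s.
modelString <- "
model{
  for(i in 1:Ntotal){
    y[i] ~ dbern(theta[s[i]])
  }
  for(sIdx in 1:Nsubj){
    theta[sIdx] ~ dbeta(2,2)
  }
}
"

# Write the model to a temporary file (illustrative; any writable path works)
modelFile <- tempfile(fileext = ".txt")
writeLines(modelString, con = modelFile)
```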

r bayesian jags

kma*_*gyo


6
Score

1
Answers

2071
Views

Match words (character strings) against reference values in R

Here is my data (A).

    keyword
[1] shoes
[2] childrenshoes
[3] nikeshoes
[4] sportsshirts
[5] nikeshirts
[6] shirts
...


Also, here is another data set (B). It is the reference data.

   keyword  value
[1] shoes    1
[2] shirts   2
...


I need to match these two data sets.

So I want this result:

    keyword        value
[1] shoes          1
[2] childrenshoes  1     (because this keyword includes 'shoes')
[3] nikeshoes      1     (because this keyword includes 'shoes')
[4] sportsshirts   2     (because this keyword includes 'shirts')
[5] nikeshirts     2     (because this keyword includes 'shirts')
[6] shirts         2
...


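If I use 'merge', I cannot match these data sets, because the keywords in data (B) do not exactly match the keywords in data (A).

I could handle them one by one with regexpr() or gregexpr(). However, data (B) contains many reference keywords.

So, how can I handle this problem?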
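A minimal sketch of one way to do this: loop over the reference keywords in (B) and use `grepl()` with `fixed = TRUE` to flag every keyword in (A) that contains the reference as a substring. The data frames `A` and `B` below are rebuilt from the samples in the question, with the fourth keyword spelled "sportsshirts" so that it contains "shirts". When a keyword contains more than one reference, this sketch keeps the first match, which is an assumption about the desired behaviour.

```r
# Data (A): keywords to classify
A <- data.frame(keyword = c("shoes", "childrenshoes", "nikeshoes",
                            "sportsshirts", "nikeshirts", "shirts"),
                stringsAsFactors = FALSE)

# Data (B): reference keywords and their values
B <- data.frame(keyword = c("shoes", "shirts"),
                value = c(1, 2),
                stringsAsFactors = FALSE)

A$value <- NA_real_
for (i in seq_len(nrow(B))) {
  # TRUE where the keyword in A contains the i-th reference keyword
  hit <- grepl(B$keyword[i], A$keyword, fixed = TRUE)
  # Fill only rows that have not been matched by an earlier reference
  A$value[hit & is.na(A$value)] <- B$value[i]
}
```

The loop runs once per reference keyword rather than once per row of (A), so each `grepl()` call is vectorized over all keywords in (A).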

r character matching

kma*_*gyo


2
Score

1
Answers

86
Views

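How to select a dropdown box using RSelenium?

I am interested in Premier League data, so I tried to get data from the official website: https://www.premierleague.com/stats/top/players/total_pass

I am using R and the RSelenium package.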

library(rvest)
library(httr)
library(RSelenium)

remDr <- remoteDriver(port = 4445L)
remDr$open()
remDr$navigate('https://www.premierleague.com/stats/top/players/total_pass')
getsource <-remDr$getPageSource()
name<- read_html(getsource[[1]]) %>% html_nodes("strong") %>% html_text()


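But I ran into a problem. The data is split into several categories, such as season, position, and club.

So I think I could get the data based on these categories. But I don't know how to use RSelenium to select a specific item in a dropdown box on this site.

I think findElement and clickElement are the useful functions for this. However, I don't know how to use them to select specific conditions, such as the 2016/17 season and the goalkeeper position.

Please give me some advice.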
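A rough sketch of how `findElement` and `clickElement` could be combined: click the dropdown to open it, then click the wanted option. The helper name `select_dropdown_option`, its `wait` parameter, and both selectors in the usage comment are hypothetical. The real selectors have to be read from the page's HTML with the browser's inspector.

```r
# remDr is assumed to be an open RSelenium remoteDriver (as created above).
# dropdown_css and option_xpath are selectors supplied by the caller.
select_dropdown_option <- function(remDr, dropdown_css, option_xpath, wait = 1) {
  # Find the dropdown control and click it to open the list of options
  dropdown <- remDr$findElement(using = "css selector", value = dropdown_css)
  dropdown$clickElement()

  # Give the page a moment to render the options
  Sys.sleep(wait)

  # Find the wanted option and click it
  option <- remDr$findElement(using = "xpath", value = option_xpath)
  option$clickElement()
  invisible(TRUE)
}

# Hypothetical usage; these selectors are placeholders, not taken from the site:
# select_dropdown_option(remDr, "div.season-filter", "//li[text()='2016/17']")
```

After the option is clicked, the page source can be read again with `remDr$getPageSource()` and parsed as in the code above.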

selenium r

kma*_*gyo


1
Score

1
Answers

1923
Views