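I am trying to run an iterative for loop to calculate correlations for the levels of a factor variable. My data set has 32 teams, each with 16 rows of data. I want to correlate Year with Points_Game for each team. I can do this one team at a time, but I want to get better at looping.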
correlate <- data %>%
  select(Team, Year, Points_Game) %>%
  filter(Team == "ARI") %>%
  select(Year, Points_Game)
cor(correlate)
I created a `teams` object as follows:
teams <- levels(data$Team)
A little help using [i] to iterate over all 32 teams and get the correlation between year and points for each team would be much appreciated!
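One way the loop over `teams[i]` could look is sketched below. The real `data` is not shown in the question, so the block builds a small made-up stand-in with the same column names (`Team`, `Year`, `Points_Game`); the values, the three-team subset, and the result name `team_cor` are assumptions for illustration only.

```r
# Toy stand-in for the asker's `data` (hypothetical values, 3 teams x 16 years)
set.seed(1)
data <- data.frame(
  Team = factor(rep(c("ARI", "ATL", "BAL"), each = 16)),
  Year = rep(2006:2021, times = 3),
  Points_Game = round(runif(48, min = 14, max = 32), 1)
)

teams <- levels(data$Team)

# Pre-allocate a named numeric vector with one slot per team
team_cor <- setNames(numeric(length(teams)), teams)

for (i in seq_along(teams)) {
  # Keep only the rows for the i-th team
  one_team <- data[data$Team == teams[i], ]
  # Pearson correlation between Year and Points_Game for that team
  team_cor[i] <- cor(one_team$Year, one_team$Points_Game)
}

team_cor
```

With dplyr loaded, the same per-team result should also be obtainable without an explicit loop via `data %>% group_by(Team) %>% summarise(r = cor(Year, Points_Game))`.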
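I have some code that scrapes a website, but after running the scrape a number of times I get a 403 Forbidden error. I understand there is an R package called polite that works out how to run a scrape according to the host's requirements, so that the 403 does not occur. I tried my best to adapt it to my code, but I'm stuck. I would really appreciate some help. Here is some reproducible sample code with just a few of the links: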
library(tidyverse)
library(httr)
library(rvest)
library(curl)
urls = c("https://www.pro-football-reference.com/teams/pit/2021.htm",
         "https://www.pro-football-reference.com/teams/pit/2020.htm",
         "https://www.pro-football-reference.com/teams/pit/2019.htm")
pitt <- map_dfr(
  .x = urls,
  .f = function(x) {
    Sys.sleep(2); cat(1);
    read_html(
      curl(x, handle = curl::new_handle("useragent" = "chrome"))) %>%
      html_nodes("table") %>%
      html_table(header = TRUE) %>%
      simplify() %>%
      .[[2]] %>%
      janitor::row_to_names(row_number = 1) %>%
      janitor::clean_names(.) %>%
      select(week, day, date, result = x_2, record = rec, opponent = opp, team_score = tm, opponent_score = opp_2) %>%
      mutate(year = str_extract(string = x, pattern = "\\d{4}"))
  }
)
…
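For the polite package mentioned in the question, a minimal sketch of how the fetching step might be restructured is shown here. It assumes polite and rvest are installed; the helper name `scrape_season`, the user-agent string, and the 5-second delay are hypothetical choices, not anything the site documents. polite checks robots.txt and spaces out requests, which may help but is not guaranteed to prevent a 403.

```r
# Sketch only: assumes the polite and rvest packages are installed.
# `scrape_season` and the user-agent string below are hypothetical.

base_url <- "https://www.pro-football-reference.com"
paths <- c("/teams/pit/2021.htm", "/teams/pit/2020.htm", "/teams/pit/2019.htm")

scrape_season <- function(session, path) {
  # nod() points the existing session at another path on the same host;
  # scrape() fetches it while honouring the session's delay and robots.txt
  page <- polite::scrape(polite::nod(session, path))
  tables <- rvest::html_table(rvest::html_elements(page, "table"), header = TRUE)
  # The question uses the second table on each page; return it uncleaned
  tables[[2]]
}

# Guarded so that sourcing this file non-interactively does not hit the network
if (interactive()) {
  # bow() introduces the scraper to the host once and reads its robots.txt
  session <- polite::bow(base_url, user_agent = "my-nfl-scraper (hypothetical)", delay = 5)
  seasons <- lapply(paths, function(p) scrape_season(session, p))
}
```

Each element of `seasons` would then go through the same janitor and `select()` cleaning steps as in the question, with the year taken from the corresponding entry of `paths`.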