小编sta*_*oob的帖子

R 中的条件连接字符串

我在 R 中有这个数据集：

id = 1:5
col1 = c("12 ABC", "123", "AB", "123344567", "1345677.")
col2 = c("gggw", "12", "567", "abc 123", "p")
col3 = c("abw", "abi", "klo", "poy", "17df")
col4 = c("13 AB", "344", "Huh8", "98", "b")
    
my_data = data.frame(id, col1, col2, col3, col4)

 id      col1    col2 col3  col4
1  1    12 ABC    gggw  abw 13 AB
2  2       123      12  abi   344
3  3        AB     567  klo  Huh8
4  4 123344567 abc 123  poy    98
5  5  1345677.       p …

Run Code Online (Sandbox Code Playgroud)

r data-manipulation

sta*_*oob

lucky-day

7
推荐指数

2
解决办法

100
查看次数

R：最短路径计算

我正在使用 R 编程语言。

基于地理空间形状文件，我试图找出两个坐标之间的行驶距离。

首先我加载了形状文件：

library(sf)
library(rgdal)
library(sfnetworks)
library(igraph)
library(dplyr)
library(tidygraph)

# Set the URL for the shapefile
url <- "https://www12.statcan.gc.ca/census-recensement/2011/geo/RNF-FRR/files-fichiers/lrnf000r22a_e.zip"

# Create a temporary folder to download and extract the shapefile
temp_dir <- tempdir()
temp_file <- file.path(temp_dir, "lrnf000r22a_e.zip")

# Download the shapefile to the temporary folder
download.file(url, temp_file)

# Extract the shapefile from the downloaded zip file
unzip(temp_file, exdir = temp_dir)

# Read the shapefile using the rgdal package
# source: https://gis.stackexchange.com/questions/456748/strategies-for-working-with-large-shapefiles/456798#456798
a = st_read(file.path(temp_dir, "lrnf000r22a_e.shp"), query="select * from …

Run Code Online (Sandbox Code Playgroud)

sta*_*oob

2023 04-16

7
推荐指数

1
解决办法

559
查看次数

HTML/XML：了解“滚动条”的工作原理

我正在使用 R 编程语言，并尝试了解如何使用 Selenium 与网页交互。

例如，使用谷歌地图 - 我试图找到某个区域周围所有披萨店的名称、地址和经度/纬度。据我了解，这将涉及输入您感兴趣的位置，单击“附近”按钮，输入您要查找的内容（例如“披萨”），一直滚动到底部以确保所有披萨店均已加载- 然后复制所有披萨店的名称、地址和经度/纬度。

我一直在自学如何在 R 中使用 Selenium，并且能够自己解决这个问题的部分内容。这是我到目前为止所做的：

第 1 部分：搜索地址（例如美国纽约自由女神像）并返回经度/纬度：

library(RSelenium)
library(wdman)
library(netstat)

selenium()
seleium_object <- selenium(retcommand = T, check = F)


remote_driver <- rsDriver(browser = "chrome", chromever = "114.0.5735.90", verbose = F, port = free_port())

remDr<- remote_driver$client
remDr$navigate("https://www.google.com/maps")

search_box <- remDr$findElement(using = 'css selector', "#searchboxinput")
search_box$sendKeysToElement(list("Statue of Liberty", key = "enter"))

Sys.sleep(5)

url <- remDr$getCurrentUrl()[[1]]

long_lat <- gsub(".*@(-?[0-9.]+),(-?[0-9.]+),.*", "\\1,\\2", url)
long_lat <- unlist(strsplit(long_lat, ","))

> long_lat
[1] "40.7269409"  "-74.0906116"

Run Code Online (Sandbox Code Playgroud)

第 2 …

html python xml r selenium-webdriver

sta*_*oob

2023 07-20

7
推荐指数

1
解决办法

352
查看次数

R：从决策树中提取规则

我正在使用 R 编程语言。最近，我读到了一种名为“强化学习树”（RLT）的新决策树算法，据称该算法有可能使“更好”的决策树适合数据集。该库的文档可在此处找到：https ://cran.r-project.org/web/packages/RLT/RLT.pdf

我尝试使用这个库在（著名的）鸢尾花数据集上运行分类决策树：

library(RLT)
data(iris)
fit = RLT(iris[,c(1,2,3,4)], iris$Species, model = "classification", ntrees = 1)

Run Code Online (Sandbox Code Playgroud)

问题：从这里开始，是否可以从这个决策树中提取“规则”？

例如，如果您使用 CART 决策树模型：

library(rpart)
library(rpart.plot)

fit <-rpart( Species ~. , data = iris)
rpart.plot(fit)

Run Code Online (Sandbox Code Playgroud)

 rpart.rules(fit)

    Species  seto vers virg                                               
     setosa [1.00  .00  .00] when Petal.Length <  2.5                     
 versicolor [ .00  .91  .09] when Petal.Length >= 2.5 & Petal.Width <  1.8
  virginica [ .00  .02  .98] when Petal.Length >= 2.5 & Petal.Width >= 1.8

Run Code Online (Sandbox Code Playgroud)

是否可以使用 RLT 库来做到这一点？我一直在阅读这个库的文档，似乎找不到提取决策规则的直接方法。我知道这个库通常是用作随机森林（没有决策规则）的替代品 - 但我正在阅读该算法的原始论文，其中他们指定 RLT 算法适合单个决策树（通过RLT …

tree r data-manipulation prediction decision-tree

sta*_*oob

lucky-day

6
推荐指数

1
解决办法

2066
查看次数

修复图表上混乱的标题

我制作了以下 25 个网络图（为了简单起见，所有这些图都是复制品 - 实际上，它们都会有所不同）：

library(tidyverse)
library(igraph)


set.seed(123)
n=15
data = data.frame(tibble(d = paste(1:n)))

relations = data.frame(tibble(
  from = sample(data$d),
  to = lead(from, default=from[1]),
))

data$name = c("new york", "chicago", "los angeles", "orlando", "houston", "seattle", "washington", "baltimore", "atlanta", "las vegas", "oakland", "phoenix", "kansas", "miami", "newark" )

graph = graph_from_data_frame(relations, directed=T, vertices = data) 

V(graph)$color <- ifelse(data$d == relations$from[1], "red", "orange")

plot(graph, layout=layout.circle, edge.arrow.size = 0.2, main = "my_graph")

library(visNetwork)

    a = visIgraph(graph)  

m_1 = 1
m_2 = 23.6

 a = …

Run Code Online (Sandbox Code Playgroud)

r data-visualization igraph visnetwork

sta*_*oob

2022 07-04

6
推荐指数

1
解决办法

615
查看次数

R：递归平均

我正在使用 R 编程语言。我有以下数据：

library(dplyr)

my_data = data.frame(id = c(1,1,1,1,2,2,2,3,4,4,5,5,5,5,5), var_1 = sample(c(0,1), 15, replace = TRUE) , var_2 =sample(c(0,1), 15 , replace = TRUE) )

my_data = data.frame(my_data %>% group_by(id) %>% mutate(index = row_number(id)))

my_data = my_data[,c(1,4,2,3)]

Run Code Online (Sandbox Code Playgroud)

数据看起来像这样：

   id index var_1 var_2
1   1     1     0     1
2   1     2     0     0
3   1     3     1     1
4   1     4     0     1
5   2     1     1     0
6   2     2     1     1
7   2     3     0     1
8   3     1 …

Run Code Online (Sandbox Code Playgroud)

r data-manipulation dplyr

sta*_*oob

2022 06-08

6
推荐指数

2
解决办法

334
查看次数

R 中的逆向工程正则表达式？

我发现这个正则表达式代码（在 R 中）可以识别以下模式“类”：一组字符串中的“LETTER-NUMBER-LETTER NUMBER-LETTER-NUMBER”：

apply(my_string, 1, function(x) gsub('(([A-Z] ?[0-9]){3})|.', '\\1', toString(x)))

Run Code Online (Sandbox Code Playgroud)

根据上面的代码，“LETTER-NUMBER-LETTER NUMBER-LETTER-NUMBER”对应的正则表达式“类”似乎是“(([AZ] ?[0-9]){3})|”。

这引出了我的问题：

假设我在 R 中有以下输入：

input = "A1B 2C3"

Run Code Online (Sandbox Code Playgroud)

有什么方法可以确定该字符串中包含什么正则表达式“类”？例如，是否存在这样的功能？

output = some_function("A1B 2C3")

> output
'(([A-Z] ?[0-9]){3})|.'

Run Code Online (Sandbox Code Playgroud)

我发现这个网站（https://regex-generator.olafneumann.org/?sampleText=the%20follow%20abc&flags=i&selection=11%7CMultiple%20characters）似乎执行类似的任务 - 但有没有办法做到这一点在 R 中？例如（我不确定这是否正确）：

string = c("s0me strIngz123", "someString3s", "S0M3 5t1ngs")

#imaginary conversion function

regex_class = some_function(string)

# output

           string                  regex_class
1 s0me strIngz123 ^[A-Za-z0-9]+.*[a-zA-Z]+123$
2    someString3s               ^[A-Za-z0-9]+$
3     S0M3 5t1ngs  ^[A-Za-z0-9]+ [A-Za-z0-9]+$

Run Code Online (Sandbox Code Playgroud)

R 中有某种方法可以确定不同字符串的通用“正则表达式类”吗？

谢谢你！

regex r

sta*_*oob

lucky-day

6
推荐指数

1
解决办法

381
查看次数

在 R 中模拟掷骰子

我正在尝试用 R 编写以下游戏：

掷骰子，直到出现 4，然后出现 6
计算一下您观察到 4 后跟 6 需要多少次
重复前两个步骤 100 次
计算观察到 4 和 6 所需的平均次数

我尝试手动模拟如下 - 我首先使用 R 中的“runif”命令多次“掷骰子”，希望您最终会看到 4 后面跟着 6 （我不知道如何使用“直到循环”进行编码）。我重复了 100 次，并将所有这些卷放入数据框中：

roll_1 = floor(runif(100, min=1, max=6))

roll_2 = floor(runif(100, min=1, max=6))

roll_3 = floor(runif(100, min=1, max=6))

roll_4 = floor(runif(100, min=1, max=6))

roll_5 = floor(runif(100, min=1, max=6))

#etc 

roll_100 = floor(runif(100, min=1, max=6))

all_rolls = data.frame(roll_1, roll_2, roll_3, roll_4, roll_5, roll_100)

Run Code Online (Sandbox Code Playgroud)

这看起来如下：

head(all_rolls)
  roll_1 roll_2 roll_3 roll_4 roll_5 roll_100
1      4      2 …

Run Code Online (Sandbox Code Playgroud)

loops average r probability data-manipulation

sta*_*oob

2021 09-07

5
推荐指数

1
解决办法

1万
查看次数

R 错误：需要单个字符串值：[type=character; 范围=5]

我正在使用 R 编程语言。我在 R 中使用“CORELS”库，这是一个示例（CORELS 是一个类似于决策树的统计模型）：

library(corels)

logdir <- tempdir()
rules_file <- system.file("sample_data", "compas_train.out", package="corels")
labels_file <- system.file("sample_data", "compas_train.label", package="corels")
meta_file <- system.file("sample_data", "compas_train.minor", package="corels")

stopifnot(file.exists(rules_file),
          file.exists(labels_file),
          file.exists(meta_file),
          dir.exists(logdir))

corels(rules_file, labels_file, logdir, meta_file,
       verbosity_policy = "silent",
       regularization = 0.015,
       curiosity_policy = 2,   # by lower bound
       map_type = 1)       # permutation map
cat("See ", logdir, " for result file.")

Run Code Online (Sandbox Code Playgroud)

可以在此处查看输出：

OPTIMAL RULE LIST
if ({sex:Male,juvenile-crimes:>0}) then ({recidivate-within-two-years:Yes})
else if ({priors:>3}) then ({recidivate-within-two-years:Yes})
else ({recidivate-within-two-years:No})

Run Code Online (Sandbox Code Playgroud)

我的问题：我仍然对上述函数的语法如何工作感到有点困惑。例如，我尝试在“iris”数据集上使用上述函数：

data(iris)
head(iris)

  Sepal.Length Sepal.Width …

Run Code Online (Sandbox Code Playgroud)

optimization tree search r function

sta*_*oob

lucky-day

5
推荐指数

1
解决办法

1846
查看次数

R：连接地图上的点

我正在使用 R 编程语言。

使用“leaflet”库，我为这 5 个城市制作了以下地图：

library(dplyr)
library(leaflet)

map_data <- data.frame("Lat" = c(43.6426, 43.6424, 43.6544, 43.6452, 43.6629), "Long" = c(-79.3871, -79.3860, -79.3807, -79.3806,-79.3957 ), type = c(1,2,3,4,5))

map_data$type = as.factor(map_data$type)



leaflet(map_data) %>%
    addTiles() %>% addCircleMarkers(stroke = FALSE, label = ~type,fillOpacity = 0.8, labelOptions = labelOptions(direction = "center",style = list('color' = "white"),noHide = TRUE, offset=c(0,0), fill = TRUE, opacity = 1, weight = 10, textOnly = TRUE))

Run Code Online (Sandbox Code Playgroud)

在我创建的上面的地图上，我现在想根据它们的“数字”“连接”地图上（在路线中）的所有这些“点”（即城市）（例如，将 1 与 2、2 与 3 连接） , 3 与 4, 4 与 …

r data-visualization geospatial leaflet

sta*_*oob

2022 07-04

5
推荐指数

1
解决办法

910
查看次数