有没有办法使用rpart.plot
库来绘制不属于的对象rpart
(用于制作决策树)?
例如,以下是正在运行的经典rpart
库rpart.plot
:
#load libraries
library(rpart)
library(rpart.plot)
#load data
data(iris)
#fit rpart model (i.e. decision tree)
r = rpart(Species ~., data=iris)
#plot model
rpart.plot(r)
Run Code Online (Sandbox Code Playgroud)
问题:我正在研究一个多类分类问题(如上面的示例 - 我只是用著名的“iris 数据集”说明了这一点),其中“rpart”运行时间太长(我等了 10 个小时,“rpart”代码仍然没有运行)。
然而,我在 R 中发现了另一个名为“c50”的库,它能够立即创建类似的模型:
#load library
library(C50)
#run same model
tree_mod <- C5.0(x = iris[, -5], y = iris$Species, rules = TRUE)
#view model
summary(tree_mod)
plot(tree_mod)
Run Code Online (Sandbox Code Playgroud)
问题:是否可以将“rpart.plot”库与“C50”库中的对象一起使用?
例如:
#my attempt
rpart.plot(tree_mod)
Error in rpart.plot(tree_mod) : Not an rpart object
Run Code Online (Sandbox Code Playgroud)
我的想法: 可以使用“C50”库提取不同的规则:
summary(tree_mod) …
Run Code Online (Sandbox Code Playgroud) 我正在尝试用 R 编写以下游戏:
我尝试手动模拟如下 - 我首先使用 R 中的“runif”命令多次“掷骰子”,希望您最终会看到 4 后面跟着 6 (我不知道如何使用“直到循环”进行编码)。我重复了 100 次,并将所有这些卷放入数据框中:
roll_1 = floor(runif(100, min=1, max=6))
roll_2 = floor(runif(100, min=1, max=6))
roll_3 = floor(runif(100, min=1, max=6))
roll_4 = floor(runif(100, min=1, max=6))
roll_5 = floor(runif(100, min=1, max=6))
#etc
roll_100 = floor(runif(100, min=1, max=6))
all_rolls = data.frame(roll_1, roll_2, roll_3, roll_4, roll_5, roll_100)
Run Code Online (Sandbox Code Playgroud)
这看起来如下:
head(all_rolls)
roll_1 roll_2 roll_3 roll_4 roll_5 roll_100
1 4 2 …
Run Code Online (Sandbox Code Playgroud) 我正在使用 R 编程语言。我在 R 中使用“CORELS”库,这是一个示例(CORELS 是一个类似于决策树的统计模型):
library(corels)
logdir <- tempdir()
rules_file <- system.file("sample_data", "compas_train.out", package="corels")
labels_file <- system.file("sample_data", "compas_train.label", package="corels")
meta_file <- system.file("sample_data", "compas_train.minor", package="corels")
stopifnot(file.exists(rules_file),
file.exists(labels_file),
file.exists(meta_file),
dir.exists(logdir))
corels(rules_file, labels_file, logdir, meta_file,
verbosity_policy = "silent",
regularization = 0.015,
curiosity_policy = 2, # by lower bound
map_type = 1) # permutation map
cat("See ", logdir, " for result file.")
Run Code Online (Sandbox Code Playgroud)
可以在此处查看输出:
OPTIMAL RULE LIST
if ({sex:Male,juvenile-crimes:>0}) then ({recidivate-within-two-years:Yes})
else if ({priors:>3}) then ({recidivate-within-two-years:Yes})
else ({recidivate-within-two-years:No})
Run Code Online (Sandbox Code Playgroud)
我的问题:我仍然对上述函数的语法如何工作感到有点困惑。例如,我尝试在“iris”数据集上使用上述函数:
data(iris)
head(iris)
Sepal.Length Sepal.Width …
Run Code Online (Sandbox Code Playgroud) 在使用 R 工作时,我进行了以下非正式观察:
我注意到,与类似大小的 CSV 文件相比,我能够更快地导入 RDS 文件。
例如,假设我的计算机上有一个 CSV 文件。如果我将此 CSV 文件导入到 R 中,请使用“saveRDS”命令将此文件另存为 RDS,然后使用“readRDS”命令重新导入同一文件 - 导入 RDS 版本似乎需要更少的时间该文件的 CSV 版本与同一文件的 CSV 版本进行比较。
例如:
步骤 1我为我的假设创建一个文件
# create file (i.e. imagine this file currently exists on the computer in CSV format)
test_file = data.frame(col1 = sample.int(100, 1000000, replace = TRUE), col2 = sample.int(100, 1000000, replace = TRUE), col3 = sample.int(100, 1000000, replace = TRUE), col4 = sample.int(100, 1000000, replace = TRUE), col5 = sample.int(100, 1000000, replace = TRUE), …
Run Code Online (Sandbox Code Playgroud) 我在这里找到了这篇 stackoverflow 帖子(在浏览器中突出显示/查找散点图中的数据点),其中展示了如何为散点图中的散点图创建搜索栏。
我尝试运行此示例中提供的代码:
library(plotly)
library(htmlwidgets)
library(htmltools)
pcaCars <- princomp(mtcars, cor = TRUE)
carsHC <- hclust(dist(pcaCars$scores), method = "ward.D2")
carsDf <- data.frame(pcaCars$scores, "cluster" = factor(carsClusters))
carsClusters <- cutree(carsHC, k = 3)
carsDf <- transform(carsDf, cluster_name = paste("Cluster", carsClusters))
p <- plot_ly(carsDf, x = ~Comp.1 , y = ~Comp.2, text = rownames(carsDf),
mode = "markers", color = ~cluster_name, marker = list(size = 11), type = 'scatter', mode = 'markers')
p <- htmlwidgets::appendContent(p, htmltools::tags$input(id='inputText', value='Merc', ''), htmltools::tags$button(id='buttonSearch', 'Search'))
p <- …
Run Code Online (Sandbox Code Playgroud) 我的数据格式如下:
my_data = structure(list(name = c("john", "john", "john", "john", "john",
"alex", "alex", "alex", "alex", "alex", "tim", "tim", "tim",
"tim", "tim", "jason", "jason", "jason", "jason", "jason", "peter",
"peter", "peter", "peter", "peter", "nancy", "nancy", "nancy",
"nancy", "sarah", "sarah", "sarah", "sarah", "luke", "luke",
"luke", "luke", "steve", "steve", "steve", "steve", "matt", "matt",
"matt", "mark", "mark", "mark", "jim", "jim", "jim", "derek",
"derek", "lucy", "lucy", "anne", "kevin", "dave"), year = c(2010,
2011, 2013, 2014, 2019, 2010, 2011, 2012, 2013, 2014, 2008, 2010,
2014, 2018, 2019, 2005, …
Run Code Online (Sandbox Code Playgroud) 我正在使用 R 编程语言。
我有以下图形网络数据:
library(igraph)
library(visNetwork)
from <- c("Boss", "TeamA", "TeamA", "TeamA", "SubteamA1", "SubteamA1", "SubteamA1", "SubteamA2", "SubteamA2", "SubteamA2", "SubteamA3", "SubteamA3", "SubteamA3")
to <- c("TeamA", "SubteamA1", "SubteamA2", "SubteamA3", "employee1", "employee2", "employee3", "employee4", "employee5", "employee6", "employee7", "employee8", "employee9")
a1 = data_frame <- data.frame(from, to)
from <- c("Boss", "TeamB", "TeamB", "TeamB", "SubteamB1", "SubteamB1", "SubteamB1", "SubteamB2", "SubteamB2", "SubteamB2", "SubteamB3", "SubteamB3", "SubteamB3")
to <- c("TeamB", "SubteamB1", "SubteamB2", "SubteamB3", "employee10", "employee11", "employee12", "employee13", "employee14", "employee15", "employee16", "employee17", "employee18")
a2 = data_frame <- data.frame(from, to)
final = …
Run Code Online (Sandbox Code Playgroud) 我制作了以下地图:
library(sf)
library(leaflet)
library(leafgl)
library(colourvalues)
library(leaflet.extras)
nc <- st_read(system.file("gpkg/nc.gpkg", package="sf"), quiet = TRUE) %>%
st_transform(st_crs(4326)) %>%
st_cast('POLYGON')
leaflet(data = nc) %>% addPolygons( stroke = FALSE) %>% addTiles(group = "OSM") %>% addProviderTiles(provider = providers$OpenStreetMap) %>% addPolygons(data = nc, weight=1, popup = ~NAME,
label = ~NAME, group = "name", col = 'blue') %>%
addSearchFeatures(targetGroups = 'name', options = searchFeaturesOptions(zoom=10, openPopup=TRUE))
Run Code Online (Sandbox Code Playgroud)
我想给多边形涂上不同的颜色,以便它们更容易看到 - 我通过随机为多边形分配颜色来做到这一点:
nc$color <- sample(c("red", "blue", "green", "yellow", "purple"), nrow(nc), replace = TRUE)
leaflet(data = nc) %>%
addTiles(group = "OSM") %>% …
Run Code Online (Sandbox Code Playgroud) 我正在使用 R 和 Python 语言。
假设我在 Google 地图上搜索以下加拿大邮政编码 (M5V 3L9):
当我搜索此内容时,我可以看到该邮政编码的“周长”以红色突出显示:
我的问题:( 通过 R/Python 使用 Selenium)从 HTML/CSS/XML 角度来看 - 我试图获取构成该周界边界的所有坐标的列表。
我一直在尝试探索从该网站生成的源代码,看看是否可以做一些事情来查看该周边的源代码(例如 JSON 格式)的存储位置 - 但到目前为止,我可以没有找到任何东西:
我希望也许有一些东西可以让我使用 Selenium 反复单击此周边并提取经度/纬度点 - 但到目前为止,我找不到任何东西。
有人可以告诉我该怎么做吗?
谢谢!
注意:通用 Selenium 代码:
library(RSelenium)
library(wdman)
library(netstat)
selenium()
seleium_object <- selenium(retcommand = T, check = F)
remote_driver <- rsDriver(browser = "chrome", chromever = "114.0.5735.90", verbose = F, port = free_port())
remDr<- remote_driver$client
remDr$navigate("https://www.google.com/maps/place/Toronto,+ON+M5V+3L9/@43.642566,-79.3875851,18z/data=!4m6!3m5!1s0x882b34d436f9c825:0x9e9c6195e38030f2!8m2!3d43.6429129!4d-79.3853443!16s%2Fg%2F1tvq4rqd?entry=ttu")
Run Code Online (Sandbox Code Playgroud) 我在 R 中有这些数据:
string_1 = c("newyork 123", "california 123", "washington 123")
string_2 = c("123 red", "123 blue", "123 green")
my_data = data.frame(string_1, string_2)
Run Code Online (Sandbox Code Playgroud)
我想string_2
从中“减去” string_1
。结果看起来像这样:
"newyork", "california", "washington"
Run Code Online (Sandbox Code Playgroud)
我尝试这样做:
library(tidyverse)
# did not work as planned
> str_remove(string_1, "string_2")
[1] "newyork 123" "california 123" "washington 123"
Run Code Online (Sandbox Code Playgroud)
但这并不是执行“完整”减法。
谢谢你!