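Posts by Ale*_*xis

Three exclamation marks in R

I have been reading a book on feature engineering, and one piece of code has a triple exclamation mark that I do not understand: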

vc_pred <- 
  recipe(Stroke ~ ., data = stroke_train %>% dplyr::select(Stroke, !!!VC_preds)) %>% 
  step_YeoJohnson(all_predictors()) %>% 
  prep(stroke_train %>% dplyr::select(Stroke, !!!VC_preds)) %>% 
  juice() %>% 
  gather(Predictor, value, -Stroke)

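VC_preds is a vector containing the names of the continuous predictors. I understand all of the code except the !!! operator. A single ! is negation, but what does !!! mean?

Any help would be greatly appreciated. Thank you.

Regards,

Alexis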

Ale*_*xis

lucky-day

Votes: 7 · Answers: 1 · Views: 1768

How to call a Python function from R reticulate in R Markdown

I have this R Markdown document, which contains a Python function:

---
title: "An hybrid experiment"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
runtime: shiny
---

    ```{r setup, include=FALSE}
    library(flexdashboard)
    library(reticulate)
    ```

    ```{r}
    selectInput("selector",label = "Selector",
      choices = list("1" = 1, "2" = 2, "3" = 3),
      selected = 1)
    ```

    ```{python}
    def addTwo(number):
      return number + 2
    ```

I wanted to use the addTwo function in a reactive context, so I tried this:

    ```{r}
    renderText({
      the_number <- py$addTwo(input$selector)
      paste0("The text is: ",the_number)
    })
    ```

But I got this error:

TypeError: unsupported operand type(s) for +: 'NoneType' and 'int'

Detailed traceback:
  File "<string>", line …
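
The traceback indicates that addTwo received None, which is how reticulate passes an R NULL to Python. Whether the selector was still empty when the chunk first ran is an assumption; the post does not confirm it. A minimal sketch of a more defensive Python function follows. The name `add_two_safe` is hypothetical, and it also converts the string value that a select input would deliver:

```python
def add_two_safe(number):
    """Add 2 to `number`, tolerating a missing value or a numeric string.

    Hypothetical helper, not part of the original post.
    """
    if number is None:
        # An R NULL arrives in Python as None; return None instead of failing.
        return None
    # A select input delivers its value as a string such as "1",
    # so convert before adding.
    return float(number) + 2


print(add_two_safe(None))  # None
print(add_two_safe("1"))   # 3.0
print(add_two_safe(3))     # 5.0
```

On the R side, the same guard could instead be applied before the call, so that Python never sees a missing value.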


r r-markdown shiny reticulate

Ale*_*xis

2021 06-13

Votes: 6 · Answers: 1 · Views: 165

Bootswatch themes in a Shiny flexdashboard in R

I downloaded a CSS file from Bootswatch ( https://bootswatch.com ) and saved the file (bootstrap.css) in the same folder as my flexdashboard file. I then tried to load the CSS with the following code:

---
title: "Untitled"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    css: bootstrap.css
---

```{r setup, include=FALSE}
library(flexdashboard)
```

Column {data-width=650}
-----------------------------------------------------------------------

### Chart A

```{r}

```

But the CSS is not loaded. I want to use the "Mint" theme from Bootswatch. Do you know a solution to this problem? Any help would be greatly appreciated.

css r shiny flexdashboard

Ale*_*xis

2020 07-17

Votes: 5 · Answers: 1 · Views: 1591

Converting special text to Latin characters in Python

I have the following pandas DataFrame:

the_df = pd.DataFrame({'id':[1,2],'name':['Joe','']})
the_df
    id  name
0   1   Joe
1   2

As you can see, the second name can be read as "Sarah", but it is written with special characters.

I want to create a new column that converts these characters to Latin characters. I tried this approach:

the_df['latin_name'] = the_df['name'].str.extract(r'(^[a-zA-Z\s]*)')
the_df
    id  name    latin_name
0   1   Joe     Joe
1   2

But it does not recognize the letters. Any help with this would be greatly appreciated.
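
One possible approach is Unicode compatibility normalization, sketched below with only the standard library. The styled string is a hypothetical stand-in (the word "Sarah" in mathematical bold script), since the exact characters in the post are not visible, and `to_latin` is a name introduced here for illustration:

```python
import unicodedata


def to_latin(text):
    """Fold stylised Unicode letters to their plain compatibility form."""
    # NFKC applies compatibility decompositions, which map e.g. mathematical
    # script or fullwidth letters onto ordinary Latin letters.
    return unicodedata.normalize("NFKC", text)


# Hypothetical stand-in for the styled name: "Sarah" in mathematical bold script.
styled = "\U0001D4E2\U0001D4EA\U0001D4FB\U0001D4EA\U0001D4F1"

print(to_latin(styled))  # Sarah
print(to_latin("Joe"))   # Joe
```

With the DataFrame from the question, the function could be applied as `the_df['name'].map(to_latin)`. Characters that have no compatibility mapping, such as look-alike letters from other alphabets, are left unchanged by this method.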

python pandas

Ale*_*xis

2021 08-06

Votes: 5 · Answers: 1 · Views: 84

Boxplots for two groups in pandas

I have the following dataset:

df_plots = pd.DataFrame({'Group':['A','A','A','A','A','A','B','B','B','B','B','B'],
                         'Type':['X','X','X','Y','Y','Y','X','X','X','Y','Y','Y'],
                         'Value':[1,1.2,1.4,1.3,1.8,1.5,15,19,18,17,12,13]})
df_plots
    Group   Type    Value
0   A       X       1.0
1   A       X       1.2
2   A       X       1.4
3   A       Y       1.3
4   A       Y       1.8
5   A       Y       1.5
6   B       X       15.0
7   B       X       19.0
8   B       X       18.0
9   B       Y       17.0
10  B       Y       12.0
11  B       Y       13.0

I want to create one boxplot figure per Group (there are two in the example), split by Type. I have tried this:

fig, axs = plt.subplots(1,2,figsize=(8,6), sharey=False)
axs = axs.flatten()

for i, g in enumerate(df_plots[['Group','Type','Value']].groupby(['Group','Type'])):
    g[1].boxplot(ax=axs[i])

This raises an IndexError because the loop tries to create 4 plots. …
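
Grouping by both columns yields four groups for only two axes, which explains the IndexError. A minimal sketch of one alternative groups by Group only and lets `DataFrame.boxplot` split each subplot by Type. It assumes the goal is one subplot per Group with one box per Type, and the non-interactive Agg backend is chosen here only so that the code runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

df_plots = pd.DataFrame({
    'Group': ['A'] * 6 + ['B'] * 6,
    'Type': ['X', 'X', 'X', 'Y', 'Y', 'Y'] * 2,
    'Value': [1, 1.2, 1.4, 1.3, 1.8, 1.5, 15, 19, 18, 17, 12, 13],
})

# One subplot per Group, so the number of axes matches the number of groups.
groups = df_plots.groupby('Group')
fig, axs = plt.subplots(1, groups.ngroups, figsize=(8, 6),
                        sharey=False, squeeze=False)
axs = axs.flatten()

for ax, (name, g) in zip(axs, groups):
    # `by='Type'` draws one box per Type inside this Group's subplot.
    g.boxplot(column='Value', by='Type', ax=ax)
    ax.set_title(f"Group {name}")

fig.suptitle("")  # drop the automatic "Boxplot grouped by Type" title
```

Keeping `sharey=False` lets each subplot use its own scale, which matters here because the values in group B are roughly ten times those in group A.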

python matplotlib boxplot pandas

Ale*_*xis

2021 11-06

Votes: 4 · Answers: 1 · Views: 5425

Cannot read a shp file in R

I tried to open a shp file on a Mac with the following code:

library(tidyverse)
library(sf)
library(rgeos)
sf_trees_raw <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-28/sf_trees.csv')
temp_shapefile <- tempfile()
download.file("https://www2.census.gov/geo/tiger/TIGER2017//ROADS/tl_2017_06075_roads.zip", temp_shapefile)
sf_roads <- unzip(temp_shapefile, "tl_2017_06075_roads.shp") %>%
  read_sf()

But I get this error message:

Error: Cannot open "/Users/name/Documents/Playground/Trees_SF/tl_2017_06075_roads.shp"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.

I tried other shp files but got the same error message:

map <- read_sf("per_admbnda_adm1_2018.shp")
Error: Cannot open "/Users/name/Documents/Playground/Trees_SF/per_admbnda_adm1_2018.shp"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.

I tried copying the shx and dbf files as well, but that did not solve the problem.

Any help would be greatly appreciated.

r r-sf

Ale*_*xis

2020 04-18

Votes: 3 · Answers: 1 · Views: 2940

Are pivot_longer() and pivot_wider() transitive?

I want to use the functions pivot_longer() and pivot_wider() on the iris dataset. Here is the code that lengthens the data:

iris_ds <- iris %>% pivot_longer(-Species, names_to = "Measure", values_to = "Value")

The documentation says that pivot_wider() is the inverse transformation of pivot_longer(), so I applied this code:

iris_or <- iris_ds %>% pivot_wider(names_from = "Measure", values_from = "Value")

I get the following table:

    Species    Sepal.Length      Sepal.Width    Petal.Length    Petal.Width
    setosa     <dbl>             <dbl>          <dbl>           <dbl>
    versicolor <dbl>             <dbl>          <dbl>           <dbl>
    virginica  <dbl>             <dbl>          <dbl>           <dbl>

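This was answered in a similar question about gather() and spread(), where the suggestion was to use a row ID. What I would like to know is whether the new functions pivot_longer and pivot_wider offer a way to handle this so that they are transitive. Thanks in advance for your answers.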

r tidyr

Ale*_*xis

2019 09-17

Votes: 2 · Answers: 1 · Views: 532

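Plot a histogram for each row using a gt table - R

I want to create a gt table showing some metrics, such as the number of observations, the mean and the median, plus a column with a histogram. For this question I will use the iris dataset.

I recently learned how to put plots into a tibble with the following code: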

library(dplyr)
library(tidyr)
library(purrr)
library(gt)
library(ggplot2)  # needed for ggplot() and geom_histogram()
my_tibble <- iris %>%
  pivot_longer(-Species, 
               names_to = "Vars", 
               values_to = "Values") %>%
  group_by(Vars) %>%
  summarise(obs = n(),
            mean = round(mean(Values),2),
            median = round(median(Values),2), 
            plots = list(ggplot(cur_data(), aes(Values)) + geom_histogram()))

Now I want to use the plots column to draw a histogram for each variable, so I tried the following:

my_tibble %>%
  mutate(ggplot = NA) %>%
  gt() %>%
  text_transform(
    locations = cells_body(vars(ggplot)),
    fn = function(x) {
      map(.$plots,ggplot_image)
    }
  )

But it returns an error:

Error in body[[col]][stub_df$rownum_i %in% loc$rows] <- fn(body[[col]][stub_df$rownum_i %in%  : 
  replacement has length zero

The gt table should look like this:

Any help would be greatly appreciated.

r ggplot2 dplyr purrr gt

Ale*_*xis

2021 10-12

Votes: 2 · Answers: 1 · Views: 1154

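Create new variables based on variable-name rules in R

I want to create new variables based on the following rules:

The variable name does not start with "sym"
The variable name does not end with "pct"

Each new variable is named after the original variable with the string "_ln" appended. Here is the dataset (my real dataset has 184 variables, which is why I want a function):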

library(dplyr)
library(tidyr)

df <- data.frame(kg_chicken = c(1,2,3,4,5,6),
                 kg_chicken_pct = c(0.1,0.2,0.3,0.4,0.5,0.6),
                 sym_kg_chicken = c(-0.25,-0.15,-0.05,0.05,0.15,0.25))
df
  kg_chicken kg_chicken_pct sym_kg_chicken
1          1            0.1          -0.25
2          2            0.2          -0.15
3          3            0.3          -0.05
4          4            0.4           0.05
5          5            0.5           0.15
6          6            0.6           0.25

This is what I tried:

df_final <- df %>%
  mutate_if(!starts_with("sym") & !ends_with("pct"),~ paste0(.,"_ln") = log(.))

But I get this error:

Error: unexpected '=' in:
"df_final <- df %>%
  mutate_if(!starts_with("sym") & !ends_with("pct"),~ paste0(.,"_ln") ="

This is my expected result:

df_final …


r dplyr tidyr

Ale*_*xis

lucky-day

Votes: 1 · Answers: 1 · Views: 222

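Error message: no non-missing arguments to max; returning -Inf

I am trying to determine the width at half height of a density plot, and I found the following code in a previous post: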

  d <- ggplot(A0, aes(DIAMETER)) +
  geom_density()

xmax <- d$x[d$y==max(d$y, na.rm = TRUE)]
x1 <- d$x[d$x < xmax][which.min(abs(d$y[d$x < xmax]-max(d$y)/2))]
x2 <- d$x[d$x > xmax][which.min(abs(d$y[d$x > xmax]-max(d$y)/2))]
FWHM <- x2-x1

When I run it, however, I get the following message related to the max() function:

Warning message:
In max(d$y, na.rm = TRUE) : no non-missing arguments to max; returning -Inf

I looked around and found that this might be caused by NA values in my dataset, but that is not the case (data frame structure below). Does anyone know how I can solve this? Thanks in advance!

str(A0)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   387 obs. of  3 variables:
 $ SAMPLE  : Factor w/ 5 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 …


r ggplot2

Viv*_*vvi

2020 05-25

Votes: 0 · Answers: 1 · Views: 1642