Posts by Niv*_*vel

Subsetting strings by counting specific characters

I have the following strings:

strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG") 

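I want to cut each string off as soon as the combined number of occurrences of A, G, and N reaches a certain value, say 3. In that case, the result should be: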

some_function(strings)

c("ABBSDGN", "AABSDG", "AGN", "GGG") 

I tried using stringi, stringr, and regular expressions, but I could not figure it out.
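One possible approach, sketched here in base R rather than with stringi or stringr: anchor a regular expression at the start of the string, match a run of non-target characters followed by one target character, repeat that group n times, and keep only the captured prefix. The function name `cut_after_n` and its arguments `chars` and `n` are hypothetical names chosen for this illustration, and the sketch assumes `chars` contains no characters that are special inside a regex character class (such as `]` or `^`).

```r
strings <- c("ABBSDGNHNGA", "AABSDGDRY", "AGNAFG", "GGGDSRTYHG")

# Hypothetical helper: truncate each string right after the n-th
# occurrence of any character listed in `chars`.
cut_after_n <- function(x, chars = "AGN", n = 3) {
  # ^( (?: [^chars]* [chars] ){n} ) .* $
  # The capture group holds the prefix up to and including the n-th hit.
  pattern <- sprintf("^((?:[^%s]*[%s]){%d}).*$", chars, chars, n)
  # Strings with fewer than n hits do not match and are returned unchanged.
  sub(pattern, "\\1", x, perl = TRUE)
}

cut_after_n(strings)
# [1] "ABBSDGN" "AABSDG"  "AGN"     "GGG"
```

Strings that contain fewer than `n` target characters are left as they are, which may or may not be the behaviour wanted here.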

regex r gsub stringr stringi

17
Score
3
Answers
491
Views

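tidymodels: Novel levels found in column

I am using tidymodels to build a random forest prediction. My test data contains a new factor level that is not present in the training data, which causes an error: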

1: Novel levels found in column 'Siblings': '4'. The levels have been removed, and values have been coerced to 'NA'. 
2: There are new levels in a factor: NA 
> test_predict
Fehler: Objekt 'test_predict' nicht gefunden

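I tried adding step_novel and step_dummy for the "Siblings" column, but that does not resolve the error. How should I handle new factor levels that are not present in the training data?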

library(tidyverse)
library(tidymodels)

data <-
  data.frame(
    Survived = as.factor(c(0,1,1,1,0,0,0,0,0,1,1,1,0,0,0,0)),
    Siblings = as.factor(c(1,1,0,1,0,0,0,3,1,1,0,1,0,0,0,3)),
    Class = as.factor(c(0,1,0,1,0,1,0,0,0,1,0,1,0,1,0,0)),
    Embarked = as.factor(c("s","c","m","m","s","c","s","m","m","s","s","s","s","s","s","s")) 
  )

test <-
  data.frame(
    Siblings = as.factor(c(1,1,0,1,0,0,0,3,1,1,0,1,0,0,0,4)), #New factor level
    Class = as.factor(c(0,1,0,1,0,1,0,0,0,1,0,1,0,1,0,0)),
    Embarked = as.factor(c("s","c","m","m","s","c","s","m","m","s","s","s","s","s","s","s")) 
  )

#Model
rf_model …
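To see what the warning is reporting at the data level, the following base R sketch finds the levels that occur only in the test data and recodes them to a placeholder level, which is the idea behind step_novel. It is not a tidymodels fix: `recode_novel` and the placeholder name "new" are hypothetical choices for this illustration, only the Siblings columns from the data above are reproduced, and whether a fitted model accepts the placeholder level depends on the model and on how the training data was prepared.

```r
# Siblings columns copied from the training and test data above
train_siblings <- as.factor(c(1,1,0,1,0,0,0,3,1,1,0,1,0,0,0,3))
test_siblings  <- as.factor(c(1,1,0,1,0,0,0,3,1,1,0,1,0,0,0,4))

# Levels that appear in the test data but were never seen in training
novel <- setdiff(levels(test_siblings), levels(train_siblings))
novel

# Hypothetical helper: map unseen levels to a placeholder level
# instead of letting them be coerced to NA.
recode_novel <- function(x, known_levels, placeholder = "new") {
  values <- as.character(x)
  unseen <- !is.na(values) & !(values %in% known_levels)
  values[unseen] <- placeholder
  factor(values, levels = c(known_levels, placeholder))
}

test_recoded <- recode_novel(test_siblings, levels(train_siblings))
levels(test_recoded)
```

The training factor would likely need the same placeholder level declared before the model is fitted, so that both data sets share one set of levels.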

r tidymodels

6
Score
1
Answers
1005
Views

Tag statistics

r ×2

gsub ×1

regex ×1

stringi ×1

stringr ×1

tidymodels ×1