Tag: stringi

Extracting years from strings and text data

I need to extract the start year and end year from a vector of strings like these.

 yr<- c("June 2013 – Present (2 years 9 months)", "January 2012 – June 2013 (1 year 6 months)","2006 – Present (10 years)","2002 – 2006 (4 years)")


 yr
 June 2013 – Present (2 years 9 months)
 January 2012 – June 2013 (1 year 6 months)
 2006 – Present (10 years)
 2002 – 2006 (4 years)

I expect output like this. Does anyone have a suggestion?

 start_yr       end_yr

2013            2016
2012            2013
2006            2016
2002            2006
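
One possible approach, as a minimal sketch rather than the accepted answer: pull every four-digit number out of each string with `stri_extract_all_regex()`, take the first as the start year, and use the second as the end year when it exists. The variable `present_year` is an assumption introduced here; it is set to 2016 only because the expected output above maps "Present" to 2016.

```r
library(stringi)

yr <- c("June 2013 – Present (2 years 9 months)",
        "January 2012 – June 2013 (1 year 6 months)",
        "2006 – Present (10 years)",
        "2002 – 2006 (4 years)")

# Assumption: "Present" stands for 2016, the year implied by the expected output.
present_year <- 2016L

# Extract every standalone 4-digit number from each string (returns a list).
years <- stri_extract_all_regex(yr, "\\b\\d{4}\\b")

# The first 4-digit number is the start year.
start_yr <- as.integer(vapply(years, function(y) y[1], character(1)))

# A second 4-digit number is the end year; otherwise fall back to present_year.
end_yr <- vapply(
  years,
  function(y) if (length(y) >= 2) as.integer(y[2]) else present_year,
  integer(1)
)

data.frame(start_yr, end_yr)
```

In real use, `present_year` could be replaced by the current year, for example via `as.integer(format(Sys.Date(), "%Y"))`.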

regex r lubridate stringi

use*_*187

lucky-day

4
votes

1
answer

1089
views

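Filtering by multiple patterns with filter() and str_detect()

I want to use filter() and str_detect() to filter a data frame by multiple patterns without making multiple str_detect() calls. In the example below, I want to filter the data frame df to show only the rows containing the letters a, f, and o.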

library(dplyr)    # filter() and %>%
library(stringr)  # str_detect()

df <- data.frame(numbers = 1:52, letters = letters)
df %>%
    filter(
        str_detect(.$letters, "a")|
        str_detect(.$letters, "f")| 
        str_detect(.$letters, "o")
    )
#  numbers letters
#1       1       a
#2       6       f
#3      15       o
#4      27       a
#5      32       f
#6      41       o

I tried the following:

df %>%
    filter(
        str_detect(.$letters, c("a", "f", "o"))
     )
#  numbers letters
#1       1       a
#2      15       o
#3      32       f

and got the following warning:

Warning message: In stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length
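
A possible way around this, sketched under the assumption that the patterns are plain strings with no regex metacharacters: `str_detect()` is vectorised over both arguments, so a length-3 pattern vector is recycled along the 52 rows (row 1 is tested against "a", row 2 against "f", row 3 against "o", and so on), which explains both the three rows returned and the warning. Collapsing the patterns into a single alternation avoids the recycling. The names `patterns` and `result` are introduced here for illustration.

```r
library(dplyr)
library(stringr)

df <- data.frame(numbers = 1:52, letters = letters)

# Collapse the individual patterns into one regex alternation: "a|f|o".
patterns <- c("a", "f", "o")
single_pattern <- paste(patterns, collapse = "|")

# One str_detect() call now tests every row against all three letters.
result <- df %>%
  filter(str_detect(letters, single_pattern))

result
```

For single characters, a character class such as `"[afo]"` would be an equivalent pattern.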

r stringr stringi tidyverse

use*_*411

lucky-day

4
votes

1
answer

7606
views

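Installing stringi fails in RStudio

I am trying to knit a document, and it tells me that rmarkdown requires a newer version of the stringi package.

When installing stringi I get the following error: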

> install.packages("stringi")

Installing package into ‘C:/Users/matan/Documents/R/win-library/3.4’
(as ‘lib’ is unspecified)

  There is a binary version available but the source version is later:
        binary source needs_compilation
stringi  1.1.5  1.1.6              TRUE

  Binaries will be installed
trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/stringi_1.1.5.zip'
Warning in install.packages :
  cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/stringi_1.1.5.zip': HTTP status was '404 Not Found'
Error in download.file(url, destfile, method, mode = "wb", ...) : 
  cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/stringi_1.1.5.zip'
Warning in install.packages :
  download of …

r rstudio r-markdown stringi

use*_*376

2017 11-19

4
votes

1
answer

3935
views

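Excluding "I" and "O" from alphanumeric IDs in a stringi character set

I learned from "Generate unique alphanumeric IDs" that I can use stringi and stri_rand_strings to generate unique alphanumeric IDs. I am trying to find an efficient way to do this using only the digits 0-9 and all letters except "I" and "O". I can't seem to figure out how to express that in the pattern, for example with c( LETTERS[c(1:8,10:14,16:26)],"[0-9]")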

stri_rand_strings(25, 6)
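
A minimal sketch of one way to do this: the `pattern` argument of `stri_rand_strings()` takes a character class, so the excluded letters can simply be left out of the ranges. The `set.seed()` call is there only to make this illustration reproducible. Note that random draws are not guaranteed to be unique, so a duplicate check may still be needed.

```r
library(stringi)

set.seed(1)  # only for reproducibility of this sketch

# Character class covering A-H, J-N, P-Z and 0-9, i.e. everything except I and O.
ids <- stri_rand_strings(25, 6, pattern = "[A-HJ-NP-Z0-9]")

ids
```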

regex r character-class stringi

Mat*_*ewR

2019 12-13

4
votes

1
answer

92
views

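How do I remove words without a capital letter in R?

I am doing text analysis in R. Is there a way to remove all words that are not capitalized, using tm or stringi?

If I have something like this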

Albert Einstein went to the store and saw his friend Nikola Tesla ... + 200 pags

to be converted into

Albert Einstein Nikola Tesla

Best regards
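
One way this could be approached with stringi, as a sketch: instead of deleting the lowercase words, extract the words that start with an uppercase letter (`\p{Lu}`) and paste them back together. The variable names are introduced here for illustration, and the input is the shortened sentence from the question.

```r
library(stringi)

txt <- "Albert Einstein went to the store and saw his friend Nikola Tesla"

# Keep only words whose first character is an uppercase letter.
caps <- stri_extract_all_regex(txt, "\\b\\p{Lu}\\w*")[[1]]

result <- paste(caps, collapse = " ")
result
# "Albert Einstein Nikola Tesla"
```

A caveat of this approach: ordinary words at the start of a sentence (such as "The") are capitalized too and would be kept.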

r tm stringi

pac*_*ese

lucky-day

3
votes

2
answers

521
views

AppVeyor issue: cannot install R package stringi

Recently, builds on AppVeyor stopped working. They fail before the actual build starts because the stringi package cannot be installed.

Locally everything works fine, but I need a workaround for AppVeyor. Does anyone have a solution to this problem?

Here is the error message on AppVeyor:

installing source package 'stringi' ... ** package 'stringi' successfully unpacked and MD5 sums checked ** libs * arch-i386 c:/Rtools/mingw_32/bin/g++ -std=gnu++11 -I"c:/R/include" -DNDEBUG -I. -Iicu61/ -Iicu61/unicode -Iicu61/common -Iicu61/i18n -DU_STATIC_IMPLEMENTATION -DU_COMMON_IMPLEMENTATION -DU_I18N_IMPLEMENTATION -DUCONFIG_USE_LOCAL -DU_TOOLUTIL_IMPLEMENTATION -DNDEBUG-DWINVER = 0_0_0_2_STR_L_DWIN_ = 0_0_2600 .cpp -o stri_ICU_settings.o /bin/sh: c:/Rtools/mingw_32/bin/g++: No such file or directory: * [stri_ICU_settings.o] Error 127 ERROR: compilation failed for package 'stringi'

Removing 'c:/RLibrary/stringi' in R CMD INSTALL. Error in ip(...): (converted from warning) installation of package 'stringi' had non-zero exit status. Calls: ... with_rprofile_user -> with_envvar -> force -> force -> ip. Execution halted. Command exited with code 1

See also: …

r stringi appveyor r-package

sta*_*007

lucky-day

3
votes

1
answer

133
views

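Extracting the second-to-last word between the special character "/"

I want to extract the second-to-last string delimited by the "/" character. For example,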

url<- c('https://example.com/names/ani/digitalcod-org','https://example.com/names/bmc/ambulancecod.org' )
df<- data.frame (url)

I want to extract the second word from the end, between the slashes, and get the words "ani" and "bmc".

So I tried this:

 library(stringr)
 df$name<- word(df$url,-2)

I need output like this:

name 
ani
bmc
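
A sketch of one possible fix: `word()` splits on a single space by default, so it finds no separate words in a URL. Splitting on "/" explicitly and taking the second-to-last piece gives the desired values. This version uses `stri_split_fixed()` from stringi; the helper variable `parts` is introduced here for illustration.

```r
library(stringi)

url <- c("https://example.com/names/ani/digitalcod-org",
         "https://example.com/names/bmc/ambulancecod.org")
df <- data.frame(url, stringsAsFactors = FALSE)

# Split each URL on "/" (returns one character vector per URL).
parts <- stri_split_fixed(df$url, "/")

# Take the second-to-last piece of each split.
df$name <- vapply(parts, function(p) p[length(p) - 1], character(1))

df$name
# "ani" "bmc"
```

For URLs shaped like these, base R's `basename(dirname(url))` should give the same result without any package.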

regex r stringr stringi

use*_*187

lucky-day

3
votes

1
answer

1211
views

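Generating unique random strings in R with stringi

I have data in which each row is a person. I want to create a randomly generated unique ID so that I can identify them in my analysis.

Here is an example data frame: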

df <- data.frame(
  gender = rep(c("M", "F", "M", "M", "F"), 1000),
  qtr = sample(c(1:99), 50000, replace = T),
  result = sample(c(100:1000), 50000, replace = T)
)

To generate the unique IDs, I use stringi:

library(stringi)
library(magrittr)
library(dplyr)  # mutate() comes from dplyr, not tidyr

df <- df %>%
  mutate(UniqueID = do.call(paste0, Map(stri_rand_strings, n=50000, length=c(2, 6),
                                        pattern = c('[A-Z]', '[0-9]'))))

However, when I test whether the new variable UniqueID is unique by running this code, I find some duplicates.

length(unique(unlist(df[c("UniqueID")])))

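Is there a way to generate an ID that is truly unique, with no duplicates?

I have looked at these questions, but they do not answer how to make the generated random values unique: "Generating unique random numbers in a data frame column in R" and "Create a data frame with random numbers in each column".

Thanks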
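
A sketch of one possible approach: two letters followed by six digits allow 26^2 × 10^6 = 676,000,000 distinct IDs, so with 50,000 random draws a handful of collisions is statistically expected. Redrawing only the duplicated entries until none remain guarantees uniqueness. The helper `make_unique_ids()` is hypothetical and introduced here for illustration; it is not part of stringi.

```r
library(stringi)

# Hypothetical helper: draw IDs, then redraw any duplicates until all are unique.
make_unique_ids <- function(n) {
  # Draw k IDs of the form two uppercase letters + six digits.
  draw <- function(k) {
    paste0(stri_rand_strings(k, 2, "[A-Z]"),
           stri_rand_strings(k, 6, "[0-9]"))
  }
  ids <- draw(n)
  # anyDuplicated() returns 0 when every element is unique.
  while (anyDuplicated(ids) > 0) {
    dup <- duplicated(ids)
    ids[dup] <- draw(sum(dup))
  }
  ids
}

set.seed(42)  # only for reproducibility of this sketch
ids <- make_unique_ids(50000)

length(unique(ids)) == length(ids)
```

The result could then be assigned directly, for example `df$UniqueID <- make_unique_ids(nrow(df))`.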

string random r stringi

Lau*_*ra

lucky-day

3
votes

1
answer

5021
views

Conditional string concatenation within the same column in R

I am new to R and have a very large, irregular column in a data frame, like this:

x <- data.frame(section = c("BOOK I: Introduction", "Page one: presentation", "Page two: acknowledgments", "MAGAZINE II: Considerations", "Page one: characters", "Page two: index", "BOOK III: General Principles", "BOOK III: General Principles", "Page one: invitation"))

section
BOOK I: Introduction
Page one: presentation
Page two: acknowledgments
MAGAZINE II: Considerations 
Page one: characters
Page two: index
BOOK III: General principles
BOOK III: General principles
Page one: invitation

I need to concatenate this column as follows:

section
BOOK I: Introduction 
BOOK I: Introduction / Page one: presentation
BOOK I: …
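
The expected output above is truncated, so the following is only a sketch under a stated assumption: rows beginning with "Page" are treated as children of the most recent non-"Page" row (the heading), and each child is prefixed with that heading. The column name `combined` and the helper variables are introduced here for illustration.

```r
x <- data.frame(
  section = c("BOOK I: Introduction", "Page one: presentation",
              "Page two: acknowledgments", "MAGAZINE II: Considerations",
              "Page one: characters", "Page two: index",
              "BOOK III: General Principles", "BOOK III: General Principles",
              "Page one: invitation"),
  stringsAsFactors = FALSE
)

# Assumption: rows starting with "Page" are children; every other row is a heading.
is_page <- grepl("^Page", x$section)

# cumsum(!is_page) numbers the headings in order, so indexing the headings
# with it carries the most recent heading forward to each row.
heading <- x$section[!is_page][cumsum(!is_page)]

# Prefix page rows with their heading; leave heading rows unchanged.
x$combined <- ifelse(is_page, paste(heading, x$section, sep = " / "), x$section)

x$combined
```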

string r stringi

abo*_*bot

lucky-day

3
votes

1
answer

330
views

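How do I perform multiple string pattern replacements without overwriting previous replacements?

I want to take algebraic chess notation and convert the file letters (a, b, c, d, e, f, g, h) to the NATO phonetic alphabet (alpha, bravo, charlie, delta, echo, foxtrot, golf, hotel) without overwriting previous replacements. I am working in R.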

notation <- "1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 Be7 5.Nf3 0-0 6.0-0 dxc4 7.Qc2 a6 8.Qxc4 b5 9.Qc2 Bb7 10.Bd2 Ra7 "

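Desired result: "1.delta 4 Nfoxtrot 6 2.charlie 4 echo 6 3.golf 3 delta 5" and so on. I don't care about spacing for now.

If I use a simple string replacement approach, the replacements conflict with each other.

Using gsub: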

notation <- gsub("a", "alpha", notation)
notation <- gsub("b", "bravo", notation)
notation <- gsub("c", "charlie", notation)
notation <- gsub("d", "delta", notation)
notation <- gsub("e", "echo", notation)
notation <- gsub("f", "foxtrot", notation)
notation <- gsub("g", "golf", notation)
notation <- gsub("h", …
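
A sketch of one way to avoid the conflict, using base R rather than stringi: match every file letter in a single pass with `gregexpr()` and swap each match through a lookup table via `regmatches<-`. Because the string is scanned only once, letters inside an already inserted word such as "alpha" are never matched again. The names `nato` and `to_nato()` are hypothetical and introduced here for illustration.

```r
# Lookup table from file letter to NATO word.
nato <- c(a = "alpha", b = "bravo", c = "charlie", d = "delta",
          e = "echo", f = "foxtrot", g = "golf", h = "hotel")

# Hypothetical helper: replace each lowercase file letter in one pass.
to_nato <- function(x) {
  # Locate every lowercase a-h in the original string.
  m <- gregexpr("[a-h]", x)
  # Replace each matched letter by its NATO word via the lookup table.
  regmatches(x, m) <- lapply(regmatches(x, m), function(hit) unname(nato[hit]))
  x
}

to_nato("1.d4 Nf6 2.c4 e6")
# "1.delta4 Nfoxtrot6 2.charlie4 echo6"
```

Uppercase piece letters such as "N" and "B" are left untouched because the pattern matches lowercase letters only.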

replace r stringi

dol*_*ang

2023 09-29

3
votes

2
answers

89
views