Using strsplit to get the last character in R

Question

dat*_*r02 6 (tags: regex, string, parsing, r, strsplit)

I have a file of baby names that I read in, and I am then trying to get the last character of each baby name. For example, the file looks like this:

Name      Sex 
Anna      F
Michael   M
David     M
Sarah     F

I read it in using:

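# header = TRUE so the first row supplies the column names, which sourcenames$Name relies on
sourcenames <- read.csv("babynames.txt", header = TRUE, sep = ",")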
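
If the file really is whitespace-aligned as displayed above rather than comma-separated, `read.table()` may suit it better, since it splits on runs of whitespace by default. The sketch below is an assumption about the file's layout: it types the sample rows inline through the `text =` argument in place of `babynames.txt`, and the variable name `babynames_text` is hypothetical.

```r
# Sample rows typed inline; text = stands in for the file "babynames.txt"
babynames_text <- "Name      Sex
Anna      F
Michael   M
David     M
Sarah     F"

# read.table() splits on whitespace by default; header = TRUE turns the
# first row into column names so that sourcenames$Name exists
sourcenames <- read.table(text = babynames_text, header = TRUE,
                          stringsAsFactors = FALSE)
str(sourcenames)
```

With `stringsAsFactors = FALSE` both columns come back as plain character vectors, so no `as.character()` call is needed afterwards.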

Ultimately I want the result to look like this:

Name   Last Initial   Sex
Michael  l             M
Sarah    h             F

I have managed to split the names into individual characters:

sourceout = strsplit(as.character(sourcenames$Name),'')

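But where I am stuck now is how to get the last letter, so in Michael's case, how to get 'l'. I thought tail() might work, but it returns the last few records rather than the last character of each Name element.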
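
The behaviour described above comes from `strsplit()` returning a list with one character vector per name: `tail()` applied to that list picks the last list element, not the last letter inside each element. A minimal sketch, assuming the four sample names from the file and a hypothetical vector `names_vec`, contrasts the two and uses `vapply()` to run `tail()` on every element.

```r
names_vec <- c("Anna", "Michael", "David", "Sarah")
sourceout <- strsplit(names_vec, "")   # a list of 4 character vectors

# tail() on the list itself keeps only the last list element,
# i.e. a one-element list holding Sarah's letters
tail(sourceout, 1)

# applying tail() to each element gives each name's last letter;
# vapply() also checks that every result is a single string
last_letters <- vapply(sourceout, tail, character(1), n = 1)
last_letters
# [1] "a" "l" "d" "h"
```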

Any help or suggestions would be much appreciated.

Thanks :)

Answer 1

Ric*_*ven 12

For your strsplit approach to work, you can use tail with sapply:

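# example data matching the question's file
df <- data.frame(Name = c("Anna", "Michael", "David", "Sarah"),
                 Sex  = c("F", "M", "M", "F"))
# split each name into single characters, then keep the last one of each
df$LastInit <- sapply(strsplit(as.character(df$Name), ""), tail, 1)
df
#      Name Sex LastInit
# 1    Anna   F        a
# 2 Michael   M        l
# 3   David   M        d
# 4   Sarah   F        h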
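
One caveat worth checking, assuming the Name column could ever contain an empty string: `strsplit("", "")` yields `character(0)`, `tail()` of that is also `character(0)`, and `sapply()` then cannot simplify, so it silently returns a list instead of a character vector. The vector `x` below is a hypothetical example; `substring()` handles the same input without changing the result type.

```r
x <- c("Anna", "", "Sarah")

# the empty name produces a zero-length result, so sapply() falls back to a list
res <- sapply(strsplit(x, ""), tail, 1)
class(res)
# [1] "list"

# substring() returns "" for the empty name and keeps a character vector,
# here c("a", "", "h")
substring(x, nchar(x))
```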

Alternatively, you can use substring:

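# as.character() also covers a factor Name column (the default for data.frame() and read.csv() before R 4.0.0)
with(df, substring(as.character(Name), nchar(as.character(Name))))
# [1] "a" "l" "d" "h"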
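
To get the layout the question asks for, with the last initial between Name and Sex, the `substring()` result can be stored as a column and the columns reordered. This is a sketch on the sample data; the column name `LastInitial` is a hypothetical choice (a name with a space, as in the desired output, would need backticks everywhere).

```r
df <- data.frame(Name = c("Anna", "Michael", "David", "Sarah"),
                 Sex  = c("F", "M", "M", "F"),
                 stringsAsFactors = FALSE)

# substring(s, first) runs from `first` to the end of s, so starting
# at nchar(s) yields just the final character
df$LastInitial <- substring(df$Name, nchar(df$Name))

# reorder the columns to Name, LastInitial, Sex
df <- df[, c("Name", "LastInitial", "Sex")]
df
```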

Answer 2

bar*_*nus 7

Try this function from the stringi package:

library(stringi)
x <- c("Ala", "Sarah", "Meg")
stri_sub(x, from = -1, to = -1)
# [1] "a" "h" "g"

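This function extracts the substring between `from` and `to`. If an index is negative, it counts characters from the end of the string. So `from = -1` and `to = -1` means we want the substring running from the last character to the last character :)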
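
The negative-index rule extends naturally to more than one character. A short sketch on the same three names: `stri_sub()`'s `to` argument defaults to -1, so `from` alone is enough, and the base-R equivalent of "the last k characters" starts at `nchar(x) - k + 1` (the variable `k` is a hypothetical name).

```r
library(stringi)

x <- c("Ala", "Sarah", "Meg")

# `to` defaults to -1 (the last character), so from = -1 alone is enough
stri_sub(x, from = -1)
# [1] "a" "h" "g"

# from = -2 keeps the last two characters
stri_sub(x, from = -2)
# [1] "la" "ah" "eg"

# base-R equivalent of counting from the end: start at nchar(x) - k + 1
k <- 2
substring(x, nchar(x) - k + 1)
# [1] "la" "ah" "eg"
```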

Why use stringi? Take a look at this benchmark :)

library(stringi)
library(stringr)          # provides str_extract()
library(microbenchmark)
x <- sample(x, 1000, replace = TRUE)
microbenchmark(stri_sub(x, -1), str_extract(x, "[a-z]{1}$"), gsub(".*(.)$", "\\1", x),
               sapply(strsplit(as.character(x), ""), tail, 1), substring(x, nchar(x)))

Unit: microseconds
                                           expr       min         lq     median         uq       max neval
                                stri_sub(x, -1)    56.378    63.4295    80.6325    85.4170   139.158   100
                    str_extract(x, "[a-z]{1}$")   718.579   764.4660   821.6320   863.5485  1128.715   100
                     gsub(".*(.)$", "\\\\1", x)   478.676   493.4250   509.9275   533.8135   673.233   100
 sapply(strsplit(as.character(x), ""), tail, 1) 12165.470 13188.6430 14215.1970 14771.4800 21723.832   100
                         substring(x, nchar(x))   133.857   135.9355   141.2770   147.1830   283.153   100
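
A timing comparison is only like-for-like if the candidates return the same values. A quick sketch, using a hypothetical input that adds an all-caps name and an empty string to the sample names, shows that `str_extract(x, "[a-z]{1}$")` gives `NA` when the final character is not a lowercase letter, while the other three approaches agree.

```r
library(stringi)
library(stringr)

x <- c("Ala", "Sarah", "Meg", "ANNA", "")

results <- list(
  stri_sub    = stri_sub(x, -1),
  str_extract = str_extract(x, "[a-z]{1}$"),  # NA for "ANNA" and ""
  gsub        = gsub(".*(.)$", "\\1", x),     # "" is left unchanged (no match)
  substring   = substring(x, nchar(x))
)
results
```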

Archived: 11 years, 4 months ago
Views: 10317
Last activity: 8 years, 12 months ago