Using dplyr in R to select rows and return the indices of the rows found

Tec*_*e01 13 r dplyr

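Setup/Problem:

Using dplyr, I can't work out the best way to return the row indices of the filtered rows rather than the contents of the filtered rows.

Problem:

I can use dplyr::filter() to extract rows from a data frame... The problem is that I want to extract the index values of the filtered rows and add them to a list of index entries that match the search criteria.

Question:

Is there a simple way, using dplyr, to search a data frame on specific criteria and return the numeric index of each row found? The code below uses base::which() to extract the row indices into a vector...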

    requiredPackages <- c("dplyr")

    ipak <- function(pkg){
            new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
            if (length(new.pkg))
                    install.packages(new.pkg, dependencies = TRUE)
            sapply(pkg, require, character.only = TRUE)
    }

    ipak(requiredPackages)

    if (!file.exists("./week3/data")) {
            # recursive = TRUE also creates ./week3 if it does not exist yet
            dir.create("./week3/data", recursive = TRUE)
    }

    # CSV Download
    if (!file.exists("./week3/data/americancommunitySurvey.csv")) {
            fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv?accessType=DOWNLOAD"
            download.file(fileUrl, destfile = "./week3/data/americancommunitySurvey.csv", method = "curl")
    }

    housingData <- tbl_df(read.csv("./week3/data/americancommunitySurvey.csv"
                                   , stringsAsFactors = TRUE))

# Now we have to extract the relevant data
#
# Create a logical vector that identifies the households on greater than 10
# acres who sold more than $10,000 worth of agriculture products. Assign that
# logical vector to the variable agricultureLogical. Apply the which() function
# like this to identify the rows of the data frame where the logical vector is
# TRUE. which(agricultureLogical) What are the first 3 values that result?
#
# ACR 1
# Lot size
# b .N/A (GQ/not a one-family house or mobile home)
# 1 .House on less than one acre
# 2 .House on one to less than ten acres
# 3 .House on ten or more acres                 ACR == 3
#
# AGS 1
# Sales of Agriculture Products
# b .N/A (less than 1 acre/GQ/vacant/
#                 .2 or more units in structure)
# 1 .None
# 2 .$ 1 - $ 999
# 3 .$ 1000 - $ 2499
# 4 .$ 2500 - $ 4999
# 5 .$ 5000 - $ 9999
# 6 .$10000+                                    AGS == 6
#
# Thus, we need to select only the rows that have ACR == 3 AND AGS == 6
#
agricultureLogical <- which(housingData$ACR == 3 & housingData$AGS == 6)
agricultureLogical
# Now we can display the first three values of the resulting vector
head(agricultureLogical, 3)

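The code above gives me the result I want, but I would like to understand how to do this with dplyr. This is what is puzzling me... I can use dplyr::filter() as follows to extract the rows. How do I extract the index of each row found?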

agricultureLogical <- filter(housingData, ACR == 3 & AGS == 6)
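For readers without the survey file, the which() approach from the question can be tried on a toy data frame. This is only a sketch: the `toy` data frame and its values are made up for illustration and merely mimic the ACR and AGS columns, including some missing values. The name `matchingRows` is also an assumption introduced here.

```r
# Toy stand-in for housingData; the values are invented for illustration.
toy <- data.frame(
  ACR = c(3, 1, 3, NA, 3, 2),
  AGS = c(6, 6, NA, 6, 6, 1)
)

# Element-wise comparison; a row with a missing ACR or AGS can yield NA
# instead of TRUE or FALSE.
agricultureLogical <- toy$ACR == 3 & toy$AGS == 6

# which() returns the positions that are TRUE and skips FALSE and NA.
matchingRows <- which(agricultureLogical)

# First three matching row positions (fewer if there are fewer matches).
head(matchingRows, 3)
```

Because which() skips NA, rows with missing survey answers do not end up in the index vector.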

R setup

version _
platform x86_64-apple-darwin13.4.0
arch x86_64
os darwin13.4.0
system x86_64,darwin13.4.0
status
major 3
minor 1.2
year 2014
month 10
day 31
svn rev 66913
language R
version.string R version 3.1.2 (2014-10-31)
nickname Pumpkin Helmet

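dplyr version 0.3.0.2

Setup: Mac OS X

Model Name: MacBook Pro
Model Identifier: MacBookPro10,1
Processor Name: Intel Core i7
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB
Memory: 16 GB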

Ist*_*sta 9

If you are using dplyr >= 0.4, you can do the following:

housingData %>%
  add_rownames() %>%
  filter(ACR == 3 & AGS == 6) %>%
  `[[`("rowname") %>%
  as.numeric() -> agricultureLogical

Though why you would consider this an improvement over

agricultureLogical <- which(housingData$ACR == 3 & housingData$AGS == 6)

escapes me.
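In later dplyr releases add_rownames() is deprecated, so the same idea might be written with row_number() and pull() instead. The sketch below assumes a dplyr version that provides pull() (0.7.0 or later). The `toy` data frame, the `row_id` column and the `matchingRows` name are made up for illustration.

```r
library(dplyr)

# Toy stand-in for housingData; the values are invented for illustration.
toy <- data.frame(
  ACR = c(3, 1, 3, 3, 2),
  AGS = c(6, 6, 2, 6, 1)
)

matchingRows <- toy %>%
  mutate(row_id = row_number()) %>%  # record each row's original position
  filter(ACR == 3, AGS == 6) %>%     # keep only the rows that match
  pull(row_id)                       # return the positions as a plain vector

matchingRows
```

On the same data this should agree with `which(toy$ACR == 3 & toy$AGS == 6)`, which remains the shorter way to get the positions.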


Tec*_*e01 6

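Proposed solution

Here is an example of what I am trying to do... It is a solution of sorts, but I don't like it. Thanks to Richard Scriven for the pointer to 1:n()...

Manually adding an index column to the data frame...

I have not yet worked out how to return a single index number for each row that meets specific criteria...

So I used dplyr::mutate() to add an index column to the example data frame. I then used dplyr::filter() on the data frame to apply the filter based on the desired criteria. This leaves me with the list of rows I want to work with... including their indices in the original data frame... I now use dplyr::select() to extract only the index column, which holds the original data frame position of each row that meets the criteria...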

h1 <- housingData
# Add an index column to the dataframe h1...
h1 <- mutate(h1, IDX = 1:n())
# Filter the h1 dataframe using the criteria defined...
h1 <- filter(h1, ACR == 3 & AGS == 6)
# Extract the index 
h1 <- select(h1, IDX)
# IDX is already an integer vector, so no conversion is needed...
agricultureLogical <- h1$IDX
head(agricultureLogical, 3)

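The above feels like duplicated effort to me, because the index is implicit in the original data frame. So my feeling is that there must be a way to return the set of indices of the items identified by the filter... Answers appreciated :-)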
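Since the index is just the row position, one way to avoid the extra column is to evaluate the condition inside the data frame with base R's with() and pass it straight to which(). This is a sketch on a made-up `toy` data frame, not a dplyr feature. The names `toy` and `matchingRows` are assumptions introduced for the example.

```r
# Toy stand-in for housingData; the values are invented for illustration.
toy <- data.frame(
  ACR = c(2, 3, 3, 1, 3),
  AGS = c(6, 6, 5, 6, 6)
)

# with() lets the condition refer to the columns by name, much like filter().
matchingRows <- with(toy, which(ACR == 3 & AGS == 6))

# The positions can then be used to pull the rows themselves if needed.
toy[matchingRows, ]
```

This keeps the column-name convenience of filter() while returning positions rather than rows.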