How to link the values in one column to each other in R

Question

Aza*_*aei 3 select r filter left-join dplyr

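I have a dataset of about 1.5 million individuals, who can be grouped into their households by "Household ID". The dataset has a column (relationship) that specifies each household member's relationship to the head of the household.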

    library(tidyverse)

    sample <- tibble(
      household.ID = c(11015015988, 11015015988, 11015015988, 11015015988,
                       11015015988, 11015015988, 11015015988, 11015015988,
                       228979641, 228979641, 228979641, 228979641),
      member.ID = c(1101502683, 11015026954, 11015027098, 11015027231,
                    11015027353, 11015027484, 11015027615, 11015027751,
                    228992311, 228996137, 229001877, 229005869),
      relationship = c(1, 2, 3, 3, 3, 3, 2, 3, 1, 3, 2, 2),
      gender = c(1, 2, 1, 2, 1, 1, 2, 2, 1, 2, 2, 2),
      age = c(54, 54, 30, 26, 23, 31, 20, 2, 60, 12, 34, 62),
      marriage.status = c(1, 1, 4, 4, 4, 1, 1, NA, 1, 4, 1, 1),
      children.ever.born = c(NA, 8, NA, NA, NA, NA, 1, NA, NA, NA, 1, 1),
      living.children = c(NA, 8, NA, NA, NA, NA, 1, NA, NA, NA, 1, 1)
    )

"code 1" denotes the head of the household; "code 2" is the head's wife; "code 3" is the head's child; so anyone with "code 2" in a household is the mother of the "code 3" members. I need to link "code 2" and "code 3" so that, within each household ID, every member with relationship value 3 is tagged with the member ID of the member with relationship value 2 from the same household. The code below does this.

    sample2 <- select(sample, 1:8)

    spouse_links <- left_join(relationship = "many-to-many",
        sample2 %>% filter(relationship == 3),
        sample2 %>% filter(relationship == 2, !is.na(living.children)) %>%
          rename(member.ID.mother = member.ID), join_by(household.ID)) %>%
      filter(!is.na(relationship.y)) %>%
      select(1:8, member.ID.mother)

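However, I have a problem with households where polygamy (the practice of marrying multiple spouses) is customary. In such households there are two or more "code 2" entries in the relationship column, so the R code links every "code 3" to every "code 2". To clarify, consider the following table: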

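From this table we can see that children are linked to their mother by how close their rows are within the household. For example, row 2 is the mother of the children in rows 3, 4, 5 and 6, and row 7 is the mother of row 8; however, this is not always the case, and sometimes the mother's age and her number of "living children" may help. I need to arrive at a table like the one shown below: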

I would greatly appreciate any help you can provide.

Answer 1

jps*_*ith 5

There may be a more elegant way to solve this, but you can use cumsum(sample$relationship == 2) to create an indicator that splits the rows within each household.
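
To see what that indicator looks like, here is a minimal sketch using only the relationship codes of the first household in the question's sample data. The standalone vector `relationship` is a stand-in for the column, introduced just for illustration. The counter goes up by one at every code 2, so each wife and the rows listed after her end up sharing a group number.

```r
# Relationship codes of household 11015015988, in row order
# (copied from the question's sample data)
relationship <- c(1, 2, 3, 3, 3, 3, 2, 3)

# TRUE counts as 1, so the running sum steps up at every wife (code 2)
grp <- cumsum(relationship == 2)
grp
# [1] 0 1 1 1 1 1 2 2
```

Group 0 holds the rows listed before the first wife, which have no mother to link to under this rule.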

In a dplyr pipeline, this can be used to split each group into a list (via group_split), and then map can be used to create the mother.member.ID variable:

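library(dplyr)

sample %>%
  # Number the rows and start a new group at every wife (code 2) within each household
  mutate(row = row_number(),
         grp = cumsum(relationship == 2),
         .by = household.ID) %>%
  # Drop household heads
  filter(relationship != 1) %>%
  group_by(household.ID, grp) %>%
  group_split() %>%
  # In each piece: keep rows only if a wife is present, attach her ID, keep the children
  purrr::map(~filter(., any(relationship == 2)) %>%
               mutate(mother.member.ID = member.ID[relationship %in% 2]) %>%
               filter(relationship == 3)) %>%
  bind_rows() %>% select(-grp)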

#  household.ID   member.ID relationship gender   age marriage.status children.ever.born living.children   row mother.member.ID
#         <dbl>       <dbl>        <dbl>  <dbl> <dbl>           <dbl>              <dbl>           <dbl> <int>            <dbl>
#1  11015015988 11015027098            3      1    30               4                 NA              NA     3      11015026954
#2  11015015988 11015027231            3      2    26               4                 NA              NA     4      11015026954
#3  11015015988 11015027353            3      1    23               4                 NA              NA     5      11015026954
#4  11015015988 11015027484            3      1    31               1                 NA              NA     6      11015026954
#5  11015015988 11015027751            3      2     2              NA                 NA              NA     8      11015027615
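
As a hedged variant of the same "nearest preceding wife" rule, the sketch below carries each wife's ID down the rows with tidyr::fill instead of splitting and mapping. This is not the answerer's method. The names `toy` and `linked` are made up for illustration, and `toy` is just three columns of the question's sample data. It assumes, like the pipeline above, that children are listed directly after their mother.

```r
library(dplyr)
library(tidyr)

# Three columns of the question's sample data (toy is a hypothetical name)
toy <- tibble(
  household.ID = c(rep(11015015988, 8), rep(228979641, 4)),
  member.ID = c(1101502683, 11015026954, 11015027098, 11015027231,
                11015027353, 11015027484, 11015027615, 11015027751,
                228992311, 228996137, 229001877, 229005869),
  relationship = c(1, 2, 3, 3, 3, 3, 2, 3, 1, 3, 2, 2)
)

linked <- toy %>%
  group_by(household.ID) %>%
  # Put the wife's own ID on her row, NA everywhere else
  mutate(mother.member.ID = if_else(relationship == 2, member.ID, NA_real_)) %>%
  # Carry the most recent wife's ID down to the rows listed after her
  fill(mother.member.ID, .direction = "down") %>%
  ungroup() %>%
  # Keep only the children
  filter(relationship == 3)

linked
```

Unlike the output above, this keeps children who are listed before any wife (here member 228996137) with an NA mother ID, which makes them easy to pick out for manual review.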

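Happy to help, and thanks for updating your sample data - could you clarify who member ID number 228996137 would be assigned to? (2 upvotes)
@AzamMirzaei - I just edited in what I think fits your conditions (ignore the household if it is confusing) - let me know if this still doesn't work! (2 upvotes)
@AzamMirzaei, this is more of a judgment call for you and your research team than an objective coding approach - you may want to handle these exceptions manually on a case-by-case basis. (2 upvotes)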