I have the following data frame:
df <- structure(list(BoneMarrow = c(30, 0, 0, 31138, 2703), Pulmonary = c(3380,
21223.3333333333, 0, 0, 27)), row.names = c("ATP1B1", "CYCS",
"DDX5", "GNB2L1", "PRR11"), class = "data.frame", .Names = c("BoneMarrow",
"Pulmonary"))
df
#>        BoneMarrow Pulmonary
#> ATP1B1         30   3380.00
#> CYCS            0  21223.33
#> DDX5            0      0.00
#> GNB2L1      31138      0.00
#> PRR11        2703     27.00
What I want to do is get rid of any row in which some column has a value < 8. I tried this, but the row names (e.g. ATP1B1, CYCS, etc.) disappeared:
> df %>% filter(!apply(., 1, function(row) any(row <= 8 )))
BoneMarrow Pulmonary
1 30 3380
2 2703 27
How can I keep them within a dplyr chain?
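One possible approach, offered as a sketch rather than a definitive answer: the output in the question suggests that `filter()` dropped the row names, so they can be moved into an ordinary column before filtering and restored afterwards. The column name `gene` is a hypothetical helper name, not something from the question, and `if_all()` assumes a reasonably recent dplyr (1.0.4 or later). The threshold mirrors the `<= 8` test used in the attempted code.

```r
library(dplyr)
library(tibble)

df <- data.frame(
  BoneMarrow = c(30, 0, 0, 31138, 2703),
  Pulmonary  = c(3380, 21223.3333333333, 0, 0, 27),
  row.names  = c("ATP1B1", "CYCS", "DDX5", "GNB2L1", "PRR11")
)

kept <- df %>%
  rownames_to_column("gene") %>%       # "gene" is a hypothetical helper column
  filter(if_all(-gene, ~ .x > 8)) %>%  # keep rows where every value exceeds 8
  column_to_rownames("gene")           # restore the row names

# Only ATP1B1 and PRR11 have all values above 8.
rownames(kept)

# Base R alternative that never touches the row names.
kept_base <- df[rowSums(df <= 8) == 0, , drop = FALSE]
```

The base R line avoids the round trip through a helper column, at the cost of stepping outside the pipe.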

I have the following data frame:
library(dplyr)
library(tibble)
df <- tibble(
source = c("a", "b", "c", "d", "e"),
score = c(10, 5, NA, 3, NA ) )
df
It looks like this:
# A tibble: 5 x 2
  source score
  <chr>  <dbl>
1 a         10   # current max value
2 b          5
3 c         NA
4 d          3
5 e         NA
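What I want to do is replace the NA values in the score column with values from the existing max + n onwards, where n ranges from 1 to the total number of rows of df.
Resulting in this (hand-coded):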
source score
a 10
b 5
c 11 # obtained from 10 + …

I have this string:
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
Given another string:
bb_seq <- "rhhhhitv"
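What I want to do is replace each ? with the characters of bb_seq, keeping the order of bb_seq.
The total number of ? characters is guaranteed to equal the length of bb_seq. The result should be: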
KrEDhhHRDDKDKDhHEhREKEitDEvKKK
How can I achieve this in R?
I tried this, but it failed:
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"
sp <- seed_pattern
gr <- gregexpr("\\?+", sp)
csml <- lapply(gr, function(sp) cumsum(attr(sp, "match.length")))
regmatches(sp, gr) <- lapply(csml, function(sp) substring(bb_seq, c(1, sp[1]), sp))
sp
# KrEDrhhHRDDKDKDrhhhHErhhhhREKErhhhhitDErhhhhitvKKK
I am open to non-regex solutions.
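A non-regex sketch of one way to do this, assuming (as the question states) that the number of ? characters equals the length of bb_seq: split both strings into single characters and assign the characters of bb_seq into the ? positions from left to right. The names `sp_chars`, `bb_chars` and `filled` are illustrative only.

```r
seed_pattern <- "K?ED??HRDDKDKD?HE?REKE??DE?KKK"
bb_seq <- "rhhhhitv"

# Split both strings into single characters.
sp_chars <- strsplit(seed_pattern, "")[[1]]
bb_chars <- strsplit(bb_seq, "")[[1]]

# Guard the assumption from the question: one replacement per "?".
stopifnot(sum(sp_chars == "?") == length(bb_chars))

# Fill the "?" positions, left to right, with the characters of bb_seq.
sp_chars[sp_chars == "?"] <- bb_chars
filled <- paste(sp_chars, collapse = "")
filled
#> [1] "KrEDhhHRDDKDKDhHEhREKEitDEvKKK"
```

Because the assignment is positional, the order of bb_seq is preserved without tracking match lengths or cumulative offsets.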

I have the following data frame:
import pandas as pd
# Create DataFrame
df = pd.DataFrame(
{'id':[2967, 5335, 13950, 6141, 6169],\
'Player': ['Cedric Hunter', 'Maurice Baker' ,\
'Ratko Varda' ,'Ryan Bowen' ,'Adrian Caldwell'],\
'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
'Age': [27 ,25 ,22 ,34 ,31],\
'Tm':['CHH' ,'VAN' ,'TOT' ,'OKC' ,'DAL'],\
'G':[6 ,7 ,60 ,52 ,81]})
df.set_index('Player', inplace=True)
This shows:
Out[128]:
Age G Tm Year id
Player
Cedric Hunter 27 6 CHH 1991 2967
Maurice Baker 25 7 VAN 2004 5335
Ratko Varda 22 60 TOT 2001 …

I have the following procedure that uses dplyr's group_split:
library(tidyverse)
set.seed(1)
iris %>% sample_n(size = 5) %>%
group_by(Species) %>%
group_split()
The result is:
[[1]]
# A tibble: 2 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5 3.5 1.6 0.6 setosa
2 5.1 3.8 1.5 0.3 setosa
[[2]]
# A tibble: 2 x 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fct>
1 5.9 3 4.2 1.5 versicolor
2 6.2 2.2 4.5 1.5 versicolor
[[3]]
# A tibble: …

I have the following list of lists. It contains two variables: pair and genes. pair is a vector of two strings; genes is a vector that can contain more than one value.
lol <- list(structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"), .Names = c("pair",
"genes")), structure(list(pair = c("BoneMarrow", "Umbilical"),
genes = "GNB2L1"), .Names = c("pair", "genes")), structure(list(
pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"), .Names = c("pair",
"genes")))
lol
#> [[1]]
#> [[1]]$pair
#> [1] "BoneMarrow" "Pulmonary"
#>
#> [[1]]$genes
#> [1] "PRR11"
#>
#>
#> [[2]]
#> [[2]]$pair
#> [1] "BoneMarrow" "Umbilical"
#>
#> [[2]]$genes
#> [1] "GNB2L1"
#>
#>
#> [[3]]
#> [[3]]$pair
#> [1] …

I have the following data frame:
library(tidyverse)
tdat <- structure(list(term = c("Hepatic Fibrosis / Hepatic Stellate Cell Activation",
"Cellular Effects of Sildenafil (Viagra)", "Epithelial Adherens Junction Signaling",
"STAT3 Pathway", "Nitric Oxide Signaling in the Cardiovascular System",
"LXR/RXR Activation", "NF-?B Signaling", "PTEN Signaling", "Gap Junction Signaling",
"G-Protein Coupled Receptor Signaling", "Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis",
"Osteoarthritis Pathway", "VDR/RXR Activation", "Axonal Guidance Signaling",
"Basal Cell Carcinoma Signaling", "Putrescine Degradation III",
"Tryptophan Degradation X (Mammalian, via Tryptamine)", "Factors Promoting Cardiogenesis in Vertebrates",
"Dopamine Degradation", …

I have the following data frame, which I would like to plot using circlize:
library(circlize)
library(tidyverse)
circos_tc_dat <- structure(list(ligand = c("Cxcr4 ", "Cd44 ", "Cxcr4 ", "Cxcr4 ",
"Csf2rb ", "Plaur ", "Plaur ", "Cxcr4 ", "Csf3r ", "Sell ", "Tnfrsf1b ",
"Sell ", "Csf2rb ", "Tnfrsf1b ", "Csf2rb ", "Il1r2 ", "Plaur ",
"Calm1 ", "Cd44 ", "Ptafr ", "Il1r2 ", "Calm1 ", "Cxcr2 ", "Cxcr2 "
), receptor = c("Dsg2", "Itgb1", "Cxcl10", "Cxcl10", "Itgb1",
"Itgb1", "Agt", "Csf1", "Csf1", "Icam1", "Calm1", "Calm1", "Tnf",
"App", "Il1b", "Tnf", "Il1b", "Tnf", "Mmp9", …

I have the following sequence:
my_seq <- "----?????-----?V?D????-------???IL??A?---"
What I want to do is detect the position ranges of the non-dash characters.
----?????-----?V?D????-------???IL??A?---
|   |   |     |      |       |       |
1   5   9     15     22      30      38
The final output should be a vector of strings:
out <- c("5-9", "15-22", "30-38")
How can I achieve this in R?
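A minimal base R sketch of one way to get these ranges: `gregexpr()` returns the start of every maximal run of non-dash characters, and its `match.length` attribute gives the run lengths, from which the inclusive end positions follow. The variable names `m`, `starts` and `ends` are illustrative. Note that if the string contained no non-dash characters, `gregexpr()` would return -1 and that case would need separate handling.

```r
my_seq <- "----?????-----?V?D????-------???IL??A?---"

# Locate every maximal run of characters that are not "-".
m <- gregexpr("[^-]+", my_seq)[[1]]

starts <- as.integer(m)                        # 1-based start of each run
ends <- starts + attr(m, "match.length") - 1L  # inclusive end of each run

out <- paste(starts, ends, sep = "-")
out
#> [1] "5-9"   "15-22" "30-38"
```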

I have the following string in my R code:
aas <- "QAWDIIKRIDKK"
I want to find the longest contiguous fragments of the string that consist only of characters from the following vector:
hydrophobic_res <- c("W", "F", "I", "L", "V", "M", "C", "A", "G")
The answer is:
AW, II
Another example:
QFILVMD -> FILVM
How can I do this in R?
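One way this could be sketched in base R: build a character class from the residue vector, extract every maximal run that matches it, and keep the runs whose length equals the maximum. `longest_runs()` is a hypothetical helper name, not something from the question, and the sketch assumes the residues are plain letters that need no escaping inside a regex character class.

```r
hydrophobic_res <- c("W", "F", "I", "L", "V", "M", "C", "A", "G")

# longest_runs() is a hypothetical helper name used for illustration.
longest_runs <- function(x, residues) {
  # Character class such as "[WFILVMCAG]+", matching maximal runs.
  pattern <- paste0("[", paste(residues, collapse = ""), "]+")
  runs <- regmatches(x, gregexpr(pattern, x))[[1]]
  if (length(runs) == 0) return(character(0))  # no residue present at all
  runs[nchar(runs) == max(nchar(runs))]        # keep all ties for the longest
}

longest_runs("QAWDIIKRIDKK", hydrophobic_res)
#> [1] "AW" "II"
longest_runs("QFILVMD", hydrophobic_res)
#> [1] "FILVM"
```

Ties are returned in the order they occur in the string, which matches the "AW, II" answer given in the question.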