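I'm trying to create a contingency table from a particular kind of data. This is doable with loops and so on, but since my final table will have more than 10E5 cells, I'm looking for an existing function.
My initial data look like this: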
PLANT ANIMAL INTERACTIONS
---------------------- ------------------------------- ------------
Tragopogon_pratensis Propylea_quatuordecimpunctata 1
Anthriscus_sylvestris Rhagonycha_nigriventris 3
Anthriscus_sylvestris Sarcophaga_carnaria 2
Heracleum_sphondylium Sarcophaga_carnaria 1
Anthriscus_sylvestris Sarcophaga_variegata 4
Anthriscus_sylvestris Sphaerophoria_interrupta_Gruppe 3
Cerastium_holosteoides Sphaerophoria_interrupta_Gruppe 1
I want to create a table like this:
Propylea_quatuordecimpunctata Rhagonycha_nigriventris Sarcophaga_carnaria Sarcophaga_variegata Sphaerophoria_interrupta_Gruppe
---------------------- ----------------------------- ----------------------- ------------------- -------------------- -------------------------------
Tragopogon_pratensis 1 0 0 0 0
Anthriscus_sylvestris 0 3 2 4 3
Heracleum_sphondylium 0 0 1 0 0
Cerastium_holosteoides 0 0 0 0 1
That is, all plant species in the rows and all animal species in the columns, with zeros where there is no interaction (my initial data list only the interactions that actually occur).
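In R, `xtabs()` can build this table directly from the long format, e.g. `xtabs(INTERACTIONS ~ PLANT + ANIMAL, data = interactions)`, where `interactions` is a hypothetical name for the data frame above. For comparison, a minimal pandas sketch of the same reshape, assuming the data sit in a DataFrame with the column names shown: `pivot_table` sums the counts and `fill_value=0` supplies the zeros for pairs that never interact.

```python
import pandas as pd

# Long-format interaction records, copied from the question.
interactions = pd.DataFrame({
    "PLANT": ["Tragopogon_pratensis", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
              "Heracleum_sphondylium", "Anthriscus_sylvestris", "Anthriscus_sylvestris",
              "Cerastium_holosteoides"],
    "ANIMAL": ["Propylea_quatuordecimpunctata", "Rhagonycha_nigriventris", "Sarcophaga_carnaria",
               "Sarcophaga_carnaria", "Sarcophaga_variegata", "Sphaerophoria_interrupta_Gruppe",
               "Sphaerophoria_interrupta_Gruppe"],
    "INTERACTIONS": [1, 3, 2, 1, 4, 3, 1],
})

# Plants become rows, animals become columns; pairs with no record get 0.
web = interactions.pivot_table(index="PLANT", columns="ANIMAL",
                               values="INTERACTIONS", aggfunc="sum", fill_value=0)
print(web)
```

Rows and columns come out in alphabetical order rather than in order of first appearance.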
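I tried to get the proportion in each cell by dividing each row by its row total, but R gave me an error saying:
data.table$Country : $ operator is invalid for atomic vectors
How can I fix this? Also, how can I add the row and column totals to data.table? I get these values when I run addmargins(data.table), but I want to append them to my data frame.
Here is my code: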
x = c(40,50,30,30,50)
y = c(40,20,30,40,45)
# rbind() returns a plain numeric matrix, not a data frame or data.table object
data.table = rbind(x,y)
data.table
dimnames(data.table)=list("Country"=c("England","Germany"),"Score"=c("Q-Score","T-score","X-score","Y-score","Z-score"))
addmargins(data.table)
# this line raises the error: $ cannot extract columns from a matrix
table(data.table$Country,data.table$Score/rowSums(table(data.table$Country,data.table$Score)))
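The error arises because `rbind()` returns a matrix, and `$` only works on lists and data frames. In R, `prop.table(data.table, 1)` should give the row proportions directly, and assigning `data.table <- addmargins(data.table)` keeps the totals. For comparison, a pandas sketch of both steps, with the scores typed in from the R code above:

```python
import pandas as pd

# Same scores as the R example: one row per country, one column per score.
scores = pd.DataFrame(
    [[40, 50, 30, 30, 50], [40, 20, 30, 40, 45]],
    index=pd.Index(["England", "Germany"], name="Country"),
    columns=pd.Index(["Q-Score", "T-score", "X-score", "Y-score", "Z-score"], name="Score"),
)

# Row proportions: divide every cell by its row total (like prop.table(x, 1) in R).
row_props = scores.div(scores.sum(axis=1), axis=0)

# Append the margins as an extra column and an extra row (like addmargins() in R).
with_margins = scores.copy()
with_margins["Total"] = with_margins.sum(axis=1)
with_margins.loc["Total"] = with_margins.sum(axis=0)

print(row_props.round(3))
print(with_margins)
```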
I have a data frame that looks like this:
structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1), bc = c(1,
1, 1, 1, 0, 0, 0, 1, 0, 1), de = c(0, 0, 1, 1, 1, 0, 1, 1, 0,
1), cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)), .Names = c("ab", "bc",
"de", "cl"), row.names = c(NA, -10L), class = "data.frame")
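The column cl indicates cluster membership, and the variables ab, bc and de hold binary answers, where 1 means yes and 0 means no.
I'm trying to create a table that cross-tabulates the cluster against all the other columns in the data frame, i.e. ab, bc and de, with the cluster as the column variable. The desired output looks like this: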
1 2 3
ab 1 3 2
bc 2 3 1
de 2 3 …
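Because the answers are coded 0/1, the desired table is the number of 1s per cluster, i.e. a grouped sum. In base R, something like `t(rowsum(df[c("ab", "bc", "de")], df$cl))` should produce it (`df` is a hypothetical name for the data frame). A pandas sketch of the same idea, with the data copied from the `dput()` output:

```python
import pandas as pd

# Data from the dput() output: three yes/no items plus a cluster label.
df = pd.DataFrame({
    "ab": [0, 1, 1, 1, 1, 0, 0, 0, 1, 1],
    "bc": [1, 1, 1, 1, 0, 0, 0, 1, 0, 1],
    "de": [0, 0, 1, 1, 1, 0, 1, 1, 0, 1],
    "cl": [1, 2, 3, 1, 2, 3, 1, 2, 3, 2],
})

# Summing 0/1 answers within each cluster counts the "yes" answers;
# transposing puts the items in rows and the clusters in columns.
yes_by_cluster = df.groupby("cl")[["ab", "bc", "de"]].sum().T
print(yes_by_cluster)
```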
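Is there an event log source that an ASP.NET web app can always write to?
Background, in case someone has a seemingly unrelated solution:
Our ASP.NET web app uses its own event log source, but it doesn't have permission to create it. So if the event log source doesn't exist when the web app tries to write an entry (the installation instructions tell the administrator to register the event log source manually, but...), our web app writes nothing to the event log when something goes wrong.
I'd like another (application-agnostic) source that I could use to notify whoever is watching the event log.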
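I started thinking about this specifically while trying to get the values of a vector that are never repeated. `unique` is no good (as far as I can tell from the documentation), because it still returns the duplicated elements, just once each. `duplicated` has the same problem, because it returns FALSE the first time it encounters a value that is later repeated. Here is my workaround: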
> d=c(1,2,4,3,4,6,7,8,5,10,3)
> setdiff(d,unique(d[duplicated(d)]))
[1] 1 2 6 7 8 5 10
Here is a more general approach:
> table(d)->g
> as.numeric(names(g[g==1]))
[1] 1 2 5 6 7 8 10
We can generalize this to counts other than 1, but I find this solution a bit clunky because it converts strings back to numbers. Is there a better or more direct way to get this vector?
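A more direct base-R option is to drop every element that is duplicated in either direction: `d[!(duplicated(d) | duplicated(d, fromLast = TRUE))]` returns 1 2 6 7 8 5 10 without going through character names. The counting approach can also avoid the string round-trip; a Python sketch of it, where `values_with_count` is a hypothetical helper name:

```python
from collections import Counter

def values_with_count(values, times=1):
    """Return the distinct values that occur exactly `times` times, in order of first appearance."""
    counts = Counter(values)  # dict subclass, so insertion order is kept on Python 3.7+
    return [value for value, count in counts.items() if count == times]

d = [1, 2, 4, 3, 4, 6, 7, 8, 5, 10, 3]
print(values_with_count(d))     # values that are never repeated
print(values_with_count(d, 2))  # values that appear exactly twice
```

Unlike `table()`, which sorts its names, this keeps the values in the order they first appear in `d`, and they stay numbers throughout.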
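Fisher's exact test is related to the hypergeometric distribution, so I expected these two commands to return the same p-value. Can anyone explain what I'm doing wrong that makes them differ?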
#data (variable names chosen to match dhyper() argument names)
x = 14
m = 20
n = 41047
k = 40
#Fisher test, alternative = 'greater'
(fisher.test(matrix(c(x, m-x, k-x, n-(k-x)),2,2), alternative='greater'))$p.value
#returns 2.01804e-39
#hypergeometric distribution, lower.tail = F, i.e. P[X > x]
phyper(x, m, n, k, lower.tail = F, log.p = F)
#returns 5.115862e-43
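The two numbers differ because the tails differ: with `lower.tail = F`, `phyper(x, ...)` returns P[X > x], whereas the one-sided Fisher test with `alternative = 'greater'` reports P[X >= x] for the top-left cell. In R, `phyper(x - 1, m, n, k, lower.tail = F)` should therefore match the `fisher.test()` value. The scipy sketch below checks the same relationship; note that `scipy.stats.hypergeom` orders its parameters as (population size, number of successes, number of draws), unlike R's `phyper()`.

```python
from scipy.stats import fisher_exact, hypergeom

# Same numbers as the R example (names follow R's dhyper()/phyper() arguments).
x, m, n, k = 14, 20, 41047, 40

# The 2x2 table as R's matrix(c(x, m-x, k-x, n-(k-x)), 2, 2) fills it (column-major):
# column 1 sums to m, column 2 to n, row 1 to k.
table = [[x, k - x],
         [m - x, n - (k - x)]]
_, p_fisher = fisher_exact(table, alternative="greater")

# hypergeom.sf(q, M, n_success, N) is P[X > q], so P[X >= x] needs q = x - 1.
p_at_least_x = hypergeom.sf(x - 1, m + n, m, k)
p_more_than_x = hypergeom.sf(x, m + n, m, k)  # what the R phyper(x, ...) call computes

print(p_fisher, p_at_least_x, p_more_than_x)
```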
I'm trying to produce a nicely formatted contingency table in an R Markdown HTML document. Here is the code:
---
title: "Probabilidad"
author: "Nicolás Molano Gonzalez"
date: "7 de Abril de 2020"
output:
  html_document:
    fig_caption: true
---

```{r echo=F, message = FALSE, warning =F}
library(tidyverse)
library(kableExtra)
library(knitr)

set.seed(150)
```
Here is the data for the table:
```{r echo=FALSE, results = 'asis'}
ca_ctr_r <- .3

n <- 250
nCA <- round(n*ca_ctr_r)
z0 <- data.frame(status=c(rep("CA",nCA),rep("CTR",n-nCA)))
z0$exposition <- NA
exp_CA <- .45
exp_CTR <- .19

z0[z0$status %in% "CA","exposition"] <- ifelse(runif(nCA) < exp_CA,"yes","no")
# controls use their own exposure probability, exp_CTR
z0[z0$status %in% "CTR","exposition"] <- ifelse(runif(n-nCA) < exp_CTR,"yes","no")

z0$exposition <- factor(z0$exposition,levels = c("yes","no"))
```
Here is the code that prints the contingency table, which I'd like to improve:
```{r echo=FALSE, results = 'asis'}
res <- kable(t(table(z0)%>%addmargins))
#res <- kable(t(table(z0)))
kable_styling(res,"striped", position = "center",full_width = F) %>% add_header_above(c("exposition","status"=2," "))
```
…
I'm trying to compute a chi-square value in Python from a contingency table. Here is an example:
+--------+------+------+
| | Cat1 | Cat2 |
+--------+------+------+
| Group1 | 80 | 120 |
| Group2 | 420 | 380 |
+--------+------+------+
The expected values are:
+--------+------+------+
| | Cat1 | Cat2 |
+--------+------+------+
| Group1 | 100 | 100 |
| Group2 | 400 | 400 |
+--------+------+------+
If I compute the chi-square value by hand, I get 10, but with Python I get 9.506. I'm using the following code:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import scipy
# Some fake data.
n = 5 # Number of samples. …
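The gap comes from Yates' continuity correction, which `chi2_contingency` applies to 2x2 tables (one degree of freedom) by default. By hand: 20²/100 * 2 + 20²/400 * 2 = 8 + 2 = 10. With Yates' correction each |observed - expected| shrinks by 0.5: 19.5²/100 * 2 + 19.5²/400 * 2 = 7.605 + 1.90125 = 9.50625. A minimal sketch showing that `correction=False` reproduces the hand calculation:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the question: rows Group1/Group2, columns Cat1/Cat2.
observed = np.array([[80, 120],
                     [420, 380]])

# Default correction=True applies Yates' continuity correction to 2x2 tables.
chi2_yates, p_yates, dof, expected = chi2_contingency(observed)
# correction=False gives the plain Pearson statistic.
chi2_plain, p_plain, _, _ = chi2_contingency(observed, correction=False)

print(expected)  # same expected counts as the table above
print(chi2_plain, chi2_yates)
```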
I want to create a contingency table in Pandas. I can do it with the code below, but I'd like to know whether there is a Pandas function that does this for me.
For a reproducible example:
toy_data #json
'{"Light":{"321":"no_light","476":"night_light","342":"lamp","454":"lamp","25":"night_light","53":"night_light","120":"night_light","346":"night_light","360":"lamp","55":"no_light","391":"night_light","243":"no_light","101":"night_light","377":"night_light","124":"no_light","368":"lamp","400":"no_light","247":"night_light","270":"lamp","208":"night_light"},"Nearsightedness":{"321":"No","476":"Yes","342":"Yes","454":"Yes","25":"No","53":"Yes","120":"Yes","346":"No","360":"No","55":"Yes","391":"Yes","243":"No","101":"No","377":"Yes","124":"No","368":"No","400":"No","247":"No","270":"Yes","208":"No"}}'
toy_data.head()
Light Nearsightedness
321 no_light No
476 night_light Yes
342 lamp Yes
454 lamp Yes
25 night_light No
df = pd.DataFrame(toy_data.groupby(['Light', 'Nearsightedness']).size())
df = df.unstack('Nearsightedness')
df.columns = df.columns.droplevel()
df
Nearsightedness No Yes
Light
lamp 2 3
night_light 5 5
no_light 4 1
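`pd.crosstab` does this in one call: it counts every combination of its two inputs, and `margins=True` adds row and column totals labelled "All". The sketch below rebuilds `toy_data` from the JSON values above (without the original row labels, which do not affect the counts):

```python
import pandas as pd

# The 20 rows of toy_data from the JSON above, in the same order.
toy_data = pd.DataFrame({
    "Light": ["no_light", "night_light", "lamp", "lamp", "night_light",
              "night_light", "night_light", "night_light", "lamp", "no_light",
              "night_light", "no_light", "night_light", "night_light", "no_light",
              "lamp", "no_light", "night_light", "lamp", "night_light"],
    "Nearsightedness": ["No", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes",
                        "Yes", "No", "No", "Yes", "No", "No", "No", "No", "Yes", "No"],
})

# One call replaces the groupby/size/unstack/droplevel chain.
counts = pd.crosstab(toy_data["Light"], toy_data["Nearsightedness"])
print(counts)

# Same table with totals added.
print(pd.crosstab(toy_data["Light"], toy_data["Nearsightedness"], margins=True))
```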
I have the following two-way contingency table with cell percentages and frequencies (in parentheses):
gender blue blue-gray brown dark hazel yellow
female 33.33% (3) 0.00% (0) 55.56% (5) 0.00% (0) 11.11% (1) 0.00% (0)
male 34.62% (9) 3.85% (1) 46.15% (12) 3.85% (1) 3.85% (1) 7.69% (2)
The R code I used is:
library(dplyr)
library(janitor)
starwars %>%
filter(species == "Human") %>%
tabyl(gender, eye_color) %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns()
However, I'd like the same kind of table with the cell frequencies first and the percentages in parentheses. Any help would be appreciated.
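In janitor, `adorn_ns()` takes a `position` argument, so ending the pipeline with `adorn_ns(position = "front")` should put the counts first and the percentages in parentheses. For comparison, a pandas sketch of the same formatting, with the counts typed in from the table above rather than loaded from `starwars`:

```python
import pandas as pd

# Cell counts copied from the table above (human characters, by gender and eye colour).
counts = pd.DataFrame(
    {"blue": [3, 9], "blue-gray": [0, 1], "brown": [5, 12],
     "dark": [0, 1], "hazel": [1, 1], "yellow": [0, 2]},
    index=pd.Index(["female", "male"], name="gender"),
)

# Row percentages, as adorn_percentages("row") computes them.
row_pct = counts.div(counts.sum(axis=1), axis=0) * 100
pct_text = row_pct.apply(lambda col: col.map("{:.2f}%".format))

# Count first, percentage in parentheses: e.g. "3 (33.33%)".
formatted = counts.astype(str) + " (" + pct_text + ")"
print(formatted)
```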