I am struggling to get a confidence interval around my regression line. My data consists of 7 columns and 50000 rows.
ID H.FC HFD N.FC NFD Group
G00000000004 1.08403833300442 0.00209119205547622 1.12705351468201 0.0017652841766293 BvsA
G00000000059 1.70298155378132 0.000146008455537281 1.78927991144484 0.000126476263754446 BvsA
G00000000067 1.48885136450707 1.94192154467639e-05 1.49169658915702 5.47633140183071e-05 CvsA
G00000000081 5.92680429312136 3.63075878342954e-06 5.89059544062979 7.07992913581687e-06 DvsA
G00000000086 0.499795076715132 0.00265935106849242 0.542319766242586 0.00212335608196823 BvsC
G00000000102 -2.60510733887004 0.000669953697126189 -2.62720386931755 0.000122899865824463 BvsA
G00000000104 -2.80909148854584 0.00686396994798396 -2.94362698679174 0.00342818761913247 BvsA
G00000000106 0.255264785072867 0.0388723342557597 0.174743590276556 0.197263787912382 BvsD
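G00000000109 1.32895814248434 0.000311378914835491 1.30541212379603 0.000308851884560488 EvsF
…
This question is related to: How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid?
The solution given there works, but I would also like to get the name for each taxonomic ID at the defined ranks. I found this in ete3, which does the job: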
names = ncbi.get_taxid_translator(lineage)
print([names[taxid] for taxid in lineage])
But since I am not a Python programmer, I could not merge it into the code given in the link above. Here is what I tried:
import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    # Full lineage of the taxid, from root down to the taxon itself
    lineage = ncbi.get_lineage(taxid)
    print(lineage)
    #[1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162, 1204725]
    # Map every taxid in the lineage to its scientific name
    names = ncbi.get_taxid_translator(lineage)
    print(names)
    #{1: u'root', 2157: u'Archaea', 2158: u'Methanobacteriales', 2159: u'Methanobacteriaceae', 2160: u'Methanobacterium', 2162: u'Methanobacterium formicicum', 183925: u'Methanobacteria', 28890: u'Euryarchaeota', 131567: u'cellular organisms', 1204725: u'Methanobacterium formicicum DSM 3637'}
    # Map every taxid in the lineage to its rank
    lineage2ranks = ncbi.get_rank(names)
    print(lineage2ranks)
    #{1: u'no rank', 2157: u'superkingdom', …
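To make the goal concrete, the combination I am after might look roughly like the sketch below. It is plain Python and does not call ete3 itself. The helper name `names_for_ranks` and the `"NA"` placeholder are hypothetical choices of mine, not part of ete3 or of the linked answer. The `lineage` and `names` values are copied from the output above. The rank output above is truncated, so `ranks` contains only the two entries that are visible there.

```python
def names_for_ranks(lineage, names, ranks, desired_ranks, missing="NA"):
    """Return {rank: (taxid, name)} for each desired rank in one lineage.

    lineage: list of taxids, as returned by ncbi.get_lineage(taxid)
    names:   {taxid: name}, as returned by ncbi.get_taxid_translator(lineage)
    ranks:   {taxid: rank}, as returned by ncbi.get_rank(lineage)
    """
    # Invert the rank lookup (rank -> taxid) for the taxids in this lineage.
    # Ranks such as 'no rank' can occur more than once; the last one wins,
    # which does not matter for named ranks like 'phylum' or 'genus'.
    rank_to_taxid = {ranks[t]: t for t in lineage if t in ranks}

    result = {}
    for rank in desired_ranks:
        taxid = rank_to_taxid.get(rank)
        if taxid is None:
            # This lineage has no taxon at the requested rank
            result[rank] = (missing, missing)
        else:
            result[rank] = (taxid, names.get(taxid, missing))
    return result


# Values copied from the printed output above
lineage = [1, 131567, 2157, 28890, 183925, 2158, 2159, 2160, 2162, 1204725]
names = {
    1: "root", 2157: "Archaea", 2158: "Methanobacteriales",
    2159: "Methanobacteriaceae", 2160: "Methanobacterium",
    2162: "Methanobacterium formicicum", 183925: "Methanobacteria",
    28890: "Euryarchaeota", 131567: "cellular organisms",
    1204725: "Methanobacterium formicicum DSM 3637",
}
# Only the two rank entries visible in the truncated output above
ranks = {1: "no rank", 2157: "superkingdom"}

result = names_for_ranks(lineage, names, ranks, ["superkingdom", "phylum"])
print(result)
```

Inside `get_desired_ranks`, the three dictionaries would come from the `ncbi.get_lineage`, `ncbi.get_taxid_translator` and `ncbi.get_rank` calls already shown, so each requested rank would carry both its taxid and its name.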