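I have a DataFrame in pandas that I use to generate a scatter plot, and I would like to include a regression line in the plot. Right now I am trying to do this with polyfit.
Here is my code: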
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from numpy import *
table1 = pd.DataFrame.from_csv('upregulated_genes.txt', sep='\t', header=0, index_col=0)
table2 = pd.DataFrame.from_csv('misson_genes.txt', sep='\t', header=0, index_col=0)
table1 = table1.join(table2, how='outer')
table1 = table1.dropna(how='any')
table1 = table1.replace('#DIV/0!', 0)
# scatterplot
plt.scatter(table1['log2 fold change misson'], table1['log2 fold change'])
plt.ylabel('log2 expression fold change')
plt.xlabel('log2 expression fold change Misson et al. 2005')
plt.title('Root Early Upregulated Genes')
plt.axis([0,12,-5,12])
# this is the part I'm unsure about
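regres = polyfit(table1['log2 fold change misson'], table1['log2 …

I am trying to extract a random set of key-value pairs from a dictionary that I built from a CSV file. The dictionary holds gene information: the gene names are the keys, and lists of numbers (related to gene expression and so on) are the values.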
# python 2.7.5
import csv
import random
genes_csv = csv.reader(open('genes.csv', 'rb'))
genes_dict = {}
for row in genes_csv:
    genes_dict[row[0]] = row[1:]
length = raw_input('How many genes do you want? ')
for key in genes_dict:
    random_list = random.sample(genes_dict.items(), int(length))
    print random_list
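The problem is that if I try to get a list of, say, 100 genes, it seems to iterate over the whole dictionary and return every possible combination of 100 genes.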
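A likely cause is that the sampling sits inside a loop over the dictionary, so a fresh sample is drawn and printed once per key. A minimal sketch of drawing a single sample instead is shown here; it is written for Python 3 rather than 2.7.5, and the helper name `sample_genes`, its `seed` parameter, and the small dictionary are made up for illustration.

```python
import random


def sample_genes(genes_dict, n, seed=None):
    """Return n randomly chosen (gene, values) pairs, with no gene repeated."""
    rng = random.Random(seed)
    # random.sample needs a sequence, so turn the items view into a list first
    return rng.sample(list(genes_dict.items()), n)


# Hypothetical stand-in for the dictionary built from genes.csv
genes_dict = {
    "geneA": ["1.2", "0.4"],
    "geneB": ["3.1", "2.2"],
    "geneC": ["0.7", "5.0"],
    "geneD": ["2.8", "1.9"],
}

# One call, outside any loop, yields exactly one random subset
picked = sample_genes(genes_dict, 2, seed=0)
print(picked)
```

Wrapping the result in `dict(picked)` gives a dictionary again if that is more convenient than a list of pairs.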
I am trying to iterate over a list of lists in Python 2.7.5 and return the lists whose first value is found in a second list, like this:
#python 2.7.5
list1 = ['aa', 'ab', 'bb', 'bc', 'cc']
list2 = [['aa', 1, 3, 7],['de', 2, 2, 1],['bc', 3, 4, 4]]
list3 = []
for x in list1:
    for y in list2:
        if x == y:
            list3.append(y)
So I expect list3 to contain [['aa', 1, 3, 7], ['bc', 3, 4, 4]], but instead I get the whole of list2.
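In the loop as posted, `x == y` compares a string with an entire sub-list, so comparing against the first element of each sub-list is probably what was intended. A minimal sketch of that idea, using the same data and a set for the membership test (the name `wanted` is introduced only for illustration):

```python
list1 = ['aa', 'ab', 'bb', 'bc', 'cc']
list2 = [['aa', 1, 3, 7], ['de', 2, 2, 1], ['bc', 3, 4, 4]]

wanted = set(list1)  # set lookup keeps the membership test cheap

# Keep each sub-list whose first element is one of the wanted names
list3 = [row for row in list2 if row[0] in wanted]
print(list3)  # [['aa', 1, 3, 7], ['bc', 3, 4, 4]]
```

The result follows the order of `list2`, which matches the expected output here.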
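Returning to the first question, on adding a regression line to the scatter plot: one common approach is to fit a degree-1 polynomial with `numpy.polyfit` and draw the resulting line over the points. The sketch below assumes both columns are already numeric and uses made-up data in place of the two 'log2 fold change' columns; the function names `fit_line` and `plot_with_regression` are hypothetical.

```python
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    slope, intercept = np.polyfit(x, y, 1)  # degree 1 = straight line
    return slope, intercept


def plot_with_regression(x, y):
    """Scatter the points and overlay the fitted line."""
    import matplotlib.pyplot as plt  # imported here so fit_line works without matplotlib

    slope, intercept = fit_line(x, y)
    xs = np.linspace(min(x), max(x), 100)
    plt.scatter(x, y)
    plt.plot(xs, slope * xs + intercept, color="red")
    plt.show()


# Made-up data standing in for the two DataFrame columns
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0
slope, intercept = fit_line(x, y)
```

With a DataFrame, the two columns can be passed directly, for example `plot_with_regression(table1['log2 fold change misson'], table1['log2 fold change'])`, provided any non-numeric entries have been converted first.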