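I don't understand why a Pandas DataFrame rounds the values in a column that I create by dividing the values of two other columns. I want the numbers in the new column to have two decimal places, but the values are rounded. I checked the dtypes of the columns and both are "float64".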
import os
import re

import pandas as pd
import numpy as np

# parent of the current working directory (root of the search)
cd = os.path.dirname(os.getcwd())

# concatenate csv files
dfList = []
for root, dirs, files in os.walk(cd):
    for fname in files:
        if re.match("output_contigs_SCMgenes.csv", fname):
            frame = pd.read_csv(os.path.join(root, fname))
            dfList.append(frame)
df = pd.concat(dfList)

# replace NaN in SCM column with 0
df['SCM'] = df['SCM'].fillna(0)

# add column with genes/SCM
df['genes/SCM'] = df['genes'] / df['SCM']
The output looks like this:
   genome  contig  genes  SCM  genes/SCM
0   20900      48      1    0        inf
1   20900      37    130  103          1 …

I have a file "specieslist.txt" that contains the following information:
Bacillus,genus
Borrelia,genus
Burkholderia,genus
Campylobacter,genus
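Now I want Python to look up a variable in the first column (in this case "Campylobacter") and return the value of the second column ("genus"). I wrote the following code: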
import csv
import pandas as pd
species_import = 'Campylobacter'
df = pd.read_csv('specieslist.txt', header=None, names = ['species', 'level'] )
input = df.loc[df['species'] == species_import]
print (input['level'])
However, my code returns too much, while I only want "genus":
3 genus
Name: level, dtype: object
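
One possible way to get just the string, sketched under the assumption that the species name matches at most one row: selecting a column from a filtered DataFrame gives a Series, and printing a Series also shows its index, name, and dtype. Taking the first element with `.iloc[0]` yields the bare value. The in-memory stand-in for specieslist.txt and the helper name `lookup_level` are hypothetical, introduced only for illustration.

```python
import io

import pandas as pd

# Stand-in for specieslist.txt (assumption: the same four rows as in the question).
SPECIES_TXT = "Bacillus,genus\nBorrelia,genus\nBurkholderia,genus\nCampylobacter,genus\n"


def lookup_level(df, species):
    """Return the 'level' of the first row whose 'species' matches, or None."""
    matches = df.loc[df["species"] == species, "level"]
    if matches.empty:
        return None
    # .iloc[0] extracts the scalar, dropping the index, name and dtype.
    return matches.iloc[0]


df = pd.read_csv(io.StringIO(SPECIES_TXT), header=None, names=["species", "level"])
level = lookup_level(df, "Campylobacter")
print(level)  # prints: genus
```

Checking `matches.empty` first avoids an IndexError when the species is not in the file.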

With my code, I loop over files and count patterns in each file. My code is as follows:
from collections import defaultdict
import csv, os, re
from itertools import groupby
import glob

def count_kmers(read, k):
    counts = defaultdict(list)
    num_kmers = len(read) - k + 1
    for i in range(num_kmers):
        kmer = read[i:i+k]
        if kmer not in counts:
            counts[kmer] = 0
        counts[kmer] += 1
    for item in counts:
        return(basename, sequence, item, counts[item])

for fasta_file in glob.glob('*.fasta'):
    basename = os.path.splitext(os.path.basename(fasta_file))[0]
    with open(fasta_file) as f_fasta:
        for k, g in groupby(f_fasta, lambda x: x.startswith('>')):
            if k:
                sequence = next(g).strip('>\n')
            else:
                d1 …
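
A minimal sketch of the counting step on its own, assuming the goal is to tally every overlapping k-mer in a single read. It swaps the `defaultdict` for `collections.Counter` and returns the whole tally, since a `return` placed inside a `for` loop exits on the first item. The read "ATGATG" is a made-up example, not data from the question.

```python
from collections import Counter


def count_kmers(read, k):
    """Count every overlapping substring of length k in read."""
    num_kmers = len(read) - k + 1
    # range() is empty when the read is shorter than k, giving an empty Counter.
    return Counter(read[i:i + k] for i in range(num_kmers))


counts = count_kmers("ATGATG", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Returning the full Counter leaves it to the caller to attach per-file details such as the file name or sequence header.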