In the following ipython3 session, I read two tables with different formats and sum the values in one of their columns:
In [278]: F = pd.read_table("../RNA_Seq_analyses/mapping_worm_number_tests/hisat2/mapped_C_elegans/feature_count/W100_1_on_C_elegans/protein_coding_fwd_counts.txt", skiprows=2, usecols=[6]).sum()
In [279]: S = pd.read_table("../RNA_Seq_analyses/mapping_worm_number_tests/hisat2/mapped_C_elegans/intersect_count/W100_1_on_C_elegans/protein_coding_fwd_counts.txt", usecols=[6], header=None).sum()
In [280]: S
Out[280]:
6 3551266
dtype: int64
In [281]: F
Out[281]:
72 3164181
dtype: int64
In [282]: type(F)
Out[282]: pandas.core.series.Series
In [283]: type(S)
Out[283]: pandas.core.series.Series
In [284]: F[0]
Out[284]: 3164181
In [285]: S[0]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-285-5a4339994a41> in <module>()
----> 1 S[0]
/home/bli/.local/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
601 result = self.index.get_value(self, key)
602
--> 603 if not is_scalar(result):
604 if is_list_like(result) and not isinstance(result, Series):
605
/home/bli/.local/lib/python3.6/site-packages/pandas/indexes/base.py in get_value(self, series, key)
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3323)()
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3026)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8146)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8090)()
KeyError: 0
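If the F and S objects are produced by similar operations (sum) and have the same type (pandas.core.series.Series), why do they behave differently?
What is the correct way to extract the value I want (the sum of a column)?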
In [297]: F["72"]
Out[297]: 3164181
In [298]: S["6"]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4009)()
pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8125)()
TypeError: an integer is required
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-298-0127424036a0> in <module>()
----> 1 S["6"]
/home/bli/.local/lib/python3.6/site-packages/pandas/core/series.py in __getitem__(self, key)
601 result = self.index.get_value(self, key)
602
--> 603 if not is_scalar(result):
604 if is_list_like(result) and not isinstance(result, Series):
605
/home/bli/.local/lib/python3.6/site-packages/pandas/indexes/base.py in get_value(self, series, key)
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3323)()
pandas/index.pyx in pandas.index.IndexEngine.get_value (pandas/index.c:3026)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4075)()
KeyError: '6'
Further investigation:
In [306]: print(S.index)
Int64Index([6], dtype='int64')
In [307]: print(F.index)
Index(['72'], dtype='object')
In [308]: S[6]
Out[308]: 3551266
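So the two objects end up with different types of index. This behaviour reminds me of...
It seems that header=None gave S a column labelled by a number, whereas skiprows=2 without header=None caused the labels for F to be generated from the data read from the third row. (This reveals a bug in the way I parse the data with pandas...)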
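The parsing bug suspected above can be reproduced on a small in-memory table. This is only a sketch: the file contents are made up (one comment line, one header line, three data rows), and `pd.read_csv(..., sep="\t")` is used as a stand-in for `pd.read_table`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real count file:
# one comment line, one header line, then the data rows.
text = (
    "# program info\n"
    "Geneid\tcount\n"
    "geneA\t72\n"
    "geneB\t10\n"
    "geneC\t5\n"
)

# skiprows=2 drops the comment line AND the real header, so the first
# data row ("geneA", "72") is consumed as the header and is not summed.
wrong = pd.read_csv(io.StringIO(text), sep="\t", skiprows=2, usecols=[1]).sum()

# skiprows=1 drops only the comment line; the real header names the column.
right = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1, usecols=[1]).sum()

# skiprows=2 with header=None keeps every data row and labels the
# column with its integer position.
no_header = pd.read_csv(
    io.StringIO(text), sep="\t", skiprows=2, usecols=[1], header=None
).sum()

print(list(wrong.index), wrong.iloc[0])
print(list(right.index), right.iloc[0])
print(list(no_header.index), no_header.iloc[0])
```

In this toy case the first read silently loses the first data row from the total, which is the same kind of discrepancy a string label such as '72' hints at.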
I think you need:
#select first value of one element series
f = F.iat[0]
#alternative
#f = F.iloc[0]
Or:
#convert to numpy array and select first value
f = F.values[0]
Or:
f = F.item()
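As a quick check, the positional accessors can be tried on two one-element Series that imitate the ones in the question (built by hand here, with an integer label for `S` and a string label for `F`). It is only a sketch, but it suggests that none of them depends on the label type:

```python
import pandas as pd

# Hand-built imitations of the two sums from the question.
S = pd.Series([3551266], index=[6])      # integer label
F = pd.Series([3164181], index=["72"])   # string label

# All three accessors ignore the label and return the single value.
for series in (S, F):
    print(series.iat[0], series.iloc[0], series.item())

s_total = S.iat[0]
f_total = F.iat[0]
```

Note that `.item()` only works when the Series holds exactly one element, which is the case for the sum of a single column.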
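And I think you get the error because there is no index value 0 in S.
As IanS commented, you should select by the index labels, 6 for S and '72' for F:
f = F['72']
#f = F.loc['72']
s = S[6]
#s = S.loc[6]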
Sample:
F = pd.Series([3164181], index=[72])
f = F[72]
print (f)
3164181
print (F.index)
Int64Index([72], dtype='int64')
print (F.index.tolist())
[72]
f = F[0]
print (f)
KeyError: 0
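You get an integer index in S because of the parameter header=None: pandas adds default integer column labels (0, 1, ...). For F, the column selected by usecols=[6] is named '72', which is a string. That is the difference.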
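The label-type difference can be checked directly. The sketch below again uses hand-built Series in place of the real sums, and shows that `.loc` needs a label of the matching type:

```python
import pandas as pd

S = pd.Series([3551266], index=[6])      # label from header=None: an integer
F = pd.Series([3164181], index=["72"])   # label taken from a data row: a string

# Inspect the kind of labels each index holds.
s_has_int_labels = pd.api.types.is_integer_dtype(S.index)
f_has_int_labels = pd.api.types.is_integer_dtype(F.index)
print(s_has_int_labels, f_has_int_labels)

# Label-based lookup succeeds only with a label of the right type.
s = S.loc[6]
f = F.loc["72"]

# The mismatched lookups raise KeyError.
errors = []
for series, label in ((S, "6"), (F, 72)):
    try:
        series.loc[label]
    except KeyError:
        errors.append(label)
print(errors)
```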