Python pandas: return a value from another column

Question

I have a file "specieslist.txt" that contains the following:

Bacillus,genus
Borrelia,genus
Burkholderia,genus
Campylobacter,genus

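Now I want Python to look up a value in the first column (in this case "Campylobacter") and return the value from the second column ("genus"). I wrote the following code: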

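import pandas as pd

species_import = 'Campylobacter'
df = pd.read_csv('specieslist.txt', header=None, names=['species', 'level'])
# the boolean filter keeps the matching row(s) as a DataFrame
rows = df.loc[df['species'] == species_import]
# selecting one column gives a Series, so print() shows the index, name and dtype
print(rows['level'])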

However, my code returns too much output, when all I want is "genus":

3    genus
Name: level, dtype: object
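
To see where the extra output comes from, the sketch below rebuilds the question's file in memory with io.StringIO (an assumption made only so it runs without specieslist.txt on disk) and inspects the types involved. Boolean filtering returns a DataFrame, and selecting the 'level' column from it returns a Series, whose printout includes the original row label (3) plus the Name/dtype footer.

```python
import io

import pandas as pd

# In-memory stand-in for specieslist.txt (assumed to hold the question's four lines)
DATA = """Bacillus,genus
Borrelia,genus
Burkholderia,genus
Campylobacter,genus
"""

df = pd.read_csv(io.StringIO(DATA), header=None, names=['species', 'level'])

# Boolean filtering keeps the matching rows, so the result is still a DataFrame
rows = df.loc[df['species'] == 'Campylobacter']
# Selecting a single column from it yields a Series, not a plain string
levels = rows['level']

print(type(rows).__name__)    # DataFrame
print(type(levels).__name__)  # Series
print(levels.index.tolist())  # [3]  (the original row label is kept)
```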

Answer 1

jez*_*ael 5

You can select the first value of the Series with iat:

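species_import = 'Campylobacter'
# filter the rows and select the 'level' column in one .loc call,
# then take the first element by position with .iat[0] to get a scalar
out = df.loc[df['species'] == species_import, 'level'].iat[0]
# alternative:
# out = df.loc[df['species'] == species_import, 'level'].values[0]
print(out)
# genus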
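
A self-contained version of the same selection follows, again rebuilding the data in memory with io.StringIO as an assumption. It also shows what happens when nothing matches: the filtered Series is empty, so .iat[0] raises IndexError (.values[0] fails the same way), which is why a fallback is useful. The species name 'Escherichia' is a hypothetical value chosen because it is not in the file.

```python
import io

import pandas as pd

# In-memory stand-in for specieslist.txt (assumed to hold the question's four lines)
DATA = "Bacillus,genus\nBorrelia,genus\nBurkholderia,genus\nCampylobacter,genus\n"
df = pd.read_csv(io.StringIO(DATA), header=None, names=['species', 'level'])

# Filter the rows and pick the 'level' column in a single .loc call
matches = df.loc[df['species'] == 'Campylobacter', 'level']
out = matches.iat[0]  # first element by position, returned as a plain scalar
print(out)  # genus

# 'Escherichia' is a hypothetical species that is not in the file
empty = df.loc[df['species'] == 'Escherichia', 'level']
no_match_raised = False
try:
    empty.iat[0]
except IndexError as exc:
    # positional access on an empty Series has nothing to return
    no_match_raised = True
    print('IndexError:', exc)
```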

A better solution, which also works when nothing matches and an empty Series is returned; in that case it returns 'no match':

@jpp commented:
This solution is only better if you have a large Series and the matching value is expected to be near the top.

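species_import = 'Campylobacter'
# next() returns the first value of the filtered Series,
# or the default 'no match' if the Series is empty
out = next(iter(df.loc[df['species'] == species_import, 'level']), 'no match')
print(out)
# genus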
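
The next(iter(...), default) idiom is plain Python: iter() creates an iterator over the Series values, and next() returns the first item, or the supplied default if the iterator is already exhausted. The sketch below uses two small hand-built Series (assumed data, not read from the question's file). Note that the mask df['species'] == value is still computed for every row before next() runs; next() only stops early when reading the already-filtered result.

```python
import pandas as pd

# Hand-built stand-ins: one Series with a match, one empty (no match)
found = pd.Series(['genus'], index=[3], name='level')
missing = pd.Series([], dtype=object, name='level')

# Iterating a Series yields its values (not its index labels)
print(next(iter(found), 'no match'))    # genus
# An empty Series gives an exhausted iterator, so the default is returned
print(next(iter(missing), 'no match'))  # no match
```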

Edit:

An idea from the comments, thanks to @jpp:

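def get_first_val(val):
    """Return the first 'level' for species `val`, or 'no match' if absent."""
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        # .iat[0] on an empty Series raises IndexError
        return 'no match'

print(get_first_val(species_import))
# genus

print(get_first_val('aaa'))
# no match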
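
get_first_val reads df from the enclosing scope and hard-codes the fallback. A minimal variation, assuming you would rather pass both in, is sketched below; lookup_level is a hypothetical name, and the data is again rebuilt in memory with io.StringIO so the example runs on its own.

```python
import io

import pandas as pd


def lookup_level(frame, species, default='no match'):
    """Return the 'level' of the first row whose 'species' equals `species`.

    Hypothetical helper: same try/except idea as get_first_val, but the
    DataFrame and the fallback value are parameters instead of globals.
    """
    try:
        return frame.loc[frame['species'] == species, 'level'].iat[0]
    except IndexError:
        # .iat[0] on an empty Series raises IndexError, i.e. no row matched
        return default


# In-memory stand-in for specieslist.txt (assumed to hold the question's four lines)
DATA = "Bacillus,genus\nBorrelia,genus\nBurkholderia,genus\nCampylobacter,genus\n"
df = pd.read_csv(io.StringIO(DATA), header=None, names=['species', 'level'])

print(lookup_level(df, 'Campylobacter'))      # genus
print(lookup_level(df, 'aaa'))                # no match
print(lookup_level(df, 'aaa', default=None))  # None
```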

Edit: timings on a larger DataFrame:

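import numpy as np
import pandas as pd

# 10,000 rows of 'a' followed by a single 'b' at the very end
df = pd.DataFrame({'species':['a'] * 10000 + ['b'], 'level':np.arange(10001)})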

def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'


In [232]: %timeit next(iter(df.loc[df['species'] == 'a', 'level']), 'no match')
1.3 ms ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [233]: %timeit (get_first_val('a'))
1.1 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [235]: %timeit (get_first_val('b'))
1.48 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [236]: %timeit next(iter(df.loc[df['species'] == 'b', 'level']), 'no match')
1.24 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
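
The timings above come from IPython's %timeit magic. To repeat the comparison in a plain Python script, a sketch using the standard timeit module follows; next_iter is a hypothetical wrapper around the next(iter(...)) one-liner. It reports the best per-call time rather than %timeit's mean ± std. dev., and the numbers will depend on your machine and pandas version, so no results are shown here.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape as the benchmark above: 10,000 'a' rows followed by one 'b'
df = pd.DataFrame({'species': ['a'] * 10000 + ['b'], 'level': np.arange(10001)})


def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'


def next_iter(val):
    # Hypothetical wrapper so both approaches can be timed the same way
    return next(iter(df.loc[df['species'] == val, 'level']), 'no match')


for val in ('a', 'b'):
    for func in (get_first_val, next_iter):
        # 7 repeats of 100 calls each; keep the fastest repeat, per call
        best = min(timeit.repeat(lambda: func(val), repeat=7, number=100)) / 100
        print(f'{func.__name__}({val!r}): {best * 1e3:.3f} ms per call')
```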

Archived: 7 years, 1 month ago
Views: 1419
Last activity: 7 years, 1 month ago