Pandas: Eliminating loops

Zhu*_*arb 5 python vectorization dataframe pandas

I have two Pandas DataFrames: habitat_family and habitat_species. I want to populate habitat_species based on the taxonomic mapping lookupMap and the values in habitat_family.

import pandas as pd
import numpy as np
species = ['tiger', 'lion', 'mosquito', 'ladybug', 'locust', 'seal', 'seabass', 'shark', 'dolphin']
families = ['mammal','fish','insect']
lookupMap = {'tiger':'mammal', 'lion':'mammal', 'mosquito':'insect', 'ladybug':'insect', 'locust':'insect',
            'seal':'mammal', 'seabass':'fish', 'shark':'fish', 'dolphin':'mammal' }

habitat_family = pd.DataFrame({'id': range(1,11),
                         'mammal': [101,123,523,562,546,213,562,234,987,901],
                         'fish' :  [625,254,929,827,102,295,174,777,123,763],
                         'insect': [345,928,183,645,113,942,689,539,789,814] 
                         }, index=range(1,11), columns=['id','mammal','fish','insect'])

habitat_species = pd.DataFrame(0.0, index=range(1,11), columns=species)

# My highly inefficient solution:
for id in habitat_family.index: # loop through habitat id's
   for spec in species: # loop through species
       corresp_family = lookupMap[spec]
       habitat_species.loc[id,spec] = habitat_family.loc[id,corresp_family]

The nested for loops above do the job. In practice, though, my DataFrames are very large, and using for loops is not feasible.

Is there a more efficient way to do this, perhaps using dataframe.apply() or a similar function?

EDIT: The desired output for habitat_species is:

habitat_species
    tiger  lion  mosquito  ladybug  locust  seal  seabass  shark  dolphin
1     101   101       345      345     345   101      625    625      101
2     123   123       928      928     928   123      254    254      123
3     523   523       183      183     183   523      929    929      523
4     562   562       645      645     645   562      827    827      562
5     546   546       113      113     113   546      102    102      546
6     213   213       942      942     942   213      295    295      213
7     562   562       689      689     689   562      174    174      562
8     234   234       539      539     539   234      777    777      234
9     987   987       789      789     789   987      123    123      987
10    901   901       814      814     814   901      763    763      901

Dan*_*lan 4

You don't need any loops at all. Check it out:

In [12]: habitat_species = habitat_family[pd.Series(species).replace(lookupMap)]

In [13]: habitat_species.columns = species

In [14]: habitat_species
Out[14]: 
    tiger  lion  mosquito  ladybug  locust  seal  seabass  shark  dolphin
1     101   101       345      345     345   101      625    625      101
2     123   123       928      928     928   123      254    254      123
3     523   523       183      183     183   523      929    929      523
4     562   562       645      645     645   562      827    827      562
5     546   546       113      113     113   546      102    102      546
6     213   213       942      942     942   213      295    295      213
7     562   562       689      689     689   562      174    174      562
8     234   234       539      539     539   234      777    777      234
9     987   987       789      789     789   987      123    123      987
10    901   901       814      814     814   901      763    763      901

[10 rows x 9 columns]
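For reference, the same idea can be written as a standalone script. This is only a minimal sketch on a trimmed-down copy of the question's data (three habitats, four species). It swaps `Series.replace` for a plain list comprehension over `lookupMap`, and the name `family_cols` is introduced here purely for illustration. It assumes every species has an entry in `lookupMap`; a missing key would raise a `KeyError`.

```python
import pandas as pd

# Trimmed-down version of the question's data (assumption: same structure, fewer rows/columns).
species = ['tiger', 'lion', 'mosquito', 'seabass']
lookupMap = {'tiger': 'mammal', 'lion': 'mammal', 'mosquito': 'insect', 'seabass': 'fish'}

habitat_family = pd.DataFrame(
    {'mammal': [101, 123, 523],
     'fish': [625, 254, 929],
     'insect': [345, 928, 183]},
    index=range(1, 4),
)

# Translate each species into the family column it should copy from.
family_cols = [lookupMap[s] for s in species]

# Select those columns in a single step (repeated labels are allowed),
# then relabel the result with the species names.
habitat_species = habitat_family[family_cols].copy()
habitat_species.columns = species

print(habitat_species)
```

Selecting columns with a list of labels copies whole columns at once, so the work per habitat row happens inside pandas rather than in a Python-level loop.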