Zhu*_*arb 5 python vectorization dataframe pandas
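I have two pandas DataFrames, habitat_family and habitat_species. I want to fill habitat_species with the values from habitat_family, using the taxonomy mapping lookupMap to decide which family column each species takes its values from: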
import pandas as pd
import numpy as np
species = ['tiger', 'lion', 'mosquito', 'ladybug', 'locust', 'seal', 'seabass', 'shark', 'dolphin']
families = ['mammal','fish','insect']
lookupMap = {'tiger':'mammal', 'lion':'mammal', 'mosquito':'insect', 'ladybug':'insect', 'locust':'insect',
'seal':'mammal', 'seabass':'fish', 'shark':'fish', 'dolphin':'mammal' }
habitat_family = pd.DataFrame({'id': range(1,11),
'mammal': [101,123,523,562,546,213,562,234,987,901],
'fish' : [625,254,929,827,102,295,174,777,123,763],
'insect': [345,928,183,645,113,942,689,539,789,814]
}, index=range(1,11), columns=['id','mammal','fish','insect'])
habitat_species = pd.DataFrame(0.0, index=range(1,11), columns=species)
# My highly inefficient solution:
for id in habitat_family.index:  # loop through habitat id's
    for spec in species:  # loop through species
        corresp_family = lookupMap[spec]
        habitat_species.loc[id, spec] = habitat_family.loc[id, corresp_family]
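The nested for loops above do the job. In practice, though, my DataFrames are very large, and looping like this is not feasible.
Is there a more efficient way, perhaps using dataframe.apply() or a similar function?
Edit: the desired output for habitat_species is: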
habitat_species
tiger lion mosquito ladybug locust seal seabass shark dolphin
1 101 101 345 345 345 101 625 625 101
2 123 123 928 928 928 123 254 254 123
3 523 523 183 183 183 523 929 929 523
4 562 562 645 645 645 562 827 827 562
5 546 546 113 113 113 546 102 102 546
6 213 213 942 942 942 213 295 295 213
7 562 562 689 689 689 562 174 174 562
8 234 234 539 539 539 234 777 777 234
9 987 987 789 789 789 987 123 123 987
10 901 901 814 814 814 901 763 763 901
You don't need any loops at all. Take a look:
In [12]: habitat_species = habitat_family[pd.Series(species).replace(lookupMap)]
In [13]: habitat_species.columns = species
In [14]: habitat_species
Out[14]:
tiger lion mosquito ladybug locust seal seabass shark dolphin
1 101 101 345 345 345 101 625 625 101
2 123 123 928 928 928 123 254 254 123
3 523 523 183 183 183 523 929 929 523
4 562 562 645 645 645 562 827 827 562
5 546 546 113 113 113 546 102 102 546
6 213 213 942 942 942 213 295 295 213
7 562 562 689 689 689 562 174 174 562
8 234 234 539 539 539 234 777 777 234
9 987 987 789 789 789 987 123 123 987
10 901 901 814 814 814 901 763 763 901
[10 rows x 9 columns]
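The same idea can be written as a standalone script. This is a minimal sketch rather than the answerer's exact code: it uses a shortened species list and only the first three rows of the question's data, and it builds the column list with a plain list comprehension (the name family_cols is introduced here for illustration) instead of Series.replace. It relies on the fact that selecting columns with a list that repeats a label returns one column per entry.

```python
import pandas as pd

# Shortened version of the question's data (first three habitats only).
species = ['tiger', 'lion', 'mosquito', 'seabass']
lookupMap = {'tiger': 'mammal', 'lion': 'mammal',
             'mosquito': 'insect', 'seabass': 'fish'}

habitat_family = pd.DataFrame(
    {'mammal': [101, 123, 523],
     'fish': [625, 254, 929],
     'insect': [345, 928, 183]},
    index=range(1, 4),
)

# Translate each species into its family column label, keeping species order.
family_cols = [lookupMap[s] for s in species]

# A list with repeated labels selects the same family column several times,
# giving one output column per species.
habitat_species = habitat_family[family_cols].copy()

# Rename the repeated family labels to the species names.
habitat_species.columns = species

print(habitat_species)
```

The column selection happens once for the whole frame, so the cost no longer depends on a Python-level loop over every (habitat, species) pair.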