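I am trying to learn pandas. I have found several examples of how to construct a pandas DataFrame and how to add columns, and they work well. I would like to learn how to select all rows based on the value of a column. I have found multiple examples of how to do the selection when the column's value should be smaller or greater than some number, and that works too. My question is how to do a more general selection, where I first compute a function of a column and then select all rows for which the function's value is greater or smaller than some number.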
import names
import numpy as np
import pandas as pd
from datetime import date
import random


def randomBirthday(startyear, endyear):
    T1 = date.today().replace(day=1, month=1, year=startyear).toordinal()
    T2 = date.today().replace(day=1, month=1, year=endyear).toordinal()
    return date.fromordinal(random.randint(T1, T2))


def age(birthday):
    today = date.today()
    return today.year - birthday.year - ((today.month, today.day) < (birthday.month, birthday.day))


N_PEOPLE = 20
dict_people = {}
dict_people['gender'] = np.array(['male', 'female'])[np.random.randint(0, 2, N_PEOPLE)]
dict_people['names'] = [names.get_full_name(gender=g) for g in dict_people['gender']]
peopleFrame = pd.DataFrame(dict_people)

# Example 1: Add new columns to the data frame
peopleFrame['birthday'] = [randomBirthday(1920, 2020) for i in range(N_PEOPLE)]

# Example 2: Select all people with a certain age
peopleFrame.loc[age(peopleFrame['birthday']) >= 20]
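This code works except for the last line. Please suggest the correct way to write that line. I have considered adding an extra column that contains the values of the function age and then selecting based on its values. That works. But I wonder whether I have to do it that way. What if I do not want to store a person's age and only want to use it for the selection?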
Use Series.apply:
peopleFrame.loc[peopleFrame['birthday'].apply(age) >= 20]
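The original line fails because age expects a single date, while peopleFrame['birthday'] is a whole Series. Series.apply calls the function once per element and returns a new Series, which can then be compared to get a boolean mask. Below is a minimal, self-contained sketch of that idea. The sample rows, the variable names (people, ref, mask, adults) and the optional today parameter on age are assumptions added here so that the result is reproducible; they are not part of the question.

```python
from datetime import date

import pandas as pd


def age(birthday, today=None):
    """Age in whole years. `today` is an optional parameter added for this sketch."""
    today = today or date.today()
    return today.year - birthday.year - (
        (today.month, today.day) < (birthday.month, birthday.day)
    )


# Hypothetical sample data, not taken from the question.
people = pd.DataFrame({
    "names": ["Ann", "Bob", "Cid"],
    "birthday": [date(1990, 6, 15), date(2010, 1, 1), date(2000, 12, 31)],
})

# Fixed reference date so the result does not change over time.
ref = date(2020, 6, 14)

# apply() calls age() once per birthday; extra keyword arguments are
# forwarded to the function. Comparing the result gives a boolean mask.
mask = people["birthday"].apply(age, today=ref) >= 20

# Select the matching rows without storing the age in the frame.
adults = people.loc[mask]

print(adults["names"].tolist())
```

Because the mask is only a temporary Series, the frame keeps its original columns and no age column is created.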