How to use values in a DataFrame to look up attributes

mat*_*112 5 python class object pandas

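Suppose I have the two DataFrames below. One contains a list of students with their test scores, and the other contains sessions, each made up of two of those students. I want to add a new column "Sum" to df containing the sum of the scores for each session, and a new column "Years Elapsed" containing the number of years elapsed since the most recent year in which either student in the session took the test. What is the best way to achieve this? I could make Student a class and make each student an object, but then I get stuck on how to link the objects to their names in the DataFrame.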

import pandas as pd

data1 = {'Student': ['John','Kim','Adam','Sonia'],
         'Score': [92,100,76,82],
         'Year': [2015,2013,2016,2018]}
 
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])

data2 = {'Session': [1,2,3,4],
         'Student1': ['Sonia','Kim','John','Adam'],
         'Student2': ['Adam','Sonia','Kim','John']}

df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])

Desired result:

outcome = {'Session': [1,2,3,4],
           'Student1': ['Sonia','Kim','John','Adam'],
           'Student2': ['Adam','Sonia','Kim','John'],
           'Sum': [158, 182, 192, 168],
           'Years Elapsed': [4,4,7,6]}

df_outcome = pd.DataFrame(outcome, columns=['Session','Student1','Student2','Sum','Years Elapsed'])

I created a class called Student and made each student an object, but after that I got stuck.

df_students.columns = df_students.columns.str.lower()

class Student:
    def __init__(self, s, sc, yr):
        self.student = s
        self.score = sc
        self.year = yr

students = [Student(row.student, row.score, row.year) for index, row in df_students.iterrows()]

#check to see if list of objects was created correctly
s1 = students[1] 
s1.__dict__

Thanks in advance!

小智 1

Use the apply method:

import pandas as pd

data1 = {'Student': ['John','Kim','Adam','Sonia'],
         'Score': [92,100,76,82],
         'Year': [2015,2013,2016,2018]}
 
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])

data2 = {'Session': [1,2,3,4],
         'Student1': ['Sonia','Kim','John','Adam'],
         'Student2': ['Adam','Sonia','Kim','John']}

df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])

# SOLUTION 
def sum_scores(student1, student2):
    # Look up each student's score by name and add the two together
    _score_s1 = df_students.loc[df_students['Student'] == student1, 'Score'].values[0]
    _score_s2 = df_students.loc[df_students['Student'] == student2, 'Score'].values[0]
    return _score_s1 + _score_s2

def years_elapsed(student1, student2):
    # Years between the current year and the most recent test year of the two students
    _year = pd.to_datetime("today").year
    _year_s1 = df_students.loc[df_students['Student'] == student1, 'Year'].values[0]
    _year_s2 = df_students.loc[df_students['Student'] == student2, 'Year'].values[0]
    return _year - max(_year_s1, _year_s2)

df['Sum'] = df.apply(lambda row: sum_scores(row['Student1'], row['Student2']), axis=1)
df['Years Elapsed'] = df.apply(lambda row: years_elapsed(row['Student1'], row['Student2']), axis=1)

df
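As a possible alternative to the row-wise apply above, the lookup can probably be done column by column with Series.map. This is only a sketch under a few assumptions: student names are unique in df_students, and REFERENCE_YEAR is a hypothetical constant that is not in the original post. It is set to 2022 because that value appears to reproduce the "Years Elapsed" numbers in the question. The answer above uses the current year instead, so its output changes over time.

```python
import pandas as pd

df_students = pd.DataFrame({
    'Student': ['John', 'Kim', 'Adam', 'Sonia'],
    'Score': [92, 100, 76, 82],
    'Year': [2015, 2013, 2016, 2018],
})

df = pd.DataFrame({
    'Session': [1, 2, 3, 4],
    'Student1': ['Sonia', 'Kim', 'John', 'Adam'],
    'Student2': ['Adam', 'Sonia', 'Kim', 'John'],
})

# Index the lookup table by student name (assumes names are unique)
lookup = df_students.set_index('Student')

# Hypothetical fixed reference year (an assumption, not from the original post);
# swap in pd.Timestamp.today().year to measure from the current year instead
REFERENCE_YEAR = 2022

student_cols = ['Student1', 'Student2']

# Replace every student name with that student's score / test year
scores = df[student_cols].apply(lambda col: col.map(lookup['Score']))
years = df[student_cols].apply(lambda col: col.map(lookup['Year']))

df['Sum'] = scores.sum(axis=1)
df['Years Elapsed'] = REFERENCE_YEAR - years.max(axis=1)

print(df)
```

Because student_cols is a plain list, the same code should extend to sessions with more than two student columns.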

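If you would rather keep the Student class from the question, one way to link the objects to the names in df is to store them in a dict keyed by student name rather than in a list. The sketch below assumes unique names. The helper functions session_sum and session_years_elapsed and the constant REFERENCE_YEAR are hypothetical names introduced for illustration, and 2022 is again an assumed reference year chosen to match the expected output in the question.

```python
import pandas as pd

class Student:
    def __init__(self, s, sc, yr):
        self.student = s
        self.score = sc
        self.year = yr

df_students = pd.DataFrame({
    'Student': ['John', 'Kim', 'Adam', 'Sonia'],
    'Score': [92, 100, 76, 82],
    'Year': [2015, 2013, 2016, 2018],
})

df = pd.DataFrame({
    'Session': [1, 2, 3, 4],
    'Student1': ['Sonia', 'Kim', 'John', 'Adam'],
    'Student2': ['Adam', 'Sonia', 'Kim', 'John'],
})

# Key each Student object by name so a name found in df leads straight to its object
students = {
    row.Student: Student(row.Student, row.Score, row.Year)
    for row in df_students.itertuples(index=False)
}

# Hypothetical fixed reference year (an assumption, not from the original post)
REFERENCE_YEAR = 2022

def session_sum(row):
    # Add the scores of the two students in this session
    return students[row['Student1']].score + students[row['Student2']].score

def session_years_elapsed(row):
    # Years since the more recent of the two students' test years
    latest = max(students[row['Student1']].year, students[row['Student2']].year)
    return REFERENCE_YEAR - latest

df['Sum'] = df.apply(session_sum, axis=1)
df['Years Elapsed'] = df.apply(session_years_elapsed, axis=1)

print(df)
```

The dict lookup avoids filtering df_students once per row, but for larger data a purely pandas-based lookup is likely to be faster than going through Python objects.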