mat*_*112 5 python class object pandas
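Suppose I have the 2 dataframes below: one contains a list of students with their test scores, and the other contains different student sessions made up of those students. I want to add a new column "Sum" to df containing the sum of the scores for each session, and a new column "Years Elapsed" containing the number of years that have passed since the most recent year in which a student in the session took the test. What is the best way to accomplish this? I could make Student a class and make each student an object, but then I get stuck on how to link the objects to their names in the dataframe.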
import pandas as pd

data1 = {'Student': ['John','Kim','Adam','Sonia'],
         'Score': [92,100,76,82],
         'Year': [2015,2013,2016,2018]}
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])
data2 = {'Session': [1,2,3,4],
         'Student1': ['Sonia','Kim','John','Adam'],
         'Student2': ['Adam','Sonia','Kim','John']}
df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])
Desired result:
outcome = {'Session': [1,2,3,4],
           'Student1': ['Sonia','Kim','John','Adam'],
           'Student2': ['Adam','Sonia','Kim','John'],
           'Sum': [158, 182, 192, 168],
           'Years Elapsed': [4,4,7,6]}
df_outcome = pd.DataFrame(outcome, columns=['Session','Student1','Student2','Sum','Years Elapsed'])
I created a class called Student and made each student an object, but after that I got stuck.
# lowercase the column names so rows expose .student, .score and .year
df_students.columns = df_students.columns.str.lower()

class Student:
    def __init__(self, s, sc, yr):
        self.student = s
        self.score = sc
        self.year = yr

students = [Student(row.student, row.score, row.year) for index, row in df_students.iterrows()]

# check to see if list of objects was created correctly
s1 = students[1]
s1.__dict__
Thanks in advance!
小智 1
Use the apply method:
import pandas as pd

data1 = {'Student': ['John','Kim','Adam','Sonia'],
         'Score': [92,100,76,82],
         'Year': [2015,2013,2016,2018]}
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])
data2 = {'Session': [1,2,3,4],
         'Student1': ['Sonia','Kim','John','Adam'],
         'Student2': ['Adam','Sonia','Kim','John']}
df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])

# SOLUTION
def sum_scores(student1, student2):
    # look up each student's score by name and add the two
    _score_s1 = df_students.loc[df_students['Student'] == student1, 'Score'].iloc[0]
    _score_s2 = df_students.loc[df_students['Student'] == student2, 'Score'].iloc[0]
    return _score_s1 + _score_s2

def years_elapsed(student1, student2):
    # years between today and the most recent test year of the two students
    _year = pd.to_datetime("today").year
    _year_s1 = df_students.loc[df_students['Student'] == student1, 'Year'].iloc[0]
    _year_s2 = df_students.loc[df_students['Student'] == student2, 'Year'].iloc[0]
    return _year - max(_year_s1, _year_s2)

# apply both helpers row by row (axis=1)
df['sum_score'] = df.apply(lambda row: sum_scores(row['Student1'], row['Student2']), axis=1)
df['years_elapsed'] = df.apply(lambda row: years_elapsed(row['Student1'], row['Student2']), axis=1)
df
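The row-wise apply calls above can probably also be written without a Python-level function per row, by mapping each name column through a lookup indexed by student. This is only a sketch: it assumes student names are unique, and the `lookup`, `latest_year` and `reference_year` names are made up for illustration. The fixed `reference_year = 2022` is an assumption chosen because it reproduces the expected output in the question; replace it with `pd.Timestamp.today().year` to measure from the current date.

```python
import pandas as pd

df_students = pd.DataFrame({
    'Student': ['John', 'Kim', 'Adam', 'Sonia'],
    'Score': [92, 100, 76, 82],
    'Year': [2015, 2013, 2016, 2018],
})
df = pd.DataFrame({
    'Session': [1, 2, 3, 4],
    'Student1': ['Sonia', 'Kim', 'John', 'Adam'],
    'Student2': ['Adam', 'Sonia', 'Kim', 'John'],
})

# Assumption: names are unique, so they can serve as the lookup index.
lookup = df_students.set_index('Student')

# Map each name column to its score, then add the two resulting Series.
df['Sum'] = df['Student1'].map(lookup['Score']) + df['Student2'].map(lookup['Score'])

# Most recent test year per session: row-wise max over the two mapped columns.
latest_year = pd.concat(
    [df['Student1'].map(lookup['Year']), df['Student2'].map(lookup['Year'])],
    axis=1,
).max(axis=1)

# Assumption: 2022 matches the question's expected output.
reference_year = 2022
df['Years Elapsed'] = reference_year - latest_year
```

A name that is missing from df_students would produce NaN here rather than an IndexError, which may or may not be what you want.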
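If you would rather keep the Student objects from the question, one way to link them back to the names in df is a dict keyed by student name. The following is a rough sketch, not the only way to do it: the dataclass form of `Student`, the `roster` dict and the fixed `reference_year = 2022` are all assumptions made for illustration (2022 is simply the year that reproduces the expected output).

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class Student:
    name: str
    score: int
    year: int


df_students = pd.DataFrame({
    'Student': ['John', 'Kim', 'Adam', 'Sonia'],
    'Score': [92, 100, 76, 82],
    'Year': [2015, 2013, 2016, 2018],
})
df = pd.DataFrame({
    'Session': [1, 2, 3, 4],
    'Student1': ['Sonia', 'Kim', 'John', 'Adam'],
    'Student2': ['Adam', 'Sonia', 'Kim', 'John'],
})

# Hypothetical lookup: one Student object per name (assumes unique names).
roster = {
    row.Student: Student(row.Student, row.Score, row.Year)
    for row in df_students.itertuples(index=False)
}

# Assumption: 2022 matches the question's expected output.
reference_year = 2022

# Walk the two name columns in parallel and resolve each name via the dict.
pairs = list(zip(df['Student1'], df['Student2']))
df['Sum'] = [roster[a].score + roster[b].score for a, b in pairs]
df['Years Elapsed'] = [
    reference_year - max(roster[a].year, roster[b].year) for a, b in pairs
]
```

The dict does the "linking": any name found in a session column resolves to its object with `roster[name]`, and a name with no matching student raises a KeyError.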