Ali*_*Ali 5 python group-by pandas pandas-groupby
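I want to group all qualifications (as a delimiter-separated list) by job title.
In the dataset below, jobs of the same type (.Net Developer) require different sets of qualifications, while another job requires no qualification at all.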
JobID     Job Title                    Qualification ID  Qualification Name
34455226  .Net Developer               ICT50715          Diploma of Software Development
34455226  .Net Developer               ICT40515          Certificate IV in Programming
34466933  .Net Developer               ICT50715          Diploma of Software Development
34466111  .Net Developer               ICT50655          Diploma of Software Testing
34479964  Snr Finance Systems Analyst
I want a consolidated view of all the unique qualifications that a particular type of job might require, like this:
Job Title                    Qualifications
.Net Developer               Diploma of Software Development,Certificate IV in Programming,Diploma of Software Testing
Snr Finance Systems Analyst  N/A
This is what I have done so far.
def f(x):
    return pd.Series(dict(Qualifications = ",".join(map(str, x["Qualification Name"]))))

df_jobs_qualifications\
    .groupby("Job Title")[['Qualification Name']]\
    .apply(f)
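But it gives me duplicate qualification names (see below: Diploma of Software Development is repeated), whereas I want unique qualification names.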
Job Title                    Qualifications
.Net Developer               Diploma of Software Development,Certificate IV in Programming,Diploma of Software Development,Diploma of Software Testing
Snr Finance Systems Analyst  N/A
UPDATE
My question is different from this question, because even after following the steps given there I do not get unique values.

If you need unique strings:
You can use set or unique; if None or NaN values are possible, also add dropna:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(set(x.dropna())))
         .reset_index())
print(df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1
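A Python set has no defined iteration order, so the joined string from the set-based version can come out in a different order from one run to the next. The following is a minimal sketch, assuming alphabetical order is acceptable, that sorts the set before joining. The DataFrame is rebuilt by hand from the question's sample rows; that construction is an assumption for illustration, not the asker's actual loading code.

```python
import pandas as pd

# Sample rows copied from the question (hand-built here as an assumption);
# None marks the job that lists no qualification.
df = pd.DataFrame({
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

# sorted() turns the unordered set into a list with a reproducible order
df_sorted = (df.groupby("Job Title")["Qualification Name"]
               .apply(lambda x: ",".join(sorted(set(x.dropna()))))
               .reset_index())
print(df_sorted)
```

The group without any qualification still yields an empty string here, exactly as in the unsorted version.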
If order matters:
df1 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: ','.join(x.dropna().unique()))
         .reset_index())
print(df1)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 Diploma of Software Development,Certificate IV...
1
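To get the exact table asked for in the question, with a Qualifications column and N/A for jobs without any qualification, the order-preserving version can be wrapped in a small helper. This is only a sketch: the helper name join_unique and the hand-built sample DataFrame are assumptions made for illustration.

```python
import pandas as pd

# Sample rows copied from the question (hand-built here as an assumption);
# None marks the job that lists no qualification.
df = pd.DataFrame({
    "JobID": [34455226, 34455226, 34466933, 34466111, 34479964],
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

def join_unique(names):
    """Join distinct non-null names in first-seen order, or "N/A" if none remain."""
    unique_names = names.dropna().unique()
    return ",".join(unique_names) if len(unique_names) else "N/A"

result = (df.groupby("Job Title")["Qualification Name"]
            .apply(join_unique)
            .reset_index(name="Qualifications"))  # rename the value column
print(result)
```

Passing name= to reset_index renames the aggregated column in the same step, so no separate rename call is needed.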
If you want NaN when a group has no values:
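import numpy as np

def f(x):
    # Collect the distinct non-null qualification names of this group
    val = set(x.dropna())
    if len(val) > 0:
        val = ','.join(val)
    else:
        # No qualification at all: return NaN instead of an empty string
        val = np.nan
    return val

df2 = df.groupby('Job Title')['Qualification Name'].apply(f).reset_index()
print(df2)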
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 Diploma of Software Development,Diploma of Sof...
1 NaN
If you need unique lists:
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(set(x)))
         .reset_index())
print(df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 [Diploma of Software Development, Diploma of S...
1 [None]
df2 = (df.groupby('Job Title')['Qualification Name']
         .apply(lambda x: list(x.unique()))
         .reset_index())
print(df2)
Job Title \
0 .Net Developer
1 Snr Finance Systems Analyst
Qualification Name
0 [Diploma of Software Development, Certificate ...
1 [None]
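Both list versions above keep the missing value, which is why the second row shows [None]. If an empty list is preferable for jobs without qualifications, one possible variation is to call dropna before unique. The sketch below assumes a hand-built copy of the question's sample data.

```python
import pandas as pd

# Sample rows copied from the question (hand-built here as an assumption);
# None marks the job that lists no qualification.
df = pd.DataFrame({
    "Job Title": [".Net Developer"] * 4 + ["Snr Finance Systems Analyst"],
    "Qualification Name": [
        "Diploma of Software Development",
        "Certificate IV in Programming",
        "Diploma of Software Development",
        "Diploma of Software Testing",
        None,
    ],
})

# dropna() removes the missing value first, so that group ends up with []
df_lists = (df.groupby("Job Title")["Qualification Name"]
              .apply(lambda x: list(x.dropna().unique()))
              .reset_index())
print(df_lists)
```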