Using merge on a column and an index in Pandas

Question

use*_*044 27 python merge python-2.7 pandas

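I have two separate DataFrames that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows that have a Project Type of 'Type 2' once the two frames are combined. I am trying to do this with pandas.merge(). It works well when both keys are columns, but not when one of them is an index. I'm not sure how to reference the index, or whether merge is even the right way to do this.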

import pandas as pd
type_df = pd.DataFrame(data = [['Type 1'], ['Type 2']], 
                       columns=['Project Type'], 
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data = [['Project1', 13], ['Project1', 12], 
                               ['Project2', 41]], 
                       columns=['Project', 'Time'])
merged = pd.merge(time_df,type_df, on=[index,'Project'])
print merged[merged['Project Type'] == 'Type 2']['Project Type'].count()

Error:

NameError: name 'index' is not defined

Desired output:

Answer 1

max*_*moo 33

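If you want to use an index in your merge, you have to specify left_index=True or right_index=True for the frame whose index is the key, and then use left_on or right_on for the other frame's column. In your case it should look something like this: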

merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')
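For completeness, here is a minimal end-to-end sketch of that approach using the two frames from the question. The variable name `type2_count` is only illustrative, and the code is written for Python 3 (the question used Python 2's `print` statement).

```python
import pandas as pd

# The two frames from the question.
type_df = pd.DataFrame(
    data=[["Type 1"], ["Type 2"]],
    columns=["Project Type"],
    index=["Project2", "Project1"],
)
time_df = pd.DataFrame(
    data=[["Project1", 13], ["Project1", 12], ["Project2", 41]],
    columns=["Project", "Time"],
)

# Match type_df's index against time_df's 'Project' column.
merged = pd.merge(type_df, time_df, left_index=True, right_on="Project")

# Count the merged rows whose project type is 'Type 2'.
type2_count = int((merged["Project Type"] == "Type 2").sum())
print(type2_count)
```

With this data, Project1 is the only 'Type 2' project and it appears twice in time_df, so the count is 2.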

Answer 2

jez*_*ael 8

Another solution is to use DataFrame.join:

# `on` names a column of the calling frame, which is matched against the index of the other frame
df3 = time_df.join(type_df, on='Project')
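As a hedged sketch of the join route on the question's data: `DataFrame.join` matches the column named by `on` in the calling frame against the index of the other frame, so here time_df (which has the 'Project' column) is the caller. The names `joined` and `type2_rows` are illustrative only.

```python
import pandas as pd

# The two frames from the question.
type_df = pd.DataFrame(
    data=[["Type 1"], ["Type 2"]],
    columns=["Project Type"],
    index=["Project2", "Project1"],
)
time_df = pd.DataFrame(
    data=[["Project1", 13], ["Project1", 12], ["Project2", 41]],
    columns=["Project", "Time"],
)

# Look up each time_df['Project'] value in type_df's index (left join by default).
joined = time_df.join(type_df, on="Project")

# Keep only the rows for 'Type 2' projects and count them.
type2_rows = joined[joined["Project Type"] == "Type 2"]
print(len(type2_rows))
```

Because join defaults to a left join, every row of time_df is kept and simply gains a 'Project Type' column.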

In pandas 0.23.0+, the on, left_on and right_on parameters may refer to either column names or index level names:

left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)

right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)

print (left)    
       A   B key2
key1             
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1

print (right)
       C   D key2
key1             
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1

df = left.merge(right, on=['key1', 'key2'])
print (df)
       A   B key2   C   D
key1                     
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
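Applied to the question's frames, the same feature might look like the sketch below. It assumes pandas 0.23.0 or later, and it assumes you are free to give type_df's index the name 'Project' (done here with `rename_axis`); `named_type_df` is an illustrative name.

```python
import pandas as pd

# The two frames from the question.
type_df = pd.DataFrame(
    data=[["Type 1"], ["Type 2"]],
    columns=["Project Type"],
    index=["Project2", "Project1"],
)
time_df = pd.DataFrame(
    data=[["Project1", 13], ["Project1", 12], ["Project2", 41]],
    columns=["Project", "Time"],
)

# Name the index so that `on` can refer to it like a column.
named_type_df = type_df.rename_axis("Project")

# 'Project' resolves to a column in time_df and to the index level in named_type_df.
merged = pd.merge(time_df, named_type_df, on="Project")

type2_count = int((merged["Project Type"] == "Type 2").sum())
print(type2_count)
```

This avoids left_index/right_on entirely, at the cost of requiring the index to carry a name that matches the column.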

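Can I pass the numeric (positional) index of a column instead of the column name? I have duplicate column names, so using the name fails. (2 upvotes)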
