jon*_*boy 3 python merge pandas
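I have two dataframes that represent similar data, and I want to merge them after changing the column names. There are several ways to do this, but given the size of my actual dataframes, I would like to use the approach below. I am getting NaN values for the columns of the second df.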
import pandas as pd
df1 = pd.DataFrame({
'time': ['2012-08-02 09:50:20.0','2012-08-02 09:50:32.5','2012-08-02 09:50:34.8'],
'Val': ['1,2,3','1,2,3','1,2,3'],
'Val2': [1,2,3],
'Val3': [1.1,2.1,3.1]
})
df2 = pd.DataFrame({
'time': ['2012-08-02 09:50:20.0','2012-08-02 09:50:32.5','2012-08-02 09:50:34.8'],
'Val': ['1,2,3','1,2,3','1,2,3'],
'Val2': [1,2,3],
'Val3': [1.1,2.1,3.1]
})
df1['time'] = pd.to_datetime(df1['time'])
df2['time'] = pd.to_datetime(df2['time'])
df1.columns.values[1:4] = ['first_' + str(x) for x in df1.columns[1:4]]
df2.columns.values[1:4] = ['second_' + str(x) for x in df2.columns[1:4]]
df3 = pd.merge(df1, df2, on = 'time')
print(df3)
time first_Val first_Val2 first_Val3 second_Val second_Val2 second_Val3
0 2012-08-02 09:50:20.000 1,2,3 1 1.1 NaN NaN NaN
1 2012-08-02 09:50:32.500 1,2,3 2 2.1 NaN NaN NaN
2 2012-08-02 09:50:34.800 1,2,3 3 3.1 NaN NaN NaN
Expected output:
time first_Val first_Val2 first_Val3 second_Val second_Val2 second_Val3
0 2012-08-02 09:50:20.000 1,2,3 1 1.1 1,2,3 1 1.1
1 2012-08-02 09:50:32.500 1,2,3 2 2.1 1,2,3 2 2.1
2 2012-08-02 09:50:34.800 1,2,3 3 3.1 1,2,3 3 3.1
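df1.columns.values[1:4] = new values modifies the array of column labels in place, which pandas does not reliably pick up; assign new labels to df.columns instead.
Set 'time' as the index, then reset it after changing the column names with a list comprehension over df.columns.
.reset_index() can be dropped and 'time' kept as the index; in that case, use df.join instead of pd.merge.
Use .rename for specific columns.
df1 = pd.DataFrame({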
'time': ['2012-08-02 09:50:20.0','2012-08-02 09:50:32.5','2012-08-02 09:50:34.8'],
'Val': ['1,2,3','1,2,3','1,2,3'],
'Val2': [1,2,3],
'Val3': [1.1,2.1,3.1]
})
df1['time'] = pd.to_datetime(df1['time'])
# move 'time' into the index so that every remaining column gets the prefix
df1.set_index('time', inplace=True)
df1.columns = ['first_' + str(x) for x in df1.columns]
# bring 'time' back as a regular column for the merge
df1.reset_index(inplace=True)
df2 = pd.DataFrame({
'time': ['2012-08-02 09:50:20.0','2012-08-02 09:50:32.5','2012-08-02 09:50:34.8'],
'Val': ['1,2,3','1,2,3','1,2,3'],
'Val2': [1,2,3],
'Val3': [1.1,2.1,3.1]
})
df2['time'] = pd.to_datetime(df2['time'])
df2.set_index('time', inplace=True)
df2.columns = ['second_' + str(x) for x in df2.columns]
df2.reset_index(inplace=True)
# merge
df3 = pd.merge(df1, df2, on = 'time', how='left')
print(df3)
time first_Val first_Val2 first_Val3 second_Val second_Val2 second_Val3
0 2012-08-02 09:50:20.000 1,2,3 1 1.1 1,2,3 1 1.1
1 2012-08-02 09:50:32.500 1,2,3 2 2.1 1,2,3 2 2.1
2 2012-08-02 09:50:34.800 1,2,3 3 3.1 1,2,3 3 3.1
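The answer also mentions .rename for specific columns. A minimal sketch of that variant follows, assuming the same sample data as the question. The make_frame helper is hypothetical and only there to avoid repeating the data. It keeps the question's [1:4] slice but builds an old-name to new-name mapping, so nothing is written into df.columns.values.

```python
import pandas as pd

def make_frame():
    # Hypothetical helper: rebuilds the sample data from the question.
    df = pd.DataFrame({
        'time': ['2012-08-02 09:50:20.0', '2012-08-02 09:50:32.5', '2012-08-02 09:50:34.8'],
        'Val': ['1,2,3', '1,2,3', '1,2,3'],
        'Val2': [1, 2, 3],
        'Val3': [1.1, 2.1, 3.1],
    })
    df['time'] = pd.to_datetime(df['time'])
    return df

df1 = make_frame()
df2 = make_frame()

# Map only the columns that should change; 'time' is not in the mapping,
# so it keeps its name. rename returns a new DataFrame with new labels.
df1 = df1.rename(columns={c: 'first_' + c for c in df1.columns[1:4]})
df2 = df2.rename(columns={c: 'second_' + c for c in df2.columns[1:4]})

df3 = pd.merge(df1, df2, on='time')
print(df3)
```

Since rename builds the new column labels itself, the merge sees consistent names on both frames.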
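The other option from the answer, keeping 'time' as the index and using df.join instead of pd.merge, might look like the sketch below. It assumes the same sample data; make_frame is again a hypothetical helper, and the names left and right are chosen here for illustration.

```python
import pandas as pd

def make_frame():
    # Hypothetical helper: rebuilds the sample data from the question.
    df = pd.DataFrame({
        'time': ['2012-08-02 09:50:20.0', '2012-08-02 09:50:32.5', '2012-08-02 09:50:34.8'],
        'Val': ['1,2,3', '1,2,3', '1,2,3'],
        'Val2': [1, 2, 3],
        'Val3': [1.1, 2.1, 3.1],
    })
    df['time'] = pd.to_datetime(df['time'])
    return df

# With 'time' in the index, every remaining column can be prefixed
# without knowing a positional slice such as [1:4].
left = make_frame().set_index('time')
left.columns = ['first_' + str(x) for x in left.columns]

right = make_frame().set_index('time')
right.columns = ['second_' + str(x) for x in right.columns]

# join aligns the two frames on their index, so no reset_index is needed.
df3 = left.join(right, how='left')
print(df3)
```

The result has 'time' as its index rather than as a column; call df3.reset_index() afterwards if a regular column is needed. DataFrame.add_prefix('first_') should give the same column names as the list comprehension.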