对数据帧进行排序和连接

The*_*Guy 7 python sorting concatenation dataframe pandas

我有以下两个数据框:

>>> df1
  c1  c2  v1  v2
0  A NaN   9   2
1  B NaN   2   5
2  C NaN   3   5
3  D NaN   4   2

>>> df2
   c1 c2  v1  v2
0   A  P   4   1
1   A  T   3   1
2   A  Y   2   0
3   B  P   0   1
4   B  T   2   2
5   B  Y   0   2
6   C  P   1   2
7   C  T   1   2
8   C  Y   1   1
9   D  P   1   1
10  D  T   2   0
11  D  Y   1   1
Run Code Online (Sandbox Code Playgroud)

我需要连接数据帧并对它们进行排序,反之亦然。第一个数据帧需要按列进行排序v1,然后需要根据对 c1第一个数据帧进行排序后列中的值的顺序以及v2第二个数据帧中的列对第二个数据帧进行排序。

工作版本是这样的:对第一个数据帧进行排序v1,然后迭代行,过滤第二个数据帧的列值c2,并对过滤后的第二个数据帧进行排序v2,最后连接所有帧。

result = []
for i,row in df1.sort_values('v1').iterrows():
    result.append(row.to_frame().T)
    result.append(df2[df2['c1'].eq(row['c1'])].sort_values('v2'))
Run Code Online (Sandbox Code Playgroud)

排序后得到的数据框:

>>> pd.concat(result, ignore_index=True)
   c1   c2 v1 v2
0   B  NaN  2  5
1   B    P  0  1
2   B    T  2  2
3   B    Y  0  2
4   C  NaN  3  5
5   C    Y  1  1
6   C    P  1  2
7   C    T  1  2
8   D  NaN  4  2
9   D    T  2  0
10  D    P  1  1
11  D    Y  1  1
12  A  NaN  9  2
13  A    Y  2  0
14  A    P  4  1
15  A    T  3  1
Run Code Online (Sandbox Code Playgroud)

上述方法的问题在于其迭代,并且当数据帧的数量增加和/或这些数据帧中的行数增加时效率不高。真实的用例场景有 2 到 6 个数据帧,其中行数从几千到几十万不等。

更新:

首先对数据帧进行排序然后连接它们,或者先连接数据帧然后进行排序都可以,这就是为什么我只包含两个数据帧而不是仅仅连接它们并呈现单个数据帧。

编辑:

以下是来自实际用例场景的 4 个数据帧:

from  math import nan
import pandas as pd

df4 = pd.DataFrame({'c1': ['BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT'], 'c2': ['D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2', 'w2'], 'c3': ['BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH', 'BAF', 'BAF', 'BAF', 'BAF', 'BAF', 'WH', 'WH', 'WH', 'WH', 'WH'], 'c4': ['001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss', '001', '002', '003', '004', 'mss'], 'v1': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 2, 2, 0, 1, 0, 2, 3, 6, 2, 0, 2, 2, 0, 1, 0, 1, 3, 5, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 2, 2, 0, 1, 0, 2, 3, 6, 2, 0, 2, 2, 0, 1, 0, 1, 3, 5, 1, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 2, 2, 0, 2, 0, 2, 4, 6, 4, 0, 2, 2, 0, 1, 0, 2, 3, 6, 2, 0, 2, 2, 0, 1, 0, 1, 3, 5, 1, 0], 'v2': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 0, 0, 1, 0, 2, 3, 5, 4, 0, 0, 0, 0, 1, 0, 1, 3, 5, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 0, 0, 1, 0, 2, 3, 5, 4, 0, 0, 0, 0, 1, 0, 1, 3, 5, 3, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 1, 0, 1, 0, 2, 4, 6, 5, 0, 0, 0, 0, 1, 0, 2, 3, 5, 4, 0, 0, 0, 0, 1, 0, 1, 3, 5, 3, 0], 'v3': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 1, 2, 1, 0, 0, 0, 4, 6, 4, 0, 1, 2, 1, 0, 0, 0, 2, 6, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 1, 2, 1, 0, 0, 0, 4, 6, 4, 0, 1, 2, 1, 0, 0, 0, 2, 6, 3, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 1, 2, 1, 0, 0, 1, 5, 9, 7, 0, 1, 2, 1, 0, 0, 0, 4, 6, 4, 0, 1, 2, 1, 0, 0, 0, 2, 6, 3, 0], 'v4': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 3, 4, 1, 2, 0, 4, 10, 17, 10, 0, 3, 4, 1, 2, 0, 2, 8, 16, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 3, 4, 1, 2, 0, 4, 10, 17, 10, 0, 3, 4, 1, 2, 0, 2, 8, 16, 7, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 3, 5, 1, 3, 0, 5, 13, 21, 16, 0, 3, 4, 1, 2, 0, 4, 10, 17, 10, 0, 3, 4, 1, 2, 0, 2, 8, 16, 7, 0]})
df3 = pd.DataFrame({'c1': ['BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT'], 'c2': ['D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2', 'D1', 'D1', 'D1', 'Sc', 'Sc', 'Sc', 'w1', 'w1', 'w1', 'w2', 'w2', 'w2'], 'c3': ['BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss', 'BAF', 'WH', 'mss'], 'c4': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], 'v1': [0, 0, 0, 6, 16, 0, 0, 0, 0, 0, 0, 0, 6, 16, 0, 6, 16, 0, 5, 13, 0, 5, 10, 0, 0, 0, 0, 6, 16, 0, 0, 0, 0, 0, 0, 0, 6, 16, 0, 6, 16, 0, 5, 13, 0, 5, 10, 0, 6, 16, 0, 6, 16, 0, 5, 13, 0, 5, 10, 0], 'v2': [0, 0, 0, 2, 17, 0, 0, 0, 0, 0, 0, 0, 2, 17, 0, 2, 17, 0, 1, 14, 0, 1, 12, 0, 0, 0, 0, 2, 17, 0, 0, 0, 0, 0, 0, 0, 2, 17, 0, 2, 17, 0, 1, 14, 0, 1, 12, 0, 2, 17, 0, 2, 17, 0, 1, 14, 0, 1, 12, 0], 'v3': [0, 0, 0, 4, 22, 0, 0, 0, 0, 0, 0, 0, 4, 22, 0, 4, 22, 0, 4, 14, 0, 4, 11, 0, 0, 0, 0, 4, 22, 0, 0, 0, 0, 0, 0, 0, 4, 22, 0, 4, 22, 0, 4, 14, 0, 4, 11, 0, 4, 22, 0, 4, 22, 0, 4, 14, 0, 4, 11, 0], 'v4': [0, 0, 0, 12, 55, 0, 0, 0, 0, 0, 0, 0, 12, 55, 0, 12, 55, 0, 10, 41, 0, 10, 33, 0, 0, 0, 0, 12, 55, 0, 0, 0, 0, 0, 0, 0, 12, 55, 0, 12, 55, 0, 10, 41, 0, 10, 33, 0, 12, 55, 0, 12, 55, 0, 10, 41, 0, 10, 33, 0]})
df2 = pd.DataFrame({'c1': ['BMI', 'BMI', 'BMI', 'BMI', 'BMI', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'DIABP', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'HEIGHT', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'SYSBP', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT', 'WEIGHT'], 'c2': ['D1', 'Sc', 'w1', 'w2', 'mss', 'D1', 'Sc', 'w1', 'w2', 'mss', 'D1', 'Sc', 'w1', 'w2', 'mss', 'D1', 'Sc', 'w1', 'w2', 'mss', 'D1', 'Sc', 'w1', 'w2', 'mss'], 'c3': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], 'c4': [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan], 'v1': [0, 22, 0, 0, 0, 22, 22, 18, 15, 0, 0, 22, 0, 0, 0, 22, 22, 18, 15, 0, 22, 22, 18, 15, 0], 'v2': [0, 19, 0, 0, 0, 19, 19, 15, 13, 0, 0, 19, 0, 0, 0, 19, 19, 15, 13, 0, 19, 19, 15, 13, 0], 'v3': [0, 26, 0, 0, 0, 26, 26, 18, 15, 0, 0, 26, 0, 0, 0, 26, 26, 18, 15, 0, 26, 26, 18, 15, 0], 'v4': [0, 67, 0, 0, 0, 67, 67, 51, 43, 0, 0, 67, 0, 0, 0, 67, 67, 51, 43, 0, 67, 67, 51, 43, 0]})
df1 = pd.DataFrame({'c1': ['BMI', 'DIABP', 'HEIGHT', 'SYSBP', 'WEIGHT', 'mss'], 'c2': [nan, nan, nan, nan, nan, nan], 'c3': [nan, nan, nan, nan, nan, nan], 'c4': [nan, nan, nan, nan, nan, nan], 'v1': [22, 22, 22, 22, 22, 0], 'v2': [19, 19, 19, 19, 19, 0], 'v3': [26, 26, 26, 26, 26, 0], 'v4': [67, 67, 67, 67, 67, 0]})

# Comment for easy code selection
Run Code Online (Sandbox Code Playgroud)

即使对于上述四个数据帧,排序和合并标准仍然相同

  • 在 v1 上对 df1 进行排序
  • 在 v2 上对 df2 中的 c2 进行排序,保持 df1 中 c1 的顺序
  • 在 v3 上对 df3 中的 c3 进行排序,保持 df1 中的 c1 和 df2 中的 c2 的顺序
  • 在 v4 上对 df3 中的 c4 进行排序,保持 df1 中的 c1、df2 中的 c2 和 df3 中的 c3 的顺序

在这种情况下,当要排序和合并的数据帧数量增加时,我上面使用的解决方案变得非常低效。

Cor*_*ien 4

另一种不使用排序组的解决方案groupby

import itertools

out = pd.concat([df1.sort_values('v1'),
                 df2.sort_values('v2')],
                 ignore_index=True)
Run Code Online (Sandbox Code Playgroud)
# Original answer
# >>> out.reindex(out.groupby('c1', sort=False)
#         .apply(lambda x: x.index)
#         .explode())

# Faster alternative
>>> out.loc[itertools.chain.from_iterable(out.groupby('c1', sort=False)
                                             .groups.values())]
Run Code Online (Sandbox Code Playgroud)
>>> out
   c1   c2  v1  v2
0   B  NaN   2   5
8   B    P   0   1
12  B    T   2   2
13  B    Y   0   2
1   C  NaN   3   5
9   C    Y   1   1
14  C    P   1   2
15  C    T   1   2
2   D  NaN   4   2
5   D    T   2   0
10  D    P   1   1
11  D    Y   1   1
3   A  NaN   9   2
4   A    Y   2   0
6   A    P   4   1
7   A    T   3   1
Run Code Online (Sandbox Code Playgroud)