Concatenate column values in a Pandas DataFrame, replacing "NaN" values with commas

Apa*_*awa 2 python dataframe python-3.x pandas

I am trying to concatenate the columns of a Pandas DataFrame, replacing "NaN" values with commas.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["1", "2", "3", "4", "5", np.nan],
                   'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
                   'col3': ["A", "B", "C", "D", "E", "F"]})

df

  col1 col2 col3
0    1   p1    A
1    2   p2    B
2    3   p1    C
3    4  NaN    D
4    5   p2    E
5  NaN  NaN    F

I need output like this:

  col1 col2 col3      col4
0    1   p1    A  1, p1, A
1    2   p2    B  2, p2, B
2    3   p1    C  3, p1, C
3    4  NaN    D    4, , D
4    5   p2    E  5, p2, E
5  NaN  NaN    F     , , F

Basically, I need every row of col4 to contain the same number of commas.


Thanks in advance for your help.


jez*_*ael 5

Replace the missing values with DataFrame.fillna, then join the values row by row:

df['col4'] = df.fillna('').astype(str).apply(', '.join, axis=1)
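The order of fillna and astype(str) matters for this approach. The following is a minimal sketch on the question's sample data, not a definitive implementation: on many pandas versions astype(str) turns NaN into the literal text 'nan' (this may differ with newer string dtypes), so filling first is the safer order. The names cols, joined and comma_counts are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["1", "2", "3", "4", "5", np.nan],
                   'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
                   'col3': ["A", "B", "C", "D", "E", "F"]})

cols = ['col1', 'col2', 'col3']

# Fill missing values first, then make sure every cell is a string,
# and finally join the cells of each row with ', '.
joined = df[cols].fillna('').astype(str).apply(', '.join, axis=1)

# Each row should contain exactly len(cols) - 1 commas.
comma_counts = joined.str.count(',')

print(joined.tolist())
print(comma_counts.unique())
```

Counting the commas is a quick way to confirm the requirement from the question, namely that every row ends up with the same number of separators.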

Or append ', ' to every value, concatenate with sum, and finally remove the trailing ', ' with Series.str.rstrip:

df['col4'] = df.fillna('').astype(str).add(', ').sum(axis=1).str.rstrip(', ')
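One caveat with this variant: str.rstrip(', ') removes any run of trailing commas and spaces, not just one ', ' suffix. That is harmless for the question's data, where col3 is never missing, but it would change the comma count if the last column could be empty. The sketch below uses hypothetical pre-joined strings (not the question's data) to show the difference against slicing off exactly two characters.

```python
import pandas as pd

# Hypothetical joined strings, each still ending with the extra ', '.
# The second one represents a row whose last two columns were empty.
s = pd.Series(["1, p1, A, ", "2, , , "])

stripped = s.str.rstrip(', ')  # strips ALL trailing commas and spaces
sliced = s.str[:-2]            # drops exactly the last two characters

print(stripped.tolist())  # ['1, p1, A', '2']
print(sliced.tolist())    # ['1, p1, A', '2, , ']
```

If the last column may contain missing values, slicing with .str[:-2] keeps the number of commas consistent.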

Or handle each column separately:

df['col4'] = (df['col1'].fillna('').astype(str) + ', ' +
              df['col2'].fillna('').astype(str) + ', ' +
              df['col3'].astype(str))
print(df)
  col1 col2 col3      col4
0    1   p1    A  1, p1, A
1    2   p2    B  2, p2, B
2    3   p1    C  3, p1, C
3    4  NaN    D    4, , D
4    5   p2    E  5, p2, E
5  NaN  NaN    F     , , F
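As a further option that is not part of the answer above, Series.str.cat accepts a na_rep argument that substitutes a placeholder for missing values while concatenating. This is only a sketch, and it assumes the columns already hold strings (as in the question's sample data), since the .str accessor does not work on numeric columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["1", "2", "3", "4", "5", np.nan],
                   'col2': ["p1", "p2", "p1", np.nan, "p2", np.nan],
                   'col3': ["A", "B", "C", "D", "E", "F"]})

# na_rep='' replaces missing values with an empty string, so the
# separators are still emitted for every column.
df['col4'] = df['col1'].str.cat([df['col2'], df['col3']],
                                sep=', ', na_rep='')

print(df['col4'].tolist())
```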