Move every other row to a new column and group by title in pandas (Python)

Jes*_*ica 2 python pandas

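I have a sample dataset that is much smaller than my actual dataset. The real data is a text file that I want to read into a pandas DataFrame and then process: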

import pandas as pd
d = {
     'one': ['title1', 'R2G', 'title2', 'K5G', 'title2','R14G', 'title2','R2T','title3', 'K10C', 'title4', 'W7C', 'title4', 'R2G', 'title5', 'K8C']
    }
df = pd.DataFrame(d)

The sample dataset looks like this:

df
Out[20]: 

      one
0   title1
1      R2G
2   title2
3      K5G
4   title2
5     R14G
6   title2
7      R2T
8   title3
9     K10C
10  title4
11     W7C
12  title4
13     R2G
14  title5
15     K8C

I added a second column called 'value':

df.insert(1,'value','')
df
Out[22]: 
      one      value
0   title1
1      R2G
2   title2
3      K5G
4   title2
5     R14G
6   title2
7      R2T
8   title3
9     K10C
10  title4
11     W7C
12  title4
13     R2G
14  title5
15     K8C

First, I would like to move every other row into the 'value' column:

      one    value
0   title1    R2G          
1   title2    K5G  
2   title2    R14G 
3   title2    R2T    
4   title3    K10C          
5   title4    W7C            
6   title4    R2G           
7   title5    K8C  

Then I would like to group by title name, because the same title can have more than one value:

     one     value
0   title1    R2G          
1   title2    K5G, R14G, R2T   
2   title3    K10C          
3   title4    W7C, R2G
4   title5    K8C  

EdC*_*ica 11

Construct a new df by slicing the column using iloc with a step argument:

In [185]:
new_df = pd.DataFrame({'one':df['one'].iloc[::2].values, 'value':df['one'].iloc[1::2].values})
new_df

Out[185]:
      one value
0  title1   R2G
1  title2   K5G
2  title2  R14G
3  title2   R2T
4  title3  K10C
5  title4   W7C
6  title4   R2G
7  title5   K8C
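The .values calls in the constructor above matter. A minimal sketch, assuming a shortened version of the question's df, of what happens without them: the two slices keep their original index labels (0, 2, 4 and 1, 3, 5), so pandas aligns on those labels and fills the gaps with NaN. The variable names titles, values, misaligned and aligned are only illustrative.

```python
import pandas as pd

# Shortened version of the question's data (three title/value pairs).
df = pd.DataFrame({'one': ['title1', 'R2G', 'title2', 'K5G', 'title2', 'R14G']})

titles = df['one'].iloc[::2]    # keeps index labels 0, 2, 4
values = df['one'].iloc[1::2]   # keeps index labels 1, 3, 5

# Without .values the two Series are aligned on their index labels,
# so no row ends up with both a title and a value.
misaligned = pd.DataFrame({'one': titles, 'value': values})

# With .values only the position matters, which is what we want here.
aligned = pd.DataFrame({'one': titles.values, 'value': values.values})

print(misaligned)
print(aligned)
```

Calling reset_index(drop=True) on each slice would be another way to get the same positional pairing.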

Then you can groupby on 'one' and apply ','.join to the 'value' column to concatenate the values of each group:

In [188]:
new_df.groupby('one')['value'].apply(','.join).reset_index()

Out[188]:
      one         value
0  title1           R2G
1  title2  K5G,R14G,R2T
2  title3          K10C
3  title4       W7C,R2G
4  title5           K8C
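Two variations on the groupby step, offered as a sketch rather than as part of the original answer. The first passes sort=False so that groups appear in order of first occurrence instead of sorted key order (the two orders happen to coincide for this data). The second collects each group's values into a Python list instead of a joined string, which may be easier to work with later. The names joined and as_lists are illustrative.

```python
import pandas as pd

# The paired frame produced by the slicing step.
new_df = pd.DataFrame({
    'one': ['title1', 'title2', 'title2', 'title2',
            'title3', 'title4', 'title4', 'title5'],
    'value': ['R2G', 'K5G', 'R14G', 'R2T', 'K10C', 'W7C', 'R2G', 'K8C'],
})

# sort=False keeps groups in order of first appearance.
joined = (new_df.groupby('one', sort=False)['value']
                .agg(', '.join)
                .reset_index())

# Keep the values of each title as a list rather than one string.
as_lists = (new_df.groupby('one', sort=False)['value']
                  .agg(list)
                  .reset_index())

print(joined)
print(as_lists)
```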


hil*_*lem 5

Alternatively, you can reshape the column into two columns and then aggregate by joining each group's values into a string.

import pandas as pd
d = {
     'one': ['title1', 'R2G', 'title2', 'K5G', 'title2','R14G', 'title2','R2T','title3', 'K10C', 'title4', 'W7C', 'title4', 'R2G', 'title5', 'K8C']
    }
df = pd.DataFrame(d)
# because the rows follow a simple alternating pattern, you can just reshape
df = pd.DataFrame(df.values.reshape(-1, 2), columns=['one', 'value'])
# group by 'one' and aggregate each group's values by joining them into a string
df = df.groupby('one')['value'].apply(', '.join).reset_index()
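The reshape approach relies on the column holding an even number of rows in strict title/value order. A minimal sketch of the same idea wrapped in a function with an explicit length check follows; pair_rows is a hypothetical helper name, not a pandas function. NumPy would raise its own error on an odd length, so the check only makes the message clearer.

```python
import pandas as pd

def pair_rows(series):
    """Hypothetical helper: turn alternating title/value rows into two columns."""
    if len(series) % 2 != 0:
        raise ValueError("expected an even number of rows (title/value pairs)")
    # Each consecutive pair of entries becomes one (title, value) row.
    pairs = series.to_numpy().reshape(-1, 2)
    return pd.DataFrame(pairs, columns=['one', 'value'])

df = pd.DataFrame({
    'one': ['title1', 'R2G', 'title2', 'K5G', 'title2', 'R14G',
            'title2', 'R2T', 'title3', 'K10C', 'title4', 'W7C',
            'title4', 'R2G', 'title5', 'K8C']
})

paired = pair_rows(df['one'])
result = paired.groupby('one')['value'].agg(', '.join).reset_index()
print(result)
```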