How can I split a string on '/' and rebuild it from the split substrings in a DataFrame?

Eli*_* L. 5 python dataframe pandas

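I need to split words on the character '/' and rebuild them in this way:

This DataFrame contains some kids and their Easter presents. Some kids have two presents, while others have only one.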

import pandas as pd

data = {'Presents': ['Pink Doll / Ball', 'Bear/ Ball', 'Barbie', 'Blue Sunglasses/Airplane', 'Orange Kitchen/Car', 'Bear/Doll', 'Purple Game'],
        'Kids':     ['Chris', 'Jane', 'Betty', 'Harry', 'Claire', 'Sofia', 'Alex']
        }

df = pd.DataFrame(data, columns=['Presents', 'Kids'])

print(df)

The DataFrame looks like this:

                   Presents    Kids
0          Pink Doll / Ball   Chris
1                Bear/ Ball    Jane
2                    Barbie   Betty
3  Blue Sunglasses/Airplane   Harry
4        Orange Kitchen/Car  Claire
5                 Bear/Doll   Sofia
6               Purple Game    Alex

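I am trying to separate their presents and rebuild them this way, keeping the associated color:

'Pink Doll/Ball' would be split into two parts: 'Pink Doll' and 'Pink Ball'. In addition, each kid must stay associated with their presents.

The colors and presents can be anything; all we know is that the structure is Color Present1/Present2, Color Present, or just Present. So in the end it should be: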

  • For Color Present1/Present2 --> Color Present1 and Color Present2
  • For Color Present --> Color Present
  • For Present --> Present
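The rules above can be sketched as a small pure-Python helper. This is only an illustration: `split_present` is a hypothetical name, and it assumes the color is the first word of the part before the '/', and only when that part has more than one word (so 'Bear/ Ball' gets no prefix).

```python
def split_present(text):
    """Split 'Color A/B' into ['Color A', 'Color B']; other strings stay whole."""
    # Split on '/' and trim the whitespace around each piece.
    parts = [part.strip() for part in text.split('/')]
    if len(parts) == 1:
        # No '/': either 'Color Present' or just 'Present'.
        return parts
    first_words = parts[0].split()
    if len(first_words) < 2:
        # 'Present1/Present2' with no color to carry over.
        return parts
    # Assumption: the first word of the first piece is the color.
    color = first_words[0]
    return [parts[0]] + [f"{color} {part}" for part in parts[1:]]


for present in ['Pink Doll / Ball', 'Bear/ Ball', 'Barbie', 'Purple Game']:
    print(present, '->', split_present(present))
```

Returning a list for every input, including the single-present case, keeps the caller simple: each kid can be paired with every element of the list without special cases.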

So the final DataFrame should look like this:

           Presents    Kids
0         Pink Doll   Chris
1         Pink Ball   Chris
2              Bear    Jane
3              Ball    Jane
4            Barbie   Betty
5   Blue Sunglasses   Harry
6     Blue Airplane   Harry
7    Orange Kitchen  Claire
8        Orange Car  Claire
9              Bear   Sofia
10             Doll   Sofia
11      Purple Game    Alex

My first approach was to convert the columns to lists and work with the lists, like this:

def count_total_words(string):
    total = 1
    for i in range(len(string)):
        if (string[i] == ' '):
            total = total + 1
    return total

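# Assumed setup (not shown in the original post): the two columns as plain lists.
coloured_presents_list = df['Presents'].tolist()
kids_list = df['Kids'].tolist()

coloured_presents_to_remove_list = []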
index_with_slash_list = []
first_present = ''
second_present= ''
index_with_slash = -1
refactored_second_present = ''
for coloured_present in coloured_presents_list:
    if (coloured_present.find('/') >= 0):
        index_with_slash = coloured_presents_list.index(coloured_present)
        index_with_slash_list.append(index_with_slash)
        first_present, second_present = coloured_present.split('/')
        coloured_presents_to_remove_list.append(coloured_present)
        if count_total_words(first_present) == 2:
            refactored_second_present = first_present.split(' ', 1)[0] + ' ' + second_present
            second_present = refactored_second_present
        coloured_presents_list.append(first_present)
        coloured_presents_list.append(second_present)
        kids_list.insert(coloured_presents_list.index(first_present), kids_list[index_with_slash])
        kids_list.insert(coloured_presents_list.index(second_present), kids_list[index_with_slash])
        
for present in coloured_presents_to_remove_list:
    coloured_presents_list.remove(present)

for index in index_with_slash_list:
    kids_list.pop(index)

However, I realized that I might lose track of some indexes along the way, so I tried using pandas on the DataFrame instead.

mask = df['Presents'].str.contains('/', na=False, regex=False)
df['First Present'], df['Second Present'] = df.loc[mask, 'Presents'].split('/')
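A note on the attempt above: `df.loc[mask, 'Presents']` is a Series, and a Series has no `.split` method of its own; the string methods live under the `.str` accessor. A minimal sketch of how the two-column split could be written, using a two-row sample frame as a stand-in (the `pieces` name is made up for illustration):

```python
import pandas as pd

# Small stand-in for the question's DataFrame.
df = pd.DataFrame({
    'Presents': ['Pink Doll / Ball', 'Barbie'],
    'Kids': ['Chris', 'Betty'],
})

mask = df['Presents'].str.contains('/', na=False, regex=False)

# n=1 caps the split at two pieces; expand=True returns one column per piece.
pieces = df.loc[mask, 'Presents'].str.split('/', n=1, expand=True)

# Assignment aligns on the index, so rows without a '/' are left as NaN.
df['First Present'] = pieces[0].str.strip()
df['Second Present'] = pieces[1].str.strip()

print(df)
```

This only separates the two presents; carrying the color over to the second present still needs an extra step.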

Oli*_*ver 1

Try this:

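import numpy as np
import pandas as pd

s = df['Presents'].str.split('/')
# a: text before the first '/', b: text after the last '/' (identical when there is no '/')
a, b = s.str[0].str.strip(), s.str[-1].str.strip()
# c: rows that contain a '/' and whose first part has more than one word
c = a.str.count(' ').gt(0) & s.str.len().ge(2)
# Where c holds, prefix b with the first word of a (the color)
arr = np.where(c, b.radd(a.str.split().str[0].str.strip() + ' '), b)
# A stable sort keeps each row's first present ahead of its second
out = (pd.concat((a, pd.Series(arr, index=s.index, name=s.name)))
       .sort_index(kind='mergesort').to_frame().join(df[['Kids']]))
out.drop_duplicates()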

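Before the final drop_duplicates call, out looks like this (rows without a '/' appear twice, which is what drop_duplicates removes):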

          Presents    Kids
0        Pink Doll   Chris
0        Pink Ball   Chris
1             Bear    Jane
1             Ball    Jane
2           Barbie   Betty
2           Barbie   Betty
3  Blue Sunglasses   Harry
3    Blue Airplane   Harry
4   Orange Kitchen  Claire
4       Orange Car  Claire
5             Bear   Sofia
5             Doll   Sofia
6      Purple Game    Alex
6      Purple Game    Alex
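To get from a frame like this to the 0..11 index the question asks for, one option is to drop the repeated rows and then renumber. The sketch below uses a four-row stand-in copied from rows 1 and 2 of the table above rather than the full result; `tidy` is a name chosen for illustration.

```python
import pandas as pd

# Stand-in for `out`: rows 1 and 2 of the table above, with their repeated index labels.
out = pd.DataFrame(
    {'Presents': ['Bear', 'Ball', 'Barbie', 'Barbie'],
     'Kids': ['Jane', 'Jane', 'Betty', 'Betty']},
    index=[1, 1, 2, 2],
)

# drop_duplicates compares column values only (not the index);
# reset_index(drop=True) then renumbers the rows from 0.
tidy = out.drop_duplicates().reset_index(drop=True)

print(tidy)
```

One caveat: drop_duplicates cannot tell a repeated single present from a kid who genuinely has the same present twice (for example 'Ball/Ball'), so both would collapse to one row.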

Happy coding!