Eli*_* L. · 5 · tags: python, dataframe, pandas
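I need to split words on the '/' character and transform them as follows.
This DataFrame contains some kids and their Easter presents. Some kids got two presents, while others got only one.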
import pandas as pd

data = {'Presents': ['Pink Doll / Ball', 'Bear/ Ball', 'Barbie', 'Blue Sunglasses/Airplane', 'Orange Kitchen/Car', 'Bear/Doll', 'Purple Game'],
        'Kids': ['Chris', 'Jane', 'Betty', 'Harry', 'Claire', 'Sofia', 'Alex']}
df = pd.DataFrame(data, columns=['Presents', 'Kids'])
print(df)
The DataFrame looks like this:
                   Presents    Kids
0          Pink Doll / Ball   Chris
1                Bear/ Ball    Jane
2                    Barbie   Betty
3  Blue Sunglasses/Airplane   Harry
4        Orange Kitchen/Car  Claire
5                 Bear/Doll   Sofia
6               Purple Game    Alex
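I'm trying to split their presents and transform them like this, keeping the associated color: 'Pink Doll/Ball' should be split into two parts, 'Pink Doll' and 'Pink Ball'. In addition, each resulting present must stay associated with the same kid.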
The colors and presents can be anything; all we know is that the structure is always 'Color Present1/Present2', 'Color Present', or just 'Present'. So the final DataFrame should look like this:
           Presents    Kids
0         Pink Doll   Chris
1         Pink Ball   Chris
2              Bear    Jane
3              Ball    Jane
4            Barbie   Betty
5   Blue Sunglasses   Harry
6     Blue Airplane   Harry
7    Orange Kitchen  Claire
8        Orange Car  Claire
9              Bear   Sofia
10             Doll   Sofia
11      Purple Game    Alex
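The three shapes listed above can be expressed as a small per-string function. This is a minimal sketch, not the asker's code: split_present is a hypothetical helper, and it assumes (as in every example here) that a color is exactly one word and only appears when the part before the '/' has more than one word. A multi-word color such as 'Light Blue' would need a different rule.

```python
def split_present(text: str) -> list:
    """Split 'Color A/B' into ['Color A', 'Color B'], keeping the color.

    Hypothetical helper; assumes the color, when present, is the single
    first word of the part before '/'.
    """
    # Split on '/' and trim stray spaces such as those in 'Pink Doll / Ball'
    parts = [p.strip() for p in text.split('/')]
    first, rest = parts[0], parts[1:]
    words = first.split()
    # A multi-word first part starts with a one-word color (assumption)
    color = words[0] + ' ' if len(words) > 1 else ''
    # Entries without '/' have an empty rest and come back unchanged
    return [first] + [color + p for p in rest]


print(split_present('Pink Doll / Ball'))  # ['Pink Doll', 'Pink Ball']
print(split_present('Bear/ Ball'))        # ['Bear', 'Ball']
print(split_present('Purple Game'))       # ['Purple Game']
```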
My first approach was to convert the columns to lists and work with the lists, like this:
# Work on plain Python lists taken from the two columns
coloured_presents_list = df['Presents'].tolist()
kids_list = df['Kids'].tolist()

def count_total_words(string):
    # Number of words = number of spaces + 1
    total = 1
    for i in range(len(string)):
        if (string[i] == ' '):
            total = total + 1
    return total

coloured_presents_to_remove_list = []
index_with_slash_list = []
first_present = ''
second_present = ''
index_with_slash = -1
refactored_second_present = ''
for coloured_present in coloured_presents_list:
    if (coloured_present.find('/') >= 0):
        # Remember where the combined entry was so it can be removed later
        index_with_slash = coloured_presents_list.index(coloured_present)
        index_with_slash_list.append(index_with_slash)
        first_present, second_present = coloured_present.split('/')
        coloured_presents_to_remove_list.append(coloured_present)
        if count_total_words(first_present) == 2:
            # 'Color Present1': prefix Present2 with the color
            refactored_second_present = first_present.split(' ', 1)[0] + ' ' + second_present
            second_present = refactored_second_present
        # Append both parts, then insert the kid at the index where each new present is found
        coloured_presents_list.append(first_present)
        coloured_presents_list.append(second_present)
        kids_list.insert(coloured_presents_list.index(first_present), kids_list[index_with_slash])
        kids_list.insert(coloured_presents_list.index(second_present), kids_list[index_with_slash])
# Drop the original 'A/B' entries and their kids
for present in coloured_presents_to_remove_list:
    coloured_presents_list.remove(present)
for index in index_with_slash_list:
    kids_list.pop(index)
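The index bookkeeping in the loop above can be avoided entirely by never modifying the lists being iterated over and building new ones instead. A minimal sketch of that idea, under the same one-word-color assumption; expand_lists is a hypothetical name, and its arguments stand in for coloured_presents_list and kids_list.

```python
def expand_lists(presents, kids):
    """Return new (presents, kids) lists with every 'A/B' entry expanded.

    Hypothetical helper; the input lists are never modified, so no index
    can shift while looping. Assumes a one-word color.
    """
    new_presents, new_kids = [], []
    for present, kid in zip(presents, kids):
        parts = [p.strip() for p in present.split('/')]
        words = parts[0].split()
        # A multi-word first part starts with a one-word color (assumption)
        color = words[0] + ' ' if len(words) > 1 else ''
        expanded = [parts[0]] + [color + p for p in parts[1:]]
        new_presents.extend(expanded)
        # Repeat the kid once per resulting present to keep the pairing
        new_kids.extend([kid] * len(expanded))
    return new_presents, new_kids


new_presents, new_kids = expand_lists(['Pink Doll / Ball', 'Bear/ Ball', 'Barbie'],
                                      ['Chris', 'Jane', 'Betty'])
print(new_presents)  # ['Pink Doll', 'Pink Ball', 'Bear', 'Ball', 'Barbie']
print(new_kids)      # ['Chris', 'Chris', 'Jane', 'Jane', 'Betty']
```

Because each present is paired with its kid through zip and the kid is repeated once per expanded present, the two output lists stay aligned by construction.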
However, I realized that at some point I could lose track of some indices, so I tried using pandas on the DataFrame instead:
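mask = df['Presents'].str.contains('/', na=False, regex=False)
# A Series has no .split(); use the .str accessor, with expand=True to get one column per part
parts = df.loc[mask, 'Presents'].str.split('/', n=1, expand=True)
df['First Present'], df['Second Present'] = parts[0], parts[1]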
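Instead of splitting into two columns, one way to keep each present on its own row is to turn every cell into a list of presents and let DataFrame.explode emit one row per list element, repeating the other columns (here Kids). This is a sketch under the same one-word-color assumption; split_present is a hypothetical helper, and explode's ignore_index argument needs pandas 1.1 or newer.

```python
import pandas as pd


def split_present(text):
    # Hypothetical helper: 'Color A/B' -> ['Color A', 'Color B'];
    # assumes the color, when present, is a single word
    parts = [p.strip() for p in text.split('/')]
    words = parts[0].split()
    color = words[0] + ' ' if len(words) > 1 else ''
    return [parts[0]] + [color + p for p in parts[1:]]


data = {'Presents': ['Pink Doll / Ball', 'Bear/ Ball', 'Barbie',
                     'Blue Sunglasses/Airplane', 'Orange Kitchen/Car',
                     'Bear/Doll', 'Purple Game'],
        'Kids': ['Chris', 'Jane', 'Betty', 'Harry', 'Claire', 'Sofia', 'Alex']}
df = pd.DataFrame(data)

# One list per row, then one row per list element; Kids is repeated,
# and ignore_index=True renumbers the rows 0..n-1
result = (df.assign(Presents=df['Presents'].map(split_present))
            .explode('Presents', ignore_index=True))
print(result)
```

Since the kid never leaves its row until explode copies it, there is no separate index to keep in sync.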
Try this:
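import numpy as np
import pandas as pd

s = df['Presents'].str.split('/')
a, b = s.str[0].str.strip(), s.str[-1].str.strip()   # first and last part of each entry
c = a.str.count(' ').gt(0) & s.str.len().ge(2)       # has a '/' and a multi-word first part
arr = np.where(c, b.radd(a.str.split().str[0].str.strip() + ' '), b)  # prefix the color to the 2nd part
out = (pd.concat((a, pd.Series(arr, index=s.index, name=s.name)))
       .sort_index(kind='mergesort')  # stable sort keeps each kid's first present first
       .to_frame().join(df[['Kids']]))
# Entries without '/' appear twice (first part == last part); drop the copies and renumber
result = out.drop_duplicates().reset_index(drop=True)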
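Before duplicates are dropped, 'out' looks like this (entries without a '/', such as Betty's and Alex's, appear twice, and drop_duplicates() removes the extra copy):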
          Presents    Kids
0        Pink Doll   Chris
0        Pink Ball   Chris
1             Bear    Jane
1             Ball    Jane
2           Barbie   Betty
2           Barbie   Betty
3  Blue Sunglasses   Harry
3    Blue Airplane   Harry
4   Orange Kitchen  Claire
4       Orange Car  Claire
5             Bear   Sofia
5             Doll   Sofia
6      Purple Game    Alex
6      Purple Game    Alex
Happy coding!