Yos*_*nti 2 python grouping pandas
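Group words together if their pairwise mapping score exceeds 0.5. If any other keyword has a score above 0.5 with a word already in the group, add that keyword to the group as well.
Example:
Input: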
word1         word2        score
hello         hello world  0.75
hello world   hi world     0.555
hello         hi world     0
good morning  hello        0
good morning  morning      0.75
morning       hello        0
morning       hello world  0
morning       hi world     0
good morning  hello world  0
good morning  hi world     0
Output:
word          group
hello         1
hello world   1
hi world      1
good morning  2
morning       2
First, filter the rows with boolean indexing and Series.gt, keeping only pairs whose score is above 0.5:
df1 = df[df['score'].gt(0.5)]
print(df1)
          word1        word2  score
0         hello  hello world  0.750
1   hello world     hi world  0.555
4  good morning      morning  0.750
Then use networkx with connected_components to build a dictionary mapping each word to its group id:
import networkx as nx

# Create the graph from the dataframe
g = nx.Graph()
g.add_edges_from(df1[['word1','word2']].itertuples(index=False))
connected_components = nx.connected_components(g)

# Find the component id of the nodes
node2id = {}
for cid, component in enumerate(connected_components):
    for node in component:
        node2id[node] = cid + 1
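If networkx is not available, the same word-to-group dictionary can be sketched with a small union-find in plain Python. This is an alternative technique, not the approach used above, and the helper name `group_pairs` is hypothetical; it assumes the pairs have already been filtered to those scoring above 0.5.

```python
def group_pairs(pairs):
    """Map each word to a 1-based group id; words linked by any pair share an id."""
    parent = {}

    def find(x):
        # Register unseen words as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    # Merge the two groups of every pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Number the groups in order of first appearance (dicts keep insertion order).
    group_ids = {}
    node2id = {}
    for word in parent:
        root = find(word)
        node2id[word] = group_ids.setdefault(root, len(group_ids) + 1)
    return node2id


# The three pairs that score above 0.5 in the example input.
pairs = [
    ("hello", "hello world"),
    ("hello world", "hi world"),
    ("good morning", "morning"),
]
node2id = group_pairs(pairs)
print(node2id)
```

With a DataFrame, the pairs could be taken from `df1[['word1','word2']].itertuples(index=False, name=None)`.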
Finally, reshape with DataFrame.stack, remove duplicates with Series.drop_duplicates, and create the new column with Series.map:
df2 = df1[['word1','word2']].stack().drop_duplicates().reset_index(drop=True).to_frame('word')
df2['group'] = df2['word'].map(node2id)
print(df2)
           word  group
0         hello      1
1   hello world      1
2      hi world      1
3  good morning      2
4       morning      2
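For reference, the three steps can be combined into one runnable sketch. The function name `group_words` and the `threshold` parameter are hypothetical, and the DataFrame is rebuilt by hand from the example input in the question.

```python
import networkx as nx
import pandas as pd


def group_words(df, threshold=0.5):
    """Return one row per word with the id of its connected group."""
    # Step 1: keep only the pairs whose score is above the threshold.
    strong = df[df["score"].gt(threshold)]

    # Step 2: build a graph from those pairs and label each connected component.
    g = nx.Graph()
    g.add_edges_from(strong[["word1", "word2"]].itertuples(index=False, name=None))
    node2id = {}
    for cid, component in enumerate(nx.connected_components(g), start=1):
        for node in component:
            node2id[node] = cid

    # Step 3: one row per distinct word, then map each word to its group id.
    out = (
        strong[["word1", "word2"]]
        .stack()
        .drop_duplicates()
        .reset_index(drop=True)
        .to_frame("word")
    )
    out["group"] = out["word"].map(node2id)
    return out


# Example input from the question.
df = pd.DataFrame(
    {
        "word1": ["hello", "hello world", "hello", "good morning", "good morning",
                  "morning", "morning", "morning", "good morning", "good morning"],
        "word2": ["hello world", "hi world", "hi world", "hello", "morning",
                  "hello", "hello world", "hi world", "hello world", "hi world"],
        "score": [0.75, 0.555, 0, 0, 0.75, 0, 0, 0, 0, 0],
    }
)
result = group_words(df)
print(result)
```

Note that a word whose scores are all at or below the threshold never enters the filtered frame, so it would be missing from the result rather than placed in a group of its own.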