Related questions (0)

Cartesian product in pandas

I have two pandas DataFrames:

from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})


What is the best practice for getting their Cartesian product (without writing it out explicitly as I do here, of course)?

# Cartesian product of df1 and df2, written out by hand
df_cartesian = DataFrame({'col1':[1,2,1,2],'col2':[3,4,3,4],'col3':[5,5,6,6]})

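One possible approach, offered as a minimal sketch rather than the accepted answer: give both frames a constant helper column, merge on it, then drop it. The helper name `cartesian_product` and the temporary column `_key` are assumptions made for illustration. On pandas 1.2 or later, `df1.merge(df2, how="cross")` should give the same rows without the helper column. Note that the row order may differ from the hand-written `df_cartesian` in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
df2 = pd.DataFrame({"col3": [5, 6]})


def cartesian_product(left, right):
    """Pair every row of `left` with every row of `right`.

    `_key` is a hypothetical temporary column name; it must not clash
    with an existing column in either frame.
    """
    return (
        left.assign(_key=1)                      # same constant key on every row
        .merge(right.assign(_key=1), on="_key")  # every left row matches every right row
        .drop(columns="_key")                    # remove the helper column again
    )


df_cartesian = cartesian_product(df1, df2)
print(df_cartesian)
```

The result has `len(df1) * len(df2)` rows, so this is only practical when both frames are small enough for the product to fit in memory.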

python pandas

Ido*_*dok

2012-11-07

84
Votes

7
Answers

70k
Views

Establishing many-to-many relationships across millions of records

I have the following information in a Python sqlite3 database, which holds about 4 million records.

Term         No of articles      Article Ids
Obama           300            [411,523,534, …. 846]
Gandhi         3900            [23,32,33…..4578]
Mandela        3900            [21,14,56,145 …4536]
George Bush     450            [230,310 … 700]
Tony Blair      350            [225,320 … 800]
Justin Bieber   25             [401 , 420, 690 …. 904]
Lionel Messi    150            [23, 78, …… 570]


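'Article Ids' is a blob containing a list of ids (as returned by the API).

My task is to find the ids that each pair of terms has in common and save them in 'relationships.db'.

How can I build these relationships, for example to find which articles talk about both Gandhi and Mandela (the intersection of their article ids)?

relationships.db should look like this: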

Term 1              Term 2          No of Common Article Ids    Common Article IDS
Obama               Gandhi                17                    [34,123,25 ...]
Obama               Mandela               43                    [145,111,234,456 ....]
Obama               George Bush           46
Obama …

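The pairwise intersection described above could be sketched as follows. This is an illustration under stated assumptions, not a tested solution for the real data: the ids in `term_articles` are made up, the table and column names (`relationships`, `term1`, `term2`, `n_common`, `common_ids`) are hypothetical, the common ids are stored as JSON text because the actual blob format is not shown, and an in-memory database stands in for 'relationships.db'.

```python
import json
import sqlite3
from itertools import combinations

# Hypothetical sample data; the real id lists would be read from the source database.
term_articles = {
    "Obama": [1, 2, 3, 4],
    "Gandhi": [3, 4, 5],
    "Mandela": [4, 5, 6],
}


def common_articles(term_articles):
    """Yield (term1, term2, sorted common ids) for every pair of terms that shares ids."""
    id_sets = {term: set(ids) for term, ids in term_articles.items()}
    for a, b in combinations(sorted(id_sets), 2):
        shared = id_sets[a] & id_sets[b]  # set intersection
        if shared:
            yield a, b, sorted(shared)


conn = sqlite3.connect(":memory:")  # stand-in for 'relationships.db'
conn.execute(
    "CREATE TABLE relationships ("
    "term1 TEXT, term2 TEXT, n_common INTEGER, common_ids TEXT)"
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?, ?, ?)",
    (
        (a, b, len(ids), json.dumps(ids))
        for a, b, ids in common_articles(term_articles)
    ),
)
conn.commit()

rows = conn.execute(
    "SELECT term1, term2, n_common, common_ids "
    "FROM relationships ORDER BY term1, term2"
).fetchall()
for row in rows:
    print(row)
```

Comparing every pair of terms grows quadratically with the number of terms, so for millions of records it may be worth building an index from article id to terms first and only pairing terms that share at least one article.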

python sqlite data-structures

ric*_*hie

2014-07-24

5
Votes

1
Answers

431
Views