Is there a Python function similar to R's expand.grid() function? Thanks in advance.
(EDIT) Here is the description of this R function and an example.
Create a Data Frame from All Combinations of Factors
Description:
Create a data frame from all combinations of the supplied vectors
or factors.
> x <- 1:3
> y <- 1:3
> expand.grid(x,y)
Var1 Var2
1 1 1
2 2 1
3 3 1
4 1 2
5 2 2
6 3 2
7 1 3
8 2 3
9 3 3
(EDIT2) Below is an example using the rpy package. I would like to get the same output object, but without using R:
>>> from rpy import *
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> r.assign("a",a)
[1, 2, 3]
>>> r.assign("b",b)
[5, 7, 9]
>>> r("expand.grid(a,b)")
{'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
EDIT 02/09/2012: I'm really lost with Python. The code Lev Levitsky gave in his answer does not work for me:
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 2, in expandgrid
NameError: global name 'itertools' is not defined
However, the itertools module seems to be installed (typing from itertools import * does not return any error message).
Tho*_*wne 26
Just use a list comprehension:
>>> [(x, y) for x in range(5) for y in range(5)]
[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]
If needed, convert to a numpy array:
>>> import numpy as np
>>> x = np.array([(x, y) for x in range(5) for y in range(5)])
>>> x.shape
(25, 2)
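I tested up to 10000 x 10000, and Python's performance is comparable to that of expand.grid in R. Using a tuple (x, y) is about 40% faster than using a list [x, y] in the comprehension.
Or... using np.meshgrid is about 3 times faster and much less memory-intensive.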
%timeit np.array(np.meshgrid(range(10000), range(10000))).reshape(2, 100000000).T
1 loops, best of 3: 736 ms per loop
In R:
> system.time(expand.grid(1:10000, 1:10000))
user system elapsed
1.991 0.416 2.424
Keep in mind that R has 1-based arrays, whereas Python is 0-based.
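To make the row order concrete, here is a small sketch of the np.meshgrid approach on the inputs from the question. The helper name expand_grid_array is made up for illustration. With meshgrid's default indexing, the first column varies fastest, which matches the row order that expand.grid produces in the R example above.

```python
import numpy as np

def expand_grid_array(x, y):
    """Return an array of shape (len(x) * len(y), 2) with all (x, y) pairs."""
    # By default meshgrid returns two 2-D arrays of shape (len(y), len(x)).
    xg, yg = np.meshgrid(x, y)
    # Flatten each grid and place the values side by side as two columns.
    return np.column_stack([xg.ravel(), yg.ravel()])

grid = expand_grid_array([1, 2, 3], [5, 7, 9])
print(grid[:4])
# [[1 5]
#  [2 5]
#  [3 5]
#  [1 7]]
```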
Ale*_*der 18
product from itertools is the key to your solution. It produces the Cartesian product of its inputs.
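from itertools import product
import pandas as pd  # needed for pd.DataFrame below

def expand_grid(dictionary):
    # one row per element of the Cartesian product of the dict's values
    return pd.DataFrame([row for row in product(*dictionary.values())],
                        columns=dictionary.keys())

# On Python 3.7+ dicts keep insertion order, so the key order here
# determines the column order of the result.
dictionary = {'color': ['red', 'green', 'blue'],
              'cylinders': [6, 8],
              'vehicle': ['car', 'van', 'truck']}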
>>> expand_grid(dictionary)
color cylinders vehicle
0 red 6 car
1 red 6 van
2 red 6 truck
3 red 8 car
4 red 8 van
5 red 8 truck
6 green 6 car
7 green 6 van
8 green 6 truck
9 green 8 car
10 green 8 van
11 green 8 truck
12 blue 6 car
13 blue 6 van
14 blue 6 truck
15 blue 8 car
16 blue 8 van
17 blue 8 truck
Lev*_*sky 15
Here is an example that gives output similar to what you need:
import itertools
def expandgrid(*itrs):
    product = list(itertools.product(*itrs))
    return {'Var{}'.format(i+1): [x[i] for x in product] for i in range(len(itrs))}
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}
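The difference comes from the fact that in itertools.product the rightmost element advances on every iteration. If it matters, you can tweak the function by sorting the product list appropriately.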
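One way to get R's ordering (first variable varying fastest) without sorting is to reverse the inputs before calling itertools.product and then flip each tuple back. This is only a sketch; the name expandgrid_r_order is hypothetical.

```python
import itertools

def expandgrid_r_order(*itrs):
    """Like expandgrid, but with the first variable varying fastest."""
    # product() advances its rightmost input fastest, so feed the inputs in
    # reverse order and then flip each resulting tuple back.
    rows = [tup[::-1] for tup in itertools.product(*reversed(itrs))]
    return {'Var{}'.format(i + 1): [row[i] for row in rows]
            for i in range(len(itrs))}

result = expandgrid_r_order([1, 2, 3], [5, 7, 9])
print(result)
# {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
```

For these inputs the result is the same dictionary that the rpy call in the question returns.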
Nat*_*ate 13
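I've wondered about this for a while and wasn't satisfied with the solutions put forward so far, so I came up with my own, which is considerably simpler (but probably slower). The function uses numpy.meshgrid to make the grid, then flattens the grids to 1d arrays and puts them together: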
def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG}) # return a dataframe
For example:
import numpy as np
import pandas as pd
p, q = np.linspace(1, 10, 10), np.linspace(1, 10, 10)
def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG})
print(expand_grid(p, q).head(n = 20))  # print() call so it also runs on Python 3
I know this is an old post, but I thought I'd share my simple version!
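The same meshgrid idea can be extended to more than two inputs. The following is a sketch rather than part of the function above: the name expand_grid_nd and the keyword-argument interface are assumptions made for illustration. It uses indexing="ij", so the last column varies fastest (the itertools.product order, not the R order).

```python
import numpy as np
import pandas as pd

def expand_grid_nd(**columns):
    """Build a DataFrame with one row per combination of the given values."""
    # One N-dimensional grid per input; "ij" keeps the axes in argument order.
    grids = np.meshgrid(*columns.values(), indexing="ij")
    # Flatten every grid to 1d and use the keyword names as column names.
    return pd.DataFrame({name: g.ravel() for name, g in zip(columns, grids)})

df = expand_grid_nd(x=[1, 2], y=[5, 7, 9])
print(df)
```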
Dan*_*ein 10
The pandas documentation defines an expand_grid function:
def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())
For this code to work, you need the following two imports:
import itertools
import pandas as pd
The output is a pandas.DataFrame, which is the object in Python most comparable to an R data.frame.
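If a plain dictionary of lists is wanted instead (the shape of the rpy output shown in the question), the DataFrame can be converted with to_dict(orient='list'). A minimal sketch, reusing the expand_grid function above with the question's inputs; the column names Var1 and Var2 are chosen here only to mirror R's defaults:

```python
import itertools
import pandas as pd

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

df = expand_grid({'Var1': [1, 2, 3], 'Var2': [5, 7, 9]})
# Convert the columns to plain Python lists keyed by column name.
as_dict = df.to_dict(orient='list')
print(as_dict)
# {'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}
```

Note that the rows follow the itertools.product order (last column varies fastest), not the R order.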
小智 8
Based on the solutions above, I did this:
import itertools
import pandas as pd
a = [1,2,3]
b = [4,5,6]
ab = list(itertools.product(a,b))
abdf = pd.DataFrame(ab,columns=("a","b"))
Here is the output:
a b
0 1 4
1 1 5
2 1 6
3 2 4
4 2 5
5 2 6
6 3 4
7 3 5
8 3 6
ParameterGrid from scikit-learn does the same job as expand_grid (from R). Example:
from sklearn.model_selection import ParameterGrid
param_grid = {'a': [1,2,3], 'b': [5,7,9]}
expanded_grid = ParameterGrid(param_grid)
You can access the contents by converting it to a list:
list(expanded_grid)
Output:
[{'a': 1, 'b': 5},
{'a': 1, 'b': 7},
{'a': 1, 'b': 9},
{'a': 2, 'b': 5},
{'a': 2, 'b': 7},
{'a': 2, 'b': 9},
{'a': 3, 'b': 5},
{'a': 3, 'b': 7},
{'a': 3, 'b': 9}]
Accessing an element by index:
list(expanded_grid)[1]
You will get something like this:
{'a': 1, 'b': 7}
Just to add some usage... you can pass a list of dicts like the one printed above to a function with **kwargs. Example:
def f(a,b): return((a+b, a-b))
list(map(lambda x: f(**x), list(expanded_grid)))
Output:
[(6, -4),
(8, -6),
(10, -8),
(7, -3),
(9, -5),
(11, -7),
(8, -2),
(10, -4),
(12, -6)]
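If scikit-learn is not available, a similar list of keyword-argument dicts can be built with the standard library alone. This is a sketch, not ParameterGrid itself; the generator name param_dicts is hypothetical, and it covers only the simple case of a single dict of lists.

```python
import itertools

def param_dicts(param_grid):
    """Yield one dict per combination, e.g. {'a': 1, 'b': 5}."""
    keys = sorted(param_grid)  # sorted so the iteration order is deterministic
    for values in itertools.product(*(param_grid[k] for k in keys)):
        yield dict(zip(keys, values))

def f(a, b):
    return (a + b, a - b)

# Unpack each combination into f as keyword arguments.
results = [f(**kwargs) for kwargs in param_dicts({'a': [1, 2, 3], 'b': [5, 7, 9]})]
print(results)
# [(6, -4), (8, -6), (10, -8), (7, -3), (9, -5), (11, -7), (8, -2), (10, -4), (12, -6)]
```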
Here is another version that returns a pandas.DataFrame:
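import itertools as it
import pandas as pd

def expand_grid(*args, **kwargs):
    columns = []
    lst = []
    if args:
        # positional inputs get integer column names 0, 1, ...
        columns += range(len(args))
        lst += args
    if kwargs:
        # keyword inputs use the keyword as the column name
        columns += kwargs.keys()
        lst += kwargs.values()
    return pd.DataFrame(list(it.product(*lst)), columns=columns)

# Python 3 syntax: print() calls, range/keys()/values() instead of
# xrange/iterkeys()/itervalues()
print(expand_grid([0,1], [1,2,3]))
print(expand_grid(a=[0,1], b=[1,2,3]))
print(expand_grid([0,1], b=[1,2,3]))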