Using R's expand.grid() function in Python

Sté*_*ent 37 python r

Is there a Python function similar to R's expand.grid() function? Thanks in advance.

(Edit) Here is the description and an example of this R function.

Create a Data Frame from All Combinations of Factors

Description:

     Create a data frame from all combinations of the supplied vectors
     or factors.  

> x <- 1:3
> y <- 1:3
> expand.grid(x,y)
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    1    2
5    2    2
6    3    2
7    1    3
8    2    3
9    3    3

(EDIT2) Below is an example using the rpy package. I would like to get the same output object, but without using R:

>>> from rpy import *
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> r.assign("a",a)
[1, 2, 3]
>>> r.assign("b",b)
[5, 7, 9]
>>> r("expand.grid(a,b)")
{'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}

Edit 02/09/2012: I'm really lost with Python. The code given by Lev Levitsky in his answer doesn't work for me:

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in expandgrid
NameError: global name 'itertools' is not defined

However, the itertools module appears to be installed (typing from itertools import * does not return any error message).

Tho*_*wne 26

Just use a list comprehension:

>>> [(x, y) for x in range(5) for y in range(5)]

[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]

Convert to a numpy array if needed:

>>> import numpy as np
>>> x = np.array([(x, y) for x in range(5) for y in range(5)])
>>> x.shape
(25, 2)

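I tested up to 10000 x 10000, and Python's performance is comparable to that of expand.grid in R. Using a tuple (x, y) is about 40% faster than using a list [x, y] in the comprehension.

Or...

Using np.meshgrid is about 3 times faster and much less memory-intensive.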

%timeit np.array(np.meshgrid(range(10000), range(10000))).reshape(2, 100000000).T
1 loops, best of 3: 736 ms per loop

In R:

> system.time(expand.grid(1:10000, 1:10000))
   user  system elapsed 
  1.991   0.416   2.424 

Keep in mind that R has 1-based arrays, whereas Python is 0-based.

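  • I use R's `expand.grid` for more complex interactions. I like this answer, but it becomes unwieldy for more combinations. Is there a way to keep the gist of it but abstract it to an arbitrary number of inputs? (3 upvotes)
  • One of the best answers so far: Pythonic, fast, and no need to define a custom function! (2 upvotes)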
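
For the arbitrary-number-of-inputs case raised in the comment above, one possible sketch uses itertools.product, which yields the same tuples as the nested comprehension. The helper name all_combinations is hypothetical and not part of the original answer:

```python
from itertools import product

def all_combinations(*iterables):
    """Return every combination of the inputs as a list of tuples."""
    # product(a, b, c) is equivalent to the nested comprehension
    # [(x, y, z) for x in a for y in b for z in c].
    return list(product(*iterables))

pairs = all_combinations(range(5), range(5))
triples = all_combinations([1, 2], "ab", [True, False])
print(len(pairs), len(triples))  # prints "25 8"
```

As with the comprehension, the rightmost input varies fastest, so the row order differs from R's expand.grid.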

Ale*_*der 18

product from itertools is the key to your solution. It produces the Cartesian product of its inputs.

from itertools import product
import pandas as pd

def expand_grid(dictionary):
   return pd.DataFrame([row for row in product(*dictionary.values())], 
                       columns=dictionary.keys())

dictionary = {'color': ['red', 'green', 'blue'], 
              'vehicle': ['car', 'van', 'truck'], 
              'cylinders': [6, 8]}

>>> expand_grid(dictionary)
    color  cylinders vehicle
0     red          6     car
1     red          6     van
2     red          6   truck
3     red          8     car
4     red          8     van
5     red          8   truck
6   green          6     car
7   green          6     van
8   green          6   truck
9   green          8     car
10  green          8     van
11  green          8   truck
12   blue          6     car
13   blue          6     van
14   blue          6   truck
15   blue          8     car
16   blue          8     van
17   blue          8   truck


Lev*_*sky 15

Here is an example that gives output similar to what you need:

import itertools
def expandgrid(*itrs):
   product = list(itertools.product(*itrs))
   return {'Var{}'.format(i+1):[x[i] for x in product] for i in range(len(itrs))}

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}

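The difference comes from the fact that in itertools.product the rightmost element advances on every iteration. If it matters, you can tweak the function by cleverly sorting the product list.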
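
One way to get R's ordering without sorting is sketched below: reverse the inputs before calling itertools.product, then reverse each resulting tuple. The name expandgrid_r_order is hypothetical. Note that the function refers to itertools.product, so it needs import itertools in the same session; from itertools import * does not bind the name itertools, which is what produces the NameError shown in the question.

```python
import itertools

def expandgrid_r_order(*itrs):
    """Like expandgrid above, but with the first input varying fastest."""
    # Reverse the inputs so that product()'s fast-moving rightmost slot
    # corresponds to the first input, then undo the reversal per row.
    rows = [combo[::-1] for combo in itertools.product(*reversed(itrs))]
    return {"Var{}".format(i + 1): [row[i] for row in rows]
            for i in range(len(itrs))}

result = expandgrid_r_order([1, 2, 3], [5, 7, 9])
print(result)
# {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
```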


Nat*_*ate 13

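I've wondered about this for a while and wasn't satisfied with the solutions put forward so far, so I came up with my own, which is fairly simple (but probably slower). The function uses numpy.meshgrid to build the grid, then flattens the grids into 1d arrays and puts them together: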

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG}) # return a dataframe

For example:

import numpy as np
import pandas as pd

p, q = np.linspace(1, 10, 10), np.linspace(1, 10, 10)

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG})

print(expand_grid(p, q).head(n=20))

I know this is an old post, but I thought I'd share my simple version!

  • For an arbitrary number of arguments: `def expand_grid(*args): mesh = np.meshgrid(*args); return pd.DataFrame(m.flatten() for m in mesh)` (2 upvotes)
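
A sketch of a column-oriented variant of the same meshgrid idea follows, assuming named inputs passed as keyword arguments (a convention chosen here for illustration, not taken from the answer). It uses indexing="ij" plus column-major flattening so that the first input varies fastest for any number of inputs, as in R:

```python
import numpy as np
import pandas as pd

def expand_grid(**columns):
    """Build a DataFrame of all combinations; the first column varies fastest."""
    # indexing="ij" keeps one array axis per input, in input order.
    grids = np.meshgrid(*columns.values(), indexing="ij")
    # Column-major (Fortran) flattening makes the first axis change fastest,
    # which mirrors the row order of R's expand.grid.
    return pd.DataFrame(
        {name: grid.ravel(order="F") for name, grid in zip(columns, grids)}
    )

df = expand_grid(x=[1, 2, 3], y=[5, 7, 9], z=[0, 1])
print(df.head())
```

With the default indexing="xy", the row order matches R only for two inputs, which is why the sketch switches to "ij".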

Dan*_*ein 10

The pandas documentation defines an expand_grid function:

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

For this code to work, you need the following two imports:

import itertools
import pandas as pd

The output is a pandas.DataFrame, the Python object most similar to an R data.frame.


小智 8

Based on the solutions above, I did this:

import itertools
import pandas as pd

a = [1,2,3]
b = [4,5,6]
ab = list(itertools.product(a,b))
abdf = pd.DataFrame(ab,columns=("a","b"))

Here is the output:

    a   b
0   1   4
1   1   5
2   1   6
3   2   4
4   2   5
5   2   6
6   3   4
7   3   5
8   3   6
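
As a possible alternative that skips itertools, pandas can build the same frame through pd.MultiIndex.from_product followed by to_frame. This is a different technique from the one in the answer, sketched here with the same a and b lists:

```python
import pandas as pd

a = [1, 2, 3]
b = [4, 5, 6]

# Build the Cartesian product as a MultiIndex, then turn its levels into
# ordinary columns; index=False gives a fresh 0..n-1 row index.
abdf = pd.MultiIndex.from_product([a, b], names=["a", "b"]).to_frame(index=False)
print(abdf)
```

The row order is the same as with itertools.product: the last input varies fastest.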


Vin*_*pes 6

ParameterGrid in scikit-learn does the same thing as expand.grid (from R). Example:

from sklearn.model_selection import ParameterGrid
param_grid = {'a': [1,2,3], 'b': [5,7,9]}
expanded_grid = ParameterGrid(param_grid)

You can access its contents by converting it to a list:

list(expanded_grid)

Output:

[{'a': 1, 'b': 5},
 {'a': 1, 'b': 7},
 {'a': 1, 'b': 9},
 {'a': 2, 'b': 5},
 {'a': 2, 'b': 7},
 {'a': 2, 'b': 9},
 {'a': 3, 'b': 5},
 {'a': 3, 'b': 7},
 {'a': 3, 'b': 9}]

Access an element by index:

list(expanded_grid)[1]

You'll get something like this:

{'a': 1, 'b': 7}

Just to add some usage: a list of dicts like the one printed above can be passed to a function using **kwargs. Example:

def f(a,b): return((a+b, a-b))
list(map(lambda x: f(**x), list(expanded_grid)))

Output:

[(6, -4),
 (8, -6),
 (10, -8),
 (7, -3),
 (9, -5),
 (11, -7),
 (8, -2),
 (10, -4),
 (12, -6)]


snt*_*nth 5

Here is another version that returns a pandas.DataFrame:

import itertools as it
import pandas as pd

def expand_grid(*args, **kwargs):
    columns = []
    lst = []
    if args:
        # positional inputs get integer column labels 0, 1, ...
        columns += range(len(args))
        lst += args
    if kwargs:
        # keyword inputs use their names as column labels
        columns += kwargs.keys()
        lst += kwargs.values()
    return pd.DataFrame(list(it.product(*lst)), columns=columns)

print(expand_grid([0,1], [1,2,3]))
print(expand_grid(a=[0,1], b=[1,2,3]))
print(expand_grid([0,1], b=[1,2,3]))