usu*_* me 6 python data-partitioning
说我有一个列表L。如何获得K组所有分区上的迭代器?
示例:L = [2,3,5,7,11,13],K = 3
3组的所有可能分区的列表:
[ [ 2 ], [ 3, 5], [ 7,11,13] ]
[ [ 2,3,5 ], [ 7, 11], [ 13] ]
[ [ 3, 11 ], [ 5, 7], [ 2, 13] ]
[ [ 3 ], [ 11 ], [ 5, 7, 2, 13] ]
etc...
Run Code Online (Sandbox Code Playgroud)
===更新===
我正在研究一个似乎可行的解决方案,所以我只复制粘贴它即可
# -*- coding: utf-8 -*-
import itertools
# return ( list1 - list0 )
def l1_sub_l0( l1, l0 ) :
"""Substract two lists"""
#
copy_l1 = list( l1 )
copy_l0 = list( l0 )
#
for xx in l0 :
#
if copy_l1.count( xx ) > 0 :
#
copy_l1.remove( xx )
copy_l0.remove( xx )
#
return [ copy_l1, copy_l0 ]
#
def gen_group_len( n, k ) :
"""Generate all possible group sizes"""
# avoid doubles
stop_list = []
#
for t in itertools.combinations_with_replacement( xrange( 1, n - 1 ), k - 1 ) :
#
last_n = n - sum( t )
# valid group size
if last_n >= 1 :
res = tuple( sorted( t + ( last_n, ) ) )
#
if res not in stop_list :
yield res
stop_list.append( res )
# group_len = (1, 1, 3)
def gen( group_len, my_list ) :
"""Generate all possible partitions of all possible group sizes"""
#
if len( group_len ) == 1 :
yield ( tuple( my_list ), )
#
else :
# need for a stop list if 2 groups of same size
stop_list = []
#
for t in itertools.combinations( my_list, group_len[ 0 ] ) :
#
reduced_list = l1_sub_l0( my_list, t )[ 0 ]
#
for t2 in gen( group_len[ 1: ], reduced_list ) :
#
tmp = set( ( t, t2[ 0 ] ) )
#
if tmp not in stop_list :
yield ( t, ) + t2
# avoid doing same thing twice
if group_len[ 1 ] == group_len[ 0 ] :
stop_list.append( tmp )
#
my_list = [ 3,5,7,11,13 ]
n = len( my_list )
k = 3
#
group_len_list = list( gen_group_len( n, k ) )
print "for %i elements, %i configurations of group sizes" % ( n, len( group_len_list ) )
print group_len_list
#
for group_len in group_len_list :
#
print "group sizes", group_len
#
for x in gen( group_len, my_list ) :
print x
#
print "==="
Run Code Online (Sandbox Code Playgroud)
输出:
for 5 elements, 2 configurations of group sizes
[(1, 1, 3), (1, 2, 2)]
group sizes (1, 1, 3)
((3,), (5,), (7, 11, 13))
((3,), (7,), (5, 11, 13))
((3,), (11,), (5, 7, 13))
((3,), (13,), (5, 7, 11))
((5,), (7,), (3, 11, 13))
((5,), (11,), (3, 7, 13))
((5,), (13,), (3, 7, 11))
((7,), (11,), (3, 5, 13))
((7,), (13,), (3, 5, 11))
((11,), (13,), (3, 5, 7))
===
group sizes (1, 2, 2)
((3,), (5, 7), (11, 13))
((3,), (5, 11), (7, 13))
((3,), (5, 13), (7, 11))
((5,), (3, 7), (11, 13))
((5,), (3, 11), (7, 13))
((5,), (3, 13), (7, 11))
((7,), (3, 5), (11, 13))
((7,), (3, 11), (5, 13))
((7,), (3, 13), (5, 11))
((11,), (3, 5), (7, 13))
((11,), (3, 7), (5, 13))
((11,), (3, 13), (5, 7))
((13,), (3, 5), (7, 11))
((13,), (3, 7), (5, 11))
((13,), (3, 11), (5, 7))
===
Run Code Online (Sandbox Code Playgroud)
这是可行的,尽管它可能非常低效(我将它们全部排序以避免重复计算):
def clusters(l, K):
if l:
prev = None
for t in clusters(l[1:], K):
tup = sorted(t)
if tup != prev:
prev = tup
for i in xrange(K):
yield tup[:i] + [[l[0]] + tup[i],] + tup[i+1:]
else:
yield [[] for _ in xrange(K)]
Run Code Online (Sandbox Code Playgroud)
它还返回空簇,因此您可能希望将其包装起来以便仅获取非空簇:
def neclusters(l, K):
for c in clusters(l, K):
if all(x for x in c): yield c
Run Code Online (Sandbox Code Playgroud)
计数只是为了检查:
def kamongn(n, k):
res = 1
for x in xrange(n-k, n):
res *= x + 1
for x in xrange(k):
res /= x + 1
return res
def Stirling(n, k):
res = 0
for j in xrange(k + 1):
res += (-1)**(k-j) * kamongn(k, j) * j ** n
for x in xrange(k):
res /= x + 1
return res
>>> sum(1 for _ in neclusters([2,3,5,7,11,13], K=3)) == Stirling(len([2,3,5,7,11,13]), k=3)
True
Run Code Online (Sandbox Code Playgroud)
有用 !
输出:
>>> clust = neclusters([2,3,5,7,11,13], K=3)
>>> [clust.next() for _ in xrange(5)]
[[[2, 3, 5, 7], [11], [13]], [[3, 5, 7], [2, 11], [13]], [[3, 5, 7], [11], [2, 13]], [[2, 3, 11], [5, 7], [13]], [[3, 11], [2, 5, 7], [13]]]
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
1842 次 |
| 最近记录: |