Aza*_*kov 6 python iterable python-itertools
I have a test in which I work with nested iterables (by a nested iterable I mean an iterable that has only iterables as elements).
As a test cascade, consider
from itertools import tee
from typing import (Any,
                    Iterable)


def foo(nested_iterable: Iterable[Iterable[Any]]) -> Any:
    ...


def test_foo(nested_iterable: Iterable[Iterable[Any]]) -> None:
    original, target = tee(nested_iterable)  # this doesn't copy iterators elements

    result = foo(target)

    assert is_contract_satisfied(result, original)


def is_contract_satisfied(result: Any,
                          original: Iterable[Iterable[Any]]) -> bool:
    ...
For example, foo may be a simple identity function
def foo(nested_iterable: Iterable[Iterable[Any]]) -> Iterable[Iterable[Any]]:
    return nested_iterable
and the contract simply checks that the flattened iterables have the same elements
from itertools import (chain,
                       starmap,
                       zip_longest)
from operator import eq
...
flatten = chain.from_iterable


def is_contract_satisfied(result: Iterable[Iterable[Any]],
                          original: Iterable[Iterable[Any]]) -> bool:
    return all(starmap(eq,
                       zip_longest(flatten(result), flatten(original),
                                   # we're assuming that ``object()``
                                   # will create some unique object
                                   # not presented in any of arguments
                                   fillvalue=object())))
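To illustrate why the fillvalue=object() sentinel matters, here is a minimal self-contained sketch of the same check. The sample inputs are made up for illustration and the type annotations are dropped for brevity.

```python
from itertools import chain, starmap, zip_longest
from operator import eq

flatten = chain.from_iterable


def is_contract_satisfied(result, original):
    # a fresh ``object()`` compares equal only to itself, so padding
    # the shorter side with it always yields at least one unequal pair
    return all(starmap(eq, zip_longest(flatten(result), flatten(original),
                                       fillvalue=object())))


# same flattened elements, different grouping
same = is_contract_satisfied([[1, 2], [3]], [[1], [2, 3]])
# the first argument is one element short
shorter = is_contract_satisfied([[1, 2]], [[1, 2], [3]])
print(same, shorter)  # True False
```

With plain zip instead of zip_longest the second call would also report True, because the extra trailing element would be silently dropped.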
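But if some element of nested_iterable is an iterator, it may get exhausted, since tee makes shallow copies, not deep ones; i.e. for the given foo and is_contract_satisfied, the following statement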
>>> test_foo([iter(range(10))])
predictably leads to
Traceback (most recent call last):
  ...
    test_foo([iter(range(10))])
  File "...", line 19, in test_foo
    assert is_contract_satisfied(result, original)
AssertionError
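The failure can be reproduced in isolation. The following sketch (variable names are mine, not part of the test above) shows that both tee copies hand out the very same inner iterator object, so consuming it through one copy leaves nothing for the other.

```python
from itertools import tee

inner = iter(range(3))
first, second = tee([inner])

# both copies yield the very same inner iterator object
first_inner = next(first)
second_inner = next(second)
shared = first_inner is second_inner

consumed = list(first_inner)   # exhausts the shared iterator
leftover = list(second_inner)  # nothing is left for the second copy
print(shared, consumed, leftover)  # True [0, 1, 2] []
```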
How can I deep-copy an arbitrarily nested iterable?
I'm aware of the copy.deepcopy function, but it doesn't work for file objects.
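As a quick check of that limitation, here is a small sketch, assuming CPython, where copy.deepcopy raises TypeError for open files and for generators. The helper can_deepcopy is hypothetical and exists only for this illustration.

```python
import copy
import os


def can_deepcopy(object_) -> bool:
    # hypothetical helper: reports whether ``copy.deepcopy`` succeeds
    try:
        copy.deepcopy(object_)
    except TypeError:
        return False
    return True


generator = (number for number in range(3))
with open(os.devnull) as file:
    file_copyable = can_deepcopy(file)
generator_copyable = can_deepcopy(generator)
lists_copyable = can_deepcopy([[1, 2], [3]])
print(file_copyable, generator_copyable, lists_copyable)
```

Plain containers copy fine, so the problem is specific to elements that wrap external or suspended state.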
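A straightforward algorithm is to copy the original nested iterable element by element and then make n copies of that element-wise copy. This can be implemented like this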
from itertools import tee
from operator import itemgetter
from typing import (Any,
                    Iterable,
                    Tuple,
                    TypeVar)

Domain = TypeVar('Domain')


def copy_nested_iterable(nested_iterable: Iterable[Iterable[Domain]],
                         *,
                         count: int = 2
                         ) -> Tuple[Iterable[Iterable[Domain]], ...]:
    def shallow_copy(iterable: Iterable[Domain]) -> Tuple[Iterable[Domain], ...]:
        return tee(iterable, count)

    copies = shallow_copy(map(shallow_copy, nested_iterable))
    return tuple(map(itemgetter(index), iterables)
                 for index, iterables in enumerate(copies))
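To make the stages of this approach easier to follow, here is a step-by-step sketch on a small made-up input. It unrolls the same calls (tee, map, itemgetter) rather than introducing anything new; the intermediate names are mine.

```python
from itertools import tee
from operator import itemgetter

count = 2
nested_iterable = [iter(range(3)), iter('ab')]

# step 1: replace every element with a tuple of ``count`` copies of it
elementwise = map(lambda iterable: tee(iterable, count), nested_iterable)
# step 2: make ``count`` copies of the resulting sequence of tuples
outer_copies = tee(elementwise, count)
# step 3: the copy at position ``index`` keeps only the ``index``-th
# component of every tuple
first, second = (map(itemgetter(index), outer_copy)
                 for index, outer_copy in enumerate(outer_copies))

first_result = list(map(list, first))
second_result = list(map(list, second))
print(first_result)   # [[0, 1, 2], ['a', 'b']]
print(second_result)  # [[0, 1, 2], ['a', 'b']]
```

Both results are complete even though the inner elements were one-shot iterators, which is exactly what the plain tee call in the test could not provide.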
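Pros:
Cons:
We can do better.
If we look at the itertools.tee function documentation, it contains a Python recipe which, with the help of the functools.singledispatch decorator, can be rewritten as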
from collections import (abc,
                         deque)
from functools import singledispatch
from itertools import repeat
from typing import (Iterable,
                    Tuple,
                    TypeVar)

Domain = TypeVar('Domain')


@singledispatch
def copy(object_: Domain,
         *,
         count: int) -> Iterable[Domain]:
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))


# handle general case
@copy.register(object)
# immutable strings represent a special kind of iterables
# that can be copied by simply repeating
@copy.register(bytes)
@copy.register(str)
# mappings cannot be copied as other iterables
# since they are iterable only by key
@copy.register(abc.Mapping)
def copy_object(object_: Domain,
                *,
                count: int) -> Iterable[Domain]:
    return repeat(object_, count)


@copy.register(abc.Iterable)
def copy_iterable(object_: Iterable[Domain],
                  *,
                  count: int = 2) -> Tuple[Iterable[Domain], ...]:
    iterator = iter(object_)

    # we are using `itertools.repeat` instead of `range` here
    # due to efficiency of the former
    # more info at
    # /sf/ask/634142141/#9098860
    queues = [deque() for _ in repeat(None, count)]

    def replica(queue: deque) -> Iterable[Domain]:
        while True:
            if not queue:
                try:
                    # the next element is fetched only when a queue runs dry
                    element = next(iterator)
                except StopIteration:
                    return
                # copy the element recursively, one copy per replica
                element_copies = copy(element,
                                      count=count)
                for sub_queue, element_copy in zip(queues, element_copies):
                    sub_queue.append(element_copy)
            yield queue.popleft()

    return tuple(replica(queue) for queue in queues)
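The order of the registrations above relies on how functools.singledispatch picks the most specific registered type. The toy function below (kind and its handlers are hypothetical, used only to show the dispatch) mirrors the same registrations without the copying logic.

```python
from collections import abc
from functools import singledispatch


@singledispatch
def kind(object_) -> str:
    # fallback for everything that has no more specific registration
    return 'atom'


@kind.register(abc.Iterable)
def kind_iterable(object_) -> str:
    return 'iterable'


# ``str``, ``bytes`` and mappings are iterable too, but their own
# registrations are more specific and therefore win over ``abc.Iterable``
@kind.register(abc.Mapping)
@kind.register(bytes)
@kind.register(str)
def kind_atom(object_) -> str:
    return 'atom'


kinds = [kind(value) for value in (1, 'ab', b'ab', {'a': 1}, [1], iter([1]))]
print(kinds)
# ['atom', 'atom', 'atom', 'atom', 'iterable', 'iterable']
```

Supporting a user-defined structure would then presumably be a matter of registering one more handler for its type.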
Pros:
Cons:
... dictionary lookup, which has O(1) complexity).
Let's define our nested iterable as follows
nested_iterable = [range(10 ** index) for index in range(1, 7)]
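Since creating the iterators says nothing about the performance of the underlying copying, let's define a function for exhausting iterators (described here)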
exhaust_iterable = deque(maxlen=0).extend
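A zero-length deque discards everything it is given, so its extend method consumes an iterable without storing any element. A small sketch with a made-up generator that records what has been pulled from it:

```python
from collections import deque

exhaust_iterable = deque(maxlen=0).extend

visited = []
# the generator records each element at the moment it is pulled
lazy = (visited.append(number) or number for number in range(5))

exhaust_iterable(lazy)
remaining = list(lazy)
print(visited, remaining)  # [0, 1, 2, 3, 4] []
```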
Using the timeit package
import timeit
def naive(): exhaust_iterable(copy_nested_iterable(nested_iterable))
def improved(): exhaust_iterable(copy_iterable(nested_iterable))
print('naive approach:', min(timeit.repeat(naive)))
print('improved approach:', min(timeit.repeat(improved)))
On my laptop with Windows 10 x64 and Python 3.5.4 I get
naive approach: 5.1863865
improved approach: 3.5602296000000013
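For reading these numbers: timeit.repeat returns one total per round, and each round calls the function number times (1,000,000 by default), so the figures above are seconds per round, not per call. Taking the minimum picks the round least disturbed by other processes. A scaled-down sketch with a placeholder function (candidate is hypothetical):

```python
import timeit


def candidate() -> int:
    # placeholder workload standing in for the functions being compared
    return sum(range(100))


# three rounds of 1000 calls each; every entry is the total time
# in seconds for one round
timings = timeit.repeat(candidate, repeat=3, number=1000)
best = min(timings)
print(len(timings), best)
```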
Line #    Mem usage    Increment   Line Contents
================================================
    78     17.2 MiB     17.2 MiB   @profile
    79                             def profile_memory(nested_iterable: Iterable[Iterable[Any]]) -> None:
    80     68.6 MiB     51.4 MiB       result = list(flatten(flatten(copy_nested_iterable(nested_iterable))))
for the "naive" approach and
Line #    Mem usage    Increment   Line Contents
================================================
    78     17.2 MiB     17.2 MiB   @profile
    79                             def profile_memory(nested_iterable: Iterable[Iterable[Any]]) -> None:
    80     68.7 MiB     51.4 MiB       result = list(flatten(flatten(copy_iterable(nested_iterable))))
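for the "improved" one.
Note: I ran the two scripts separately, because running them in one go would not be representative: the second statement would reuse the int objects already created under the hood by the first.
As we can see, both functions have similar performance, but the latter supports deeper levels of nesting and looks quite extensible.
I've added the "improved" solution to the lz package starting from version 0.4.0; it can be used like this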
>>> from lz.replication import replicate
>>> iterable = iter(range(5))
>>> list(map(list, replicate(iterable,
...                          count=3)))
[[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]
It is property-based tested using the hypothesis framework, so we can be reasonably confident that it works as expected.