Expressive way to compose generators in Python

chl*_*dni 5 python generator function-composition

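I really like Python generators. In particular, I find that they are just the right tool for connecting to REST endpoints: my client code only has to iterate over the generator that is connected to the endpoint. However, I am finding that Python's generators are not as expressive as I would like. Typically, I need to filter the data I get from the endpoint. In my current code, I pass a predicate function to the generator; the generator applies the predicate to the data it is handling and only yields data when the predicate is True.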

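I would like to move toward composition of generators, like data_filter(datasource()). Here is some demonstration code that shows what I have tried. It is pretty clear why it does not work; what I am trying to figure out is the most expressive way of arriving at a solution: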

# Mock of REST endpoint: in actual code, the generator is
# connected to a REST endpoint which returns a dictionary (from JSON).
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    for d in mock_data:
        yield d

# Mock of a filter: simplification, in reality I am filtering on some
# aspect of the data, like data['type'] == "external"
def data_filter(d):
    if len(d) < 8:
        yield d

# First Try:
# for w in data_filter(mock_datasource()):
#     print(w)
# >> TypeError: object of type 'generator' has no len()

# Second Try 
# for w in (data_filter(d) for d in mock_datasource()):
#     print(w)
# I don't get words out, 
# rather <generator object data_filter at 0x101106a40>

# Using a predicate to filter works, but is not the expressive 
# composition I am after
for w in (d for d in mock_datasource() if len(d) < 8):
    print(w)
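For reference, a minimal sketch of why the two attempts fail and how the composed form data_filter(mock_datasource()) could be made to work. In the first try, data_filter receives the generator object itself and calls len() on it; in the second, it is called once per item and each call returns a fresh generator. The sketch assumes the filter may be rewritten to accept the whole iterable instead of a single item; that change of signature is an assumption, not something stated in the question.

```python
def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    # yield from delegates to the list, equivalent to the explicit for loop
    yield from mock_data


def data_filter(source):
    # Assumption: the filter takes an iterable (for example another
    # generator) and lazily yields only the items that pass the test.
    for d in source:
        if len(d) < 8:
            yield d


# Composition now reads the way the question asks for.
result = list(data_filter(mock_datasource()))
print(result)  # ['liberty', 'seminar', 'formula', 'comedy']
```

Because data_filter both consumes and produces an iterable, further stages written the same way can be nested around it without changing the data source.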

meo*_*eow 0

You can pass in a filter function that is applied to each item:

def mock_datasource(filter_function):
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]

    for d in mock_data:
        # Yield only the items that the filter function accepts
        if filter_function(d):
            yield d

def filter_function(d):
    # Example filter (placeholder): keep words shorter than 8 characters
    return len(d) < 8
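If the goal is composition rather than passing the predicate into the data source, one possible approach is to turn each predicate into a stage that maps an iterable to a generator, and then chain the stages. The sketch below is only an illustration: keep_if and pipeline are hypothetical helper names, not part of the question or of the standard library, and mock_datasource is redefined here without a parameter so that the block is self-contained. Only functools.reduce is a real library call.

```python
from functools import reduce


def mock_datasource():
    mock_data = ["sanctuary", "movement", "liberty", "seminar",
                 "formula", "short-circuit", "generate", "comedy"]
    yield from mock_data


def keep_if(predicate):
    """Hypothetical helper: wrap a per-item predicate as a generator stage."""
    def stage(source):
        # Lazily pass through only the items the predicate accepts
        return (item for item in source if predicate(item))
    return stage


def pipeline(source, *stages):
    """Hypothetical helper: feed source through each stage, left to right."""
    return reduce(lambda acc, stage: stage(acc), stages, source)


short_words = keep_if(lambda d: len(d) < 8)
starts_with_s = keep_if(lambda d: d.startswith("s"))

result = list(pipeline(mock_datasource(), short_words, starts_with_s))
print(result)  # ['seminar']
```

For a single predicate, the built-in filter(lambda d: len(d) < 8, mock_datasource()) gives the same lazy behaviour without any helper; the stage approach is mainly useful when several filters need to be chained and named.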