How to resolve the Great Expectations error "MetricResolutionError: Cannot compile Column object until its 'name' is assigned."?

Şev*_*man 6 python great-expectations

I am trying to use Great Expectations.
The function I want to use is expect_compound_columns_to_be_unique. Here is the code (the main code, taken from the template):

import datetime

import pandas as pd

import great_expectations as ge
import great_expectations.jupyter_ux
from great_expectations.core.batch import BatchRequest
from great_expectations.checkpoint import SimpleCheckpoint
from great_expectations.exceptions import DataContextError

context = ge.data_context.DataContext()

# Note that if you modify this batch request, you may save the new version as a .json file
#  to pass in later via the --batch-request option
batch_request = {'datasource_name': 'impala_okh', 'data_connector_name': 'default_inferred_data_connector_name', 'data_asset_name': 'okh.okh_forecast_prod', 'limit': 1000}


# Feel free to change the name of your suite here. Renaming this will not remove the other one.
expectation_suite_name = "okh_forecast_prod"
try:
    suite = context.get_expectation_suite(expectation_suite_name=expectation_suite_name)
    print(f'Loaded ExpectationSuite "{suite.expectation_suite_name}" containing {len(suite.expectations)} expectations.')
except DataContextError:
    suite = context.create_expectation_suite(expectation_suite_name=expectation_suite_name)
    print(f'Created ExpectationSuite "{suite.expectation_suite_name}".')


validator = context.get_validator(
    batch_request=BatchRequest(**batch_request),
    expectation_suite_name=expectation_suite_name
)
column_names = [f'"{column_name}"' for column_name in validator.columns()]
print(f"Columns: {', '.join(column_names)}.")
validator.head(n_rows=5, fetch_all=False)

With this code, calling

validator.expect_compound_columns_to_be_unique(['column1', 'column2'])

produces the following error:

MetricResolutionError: Cannot compile Column object until its 'name' is assigned.

How can I solve this?

小智 0

By writing validator.expect_compound_columns_to_be_unique(['column1', 'column2']), you are checking whether the tuples made up of the elements of 'column1' and 'column2' are unique.
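To make the idea of compound uniqueness concrete, here is a minimal pure-Python sketch. It is not how Great Expectations implements the expectation; the helper name compound_values_are_unique and the sample rows are made up for illustration.

```python
from typing import Any, Dict, Iterable, List


def compound_values_are_unique(rows: Iterable[Dict[str, Any]], column_list: List[str]) -> bool:
    """Return True if no combination of values in column_list occurs twice.

    Hypothetical helper for illustration only; it is not part of Great Expectations.
    """
    seen = set()
    for row in rows:
        # Build the tuple of values for the selected columns of this row.
        key = tuple(row[column] for column in column_list)
        if key in seen:
            return False
        seen.add(key)
    return True


# Made-up sample data: each column contains repeats, but every pair is distinct.
rows = [
    {"column1": 1, "column2": "a"},
    {"column1": 1, "column2": "b"},
    {"column1": 2, "column2": "a"},
]

print(compound_values_are_unique(rows, ["column1", "column2"]))
print(compound_values_are_unique(rows, ["column1"]))
```

The check is on the combination of values, so a column may contain repeats on its own as long as no pair repeats.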

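Judging from the error you received, it seems that 'column1' and 'column2' are not defined in your data. You should try validating against the actual column names in your data: since the data you provide through batch_request only contains the columns 'datasource_name', 'data_connector_name', 'data_asset_name' and 'limit', you should validate a subset of those columns.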

For example: validator.expect_compound_columns_to_be_unique(['datasource_name', 'data_connector_name'])
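One way to confirm which names actually exist before calling the expectation is to compare the requested columns against the list returned by validator.columns(), which the question's code already calls. The sketch below is only an illustration: missing_columns is a hypothetical helper, and the available list is a made-up stand-in for whatever validator.columns() returns in your environment.

```python
from typing import List, Sequence


def missing_columns(requested: Sequence[str], available: Sequence[str]) -> List[str]:
    """Return the requested column names that are absent from available.

    Hypothetical helper for illustration; it is not part of Great Expectations.
    """
    available_set = set(available)
    return [name for name in requested if name not in available_set]


# Stand-in for validator.columns(); these names are assumptions, not real data.
available = ["id", "forecast_date", "value"]

absent = missing_columns(["column1", "forecast_date"], available)
if absent:
    print(f"Columns not found in the batch: {absent}")
```

In the question's setup the call would look like missing_columns(['column1', 'column2'], validator.columns()), and an empty result would mean both names are present.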