DynamoDB query with pagination in Python (without scan)

dan*_*v91 1 amazon-web-services amazon-dynamodb dynamodb-queries

I am using the following code to query DynamoDB and paginate through the results:

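import decimal
import json

import boto3
from boto3.dynamodb.conditions import Key
from botocore.config import Config


class DecimalEncoder(json.JSONEncoder):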
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return str(o)
        return super(DecimalEncoder, self).default(o)


def run(date: int, start_epoch: int, end_epoch: int):
    dynamodb = boto3.resource('dynamodb',
                              region_name='REGION',
                              config=Config(proxies={'https': 'PROXYIP'}))

    table = dynamodb.Table('XYZ')

    response = table.query(
        # ProjectionExpression="#yr, title, info.genres, info.actors[0]", #THIS IS A SELECT STATEMENT
        # ExpressionAttributeNames={"#yr": "year"},  #SELECT STATEMENT RENAME
        KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch)
    )

    for i in response[u'Items']:
        print(json.dumps(i, cls=DecimalEncoder))

    while 'LastEvaluatedKey' in response:
        response = table.scan( ##IS THIS INEFFICIENT CODE?
            # ProjectionExpression=pe,
            # FilterExpression=fe,
            # ExpressionAttributeNames=ean,
            ExclusiveStartKey=response['LastEvaluatedKey']
        )

        for i in response['Items']:
            print(json.dumps(i, cls=DecimalEncoder))

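Although this code works, it is very slow, and I suspect the 'response = table.scan' call is the cause. My understanding is that a query is much faster than a scan (since a scan has to iterate over the entire table). Does this code cause a full iteration over the database table?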

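This may be a separate question, but is there a more efficient way (with a code example) to do this? I tried using Boto3's paginators, but I could not get them to work with query either.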

dan*_*v91 5

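The answer provided by Nadav Har'El was the key to solving this. I had misused the DynamoDB pagination code example: I ran the initial request as a DynamoDB query, but then paginated with scan!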

The correct approach is to use query both for the initial request AND for pagination:

import decimal
import json

import boto3
from boto3.dynamodb.conditions import Key
from botocore.config import Config


class DecimalEncoder(json.JSONEncoder):
    """Serialize the Decimal values that boto3 returns for DynamoDB numbers."""
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            return str(o)
        return super(DecimalEncoder, self).default(o)


def run(date: int, start_epoch: int, end_epoch: int):
    dynamodb = boto3.resource('dynamodb',
                              region_name='REGION',
                              config=Config(proxies={'https': 'PROXYIP'}))

    table = dynamodb.Table('XYZ')

    # First page: query reads only the items matching the key condition.
    response = table.query(
        KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch)
    )

    for i in response['Items']:
        print(json.dumps(i, cls=DecimalEncoder))

    # Remaining pages: keep using query (not scan) with the same key condition,
    # resuming from the key where the previous page stopped.
    while 'LastEvaluatedKey' in response:
        response = table.query(
            KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch),
            ExclusiveStartKey=response['LastEvaluatedKey']
        )

        for i in response['Items']:
            print(json.dumps(i, cls=DecimalEncoder))

I am still marking Nadav Har'El's answer as accepted, because his answer is what led to this code example.
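
Since the question also asked about Boto3's paginators, here is a minimal sketch of how a paginator could be used with query instead of a hand-written loop. It assumes the low-level client (boto3.client('dynamodb')) rather than the resource, and it assumes 'date' and 'uid' are both number attributes, which the question does not state explicitly. The function name iter_query_items is made up for illustration. With the low-level client, values are written in DynamoDB's typed form ({'N': '...'}), and '#d' / '#u' are placeholders so that attribute names cannot clash with reserved words.

```python
def iter_query_items(client, table_name, date, start_epoch, end_epoch):
    """Yield every item matching the key condition, page by page.

    `client` is assumed to be a low-level DynamoDB client, e.g.
    boto3.client('dynamodb', region_name='REGION').
    """
    paginator = client.get_paginator('query')
    pages = paginator.paginate(
        TableName=table_name,
        # Placeholders (#d, #u) stand in for the attribute names.
        KeyConditionExpression='#d = :d AND #u BETWEEN :lo AND :hi',
        ExpressionAttributeNames={'#d': 'date', '#u': 'uid'},
        # Assumption: both key attributes are stored as numbers.
        ExpressionAttributeValues={
            ':d': {'N': str(date)},
            ':lo': {'N': str(start_epoch)},
            ':hi': {'N': str(end_epoch)},
        },
    )
    # The paginator passes LastEvaluatedKey along as ExclusiveStartKey itself.
    for page in pages:
        yield from page.get('Items', [])


# Usage (needs AWS credentials and an existing table):
#   import boto3
#   client = boto3.client('dynamodb', region_name='REGION')
#   for item in iter_query_items(client, 'XYZ', date, start_epoch, end_epoch):
#       print(item)
```

Note that items come back in the typed form as well (for example {'uid': {'N': '123'}}), so the DecimalEncoder above is not needed on this path.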


Nad*_*'El 3

Unfortunately, yes: a "Scan" operation reads the entire table. You didn't say what your table's partition key is, but if it is the date, then what you are really doing here is reading a single partition, and that is exactly what a "Query" operation does much more efficiently, because it can jump directly to the required partition instead of scanning the entire table looking for it.

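Even with a query, you still need to paginate as you did before, because a single partition can still contain many items. But at least you won't be scanning the entire table.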
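
The pagination loop described above can be sketched as a small generator, so the query parameters are written only once. This is an illustration, not code from the question: paginate_query is a made-up helper name, and query_fn is assumed to behave like boto3's Table.query (returning a dict with 'Items' and, when more pages remain, 'LastEvaluatedKey').

```python
def paginate_query(query_fn, **query_kwargs):
    """Yield items from every page of a DynamoDB Query.

    `query_fn` is assumed to behave like boto3's Table.query.
    """
    kwargs = dict(query_kwargs)  # copy so the caller's arguments are untouched
    while True:
        response = query_fn(**kwargs)
        yield from response.get('Items', [])
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break  # no LastEvaluatedKey means this was the final page
        # Resume the same query right after the last key that was read.
        kwargs['ExclusiveStartKey'] = last_key
```

With the table from the question, this would presumably be called as paginate_query(table.query, KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch)).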

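By the way, scanning the entire table costs you a lot of read operations. You can ask AWS how many reads you are consuming, which can help you spot cases where you are reading too much, in addition to the obvious slowness you noticed.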
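
One way to get that number per request, as far as I know, is the ReturnConsumedCapacity parameter, which both Query and Scan accept. The sketch below is only an illustration: query_with_capacity is a made-up helper name, and query_fn is assumed to behave like boto3's Table.query (or Table.scan, for comparison). It adds up the 'CapacityUnits' value reported in each page's 'ConsumedCapacity'.

```python
def query_with_capacity(query_fn, **query_kwargs):
    """Run a paginated request and total the read capacity it consumed.

    `query_fn` is assumed to behave like boto3's Table.query or Table.scan.
    Returns (items, capacity_units).
    """
    # Ask DynamoDB to report the capacity consumed by each request.
    kwargs = dict(query_kwargs, ReturnConsumedCapacity='TOTAL')
    items = []
    capacity_units = 0.0
    while True:
        response = query_fn(**kwargs)
        items.extend(response.get('Items', []))
        consumed = response.get('ConsumedCapacity', {})
        capacity_units += consumed.get('CapacityUnits', 0.0)
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            return items, capacity_units
        kwargs['ExclusiveStartKey'] = last_key
```

Running this once with table.query and once with table.scan should make the difference in consumed read capacity visible.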