Emptying a DynamoDB table with boto

Fra*_*urt 5 python boto amazon-dynamodb

What is the best way (in terms of financial cost) to empty a DynamoDB table with boto? (Something like the single truncate statement we can use in SQL.)

boto.dynamodb2.table.delete() and boto.dynamodb2.layer1.DynamoDBConnection.delete_table() delete the entire table, while boto.dynamodb2.table.delete_item() and boto.dynamodb2.table.BatchTable.delete_item() delete only the specified items.

Eph*_*ris 10

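While I agree with Johnny Wu that dropping the table and recreating it is far more efficient, there are cases where you may not want to, for example when many GSIs or trigger events are attached to the table and you don't want to re-associate them. The script below scans the table page by page and then deletes every item using the batch writer. It may not work for large tables, though, because it first loads the key of every item in the table into memory on your machine.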

import boto3
dynamo = boto3.resource('dynamodb')

def truncateTable(tableName):
    table = dynamo.Table(tableName)
    
    #get the table keys
    tableKeyNames = [key.get("AttributeName") for key in table.key_schema]
    
    """
    NOTE: there are reserved attributes for key names, please see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ReservedWords.html
    if a hash or range key is in the reserved word list, you will need to use the ExpressionAttributeNames parameter
    described at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Table.scan
    """

    #Only retrieve the keys for each item in the table (minimize data transfer)
    ProjectionExpression = ", ".join(tableKeyNames)
    
    response = table.scan(ProjectionExpression=ProjectionExpression)
    data = response.get('Items')
    
    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression=ProjectionExpression, 
            ExclusiveStartKey=response['LastEvaluatedKey'])
        data.extend(response['Items'])

    with table.batch_writer() as batch:
        for each in data:
            batch.delete_item(
                Key={key: each[key] for key in tableKeyNames}
            )
            
truncateTable("YOUR_TABLE_NAME")
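For large tables, one way around the memory limitation mentioned above is to delete each scan page as soon as it arrives instead of collecting all keys first. The following is only a sketch under a few assumptions: `truncate_table_paged` is a hypothetical name, `table` is assumed to be a boto3 `Table` resource, and the `#k0`, `#k1` aliases are an illustrative way to sidestep reserved words through `ExpressionAttributeNames`. The cost is the same as before (every item is still read once and written once); only the memory footprint changes.

```python
def truncate_table_paged(table):
    """Delete every item from `table`, one scan page at a time.

    `table` is assumed to be a boto3 DynamoDB Table resource, e.g.
    boto3.resource("dynamodb").Table("YOUR_TABLE_NAME").
    Returns the number of delete requests handed to the batch writer.
    """
    key_names = [key["AttributeName"] for key in table.key_schema]

    # Alias each key as "#k0", "#k1", ... so that key names which happen to be
    # DynamoDB reserved words are still valid in the projection expression.
    aliases = {"#k%d" % i: name for i, name in enumerate(key_names)}
    scan_kwargs = {
        "ProjectionExpression": ", ".join(aliases),
        "ExpressionAttributeNames": aliases,
    }

    deleted = 0
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            # Queue deletes for this page only; nothing is kept after that.
            for item in page.get("Items", []):
                batch.delete_item(Key={name: item[name] for name in key_names})
                deleted += 1
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break  # no more pages
            scan_kwargs["ExclusiveStartKey"] = last_key
    return deleted

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   truncate_table_paged(boto3.resource("dynamodb").Table("YOUR_TABLE_NAME"))
```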


Dro*_*sis 5

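As Johnny Wu mentioned, deleting the table and recreating it is more efficient than deleting individual items. Make sure your code does not try to create the new table before the old one has been completely deleted.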

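import boto3

# Low-level client (delete_table, waiters) and resource (create_table) used below
client = boto3.client('dynamodb')
dynamodb = boto3.resource('dynamodb')


def deleteTable(table_name):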
    print('deleting table')
    return client.delete_table(TableName=table_name)


def createTable(table_name):
    waiter = client.get_waiter('table_not_exists')
    waiter.wait(TableName=table_name)
    print('creating table')
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'KeyType': 'HASH'
            }
        ],
        AttributeDefinitions= [
            {
                'AttributeName': 'YOURATTRIBUTENAME',
                'AttributeType': 'S'
            }
        ],
        ProvisionedThroughput={
            'ReadCapacityUnits': 1,
            'WriteCapacityUnits': 1
        },
        StreamSpecification={
            'StreamEnabled': False
        }
    )


def emptyTable(table_name):
    deleteTable(table_name)
    createTable(table_name)
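The code above hard-codes the key schema in createTable. A possible variation, sketched here under stated assumptions, reads the schema from `describe_table` before dropping the table and reuses it afterwards. `recreate_empty_table` is a hypothetical name and `client` is assumed to be a boto3 DynamoDB client. Only the key schema, the key attribute definitions, and the billing or throughput settings are carried over; indexes, streams, triggers, and tags are not restored by this sketch.

```python
def recreate_empty_table(client, table_name):
    """Drop `table_name` and recreate it with the key schema it had before.

    `client` is assumed to be boto3.client("dynamodb").
    Returns the parameters that were passed to create_table.
    """
    desc = client.describe_table(TableName=table_name)["Table"]

    # Keep only attribute definitions used by the table's own key schema;
    # definitions that exist solely for indexes would be rejected, because
    # this sketch does not recreate the indexes.
    key_names = {key["AttributeName"] for key in desc["KeySchema"]}
    params = {
        "TableName": table_name,
        "KeySchema": desc["KeySchema"],
        "AttributeDefinitions": [
            attr for attr in desc["AttributeDefinitions"]
            if attr["AttributeName"] in key_names
        ],
    }

    billing = desc.get("BillingModeSummary", {}).get("BillingMode")
    if billing == "PAY_PER_REQUEST":
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        throughput = desc["ProvisionedThroughput"]
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": throughput["ReadCapacityUnits"],
            "WriteCapacityUnits": throughput["WriteCapacityUnits"],
        }

    client.delete_table(TableName=table_name)
    # Block until the old table is really gone before creating the new one.
    client.get_waiter("table_not_exists").wait(TableName=table_name)
    client.create_table(**params)
    # Block until the new table is ready for reads and writes.
    client.get_waiter("table_exists").wait(TableName=table_name)
    return params

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   recreate_empty_table(boto3.client("dynamodb"), "YOUR_TABLE_NAME")
```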


小智 3

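Deleting a table is far more efficient than deleting items one by one. If you control where the truncation points fall, you can do something like the table rotation suggested in the documentation for time-series data.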
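As a rough illustration of the rotation idea, the sketch below assumes one table per calendar month with a `prefix_YYYY_MM` naming scheme. Both the scheme and the helper names `table_name_for` and `tables_to_drop` are hypothetical and not taken from the documentation. Writes go to the table for the current month, and whole tables that fall outside the retention window are dropped instead of having their items deleted.

```python
from datetime import date

def table_name_for(prefix, day):
    """Name of the table holding data for the month that contains `day`."""
    return "%s_%04d_%02d" % (prefix, day.year, day.month)

def tables_to_drop(existing_names, prefix, today, keep_months):
    """Rotated tables older than the newest `keep_months` months.

    Names that do not match the assumed "prefix_YYYY_MM" scheme are ignored.
    """
    # Count months linearly so that year boundaries need no special casing.
    current = today.year * 12 + (today.month - 1)
    oldest_kept = current - (keep_months - 1)

    drop = []
    for name in existing_names:
        if not name.startswith(prefix + "_"):
            continue
        try:
            year, month = name[len(prefix) + 1:].split("_")
            index = int(year) * 12 + (int(month) - 1)
        except ValueError:
            continue  # not one of our rotated tables
        if index < oldest_kept:
            drop.append(name)
    return sorted(drop)
```

Each name returned by `tables_to_drop` could then be passed to a delete_table call, which removes all of that period's data in a single operation.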