.join() command with a maximum string length in Python

Question


I want to join a list of ids into a single string, with each id separated by an "OR". In Python I can do that with:

' OR '.join(list_of_ids)


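I was wondering whether there is a way to keep this string from getting too large (in terms of bytes). This matters to me because I use the string in an API, and that API imposes a maximum length of 4094 bytes. My solution is below; I am just wondering whether there is a better one.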

import sys

list_of_query_strings = []
substring = list_of_ids[0]
list_of_ids.pop(0)
while list_of_ids:
    new_addition = ' OR ' + list_of_ids[0]
    # Note: sys.getsizeof reports the in-memory size of the str object
    # (including interpreter overhead), not the length of its encoded bytes
    if sys.getsizeof(substring + new_addition) < 4094:
        substring += new_addition
    else:
        list_of_query_strings.append(substring)
        substring = list_of_ids[0]
    list_of_ids.pop(0)
list_of_query_strings.append(substring)


Answer 1

Sha*_*ger 5

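Just for fun, an over-engineered solution (it avoids the Schlemiel the Painter repeated-concatenation algorithm and lets you combine the pieces efficiently with str.join):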

from itertools import count, groupby

class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize):
        self.joinerlen = len(joiner)
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Special cases initial case to cancel out treating first element
        # as requiring joiner, without requiring per call special case
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        self.accumlen += self.joinerlen + len(newstr)
        # If accumulated length exceeds block limit...
        if self.accumlen > self.maxblocksize:
            # Move to new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newstr)
        return self.curgrp


With this, you can use itertools.groupby to break your iterable into groups of bounded size, then join each group without any repeated concatenation:

mystrings = [...]

myblocks = [' OR '.join(grp) for _, grp in
            groupby(mystrings, key=CumulativeLengthGrouper(' OR ', 4094))]

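As a quick sanity check, here is a minimal self-contained sketch of the grouping in action. The sample ids and the small limit of 20 characters are made up purely for illustration, so that the splits are easy to see; the class is repeated only so the snippet runs on its own.

```python
from itertools import count, groupby

class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize):
        self.joinerlen = len(joiner)
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Start negative so the first element is not charged for a joiner
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        self.accumlen += self.joinerlen + len(newstr)
        if self.accumlen > self.maxblocksize:
            # Limit exceeded: this element starts a new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newstr)
        return self.curgrp

# Hypothetical sample data: ten 3-character ids
ids = ['id%d' % i for i in range(10)]

# Illustrative limit of 20 characters instead of 4094
blocks = [' OR '.join(grp) for _, grp in
          groupby(ids, key=CumulativeLengthGrouper(' OR ', 20))]

for block in blocks:
    print(len(block), block)
```

Note that a single string longer than the limit still ends up in a block of its own that exceeds the limit, so if that can happen with your data you would have to handle it separately.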

If the goal is to produce strings of a given byte size under a specified encoding, CumulativeLengthGrouper can be adapted to accept a third constructor argument:

class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize, encoding='utf-8'):
        self.encoding = encoding
        self.joinerlen = len(joiner.encode(encoding))
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Special cases initial case to cancel out treating first element
        # as requiring joiner, without requiring per call special case
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        newbytes = newstr.encode(self.encoding)
        self.accumlen += self.joinerlen + len(newbytes)
        # If accumulated length exceeds block limit...
        if self.accumlen > self.maxblocksize:
            # Move to new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newbytes)
        return self.curgrp

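For comparison, the same byte-aware splitting can be sketched as a plain generator. This is not the groupby approach above, just a simpler alternative; the name join_in_chunks and its parameters are hypothetical, and the 12-byte limit and sample strings are chosen only to make the multi-byte case visible.

```python
def join_in_chunks(strings, joiner=' OR ', max_bytes=4094, encoding='utf-8'):
    """Yield joined blocks whose encoded size stays within max_bytes.

    Hypothetical helper for illustration. A single string that is larger
    than max_bytes on its own is still yielded as an oversize block.
    """
    joiner_len = len(joiner.encode(encoding))
    current, current_len = [], 0
    for s in strings:
        s_len = len(s.encode(encoding))
        # The joiner is only needed when the block already holds something
        added = s_len + (joiner_len if current else 0)
        if current and current_len + added > max_bytes:
            # Adding s would exceed the limit: emit the block, start a new one
            yield joiner.join(current)
            current, current_len = [s], s_len
        else:
            current.append(s)
            current_len += added
    if current:
        yield joiner.join(current)

# 'é' takes 2 bytes in UTF-8, so 'ééé' is 3 characters but 6 bytes
blocks = list(join_in_chunks(['ééé', 'ab', 'cd'], max_bytes=12))
print(blocks)
```

Counting encoded bytes rather than characters is what matters when the limit is imposed on the wire: the first block here is 9 characters long but takes up exactly 12 bytes.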
