Python: how is the str.join(iterable) method implemented for linear-time string concatenation?

Question

Ngo*_*ong 6 python string algorithm string-concatenation

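I am trying to implement my own version of Python's str.join method, e.g. ''.join(['aa','bbb','cccc']) returns 'aabbbcccc'. I know that concatenating strings with join has linear complexity (in the number of characters of the result), and I would like to know how that is achieved, because using the '+' operator in a for loop has quadratic complexity, e.g.: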

res = ''
for word in ['aa', 'bbb', 'cccc']:
    res = res + word


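Since strings are immutable, every iteration copies everything into a new string, which makes the running time quadratic. I would like to know how to do this in linear time, or find out exactly how ''.join works.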
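To make the quadratic-versus-linear gap concrete, here is a small cost model that only counts copied characters. It assumes that every `res + word` allocates a fresh string and copies both operands; some CPython versions can avoid that copy in certain cases, so this is a sketch of the worst case rather than a measurement. The function names are made up for illustration.

```python
def copies_with_plus(words):
    """Count characters copied if every `res + word` builds a brand-new string."""
    copied = 0
    length = 0
    for word in words:
        length += len(word)
        # The new string holds the old content plus the new word,
        # so all `length` characters are written again.
        copied += length
    return copied


def copies_with_preallocation(words):
    """Count characters copied if each one is written once into a preallocated buffer."""
    return sum(len(word) for word in words)


words = ["ab"] * 1000
print(copies_with_plus(words))           # 1001000
print(copies_with_preallocation(words))  # 2000
```

Under this model the '+' loop grows with the square of the number of words, while a preallocated buffer grows with the size of the result.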

I could not find a linear-time algorithm anywhere, nor the implementation of str.join(iterable). Any help is much appreciated.

Answer 1

Mis*_*agi 5

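Joining str as actual str is a red herring, and not what Python itself does: Python operates on mutable bytes, not on the str, which also removes the need to know about string internals. Specifically, str.join converts its arguments to bytes, then pre-allocates and mutates its result.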

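This corresponds directly to:

a str wrapper that encodes/decodes the arguments to/from bytes

summing the len of the elements and separators

allocating a mutable bytearray in which to construct the result

copying each element/separator directly into the result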

# helper to convert to/from joinable bytes
def str_join(sep: "str", elements: "list[str]") -> "str":
    joined_bytes = bytes_join(
        sep.encode(),
        [elem.encode() for elem in elements],
    )
    return joined_bytes.decode()

# actual joining at bytes level
def bytes_join(sep: "bytes", elements: "list[bytes]") -> "bytes":
    # create a mutable buffer that is long enough to hold the result
    total_length = sum(len(elem) for elem in elements)
    # one separator between each pair of elements; none for an empty list
    total_length += max(len(elements) - 1, 0) * len(sep)
    result = bytearray(total_length)
    # copy all characters from the inputs to the result
    insert_idx = 0
    for elem in elements:
        result[insert_idx:insert_idx+len(elem)] = elem
        insert_idx += len(elem)
        # a separator follows every element except the last one
        if insert_idx < total_length:
            result[insert_idx:insert_idx+len(sep)] = sep
            insert_idx += len(sep)
    return bytes(result)

print(str_join(" ", ["Hello", "World!"]))
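Notably, while the element iteration and the element copying are basically two nested loops, they iterate over separate things. The algorithm still only touches each character/byte thrice/once.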
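Since the question asks about str.join(iterable), one detail is worth spelling out: the length pass and the copy pass both need the elements, so an arbitrary iterable (a generator, for instance) has to be materialized first. The following is a minimal sketch of that idea using the same encode, preallocate and copy approach; `join_any_iterable` is a hypothetical name, and this is not meant to mirror CPython's actual implementation.

```python
def join_any_iterable(sep, iterable):
    """Join strings from any iterable using one preallocated buffer (illustrative sketch)."""
    # Materialize the iterable: it is traversed twice (lengths, then copies),
    # and a generator could only be consumed once.
    parts = [item.encode() for item in iterable]
    sep_bytes = sep.encode()
    if not parts:
        return ""

    # First pass: compute the exact size of the result.
    total = sum(len(part) for part in parts) + (len(parts) - 1) * len(sep_bytes)
    buf = bytearray(total)

    # Second pass: copy every byte exactly once into its final position.
    pos = 0
    for index, part in enumerate(parts):
        if index:  # a separator goes before every element except the first
            buf[pos:pos + len(sep_bytes)] = sep_bytes
            pos += len(sep_bytes)
        buf[pos:pos + len(part)] = part
        pos += len(part)
    return buf.decode()


print(join_any_iterable(", ", (str(n) for n in range(5))))  # 0, 1, 2, 3, 4
```

The extra list costs memory proportional to the number of elements, but the total work stays linear in the size of the result.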

Archived: 4 years, 7 months ago
Views: 1609
Last recorded: 4 years, 7 months ago