How to mock a file opened in text mode in Python

Question

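I am looking into ways of testing some code that operates on files, but I would like to write tests that depend only on specific strings inside the test source file, rather than on a particular file existing somewhere in the file system.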

I know that io.StringIO can give a string a file-like stream interface.

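The problem is that the operations do not follow the same semantics. For example, when the string contains non-ASCII characters, combining file.seek() and file.read() gives different results depending on whether the file object comes from open() or from io.StringIO: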

import io

#      'abgdezhjiklmnxoprstufqyw'
text = 'αβγδεζηθικλμνξoπρστυφχψω'


# encoding is given explicitly so the result does not depend on the platform default
with open('test.txt', 'w', encoding='utf-8') as file_obj:
    file_obj.write(text)


with open('test.txt', 'r', encoding='utf-8') as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# εζηθικλμ


with io.StringIO(text) as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# ικλμνξoπ

Strings that contain only ASCII characters do not show this problem:

import io

text = 'abgdezhjiklmnxoprstufqyw'


with open('test.txt', 'w') as file_obj:
    file_obj.write(text)


with open('test.txt', 'r') as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# iklmnxop


with io.StringIO(text) as file_obj:
    file_obj.seek(8)
    print(file_obj.read(8))
# iklmnxop

Evidently, this is because .seek() treats its offset argument with bytes semantics for files opened with open(), whereas for io.StringIO it follows str semantics.
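The difference can be reproduced without any file object, as a rough sketch that assumes UTF-8 encoding: slicing the encoded bytes at offset 8 lands on a different character than slicing the str at index 8, because each Greek letter occupies two bytes. Note that the Python documentation only defines text-mode seek() for offsets of zero or values returned by tell(), so the byte-offset behavior shown in the question is what CPython happens to do for UTF-8 rather than a documented guarantee.

```python
text = 'αβγδεζηθικλμνξoπρστυφχψω'

# bytes semantics: skip 8 bytes (four 2-byte Greek letters), then take 8 characters
encoded = text.encode('utf-8')
by_bytes = encoded[8:].decode('utf-8')[:8]

# str semantics: skip 8 characters, then take 8 characters
by_chars = text[8:16]

print(by_bytes)
print(by_chars)
```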

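I do understand that, for performance reasons, it would not be feasible for seek() to follow str semantics even when the file is opened in text mode.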

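So my question is: how can I get an equivalent of io.StringIO() whose seek method follows bytes semantics? Do I need to subclass io.StringIO myself, or is there a better way?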

Answer 1

Ara*_*Fey 5

You can use BytesIO and TextIOWrapper to emulate the behavior of a real file:

import io

text = 'αβγδεζηθικλμνξoπρστυφχψω'

with io.BytesIO(text.encode('utf8')) as binary_file:
    with io.TextIOWrapper(binary_file, encoding='utf8') as file_obj:
        file_obj.seek(8)
        print(file_obj.read(8))
        # εζηθικλμ

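For use in a test, the same idea could be wrapped in a small helper and combined with unittest.mock.patch so that code calling open() never touches the file system. This is only a sketch: make_text_file and read_slice are hypothetical names introduced for illustration (they are not part of the question or the answer), and UTF-8 is assumed as the encoding.

```python
import io
from unittest import mock


def make_text_file(text, encoding='utf-8'):
    """Hypothetical helper: an in-memory text file whose seek() behaves like a real file's."""
    return io.TextIOWrapper(io.BytesIO(text.encode(encoding)), encoding=encoding)


def read_slice(path, offset, size):
    """Hypothetical code under test: seeks into a text file and reads a few characters."""
    with open(path, 'r', encoding='utf-8') as file_obj:
        file_obj.seek(offset)
        return file_obj.read(size)


text = 'αβγδεζηθικλμνξoπρστυφχψω'

# Every call to open() inside the block returns a fresh in-memory file instead
with mock.patch('builtins.open',
                side_effect=lambda *args, **kwargs: make_text_file(text)):
    result = read_slice('does-not-exist.txt', 8, 8)

print(result)
```

Using side_effect rather than return_value hands each open() call its own stream, which matters if the code under test opens the file more than once.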
