dam*_*mon 4 python string list
When I call readlines() on an .srt file, I get a list of strings with a lot of leading and trailing whitespace, as shown below
def read_lines(infile):  # wrapper name is illustrative; the original showed only the body
    with open(infile) as f:
        r = f.readlines()
    return r
I get this list
['\xef\xbb\xbf1\r\n', '00:00:00,000 --> 00:00:03,000\r\n', "[D. Evans] Now that you've written your first Python program,\r\n",'\r\n', '2\r\n', '00:00:03,000 --> 00:00:06,000\r\n', 'you might be wondering why we need to invent new languages like Python\r\n', '\r\n']
For brevity I have included only a few elements. How can I clean up this list so that all the whitespace characters are removed and only the relevant elements remain?
['1','00:00:00,000 --> 00:00:03,000',"[D. Evans] Now that you've written your first Python program"...]
Jor*_*dan 11
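You can strip each line. If you are dealing with a large file, running it as a generator will also save some memory.
Also, judging by the first few characters, it looks like you are dealing with a UTF-8 file with a BOM (which is a bit silly, or at least unnecessary), so you will need to open it differently.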
import codecs

def strip_it_good(file):
    with codecs.open(file, "r", "utf-8-sig") as f:
        for line in f:
            yield line.strip()
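Note that stripping alone leaves empty strings where the file had blank lines, while the desired output in the question contains only the non-empty entries. A minimal sketch of a variant that also skips those, assuming Python 3 (where the built-in open() accepts an encoding argument); the name clean_lines and the sample data are hypothetical and used only for illustration:

```python
import os
import tempfile

def clean_lines(path):
    """Yield stripped, non-empty lines from a UTF-8 file, dropping any BOM."""
    # "utf-8-sig" decodes UTF-8 and skips a leading BOM if one is present.
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            stripped = line.strip()
            if stripped:  # skip lines that contained only whitespace
                yield stripped

# Small demonstration with made-up subtitle data written to a temp file.
sample = b"\xef\xbb\xbf1\r\n00:00:00,000 --> 00:00:03,000\r\nHello\r\n\r\n2\r\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".srt") as tmp:
    tmp.write(sample)
result = list(clean_lines(tmp.name))
os.remove(tmp.name)
print(result)
```

Because this is still a generator, wrap the call in list() only when the whole file is actually needed in memory.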