I have a very large gzip-compressed file (5000 columns × 1M rows) consisting of 0s and 1s:
0 1 1 0 0 0 1 1 1....(×5000)
0 0 0 1 0 1 1 0 0
....(×1M)
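I want to transpose it, but numpy and the other methods I have tried load the whole table into RAM, and I only have 6GB available.
So I would like a method that writes each transposed row to an open file instead of keeping it in RAM. I came up with the following code: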
import gzip

with open("output.txt", "w") as out:
    with gzip.open("file.txt", "rt") as file:
        number_of_columns = len(file.readline().split())
        # iterate over number of columns (~5000)
        for column in range(number_of_columns):
            # in each iteration, go to the top line to start again
            file.seek(0)
            # initiate list storing the ith column's elements that will form the transposed column
            transposed_column = []
            # iterate …
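
For illustration, the column-by-column idea described above might be completed roughly as follows. This is only a sketch, not the original (truncated) code: the function name `transpose_by_column_blocks` and the `cols_per_pass` parameter are hypothetical, and it assumes whitespace-separated fields with the same number of fields on every line. With `cols_per_pass=1` it makes one full pass over the file per column, as in the approach above; larger values trade memory for fewer passes.

```python
import gzip


def transpose_by_column_blocks(src_path, dst_path, cols_per_pass=1):
    """Transpose a gzip-compressed, whitespace-separated table.

    Hypothetical helper: only `cols_per_pass` columns are held in
    memory at any time; the input is rescanned once per block.
    """
    # Read the first line once to learn the column count.
    with gzip.open(src_path, "rt") as src:
        number_of_columns = len(src.readline().split())

    with open(dst_path, "w") as out:
        for start in range(0, number_of_columns, cols_per_pass):
            stop = min(start + cols_per_pass, number_of_columns)
            # One list per column in the current block.
            block = [[] for _ in range(stop - start)]
            # Reopen the file for each pass, i.e. rescan from the top.
            with gzip.open(src_path, "rt") as src:
                for line in src:
                    fields = line.split()
                    if not fields:
                        continue  # skip blank lines
                    for offset, value in enumerate(fields[start:stop]):
                        block[offset].append(value)
            # Each collected column becomes one row of the output.
            for column in block:
                out.write(" ".join(column) + "\n")
```

With the file names from the code above, a call could look like `transpose_by_column_blocks("file.txt", "output.txt", cols_per_pass=100)`. Memory use grows with `cols_per_pass` times the number of rows, so the value has to be chosen to fit the available RAM, and each pass still decompresses the whole file.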