Posts by use*_*061

Python: when writing a large file, keep the file open, or open it and append as needed?

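I am wondering how best to handle writing large files in Python.

My Python code runs an external program (ancient Fortran with a weird input file format) many times in a loop, reads its output (a one-line file), does some very simple processing, and writes to a compiled output file. The external program executes quickly (well under 1 second).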

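import subprocess as sp

# Option 1: open the compiled output once and keep it open for the whole loop
f_compiled_out = open("compiled.output", "w")

# large_integer is the number of runs
for i in range(large_integer):

  write_input_for_legacy_program = prepare_input()

  # run the legacy Fortran program, which writes "legacy.output"
  sp.call(["legacy.program"])

  with open("legacy.output", "r") as f:
    line = f.readline()

  output = process(line)

  f_compiled_out.write(output)


f_compiled_out.close()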


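I can think of three options for producing the compiled output file:

1. What I am already doing.
2. Opening f_compiled_out on every cycle of the main loop, using with open("compiled.output", "a") as f: f.write(output)
3. Using awk to do the simple processing and cat to put the output at the end of "compiled.output".

So what are the overheads of (1) keeping a large file open and writing to the end of it, versus (2) opening it and appending on every write, versus (3) using awk to do the processing and cat to build up "compiled.output"?

At no stage does the entire output need to be held in memory.

P.S. If anyone can see any other obvious things that would slow this down as N_loops gets large, that would be awesome too!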
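One way to get a feel for options (1) and (2) is simply to time them. The following is a minimal sketch, not a definitive benchmark: it does not run the legacy program, the records list is a made-up stand-in for the processed one-line outputs, and the helper names (write_keep_open, write_reopen_append, timed) are hypothetical. In the real loop each iteration also launches a subprocess, which is likely to cost far more than either way of writing, so it is worth measuring on the actual workload.

```python
import os
import tempfile
import time


def write_keep_open(path, lines):
    """Option 1: open once and write every record through the same handle."""
    with open(path, "w") as f:
        for line in lines:
            f.write(line)


def write_reopen_append(path, lines):
    """Option 2: reopen the file in append mode for every record."""
    open(path, "w").close()  # start from an empty file
    for line in lines:
        with open(path, "a") as f:
            f.write(line)


def timed(func, path, lines):
    """Return the wall-clock seconds taken by func(path, lines)."""
    start = time.perf_counter()
    func(path, lines)
    return time.perf_counter() - start


# Hypothetical stand-in for the processed outputs of the legacy program.
records = [f"record {i}\n" for i in range(2000)]

with tempfile.TemporaryDirectory() as tmp:
    path_open = os.path.join(tmp, "keep_open.output")
    path_append = os.path.join(tmp, "reopen_append.output")
    t_open = timed(write_keep_open, path_open, records)
    t_append = timed(write_reopen_append, path_append, records)
    # Both strategies should produce byte-for-byte identical files.
    with open(path_open) as f1, open(path_append) as f2:
        same_content = f1.read() == f2.read()

print(f"keep open:     {t_open:.4f} s")
print(f"reopen/append: {t_append:.4f} s")
print(f"identical output: {same_content}")
```

Neither strategy holds the whole output in memory: both write one record at a time. The difference is that the keep-open version lets Python buffer writes, while the reopen version flushes and closes on every record.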

python io performance

use*_*061

2014 05-15

3
votes

1
answer

9103
views

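geom_point: control the radius exactly rather than scaling it

My data consists of a set of circles with radii. x, y and radius are all on the same scale.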

x    y    radius
0.1  0.8  0.1
0.4  0.4  0.2
0.6  0.2  0.9
0.3  0.6  0.5
0.5  0.5  0.2
...
0.9  0.1  0.1


When I use:

myplot <- ggplot() + geom_point(data=df, aes(x=x, y=y, size=(2*radius)))


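The resulting plot is a bubble chart in which the point sizes are scaled in proportion to the radius. I want a bubble chart where radius of bubble = radius (i.e. the bubble radius is in the original data units).

How can I achieve this (in ggplot2)?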

r ggplot2

use*_*061

2014 05-28

2
votes

1
answer

1782
views

Tag statistics

ggplot2 ×1

io ×1

performance ×1

python ×1

r ×1
