Why is node.js faster than python at reading files?

Question

I am profiling node.js against python on synchronous reads of a file (48 KB).

Node.js code

var fs = require('fs');
var stime = new Date().getTime() / 1000;

for (var i=0; i<1000; i++){
  var content = fs.readFileSync('npm-debug.log');
}

console.log("Total time took is: " + ((new Date().getTime() / 1000) - stime));

Python code

import time
stime = time.time()
for i in range(1000):
    with open('npm-debug.log', mode='r') as infile:
        ax = infile.read();

print("Total time is: " + str(time.time() - stime));

The timings are as follows:

$ python test.py
Total time is: 0.5195660591125488

$ node test.js
Total time took is: 0.25799989700317383

Where does the difference come from?

In file IO, or
in Python list data-structure allocation?

Or am I not comparing apples to apples?

Edit:

Changed Python's readlines() to read() for a fair comparison
Changed the number of iterations from 500 to 1000

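Purpose:

To understand whether it is true that node.js is slower than python, which in turn is slower than C-like languages, and if so, where the slowness comes from in this case.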
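On the apples-to-apples point raised above, one thing that may be worth checking is text mode versus binary mode. In Python 3, open(..., mode='r') decodes the bytes into a str and translates newlines, whereas fs.readFileSync called without an encoding returns a raw Buffer. The following is a minimal sketch, assuming Python 3; time_reads and the generated sample file are hypothetical stand-ins, not part of the original test.

```python
import os
import tempfile
import time


def time_reads(path, mode, n=1000):
    """Return the seconds spent opening and fully reading `path` n times."""
    start = time.perf_counter()
    for _ in range(n):
        with open(path, mode) as f:
            f.read()
    return time.perf_counter() - start


# Hypothetical stand-in for npm-debug.log: roughly 48 KB of ASCII text.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("verbose log line for the benchmark\n" * 1400)
    sample_path = tmp.name

text_seconds = time_reads(sample_path, "r")     # decodes bytes into str
binary_seconds = time_reads(sample_path, "rb")  # returns raw bytes, no decoding
print(f"text mode:   {text_seconds:.4f} s")
print(f"binary mode: {binary_seconds:.4f} s")

os.remove(sample_path)
```

If the two numbers turn out close on your machine, decoding is probably not the main cost and the gap lies elsewhere; the result depends on the platform, the Python version, and the file contents.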

Answer 1

For*_*Bru 7

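readlines returns a list of the lines in the file, so it has to read the data character by character, constantly comparing the current character against every newline character, and keep building up the list of lines.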

This is more complicated than a simple file.read(), which is the equivalent of what Node.js does.

Also, the length calculated by your Python script is the number of lines, while the Node.js one is the number of characters.
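The difference between the two calls can be seen on a tiny in-memory sample. This is only an illustrative sketch: the three-line string is a hypothetical stand-in for the log file, and io.StringIO is used so that no file on disk is needed.

```python
import io

# Hypothetical three-line sample standing in for the log file.
sample = "first line\nsecond line\nthird line\n"

lines = io.StringIO(sample).readlines()  # list of lines, split at each newline
text = io.StringIO(sample).read()        # one string holding the whole content

print(len(lines))  # counts lines
print(len(text))   # counts characters
```

Joining the list back together with "".join(lines) gives the same string as read(), so the extra work in readlines is the line splitting and list building.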

If you want even more speed, use os.open instead of open:

import os, time


def Test_os(n):
    for x in range(n):
        f = os.open('Speed test.py', os.O_RDONLY)
        data = ""
        t = os.read(f, 1048576).decode('utf8')
        while t:
            data += t
            t = os.read(f, 1048576).decode('utf8')
        os.close(f)

def Test_open(n):
    for x in range(n):
        with open('Speed test.py') as f:
            data = f.read()

s = time.monotonic()
Test_os(500000)
print(time.monotonic() - s)

s = time.monotonic()
Test_open(500000)
print(time.monotonic() - s)

On my machine, os.open is a few seconds faster than open. The output is as follows:

53.68909174999999
58.12600833400029

As you can see, open is 4.4 seconds slower than os.open, although this difference shrinks as the number of runs decreases.
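To put the 4.4 seconds in perspective, the arithmetic can be worked out per call. The sketch below only uses the two totals printed above and the 500000 iterations from the test script; the variable names are illustrative.

```python
runs = 500_000
os_open_total = 53.68909174999999  # seconds, printed for Test_os
open_total = 58.12600833400029     # seconds, printed for Test_open

difference = open_total - os_open_total
per_call = difference / runs

print(f"total difference: {difference:.2f} s")                 # 4.44 s
print(f"per call: {per_call * 1e6:.2f} microseconds")          # 8.87 microseconds
```

So the gap is on the order of a few microseconds per read, which is why it only becomes visible over a very large number of runs.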

Also, you should try tuning the buffer size passed to os.read, as different values may give very different timings.

Here, "operation" means a single call to Test_os.
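One way to explore the buffer size is to time the same read loop for a handful of values. The following is a rough sketch rather than the measurement used in this answer: read_all, seconds_per_call, the list of buffer sizes, and the generated 48 KB sample file are all assumptions made for illustration.

```python
import os
import tempfile
import time


def read_all(path, buf):
    """Read the whole file with os.read, at most `buf` bytes per call."""
    # O_BINARY exists only on Windows; fall back to 0 elsewhere.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        chunks = []
        chunk = os.read(fd, buf)
        while chunk:  # os.read returns b'' at end of file
            chunks.append(chunk)
            chunk = os.read(fd, buf)
    finally:
        os.close(fd)
    return b"".join(chunks)


def seconds_per_call(path, buf, n=200):
    """Average wall-clock seconds of one read_all call over n repetitions."""
    start = time.perf_counter()
    for _ in range(n):
        read_all(path, buf)
    return (time.perf_counter() - start) / n


# Hypothetical 48 KB sample file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 48 * 1024)
    sample_path = tmp.name

results = {buf: seconds_per_call(sample_path, buf)
           for buf in (512, 4096, 65536, 1048576)}
for buf, secs in results.items():
    print(f"buf={buf:>8}: {secs * 1e6:.1f} microseconds per call")

payload = read_all(sample_path, 4096)
os.remove(sample_path)
```

The absolute numbers will vary with the operating system, the file system cache, and the file size, so it is worth running the sweep against your own file.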

If you get rid of the bytes decoding and use io.BytesIO instead of plain bytes objects, you will get a considerable speedup:

import io
import os

def Test_os(n, buf):
    for x in range(n):
        f = os.open('test.txt', os.O_RDONLY)
        data = io.BytesIO()
        # os.read returns b'' at end of file, so write() returns 0 and the loop ends
        while data.write(os.read(f, buf)):
            ...
        os.close(f)

So the best result is now 0.038 seconds per call instead of 0.052 (about a 37% speedup).
