I'm performance testing Dask using "Distributed Pandas on a Cluster with Dask DataFrames" as a guide.
Matthew's example uses a 20GB file and 64 workers (8 physical nodes).
In my case, I have an 82GB file and 288 workers (12 physical nodes, each also running an HDFS data node).
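My benchmark follows the same read-and-persist pattern as the post. A minimal sketch of that pattern (the scheduler address and HDFS path are placeholders; Dask resolves hdfs:// URLs through its pyarrow/hdfs3 backends):

from dask.distributed import Client, wait
import dask.dataframe as dd

client = Client('scheduler-host:8786')  # placeholder scheduler address

# Read the CSV from HDFS into a Dask DataFrame, persist it across
# the workers, and block until the load finishes.
df = dd.read_csv('hdfs:///path/to/file.csv')
df = client.persist(df)
wait(df)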
On all 12 nodes, I can access HDFS and execute a simple Python script that displays info on a file:
import pyarrow as pa

fs = pa.hdfs.connect([url], 8022)  # [url] is the namenode host
print(fs.info('/path/to/file.csv'))
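To confirm that the worker processes themselves (not just a locally run script) can reach HDFS, the same check can be pushed to every worker with Client.run. A sketch, using the same placeholder hostnames as above:

from dask.distributed import Client
import pyarrow as pa

client = Client('scheduler-host:8786')  # placeholder scheduler address

def check_hdfs():
    # Runs inside each worker process; mirrors the local check above.
    fs = pa.hdfs.connect('namenode-host', 8022)  # placeholder namenode host
    return str(fs.info('/path/to/file.csv'))

# Client.run executes the function on every worker and returns a dict
# mapping worker address -> result.
print(client.run(check_hdfs))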