Phi*_*hil 6 python hadoop hdfs tensorflow
The following script runs very slowly. All I want to do is count the total number of lines in the Twitter follower graph (a text file of about 26 GB).
I need the data for a machine-learning task; this is just a test of accessing data on HDFS through TensorFlow.
import tensorflow as tf
import time

filename_queue = tf.train.string_input_producer(
    ["hdfs://default/twitter/twitter_rv.net"], num_epochs=1, shuffle=False)

def read_filename_queue(filename_queue):
    reader = tf.TextLineReader()
    _, line = reader.read(filename_queue)
    return line

line = read_filename_queue(filename_queue)

session_conf = tf.ConfigProto(intra_op_parallelism_threads=1500,
                              inter_op_parallelism_threads=1500)

with tf.Session(config=session_conf) as sess:
    sess.run(tf.initialize_local_variables())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    start = time.time()
    i = 0
    while True:
        i = i + 1
        if i % 100000 == 0:
            print(i)
            print(time.time() - start)
        try:
            sess.run([line])
        except tf.errors.OutOfRangeError:
            print('end of file')
            break
    print('total number of lines = ' + str(i))
    print(time.time() - start)
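For context on where the time goes: each sess.run([line]) call fetches exactly one line, so the Python-to-runtime round-trip overhead is paid roughly once per line of the 26 GB file. As a point of comparison, here is a plain-Python sketch that counts lines by reading fixed-size binary chunks, so the per-call cost is paid once per megabyte instead of once per line (the local-file access and chunk size are illustrative assumptions, not a drop-in replacement for the HDFS read):

```python
def count_lines_chunked(path, chunk_size=1 << 20):
    """Count newline-terminated lines by reading fixed-size binary chunks.

    One read() call per chunk_size bytes (default 1 MiB) instead of one
    call per line, so per-call overhead is amortized over many lines.
    """
    count = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Count newlines inside this chunk; correct across chunk
            # boundaries because each '\n' byte is seen exactly once.
            count += chunk.count(b'\n')
    return count
```

The same idea applies to the TensorFlow pipeline: fetching many records per run call, rather than one, is what removes the bottleneck.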
The process takes about 40 seconds for the first 100,000 lines. I have tried setting intra_op_parallelism_threads and inter_op_parallelism_threads to 0, 4, 8, 40, 400 and 1500, but none of these settings significantly affected the execution time...
Can you help me?
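At that rate the full file would take far too long for a simple line count. A back-of-the-envelope extrapolation (the total edge count used here is the ~1.47 billion figure commonly reported for the twitter_rv.net dataset, which is an assumption, not something measured from this file):

```python
# Observed throughput: 100,000 lines in ~40 seconds.
lines_per_second = 100_000 / 40  # 2,500 lines/s

# Assumed total: ~1.47e9 edges, one edge per line, as commonly
# reported for twitter_rv.net; treat as an estimate only.
total_lines = 1_470_000_000

estimated_hours = total_lines / lines_per_second / 3600
print(round(estimated_hours))  # ≈ 163 hours at the observed rate
```

This is why the thread-count knobs make no visible difference: the per-record session overhead, not op-level parallelism, dominates the runtime.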
System specs:
Views: 616