Can someone explain the following TensorFlow terms:
inter_op_parallelism_threads
intra_op_parallelism_threads
Or, alternatively, please provide a link to a source with the correct explanation.
I ran some tests by varying these parameters, but the results were inconsistent, so I could not draw a conclusion.
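For reference, both values are normally supplied through tf.ConfigProto when the session is created. The sketch below is a minimal example of how they are typically set; the thread counts used here are arbitrary placeholders, not recommendations:

import tensorflow as tf

# intra_op_parallelism_threads: threads used to parallelise work *inside*
# a single op (e.g. a large matmul or reduction).
# inter_op_parallelism_threads: threads used to run *independent* ops
# concurrently.
config = tf.ConfigProto(
    intra_op_parallelism_threads=4,   # placeholder value
    inter_op_parallelism_threads=2)   # placeholder value

with tf.Session(config=config) as sess:
    a = tf.random_normal([1000, 1000])
    b = tf.random_normal([1000, 1000])
    print(sess.run(tf.matmul(a, b)).shape)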
I use the following command to run the Spark Java word-count example:
time spark-submit --deploy-mode cluster --master spark://192.168.0.7:6066 --class org.apache.spark.examples.JavaWordCount /home/pi/Desktop/example/new/target/javaword.jar /books_50.txt
When I run it, the output is:
Running Spark using the REST application submission protocol.
16/07/18 03:55:41 INFO rest.RestSubmissionClient: Submitting a request to launch an application in spark://192.168.0.7:6066.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Submission successfully created as driver-20160718035543-0000. Polling submission state...
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Submitting a request for the status of submission driver-20160718035543-0000 in spark://192.168.0.7:6066.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: State of driver driver-20160718035543-0000 is now RUNNING.
16/07/18 03:55:44 INFO rest.RestSubmissionClient: Driver is …
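For reference, the computation that JavaWordCount performs corresponds roughly to the following PySpark sketch (the app name is an arbitrary assumption; the input path is the same file passed on the command line above):

from pyspark import SparkContext

sc = SparkContext(appName="PyWordCount")    # app name is a placeholder
counts = (sc.textFile("/books_50.txt")      # same input file as above
            .flatMap(lambda line: line.split(" "))
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
for word, count in counts.collect():
    print(word, count)
sc.stop()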
I am trying to save variables through checkpoints so that I can add fault tolerance to my program. I am trying to achieve this using the MonitoredTrainingSession function. Below is my configuration:
import tensorflow as tf

global_step = tf.Variable(10, trainable=False, name='global_step')
x = tf.constant(2)

with tf.device("/job:local/task:0"):
    y1 = tf.Variable(x + 300)

with tf.device("/job:local/task:1"):
    y2 = tf.Variable(x**2)

with tf.device("/job:local/task:2"):
    y3 = tf.Variable(5*x)

with tf.device("/job:local/task:3"):
    y0 = tf.Variable(x - 66)

y = y0 + y1 + y2 + y3

model = tf.global_variables_initializer()
saver = tf.train.Saver(sharded=True)

chief = tf.train.ChiefSessionCreator(scaffold=None, master='grpc://localhost:2222', config=None, checkpoint_dir='/home/tensorflow/codes/checkpoints')

summary_hook = tf.train.SummarySaverHook(save_steps=None, save_secs=10, output_dir='/home/tensorflow/codes/savepoints', summary_writer=None, scaffold=None, summary_op=tf.summary.tensor_summary(name="y", tensor=y))

saver_hook = tf.train.CheckpointSaverHook(checkpoint_dir='/home/tensorflow/codes/checkpoints', save_secs=None, save_steps=True, saver=saver, checkpoint_basename='model.ckpt', scaffold=None)
# with tf.train.MonitoredSession(session_creator=ChiefSessionCreator,hooks=[saver_hook, summary_hook]) …
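For comparison, here is a minimal, self-contained sketch of periodic checkpointing with tf.train.MonitoredTrainingSession; the checkpoint directory and save interval are placeholder assumptions, and a global step is created explicitly because the internal CheckpointSaverHook needs one to track. In a distributed setup the master argument (e.g. the grpc://localhost:2222 address above) and is_chief would also be passed:

import tensorflow as tf

# A trivial "training" op: just increment the global step each iteration.
global_step = tf.train.get_or_create_global_step()
increment = tf.assign_add(global_step, 1)

with tf.train.MonitoredTrainingSession(
        checkpoint_dir='/tmp/ckpt_demo',      # placeholder directory
        save_checkpoint_secs=10) as sess:     # write a checkpoint every 10 s
    while not sess.should_stop():
        step = sess.run(increment)
        if step >= 100:
            break

If the process is killed and restarted, MonitoredTrainingSession restores the latest checkpoint found in checkpoint_dir and the run resumes from there, which is the fault-tolerance behaviour being asked about.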