tf.train.replica_device_setter
tf.train.replica_device_setter
tf.train.replica_device_setter
replica_device_setter( ps_tasks=0, ps_device='/job:ps', worker_device='/job:worker', merge_devices=True, cluster=None, ps_ops=None, ps_strategy=None )
Defined in tensorflow/python/training/device_setter.py
.
See the guide: Training > Distributed execution
Return a device function
to use when building a Graph for replicas.
Device Functions are used in with tf.device(device_function):
statement to automatically assign devices to Operation
objects as they are constructed, Device constraints are added from the inner-most context first, working outwards. The merging behavior adds constraints to fields that are yet unset by a more inner context. Currently the fields are (job, task, cpu/gpu).
If cluster
is None
, and ps_tasks
is 0, the returned function is a no-op. Otherwise, the value of ps_tasks
is derived from cluster
.
By default, only Variable ops are placed on ps tasks, and the placement strategy is round-robin over all ps tasks. A custom ps_strategy
may be used to do more intelligent placement, such as tf.contrib.training.GreedyLoadBalancingStrategy
.
For example,
# To build a cluster with two ps jobs on hosts ps0 and ps1, and 3 worker # jobs on hosts worker0, worker1 and worker2. cluster_spec = { "ps": ["ps0:2222", "ps1:2222"], "worker": ["worker0:2222", "worker1:2222", "worker2:2222"]} with tf.device(tf.train.replica_device_setter(cluster=cluster_spec)): # Build your graph v1 = tf.Variable(...) # assigned to /job:ps/task:0 v2 = tf.Variable(...) # assigned to /job:ps/task:1 v3 = tf.Variable(...) # assigned to /job:ps/task:0 # Run compute
Args:
-
ps_tasks
: Number of tasks in theps
job. Ignored ifcluster
is provided. -
ps_device
: String. Device of theps
job. If empty nops
job is used. Defaults tops
. -
worker_device
: String. Device of theworker
job. If empty noworker
job is used. -
merge_devices
:Boolean
. IfTrue
, merges or only sets a device if the device constraint is completely unset. merges device specification rather than overriding them. -
cluster
:ClusterDef
proto orClusterSpec
. -
ps_ops
: List of strings representingOperation
types that need to be placed onps
devices. IfNone
, defaults to["Variable"]
. -
ps_strategy
: A callable invoked for every psOperation
(i.e. matched byps_ops
), that takes theOperation
and returns the ps task index to use. IfNone
, defaults to a round-robin strategy across allps
devices.
Returns:
A function to pass to tf.device()
.
Raises:
TypeError if cluster
is not a dictionary or ClusterDef
protocol buffer, or if ps_strategy
is provided but not a callable.
© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/train/replica_device_setter