tf.contrib.training.FailureTolerator

class tf.contrib.training.FailureTolerator

Defined in tensorflow/contrib/training/python/training/failure_tolerator.py.

Helper for tolerating certain exceptions.

When a handled exception is raised inside a tolerator.forgive() block, it is suppressed (but logged). The next call to tolerator.forgive() sleeps for a period of time before continuing, with exponential backoff across repeated exceptions. (The delay avoids retrying too quickly: a subsequent attempt will often succeed only after a transient failure has resolved itself.)

Once more than limit exceptions have been encountered, the error is no longer suppressed.

Exceptions occurring more than forgive_after_seconds ago (excluding time spent waiting between retries) are forgiven and no longer count towards the limit.
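The limit and forgiveness rules can be pictured as a sliding window over failure timestamps. The following is a minimal pure-Python sketch under stated assumptions, not TensorFlow's implementation: the helper names unforgiven_failures and should_suppress and the explicit now argument are hypothetical, and the sketch does not model the exclusion of time spent waiting between retries.

```python
def unforgiven_failures(failure_times, now, forgive_after_seconds):
    """Return the failure timestamps that still count towards the limit."""
    # A failure is forgiven once it is more than forgive_after_seconds old.
    return [t for t in failure_times if now - t <= forgive_after_seconds]


def should_suppress(failure_times, now, limit=5, forgive_after_seconds=6000):
    """True while the number of unforgiven failures does not exceed limit."""
    remaining = unforgiven_failures(failure_times, now, forgive_after_seconds)
    return len(remaining) <= limit
```

With the default limit of 5, a sixth failure inside the window would not be suppressed, while old failures that have aged out of the window free up room again.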

An example loop using FailureTolerator to retry until a successful session.run(...) would look like:

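import tensorflow as tf  # TF 1.x; tf.contrib was removed in TF 2.x

failure_tolerator = tf.contrib.training.FailureTolerator()
while True:
  with failure_tolerator.forgive():
    session = make_session_somehow()  # placeholder: build or restore a session
    while not should_stop():  # placeholder stopping condition
      session.run(...)
    break  # session.run was successful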

With FailureTolerator, failures are logged, retries are separated by delays, and the number of retries is capped. (In the case of persistent errors, the task won't just loop forever!)

Methods

__init__

__init__(
    limit=5,
    init_delay=5.0,
    backoff_factor=2.0,
    forgive_after_seconds=6000,
    handled_exceptions=None
)

Creates a FailureTolerator.

The resulting tolerator pauses for init_delay * (backoff_factor^(failure_count-1)) seconds when re-entering forgive() after a failure.
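As a worked example of the formula above, the following sketch computes the pause for the first five failures using the default arguments. The function name backoff_delay is hypothetical, and returning 0.0 when there have been no failures is an assumption made for illustration.

```python
def backoff_delay(failure_count, init_delay=5.0, backoff_factor=2.0):
    """Pause in seconds before re-entering forgive() after failure_count failures."""
    if failure_count < 1:
        return 0.0  # assumption: no pause before any failure has occurred
    return init_delay * backoff_factor ** (failure_count - 1)


delays = [backoff_delay(n) for n in range(1, 6)]
print(delays)  # [5.0, 10.0, 20.0, 40.0, 80.0]
```

With the defaults, the pauses before the limit is reached add up to 155 seconds.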

Args:

  • limit: The maximum number of suppressed, unforgiven failures.
  • init_delay: How long to pause once the first failure is encountered. Defaults to five seconds.
  • backoff_factor: Each subsequent failure grows the pause by this factor.
  • forgive_after_seconds: Failures older than this are forgiven.
  • handled_exceptions: The exceptions to forgive. Defaults to (errors.AbortedError,).

forgive

forgive(
    *args,
    **kwds
)
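This reference gives no description for forgive(). From the class description, it is used as a context manager (with tolerator.forgive():), and a generic *args, **kwds signature is typical of a method wrapped with Python's contextlib.contextmanager, which suggests but does not confirm that it is built that way. The sketch below shows the general pattern with a hypothetical ToyTolerator class. It is not the TensorFlow implementation: it omits the sleep between retries and the forgiveness window, and it handles RuntimeError purely for illustration.

```python
import contextlib
import logging


class ToyTolerator:
    """Illustrative stand-in; not the TensorFlow class."""

    def __init__(self, limit=5, handled_exceptions=(RuntimeError,)):
        self.limit = limit
        self.handled_exceptions = handled_exceptions
        self.failure_count = 0

    @contextlib.contextmanager
    def forgive(self):
        try:
            yield
        except self.handled_exceptions as e:
            self.failure_count += 1
            if self.failure_count > self.limit:
                raise  # too many failures: stop suppressing
            logging.warning("Suppressed failure %d: %r", self.failure_count, e)


tolerator = ToyTolerator(limit=2)
attempts = 0
while True:
    with tolerator.forgive():
        attempts += 1
        if attempts < 3:
            raise RuntimeError("transient failure")
        break  # third attempt succeeds
```

Here the first two attempts fail and are suppressed, and the third leaves the loop. Any exception type not listed in handled_exceptions propagates immediately.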

© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/training/FailureTolerator
