tf.contrib.data.TFRecordDataset

class tf.contrib.data.TFRecordDataset

Defined in tensorflow/contrib/data/python/ops/dataset_ops.py.

A Dataset comprising records from one or more TFRecord files.

Properties

output_shapes

output_types

Methods

__init__

__init__(
    filenames,
    compression_type=None
)

Creates a TFRecordDataset.

Args:

  • filenames: A tf.string tensor containing one or more filenames.
  • compression_type: A tf.string scalar evaluating to one of "" (no compression), "ZLIB", or "GZIP".
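
For example, a minimal sketch of constructing a dataset from compressed record files (the file names here are hypothetical):

import tensorflow as tf

# Hypothetical record files; any list or tf.string tensor of file names works.
filenames = ["/tmp/file1.tfrecords", "/tmp/file2.tfrecords"]
dataset = tf.contrib.data.TFRecordDataset(filenames, compression_type="GZIP")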

batch

batch(batch_size)

Combines consecutive elements of this dataset into batches.

Args:

  • batch_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.

Returns:

A Dataset.
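
For example, a short sketch assuming a dataset of scalar integers; note that the final batch may be smaller than batch_size:

dataset = tf.contrib.data.Dataset.range(7)
batched = dataset.batch(3)   # yields [0, 1, 2], [3, 4, 5], [6]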

dense_to_sparse_batch

dense_to_sparse_batch(
    batch_size,
    row_shape
)

Batches ragged elements of this dataset into tf.SparseTensors.

Like Dataset.padded_batch(), this method combines multiple consecutive elements of this dataset, which might have different shapes, into a single element. The resulting element has three components (indices, values, and dense_shape), which comprise a tf.SparseTensor that represents the same data. The row_shape represents the dense shape of each row in the resulting tf.SparseTensor, to which the effective batch size is prepended. For example:

# NOTE: The following examples use `{ ... }` to represent the
# contents of a dataset.
a = { ['a', 'b', 'c'], ['a', 'b'], ['a', 'b', 'c', 'd'] }

a.dense_to_sparse_batch(batch_size=2, row_shape=[6]) == {
    ([[0, 0], [0, 1], [0, 2], [1, 0], [1, 1]],  # indices
     ['a', 'b', 'c', 'a', 'b'],                 # values
     [2, 6]),                                   # dense_shape
    ([[0, 0], [0, 1], [0, 2], [0, 3]],
     ['a', 'b', 'c', 'd'],
     [1, 6])
}

Args:

  • batch_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
  • row_shape: A tf.TensorShape or tf.int64 vector tensor-like object representing the equivalent dense shape of a row in the resulting tf.SparseTensor. Each element of this dataset must have the same rank as row_shape, and must have size less than or equal to row_shape in each dimension.

Returns:

A Dataset.

enumerate

enumerate(start=0)

Enumerate the elements of this dataset. Similar to Python's enumerate.

For example:

# NOTE: The following examples use `{ ... }` to represent the
# contents of a dataset.
a = { 1, 2, 3 }
b = { (7, 8), (9, 10), (11, 12) }

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
a.enumerate(start=5) == { (5, 1), (6, 2), (7, 3) }
b.enumerate() == { (0, (7, 8)), (1, (9, 10)), (2, (11, 12)) }

Args:

  • start: A tf.int64 scalar tf.Tensor, representing the start value for enumeration.

Returns:

A Dataset.

filter

filter(predicate)

Filters this dataset according to predicate.

Args:

  • predicate: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.bool tensor.

Returns:

A Dataset.
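
For example, a brief sketch keeping only the even elements of a dataset of scalar integers:

dataset = tf.contrib.data.Dataset.range(10)
evens = dataset.filter(lambda x: tf.equal(x % 2, 0))  # { 0, 2, 4, 6, 8 }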

flat_map

flat_map(map_func)

Maps map_func across this dataset and flattens the result.

Args:

  • map_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a Dataset.

Returns:

A Dataset.
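
For example, a sketch in which map_func maps each element to a small dataset, and the results are concatenated:

# { 1, 2, 3 } -> { 1, 1, 1, 2, 2, 2, 3, 3, 3 }
dataset = tf.contrib.data.Dataset.from_tensor_slices([1, 2, 3])
flat = dataset.flat_map(
    lambda x: tf.contrib.data.Dataset.from_tensors(x).repeat(3))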

from_sparse_tensor_slices

from_sparse_tensor_slices(sparse_tensor)

Splits each rank-N tf.SparseTensor in this dataset row-wise.

Args:

  • sparse_tensor: A tf.SparseTensor.

Returns:

A Dataset of rank-(N-1) sparse tensors.

from_tensor_slices

from_tensor_slices(tensors)

Creates a Dataset whose elements are slices of the given tensors.

Args:

  • tensors: A nested structure of tensors, each having the same size in the 0th dimension.

Returns:

A Dataset.
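
For example, a sketch slicing a pair of tensors along their shared 0th dimension:

features = tf.constant([[1, 2], [3, 4], [5, 6]])  # shape [3, 2]
labels = tf.constant([0, 1, 0])                   # shape [3]
dataset = tf.contrib.data.Dataset.from_tensor_slices((features, labels))
# Yields three elements: ([1, 2], 0), ([3, 4], 1), ([5, 6], 0).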

from_tensors

from_tensors(tensors)

Creates a Dataset with a single element, comprising the given tensors.

Args:

  • tensors: A nested structure of tensors.

Returns:

A Dataset.
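
For contrast with from_tensor_slices, a brief sketch:

t = tf.constant([[1, 2], [3, 4]])              # shape [2, 2]
tf.contrib.data.Dataset.from_tensors(t)        # one element of shape [2, 2]
tf.contrib.data.Dataset.from_tensor_slices(t)  # two elements of shape [2]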

group_by_window

group_by_window(
    key_func,
    reduce_func,
    window_size
)

Performs a windowed "group-by" operation on this dataset.

This method maps each consecutive element in this dataset to a key using key_func and groups the elements by key. It then applies reduce_func to at most window_size elements matching the same key. All except the final window for each key will contain window_size elements; the final window may be smaller.

Args:

  • key_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to a scalar tf.int64 tensor.
  • reduce_func: A function mapping a key and a dataset of up to window_size consecutive elements matching that key to another dataset.
  • window_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements matching the same key to combine in a single batch, which will be passed to reduce_func.

Returns:

A Dataset.
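
For example, a sketch grouping scalar integers by parity and batching each window:

dataset = tf.contrib.data.Dataset.range(10)
grouped = dataset.group_by_window(
    key_func=lambda x: x % 2,                        # 0 for even, 1 for odd
    reduce_func=lambda key, window: window.batch(4),
    window_size=4)
# Possible elements: [0, 2, 4, 6], [1, 3, 5, 7], [8], [9]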

make_dataset_resource

make_dataset_resource()

make_initializable_iterator

make_initializable_iterator(shared_name=None)

Creates an Iterator for enumerating the elements of this dataset.

N.B. The returned iterator will be in an uninitialized state, and you must run the iterator.initializer operation before using it.

Args:

  • shared_name: (Optional.) If non-empty, this iterator will be shared under the given name across multiple sessions that share the same devices (e.g. when using a remote server).

Returns:

An Iterator over the elements of this dataset.
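
For example, a sketch of the initialize-then-run pattern:

dataset = tf.contrib.data.Dataset.range(5)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)   # must run before get_next()
    while True:
        try:
            print(sess.run(next_element))
        except tf.errors.OutOfRangeError:
            break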

make_one_shot_iterator

make_one_shot_iterator()

Creates an Iterator for enumerating the elements of this dataset.

N.B. The returned iterator will be initialized automatically. A "one-shot" iterator does not currently support re-initialization.

Returns:

An Iterator over the elements of this dataset.
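
For example, a brief sketch; no explicit initialization step is required:

dataset = tf.contrib.data.Dataset.range(3)
iterator = dataset.make_one_shot_iterator()   # initialized automatically
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))  # 0
    print(sess.run(next_element))  # 1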

map

map(
    map_func,
    num_threads=None,
    output_buffer_size=None
)

Maps map_func across this dataset.

Args:

  • map_func: A function mapping a nested structure of tensors (having shapes and types defined by self.output_shapes and self.output_types) to another nested structure of tensors.
  • num_threads: (Optional.) A tf.int32 scalar tf.Tensor, representing the number of threads to use for processing elements in parallel. If not specified, elements will be processed sequentially without buffering.
  • output_buffer_size: (Optional.) A tf.int64 scalar tf.Tensor, representing the maximum number of processed elements that will be buffered when processing in parallel.

Returns:

A Dataset.
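
For example, a sketch squaring each element while processing elements in parallel:

dataset = tf.contrib.data.Dataset.range(10)
squares = dataset.map(
    lambda x: x * x,
    num_threads=4,           # process up to 4 elements in parallel
    output_buffer_size=100)  # buffer up to 100 processed elements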

padded_batch

padded_batch(
    batch_size,
    padded_shapes,
    padding_values=None
)

Combines consecutive elements of this dataset into padded batches.

Like Dataset.dense_to_sparse_batch(), this method combines multiple consecutive elements of this dataset, which might have different shapes, into a single element. The tensors in the resulting element have an additional outer dimension, and are padded to the respective shape in padded_shapes.

Args:

  • batch_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
  • padded_shapes: A nested structure of tf.TensorShape or tf.int64 vector tensor-like objects representing the shape to which the respective component of each input element should be padded prior to batching. Any unknown dimensions (e.g. tf.Dimension(None) in a tf.TensorShape or -1 in a tensor-like object) will be padded to the maximum size of that dimension in each batch.
  • padding_values: (Optional.) A nested structure of scalar-shaped tf.Tensor, representing the padding values to use for the respective components. Defaults are 0 for numeric types and the empty string for string types.

Returns:

A Dataset.
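
For example, a sketch padding variable-length rows to the longest row in each batch:

dataset = tf.contrib.data.Dataset.range(4)
dataset = dataset.map(lambda x: tf.fill([tf.cast(x, tf.int32)], x))
# Elements: [], [1], [2, 2], [3, 3, 3]
padded = dataset.padded_batch(2, padded_shapes=[None])
# First batch: [[0], [1]]; second batch: [[2, 2, 0], [3, 3, 3]]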

range

range(*args)

Creates a Dataset of a step-separated range of values.

For example:

Dataset.range(5) == [0, 1, 2, 3, 4]
Dataset.range(2, 5) == [2, 3, 4]
Dataset.range(1, 5, 2) == [1, 3]
Dataset.range(1, 5, -2) == []
Dataset.range(5, 1) == []
Dataset.range(5, 1, -2) == [5, 3]

Args:

  • *args: Follows the same semantics as Python's xrange.
      len(args) == 1 -> start = 0, stop = args[0], step = 1.
      len(args) == 2 -> start = args[0], stop = args[1], step = 1.
      len(args) == 3 -> start = args[0], stop = args[1], step = args[2].

Returns:

A RangeDataset.

Raises:

  • ValueError: if len(args) == 0.

read_batch_features

read_batch_features(
    file_pattern,
    batch_size,
    features,
    reader,
    reader_args=None,
    randomize_input=True,
    num_epochs=None,
    capacity=10000
)

Reads batches of Examples.

Args:

  • file_pattern: A string pattern or a placeholder containing a list of filenames.
  • batch_size: A tf.int64 scalar tf.Tensor, representing the number of consecutive elements of this dataset to combine in a single batch.
  • features: A dict mapping feature keys to FixedLenFeature or VarLenFeature values. See tf.parse_example.
  • reader: A function or class that can be called with a filenames tensor and (optional) reader_args and returns a Dataset of serialized Examples.
  • reader_args: Additional arguments to pass to the reader class.
  • randomize_input: Whether the input should be randomized.
  • num_epochs: Integer specifying the number of times to read through the dataset. If None, cycles through the dataset forever.
  • capacity: Capacity of the ShuffleDataset.

Returns:

A Dataset.
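
For example, a sketch with a hypothetical file pattern and feature spec; this assumes read_batch_features can be called on the class, with TFRecordDataset itself serving as the reader:

dataset = tf.contrib.data.TFRecordDataset.read_batch_features(
    file_pattern="/tmp/train-*.tfrecords",          # hypothetical path
    batch_size=32,
    features={
        "image": tf.FixedLenFeature([], tf.string),
        "label": tf.FixedLenFeature([], tf.int64),
    },
    reader=tf.contrib.data.TFRecordDataset)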

repeat

repeat(count=None)

Repeats this dataset count times.

Args:

  • count: (Optional.) A tf.int64 scalar tf.Tensor, representing the number of times the elements of this dataset should be repeated. The default behavior (if count is None or -1) is for the elements to be repeated indefinitely.

Returns:

A Dataset.

shuffle

shuffle(
    buffer_size,
    seed=None
)

Randomly shuffles the elements of this dataset.

Args:

  • buffer_size: A tf.int64 scalar tf.Tensor, representing the number of elements from this dataset from which the new dataset will sample.
  • seed: (Optional.) A tf.int64 scalar tf.Tensor, representing the random seed that will be used to create the distribution. See tf.set_random_seed for behavior.

Returns:

A Dataset.
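
For example, a sketch of a typical training input pipeline that shuffles, repeats, and batches (the file name is hypothetical):

dataset = tf.contrib.data.TFRecordDataset(["/tmp/train.tfrecords"])
dataset = dataset.shuffle(buffer_size=10000, seed=42)
dataset = dataset.repeat()   # cycle through the data indefinitely
dataset = dataset.batch(32)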

skip

skip(count)

Creates a Dataset that skips count elements from this dataset.

Args:

  • count: A tf.int64 scalar tf.Tensor, representing the number of elements of this dataset that should be skipped to form the new dataset. If count is greater than the size of this dataset, the new dataset will contain no elements. If count is -1, skips the entire dataset.

Returns:

A Dataset.

take

take(count)

Creates a Dataset with at most count elements from this dataset.

Args:

  • count: A tf.int64 scalar tf.Tensor, representing the number of elements of this dataset that should be taken to form the new dataset. If count is -1, or if count is greater than the size of this dataset, the new dataset will contain all elements of this dataset.

Returns:

A Dataset.

unbatch

unbatch()

Splits elements of this dataset into sequences of consecutive elements.

For example, if elements of this dataset are shaped [B, a0, a1, ...], where B may vary from element to element, then for each element in this dataset, the unbatched dataset will contain B consecutive elements of shape [a0, a1, ...].

Returns:

A Dataset.
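
For example, a brief sketch; unbatching a batched dataset restores the original elements even when batch sizes vary:

dataset = tf.contrib.data.Dataset.range(6).batch(4)  # { [0, 1, 2, 3], [4, 5] }
flat = dataset.unbatch()                             # { 0, 1, 2, 3, 4, 5 }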

zip

zip(datasets)

Creates a Dataset by zipping together the given datasets.

This method has similar semantics to the built-in zip() function in Python, with the main difference being that the datasets argument can be an arbitrary nested structure of Dataset objects. For example:

# NOTE: The following examples use `{ ... }` to represent the
# contents of a dataset.
a = { 1, 2, 3 }
b = { 4, 5, 6 }
c = { (7, 8), (9, 10), (11, 12) }
d = { 13, 14 }

# The nested structure of the `datasets` argument determines the
# structure of elements in the resulting dataset.
Dataset.zip((a, b)) == { (1, 4), (2, 5), (3, 6) }
Dataset.zip((b, a)) == { (4, 1), (5, 2), (6, 3) }

# The `datasets` argument may contain an arbitrary number of
# datasets.
Dataset.zip((a, b, c)) == { (1, 4, (7, 8)),
                           (2, 5, (9, 10)),
                           (3, 6, (11, 12)) }

# The number of elements in the resulting dataset is the same as
# the size of the smallest dataset in `datasets`.
Dataset.zip((a, d)) == { (1, 13), (2, 14) }

Args:

  • datasets: A nested structure of datasets.

Returns:

A Dataset.

© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/data/TFRecordDataset
