tf.contrib.lookup.index_table_from_file
index_table_from_file( vocabulary_file=None, num_oov_buckets=0, vocab_size=None, default_value=-1, hasher_spec=tf.contrib.lookup.FastHashSpec, key_dtype=tf.string, name=None )
Defined in tensorflow/contrib/lookup/lookup_ops.py.
Returns a lookup table that converts a string tensor into int64 IDs.
This operation constructs a lookup table to convert a tensor of strings into int64 IDs. The mapping can be initialized from a vocabulary file specified in vocabulary_file, where the whole line is the key and the zero-based line number is the ID.
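To make the file format concrete, here is a minimal pure-Python sketch (not TensorFlow's implementation) of how a vocabulary file maps to IDs: each whole line, with its trailing newline removed, becomes a key, and its zero-based line number becomes the ID. The helper name load_vocab and the temporary file are illustrative assumptions.

```python
import os
import tempfile


def load_vocab(path):
    """Hypothetical helper: map each whole line of `path` to its zero-based line number."""
    vocab = {}
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f):
            # The whole line (minus its trailing newline) is the key.
            # This sketch does not check for duplicate lines.
            vocab[line.rstrip("\n")] = line_number
    return vocab


# Write a small vocabulary file with one entry per line, then load it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("emerson\nlake\npalmer\n")
    vocab = load_vocab(path)

print(vocab)  # {'emerson': 0, 'lake': 1, 'palmer': 2}
```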
Any lookup of an out-of-vocabulary token will return a bucket ID based on its hash if num_oov_buckets is greater than zero. Otherwise it is assigned the default_value. The bucket ID range is [vocabulary size, vocabulary size + num_oov_buckets - 1].
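The lookup rule above can be modeled in plain Python as follows. This is an illustrative sketch, not TensorFlow's code: the lookup_id name is hypothetical, its optional hasher callable only loosely plays the role of hasher_spec, and zlib.crc32 is a stand-in for the hash function that hasher_spec selects, so the exact bucket a given token lands in will differ from TensorFlow's. What it does show is the structure: known tokens keep their line number, unknown tokens map to vocab_size + hash % num_oov_buckets, and default_value is used only when there are no OOV buckets.

```python
import zlib


def lookup_id(token, vocab, num_oov_buckets=0, default_value=-1, hasher=None):
    """Hypothetical model of the ID assignment described above."""
    if token in vocab:
        return vocab[token]  # in-vocabulary: zero-based line number
    if num_oov_buckets > 0:
        # Stand-in hash (not TensorFlow's); crc32 is non-negative in Python 3.
        hasher = hasher or (lambda s: zlib.crc32(s.encode("utf-8")))
        return len(vocab) + hasher(token) % num_oov_buckets
    return default_value  # no OOV buckets: fall back to default_value


vocab = {"emerson": 0, "lake": 1, "palmer": 2}
features = ["emerson", "lake", "and", "palmer"]

# With one OOV bucket, every unknown token maps to ID 3 (= vocabulary size).
print([lookup_id(t, vocab, num_oov_buckets=1) for t in features])  # [0, 1, 3, 2]

# Without OOV buckets, unknown tokens get default_value.
print([lookup_id(t, vocab) for t in features])  # [0, 1, -1, 2]

# With 4 buckets, OOV IDs fall in [3, 3 + 4 - 1] = [3, 6].
oov_ids = [lookup_id(t, vocab, num_oov_buckets=4) for t in ["and", "the", "crimson", "king"]]
assert all(3 <= i <= 6 for i in oov_ids)
```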
The underlying table must be initialized once, by calling tf.tables_initializer().run() or table.init.run().
Sample Usages:
If we have a vocabulary file "test.txt" with the following content:
emerson
lake
palmer

import tensorflow as tf  # TensorFlow 1.x, where tf.contrib is available

features = tf.constant(["emerson", "lake", "and", "palmer"])
table = tf.contrib.lookup.index_table_from_file(
    vocabulary_file="test.txt", num_oov_buckets=1)
ids = table.lookup(features)
...
tf.tables_initializer().run()

ids.eval()  ==> [0, 1, 3, 2]  # where 3 is the out-of-vocabulary bucket
Args:
- vocabulary_file: The vocabulary filename.
- num_oov_buckets: The number of out-of-vocabulary buckets.
- vocab_size: Number of elements in the vocabulary, if known.
- default_value: The value to use for out-of-vocabulary feature values. Defaults to -1.
- hasher_spec: A HasherSpec to specify the hash function to use for assignment of out-of-vocabulary buckets.
- key_dtype: The key data type.
- name: A name for this op (optional).
Returns:
The lookup table to map a key_dtype Tensor to an index int64 Tensor.
Raises:
- ValueError: If vocabulary_file is not set.
- ValueError: If num_oov_buckets is negative or vocab_size is not greater than zero.
© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/lookup/index_table_from_file