contrib.keras.preprocessing.sequence.skipgrams
tf.contrib.keras.preprocessing.sequence.skipgrams
tf.contrib.keras.preprocessing.sequence.skipgrams
skipgrams( sequence, vocabulary_size, window_size=4, negative_samples=1.0, shuffle=True, categorical=False, sampling_table=None )
Defined in tensorflow/contrib/keras/python/keras/preprocessing/sequence.py
.
Generates skipgram word pairs.
Takes a sequence (list of indexes of words), returns couples of [word_index, other_word index] and labels (1s or 0s), where label = 1 if 'other_word' belongs to the context of 'word', and label=0 if 'other_word' is randomly sampled
Arguments:
sequence: a word sequence (sentence), encoded as a list of word indices (integers). If using a `sampling_table`, word indices are expected to match the rank of the words in a reference dataset (e.g. 10 would encode the 10-th most frequently occurring token). Note that index 0 is expected to be a non-word and will be skipped. vocabulary_size: int. maximum possible word index + 1 window_size: int. actually half-window. The window of a word wi will be [i-window_size, i+window_size+1] negative_samples: float >= 0. 0 for no negative (=random) samples. 1 for same number as positive samples. etc. shuffle: whether to shuffle the word couples before returning them. categorical: bool. if False, labels will be integers (eg. [0, 1, 1 .. ]), if True labels will be categorical eg. [[1,0],[0,1],[0,1] .. ] sampling_table: 1D array of size `vocabulary_size` where the entry i encodes the probabibily to sample a word of rank i.
Returns:
couples, labels: where `couples` are int pairs and `labels` are either 0 or 1.
Note
By convention, index 0 in the vocabulary is a non-word and will be skipped.
© 2017 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/api_docs/python/tf/contrib/keras/preprocessing/sequence/skipgrams