Computing the Edit Distance Between Sequences in TensorFlow

Updated 2018-10-12 16:38

tf.edit_distance

edit_distance(
    hypothesis,
    truth,
    normalize=True,
    name='edit_distance'
)

Defined in: tensorflow/python/ops/array_ops.py.

See the guide: Math > Sequence Comparison and Indexing

Computes the edit distance between sequences.

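This operation takes variable-length sequences (hypothesis and truth), each provided as a SparseTensor, and computes the edit distance (Levenshtein distance) between them. Setting normalize to True normalizes each edit distance by the length of the corresponding truth sequence.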
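As a rough illustration of what is computed for one hypothesis/truth pair, here is a minimal pure-Python sketch. It is not TensorFlow's implementation. The function names `levenshtein` and `sequence_edit_distance` are hypothetical, and the handling of an empty truth sequence (infinity when the hypothesis is non-empty, 0.0 when both are empty) is an assumption made for this sketch.

```python
import math


def levenshtein(hypothesis, truth):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `hypothesis` into `truth`."""
    # prev[j] is the distance between the hypothesis prefix processed
    # so far and the first j elements of truth.
    prev = list(range(len(truth) + 1))
    for i, h in enumerate(hypothesis, start=1):
        curr = [i]
        for j, t in enumerate(truth, start=1):
            cost = 0 if h == t else 1
            curr.append(min(prev[j] + 1,           # delete h
                            curr[j - 1] + 1,       # insert t
                            prev[j - 1] + cost))   # match or substitute
        prev = curr
    return prev[-1]


def sequence_edit_distance(hypothesis, truth, normalize=True):
    """Hypothetical helper: edit distance, optionally divided by len(truth)."""
    distance = float(levenshtein(hypothesis, truth))
    if not normalize:
        return distance
    if len(truth) == 0:
        # Assumption of this sketch: nothing to divide by, so report
        # infinity unless both sequences are empty.
        return math.inf if distance else 0.0
    return distance / len(truth)


print(sequence_edit_distance(["b"], ["b", "c"]))  # 0.5: one insertion over a truth of length 2
```

With `normalize=False` the same call returns the raw count of edits, 1.0.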

For example, given the following input:

# 'hypothesis' is a tensor of shape `[2, 1]` with variable-length values:
#   (0,0) = ["a"]
#   (1,0) = ["b"]
hypothesis = tf.SparseTensor(
    [[0, 0, 0],
     [1, 0, 0]],
    ["a", "b"],
    (2, 1, 1))

# 'truth' is a tensor of shape `[2, 2]` with variable-length values:
#   (0,0) = []
#   (0,1) = ["a"]
#   (1,0) = ["b", "c"]
#   (1,1) = ["a"]
truth = tf.SparseTensor(
    [[0, 1, 0],
     [1, 0, 0],
     [1, 0, 1],
     [1, 1, 0]],
    ["a", "b", "c", "a"],
    (2, 2, 2))

normalize = True

This operation would return the following:

# 'output' is a tensor of shape `[2, 2]` with edit distances normalized
# by 'truth' lengths.
output ==> [[inf, 1.0],  # (0,0): no truth, (0,1): no hypothesis
           [0.5, 1.0]]  # (1,0): addition, (1,1): no hypothesis
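
The comments in the example list one sequence per (row, column) cell. The sketch below shows how the sparse indices map onto those sequences, assuming the last index dimension is the position within a sequence and the leading dimensions identify the cell. It uses plain Python lists rather than a SparseTensor, and the helper name `group_sequences` is hypothetical.

```python
from collections import defaultdict


def group_sequences(indices, values):
    """Group sparse entries into sequences keyed by every index
    dimension except the last one."""
    groups = defaultdict(dict)
    for index, value in zip(indices, values):
        *cell, position = index
        groups[tuple(cell)][position] = value
    # Order each sequence by its position along the last dimension.
    return {cell: [seq[p] for p in sorted(seq)] for cell, seq in groups.items()}


truth_sequences = group_sequences(
    [[0, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]],
    ["a", "b", "c", "a"],
)
print(truth_sequences)
# {(0, 1): ['a'], (1, 0): ['b', 'c'], (1, 1): ['a']}
```

Cells that never appear in the indices, such as (0, 0) for truth, correspond to empty sequences, which is why the output above reports "no truth" for that cell.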