kl_div differs from rel_entr: kl_div is defined elementwise as

z = rel_entr(x, y) - x + y

i.e. x*log(x/y) - x + y rather than just x*log(x/y).
When x and y are normalized (each sums to 1), the extra -x + y terms cancel in the sum and the two functions give the same total, as already documented (thanks!). But most sources in information theory write something like "the Kullback–Leibler divergence, also known as the relative entropy…", treating the two names as synonyms, so I cannot find where this finer distinction between the two definitions comes from.
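For concreteness, here is a small sketch (using `scipy.special.kl_div` and `scipy.special.rel_entr`) showing that the two functions differ elementwise on unnormalized inputs but agree in total on normalized distributions:

```python
import numpy as np
from scipy.special import kl_div, rel_entr

# Unnormalized positive vectors: the two functions differ elementwise.
x = np.array([0.5, 1.0, 2.0])
y = np.array([1.0, 1.0, 1.0])

# kl_div(x, y) = x*log(x/y) - x + y; rel_entr(x, y) = x*log(x/y).
# The identity kl_div = rel_entr - x + y holds elementwise:
assert np.allclose(kl_div(x, y), rel_entr(x, y) - x + y)

# Their sums differ here, since sum(x) != sum(y):
print(kl_div(x, y).sum(), rel_entr(x, y).sum())

# Normalize both to probability distributions: the -x + y terms
# cancel in the sum, so the totals coincide.
p = x / x.sum()
q = y / y.sum()
assert np.allclose(kl_div(p, q).sum(), rel_entr(p, q).sum())
```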
Does anybody have a reference?