Simply:

- `categorical_crossentropy` (`cce`) produces a one-hot array containing the probable match for each category,
- `sparse_categorical_crossentropy` (`scce`) produces a category index of the most likely matching category.
Consider a classification problem with 5 categories (or classes).
- In the case of `cce`, the one-hot target may be `[0, 1, 0, 0, 0]` and the model may predict `[.2, .5, .1, .1, .1]` (probably right).
- In the case of `scce`, the target index may be `[1]` and the model may predict `[.5]`.
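On this 5-class example, both losses reduce to the same quantity, the negative log-probability of the true class; they differ only in how the target is written. A minimal pure-Python sketch (the Keras loss classes `CategoricalCrossentropy` and `SparseCategoricalCrossentropy` compute the equivalent, with batching and numerical clipping):

```python
import math

# 5-class example from above: the true class is index 1
y_pred = [0.2, 0.5, 0.1, 0.1, 0.1]

# categorical_crossentropy with one-hot target [0, 1, 0, 0, 0]:
# -sum(t_i * log(p_i)) reduces to -log(p[1])
one_hot = [0, 1, 0, 0, 0]
loss_cce = -sum(t * math.log(p) for t, p in zip(one_hot, y_pred))

# sparse_categorical_crossentropy with integer target 1:
# looks up the same probability directly
target = 1
loss_scce = -math.log(y_pred[target])

print(round(loss_cce, 4), round(loss_scce, 4))  # both -log(0.5) ≈ 0.6931
```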
Consider now a classification problem with 3 classes.
- In the case of `cce`, the one-hot target might be `[0, 0, 1]` and the model may predict `[.5, .1, .4]` (probably inaccurate, given that it gives more probability to the first class).
- In the case of `scce`, the target index might be `[2]`, and the model may predict `[.4]`.
Many categorical models produce `scce` output because it saves space, but you lose a LOT of information (for example, in the 2nd example, index 2 was also very close). I generally prefer `cce` output for model reliability.
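The information loss above can be seen directly: collapsing the 3-class prediction from the 2nd example down to a single index discards the fact that the runner-up class was nearly as likely. A small illustrative sketch:

```python
# Prediction from the 2nd example above
y_pred = [0.5, 0.1, 0.4]

# Keeping the full probability array preserves the runner-up information
top = max(range(len(y_pred)), key=lambda i: y_pred[i])  # index 0 wins...
runner_up = sorted(y_pred, reverse=True)[1]             # ...but 0.4 was very close

print(top, runner_up)
```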
There are a number of situations to use `scce`, including:

- when your classes are mutually exclusive, i.e. you don't care at all about other close-enough predictions,
- when the number of categories is so large that the prediction output becomes overwhelming.
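The space savings from the second point are easy to quantify: sparse targets store one integer per sample, while one-hot targets store a full row per sample. A rough back-of-the-envelope comparison (the sample and class counts are made-up illustrative values):

```python
# Rough storage comparison for the targets, assuming 4-byte values
n_samples, n_classes = 100_000, 10_000  # e.g. a large-vocabulary problem

sparse_bytes = n_samples * 4                # one int index per sample
one_hot_bytes = n_samples * n_classes * 4   # a full float row per sample

# One-hot targets are larger by a factor of n_classes
print(one_hot_bytes // sparse_bytes)
```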
2022-04-05: response to the "one-hot encoding" comments:

One-hot encoding is used for a categorical feature INPUT to select a specific category (e.g. male versus female). This encoding allows the model to train more efficiently: each training weight is multiplied by its category value, which is 0 for every category except the given one.
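One-hot encoding of an input feature can be sketched in a few lines (the `one_hot` helper and the example categories here are illustrative, not from any particular library):

```python
def one_hot(index, num_categories):
    """Encode a category index as a one-hot vector: 1 at the index, 0 elsewhere."""
    return [1 if i == index else 0 for i in range(num_categories)]

categories = ["male", "female"]
encoded = one_hot(categories.index("female"), len(categories))
print(encoded)  # [0, 1]
```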
`cce` and `scce` are a model OUTPUT. `cce` is a probability array over the categories, totaling 1.0. `scce` shows the MOST LIKELY category, again totaling 1.0. `scce` is technically a one-hot array, just like a hammer used as a door stop is still a hammer, but its purpose is different. `cce` is NOT one-hot.
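To see why `cce` output is NOT one-hot: a softmax layer (the usual final layer in this kind of classifier) spreads probability across every class, so the array sums to 1.0 without any entry being exactly 0 or 1. A minimal pure-Python sketch of softmax (illustrative logit values):

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution summing to 1.0."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)       # a full probability array, NOT one-hot
print(sum(probs))  # sums to 1.0
```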