PyTorch - Summary of the Official Loss Functions

Author: AIHGF
Published: February 10, 2020
Categories: Deep learning platforms

Mainly based on the PyTorch documentation, Loss functions.
PyTorch == 1.4.0

1. L1Loss

class torch.nn.L1Loss(size_average=None,
                      reduce=None, 
                      reduction='mean')

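1.1. Description

L1Loss computes the mean absolute error (MAE) between each element of the input x and the target y.

$x$ and $y$ are tensors of arbitrary shape with $n$ elements each.

[1] - Unreduced (reduction='none'):

$$ \ell(x, y) = L = \{l_1, ..., l_N\}^T, l_n = |x_n - y_n| $$

where $N$ is the batch size.

[2] - Reduced (reduction='mean' or reduction='sum'):

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'} \end{cases} $$

With reduction='mean', the sum still runs over all elements and is then divided by $n$.

Setting reduction='sum' skips the division by $n$.

1.2. Parameters

[1] - size_average and reduce are deprecated; avoid using them.

[2] - reduction: one of none, mean, or sum. Defaults to mean.

none: no reduction is applied.
mean: the output is summed and divided by the number of elements.
sum: the output is summed but not divided by the number of elements.

1.3. Shape

[1] - Input - input x, $(N, \ast)$, where $\ast$ stands for any number of additional dimensions.

[2] - Input - target y, $(N, \ast)$, same shape as the input.

[3] - Output - a scalar by default. If reduction='none', the output has the same shape as the input, $(N, \ast)$.

1.4. Example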

import torch 
import torch.nn as nn

loss = nn.L1Loss()
input = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)
output = loss(input, target)
output.backward()
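
To make the three reduction modes concrete, the following sketch compares them on a small pair of tensors. The values of x and y are illustrative assumptions chosen so the arithmetic is easy to follow; they do not come from the PyTorch documentation.

```python
import torch
import torch.nn as nn

# Hand-picked values (hypothetical), shape (2, 3)
x = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
y = torch.tensor([[1.5, 2.0, 2.0], [4.0, 7.0, 6.0]])

per_element = nn.L1Loss(reduction='none')(x, y)  # same shape as the input
total = nn.L1Loss(reduction='sum')(x, y)         # scalar, no division
mean = nn.L1Loss(reduction='mean')(x, y)         # scalar, total / n

print(per_element.shape, total.item(), mean.item())
```

The element-wise errors here are 0.5, 0, 1, 0, 2, 0, so sum yields 3.5 and mean yields 3.5 / 6.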

2. MSELoss

class torch.nn.MSELoss(size_average=None,
                       reduce=None,
                       reduction='mean')

2.1. Description

MSELoss computes the mean squared error (squared L2 norm) between each element of the input x and the target y.

[1] - Unreduced (reduction='none'):

$$ \ell(x,y) = L = \{l_1, ..., l_N\}^T, l_n = (x_n - y_n)^2 $$

where $N$ is the batch size.

[2] - Reduced (reduction='mean' or reduction='sum'):

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

With reduction='mean', the sum still runs over all elements and is then divided by $n$.

Setting reduction='sum' skips the division by $n$.

2.2. Parameters

Same as 1.2. Parameters.

2.3. Shape

[1] - Input - input x, $(N, \ast)$.

[2] - Input - target y, $(N, \ast)$, same shape as the input.

2.4. Example

import torch 
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)
output = loss(input, target)
output.backward()

3. CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, 
                                size_average=None, 
                                ignore_index=-100,
                                reduce=None,
                                reduction='mean')

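3.1. Description

CrossEntropyLoss, the cross-entropy loss, combines nn.LogSoftmax() and nn.NLLLoss() in a single class.

It is used for classification problems with $C$ classes.

If the weight argument is given, it should be a 1D Tensor assigning a weight to each class. This is particularly useful when the training set is unbalanced.

The input to CrossEntropyLoss holds the raw, unnormalized score for each class. The input is a Tensor of shape $(minibatch, C)$ or $(minibatch, C, d_1, d_2, ..., d_K), (K \geq 1)$.

CrossEntropyLoss expects each target to be a class index in the range $[0, C-1]$. If ignore_index is set, the loss also accepts that value as a target (ignore_index need not lie in $[0, C-1]$).

[1] - Without the weight argument (unweighted):

$$ loss(x, class) = -log (\frac{exp(x[class])}{\sum_j exp(x[j])}) = -x[class] + log(\sum_j exp(x[j])) $$

[2] - With the weight argument (weighted):

$$ loss(x, class) = weight[class](-x[class] + log(\sum_j exp(x[j]))) $$

By default, the losses are averaged over the samples in each minibatch.

CrossEntropyLoss can also be used for higher-dimensional inputs such as 2D images. The input shape is then $(minibatch, C, d_1, d_2, ..., d_K), (K \geq 1)$, where $K$ is the number of extra dimensions.

3.2. Parameters

[1] - size_average and reduce are deprecated; avoid using them.

[2] - weight: a rescaling weight for each class, a Tensor of size $C$.

[3] - ignore_index: a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over non-ignored targets only.

[4] - reduction: one of none, mean, or sum. Defaults to mean.

none: no reduction is applied.
mean: the output is summed and divided by the number of elements.
sum: the output is summed but not divided by the number of elements.

3.3. Shape

[1] - Input - input, $(N, C)$, where $C$ is the number of classes, or $(N, C, d_1, d_2, ..., d_K), K\geq 1$.

[2] - Input - target, $(N)$, where each value satisfies $0 \leq target[i] \leq C-1$, or $(N, d_1, d_2, ..., d_K)$.

[3] - Output - output, a scalar. If reduction='none', it has the same size as the target: $(N)$, or $(N, d_1, d_2, ..., d_K)$.

3.4. Example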

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
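
The equivalence with LogSoftmax followed by NLLLoss, and the unweighted formula from 3.1, can be checked numerically. This is a minimal sketch; the tensor shapes, the seed, and the names logits, ce, nll, and manual are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(3, 5)          # N x C raw scores
target = torch.tensor([1, 0, 4])    # class indices in [0, C-1]

ce = nn.CrossEntropyLoss()(logits, target)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target)

# -x[class] + log(sum_j exp(x[j])), averaged over the minibatch
picked = logits[torch.arange(3), target]
manual = (-picked + torch.logsumexp(logits, dim=1)).mean()

print(ce.item(), nll.item(), manual.item())
```

All three values should agree up to floating-point rounding.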

4. NLLLoss

class torch.nn.NLLLoss(weight=None, 
                       size_average=None,
                       ignore_index=-100, 
                       reduce=None,
                       reduction='mean')

4.1. Description

NLLLoss is the negative log likelihood loss. It is suited to classification tasks with $C$ classes.

In a neural network, log-probabilities are easily obtained by adding a LogSoftmax layer as the last layer. The loss is then NLLLoss applied to the LogSoftmax output. If you prefer not to add an extra layer, use CrossEntropyLoss directly.

[1] - Unreduced (reduction='none'):

$$ \ell (x, y) = L = \lbrace l_1, ..., l_N \rbrace ^T $$

$$ l_n = -w_{y_n} x_{n, y_n} $$

$$ w_c = weight[c] \cdot 1 \lbrace c \neq \text{ignore\_index} \rbrace $$

[2] - Reduced (reduction='mean' or reduction='sum'):

$$ \ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n &\text{if reduction='mean' } \\ \sum_{n=1}^N l_n &\text{if reduction = 'sum'} \end{cases} $$

Higher-dimensional inputs such as 2D images are supported; NLLLoss is then computed per pixel.

4.2. Parameters

Same as 3.2. Parameters.

4.3. Shape

Same as 3.3. Shape.

4.4. Example

# 1D
import torch
import torch.nn as nn

# Normalize over the class dimension (dim=1)
logsoft = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
# input, N x C = 3 x 5
input = torch.randn(3, 5, requires_grad=True)
# target, shape (N), each value in [0, C)
target = torch.tensor([1, 0, 4])
output = loss(logsoft(input), target)
output.backward()

# 2D

N, C = 5, 4
loss = nn.NLLLoss()
# input, N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)
# target, N x height x width (the 3x3 convolution shrinks 10x10 to 8x8)
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()
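
With class weights and ignored targets, reduction='mean' divides by the sum of the weights of the non-ignored targets rather than by the batch size. The sketch below checks this against the formulas in 4.1; the weight values, the seed, and the variable names are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)  # N x C = 4 x 3
target = torch.tensor([0, 2, 1, -100])   # last sample uses the default ignore_index
weight = torch.tensor([1.0, 2.0, 3.0])   # hypothetical per-class weights

loss = nn.NLLLoss(weight=weight)(log_probs, target)

# Manual weighted mean over the non-ignored samples
keep = target != -100
kept_target = target[keep]
picked = log_probs[keep].gather(1, kept_target.unsqueeze(1)).squeeze(1)  # x_{n, y_n}
w = weight[kept_target]                                                  # w_{y_n}
manual = -(w * picked).sum() / w.sum()

print(loss.item(), manual.item())
```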

5. KLDivLoss

class torch.nn.KLDivLoss(size_average=None, 
                         reduce=None, 
                         reduction='mean')

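5.1. Description

KLDivLoss is the Kullback-Leibler divergence loss.

KL divergence is a useful distance measure for continuous distributions. It is often used when performing direct regression over the space of (discretely sampled) continuous output distributions.

As with NLLLoss, the input to KLDivLoss is expected to contain log-probabilities, and it is not restricted to a 2D Tensor. The target is given as probabilities (without taking the logarithm).

[1] - Unreduced (reduction='none'):

$$ \ell (x, y) = L = \lbrace l_1, ...,l_N \rbrace $$

$$ l_n = y_n \cdot (log \ y_n - x_n) $$

where the index $n$ spans all dimensions of the input, and $L$ has the same shape as the input.

[2] - Reduced (reduction='mean' or reduction='sum'):

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'} \end{cases} $$

In the default mode, reduction='mean', KLDivLoss averages over both the samples in each minibatch and the remaining dimensions.

5.2. Parameters

Same as 1.2. Parameters.

reduction='mean' does not return the true KL divergence value; reduction='batchmean' matches the mathematical definition. According to the PyTorch 1.4.0 documentation, reduction='mean' will be changed to behave like reduction='batchmean' in the next major release.

5.3. Shape

Same as 1.3. Shape.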
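
5.4. Example

The source gives no example for KLDivLoss, so here is a minimal sketch, assuming two random categorical distributions per sample. It checks the element-wise formula from 5.1 and uses reduction='batchmean', which sums all terms and divides by the batch size. The names log_q and p and the shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
log_q = torch.log_softmax(torch.randn(4, 5), dim=1)  # input: log-probabilities
p = torch.softmax(torch.randn(4, 5), dim=1)          # target: probabilities

elementwise = nn.KLDivLoss(reduction='none')(log_q, p)     # same shape as the input
batchmean = nn.KLDivLoss(reduction='batchmean')(log_q, p)  # sum / batch size

# l_n = y_n * (log y_n - x_n)
manual = (p * (p.log() - log_q)).sum() / p.size(0)

print(elementwise.shape, batchmean.item(), manual.item())
```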

6. BCELoss

class torch.nn.BCELoss(weight=None, 
                       size_average=None, 
                       reduce=None, 
                       reduction='mean')

6.1. Description

BCELoss computes the binary cross entropy between the target and the output.

[1] - Unreduced (reduction='none'):

$$ \ell(x, y) = L = \lbrace l_1, ..., l_N \rbrace ^T $$

$$ l_n = -w_n [y_n \cdot log \ x_n + (1 - y_n) \cdot log(1 - x_n)] $$

where $N$ is the batch size.

[2] - Reduced (reduction='mean' or reduction='sum'):

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

It is often used to measure reconstruction error, for example in an auto-encoder.

Note that the targets $y$ should be numbers in the range [0, 1].

6.2. Parameters

[1] - size_average and reduce are deprecated; avoid using them.

[2] - weight: a rescaling weight applied to the loss of each batch element. If given, it must be a Tensor of size nbatch.

[3] - reduction: one of none, mean, or sum. Defaults to mean.

none: no reduction is applied.
mean: the output is summed and divided by the number of elements.
sum: the output is summed but not divided by the number of elements.

6.3. Shape

[1] - Input - input, $(N, \ast)$.

[2] - Input - target, $(N, \ast)$.

[3] - Output - output, a scalar. If reduction='none', it has the same size as the target, $(N, \ast)$.

6.4. Example

m = nn.Sigmoid()
loss = nn.BCELoss()

input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()
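
As a sanity check of the formula in 6.1, the sketch below evaluates BCELoss on hand-picked probabilities and compares it with the explicit expression (with all $w_n = 1$). The values are illustrative assumptions.

```python
import torch
import torch.nn as nn

probs = torch.tensor([0.9, 0.2, 0.6])   # already in (0, 1), e.g. after a Sigmoid
target = torch.tensor([1.0, 0.0, 1.0])

loss = nn.BCELoss()(probs, target)

# -[y * log(x) + (1 - y) * log(1 - x)], averaged over the elements
manual = -(target * probs.log() + (1 - target) * (1 - probs).log()).mean()

print(loss.item(), manual.item())
```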

7. BCEWithLogitsLoss

class torch.nn.BCEWithLogitsLoss(weight=None, 
                                 size_average=None, 
                                 reduce=None, 
                                 reduction='mean',
                                 pos_weight=None)

7.1. Description

BCEWithLogitsLoss combines a Sigmoid layer and BCELoss in a single class.

Compared with a plain Sigmoid followed by BCELoss, BCEWithLogitsLoss is more numerically stable, because merging the two operations allows the log-sum-exp trick to be used.

[1] - Unreduced (reduction='none'):

$$ \ell(x, y) = L = \lbrace l_1, ..., l_N \rbrace ^T $$

$$ l_n = -w_n [y_n \cdot log \ \sigma(x_n) + (1 - y_n) \cdot log(1 - \sigma (x_n))] $$

where $N$ is the batch size.

[2] - Reduced (reduction='mean' or reduction='sum'):

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

It is often used to measure reconstruction error, for example in an auto-encoder.

Note that the targets $y$ should be numbers in the range [0, 1].

With BCEWithLogitsLoss, the trade-off between recall and precision can be adjusted by weighting the positive examples. For multi-label classification:

$$ \ell_c(x, y) = L_c = \lbrace l_{1, c}, ..., l_{N,c} \rbrace ^T $$

$$ l_{n,c} = -w_{n,c} [p_c y_{n,c} \cdot log \ \sigma(x_{n,c}) + (1 - y_{n, c}) \cdot log(1 - \sigma (x_{n, c}))] $$

where $c$ is the class number ($c>1$ for multi-label binary classification, $c=1$ for single-label binary classification);

$n$ is the index of the sample in the batch;

$p_c$ is the weight of the positive examples for class $c$. $p_c > 1$ increases recall; $p_c < 1$ increases precision.

For example, if a dataset contains 100 positive and 300 negative examples of a single class, then pos_weight for that class should be $\frac{300}{100}=3$. The loss then behaves as if the dataset contained $3 \times 100=300$ positive examples.

For example:

# 64 classes, batch size = 10
target = torch.ones([10, 64], dtype=torch.float32) 
# A prediction (logit)
output = torch.full([10, 64], 0.999)  
# All weights are equal to 1
pos_weight = torch.ones([64])  
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
# -log(sigmoid(0.999))
print(criterion(output, target))
#tensor(0.3135)

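7.2. Parameters

[1] - size_average and reduce are deprecated; avoid using them.

[2] - weight: a rescaling weight applied to the loss of each batch element. If given, it must be a Tensor of size nbatch.

[3] - reduction: one of none, mean, or sum. Defaults to mean.

none: no reduction is applied.
mean: the output is summed and divided by the number of elements.
sum: the output is summed but not divided by the number of elements.

[4] - pos_weight: the weight of positive examples. It must be a vector whose length equals the number of classes.

7.3. Shape

Same as 6.3. Shape.

7.4. Example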

loss = nn.BCEWithLogitsLoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(input, target)
output.backward()
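
The following sketch checks two claims from 7.1: that BCEWithLogitsLoss matches Sigmoid followed by BCELoss on well-behaved inputs, and that pos_weight scales only the positive term. The pos_weight values, the seed, and the shapes are illustrative assumptions; F.logsigmoid is used for the manual computation because log(1 - sigmoid(x)) equals logsigmoid(-x).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)                # N x C raw scores
target = torch.empty(4, 3).random_(2)     # 0/1 labels

fused = nn.BCEWithLogitsLoss()(logits, target)
two_step = nn.BCELoss()(torch.sigmoid(logits), target)

pos_weight = torch.tensor([3.0, 1.0, 0.5])   # hypothetical p_c, one per class
weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, target)

# -[p_c * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
manual = -(pos_weight * target * F.logsigmoid(logits)
           + (1 - target) * F.logsigmoid(-logits)).mean()

print(fused.item(), two_step.item(), weighted.item(), manual.item())
```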

8. MarginRankingLoss

class torch.nn.MarginRankingLoss(margin=0.0, 
                                 size_average=None, 
                                 reduce=None, 
                                 reduction='mean')

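8.1. Description

MarginRankingLoss measures the loss given two 1D mini-batch Tensors $x1$ and $x2$ and a 1D mini-batch label Tensor $y$ (containing 1 or -1).

If $y=1$, the first input is assumed to rank higher (have a larger value) than the second input;

if $y=-1$, the second input is assumed to rank higher (have a larger value) than the first input.

For each sample in the mini-batch,

$$ loss(x1, x2, y) = max(0, -y \ast (x1-x2) + \text{margin}) $$

8.2. Parameters

[1] - margin - defaults to 0.

The other parameters are the same as in 1.2. Parameters.

8.3. Shape

[1] - Input - input, $(N, D)$, where $N$ is the batch size and $D$ is the size of a sample.

[2] - Input - target, $(N)$.

[3] - Output - a scalar. If reduction='none', then $(N)$.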
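
8.4. Example

The source has no example for MarginRankingLoss, so here is a minimal sketch with hand-picked scores (illustrative assumptions). It uses reduction='none' so that each per-sample value can be compared with the formula in 8.1.

```python
import torch
import torch.nn as nn

x1 = torch.tensor([0.8, 0.2, 0.5])
x2 = torch.tensor([0.3, 0.6, 0.5])
y = torch.tensor([1.0, 1.0, -1.0])   # 1: x1 should rank higher, -1: x2 should

per_sample = nn.MarginRankingLoss(margin=0.1, reduction='none')(x1, x2, y)

# max(0, -y * (x1 - x2) + margin)
manual = torch.clamp(-y * (x1 - x2) + 0.1, min=0)

print(per_sample, manual)
```

Only the first pair is ranked correctly by more than the margin, so only it has zero loss.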

9. HingeEmbeddingLoss

class torch.nn.HingeEmbeddingLoss(margin=1.0, 
                                  size_average=None,
                                  reduce=None, 
                                  reduction='mean')

9.1. Description

HingeEmbeddingLoss measures the loss given an input tensor $x$ and a labels tensor $y$ (containing 1 or -1).

It is typically used to measure whether two inputs are similar or dissimilar, for example with the L1 pairwise distance as $x$.

It is mostly used for learning nonlinear embeddings or for semi-supervised learning.

For the $n$-th sample in the mini-batch, with $\Delta$ denoting the margin,

$$ l_n = \begin{cases} x_n &\text { if } y_n=1 \\ max\lbrace 0, \Delta - x_n \rbrace &\text { if } y_n = -1 \end{cases} $$

The total loss is:

$$ L = \{l_1, ..., l_N\}^T $$

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$
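
9.2. Example

A minimal sketch, assuming the input $x$ is the L1 pairwise distance between two small batches of vectors. The vectors a and b and the labels are hypothetical values picked for easy arithmetic.

```python
import torch
import torch.nn as nn

a = torch.tensor([[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
b = torch.tensor([[0.5, 0.0], [1.0, 3.0], [0.0, 2.5]])
dist = (a - b).abs().sum(dim=1)        # L1 distances: 0.5, 2.0, 0.5
y = torch.tensor([1.0, -1.0, -1.0])    # 1: similar pair, -1: dissimilar pair

per_sample = nn.HingeEmbeddingLoss(margin=1.0, reduction='none')(dist, y)

# x_n if y_n = 1, else max(0, margin - x_n)
manual = torch.where(y == 1, dist, torch.clamp(1.0 - dist, min=0))

print(per_sample, manual)
```

The second pair is dissimilar and already farther apart than the margin, so it contributes no loss.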

10. MultiLabelMarginLoss

class torch.nn.MultiLabelMarginLoss(size_average=None, 
                                    reduce=None, 
                                    reduction='mean')

10.1. Description

MultiLabelMarginLoss measures the multi-class multi-classification hinge loss (a margin-based loss) between an input $x$ (a 2D mini-batch Tensor) and an output $y$ (a 2D Tensor of target class indices).

For each sample in the mini-batch:

$$ loss(x, y) = \sum_{ij} \frac{max(0, 1-(x[y[j]] -x[i]))}{x.size(0)} $$

where $i \in \lbrace 0, ..., x.size(0)-1 \rbrace$,

$j \in \lbrace 0, ..., y.size(0)-1 \rbrace$,

$0 \leq y[j] \leq x.size(0)-1$,

and $i \neq y[j]$ for all $i$ and $j$.

$y$ and $x$ must have the same size.

MultiLabelMarginLoss only considers the contiguous block of non-negative targets that starts at the front.

This allows different samples to have different numbers of target classes.

10.2. Parameters

Same as 1.2. Parameters.

10.3. Shape

[1] - Input - input, $(C)$ or $(N, C)$, where $N$ is the batch size and $C$ is the number of classes.

[2] - Input - target, $(C)$ or $(N, C)$. The target labels are padded with -1 so that the shape matches the input.

[3] - Output - a scalar. If reduction='none', then $(N)$.

10.4. Example

loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]])
# for target y, only consider labels 3 and 0, not after label -1
y = torch.LongTensor([[3, 0, -1, 1]])
print(loss(x, y))
# 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
#tensor(0.8500)

11. SmoothL1Loss

class torch.nn.SmoothL1Loss(size_average=None,
                            reduce=None, 
                            reduction='mean')

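11.1. Description

SmoothL1Loss uses a squared term when the absolute element-wise error is below 1 and an L1 term otherwise.

Compared with MSELoss, SmoothL1Loss is less sensitive to outliers and in some cases prevents exploding gradients (see, for example, the Fast R-CNN paper).

SmoothL1Loss is also known as the Huber loss:

$$ loss(x, y) = \frac{1}{n} \sum_{i} z_i $$

where

$$ z_i = \begin{cases} 0.5(x_i - y_i)^2 &\text { if } |x_i - y_i| <1 \\ |x_i - y_i| - 0.5 &\text { otherwise } \end{cases} $$

11.2. Parameters

Same as 1.2. Parameters.

11.3. Shape

Same as 1.3. Shape.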
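
11.4. Example

The piecewise definition of $z_i$ can be illustrated with one error in the quadratic region and one in the linear region. This is a minimal sketch with hand-picked values (illustrative assumptions).

```python
import torch
import torch.nn as nn

x = torch.tensor([0.0, 0.5, 3.0])
y = torch.zeros(3)

per_element = nn.SmoothL1Loss(reduction='none')(x, y)

diff = (x - y).abs()
# 0.5 * d^2 if d < 1, else d - 0.5
manual = torch.where(diff < 1, 0.5 * diff ** 2, diff - 0.5)

print(per_element, manual)
```

The error of 0.5 gives 0.125 (quadratic branch), while the error of 3.0 gives 2.5 (linear branch) instead of the 9.0 that MSELoss would produce.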

12. SoftMarginLoss

class torch.nn.SoftMarginLoss(size_average=None,
                              reduce=None,
                              reduction='mean')

12.1. Description

SoftMarginLoss computes the two-class classification logistic loss between an input tensor $x$ and a target tensor $y$ (containing 1 or -1).

$$ loss(x,y) = \sum_i \frac{log(1 + exp(-y[i] \ast x[i]))}{x.nelement()} $$
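
12.2. Example

A minimal sketch that evaluates SoftMarginLoss and the formula above on the same hand-picked inputs (illustrative assumptions).

```python
import torch
import torch.nn as nn

x = torch.tensor([2.0, -1.0, 0.5])
y = torch.tensor([1.0, 1.0, -1.0])   # labels in {1, -1}

loss = nn.SoftMarginLoss()(x, y)

# sum_i log(1 + exp(-y[i] * x[i])) / x.nelement()
manual = torch.log(1 + torch.exp(-y * x)).sum() / x.nelement()

print(loss.item(), manual.item())
```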

13. MultiLabelSoftMarginLoss

class torch.nn.MultiLabelSoftMarginLoss(weight=None, 
                                        size_average=None, 
                                        reduce=None,
                                        reduction='mean')

13.1. Description

MultiLabelSoftMarginLoss optimizes a multi-label one-versus-all loss based on max-entropy, between an input tensor $x$ and a target tensor $y$, both of size $(N, C)$.

For each sample in the minibatch,

$$ loss(x, y) = -\frac{1}{C} \ast \sum_i \left[ y[i] \ast log((1 + exp(-x[i]))^{-1}) + (1 - y[i]) \ast log(\frac{exp(-x[i])}{1+exp(-x[i])}) \right] $$

where $i \in \lbrace 0, ..., x.nElement()-1 \rbrace$ and $y[i] \in \lbrace 0, 1 \rbrace$.
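
13.2. Example

The two logarithms in the formula are log(sigmoid(x)) and log(sigmoid(-x)), so the loss can be reproduced with F.logsigmoid. The sketch below assumes a small hand-picked batch; the values and names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[1.0, -2.0, 0.5], [0.0, 3.0, -1.0]])   # N x C
target = torch.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])     # multi-hot labels

loss = nn.MultiLabelSoftMarginLoss()(logits, target)

# Per sample: mean over the C classes; then mean over the batch
per_sample = -(target * F.logsigmoid(logits)
               + (1 - target) * F.logsigmoid(-logits)).mean(dim=1)
manual = per_sample.mean()

print(loss.item(), manual.item())
```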

14. CosineEmbeddingLoss

class torch.nn.CosineEmbeddingLoss(margin=0.0, 
                                   size_average=None, 
                                   reduce=None, 
                                   reduction='mean')

14.1. Description

CosineEmbeddingLoss measures the loss given input tensors $x1$ and $x2$ and a label Tensor $y$ with values 1 or -1.

It uses the cosine distance to measure whether two inputs are similar or dissimilar, and is typically used for learning nonlinear embeddings and for semi-supervised learning.

The loss for each sample is:

$$ l_n = \begin{cases} 1 - cos(x_1, x_2) &\text { if } y=1 \\ max\lbrace 0, cos(x_1, x_2)- \text{margin} \rbrace &\text { if } y = -1 \end{cases} $$
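
14.2. Example

A minimal sketch with two hand-picked pairs of vectors (illustrative assumptions): one identical pair labelled similar, and one pair at 45 degrees labelled dissimilar. F.cosine_similarity is used for the manual check.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1 = torch.tensor([[1.0, 0.0], [1.0, 1.0]])
x2 = torch.tensor([[1.0, 0.0], [1.0, 0.0]])
y = torch.tensor([1.0, -1.0])

per_sample = nn.CosineEmbeddingLoss(margin=0.5, reduction='none')(x1, x2, y)

cos = F.cosine_similarity(x1, x2, dim=1)   # approximately [1.0, 0.7071]
# 1 - cos if y = 1, else max(0, cos - margin)
manual = torch.where(y == 1, 1 - cos, torch.clamp(cos - 0.5, min=0))

print(per_sample, manual)
```

The dissimilar pair is penalized because its cosine similarity still exceeds the margin of 0.5.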

15. MultiMarginLoss

class torch.nn.MultiMarginLoss(p=1, 
                               margin=1.0, 
                               weight=None, 
                               size_average=None, 
                               reduce=None, 
                               reduction='mean')

15.1. Description

MultiMarginLoss optimizes a multi-class classification hinge loss (a margin-based loss) between an input $x$ (a 2D mini-batch Tensor) and an output $y$ (a 1D Tensor of target class indices, $0 \leq y \leq x.size(1)-1$).

For each mini-batch sample, the loss in terms of the 1D input $x$ and the scalar output $y$ is:

$$ loss(x, y) = \frac{\sum_i max(0, \text{margin} - x[y] + x[i])^p}{x.size(0)} $$

where $i \in \lbrace 0, ..., x.size(0)-1 \rbrace$ and $i \neq y$.

Optionally, non-negative per-class weights can be set by passing a 1D weight tensor:

$$ loss(x, y) = \frac{\sum_i max(0, w[y] \ast (\text{margin} - x[y] + x[i]))^p}{x.size(0)} $$
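
15.2. Example

A minimal sketch of the unweighted case with p=1, using a single hand-picked sample (illustrative assumptions). In the code x is a 2D batch, so the number of classes is x.size(1), which corresponds to x.size(0) in the per-sample formula above.

```python
import torch
import torch.nn as nn

x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])   # N x C = 1 x 4
y = torch.tensor([3])                      # target class index

loss = nn.MultiMarginLoss(p=1, margin=1.0)(x, y)

row = x[0]
others = torch.arange(row.numel()) != y[0]   # every i except the target class
# sum_i max(0, margin - x[y] + x[i]) / C
manual = torch.clamp(1.0 - row[y[0]] + row[others], min=0).sum() / row.numel()

print(loss.item(), manual.item())
```

The three terms are 0.3, 0.4, and 0.6, giving 1.3 / 4 = 0.325.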

16. TripletMarginLoss

class torch.nn.TripletMarginLoss(margin=1.0, 
                                 p=2.0, 
                                 eps=1e-06, 
                                 swap=False, 
                                 size_average=None, 
                                 reduce=None, 
                                 reduction='mean')

16.1. Description

TripletMarginLoss measures the triplet loss given input tensors $x1$, $x2$, $x3$ and a margin greater than 0.

It is used to measure the relative similarity between samples.

A triplet consists of $a$, $p$, and $n$ (the anchor, the positive example, and the negative example, respectively).

All input tensors of TripletMarginLoss have shape $(N, D)$.

The distance swap is described in the paper Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.

For each sample in the mini-batch,

$$ L(a, p, n) = max \lbrace d(a_i, p_i) - d(a_i, n_i) + \text{margin}, 0 \rbrace $$

where

$$ d(x_i, y_i) = ||\mathbf{x}_i - \mathbf{y}_i||_p $$

16.2. Parameters

[1] - margin - defaults to 1.

[2] - p - the norm degree of the pairwise distance. Defaults to 2.

[3] - swap - whether to use the distance swap.

The other parameters are the same as in 1.2. Parameters.

16.3. Shape

[1] - Input - input, $(N, D)$, where $D$ is the vector dimension.

[2] - Output - output, a scalar. If reduction='none', then $(N)$.

16.4. Example

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(anchor, positive, negative)
output.backward()
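
The formula in 16.1 can be reproduced with F.pairwise_distance, which computes the p-norm distance $d$ between corresponding rows. This is a minimal sketch; the batch size, dimension, and seed are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
anchor = torch.randn(8, 16)
positive = torch.randn(8, 16)
negative = torch.randn(8, 16)

loss = nn.TripletMarginLoss(margin=1.0, p=2)(anchor, positive, negative)

d_ap = F.pairwise_distance(anchor, positive, p=2)   # d(a_i, p_i)
d_an = F.pairwise_distance(anchor, negative, p=2)   # d(a_i, n_i)
# max(d(a, p) - d(a, n) + margin, 0), averaged over the batch
manual = torch.clamp(d_ap - d_an + 1.0, min=0).mean()

print(loss.item(), manual.item())
```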

17. PoissonNLLLoss

class torch.nn.PoissonNLLLoss(log_input=True, 
                              full=False, 
                              size_average=None, 
                              eps=1e-08, 
                              reduce=None, 
                              reduction='mean')

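17.1. Description

PoissonNLLLoss is the negative log likelihood loss for a target that follows a Poisson distribution.

It can be written as:

$$ target \sim Poisson(input) $$

$$ loss(input, target) = input - target \ast log(input) + log(target !) $$

The last term can be omitted or approximated with the Stirling formula. The approximation is used for target values greater than 1. For targets less than or equal to 1, zero is added to the loss.

17.2. Parameters

[1] - log_input - if True, the loss is computed as $exp(input) - target \ast input$. If False, the loss is computed as $input - target \ast log(input + eps)$.

[2] - full - whether to compute the full loss, i.e. to add the Stirling approximation term $target \ast log(target) - target + 0.5 \ast log(2 \pi target)$.

The other parameters are the same as in 1.2. Parameters.

17.3. Example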

loss = nn.PoissonNLLLoss()
log_input = torch.randn(5, 2, requires_grad=True)
target = torch.randn(5, 2)
output = loss(log_input, target)
output.backward()
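
With the default log_input=True and full=False, the loss reduces to the mean of exp(input) - target * input. The sketch below checks this on hand-picked values; the tensors and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

log_rate = torch.tensor([[0.0, 1.0], [-1.0, 0.5]])   # input is the log of the Poisson rate
target = torch.tensor([[1.0, 3.0], [0.0, 2.0]])      # observed counts

loss = nn.PoissonNLLLoss(log_input=True)(log_rate, target)

# exp(input) - target * input, averaged over all elements
manual = (log_rate.exp() - target * log_rate).mean()

print(loss.item(), manual.item())
```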

18. CTCLoss

class torch.nn.CTCLoss(blank=0, 
                       reduction='mean', 
                       zero_infinity=False)

18.1. Description

Connectionist Temporal Classification loss.

A loss function for classifying sequential (time-series) data.

18.2. Example

T = 50      # Input sequence length
C = 20      # Number of classes (including blank)
N = 16      # Batch size
S = 30      # Target sequence length of longest target in batch
S_min = 10  # Minimum target length, for demonstration purposes

# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Initialize random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()

Last modification: April 26th, 2021 at 05:30 pm

Pytorch - 官方损失函数汇总

AIHGF • 2020 年 02 月 10 日

主要参考 pytorch - Loss functions.
Pytorch == 1.4.0

1. L1Loss

class torch.nn.L1Loss(size_average=None,
                      reduce=None, 
                      reduction='mean')

1.1. 说明

L1 Loss 计算的是，input x 与 target y 间各元素值的平均绝对误差(mean absolute error MAE).

$x_n$ 和 $y_n$ 均是长度为 $n$ 的tensor.

[1] - 未简化版(unreduced，reduction='none')：

$$ \ell(x, y) = L = \{l_1, ..., l_N\}^T, l_n = |x_n - y_n| $$

其中，$N$ - batchsize.

[2] - 简化版(reduced, reduction='mean' 或 reduction='sum')：

$$ \ell(x) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'} \end{cases} $$

其中，sum 操作仍是对所用元素进行操作，并除以 $n$.

如果设置 reduction='sum' 则不会再除以 $n$.

1.2. 参数

[1] - size_average 和 reduce 都是待弃用的参数. 不推荐使用.

[2] - reduction: 可选参数：none、mean和sum 三种. 默认为 mean.

none: 不进行简化(reduction).
mean: 对输出求和，并除以元素总数.
sum: 对输出求和，但不除以元素总数.

1.3. Shape

[1] - 输入 - input x, $(N, \ast )$.

[2] - 输入 - target y, $(N, \ast )$，与输出保持一致.

[3] - 输出 - 如果 reduce='none', 输出与其输入一致, $(N, \ast)$.

1.4. 示例

import torch 
import torch.nn as nn

loss = nn.L1Loss()
input = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)
output = loss(input, target)
output.backward()

2. MSELoss

class torch.nn.MSELoss(size_average=None,
                       reduce=None,
                       reduction='mean')

2.1. 说明

MSE Loss 计算的是，input x 和 target y 的 n 个元素间的平方差(mean squard error, squard L2 norm).

[1] - 未简化版(unreduced，reduction='none')：

$$ \ell(x,y) = L = \{l_1, ..., l_N\}^T, l_n = (x_n - y_n)^2 $$

其中，$N$ - batchsize.

[2] - 简化版(reduced, reduction='mean' 或 reduction='sum')：

$$ \ell(x) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

其中，sum 操作仍是对所用元素进行操作，并除以 $n$.

如果设置 reduction='sum' 则不会再除以 $n$.

2.2. 参数

与 1.1. 参数 一致.

2.3. Shape

[1] - 输入 - input x, $(N, \ast )$.

[2] - 输入 - target y, $(N,\ast )$，与输出保持一致.

2.4. 示例

import torch 
import torch.nn as nn

loss = nn.MSELoss()
input = torch.randn(2, 3, requires_grad=True)
target = torch.randn(2, 3)
output = loss(input, target)
output.backward()

3. CrossEntropyLoss

class torch.nn.CrossEntropyLoss(weight=None, 
                                size_average=None, 
                                ignore_index=-100,
                                reduce=None,
                                reduction='mean')

3.1. 说明

CrossEntropyLoss，交叉熵损失函数，等价于nn.LogSoftmax() 和 nn.NLLloss() 的计算.

其应用于，$C$ 类的分类问题.

如果设置了 weight 参数，则应该是对应每一个类别的权重，1D Tensor. 对于训练集数据不均衡的场景比较有用.

CrossEntropyLoss 的输入是关于每个类别的原始未归一化的值. input 是 $(minibatch, C)$ 或 $(minibatch, C, d_1, d_2, ..., d_K), (K \geq 1)$ Shape 的 Tensor.

CrossEntropyLoss 要求类别索引是 $[0, C-1]$. 如果设置了 ignore_index 参数，该损失函数也会接受该ignore_index类别索引(ignore_index 可以不在 $[0, C-1]$ 范围内.)

[1] - 未设定 weight 参数，不带权重时，

$$ loss(x, class) = -log (\frac{exp(x[class])}{\sum_j exp(x[j])}) = -x[class] + log(\sum_j exp(x[j])) $$

[2] - 设定 weight 参数，带权重时，

$$ loss(x, class) = weight[class](-x[class] + log(\sum_j exp(x[j]))) $$

loss 计算时对每个 minibatch 内的所有样本求均值.

CrossEntropyLoss 也可以用于高维输入，比如，2D 图像，此时其输入shape 为 $(minibatch, C, d_1, d_2, ..., d_K), (K \geq 1)$，其中 $K$ 是维数.

3.2. 参数

[1] - size_average 和 reduce 都是待弃用的参数. 不推荐使用.

[2] - weight: 关于每个类别的缩放权重，size 为 $C$ 的 Tensor.

[3] - ignore_index: 指定要忽略的 target 值，其不对输入梯度有贡献. 当 size_average='True'时，损失函数值仅对 non-ignored targets 进行求均值.

[4] - reduction: 可选参数：none、mean和sum 三种. 默认为 mean.

none: 不进行简化(reduction).
mean: 对输出求和，并除以元素总数.
sum: 对输出求和，但不除以元素总数.

3.3. Shape

[1] - 输入 - input，$(N, C)$，其中，$C$ 是类别数. 或，$(N, C, d_1, d_2, ..., d_K), K\geq 1$.

[2] - 输入 - target，$(N)$，其中，每个值的是 $ 0 \leq target[i] < \leq C-1$. 或，$(N, d_1, d_2, ..., d_K)$.

[3] - 输出 - output，标量值. 如果 reduction='none'，则与target 的size一致, $(N)$. 或，$(N, d_1, d_2, ..., d_k)$.

3.4. 示例

loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()

4. NLLLoss

class torch.nn.NLLLoss(weight=None, 
                       size_average=None,
                       ignore_index=-100, 
                       reduce=None,
                       reduction='mean')

4.1. 说明

NLLLoss，负对数似然函数(negative log likelihood). 适用于 $C$ 类的分类任务.

[1] - 未简化版(unreduced，reduction='none')：

$$ \ell (x, y) = L = \lbrace l_1, ..., l_N \rbrace ^T $$

$$ l_n = -w_{y_n} x_{n, y_n} $$

$$ w_c = weight[c] \cdot 1 \lbrace c \neq \text{ignore_index} \rbrace $$

[2] - 简化版(reduced, reduction='mean' 或 reduction='sum')：

$$ \ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N} l_n &\text{if reduction='mean' } \\ \sum_{n=1}^N l_n &\text{if reduction = 'sum'} \end{cases} $$

支持高维输入 inputs, 如 2D images, NLLLoss 计算的是对每个像素样本的.

4.2. 参数

同 3.2. 参数.

4.3. Shape

同 3.3. Shape.

4.4. 示例

# 1D
import torch 
import torch.nn as nn

logsoft = nn.LogSoftmax()
loss = nn.NLLLoss()
# input, NxC=2x3
input = torch.randn(3, 5, requires_grad=True)
# target, N, 每个值的是 [0, C).
target = torch.tensor([1, 0, 4])
output = loss(logsoft(input), target)
output.backward()

# 2D

N, C = 5, 4
loss = nn.NLLLoss()
# input, N x C x height x width
data = torch.randn(N, 16, 10, 10)
conv = nn.Conv2d(16, C, (3, 3))
m = nn.LogSoftmax(dim=1)
# target
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(conv(data)), target)
output.backward()

5. KLDivLoss

class torch.nn.KLDivLoss(size_average=None, 
                         reduce=None, 
                         reduction='mean')

5.1. 说明

KLDivLoss, Kullback-Leibler divergence Loss.

KL 散度适用于连续分布间的距离度量. 往往用于对(离散采样的)连续输出分布的空间进行直接回归.

类似于 NLLLoss，KLDivLoss 的输入也是 log-probabilities，且不严格要求于 2D Tensor. target 也是以概率形式给定的(未取对数).

[1] - 未简化版(unreduced，reduction='none')：

$$ \ell (x, y) = L = \lbrace l_1, ...,l_N \rbrace $$

$$ l_n = y_n \cdot (log \ y_n - x_n) $$

其中，索引 $N$ 贯穿了 input 的所有维度. $L$ 与 input 具有相同的 shape.

[2] - 简化版(reduced, reduction='mean' 或 reduction='sum')：

$$ \ell(x, y) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'} \end{cases} $$

KLDivLoss 对于每个 minibatch 的样本和维度进行求均值.

5.2. 参数

同 1.1. 参数 .

下一个 release 版本中，reduction='mean' 会更新为 reduction='batchmean'.

5.3. Shape

同 1.1. Shape.

6. BCELoss

class torch.nn.BCELoss(weight=None, 
                       size_average=None, 
                       reduce=None, 
                       reduction='mean')

6.1. 说明

BCELoss 计算的是 target 和 output 检测二值交叉熵.

[1] - 未简化版(unreduced，reduction='none')：

$$ \ell(x, y) = L = \lbrace l_1, ..., l_N \rbrace ^T $$

$$ l_n = -w_n [y_n \cdot log \ x_n + (1 - y_n) \cdot log(1 - x_n)] $$

其中，$N$ 为 batchsize.

[2] - 简化版(reduced, reduction='mean' 或 reduction='sum')：

$$ \ell(x) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

常用于度量重建误差，例如，auto-encoder.

注， targets $y$ 的值的取值范围为 [0, 1].

6.2. 参数

[1] - size_average 和 reduce 都是待弃用的参数. 不推荐使用.

[2] - weight: 关于每个类别的缩放权重，size 为 $C$ 的 Tensor.

[3] - reduction: 可选参数：none、mean和sum 三种. 默认为 mean.

none: 不进行简化(reduction).
mean: 对输出求和，并除以元素总数.
sum: 对输出求和，但不除以元素总数.

6.3. Shape

[1] - 输入 - input，$(N, \ast)$.

[2] - 输入 - target，$(N, \ast )$.

[3] - 输出 - output，标量值. 如果 reduction='none'，则与target 的size一致, $(N, \ast )$.

6.4. 示例

m = nn.Sigmoid()
loss = nn.BCELoss()

input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(m(input), target)
output.backward()

7. BCEWithLogitsLoss

class torch.nn.BCEWithLogitsLoss(weight=None, 
                                 size_average=None, 
                                 reduce=None, 
                                 reduction='mean',
                                 pos_weight=None)

7.1. 说明

BCEWithLogitsLoss，是将 Sigmoid 层和 BCELoss 层整合.

相比于，采用 Sigmoid + BCELoss，BCEWithLogitsLoss 具有更好的数值稳定性.(主要是利用了 log-sum-exp 技巧来实现数据稳定性).

[1] - 未简化版(unreduced，reduction='none')：

$$ \ell(x, y) = L = \lbrace l_1, ..., l_N \rbrace ^T $$

$$ l_n = -w_n [y_n \cdot log \ \sigma(x_n) + (1 - y_n) \cdot log(1 - \sigma (x_n)] $$

其中，$N$ 为 batchsize.

[2] - 简化版(reduced, reduction='mean' 或 reduction='sum')：

$$ \ell(x) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

常用于度量重建误差，例如，auto-encoder.

注，targets $t[i]$ 的取值范围为 [0, 1].

对于BCEWithLogitsLoss，通过增加正样本(positive examples)的权重，可以实现 recall 和 precision 间的均衡. 对于 multi-label 分类问题：

$$ \ell_c(x, y) = L_c = \lbrace l_{1, c}, ..., l_{N,c} \rbrace ^T $$

$$ l_{n,c} = -w_{n,c} [p_c y_{n,c} \cdot log \ \sigma(x_{n_c}) + (1 - y_{n, c}) \cdot log(1 - \sigma (x_{n, c})] $$

其中，$c$ 为类别数(对于 multi-label 二值分类问题， $c>1$；对于 single-label 二值分类问题，$c=1$).

$n$ 为 batch 内的样本数;

$p_c$ 为对于类别 $c$ ，其正样本的权重. $p_c > 1$ 其提升 recall；$p_c < 1$ 其提升 precision.

如：

# 64 classes, batch size = 10
target = torch.ones([10, 64], dtype=torch.float32) 
# A prediction (logit)
output = torch.full([10, 64], 0.999)  
# All weights are equal to 1
pos_weight = torch.ones([64])  
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)
# -log(sigmoid(0.999))
print(criterion(output, target))
#tensor(0.3135)

7.2. 参数

[1] - size_average 和 reduce 都是待弃用的参数. 不推荐使用.

[2] - weight: 关于每个类别的缩放权重，size 为 $C$ 的 Tensor.

[3] - reduction: 可选参数：none、mean和sum 三种. 默认为 mean.

none: 不进行简化(reduction).
mean: 对输出求和，并除以元素总数.
sum: 对输出求和，但不除以元素总数.

[4] - pos_weight: 正样本的权重. 必须是与类别数长度一致的向量.

7.3. Shape

同 6.3. Shape.

7.4. 示例

loss = nn.BCEWithLogitsLoss()
input = torch.randn(3, requires_grad=True)
target = torch.empty(3).random_(2)
output = loss(input, target)
output.backward()

8. MarginRankingLoss

class torch.nn.MarginRankingLoss(margin=0.0, 
                                 size_average=None, 
                                 reduce=None, 
                                 reduction='mean')

8.1. 说明

MarginRankingLoss，计算的是，给定输入两个 1D min-batch Tensor $x1$ 和 $x2$ 以及一个 1D mini-batch tensor $y$ (包含1或-1) 的度量.

如果 $y=1$，则假设第一个输入比第二个输入的排名更高(值更大)；

如果 $y=-1$，则假设第二个输入比第一个输入的排名更高(值更大).

对于 mini-batch 内的每个样本，

$$ loss(x, y) = max(0, -y \ast (x1-x2) + \text{margin}) $$

8.2. 参数

[1] - margin - 默认值为 0.

其它参数同 1.2. 参数.

8.3. Shape

[1] - 输入 - input，$(N, D)$, $N$ 为 batch-size，$D$ 是样本的维度.

[2] - 输入 - target, $(N)$

[3] - 输出 - 标量，如果 reduction='none'，则为 $(N)$.

9. HingeEmbeddingLoss

class torch.nn.HingeEmbeddingLoss(margin=1.0, 
                                  size_average=None,
                                  reduce=None, 
                                  reduction='mean')

9.1. 说明

HingeEmbeddingLoss 计算的是，给定输入 tensor $x$ 和 labels tensor $y$ (包含1和-1) 时的损失函数.

HingeEmbeddingLoss 往往用于度量两个输入是否相似，如采用 L1 成对距离(pairdistance).

HingeEmbeddingLoss 多用于学习非线性嵌入(nonlinear embeddings) 或者半监督学习.

对于 mini-batch 内的第 $n$ 个样本，

$$ l_n = \begin{cases} x_n &\text { if } y_n=1 \\ max\lbrace 0, \Delta - x_n \rbrace &\text { if } y_n = -1 \end{cases} $$

总损失函数为：

$$ L = \{l_1, ..., l_N\}^T $$

$$ \ell(x) = \begin{cases} mean(L) &\text{if reduction='mean' } \\ sum(L) &\text{if reduction = 'sum'}\end{cases} $$

10. MultiLabelMarginLoss

class torch.nn.MultiLabelMarginLoss(size_average=None, 
                                    reduce=None, 
                                    reduction='mean')

10.1. 说明

MultiLabelMarginLoss 度量的是，给定输入2D mini-batch Tensor $x$ 和输出 2D Tensor 目标类别索引 $y$ 时的 multi-class multi-classification hinge loss 函数.

对于 mini-batch 内的每个样本：

$$ loss(x, y) = \sum_{ij} \frac{max(0, 1-(x[y[j]] -x[i]))}{x.size(0)} $$

其中，$x \in \lbrace 0, ..., x.size(0)-1 \rbrace$.

$y \in \lbrace 0, ..., y.size(0)-1 \rbrace$.

$0 \leq y[j] \leq x.size(0)-1$.

对于所有 $i$ 和 $j$，$i \neq y[j]$.

$y$ 和 $x$ size 一致.

MultiLabelMarginLoss 仅针对从上开始的非负 targets 连续块的部分.

MultiLabelMarginLoss 有助于解决不同样本具有不同数量 target 类别的问题.

10.2. 参数

同 1.2. 参数 .

10.3. Shape

[1] - 输入 - input，$C$ 或 $(N, C)$, $N$ 为 batch-size，$C$ 是类别.

[2] - 输入 - target, $(C)$ 或 $(N, C)$，为了确保与 input shape 一致，target label 会补 -1.

[3] - 输出 - 标量，如果 reduction='none'，则为 $(N)$.

10.4. 示例

loss = nn.MultiLabelMarginLoss()
x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]])
# for target y, only consider labels 3 and 0, not after label -1
y = torch.LongTensor([[3, 0, -1, 1]])
print(loss(x, y))
# 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4)))
#tensor(0.8500)

11. SmoothL1Loss

class torch.nn.SmoothL1Loss(size_average=None,
                            reduce=None, 
                            reduction='mean')

11.1. 说明

如果每个元素的绝对误差都小于 1，则 SmoothL1Loss 采用平方项；否则采用 L1项.

相比于 MSELoss，SmoothL1Loss 对 outliers 更不敏感，在某些场景下避免梯度爆炸(如，Fast R-CNN 论文里).

SmoothL1Loss，也叫做 Huber loss:

$$ loss(x, y) = \frac{1}{n} \sum_{i} z_i $$

其中，

$$ z_i = \begin{cases} 0.5(x_i - y_i)^2 &\text { if } |x_i - y_i| <1 \\ |x_i - y_i| - 0.5 &\text { otherwise } \end{cases} $$

11.2. 参数

同 1.2. 参数

11.3. Shape

同 1.3. Shape.

12. SoftMarginLoss

torch.nn.SoftMarginLoss(size_average=None, 
                        reduce=None, 
                        reduction='mean')

12. 说明

SoftMarginLoss 计算的是，输入 tensor $x$ 和 target tensor $y$ (包含 1 或 -1) 间的二类分类逻辑损失函数(two-class classification logistic loss ).

$$ loss(x,y) = \sum_i \frac{log(1 + exp(-y[i] \ast x[i]))}{x.nelement()} $$

13. MultiLabelSoftMarginLoss

class torch.nn.MultiLabelSoftMarginLoss(weight=None, 
                                        size_average=None, 
                                        reduce=None,
                                        reduction='mean')

13.1. Description

MultiLabelSoftMarginLoss computes a multi-label one-versus-all loss based on max-entropy between an input tensor $x$ and a target tensor $y$, both of size $(N, C)$.

For each sample in the mini-batch,

$$ loss(x, y) = -\frac{1}{C} \ast \sum_i y[i] \ast log((1 + exp(-x[i]))^{-1}) + (1 - y[i]) \ast log(\frac{exp(-x[i])}{1+exp(-x[i])}) $$

where $i \in \lbrace 0, ..., x.nElement()-1 \rbrace$ and $y[i] \in \lbrace 0, 1 \rbrace$.
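The two logarithms in the formula are log-sigmoid terms: $log((1 + exp(-x))^{-1})$ is logsigmoid(x), and $log(\frac{exp(-x)}{1+exp(-x)})$ is logsigmoid(-x). A minimal sketch under that observation, with arbitrary example values, compared against `nn.MultiLabelSoftMarginLoss`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.tensor([[0.2, -1.5, 3.0],
                  [1.0, 0.0, -2.0]])   # (N, C) raw scores
y = torch.tensor([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])    # (N, C) multi-hot labels

# Per-sample loss: average over the C classes.
per_sample = -(y * F.logsigmoid(x) + (1 - y) * F.logsigmoid(-x)).mean(dim=1)
# Default reduction='mean' then averages over the batch.
manual = per_sample.mean()
builtin = nn.MultiLabelSoftMarginLoss()(x, y)
print(manual.item(), builtin.item())
```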

14. CosineEmbeddingLoss

class torch.nn.CosineEmbeddingLoss(margin=0.0, 
                                   size_average=None, 
                                   reduce=None, 
                                   reduction='mean')

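14.1. Description

CosineEmbeddingLoss computes the loss given input tensors $x_1$ and $x_2$ and a label tensor $y$ whose values are 1 or -1.

It uses the cosine distance to measure whether two inputs are similar or dissimilar, and is typically used for learning nonlinear embeddings or for semi-supervised learning.

The loss for each sample is: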

$$ l_n = \begin{cases} 1 - cos(x_1, x_2) &\text { if } y=1 \\ max\lbrace 0, cos(x_1, x_2)- \text{margin} \rbrace &\text { if } y = -1 \end{cases} $$
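A small sketch of the two branches, assuming `F.cosine_similarity` as the $cos(x_1, x_2)$ term. The vectors and the margin of 0.5 are arbitrary example values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x1 = torch.tensor([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]])
x2 = torch.tensor([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y = torch.tensor([1.0, -1.0, -1.0])  # 1: similar pair, -1: dissimilar pair
margin = 0.5

cos = F.cosine_similarity(x1, x2, dim=1)
# y = 1  -> 1 - cos
# y = -1 -> max(0, cos - margin)
per_sample = torch.where(y == 1, 1 - cos, torch.clamp(cos - margin, min=0))
manual = per_sample.mean()
builtin = nn.CosineEmbeddingLoss(margin=margin)(x1, x2, y)
print(per_sample, manual.item(), builtin.item())
```

Only the second pair is penalized: it is labeled dissimilar but its cosine similarity (about 0.707) exceeds the margin.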

15. MultiMarginLoss

class torch.nn.MultiMarginLoss(p=1, 
                               margin=1.0, 
                               weight=None, 
                               size_average=None, 
                               reduce=None, 
                               reduction='mean')

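15.1. Description

MultiMarginLoss is a multi-class classification hinge loss (a margin-based loss) between an input $x$ (a 2D mini-batch tensor) and an output $y$ (a 1D tensor of target class indices, $0 \leq y \leq x.size(1)-1$).

For each mini-batch sample, the loss in terms of the 1D input $x$ and the scalar output $y$ is:

$$ loss(x, y) = \frac{\sum_i max(0, \text{margin} - x[y] + x[i])^p}{x.size(0)} $$

where $i \in \lbrace 0, ..., x.size(0)-1 \rbrace$ and $i \neq y$.

Optionally, the classes can be weighted differently by passing a 1D weight tensor:

$$ loss(x, y) = \frac{\sum_i w[y] \ast max(0, \text{margin} - x[y] + x[i])^p}{x.size(0)} $$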
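The unweighted formula can be checked with a short loop. This is a sketch only: `multi_margin_reference` is an illustrative helper, not a PyTorch function, and it assumes the hinge is clamped at 0 before the power $p$ is applied.

```python
import torch
import torch.nn as nn


def multi_margin_reference(x, y, p=1, margin=1.0):
    """Unweighted multi-class hinge loss for one sample (illustrative helper).

    x: 1D float tensor of class scores, y: int index of the target class.
    """
    total = torch.zeros(())
    for i in range(x.numel()):
        if i == y:
            continue
        # Clamp the hinge at 0 first, then raise to the power p.
        total = total + torch.clamp(margin - x[y] + x[i], min=0) ** p
    return total / x.numel()


x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
y = torch.tensor([3])

results = []
for p, margin in [(1, 1.0), (2, 1.0), (2, 0.5)]:
    manual = multi_margin_reference(x[0], y[0].item(), p=p, margin=margin)
    builtin = nn.MultiMarginLoss(p=p, margin=margin)(x, y)
    results.append((manual, builtin))
    print(p, margin, manual.item(), builtin.item())
```

With margin=0.5 two of the three hinge terms are negative and are clamped to 0, which is where the position of the power relative to the max matters.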

16. TripletMarginLoss

class torch.nn.TripletMarginLoss(margin=1.0, 
                                 p=2.0, 
                                 eps=1e-06, 
                                 swap=False, 
                                 size_average=None, 
                                 reduce=None, 
                                 reduction='mean')

16.1. Description

TripletMarginLoss measures the triplet loss given input tensors $x_1$, $x_2$, $x_3$ and a margin with a value greater than 0.

It is used to measure the relative similarity between samples.

A triplet is composed of $a$, $p$ and $n$ (the anchor, the positive example and the negative example, respectively).

All input tensors have shape $(N, D)$.

The distance swap is described in the paper: Learning shallow convolutional feature descriptors with triplet losses by V. Balntas, E. Riba et al.

For each sample in the mini-batch,

$$ L(a, p, n) = max \lbrace d(a_i, p_i) - d(a_i, n_i) + \text{margin}, 0 \rbrace $$

where

$$ d(x_i, y_i) = ||\mathbf{x}_i - \mathbf{y}_i||_p $$
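As a sketch of how the two formulas combine, the snippet below computes $d(a_i, p_i)$ and $d(a_i, n_i)$ with `F.pairwise_distance` and applies the hinge by hand. The tensor sizes and the seed are arbitrary, and swap is left at its default of False.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
anchor = torch.randn(4, 8)
positive = torch.randn(4, 8)
negative = torch.randn(4, 8)
margin, p = 1.0, 2.0

# p-norm distances between corresponding rows, shape (N,)
d_ap = F.pairwise_distance(anchor, positive, p=p)
d_an = F.pairwise_distance(anchor, negative, p=p)

# max(d(a, p) - d(a, n) + margin, 0), then the default mean reduction
per_sample = torch.clamp(d_ap - d_an + margin, min=0)
manual = per_sample.mean()
builtin = nn.TripletMarginLoss(margin=margin, p=p)(anchor, positive, negative)
print(manual.item(), builtin.item())
```

A sample contributes zero loss once its negative is at least `margin` farther from the anchor than its positive.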

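16.2. Parameters

[1] - margin - default: 1.

[2] - p - the norm degree used for the pairwise distance. Default: 2.

[3] - swap - whether to apply the distance swap from the paper above. Default: False.

Other parameters are the same as in 1.2. Parameters.

16.3. Shape

[1] - Input - input, $(N, D)$, where $D$ is the vector dimension.

[2] - Output - output, a scalar. If reduction='none', then $(N)$.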

16.4. Example

import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2)
anchor = torch.randn(100, 128, requires_grad=True)
positive = torch.randn(100, 128, requires_grad=True)
negative = torch.randn(100, 128, requires_grad=True)
output = triplet_loss(anchor, positive, negative)
output.backward()

17. PoissonNLLLoss

class torch.nn.PoissonNLLLoss(log_input=True, 
                              full=False, 
                              size_average=None, 
                              eps=1e-08, 
                              reduce=None, 
                              reduction='mean')

17.1. Description

PoissonNLLLoss computes the negative log likelihood loss for a target that follows a Poisson distribution.

The loss can be described as:

$$ target \sim Poisson(input) $$

$$ loss(input, target) = input - target \ast log(input) + log(target !) $$

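The last term can be omitted or approximated with Stirling's formula. The approximation is used for target values greater than 1. For targets less than or equal to 1, zeros are added to the loss.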

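17.2. Parameters

[1] - log_input - if True, the loss is computed as $exp(input) - target \ast input$; if False, the loss is computed as $input - target \ast log(input + eps)$, where eps (default 1e-08) avoids evaluating $log(0)$.

[2] - full - whether to compute the full loss, i.e. to add the Stirling approximation term $target \ast log(target) - target + 0.5 \ast log(2 \pi target)$.

Other parameters are the same as in 1.2. Parameters.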
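The two settings of log_input can be compared side by side. This sketch uses full=False (the default), so the Stirling term is not included; the variable names `log_rate` and `rate` and the Poisson rate of 3.0 used to draw targets are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Non-negative count targets drawn from a Poisson distribution.
target = torch.poisson(torch.full((5, 2), 3.0))
log_rate = torch.randn(5, 2)   # model output interpreted as log(rate)
rate = torch.exp(log_rate)     # the same prediction as a plain rate
eps = 1e-8

# log_input=True: exp(input) - target * input
manual_log = (torch.exp(log_rate) - target * log_rate).mean()
builtin_log = nn.PoissonNLLLoss(log_input=True)(log_rate, target)

# log_input=False: input - target * log(input + eps)
manual_raw = (rate - target * torch.log(rate + eps)).mean()
builtin_raw = nn.PoissonNLLLoss(log_input=False, eps=eps)(rate, target)

print(manual_log.item(), builtin_log.item())
print(manual_raw.item(), builtin_raw.item())
```

Both settings describe the same prediction here, so the two losses agree up to the effect of eps.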

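17.3. Example

import torch
import torch.nn as nn

loss = nn.PoissonNLLLoss()
log_input = torch.randn(5, 2, requires_grad=True)
# Poisson targets are non-negative counts, so sample them from a Poisson distribution
target = torch.poisson(torch.full((5, 2), 3.0))
output = loss(log_input, target)
output.backward()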

18. CTCLoss

class torch.nn.CTCLoss(blank=0, 
                       reduction='mean', 
                       zero_infinity=False)

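18.1. Description

Connectionist Temporal Classification loss.

A loss function for classifying time-series (sequence) data: it computes the loss between a continuous (unsegmented) time series and a target sequence by summing over the probabilities of the possible alignments of input to target.

18.2. Example

import torch
import torch.nn as nn

T = 50      # Input sequence length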
C = 20      # Number of classes (including blank)
N = 16      # Batch size
S = 30      # Target sequence length of longest target in batch
S_min = 10  # Minimum target length, for demonstration purposes

# Initialize random batch of input vectors, for *size = (T,N,C)
input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()

# Initialize random batch of targets (0 = blank, 1:C = classes)
target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long)

input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long)
target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long)
ctc_loss = nn.CTCLoss()
loss = ctc_loss(input, target, input_lengths, target_lengths)
loss.backward()
