[Original - Caffe custom sigmoid cross entropy loss layer].
A detailed derivation of SigmoidCrossEntropyLoss.
Implementing a Caffe layer in Python.
A very clear introduction; worth studying.
1. Sigmoid Cross Entropy Loss Derivation
The Sigmoid Cross Entropy Loss is defined via the log-likelihood:
${L = t\ln(P) + (1-t)\ln(1-P)}$
(the actual loss to be minimized is ${-L}$; the implementation in Section 3 restores this sign)
where:
- t - the target, or label;
- P - the Sigmoid score,
${P = \frac{1}{1 + e^{-x}}}$
Substituting P into L:
${L = t\ln(\frac{1}{1 + e^{-x}}) + (1-t)\ln(1 - \frac{1}{1 + e^{-x}})}$
Expanding:
${L = t\ln(\frac{1}{1 + e^{-x}}) + (1-t)\ln(\frac{e^{-x}}{1 + e^{-x}})}$
${L = t\ln(\frac{1}{1 + e^{-x}}) + \ln(\frac{e^{-x}}{1 + e^{-x}}) - t\ln(\frac{e^{-x}}{1 + e^{-x}})}$
${L = t[\ln 1 - \ln(1 + e^{-x})] + [\ln(e^{-x}) - \ln(1 + e^{-x})] - t[\ln(e^{-x}) - \ln(1 + e^{-x})]}$
${L = [-t\ln(1 + e^{-x})] + \ln(e^{-x}) - \ln(1 + e^{-x}) - t\ln(e^{-x}) + [t\ln(1 + e^{-x})]}$
Combining like terms:
${L = \ln(e^{-x}) - \ln(1 + e^{-x}) - t\ln(e^{-x})}$
${L = -x\ln(e) - \ln(1 + e^{-x}) + tx\ln(e)}$
${L = -x - \ln(1 + e^{-x}) + xt}$
That is:
${L = xt - x - \ln(1 + e^{-x})}$ <1>
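As a numerical sanity check on identity <1>, the definition and the simplified form can be compared directly (a minimal sketch; the sample scores and targets are arbitrary):

import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])  # scores
t = np.array([ 0.0,  1.0, 1.0, 0.0, 1.0])  # targets

P = 1.0 / (1.0 + np.exp(-x))
L_def = t * np.log(P) + (1 - t) * np.log(1 - P)  # original definition
L_1   = x * t - x - np.log(1 + np.exp(-x))       # identity <1>

print(np.allclose(L_def, L_1))  # True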
Properties of the functions ${e^{-x}}$ and ${e^{x}}$:
${e^{-x}}$ decreases as ${x}$ increases; when ${x}$ is a large negative number, ${e^{-x}}$ becomes extremely large and easily overflows. In other words, the computation must avoid evaluating ${e^{-x}}$ in that regime.
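The overflow is easy to demonstrate in float32, the precision Caffe blobs use by default:

import numpy as np

x = np.float32(-100.0)
print(np.exp(-x))  # inf: e^{100} exceeds the float32 range (RuntimeWarning: overflow)
print(np.exp(x))   # ~3.8e-44: e^{-100} is still representable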
Therefore, to avoid overflow, the loss function L is rewritten: for ${x < 0}$, it is expressed in terms of ${e^x}$ instead:
Starting from the original loss: ${L = xt - x - \ln(1 + e^{-x})}$ <1>
we have: ${L = xt - x + \ln(\frac{1}{1 + e^{-x}})}$
Multiplying the numerator and denominator inside the last term by ${e^x}$:
${L = xt - x + \ln(\frac{1 \cdot e^x}{(1 + e^{-x}) \cdot e^x})}$
${L = xt - x + \ln(\frac{e^x}{1 + e^x})}$
${L = xt - x + [\ln(e^x) - \ln(1 + e^x)]}$
${L = xt - x + x\ln e - \ln(1 + e^x)}$
which gives:
${L = xt - \ln(1 + e^x)}$ <2>
From <1> and <2>, the final loss function is:
${L = xt - x - \ln(1 + e^{-x}), \quad (x > 0)}$
${L = xt - 0 - \ln(1 + e^{x}), \quad (x < 0)}$
Combining the two cases into one expression:
${L = xt - \max(x, 0) - \ln(1 + e^{-|x|}), \quad \text{for all } x}$
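A quick numerical check that the combined form agrees with the naive definition, and stays finite where the naive form overflows (a minimal sketch with arbitrary test values):

import numpy as np

def L_naive(x, t):
    P = 1.0 / (1.0 + np.exp(-x))
    return t * np.log(P) + (1 - t) * np.log(1 - P)

def L_stable(x, t):
    return x * t - np.maximum(x, 0) - np.log(1 + np.exp(-np.abs(x)))

x = np.array([-5.0, -0.1, 0.2, 5.0])
t = np.array([ 1.0,  0.0, 1.0, 0.0])
print(np.allclose(L_naive(x, t), L_stable(x, t)))  # True

# For a large negative score the naive form diverges while the stable one does not:
print(L_naive(np.float32(-200.0), 1.0))   # -inf: exp(200) overflows, P underflows to 0
print(L_stable(np.float32(-200.0), 1.0))  # -200.0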
2. Sigmoid Cross Entropy Loss Derivative
For ${x > 0}$, ${L = xt - x - \ln(1 + e^{-x})}$,
so:
${\frac{\partial L}{\partial x} = \frac{\partial (xt - x - \ln(1 + e^{-x}))}{\partial x}}$
${\frac{\partial L}{\partial x} = \frac{\partial (xt)}{\partial x} - \frac{\partial x}{\partial x} - \frac{\partial \ln(1 + e^{-x})}{\partial x}}$
${\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (1 + e^{-x})}{\partial x}}$
${\frac{\partial L}{\partial x} = t - 1 - \frac{1}{1 + e^{-x}} \cdot \frac{\partial (e^{-x})}{\partial x}}$
Since ${\frac{\partial (e^{-x})}{\partial x} = -e^{-x}}$, this gives:
${\frac{\partial L}{\partial x} = t - 1 + \frac{e^{-x}}{1 + e^{-x}}}$
and hence:
${\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}}$
The second term is the Sigmoid function ${P = \frac{1}{1 + e^{-x}}}$, so:
${\frac{\partial L}{\partial x} = t - P}$
For ${x < 0}$, ${L = xt - \ln(1 + e^{x})}$,
${\frac{\partial L}{\partial x} = \frac{\partial (xt - \ln(1 + e^{x}))}{\partial x}}$
${\frac{\partial L}{\partial x} = \frac{\partial (xt)}{\partial x} - \frac{\partial \ln(1 + e^x)}{\partial x}}$
${\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^x} \cdot \frac{\partial (e^x)}{\partial x}}$
${\frac{\partial L}{\partial x} = t - \frac{e^x}{1 + e^x}}$
${\frac{\partial L}{\partial x} = t - \frac{e^x \cdot e^{-x}}{(1 + e^x) \cdot e^{-x}}}$
${\frac{\partial L}{\partial x} = t - \frac{1}{1 + e^{-x}}}$
Again the second term is the Sigmoid function ${P = \frac{1}{1 + e^{-x}}}$, so:
${\frac{\partial L}{\partial x} = t - P}$
As shown, the derivative is the same for ${x > 0}$ and ${x < 0}$: it is simply the difference between the target value and the Sigmoid value.
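The result ${\frac{\partial L}{\partial x} = t - P}$ can be confirmed with a central finite-difference check (a minimal sketch; the step size and test values below are arbitrary):

import numpy as np

def L(x, t):
    # stable form from Section 1: L = xt - max(x, 0) - ln(1 + e^{-|x|})
    return x * t - np.maximum(x, 0) - np.log(1 + np.exp(-np.abs(x)))

x = np.array([-2.0, -0.3, 0.4, 2.0])
t = np.array([ 1.0,  0.0, 1.0, 0.0])
eps = 1e-6

numeric  = (L(x + eps, t) - L(x - eps, t)) / (2 * eps)  # central difference
analytic = t - 1.0 / (1.0 + np.exp(-x))                 # t - P

print(np.allclose(numeric, analytic, atol=1e-6))  # True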
3. Customizing a Caffe Loss Layer in Python
Caffe officially provides a demo of a custom EuclideanLossLayer written in Python.
Here, following the derivation above, we build a Python-based Caffe SigmoidCrossEntropyLossLayer.
Caffe's built-in version is implemented in C++ - SigmoidCrossEntropyLossLayer; see Caffe Loss Layer - SigmoidCrossEntropyLossLayer.
Since a loss is minimized, the layer computes ${-L = \max(x, 0) - xt + \ln(1 + e^{-|x|})}$ and propagates the corresponding gradient ${P - t}$.
Assume ${Labels \in [0, 1]}$.
3.1 SigmoidCrossEntropyLossLayer Implementation
import caffe
import numpy as np
import scipy.special


class CustomSigmoidCrossEntropyLossLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check for all inputs
        if len(bottom) != 2:
            raise Exception("Need two inputs (scores and labels) to compute sigmoid cross entropy loss.")

    def reshape(self, bottom, top):
        # check that the score and label dimensions match
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # the difference has the same shape as either input
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # the layer output is a scalar loss
        top[0].reshape(1)

    def forward(self, bottom, top):
        score = bottom[0].data
        label = bottom[1].data
        # numerically stable loss: -L = max(x, 0) - xt + ln(1 + e^{-|x|})
        first_term = np.maximum(score, 0)
        second_term = -score * label
        third_term = np.log(1 + np.exp(-np.absolute(score)))
        top[0].data[...] = np.sum(first_term + second_term + third_term)
        # cache the gradient d(-L)/dx = P - t for the backward pass
        sig = scipy.special.expit(score)
        self.diff = sig - label
        # abort if the loss diverges
        if np.isnan(top[0].data[0]):
            exit()

    def backward(self, top, propagate_down, bottom):
        # gradient flows to the scores only; labels get no gradient
        bottom[0].diff[...] = self.diff
3.2 prototxt Definition
layer {
  type: 'Python'
  name: 'loss'
  top: 'loss_opt'
  bottom: 'score'
  bottom: 'label'
  python_param {
    # the module name -- usually the filename -- which needs to be in $PYTHONPATH
    module: 'loss_layers'
    # the layer name -- the class name in the module
    layer: 'CustomSigmoidCrossEntropyLossLayer'
  }
  include {
    phase: TRAIN
  }
  # set loss weight so Caffe knows this is a loss layer.
  # since PythonLayer inherits directly from Layer, this isn't automatically
  # known to Caffe
  loss_weight: 1
}
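To use this layer, the file defining it (assumed here to be loss_layers.py, matching the module parameter above) must be on $PYTHONPATH, and Caffe must be built with WITH_PYTHON_LAYER := 1. A minimal training sketch, where solver.prototxt is a hypothetical solver whose net includes the layer definition above:

import caffe

caffe.set_mode_cpu()
# the net referenced by the solver must contain the Python loss layer above
solver = caffe.get_solver('solver.prototxt')
solver.step(1)  # one forward/backward pass through CustomSigmoidCrossEntropyLossLayer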