How is the total loss over multiple classes calculated in Keras?

This article explains how the total loss over multiple classes is calculated in Keras, and should serve as a useful reference for anyone tackling the same problem.

Problem description

Let's say I have a network with the following parameters:

  1. A fully convolutional network for semantic segmentation
  2. Loss = weighted binary cross-entropy (but it could be any loss function, it doesn't matter)
  3. 5 classes - the input is an image and the ground truth is a set of binary masks
  4. Batch size = 16

Now, I know that the loss is calculated in the following manner: binary cross-entropy is applied to each pixel in the image with regard to each class. So essentially, each pixel will have 5 loss values.

What happens after this step?

When I train my network, it prints only a single loss value per epoch. There are several levels of loss accumulation that must happen to produce a single value, and how they happen is not clear at all in the docs/code.

  1. What gets combined first - (1) the loss values per class (for instance, the 5 values, one per class, are combined per pixel, and then over all the pixels in the image), or (2) all the pixels in the image for each individual class, and then all the class losses are combined?
  2. How exactly are these different pixel combinations happening - where is it summed / where is it averaged?
  3. Keras's binary_crossentropy averages over axis=-1. So is this an average over all the pixels per class, an average over all the classes, or both?

Put differently: how are the losses for the different classes combined to produce a single loss value for an image?

This is not explained in the docs at all and would be very helpful to people doing multi-class predictions with Keras, regardless of the type of network. Here is the link to the start of the Keras code where the loss function is first passed in.

The closest thing I could find to an explanation is:

loss: String (name of objective function) or objective function. See losses. If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses.

from the Keras docs. So does this mean that the loss for each class in the image is simply summed?

Example code is given below so people can try it out. It is a basic implementation borrowed from Kaggle and modified for multi-label prediction:

import tensorflow as tf
from keras import backend as K
from keras.models import Model
from keras.layers import (Input, Lambda, Conv2D, MaxPooling2D,
                          Conv2DTranspose, concatenate)

# Build U-Net model
num_classes = 5
IMG_DIM = 256
IMG_CHAN = 3
weights = {0: 1, 1: 1, 2: 1, 3: 1, 4: 1000}  # chose an extreme value just to check for any reaction
inputs = Input((IMG_DIM, IMG_DIM, IMG_CHAN))
s = Lambda(lambda x: x / 255) (inputs)

c1 = Conv2D(8, (3, 3), activation='relu', padding='same') (s)
c1 = Conv2D(8, (3, 3), activation='relu', padding='same') (c1)
p1 = MaxPooling2D((2, 2)) (c1)

c2 = Conv2D(16, (3, 3), activation='relu', padding='same') (p1)
c2 = Conv2D(16, (3, 3), activation='relu', padding='same') (c2)
p2 = MaxPooling2D((2, 2)) (c2)

c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (p2)
c3 = Conv2D(32, (3, 3), activation='relu', padding='same') (c3)
p3 = MaxPooling2D((2, 2)) (c3)

c4 = Conv2D(64, (3, 3), activation='relu', padding='same') (p3)
c4 = Conv2D(64, (3, 3), activation='relu', padding='same') (c4)
p4 = MaxPooling2D(pool_size=(2, 2)) (c4)

c5 = Conv2D(128, (3, 3), activation='relu', padding='same') (p4)
c5 = Conv2D(128, (3, 3), activation='relu', padding='same') (c5)

u6 = Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same') (c5)
u6 = concatenate([u6, c4])
c6 = Conv2D(64, (3, 3), activation='relu', padding='same') (u6)
c6 = Conv2D(64, (3, 3), activation='relu', padding='same') (c6)

u7 = Conv2DTranspose(32, (2, 2), strides=(2, 2), padding='same') (c6)
u7 = concatenate([u7, c3])
c7 = Conv2D(32, (3, 3), activation='relu', padding='same') (u7)
c7 = Conv2D(32, (3, 3), activation='relu', padding='same') (c7)

u8 = Conv2DTranspose(16, (2, 2), strides=(2, 2), padding='same') (c7)
u8 = concatenate([u8, c2])
c8 = Conv2D(16, (3, 3), activation='relu', padding='same') (u8)
c8 = Conv2D(16, (3, 3), activation='relu', padding='same') (c8)

u9 = Conv2DTranspose(8, (2, 2), strides=(2, 2), padding='same') (c8)
u9 = concatenate([u9, c1], axis=3)
c9 = Conv2D(8, (3, 3), activation='relu', padding='same') (u9)
c9 = Conv2D(8, (3, 3), activation='relu', padding='same') (c9)

outputs = Conv2D(num_classes, (1, 1), activation='sigmoid') (c9)

def weighted_loss(weightsList):
    def lossFunc(true, pred):

        axis = -1  # if channels last
        #axis = 1  # if channels first

        # Index of the "hot" class for every pixel
        classSelectors = K.argmax(true, axis=axis)
        # One boolean mask per class: True where that class is the argmax
        classSelectors = [K.equal(tf.cast(i, tf.int64), tf.cast(classSelectors, tf.int64))
                          for i in range(len(weightsList))]
        classSelectors = [K.cast(x, K.floatx()) for x in classSelectors]
        # Scale each mask by its class weight
        weights = [sel * w for sel, w in zip(classSelectors, weightsList)]

        # Sum the scaled masks into a single per-pixel weight map
        weightMultiplier = weights[0]
        for i in range(1, len(weights)):
            weightMultiplier = weightMultiplier + weights[i]

        # BCE_loss and dice_coef come from the BCE-DICE loss linked below
        loss = BCE_loss(true, pred) - (1 + dice_coef(true, pred))
        loss = loss * weightMultiplier
        return loss
    return lossFunc

model = Model(inputs=[inputs], outputs=[outputs])
# weighted_loss must be defined before compile is called; mean_iou is a
# metric defined elsewhere in the original Kaggle kernel
model.compile(optimizer='adam', loss=weighted_loss(weights), metrics=[mean_iou])
model.summary()

The actual BCE-DICE loss function can be found here.
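To see what the per-pixel weight map inside lossFunc looks like, here is a minimal sketch that runs just the weighting part on a toy tensor (assuming a TF1-era Keras backend; the 2x2 "image" and its one-hot values are made up for illustration):

import numpy as np
import tensorflow as tf
from keras import backend as K

# Toy ground truth: batch of 1, a 2x2 "image", 5 one-hot classes per pixel
true = K.constant(np.array([[[[1, 0, 0, 0, 0], [0, 0, 0, 0, 1]],
                             [[0, 1, 0, 0, 0], [0, 0, 0, 0, 1]]]], dtype='float32'))
weightsList = [1, 1, 1, 1, 1000]

classSelectors = K.argmax(true, axis=-1)
selectors = [K.cast(K.equal(tf.cast(i, tf.int64), tf.cast(classSelectors, tf.int64)), K.floatx())
             for i in range(len(weightsList))]
weightMultiplier = sum(sel * w for sel, w in zip(selectors, weightsList))

print(K.eval(weightMultiplier))
# [[[   1. 1000.]
#   [   1. 1000.]]]  -> pixels whose ground-truth class is 4 get weight 1000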

Motivation for the question: based on the above code, the total validation loss of the network after 20 epochs is ~1%; however, the mean intersection-over-union scores for the first 4 classes are above 95% each, while for the last class it is 23%, clearly indicating that the 5th class isn't doing well at all. Yet this failure isn't reflected in the loss at all. Hence, the individual losses for a sample must be combined in a way that completely negates the huge loss we see for the 5th class, and when the per-sample losses are then combined over a batch, the total is still really low. I'm not sure how to reconcile this information.
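To see how this chain of averages can hide a badly predicted class, here is a rough numerical sketch (the loss values and pixel counts below are hypothetical, chosen only to illustrate the effect):

import numpy as np

# Hypothetical per-pixel, per-class BCE values for one 100x100 image, 5 classes.
# Classes 0-3 are predicted almost perfectly everywhere; class 4 is predicted
# badly, but only on the few pixels where it actually appears.
per_class_loss = np.full((10000, 5), 0.01)
per_class_loss[:50, 4] = 3.0               # 50 foreground pixels with a large loss

per_pixel = per_class_loss.mean(axis=-1)   # average over classes first
total = per_pixel.mean()                   # then over all pixels
print(total)                               # ~0.013 -- the class-4 failure is nearly invisible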

Recommended answer

Although I have already mentioned part of this answer in a related answer, let's inspect the source code step by step and in more detail to find the answer concretely.

First, let's feed forward(!): there is a call to the weighted_loss function, which takes y_true, y_pred, sample_weight and mask as inputs:

weighted_loss = weighted_losses[i]
# ...
output_loss = weighted_loss(y_true, y_pred, sample_weight, mask)

weighted_loss is actually an element of a list that contains all the (augmented) loss functions passed to the fit method:

weighted_losses = [
    weighted_masked_objective(fn) for fn in loss_functions]

The word "augmented" I used above is important here. That's because, as you can see, the actual loss function is wrapped by another function called weighted_masked_objective, which is defined as follows:

def weighted_masked_objective(fn):
    """Adds support for masking and sample-weighting to an objective function.
    It transforms an objective function `fn(y_true, y_pred)`
    into a sample-weighted, cost-masked objective function
    `fn(y_true, y_pred, weights, mask)`.
    # Arguments
        fn: The objective function to wrap,
            with signature `fn(y_true, y_pred)`.
    # Returns
        A function with signature `fn(y_true, y_pred, weights, mask)`.
    """
    if fn is None:
        return None

    def weighted(y_true, y_pred, weights, mask=None):
        """Wrapper function.
        # Arguments
            y_true: `y_true` argument of `fn`.
            y_pred: `y_pred` argument of `fn`.
            weights: Weights tensor.
            mask: Mask tensor.
        # Returns
            Scalar tensor.
        """
        # score_array has ndim >= 2
        score_array = fn(y_true, y_pred)
        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in Theano
            mask = K.cast(mask, K.floatx())
            # mask should have the same shape as score_array
            score_array *= mask
            #  the loss per batch should be proportional
            #  to the number of unmasked samples.
            score_array /= K.mean(mask)

        # apply sample weighting
        if weights is not None:
            # reduce score_array to same ndim as weight array
            ndim = K.ndim(score_array)
            weight_ndim = K.ndim(weights)
            score_array = K.mean(score_array,
                                 axis=list(range(weight_ndim, ndim)))
            score_array *= weights
            score_array /= K.mean(K.cast(K.not_equal(weights, 0), K.floatx()))
        return K.mean(score_array)
    return weighted
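To make the sample-weighting branch concrete, here is a plain-NumPy sketch of the same arithmetic (this is not the Keras code itself; the shapes and weights are made up):

import numpy as np

# score_array: per-pixel losses for a batch of 2 images of 4x4 pixels
score_array = np.random.rand(2, 4, 4)
weights = np.array([1.0, 2.0])   # one sample weight per image

# reduce score_array to the same ndim as the weights (mean over the extra axes)
extra_axes = tuple(range(weights.ndim, score_array.ndim))   # (1, 2)
score = score_array.mean(axis=extra_axes)                   # shape (2,)

# apply the weights, then normalize by the fraction of non-zero weights
score = score * weights
score = score / (weights != 0).mean()

print(score.mean())   # the single scalar that weighted() returns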

So there is a nested function, weighted, which actually calls the real loss function fn in the line score_array = fn(y_true, y_pred). Now, to be concrete, in the example the OP provided, fn (i.e. the loss function) is binary_crossentropy. Therefore we need to take a look at the definition of binary_crossentropy() in Keras:

def binary_crossentropy(y_true, y_pred):
    return K.mean(K.binary_crossentropy(y_true, y_pred), axis=-1)

which in turn calls the backend function K.binary_crossentropy(). When Tensorflow is used as the backend, K.binary_crossentropy() is defined as follows:

def binary_crossentropy(target, output, from_logits=False):
    """Binary crossentropy between an output tensor and a target tensor.
    # Arguments
        target: A tensor with the same shape as `output`.
        output: A tensor.
        from_logits: Whether `output` is expected to be a logits tensor.
            By default, we consider that `output`
            encodes a probability distribution.
    # Returns
        A tensor.
    """
    # Note: tf.nn.sigmoid_cross_entropy_with_logits
    # expects logits, Keras expects probabilities.
    if not from_logits:
        # transform back to logits
        _epsilon = _to_tensor(epsilon(), output.dtype.base_dtype)
        output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
        output = tf.log(output / (1 - output))

    return tf.nn.sigmoid_cross_entropy_with_logits(labels=target,
                                                   logits=output)

tf.nn.sigmoid_cross_entropy_with_logits returns:

A Tensor of the same shape as logits with the componentwise logistic losses.
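For reference, the componentwise logistic loss is the familiar binary cross-entropy -(t*log(p) + (1-t)*log(1-p)); a quick NumPy check (assuming the probabilities have already been clipped away from 0 and 1, as in the snippet above):

import numpy as np

def bce(t, p):
    # elementwise binary cross-entropy; output has the same shape as the inputs
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

t = np.array([1.0, 0.0, 1.0])
p = np.array([0.9, 0.1, 0.2])
print(bce(t, p))   # [0.105 0.105 1.609] -- one loss value per element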

Now, let's backpropagate(!): considering the above note, the output shape of K.binary_crossentropy is the same as that of y_pred (or y_true). As the OP mentioned, y_true has a shape of (batch_size, img_dim, img_dim, num_classes). Therefore, K.mean(..., axis=-1) is applied to a tensor of shape (batch_size, img_dim, img_dim, num_classes), which results in an output tensor of shape (batch_size, img_dim, img_dim). So the loss values of all classes are averaged for each pixel in the image. Hence, the shape of score_array in the weighted function mentioned above is (batch_size, img_dim, img_dim). There is one more step: the return statement in the weighted function takes the mean again, i.e. return K.mean(score_array). So how does it compute the mean? If you take a look at the definition of the mean backend function, you will find that the axis argument is None by default:

def mean(x, axis=None, keepdims=False):
    """Mean of a tensor, alongside the specified axis.
    # Arguments
        x: A tensor or variable.
        axis: A list of integer. Axes to compute the mean.
        keepdims: A boolean, whether to keep the dimensions or not.
            If `keepdims` is `False`, the rank of the tensor is reduced
            by 1 for each entry in `axis`. If `keepdims` is `True`,
            the reduced dimensions are retained with length 1.
    # Returns
        A tensor with the mean of elements of `x`.
    """
    if x.dtype.base_dtype == tf.bool:
        x = tf.cast(x, floatx())
    return tf.reduce_mean(x, axis, keepdims)

And it calls tf.reduce_mean(), which, given axis=None, takes the mean over all axes of the input tensor and returns a single value. Therefore, the mean of the whole tensor of shape (batch_size, img_dim, img_dim) is computed, which translates to averaging over all the samples in the batch and over all their pixels, and this is returned as the single scalar value that represents the loss. This loss value is then reported back by Keras and used for optimization.
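Putting the two reductions together, here is a minimal shape walkthrough on dummy data (assuming a TF1-era Keras backend and the OP's dimensions):

import numpy as np
from keras import backend as K

batch_size, img_dim, num_classes = 16, 256, 5
y_true = K.constant(np.random.randint(0, 2, (batch_size, img_dim, img_dim, num_classes)).astype('float32'))
y_pred = K.constant(np.random.rand(batch_size, img_dim, img_dim, num_classes).astype('float32'))

elementwise = K.binary_crossentropy(y_true, y_pred)   # (16, 256, 256, 5): one value per pixel per class
per_pixel = K.mean(elementwise, axis=-1)              # (16, 256, 256): classes averaged per pixel
scalar = K.mean(per_pixel)                            # (): everything else averaged into one number
print(K.int_shape(elementwise), K.int_shape(per_pixel), K.int_shape(scalar))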

Bonus: what if our model has multiple output layers, and therefore multiple loss functions are in use?

Remember the first piece of code I mentioned in this answer:

weighted_loss = weighted_losses[i]
# ...
output_loss = weighted_loss(y_true, y_pred, sample_weight, mask)

As you can see, there is a variable i used to index the array. You may have guessed correctly: it is actually part of a loop that computes the loss value for each output layer using its designated loss function, and then takes the (weighted) sum of all these loss values to compute the total loss:

# Compute total loss.
total_loss = None
with K.name_scope('loss'):
    for i in range(len(self.outputs)):
        if i in skip_target_indices:
            continue
        y_true = self.targets[i]
        y_pred = self.outputs[i]
        weighted_loss = weighted_losses[i]
        sample_weight = sample_weights[i]
        mask = masks[i]
        loss_weight = loss_weights_list[i]
        with K.name_scope(self.output_names[i] + '_loss'):
            output_loss = weighted_loss(y_true, y_pred,
                                        sample_weight, mask)
        if len(self.outputs) > 1:
            self.metrics_tensors.append(output_loss)
            self.metrics_names.append(self.output_names[i] + '_loss')
        if total_loss is None:
            total_loss = loss_weight * output_loss
        else:
            total_loss += loss_weight * output_loss
    if total_loss is None:
        if not self.losses:
            raise ValueError('The model cannot be compiled '
                                'because it has no loss to optimize.')
        else:
            total_loss = 0.

    # Add regularization penalties
    # and other layer-specific losses.
    for loss_tensor in self.losses:
        total_loss += loss_tensor  
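From the user's side, this weighted sum is exactly what the loss and loss_weights arguments of compile control. A minimal sketch with a hypothetical two-output model (the layer names are made up):

from keras.models import Model
from keras.layers import Input, Dense

inp = Input((10,))
out_a = Dense(1, activation='sigmoid', name='out_a')(inp)
out_b = Dense(1, activation='sigmoid', name='out_b')(inp)
model = Model(inputs=inp, outputs=[out_a, out_b])

# total_loss = 1.0 * loss(out_a) + 0.5 * loss(out_b), matching the loop above
model.compile(optimizer='adam',
              loss={'out_a': 'binary_crossentropy', 'out_b': 'binary_crossentropy'},
              loss_weights={'out_a': 1.0, 'out_b': 0.5})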

This concludes the article on how the total loss over multiple classes is calculated in Keras. We hope the answer above is helpful.
