Ale*_*exP · tags: python, backpropagation, cosine-similarity, scikit-learn, tensorflow
As the title says, I am trying to train a model based on the SimCLR framework (see this paper: https://arxiv.org/pdf/2002.05709.pdf - the NT-Xent loss is stated in equation (1) and Algorithm 1).
I have managed to create a numpy version of the loss function, but this is not suitable for training the model, as numpy arrays cannot store the information required for backpropagation. I am having difficulty converting my numpy code to TensorFlow. Here is my numpy version:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


# Define the contrastive loss function, NT-Xent
def NT_Xent(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT-Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT-Xent contrastive loss
    """
    z = np.concatenate((zi, zj), 0)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        sim_ij = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[j].reshape(1, -1)))
        sim_ji = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[i].reshape(1, -1)))
        numerator_ij = np.exp(sim_ij / tau)
        numerator_ji = np.exp(sim_ji / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = np.squeeze(cosine_similarity(z[i].reshape(1, -1), z[np.arange(z.shape[0]) != i]))
        sim_jk = np.squeeze(cosine_similarity(z[j].reshape(1, -1), z[np.arange(z.shape[0]) != j]))
        denominator_ik = np.sum(np.exp(sim_ik / tau))
        denominator_jk = np.sum(np.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - np.log(numerator_ij / denominator_ik)
        loss_ji = - np.log(numerator_ji / denominator_jk)
        loss += loss_ij + loss_ji

    # Divide by the total number of samples
    loss /= z.shape[0]
    return loss
I am fairly confident that this function produces the correct results (although it is slow - the other implementations I have seen online are vectorized versions, e.g. this one for PyTorch: https://github.com/Spijkervet/SimCLR/blob/master/modules/nt_xent.py, and my code produces the same results for identical inputs). However, I could not see how their versions are mathematically equivalent to the formula in the paper, hence why I tried to build my own.
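One side note on the equivalence question (this is my own reading of the paper, not something those repositories state): the per-pair loss in equation (1) is

```latex
\ell(i, j) = -\log \frac{\exp\!\big(\mathrm{sim}(z_i, z_j)/\tau\big)}
                        {\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]} \exp\!\big(\mathrm{sim}(z_i, z_k)/\tau\big)}
```

which is exactly a softmax cross-entropy: row i of the 2N x 2N cosine-similarity matrix (with the diagonal masked out) acts as the logits, and the index of the positive pair j acts as the target class. The vectorized implementations compute one masked softmax cross-entropy over all 2N rows at once, which is why they match the explicit double loop.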
As a first attempt, I have converted the numpy functions to their TF equivalents (tf.concat, tf.reshape, tf.math.exp, tf.range, etc.), but I believe my only/main problem is that sklearn's cosine_similarity function returns a numpy array, and I do not know how to build that function myself in TensorFlow. Any ideas?
I managed to figure it out myself! I had not realised there was a TensorFlow implementation of the cosine similarity function, tf.keras.losses.CosineSimilarity.

Here is my code:
import tensorflow as tf


# Define the contrastive loss function, NT-Xent (TensorFlow version)
def NT_Xent_tf(zi, zj, tau=1):
    """ Calculates the contrastive loss of the input data using NT-Xent. The
    equation can be found in the paper: https://arxiv.org/pdf/2002.05709.pdf
    (This is the TensorFlow implementation of the standard numpy version found
    in the NT_Xent function).

    Args:
        zi: One half of the input data, shape = (batch_size, feature_1, feature_2, ..., feature_N)
        zj: Other half of the input data, must have the same shape as zi
        tau: Temperature parameter (a constant), default = 1.

    Returns:
        loss: The complete NT-Xent contrastive loss
    """
    z = tf.cast(tf.concat((zi, zj), 0), dtype=tf.float32)

    loss = 0
    for k in range(zi.shape[0]):
        # Numerator (compare i,j & j,i)
        i = k
        j = k + zi.shape[0]
        # Instantiate the cosine similarity loss function
        cosine_sim = tf.keras.losses.CosineSimilarity(axis=-1, reduction=tf.keras.losses.Reduction.NONE)
        sim = tf.squeeze(- cosine_sim(tf.reshape(z[i], (1, -1)), tf.reshape(z[j], (1, -1))))
        numerator = tf.math.exp(sim / tau)

        # Denominator (compare i & j to all samples apart from themselves)
        sim_ik = - cosine_sim(tf.reshape(z[i], (1, -1)), z[tf.range(z.shape[0]) != i])
        sim_jk = - cosine_sim(tf.reshape(z[j], (1, -1)), z[tf.range(z.shape[0]) != j])
        denominator_ik = tf.reduce_sum(tf.math.exp(sim_ik / tau))
        denominator_jk = tf.reduce_sum(tf.math.exp(sim_jk / tau))

        # Calculate individual and combined losses
        loss_ij = - tf.math.log(numerator / denominator_ik)
        loss_ji = - tf.math.log(numerator / denominator_jk)
        loss += loss_ij + loss_ji

    # Divide by the total number of samples
    loss /= z.shape[0]
    return loss
As you can see, I have essentially just swapped the numpy functions for their TF equivalents. One main point of note is that I had to use reduction=tf.keras.losses.Reduction.NONE in the cosine_sim function; this keeps the shapes consistent in sim_ik and sim_jk, because otherwise the resulting loss did not match my original numpy implementation.
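A related caveat on the API: tf.keras.losses.CosineSimilarity is defined as a loss, so it returns the negative of the plain cosine similarity, which is why every cosine_sim(...) call above carries a leading minus sign. A minimal numpy sketch of the plain similarity (not using TF itself) makes the sign convention concrete:

```python
import numpy as np

def plain_cosine_similarity(a, b):
    # Plain cosine similarity between two vectors, as returned by
    # sklearn's cosine_similarity in the numpy version above
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])  # parallel to a, so the plain similarity is 1

# tf.keras.losses.CosineSimilarity(axis=-1)(a, b) would return roughly -1.0
# here, because Keras negates the similarity to turn it into a loss; hence
# the negation in NT_Xent_tf.
print(plain_cosine_similarity(a, b))  # → 1.0
```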
I also noticed that calculating the numerator for i,j and j,i separately was redundant, as the answers are the same, so I have removed one instance of that calculation.
Of course, if anyone has a faster implementation, I am more than happy to hear it!
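In case it helps, here is one way the loop could be vectorized (a numpy sketch following the same maths as the loop version above; NT_Xent_vectorized is my own name, and the same pattern should carry over to TF via tf.linalg.matmul, tf.linalg.set_diag and tf.reduce_logsumexp). Row-normalizing first turns the whole 2N x 2N cosine-similarity matrix into a single matrix product, and masking the diagonal gives all denominators at once:

```python
import numpy as np

def NT_Xent_vectorized(zi, zj, tau=1.0):
    # Stack both halves: rows 0..N-1 are zi, rows N..2N-1 are zj
    z = np.concatenate((zi, zj), 0)
    n = z.shape[0]
    half = n // 2

    # Row-normalize so cosine similarity becomes a plain dot product
    z_norm = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z_norm @ z_norm.T                # (2N, 2N) cosine similarities

    exp_sim = np.exp(sim / tau)
    np.fill_diagonal(exp_sim, 0.0)         # exclude self-similarity (k != i)
    denominators = exp_sim.sum(axis=1)     # one denominator per row

    # Row i's positive pair sits at i + N (or i - N for the second half)
    pos = np.concatenate((np.arange(half, n), np.arange(half)))
    numerators = exp_sim[np.arange(n), pos]

    # Mean of -log(numerator / denominator) over all 2N rows equals the
    # loop's sum of loss_ij + loss_ji divided by 2N
    return -np.mean(np.log(numerators / denominators))
```

For inputs with more than one feature dimension, flatten to (batch_size, features) first; the result agrees with the loop version up to floating-point rounding.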