From Machine Learning

Loss functions in Instance Representation Learning [R]

In Wu et al., the MLE objective is computationally infeasible because of the large number of images in the dataset.

Non-parametric Softmax

Negative Log-Likelihood

When n (the number of images) is large, the denominator in (2) is expensive to compute. Therefore, they use NCE (Noise-Contrastive Estimation).
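To make the cost concrete, here is a minimal NumPy sketch of the non-parametric softmax as I understand it, assuming (2) has the form P(i | v) = exp(v_i·v/τ) / Σ_j exp(v_j·v/τ) with the sum running over all n stored instance features. The memory bank, the sizes, the temperature, and the function name are placeholder choices of mine, not the paper's code.

```python
import numpy as np

def nonparam_softmax_prob(memory, v, i, tau=0.07):
    """P(i | v) when every stored instance feature acts as its own class.

    memory: (n, d) array of L2-normalised instance features (hypothetical bank)
    v:      (d,) query feature
    """
    logits = memory @ v / tau      # n dot products: this is the expensive part
    logits -= logits.max()         # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights[i] / weights.sum()   # the denominator sums over all n instances

rng = np.random.default_rng(0)
n, d = 10_000, 128                 # made-up sizes for illustration
memory = rng.normal(size=(n, d))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

i = 42
v = memory[i]                      # query with the instance's own stored feature
p = nonparam_softmax_prob(memory, v, i)
nll = -np.log(p)                   # one term of the negative log-likelihood
print(p, nll)
```

Every evaluation touches all n rows of the bank, so a full pass over the dataset costs on the order of n² dot products, which is what makes the exact objective impractical for large n.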

The NCE Objective

Essentially, they approximate the difficult loss in (3) with the easier-to-compute loss in (7). However, we end up estimating the denominator anyway in (8). Why not just approximate the denominator in (2) with (8)?

I asked Claude about this and it said something about it being a biased estimator, but I didn't really get that. I'm also a little confused about the connection between the original NCE formulation, which is a way to estimate a density, and the way it is used here. Do we do this because the NCE loss is easier to compute and, as m (the number of noise samples) increases, the gradients of the NCE loss approach the gradients of the NLL loss?
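One possible reading of the "biased estimator" remark, sketched numerically: assume (8) is a Monte Carlo estimate of the form Z_hat = (n/m) Σ_k exp(v_{j_k}·f/τ) with the j_k drawn uniformly. Z_hat itself is unbiased for Z, but the log-likelihood needs log Z, and by Jensen's inequality E[log Z_hat] ≤ log E[Z_hat] = log Z, so plugging the estimate into (2) gives a biased loss. All sizes, the temperature, and the random features below are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, tau = 5_000, 64, 16, 0.1  # hypothetical sizes and temperature

# Hypothetical memory bank of L2-normalised features and a random query f.
memory = rng.normal(size=(n, d))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)
f = rng.normal(size=d)
f /= np.linalg.norm(f)

scores = np.exp(memory @ f / tau)  # exp(v_j . f / tau) for every j
Z = scores.sum()                   # exact normaliser, needs all n terms

# Repeat the m-sample estimate many times to look at its average behaviour.
trials = 20_000
idx = rng.integers(0, n, size=(trials, m))       # uniform sampling with replacement
Z_hat = (n / m) * scores[idx].sum(axis=1)        # one estimate per trial

mean_Z_hat = Z_hat.mean()                # close to Z: the estimate is unbiased
mean_log_Z_hat = np.log(Z_hat).mean()    # below log Z: its log is biased

print("Z:", Z, " mean Z_hat:", mean_Z_hat)
print("log Z:", np.log(Z), " mean log Z_hat:", mean_log_Z_hat)
```

The gap between log Z and the average of log Z_hat shrinks as m grows, but for small m it is systematic rather than noise. This only illustrates the plug-in bias; it does not by itself show how the NCE gradients relate to the NLL gradients.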

submitted by /u/No_Balance_9777

