Hi, I'm Litan. I'm interested in probabilistic machine learning
and ML infrastructure.
Below you'll find posts related to things I'm currently
learning.
In this post we take a look at the policy gradient theorem and derive
the REINFORCE algorithm, which underpins modern RL post-training
techniques including PPO and GRPO.
This post introduces the basic GPU architecture concepts needed for
writing CUDA kernels: physical hardware organization,
logical organization (the CUDA compute model), and the GPU memory
hierarchy.
Structure tensors are used in computer vision to measure the local
similarity of gradient directions within a 2D image or 3D volume.
These matrices are typically summarized into a scalar metric
called the local coherence. Here I show how coherence is closely
tied to pixel-wise principal component analysis.
Previously, we derived the evidence lower bound (ELBO). The
original variational autoencoder paper introduced a
differentiable and unbiased estimator for the ELBO, and extended
the idea of simple parametric forms to complex ones via
feed-forward neural nets. Here we walk through
how the VAE model architecture is defined and trained.
The KL divergence of distribution $p$ from $q$ is a measure of
dissimilarity, defined as the expectation under $p$ of the log difference $\log p - \log q$.
It's used a lot in Bayesian inference (specifically, the divergence of a posterior $p$
from some prior $q$), so we list some properties and closed-form equations here.
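
As a quick illustration of that definition, here is a minimal sketch for the discrete case, assuming $p$ and $q$ are given as lists of probabilities over the same outcomes and using the natural log (so the result is in nats). The function name `kl_divergence` and the example distributions are made up for illustration; they are not taken from the post.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence of p from q, i.e. E_p[log p - log q], in nats.

    Hypothetical helper: p and q are sequences of probabilities over the
    same outcomes, each assumed to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # Outcomes outside the support of p contribute nothing.
            continue
        if q_i == 0.0:
            # p puts mass where q has none, so the divergence is infinite.
            return math.inf
        total += p_i * (math.log(p_i) - math.log(q_i))
    return total


# Illustrative distributions (assumed, not from the post).
p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # divergence of p from q
print(kl_divergence(q, p))  # swapping the arguments gives a different value
```

Running this shows two of the usual properties: the divergence is zero when both arguments are the same distribution, and it is not symmetric in $p$ and $q$.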