Here we'll make some remarks about latent spaces and derive the evidence lower bound
(ELBO).
In variational Bayesian inference, maximizing the ELBO serves as a tractable proxy for learning a model of the true posterior over latent variables.
The framework assumes that in the real world, an observation $x$ is generated by some underlying set of latent or "hidden" variables $z$: the latents are the true representation of the object, whereas $x$ collects the measurable features. Latent variable models make this assumption about the world, and one can maximize the ELBO as part of an algorithm for generating novel data points.
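A minimal sketch of where the bound comes from, assuming the standard setup: a generative model $p_\theta(x, z)$ over observations and latents, and a variational distribution $q_\phi(z \mid x)$ approximating the intractable posterior (the parameters $\theta$ and $\phi$ are conventional notation, not symbols introduced above). Writing the marginal likelihood as an expectation under $q_\phi$,

$$
\log p_\theta(x) = \log \mathbb{E}_{q_\phi(z \mid x)}\!\left[\frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right] \geq \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x, z)}{q_\phi(z \mid x)}\right],
$$

where the inequality is Jensen's, applied to the concave $\log$; the right-hand side is the ELBO. The gap between the two sides is exactly $D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)$, which is why maximizing the ELBO over $\phi$ pulls $q_\phi$ toward the true posterior: that is the "proxy" sense above.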
Luo gave an analogy between Plato's Allegory of the Cave and latent spaces. In the allegory, cavemen see each other as shadows cast by campfires. These shadows have measurable features in the cavemen's observable world (the shadow's height and width, the shape of the ears, and so forth), while a set of latents represents a complete description (temperament, language, caloric intake, etc.). The latents are a rich representation; the observations are a compressed but measurable manifestation of the cavemen. From the observer's point of view, many aspects of the real objects are "lost in translation".
On the other hand, the latent space can be of lower dimension than the data, in which case the latents represent a compressed version of the measured data. Whether such a compression is good or bad depends on how accurately one can reconstruct the inputs from their latents.
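A minimal sketch of this compress-and-reconstruct setup, written as a plain undercomplete autoencoder in PyTorch (an assumed dependency here; the sizes 784 and 32 are illustrative, not taken from the text):

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim: int = 784, latent_dim: int = 32):
        super().__init__()
        # The encoder compresses each observation into a lower-dimensional latent.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # The decoder attempts to reconstruct the observation from its latent.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(64, 784)  # a stand-in batch of observations
x_hat = model(x)
# The reconstruction error is the yardstick for how much the compression
# loses; training minimizes it.
loss = nn.functional.mse_loss(x_hat, x)
```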
Since learning a higher-dimensional latent space requires priors with strong opinions about the structure of the latent variable distribution (posterior regularization), most generative algorithms specify a latent space of equal or lower dimension.
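To make "strong opinions" concrete, one common choice of such a prior (my example, not one the text names) is a sparsity penalty: even in an overcomplete latent space, each input should activate only a few latent coordinates. Continuing the sketch above:

```python
# An L1 penalty on the latent activations regularizes the latent code:
# it encodes the prior that each input uses only a few latent
# coordinates. The latent size 1024 and weight 1e-3 are illustrative.
overcomplete_model = Autoencoder(input_dim=784, latent_dim=1024)
z = overcomplete_model.encoder(x)
x_hat = overcomplete_model.decoder(z)
loss = nn.functional.mse_loss(x_hat, x) + 1e-3 * z.abs().mean()
```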
Conversely, if the data is highly complex or contains intricate patterns, the optimal latent dimension may be higher, so that the latents can meaningfully capture the underlying complexity. In a previous project, we found cases of overcomplete autoencoders outperforming their undercomplete counterparts on an anomaly detection task. In practice, the optimal dimensionality probably depends on your particular task and dataset.
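For concreteness, reconstruction error typically becomes the anomaly score in that kind of setup. A hypothetical illustration, continuing the sketches above (the 3-sigma cutoff is an arbitrary choice, not a detail from that project):

```python
@torch.no_grad()
def anomaly_scores(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Per-example mean squared reconstruction error: inputs the trained
    # model reconstructs poorly score high.
    x_hat = model(x)
    return ((x_hat - x) ** 2).mean(dim=1)

scores = anomaly_scores(overcomplete_model, x)
cutoff = scores.mean() + 3 * scores.std()  # illustrative threshold
flags = scores > cutoff                    # boolean anomaly mask
```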