Information Geometry: Geometrization of Information and Statistical Inference
Jun Zhang
University of Michigan, Ann Arbor
Abstract:
Information Geometry is the differential-geometric study of the manifold of probability models, and promises to be a unifying geometric framework for investigating statistical inference, information theory, machine learning, etc. Instead of using a metric to measure distances on such manifolds, these applications often use “divergence functions” to measure the proximity of two points; divergence functions need not be symmetric or satisfy the triangle inequality. Examples include the Kullback-Leibler divergence, Bregman divergences, and f-divergences. Divergence functions are tied to generalized entropy functions (for instance, Tsallis entropy, Rényi entropy, phi-entropy) and to the cross-entropy functions widely used in machine learning and the information sciences. It turns out that divergence functions enjoy pleasant geometric properties: they induce what is called a “statistical structure” on a manifold M, namely a Riemannian metric g together with a pair of affine connections D, D*, such that D and D* are both Codazzi coupled to g while being conjugate to each other. Divergence functions also induce a natural symplectic structure on the product manifold M × M, for which M with its statistical structure is a Lagrangian submanifold. In joint work with M. Leok, we have shown how divergence functions allow us to decouple Hamiltonian and Lagrangian dynamics in geometric mechanics. We recently characterized (para-)holomorphicity of D, D* in the (para-)Hermitian setting, and showed that statistical structures can be enhanced to (para-)Hermitian and (para-)Kähler manifolds. The surprisingly rich geometric structures and properties of a statistical manifold open up the intriguing possibility of geometrizing statistical inference, information, and machine learning in string-theoretic language.
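As a small numerical illustration of two statements in the abstract, the sketch below computes the Kullback-Leibler divergence between two discrete distributions in both directions to show that it is not symmetric. It then checks that, for a nearby point, the divergence is approximately one half of the Fisher information quadratic form, which is the standard sense in which a divergence function induces a Riemannian metric. The distributions p and q, the perturbation direction, the step size eps, and the helper names are illustrative assumptions and do not come from the talk.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with strictly positive entries in q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def fisher_quadratic_form(p, dp):
    """Fisher information quadratic form sum_i dp_i^2 / p_i at the point p."""
    return sum(d * d / pi for d, pi in zip(dp, p))


# Two hypothetical distributions on three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

# Asymmetry: swapping the arguments changes the value.
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)

# Local behaviour: move from p along a tangent direction whose entries sum to
# zero, so that the perturbed point is still a probability distribution.
eps = 1e-3
direction = [1.0, -0.5, -0.5]
p_eps = [pi + eps * d for pi, d in zip(p, direction)]

kl_local = kl_divergence(p, p_eps)
second_order = 0.5 * eps**2 * fisher_quadratic_form(p, direction)

print("KL(p||q) =", kl_pq)
print("KL(q||p) =", kl_qp)
print("KL(p||p_eps) =", kl_local)
print("0.5 * eps^2 * Fisher form =", second_order)
```

The two directed divergences differ, while the last two printed values agree to leading order in eps. The third-order term of the same expansion is what encodes the pair of conjugate connections D, D*; that part is not computed here.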