Thursday, March 29, 2012 - 15:05
1 hour (actually 50 minutes)
The study of prediction within the realm of Statistical Learning Theory is intertwined with the study of the supremum of an empirical process. The supremum can be analyzed with classical tools:Vapnik-Chervonenkis and scale-sensitive combinatorial dimensions, covering and packing numbers, and Rademacher averages. Consistency of empirical risk minimization is known to be closely related to theuniform Law of Large Numbers for function classes.In contrast to the i.i.d. scenario, in the sequential prediction framework we are faced with an individual sequence of data on which weplace no probabilistic assumptions. The problem of universal prediction of such deterministic sequences has been studied withinStatistics, Information Theory, Game Theory, and Computer Science. However, general tools for analysis have been lacking, and mostresults have been obtained on a case-by-case basis.In this talk, we show that the study of sequential prediction is closely related to the study of the supremum of a certain dyadic martingale process on trees. We develop analogues of the Rademacher complexity, covering numbers and scale-sensitive dimensions, which canbe seen as temporal generalizations of the classical results. The complexities we define also ensure uniform convergence for non-i.i.d. data, extending the Glivenko-Cantelli type results. Analogues of local Rademacher complexities can be employed for obtaining fast rates anddeveloping adaptive procedures. Our understanding of the inherent complexity of sequential prediction is complemented by a recipe that can be used for developing new algorithms.