Friday, January 16, 2009 - 13:00
1 hour (actually 50 minutes)
It has been shown, both theoretically and empirically, that a reliable result can be obtained only when the capacity of the class of models from which we choose stands in a certain relation to the size of the training set. There are different ways to measure the capacity of a class of models. In practice, the size of a training set is always finite and limited. This suggests choosing a model from the narrowest class, or in other words using the simplest model (Occam's razor). But if the class is too narrow, it may contain neither the true model nor any model close to it. In that case there will be a larger residual error, or a larger number of errors, even on the training set. Thus the problem of choosing model complexity arises: to find a balance between errors due to the limited amount of training data and errors due to excessive model simplicity. I shall review different approaches to the problem.
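
As a rough illustration of the trade-off described above (not taken from the talk itself), the following Python sketch fits piecewise-constant models of increasing capacity to a small noisy sample and compares the error on the training set with the error on a large held-out set. Everything specific here is an assumption made for the example: the sine target, the noise level, the sample sizes, the use of the number of bins as a stand-in for capacity, and the hypothetical helper names fit_histogram and complexity_sweep.

```python
import math
import random


def fit_histogram(xs, ys, bins):
    """Fit a piecewise-constant model on [0, 1) with `bins` equal-width cells."""
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # Cells with no training points fall back to the global training mean.
    return [s / c if c else overall for s, c in zip(sums, counts)]


def predict(model, x):
    bins = len(model)
    return model[min(int(x * bins), bins - 1)]


def mse(model, xs, ys):
    """Mean squared error of the model on the given sample."""
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sample(n, rng, noise=0.3):
    """Draw n points from an assumed target: sin(2*pi*x) plus Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def complexity_sweep(n_train=30, n_test=2000,
                     capacities=(1, 2, 4, 8, 16, 32, 64), seed=0):
    """Return (bins, training MSE, held-out MSE) for each capacity."""
    rng = random.Random(seed)
    train = sample(n_train, rng)
    test = sample(n_test, rng)
    rows = []
    for k in capacities:
        model = fit_histogram(*train, k)
        rows.append((k, mse(model, *train), mse(model, *test)))
    return rows


if __name__ == "__main__":
    for k, tr, te in complexity_sweep():
        print(f"bins={k:3d}  train_mse={tr:.3f}  test_mse={te:.3f}")
```

Because each capacity in the default sweep refines the previous partition, the training error cannot increase as the number of bins grows. The held-out error would typically be expected to fall at first and then rise again once most cells contain too few training points, which is the balance the abstract refers to.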