- You are here:
- GT Home
- Home
- News & Events

Series: Stochastics Seminar

Motivated by the ubiquity of signal-plus-noise type models in high-dimensional statistical signal processing and machine learning, we consider the eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Applications in mind are as diverse as radar, sonar, wireless communications, spectral clustering, bio-informatics and Gaussian mixture cluster analysis in machine learning. We provide an application-independent approach that brings into sharp focus a fundamental informational limit of high-dimensional eigen-analysis. Building on this success, we highlight the random matrix origin of this informational limit, the connection with "free" harmonic analysis and discuss how to exploit these insights to improve low-rank signal matrix denoising relative to the truncated SVD.

Series: Stochastics Seminar

We study the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in p-dimensional spaces as the number of points n goes to infinity, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that ``all high-dimensional random vectors are almost always nearly orthogonal to each other". Applications to statistics and connections with some open problems in physics and mathematics are also discussed. This is a joint work with Tony Cai and Jianqing Fan.

Series: Stochastics Seminar

A highlight in the study of quantum physics was the work of Knizhnik, Polyakov and Zamolodchikov (1988), in which they proposed a relation (KPZ relation) between the scaling dimension of a statistical physics model in Euclidean geometry and its counterpart in the random geometry. More recently, Duplantier and Sheffield (2011) used the 2-dim Gaussian free field to construct the Liouville quantum gravity measure on a planar domain, and gave the first mathematically rigorous formulation and proof of the KPZ relation in that setting. Inspired by the work of Duplantier and Sheffield, we apply a similar approach to extend their results and techniques to higher even dimensions R^(2n) for n>=2. This talk mainly focuses on the case of R^4. I will briefly introduce the notion of Gaussian free field (GFF). In our work we adopt a specific 4-dim GFF to construct a random Borel measure on R^4 which formally has the density (with respect to the Lebesgue measure) being the exponential of an instance of the GFF. Further we establish a 4-dim KPZ relation corresponding to this random measure. This work is joint with Dmitry Jakobson (McGill University).

Series: Stochastics Seminar

The limiting spectral distributions of many sparse random graph models are known to contain atoms. But do they also have some continuous part? In this talk, I will give affirmative answer to this question for several widely studied models of random graphs including Erdos-Renyi random graph G(n,c/n) with c > 1, random graphs with certain degree distributions and supercritical bond percolation on Z^2. I will also present several open problems. This is joint work with Charles Bordenave and Balint Virag.

Series: Stochastics Seminar

The 2-core of a hypergraph is the unique subgraph where all vertices have
degree at least 2 and which is the maximal induced subgraph with this
property. This talk will be about the investigation of the 2-core for a
particular random hypergraph model --- a model which differs from the usual
random uniform hypergraph in that the vertex degrees are not identically
distributed.
For this model the main result proved is that as the size of the vertex
set, n, tends to infinity then the number of hyperedges in the 2-core obeys
a limit law, and this limit exhibits a threshold where the number of
hyperedges in the 2-core transitions from o(n) to Theta(n). We will
discuss aspects of the ideas involved and discuss the background motivation
for the hypergraph model: factoring random integers into primes.

Series: Stochastics Seminar

Random k-SAT is a distribution over boolean formulas studied widely in both statistical physics and theoretical computer science for its intriguing behavior at its phase transition. I will present results on the satisfiability threshold in a geometric model of random k-SAT: labeled boolean literals are placed uniformly at random in a d-dimensional cube, and for each set of k contained in a ball of radius r, a k-clause is added to the random formula. Unlike standard random k-SAT, this model exhibits dependence between the clauses. For all k we show that the satisfiability threshold is sharp, and for k=2 we find the location of the threshold as well. I will also discuss connections between this model, the random geometric graph, and other probabilistic models. This is based on joint work with Milan Bradonjic.

Series: Stochastics Seminar

The problem of finding large average submatrices of a real-valued matrix arises in the exploratory analysis of data from disciplines as diverse as genomics and social sciences. Motivated in part by previous work on this applied problem, this talk will present several new theoretical results concerning large average submatrices of an n x n Gaussian random matrix. We will begin by considering the average and joint distribution of the k x k submatrix having largest average value (the global maximum). We then turn our attention to submatrices with dominant row and column sums, which arise as the local maxima of a practical iterative search procedure for large average submatrices I will present a result characterizing the value and joint distribution of a local maximum, and show that a typical local maxima has an average value within a constant factor of the global maximum. In the last part of the talk I will describe several results concerning the *number* L_n(k) of k x k local maxima, including the asymptotic behavior of its mean and variance for fixed k and increasing n, and a central limit theorem for L_n(k) that is based on Stein's method for normal approximation.
Joint work with Shankar Bhamidi (UNC) and Partha S. Dey (UIUC)

Series: Stochastics Seminar

We consider optimal alignments of random sequences of length n which are i.i.d. For
such alignments we count which letters get aligned with
which letters how often. This gives as for every opitmal alignment
the frequency of the aligned letter pairs. These
frequencies expressed as relative frequencies and put
in vector form are called the "empirical distribution of letter pairs
along an optimal alignment". It was previously established
that if the scoring function is chosen at random,
then the empirical distribution of letter pairs along an opitmal
alignment converges. We show an upper bound for the rate of convergence
which is larger thatn the rate of the alignement score.
the rate of the alignemnt score can be obtained directly
by Azuma-Hoeffding, but not so for the empirical distribution of the aligned letter
pairs seen along an opitmal alignment:
which changing on letter in one of the sequences,
the optimal alginemnt score changes by at most a fixed quantity,
but the empirical distribution of the aligned letter pairs
potentially could change entirely.

Series: Stochastics Seminar

We analyze active learning algorithms, which only receive the classifications of examples when they ask for them, and traditional passive (PAC) learning algorithms, which receive classifications for all training examples, under log-concave and nearly log-concave distributions. By using an aggressive localization argument, we prove that active learning provides an exponential improvement over passive learning when learning homogeneous linear separators in these settings. Building on this, we then provide a computationally efficient algorithm with optimal sample complexity for passive learning in such settings. This provides the first bound for a polynomial-time algorithm that is tight for an interesting infinite class of hypothesis functions under a general class of data-distributions, and also characterizes the distribution-specific sample complexity for each distribution in the class. We also illustrate the power of localization for efficiently learning linear separators in two challenging noise models (malicious noise and agnostic setting) where we provide efficient algorithms with significantly better noise tolerance than previously known.

Series: Stochastics Seminar

We consider two approaches to address angles between random subspaces: classical random matrix theory and free probability. In the former, one constructs random subspaces from vectors with independent random entries. In the latter, one has historically started with the uniform distribution on subspaces of appropriate dimension. We point out when these two approaches coincide and present new results for both. In particular, we present the first universality result for the random matrix theory approach and present the first result beyond uniform distribution for the free probability approach. We further show that, unexpectedly, discrete uncertainty principles play a natural role in this setting. This work is partially with L. Erdos and G. Anderson.