MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets

Stanford University
June 15–18, 2010

Synopsis

The Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010) addressed algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.

Schedule and Slides:

Tuesday, June 15, 2010. Theme: Large-scale Data and Large-scale Computation

Time Talk
8:00 - 10:00 Breakfast and Registration -- outside Cubberley Auditorium (at the Stanford School of Education, just off the Main Quad)
9:45 - 10:00 Welcome and Opening Remarks -- in Cubberley Auditorium
10:00 - 11:00 Tutorial: Peter Norvig
Internet-Scale Data Analysis
11:00 - 11:30 Ashok Srivastava
Virtual Sensors and Large-Scale Gaussian Processes
11:30 - 12:00 John Langford
A Method for Parallel Online Learning
2:00 - 3:00 Tutorial: John Gilbert
Combinatorial Scientific Computing: Experience and Challenges
3:00 - 3:30 Deepak Agarwal
Recommender Problems for Content Optimization
3:30 - 4:00 James Demmel
Minimizing Communication in Linear Algebra
4:30 - 5:00 Dmitri Krioukov
Hyperbolic Mapping of Complex Networks
5:00 - 5:30 Mehryar Mohri
Matrix Approximation for Large-Scale Learning
5:30 - 6:00 David Bader
Massive-Scale Analytics of Streaming Social Networks
6:00 - 6:30 Ely Porat
Fast Pseudo-Random Fingerprints

Wednesday, June 16, 2010. Theme: Networked Data and Algorithmic Tools

Time Talk
9:00 - 10:00 Tutorial: Peter Bickel
Statistical Inference for Networks
10:00 - 10:30 Jure Leskovec
Inferring Networks of Diffusion and Influence
11:00 - 11:30 Michael W. Mahoney
Geometric Network Analysis Tools
11:30 - 12:00 Edward Chang
AdHeat - A New Influence-based Social Ads Model and its Tera-Scale Algorithms
12:00 - 12:30 Mauro Maggioni
Intrinsic Dimensionality Estimation and Multiscale Geometry of Data Sets
2:30 - 3:00 Guillermo Sapiro
Collaborative Hierarchical Sparse Models
3:00 - 3:30 Alekh Agarwal and Peter Bartlett
Information-theoretic Lower Bounds on the Oracle Complexity of Convex Optimization
3:30 - 4:00 John Duchi and Yoram Singer
Composite Objective Optimization and Learning for Massive Datasets
4:30 - 5:00 Steven Hillion
MAD Analytics in Practice
5:00 - 5:30 Matthew Harding
Outlier Detection in Financial Trading Networks
5:30 - 6:00 Neel Sundaresan
Large Dataset Problems at the Long Tail

Thursday, June 17, 2010. Theme: Spectral Methods and Sparse Matrix Methods

Time Talk
9:00 - 10:00 Tutorial: Sebastiano Vigna
Spectral Ranking
10:00 - 10:30 Robert Stine
Streaming Feature Selection
11:00 - 11:30 Konstantin Mischaikow
A Combinatorial Framework for Nonlinear Dynamics
11:30 - 12:00 Alfred Hero
Sparse Correlation Screening in High Dimension
12:00 - 12:30 Susan Holmes
Heterogeneous Data Challenge: Combining Complex Data
2:30 - 3:30 Tutorial: Piotr Indyk
Sparse Recovery Using Sparse Matrices
3:30 - 4:00 Sayan Mukherjee
Efficient Dimension Reduction on Massive Data
4:30 - 5:00 Padhraic Smyth
Statistical Modeling of Large-Scale Sensor Count Data
5:00 - 5:30 Ping Li
Compressed Counting and Application in Estimating Entropy of Data Streams
5:30 - 6:00 Edo Liberty
Scalable Correlation Clustering Algorithms

Friday, June 18, 2010. Theme: Randomized Algorithms for Data

Time Talk
9:00 - 10:00 Tutorial: Petros Drineas
Randomized Algorithms in Linear Algebra and Large Data Applications
10:00 - 10:30 Gunnar Martinsson
Randomized Methods for Computing the SVD/PCA of Very Large Matrices
11:00 - 11:30 Ilse Ipsen
Numerical Reliability of Randomized Algorithms
11:30 - 12:00 Philippe Rigollet
Optimal Rates of Sparse Estimation and Universal Aggregation
12:00 - 12:30 Alexandre d'Aspremont
Subsampling, Spectral Methods & Semidefinite Programming
2:30 - 3:00 Gary Miller
Specialized System Solvers for Very Large Systems: Theory and Practice
3:00 - 3:30 John Wright and Emmanuel Candes
Robust Principal Component Analysis?
3:30 - 4:00 Alon Orlitsky
Estimation, Prediction, and Classification over Large Alphabets
4:30 - 5:00 Ken Clarkson
Numerical Linear Algebra in the Streaming Model
5:00 - 5:30 David Woodruff
Fast Lp Regression in Data Streams

MMDS 2010 Organizers

Organizing Committee:
Michael Mahoney (chair), Alex Shkolnik, Petros Drineas, Lek-Heng Lim, Gunnar Carlsson

Sponsors

The MMDS 2010 Organizers and the MMDS Foundation would like to thank the following institutional sponsors for their generous support:

AFOSR, LBNL, National Science Foundation, ONR, Stanford

Past MMDS Events

MMDS 2008. Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 25–28, 2008.

MMDS 2006. Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 21–24, 2006.