MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets

Stanford University
June 15–18, 2010

Synopsis

The Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010) addressed algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.

The organizers thank the 216 participants and 40 speakers for their time and interest.

The schedule and talk/poster abstracts can be found in the 2010 program; see also the original conference web page.

Blogs about the event: Revolution Analytics, Big Data News, Nuit Blanche.

Schedule and Slides:

Tuesday, June 15, 2010. Theme: Large-scale Data and Large-scale Computation

Time	Talk
8:00 - 10:00	Breakfast and Registration -- outside Cubberley Auditorium (at the Stanford School of Education, just off the Main Quad)
9:45 - 10:00	Welcome and Opening Remarks -- in Cubberley Auditorium
10:00 - 11:00	Tutorial: Peter Norvig Internet-Scale Data Analysis
11:00 - 11:30	Ashok Srivastava Virtual Sensors and Large-Scale Gaussian Processes
11:30 - 12:00	John Langford A Method for Parallel Online Learning
2:00 - 3:00	Tutorial: John Gilbert Combinatorial Scientific Computing: Experience and Challenges
3:00 - 3:30	Deepak Agarwal Recommender Problems for Content Optimization
3:30 - 4:00	James Demmel Minimizing Communication in Linear Algebra
4:30 - 5:00	Dmitri Krioukov Hyperbolic Mapping of Complex Networks
5:00 - 5:30	Mehryar Mohri Matrix Approximation for Large-Scale Learning
5:30 - 6:00	David Bader Massive-Scale Analytics of Streaming Social Networks
6:00 - 6:30	Ely Porat Fast Pseudo-Random Fingerprints

Wednesday, June 16, 2010. Theme: Networked Data and Algorithmic Tools

Time	Talk
9:00 - 10:00	Tutorial: Peter Bickel Statistical Inference for Networks
10:00 - 10:30	Jure Leskovec Inferring Networks of Diffusion and Influence
11:00 - 11:30	Michael W. Mahoney Geometric Network Analysis Tools
11:30 - 12:00	Edward Chang AdHEat - A New Influence-based Social Ads Model and its Tera-Scale Algorithms
12:00 - 12:30	Mauro Maggioni Intrinsic Dimensionality Estimation and Multiscale Geometry of Data Sets
2:30 - 3:00	Guillermo Sapiro Collaborative Hierarchical Sparse Models
3:00 - 3:30	Alekh Agarwal and Peter Bartlett Information-theoretic Lower Bounds on the Oracle Complexity of Convex Optimization
3:30 - 4:00	John Duchi and Yoram Singer Composite Objective Optimization and Learning for Massive Datasets
4:30 - 5:00	Steven Hillion MAD Analytics in Practice
5:00 - 5:30	Matthew Harding Outlier Detection in Financial Trading Networks
5:30 - 6:00	Neel Sundaresan Large Dataset Problems at the Long Tail

Thursday, June 17, 2010. Theme: Spectral Methods and Sparse Matrix Methods

Time	Talk
9:00 - 10:00	Tutorial: Sebastiano Vigna Spectral Ranking
10:00 - 10:30	Robert Stine Streaming Feature Selection
11:00 - 11:30	Konstantin Mischaikow A Combinatorial Framework for Nonlinear Dynamics
11:30 - 12:00	Alfred Hero Sparse Correlation Screening in High Dimension
12:00 - 12:30	Susan Holmes Heterogeneous Data Challenge: Combining Complex Data
2:30 - 3:30	Tutorial: Piotr Indyk Sparse Recovery Using Sparse Matrices
3:30 - 4:00	Sayan Mukherjee Efficient Dimension Reduction on Massive Data
4:30 - 5:00	Padhraic Smyth Statistical Modeling of Large-Scale Sensor Count Data
5:00 - 5:30	Ping Li Compressed Counting and Application in Estimating Entropy of Data Streams
5:30 - 6:00	Edo Liberty Scalable Correlation Clustering Algorithms

Friday, June 18, 2010. Theme: Randomized Algorithms for Data

Time	Talk
9:00 - 10:00	Tutorial: Petros Drineas Randomized Algorithms in Linear Algebra and Large Data Applications
10:00 - 10:30	Gunnar Martinsson Randomized Methods for Computing the SVD/PCA of Very Large Matrices
11:00 - 11:30	Ilse Ipsen Numerical Reliability of Randomized Algorithms
11:30 - 12:00	Philippe Rigollet Optimal Rates of Sparse Estimation and Universal Aggregation
12:00 - 12:30	Alexandre d'Aspremont Subsampling, Spectral Methods & Semidefinite Programming
2:30 - 3:00	Gary Miller Specialized System Solvers for Very Large Systems: Theory and Practice
3:00 - 3:30	John Wright and Emmanuel Candes Robust Principal Component Analysis?
3:30 - 4:00	Alon Orlitsky Estimation, Prediction, and Classification over Large Alphabets
4:30 - 5:00	Ken Clarkson Numerical Linear Algebra in the Streaming Model
5:00 - 5:30	David Woodruff Fast Lp Regression in Data Streams

MMDS 2010 Confirmed Speakers

Alekh Agarwal	University of California, Berkeley
Deepak Agarwal	Yahoo! Research
Alexandre d'Aspremont	Princeton University
David Bader	Georgia Tech College of Computing
Peter Bickel	University of California, Berkeley
Emmanuel Candes	Stanford University
Edward Chang	Google Research
Ken Clarkson	IBM Almaden Research Center
Jim Demmel	University of California, Berkeley
John Duchi	University of California, Berkeley
John Gilbert	University of California, Santa Barbara
Matthew Harding	Stanford University
Alfred Hero	University of Michigan, Ann Arbor
Steven Hillion	Greenplum
Susan Holmes	Stanford University
Piotr Indyk	Massachusetts Institute of Technology
Ilse Ipsen	North Carolina State University
Dmitri Krioukov	Cooperative Association for Internet Data Analysis
John Langford	Yahoo! Research
Jure Leskovec	Stanford University
Ping Li	Cornell University
Edo Liberty	Yahoo! Research
Mauro Maggioni	Duke University
Gunnar Martinsson	University of Colorado, Boulder
Gary Miller	Carnegie Mellon University
Konstantin Mischaikow	Rutgers University
Mehryar Mohri	New York University
Sayan Mukherjee	Duke University
Peter Norvig	Google Research
Alon Orlitsky	University of California, San Diego
Ely Porat	Bar-Ilan University
Guillermo Sapiro	University of Minnesota
Padhraic Smyth	University of California, Irvine
Ashok Srivastava	National Aeronautics and Space Administration
Neel Sundaresan	eBay Research
Robert Stine	University of Pennsylvania
Sebastiano Vigna	Università Degli Studi Di Milano
David Woodruff	IBM Almaden Research Center
John Wright	Microsoft Research Asia

Cancelled with apologies

Peter Bartlett	University of California, Berkeley
Robert Calderbank	Princeton University
Fan Chung	University of California, San Diego
Yoram Singer	Google Research
Patrick Wolfe	Harvard University