MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets

The Workshops on Algorithms for Modern Massive Data Sets (MMDS) will address algorithmic and statistical challenges in modern large-scale data analysis. The goals of this series of workshops are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.

News

MMDS 2010 Wrap-up: The workshop concluded on June 18. We thank all participants for attending.

Preliminary Schedule and Last-Minute Information:
MMDS 2010 will start at 9:45 AM on Tuesday, June 15. For those attendees who are jet-lagged or are early risers, breakfast and registration will be available starting at 8:00 AM just outside of Cubberley Auditorium.
The preliminary schedule for MMDS 2010 is available below. All talks will be held in Cubberley Auditorium in the School of Education Building. You can find directions to the venue here.
There is a limited amount of funding available to help with travel and lodging reimbursements, in particular for junior researchers and presenters. Please contact the organizers as soon as possible if you are interested.

Registration is now open:
MMDS 2010 is now accepting registrations from corporate, government, academic, and student participants. The workshop runs June 15-18. Please register by June 10 to ensure participation.
You can find information about lodging and getting around Stanford here. More detailed information regarding the event location and schedule will be posted soon.

Poster Session:
The workshop will host a poster session which is open to all registrants. The registration fee will be waived for student poster presenters.
Please email a title and abstract to mmds-organizers@math.stanford.edu if you are interested.

Announcing:
MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets.
Sponsored by Stanford University and the MMDS Foundation.
To take place on the campus of Stanford University.
June 15–18, 2010.

Organizing Committee: Michael Mahoney (chair), Alex Shkolnik, Petros Drineas, Lek-Heng Lim, Gunnar Carlsson
More details, including the registration web page and information about the poster session, will be available here soon!

Preliminary Schedule:

Tuesday, June 15, 2010. Theme: Large-scale Data and Large-scale Computation

Time	Talk
8:00 - 10:00	Breakfast and Registration -- outside Cubberley Auditorium (at the Stanford School of Education, just off the Main Quad)
9:45 - 10:00	Welcome and Opening Remarks -- in Cubberley Auditorium
10:00 - 11:00	Tutorial: Peter Norvig Internet-Scale Data Analysis
11:00 - 11:30	Ashok Srivastava Virtual Sensors and Large-Scale Gaussian Processes
11:30 - 12:00	John Langford A Method for Parallel Online Learning
2:00 - 3:00	Tutorial: John Gilbert Combinatorial Scientific Computing: Experience and Challenges
3:00 - 3:30	Deepak Agarwal Recommender Problems for Content Optimization
3:30 - 4:00	James Demmel Minimizing Communication in Linear Algebra
4:30 - 5:00	Dmitri Krioukov Hyperbolic Mapping of Complex Networks
5:00 - 5:30	Mehryar Mohri Matrix Approximation for Large-Scale Learning
5:30 - 6:00	David Bader Massive-Scale Analytics of Streaming Social Networks
6:00 - 6:30	Ely Porat Fast Pseudo-Random Fingerprints

Wednesday, June 16, 2010. Theme: Networked Data and Algorithmic Tools

Time	Talk
9:00 - 10:00	Tutorial: Peter Bickel Statistical Inference for Networks
10:00 - 10:30	Jure Leskovec Inferring Networks of Diffusion and Influence
11:00 - 11:30	Michael W. Mahoney Geometric Network Analysis Tools
11:30 - 12:00	Edward Chang AdHEat - A New Influence-based Social Ads Model and its Tera-Scale Algorithms
12:00 - 12:30	Mauro Maggioni Intrinsic Dimensionality Estimation and Multiscale Geometry of Data Sets
2:30 - 3:00	Guillermo Sapiro Collaborative Hierarchical Sparse Models
3:00 - 3:30	Alekh Agarwal and Peter Bartlett Information-theoretic Lower Bounds on the Oracle Complexity of Convex Optimization
3:30 - 4:00	John Duchi and Yoram Singer Composite Objective Optimization and Learning for Massive Datasets
4:30 - 5:00	Steven Hillion MAD Analytics in Practice
5:00 - 5:30	Matthew Harding Outlier Detection in Financial Trading Networks
5:30 - 6:00	Neel Sundaresan Large Dataset Problems at the Long Tail

Thursday, June 17, 2010. Theme: Spectral Methods and Sparse Matrix Methods

Time	Talk
9:00 - 10:00	Tutorial: Sebastiano Vigna Spectral Ranking
10:00 - 10:30	Robert Stine Streaming Feature Selection
11:00 - 11:30	Konstantin Mischaikow A Combinatorial Framework for Nonlinear Dynamics
11:30 - 12:00	Alfred Hero Sparse Correlation Screening in High Dimension
12:00 - 12:30	Susan Holmes Heterogeneous Data Challenge Combining Complex Data
2:30 - 3:30	Tutorial: Piotr Indyk Sparse Recovery Using Sparse Matrices
3:30 - 4:00	Sayan Mukherjee Efficient Dimension Reduction on Massive Data
4:30 - 5:00	Padhraic Smyth Statistical Modeling of Large-Scale Sensor Count Data
5:00 - 5:30	Ping Li Compressed Counting and Application in Estimating Entropy of Data Streams
5:30 - 6:00	Edo Liberty Scalable Correlation Clustering Algorithms

Friday, June 18, 2010. Theme: Randomized Algorithms for Data

Time	Talk
9:00 - 10:00	Tutorial: Petros Drineas Randomized Algorithms in Linear Algebra and Large Data Applications
10:00 - 10:30	Gunnar Martinsson Randomized Methods for Computing the SVD/PCA of Very Large Matrices
11:00 - 11:30	Ilse Ipsen Numerical Reliability of Randomized Algorithms
11:30 - 12:00	Philippe Rigollet Optimal Rates of Sparse Estimation and Universal Aggregation
12:00 - 12:30	Alexandre d'Aspremont Subsampling, Spectral Methods & Semidefinite Programming
2:30 - 3:00	Gary Miller Specialized System Solvers for Very Large Systems: Theory and Practice
3:00 - 3:30	John Wright and Emmanuel Candes Robust Principal Component Analysis?
3:30 - 4:00	Alon Orlitsky Estimation, Prediction, and Classification over Large Alphabets
4:30 - 5:00	Ken Clarkson Numerical Linear Algebra in the Streaming Model
5:00 - 5:30	David Woodruff Fast Lp Regression in Data Streams

MMDS 2010 Confirmed Speakers

Alekh Agarwal	University of California, Berkeley
Deepak Agarwal	Yahoo! Research
Alexandre d'Aspremont	Princeton University
David Bader	Georgia Tech College of Computing
Peter Bickel	University of California, Berkeley
Emmanuel Candes	Stanford University
Edward Chang	Google Research
Ken Clarkson	IBM Almaden Research Center
Jim Demmel	University of California, Berkeley
John Duchi	University of California, Berkeley
John Gilbert	University of California, Santa Barbara
Matthew Harding	Stanford University
Alfred Hero	University of Michigan, Ann Arbor
Steven Hillion	Greenplum
Susan Holmes	Stanford University
Piotr Indyk	Massachusetts Institute of Technology
Ilse Ipsen	North Carolina State University
Dmitri Krioukov	Cooperative Association for Internet Data Analysis
John Langford	Yahoo! Research
Jure Leskovec	Stanford University
Ping Li	Cornell University
Edo Liberty	Yahoo! Research
Mauro Maggioni	Duke University
Gunnar Martinsson	University of Colorado, Boulder
Gary Miller	Carnegie Mellon University
Konstantin Mischaikow	Rutgers University
Mehryar Mohri	New York University
Sayan Mukherjee	Duke University
Peter Norvig	Google Research
Alon Orlitsky	University of California, San Diego
Ely Porat	Bar-Ilan University
Guillermo Sapiro	University of Minnesota
Padhraic Smyth	University of California, Irvine
Ashok Srivastava	National Aeronautics and Space Administration
Neel Sundaresan	eBay Research
Robert Stine	University of Pennsylvania
Sebastiano Vigna	Università Degli Studi Di Milano
Patrick Wolfe	Harvard University
David Woodruff	IBM Almaden Research Center
John Wright	Microsoft Research Asia

Cancelled with apologies

Peter Bartlett	University of California, Berkeley
Robert Calderbank	Princeton University
Fan Chung	University of California, San Diego
Yoram Singer	Google Research