Course CS 7880 Special Topics in Theoretical Computer Science: Algorithms for Big Data
Semester Fall 2022
Instructor Soheil Behnezhad (WVH 348)
Meeting Time Tuesdays and Fridays 9:50am - 11:30am in WVH 366
Office Hours Tuesdays and Fridays 11:30am to 12pm at WVH 348
Prerequisites Mathematical maturity and some experience with probability and randomized algorithms. Students in areas other than theory are welcome to take the course!
Course Overview

The emergence of massive datasets has motivated a rapidly growing interest in algorithms that can process huge amounts of data. When working with such massive inputs, inherent assumptions of traditional algorithms (such as the possibility of reading or storing the whole input with a single machine) no longer hold. We thus need a new paradigm for the design and analysis of big data algorithms. This is precisely our focus in this course.

In this course, we will study various algorithms whose time, space, or communication are much smaller than the input size. Problems that we study will include maximum matchings, graph connectivity, minimum spanning trees, frequency estimation, independent sets, graph coloring, and clustering. We study these problems across various models of sublinear algorithms, including sublinear time algorithms, streaming algorithms, and massively parallel computation (MPC).

(Tentative) List of Topics
  • Basic probablistic tools
  • Sublinear time algorithms for average degree and connected components
  • Sublinear time algorithms for maximum matching in regular graphs
  • Sublinear time algorithms for maximal independent set and maximum matching via randomized greedy
  • Streaming algorithms: frequency moments estimation
  • Streaming lower bounds via communication complexity
  • Massively parallel computation (MPC): introduction, space-regimes, and simple tools
  • MPC algorithms for maximum matching
  • MPC algorithms for graph connectivity and minimum spanning trees
  • Big data algorithms for graph connectivity via AGM linear sketching
  • Big data algorithms for graph coloring
  • Big data algorithms for cut problems
  • Big data algorithms for clustering problems
  • 35% Final Project
  • 30% Homework Assignments
  • 35% Scribe notes and participation

Final Project: The final project will consist of exploring a research topic related to this course. More details about the final project will be announced in October.

Homework: There will be 2-3 homework assignments related to the topics discussed in the class. Solutions must be typeset in LaTeX. Groups of 2-3 students can work together on solving the problems, but each student should write the solution independently (in particular, you should understand and be able to explain everything that is written in your solution). You should also include the names of your collaborators in your solution.

Scribe notes: For each lecture, one student will be responsible to write scribe notes and send them to the instructor by 11:59pm on the Monday after the lecture to be posted on the course website. Please use this template when preparing the notes. See the template for more details about scribe notes.

# WD Date Topics References Notes
1 Fr 9/9 Introduction and Basic Models of Big Data Computation Lecture Note 1
2 Tu 9/13 Basic Probabilistic Tools Lecture Note 2
3 Fr 9/16 Sublinear Time Algorithms for Connected Components and MST [CRT'05] [CS'09] Lecture Note 3
4 Tu 9/20 Perfect Matching of Regular Bipartite Graphs in O(n log n) Time [GKK'10]
5 Fr 9/23 Lower Bounds for Sublinear Algorithms [Y'77]
6 Tu 9/27 TBA
7 Fr 9/30 TBA
8 Tu 10/4 TBA
9 Fr 10/7 TBA
10 Tu 10/11 TBA
11 Fr 10/14 TBA
12 Tu 10/18 TBA
13 Fr 10/21 TBA
14 Tu 10/25 TBA
15 Fr 10/28 TBA
16 Tu 11/1 TBA
17 Fr 11/4 TBA
18 Tu 11/8 TBA
Fr 11/11 No Class: Veterans Day
19 Tu 11/15 TBA
20 Fr 11/18 TBA
21 Tu 11/22 TBA
Fr 11/25 No Class: Thanksgiving
22 Tu 11/29 TBA
23 Fr 12/2 TBA
24 Tu 12/6 TBA
25 Fr 12/9 TBA
26 Tu 12/13 TBA
[CRT'05] Chazelle, Bernard, Ronitt Rubinfeld, and Luca Trevisan. "Approximating the minimum spanning tree weight in sublinear time." SIAM Journal on computing 34.6 (2005): 1370-1379.
[CS'09] Czumaj, Artur, and Christian Sohler. "Estimating the weight of metric minimum spanning trees in sublinear time." SIAM Journal on Computing 39.3 (2009): 904-922.
[GKK'10] Goel, Ashish, Michael Kapralov, and Sanjeev Khanna. "Perfect matchings in O (n log n) time in regular bipartite graphs." Proceedings of the Forty-second ACM Symposium on Theory of Computing (STOC). 2010.
[Y'77] Yao, Andrew Chi-Chin. "Probabilistic computations: Toward a unified measure of complexity." 18th Annual Symposium on Foundations of Computer Science (FOCS). 1977.

There is no official textbook for the class and all required material will be available online. The following, however, is a list of helpful supplementary resources.