Check
Course CS 7880 Special Topics in Theoretical Computer Science: Algorithms for Big Data
Semester Fall 2022
Instructor Soheil Behnezhad (WVH 348)
Meeting Time Tuesdays and Fridays 9:50am - 11:30am in WVH 366
Office Hours Tuesdays and Fridays 11:30am to 12pm at WVH 348
Prerequisites Mathematical maturity and some experience with probability and randomized algorithms. Students in areas other than theory are welcome to take the course!
Course Overview

The emergence of massive datasets has motivated a rapidly growing interest in algorithms that can process huge amounts of data. When working with such massive inputs, inherent assumptions of traditional algorithms (such as the possibility of reading or storing the whole input with a single machine) no longer hold. We thus need a new paradigm for the design and analysis of big data algorithms. This is precisely our focus in this course.

In this course, we will study various algorithms whose time, space, or communication are much smaller than the input size. Problems that we study will include maximum matchings, graph connectivity, minimum spanning trees, frequency estimation, independent sets, graph coloring, and clustering. We study these problems across various models of sublinear algorithms, including sublinear time algorithms, streaming algorithms, and massively parallel computation (MPC).

Grading
  • 35% Final Project
  • 30% Homework Assignments
  • 35% Scribe notes and participation

Final Project: The final project will consist of exploring a research topic related to this course. More details about the final project will be announced in October.

Homework: There will be 2-3 homework assignments related to the topics discussed in the class. Solutions must be typeset in LaTeX. Groups of 2-3 students can work together on solving the problems, but each student should write the solution independently (in particular, you should understand and be able to explain everything that is written in your solution). You should also include the names of your collaborators in your solution.

Scribe notes: For each lecture, one student will be responsible to write scribe notes and send them to the instructor by 11:59pm on the Monday after the lecture to be posted on the course website. Please use this template when preparing the notes. See the template for more details about scribe notes.

Schedule
# WD Date Topics References Notes
1 Fr 9/9 Introduction and Basic Models of Big Data Computation Notes (1)
2 Tu 9/13 Basic Probabilistic Tools Notes (2)
3 Fr 9/16 Sublinear Time Algorithms for Connected Components and MST [CRT'05] [CS'09] Notes (3)
4 Tu 9/20 Perfect Matching of Regular Bipartite Graphs in O(n log n) Time [GKK'10] Notes (4)
5 Fr 9/23 Lower Bounds for Sublinear Algorithms [Y'77] Notes (5&6)
6 Tu 9/27 Lower Bounds for Sublinear Algorithms (continued) [Y'77] Notes (5&6)
7 Fr 9/30 Query Complexity of Randomized Greedy Maximal Independent Set [ON'08], [YYI'09] Notes (7)
Tu 10/4 No class
8 Fr 10/7 Sublinear Algorithms for Maximum Matching (Part I) [ON'08], [YYI'09] Notes (8)
9 Tu 10/11 Sublinear Algorithms for Maximum Matching (Part II) [B'21] Notes (9)
10 Fr 10/14 Streaming Algorithms: Distinct Elements (Part I) [AMS'96], [BJKST'02] Notes (10&11)
11 Fr 10/14 Streaming Algorithms: Distinct Elements (Part II) [AMS'96], [BJKST'02] Notes (10&11)
12 Fr 10/21 Communication Complexity and Streaming Lower Bounds (Part I) Notes (12)
13 Tu 10/25 Communication Complexity and Streaming Lower Bounds (Part II) Notes (13)
14 Fr 10/28 Communication Complexity and Streaming Lower Bounds (Part III) Notes (14)
15 Tu 11/1 Massively Parallel Computation (MPC): Introduction & Basics Notes (15)
16 Fr 11/4 Massively Parallel Computation (MPC): Introduction & Basics Notes (16)
17 Tu 11/8 MPC Algorithms for Maximal Independent Set with Linear Space Notes (17)
Fr 11/11 No Class: Veterans Day
18 Tu 11/15 MPC Algorithms for Correlation Clustering with Linear Space Notes (18)
19 Fr 11/18 MPC Algorithms for Graph Connectivity with Sublinear Space Notes (19)
20 Tu 11/22 Edge Degree Constrained Subgraphs (EDCS) (Part I) Notes (20)
Fr 11/25 No Class: Thanksgiving
21 Tu 11/29 Edge Degree Constrained Subgraphs (EDCS) (Part II) Notes (21)
22 Fr 12/2 Graph Connectivity via AGM's Linear Sketch
23 Tu 12/6 Project presentations
24 Fr 12/9 Project presentations
25 Tu 12/13 Project presentations
References
[CRT'05] Chazelle, Bernard, Ronitt Rubinfeld, and Luca Trevisan. "Approximating the minimum spanning tree weight in sublinear time." SIAM Journal on computing 34.6 (2005): 1370-1379.
[CS'09] Czumaj, Artur, and Christian Sohler. "Estimating the weight of metric minimum spanning trees in sublinear time." SIAM Journal on Computing 39.3 (2009): 904-922.
[GKK'10] Goel, Ashish, Michael Kapralov, and Sanjeev Khanna. "Perfect matchings in O (n log n) time in regular bipartite graphs." Proceedings of the Forty-second ACM Symposium on Theory of Computing (STOC). 2010.
[Y'77] Yao, Andrew Chi-Chin. "Probabilistic computations: Toward a unified measure of complexity." 18th Annual Symposium on Foundations of Computer Science (FOCS). 1977.
[ON'08] Nguyen, Huy N., and Krzysztof Onak. "Constant-time approximation algorithms via local improvements." 2008 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE, 2008.
[YYI'09] Yoshida, Yuichi, Masaki Yamamoto, and Hiro Ito. "An improved constant-time approximation algorithm for maximum~ matchings." Proceedings of the forty-first annual ACM symposium on Theory of computing. 2009.
[B'21] Behnezhad, Soheil. "Time-optimal sublinear algorithms for matching and vertex cover." 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2022.
[AMS'96] Alon, Noga, Yossi Matias, and Mario Szegedy. "The space complexity of approximating the frequency moments." Proceedings of the twenty-eighth annual ACM symposium on Theory of computing. 1996.
[BJKST'02] Bar-Yossef, Z., Jayram, T. S., Kumar, R., Sivakumar, D., & Trevisan, L. (2002, September). Counting distinct elements in a data stream. In International Workshop on Randomization and Approximation Techniques in Computer Science (pp. 1-10). Springer, Berlin, Heidelberg.
Resources

There is no official textbook for the class and all required material will be available online. The following, however, is a list of helpful supplementary resources.