Semesters Spring 2025, Fall 2024
Scope We aim to cover recent developments in sublinear (time/space) algorithms, local computation algorithms, distributed algorithms, and related areas, with a non-exclusive focus on algorithms for graphs.
Time and Space The meetings will happen roughly every other Friday from 2-4pm in the Stata Center at MIT (either in the Star conf room or in 32-G575). Here are Google Calendar links to view or subscribe.
Organizers Soheil Behnezhad, Ronitt Rubinfeld, Madhu Sudan
Mailing List To sign up for the mailing list, use this link.
Schedule
Date Title Speaker Links Location
Spring 2025
5/9 Simulating Time With Square-Root Space Ryan Williams Abstract 32-D463 (STAR)
5/2 Privately Evaluating Untrusted Black-Box Functions Sofya Raskhodnikova, Ephraim Linder Abstract 32-D463 (STAR)
4/18 On Sampling Structures in Graphs Kuikui Liu Abstract 32-D463 (STAR)
4/4 Ghost Value Augmentation for k-Edge-Connectivity Nathan Klein Abstract 32-G575
3/14 Optimal k-Secretary with Logarithmic Memory Mingda Qiao Abstract 32-G575
2/21 Sparsifying Intersections of Halfspaces Shivam Nadimpalli Abstract 32-D463 (STAR)
2/14 Query Complexity of Stochastic Minimum Vertex Cover Mahsa Derakhshan Abstract 32-D463 (STAR)
Fall 2024
11/22 Directed Isoperimetry and Monotonicity Testing Renato Ferreira Pinto Jr Abstract 32-D463 (STAR)
11/8 Sparse Graph Counting and Kelley--Meka Bounds for Binary Systems Esty Kelman Abstract 32-D463 (STAR)
10/25 Local correction of monotone functions and applications Jane Lange Abstract 32-G575
10/11 Fully Dynamic Matching and Ordered Ruzsa-Szemerédi Graphs Soheil Behnezhad Abstract 32-D463 (STAR)
9/27 Sublinear-Time Approximation of Densest Subgraph Mohsen Ghaffari Abstract 32-G575
9/20 Sparsification for Maximum Matching: EDCS and Robust Communication Complexity Amir Azarmehr Abstract 32-G575
9/6 Sparsification through Codes Aaron (Louie) Putterman Abstract 32-D463 (STAR)


Abstract for "Sparsification through Codes" by Aaron (Louie) Putterman

The seminal works of Karger and of Benczúr and Karger from the '90s showed that graphs can be "sparsified" to a number of edges nearly linear in the number of vertices, so that every cut is preserved to within multiplicative factors close to 1. In the years since, graph (and hypergraph) sparsification has emerged as a central tool in the algorithmic literature, with numerous applications to streaming, fully dynamic algorithms, and more. The utility of this idea raises the natural question of which other combinatorial objects are amenable to sparsification. In this talk, I will provide a partial answer through the recently introduced notion of "code sparsification," which provides a unified framework for constructing graph/hypergraph sparsifiers while also yielding new sparsifications for CSPs and Cayley graphs.

Specifically, the given structure to be preserved is a linear error-correcting code; the goal is to select a reweighted subset of the coordinates of the code such that the Hamming weight of every codeword is preserved to within multiplicative factors close to 1. We show that this class of problems gives the right language to abstract the techniques of Karger and of Benczúr and Karger --- and indeed all codes can be sparsified to length nearly linear in the number of message bits. This generalization already resolves some basic questions in CSP sparsification. A further generalization to additive codes over finite abelian groups gives even more powerful results, and in particular completely classifies the symmetric Boolean CSPs that allow nearly-linear-size sparsification. Finally, by developing a linear-algebraic analog of spanning trees, all of these sparsifiers become efficiently constructible.

Based on joint works with Sanjeev Khanna (Penn) and Madhu Sudan (Harvard).
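As a toy illustration of the sampling idea behind code sparsification (uniform coordinate sampling only; the actual construction in this line of work uses carefully chosen non-uniform importance weights), the following sketch keeps each coordinate of a small binary linear code with probability p, reweighted by 1/p, so that every codeword's weighted Hamming weight matches its true weight in expectation:

```python
import itertools
import random

def codewords(gen):
    """Enumerate all codewords of the binary linear code spanned by the
    rows of the generator matrix `gen` (arithmetic over GF(2))."""
    k, n = len(gen), len(gen[0])
    for msg in itertools.product([0, 1], repeat=k):
        cw = [0] * n
        for i, bit in enumerate(msg):
            if bit:
                cw = [a ^ b for a, b in zip(cw, gen[i])]
        yield tuple(cw)

def sample_coordinates(n, p, rng):
    """Keep each coordinate independently with probability p, reweighted
    by 1/p (uniform sampling; real code sparsifiers sample non-uniformly
    to preserve ALL codeword weights simultaneously w.h.p.)."""
    return {i: 1.0 / p for i in range(n) if rng.random() < p}

def weighted_weight(cw, kept):
    """Weighted Hamming weight of codeword `cw` on the kept coordinates."""
    return sum(w for i, w in kept.items() if cw[i] == 1)
```

Averaging `weighted_weight` over many independent samples recovers each codeword's true Hamming weight, which is the (much weaker) in-expectation version of the guarantee the talk is about.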

Abstract for "Sparsification for Maximum Matching: EDCS and Robust Communication Complexity" by Amir Azarmehr

With massive graphs becoming increasingly prevalent, there has been growing interest in sparsifiers which preserve certain properties of graphs while having a small number of edges. The Edge Degree Constrained Subgraph (EDCS), introduced by Bernstein and Stein (2015, 2016), is such a sparsifier that approximately preserves the size of the maximum matching. EDCS is extremely robust as it remains a good sparsifier under various operations. Consequently, it has found applications in a myriad of different settings including the dynamic, streaming, and sublinear time settings, as well as in the communication model, stochastic matching, and massively parallel computations (MPC).

EDCS is often used to obtain 2/3-approximate matchings. However, we showed in a recent work that it is powerful enough to obtain a much better 5/6-approximation in the robust communication model, significantly improving over prior bounds. The talk will present this result while also covering the basic properties of EDCS.

Based on joint works with Behnezhad (2023), and Behnezhad and Roghani (2024).
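For concreteness, recall the standard definition from Bernstein and Stein: a subgraph H of G is a β-EDCS if every edge (u,v) of H has deg_H(u) + deg_H(v) ≤ β, and every edge (u,v) of G outside H has deg_H(u) + deg_H(v) ≥ β − 1. A minimal sketch of the naive "local fixing" construction (illustrative only; this is not the efficient algorithm from the talk, though termination is known via a potential argument):

```python
from collections import defaultdict

def edcs(edges, beta):
    """Build a beta-EDCS by repeatedly fixing violated edges: remove an
    edge of H whose degree sum exceeds beta, add an edge outside H whose
    degree sum is below beta - 1. Known to terminate (potential argument)."""
    H = set()
    deg = defaultdict(int)
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if (u, v) in H and deg[u] + deg[v] > beta:
                H.discard((u, v)); deg[u] -= 1; deg[v] -= 1
                changed = True
            elif (u, v) not in H and deg[u] + deg[v] < beta - 1:
                H.add((u, v)); deg[u] += 1; deg[v] += 1
                changed = True
    return H
```

On K5 with β = 4, for instance, the procedure settles on a sparse subgraph in which every vertex has degree 2, and both EDCS conditions can be checked directly.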

Abstract for "Sublinear-Time Approximation of Densest Subgraph" by Mohsen Ghaffari

We will discuss sublinear-time algorithms that approximate the graph's arboricity (and other similar density measures) with near-optimal query complexity, including a work of Eden, Mossel, and Ron from SODA'22 that provides an O(log^2 n) approximation and a very recent work with Jiangqi Dai and Julian Portmann that provides an O(1) approximation.
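As background (this is the classical linear-time baseline, not one of the sublinear-time algorithms from the talk), Charikar's greedy peeling gives a 2-approximation to the maximum subgraph density m(S)/|S|: repeatedly delete a minimum-degree vertex and remember the best density seen.

```python
from collections import defaultdict

def densest_subgraph_peel(edges):
    """Greedy peeling: repeatedly remove a minimum-degree vertex and
    return the best density (edges / vertices) encountered. Classic
    2-approximation to the densest subgraph (Charikar, APPROX 2000).
    Written O(n^2)-style for clarity, not speed."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    nodes = set(adj)
    m = len(edges)
    best = m / max(len(nodes), 1)
    while len(nodes) > 1:
        v = min(nodes, key=lambda x: len(adj[x]))  # min-degree vertex
        for u in adj[v]:
            adj[u].discard(v)
        m -= len(adj[v])
        adj.pop(v)
        nodes.remove(v)
        best = max(best, m / len(nodes))
    return best
```

For example, on K4 with a pendant path attached, peeling strips the path and returns the density of K4 (6 edges / 4 vertices = 1.5), which here is also optimal.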

Abstract for "Fully Dynamic Matching and Ordered Ruzsa-Szemerédi Graphs" by Soheil Behnezhad

In the fully dynamic matching problem, the goal is to efficiently maintain an (approximate) maximum matching of a graph that is subject to edge insertions and deletions. In this talk, I will overview a recent approach that (conditionally) resolves some longstanding barriers for this problem using graph sparsification. Along the way, I will discuss connections to Ruzsa-Szemerédi (RS) graphs, and a generalization of them called Ordered RS graphs, that have played extremely important roles in many recent sublinear algorithms and lower bounds.

Based on a joint work with Alma Ghafari (2024).

Abstract for "Local correction of monotone functions and applications" by Jane Lange

We discuss an algorithm for sorting binary labels on a poset in the LCA model of Rubinfeld et al. (ICS'11) and Alon et al. (SODA'12), and several applications of this algorithm. The LCA is a local corrector for monotonicity: an algorithm that makes lookups to some unknown function and answers queries to a nearby monotone function. It runs in time polynomial in the degree of the DAG that corresponds to the poset. We will also discuss an extension of this algorithm that allows for local correction of real-valued functions, and the first of the applications: a tolerant monotonicity tester (and multiplicative+additive-error distance approximator) for functions over general posets. Our algorithm is built on a body of work on distributed greedy graph algorithms; specifically, we rely on a recent work of Ghaffari (FOCS'22), which gives an efficient LCA for computing maximal matchings.

The other application turns our focus to the Boolean hypercube, where we can use LCAs as subroutines for tolerant testing or for proper learning under the uniform distribution in 2^{\tilde{O}(\sqrt{n})} time and samples. We will first discuss using the local corrector to upgrade an improper learner to a proper learner in the realizable setting, then discuss a different LCA-based approach to proper learning in the agnostic setting. The learning algorithms match the query complexity lower bound of Blais et al. (RANDOM'15) and are optimal in their dependence on n; optimality of the testing algorithm is an open problem (see the recent "mildly exponential" lower bound of Chen et al. (STOC'24)).

Abstract for "Sparse Graph Counting and Kelley--Meka Bounds for Binary Systems" by Esty Kelman

The graph counting lemma of Chung, Graham, and Wilson (Combinatorica 1988) is a fundamental result in combinatorics: if a large graph $G$ is pseudorandom, then the number of copies of any small graph $H$ in $G$ is close to what is expected from a random graph of the same density. However, this result is nontrivial only when $G$ is a dense graph. In this work, we obtain a counting lemma that works in the sparse setting and is well-suited for the density increment arguments in additive combinatorics. In a recent remarkable breakthrough, Kelley and Meka (FOCS 2023) obtained a strong upper bound on the density of sets of integers without nontrivial three-term arithmetic progressions. We combine our counting lemma with other ideas to establish Kelley--Meka-type bounds for all linear patterns defined by translation-invariant systems of binary linear forms, i.e., systems in which each form depends on exactly two variables. In particular, we obtain strong bounds for the Turán problem in abelian Cayley sum graphs, i.e., an upper bound on the maximum edge density of an abelian Cayley sum graph with a clique of a given size. To prove our results, we employ some of the recent technology developed by Kelley and Meka, as well as the follow-up work by Kelley, Lovett, and Meka (STOC 2024).

This talk is based on a joint work with Yuval Filmus, Hamed Hatami and Kaave Hosseini (FOCS 2024).

Abstract for "Directed Isoperimetry and Monotonicity Testing" by Renato Ferreira Pinto Jr

We discuss the connection between isoperimetric inequalities, their "directed" analogues, and monotonicity testing. A series of works in the Boolean setting of testing monotonicity of functions $f : \{0,1\}^d \to \{0,1\}$ has leveraged an emerging connection between isoperimetric inequalities (such as Poincaré, Margulis and Talagrand inequalities) and "directed analogues" of such inequalities to obtain monotonicity testers with improved query complexity. This connection, first observed by Chakrabarty and Seshadhri (STOC 2013), also underlies the result of Khot, Minzer and Safra (FOCS 2015), who gave a nearly optimal nonadaptive tester making roughly $\sqrt{d}$ queries.

Classical isoperimetric inequalities hold on both continuous and discrete domains, and indeed have one of their roots in the work of Poincaré on mathematical physics. Moreover, similar techniques, e.g. Fourier analysis, are relevant to both settings. Therefore, toward a deeper understanding of the connection between classical and directed isoperimetry, we wish to understand whether directed isoperimetry is also a feature of the fully continuous setting $f : [0,1]^d \to \mathbb{R}$, and moreover, whether this has algorithmic implications for monotonicity testing. We will discuss a pair of works in this direction (RANDOM 2023 and FOCS 2024), first showing that the ideas behind the classic "edge tester" carry over to the continuous case, and then showing that the mathematical physics interpretation of the classical Poincaré inequality (as characterizing the convergence of the heat equation) also yields a proof of a directed Poincaré inequality, which implies a tester for the continuous setting making roughly $\sqrt{d}$ queries. Another takeaway from the latter result is that tools from optimal transport prove useful for tensorization, i.e. promoting one-dimensional results to high dimension at no loss.
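The classic edge tester mentioned above is simple enough to state in full on the Boolean cube: sample a uniformly random hypercube edge and reject if it is violated. A minimal sketch (the talk's continuous analogue replaces discrete edges with directed derivative information):

```python
import random

def edge_tester(f, d, queries, rng):
    """Classic edge tester on {0,1}^d: sample random hypercube edges and
    reject (return False) if any edge violates monotonicity, i.e. if
    f(x with bit i = 0) > f(x with bit i = 1)."""
    for _ in range(queries):
        x = [rng.randint(0, 1) for _ in range(d)]
        i = rng.randrange(d)
        lo, hi = list(x), list(x)
        lo[i], hi[i] = 0, 1
        if f(tuple(lo)) > f(tuple(hi)):
            return False  # found a violated edge: f is not monotone
    return True  # no violation found: accept
```

A monotone function (e.g. AND of the coordinates) is always accepted, while a function that flips a coordinate is rejected with high probability once the number of queries is large enough.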

Abstract for "Query Complexity of Stochastic Minimum Vertex Cover" by Mahsa Derakhshan

In this talk, we explore the relationship between the query complexity of the stochastic minimum vertex cover problem and the density of Ruzsa–Szemerédi graphs. We are given an n-vertex graph G=(V,E) and a (constant) existence probability for each edge. Each edge of G is realized (i.e., it exists) independently with this probability, forming a realized subgraph G*. The presence or absence of an edge in G* can be verified only via edge queries. Our goal is to find a near-optimal vertex cover of G* using few queries.

We first show that under mild correlation among edge realizations, obtaining any approximation ratio better than 1.5 requires querying a subgraph of size Ω(n⋅RS(n)). Here, RS(n) refers to Ruzsa–Szemerédi graphs and represents the largest number of induced edge-disjoint matchings of size Θ(n) in an n-vertex graph. We then discuss a simple algorithm that finds a (1+ε)-approximate vertex cover by querying a subgraph of size O(n⋅RS(n)) for any absolute constant ε>0. The analysis extends to the case of O(n⋅RS(n)) correlated edges, effectively completing our understanding of this problem under mild correlation.

This talk is based on joint work with Nika Haghtalab and Naveen Durvasula (STOC’23), as well as Mohammad Saneian and Zhiyang Xun (ITCS’25).

Abstract for "Sparsifying intersections of halfspaces" by Shivam Nadimpalli

Given an intersection of (possibly infinitely many) halfspaces at bounded distance from the origin, we show that it can be sparsified, i.e. approximated (under the Gaussian distribution) by an intersection of halfspaces, where the number of halfspaces depends only on the desired accuracy. This yields efficient algorithms for learning, tolerant testing, and volume estimation of convex sets of bounded width.

Our result follows from a more general sparsification lemma for Gaussian processes, which relies on Talagrand's majorizing measures theorem. As another consequence, we obtain a "junta theorem" for norms over Gaussian space: Every norm over R^n can be multiplicatively approximated (under the Gaussian measure) by a norm that depends on only a constant number of coordinates.

The talk will be self-contained and will require no prior background on Gaussian processes.

Based on joint work with Anindya De, Ryan O'Donnell, and Rocco Servedio.

Abstract for "Optimal k-Secretary with Logarithmic Memory" by Mingda Qiao

We study memory-bounded algorithms for the k-secretary problem. The algorithm of Kleinberg (2005) achieves an optimal competitive ratio of 1 - O(1/sqrt(k)), yet a straightforward implementation requires Omega(k) memory. Our main result is a k-secretary algorithm that matches the optimal competitive ratio using O(log k) words of memory. We prove this result by establishing a general reduction from k-secretary to (random-order) quantile estimation, the problem of finding the k-th largest element in a stream. We show that a quantile estimation algorithm with an O(k^alpha) expected error (in terms of the rank) gives a [1 - O(1/k^(1-alpha))]-competitive k-secretary algorithm with O(1) extra words. We then introduce a new quantile estimation algorithm that achieves an O(sqrt(k)) expected error bound using O(log k) memory. Of independent interest, we give a different algorithm that uses O(sqrt(k)) words and finds the k-th largest element exactly with high probability, generalizing a result of Munro and Paterson (1980).

Based on joint work with Wei Zhang.
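As context for the memory bound, the straightforward Ω(k)-memory approach alluded to above can be sketched as follows: observe a prefix of the stream, remember its top k values (this buffer is exactly the memory bottleneck), and use the k-th largest as an acceptance threshold. This is an illustrative baseline, not the O(log k)-memory algorithm from the talk:

```python
import heapq

def k_secretary_threshold(stream, k):
    """Baseline k-secretary heuristic: watch the first half of the stream,
    set the threshold at the k-th largest value seen, then accept up to k
    later elements above it. Storing the top-k of the sample is the
    Omega(k)-memory bottleneck the talk's algorithm avoids."""
    n = len(stream)
    m = n // 2
    topk = heapq.nlargest(k, stream[:m])  # the O(k)-word buffer
    threshold = topk[-1] if len(topk) == k else float("-inf")
    picked = []
    for v in stream[m:]:
        if len(picked) < k and v > threshold:
            picked.append(v)
    return picked
```

The reduction in the talk replaces the explicit top-k buffer with a small-memory quantile (rank-k) estimate computed over the random-order prefix.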

"Ghost Value Augmentation for k-Edge-Connectivity" by Nathan Klein

We show that every fractionally k-edge-connected graph can be written as a convex combination of integral (k-10)-edge-connected graphs. A fractional graph is one in which every edge can take any value between 0 and 1, and it is k-edge-connected if the sum of edge values crossing any cut is at least k.

This implies that for large constant values of k, fractional edge connectivity and integral edge connectivity are essentially the same. As a byproduct of this result, we show that one can produce a (k-10)-edge-connected spanning subgraph (ECSS) of cost no more than the optimal k-ECSS, complementing the existing 2-approximation. This result also implies a 1+O(1/k) approximation for the k-edge-connected multi-subgraph problem (k-ECSM), resolving a conjecture of Pritchard from 2011.

This is joint work with Ellis Hershkowitz and Rico Zenklusen.
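To make the definition concrete, here is a brute-force checker for fractional k-edge-connectivity (exponential in the number of vertices, so only usable on tiny examples; the function name and representation are just for illustration):

```python
import itertools

def is_fractionally_k_connected(n, wedges, k):
    """Check fractional k-edge-connectivity of a graph on vertices
    0..n-1 with weighted edges (u, v, w): every proper cut must have
    total crossing edge value >= k. Brute force over all cuts."""
    for r in range(1, n):
        for S in itertools.combinations(range(n), r):
            if 0 not in S:
                continue  # fix vertex 0 on one side so each cut is seen once
            side = set(S)
            cut = sum(w for (u, v, w) in wedges
                      if (u in side) != (v in side))
            if cut < k:
                return False
    return True
```

On a 4-cycle with unit edge values, every cut crosses at least two edges, so the graph is (fractionally) 2-edge-connected but not 3-edge-connected; scaling every edge value to 1.5 makes it fractionally 3-edge-connected even though no integral graph has such edge values.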

"On Sampling Structures in Graphs" by Kuikui Liu

Sampling is a fundamental algorithmic primitive in high-dimensional statistics and computer science. In this talk, we survey techniques, both new and old, for sampling graph structures which arise in statistical physics contexts, including independent sets and colorings. Our focus will be on connections between exponential decay of correlations, mixing times of Markov chains, and an interesting local sampler due to Anand and Jerrum.
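As one concrete example of the Markov chains in question, Glauber dynamics for the hardcore model (weighted independent sets with fugacity λ) updates a single random vertex per step; a minimal sketch:

```python
import random

def glauber_independent_set(adj, steps, lam, rng):
    """Glauber dynamics for the hardcore model on a graph given as an
    adjacency list: pick a random vertex; if some neighbor is occupied,
    make it empty; otherwise occupy it with probability lam / (1 + lam).
    The configuration is an independent set at every step."""
    n = len(adj)
    occupied = [False] * n
    for _ in range(steps):
        v = rng.randrange(n)
        if any(occupied[u] for u in adj[v]):
            occupied[v] = False
        else:
            occupied[v] = rng.random() < lam / (1 + lam)
    return occupied
```

When this chain mixes rapidly (e.g. under decay of correlations), the final configuration is approximately a sample from the hardcore distribution; the local samplers from the talk answer "is vertex v occupied?" queries without simulating the whole chain.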

"Privately evaluating untrusted black-box functions" by Ephraim Linder and Sofya Raskhodnikova

We provide tools for sharing sensitive data in situations when the data curator does not know in advance what questions an (untrusted) analyst might want to ask about the data. The analyst can specify a program that they want the curator to run on the dataset. We model the program as a black-box function $f$. We study differentially private algorithms, called privacy wrappers, that, given black-box access to a real-valued function $f$ and a sensitive dataset $x$, output an accurate approximation to $f(x)$. The dataset $x$ is modeled as a finite subset of a possibly infinite universe, in which each entry of $x$ represents the data of one individual. A privacy wrapper calls $f$ on the dataset $x$ and on some subsets of $x$ and returns either an approximation to $f(x)$ or a nonresponse symbol $\perp$. The wrapper may also use additional information (that is, parameters) provided by the analyst, but differential privacy is required for all values of these parameters; correct settings of these parameters yield better accuracy. The bottleneck in the running time of our privacy wrappers is the number of calls to $f$, which we refer to as queries. Our goal is to design privacy wrappers with high accuracy and small query complexity.

We consider two settings: in the automated sensitivity detection setting, the analyst supplies only the black-box function $f$ and the intended (finite) range of $f$; in the provided sensitivity bound setting, the analyst also supplies additional parameters that describe the sensitivity of $f$. We define accuracy for both settings. We present the first privacy wrapper for the automated sensitivity detection setting. For the setting where a sensitivity bound is provided by the analyst, we design privacy wrappers with simultaneously optimal accuracy and query complexity, improving on the constructions provided (or implied) by previous work. We also prove tight lower bounds for both settings. In addition to addressing the black-box privacy problem, our private mechanisms provide feasibility results for the differentially private release of general classes of functions.

Joint work with Adam Smith and Thomas Steinke (to appear in STOC 2025).
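For orientation, the simplest instance of the "provided sensitivity bound" setting is the classical Laplace mechanism: one call to the black box plus noise scaled to the claimed sensitivity. This toy deliberately ignores the central difficulty addressed in the talk, namely that a real privacy wrapper must remain private even when the analyst's claimed bound is wrong:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

def private_wrap(f, dataset, sensitivity, epsilon, rng):
    """Toy wrapper: one black-box call plus Laplace(sensitivity/epsilon)
    noise. This is epsilon-DP only if `sensitivity` really bounds
    |f(x) - f(x')| over neighboring datasets; unlike the wrappers in the
    talk, this sketch simply trusts the analyst's claim."""
    return f(dataset) + laplace_noise(sensitivity / epsilon, rng)
```

For example, wrapping a counting query `f = len` with sensitivity 1 and ε = 1 returns the count plus noise of constant expected magnitude.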

"Simulating Time With Square-Root Space" by Ryan Williams

We will go in-depth into the proof that for all functions $t(n) \geq n$, every multitape Turing machine running in time $t$ can be simulated in space only $O(\sqrt{t \log t})$. We will attempt to be completely self-contained. To that end, we will also describe a necessary extension of the remarkable space-efficient algorithm for Tree Evaluation recently found by Cook and Mertz [STOC 2024], which is the real surprise that makes the simulation possible.

You do not need to have seen the TOC colloquium talk earlier, but if you missed it, you could watch the TCS+ version here.