Algorithms to Live By: The Computer Science of Human Decisions

by Brian Christian and Tom Griffiths
August 13, 2025 · 115 min read
non-fiction, psychology

Page: 5, Location: 76

Note: C.


1   Optimal Stopping: When to Stop Looking

Page: 13, Location: 196-197

Note: C.


A 63% failure rate, when following the best possible strategy, is a sobering fact. Even when we act optimally in the secretary problem, we will still fail most of the time—that is, we won’t end up with the single best applicant in the pool. This is bad news for those of us who would frame romance as a search for “the one.”

Page: 19, Location: 288-291

Note: The one


but perhaps the most appropriate to Berezovsky’s case—with apologies to Russian oligarchs—is known as the “burglar problem.” In this problem, a burglar has the opportunity to carry out a sequence of robberies. Each robbery provides some reward, and there’s a chance of getting away with it each time. But if the burglar is caught, he gets arrested and loses all his accumulated gains. What algorithm should he follow to maximize his expected take?

Page: 33, Location: 498-501

Note: Burglar problem


Because for people there’s always a time cost. It doesn’t come from the design of the experiment. It comes from people’s lives. The “endogenous” time costs of searching, which aren’t usually captured by optimal stopping models, might thus provide an explanation for why human decision-making routinely diverges from the prescriptions of those models. As optimal stopping researcher Neil Bearden puts it, “After searching for a while, we humans just tend to get bored. It’s not irrational to get bored, but it’s hard to model that rigorously.”

Page: 36, Location: 547-551

Note: Time cost of searching


As such, the explicit premise of the optimal stopping problem is the implicit premise of what it is to be alive. It’s this that forces us to decide based on possibilities we’ve not yet seen, this that forces us to embrace high rates of failure even when acting optimally. No choice recurs. We may get similar choices again, but never that exact one. Hesitation—inaction—is just as irrevocable as action. What the motorist, locked on the one-way road, is to space, we are to the fourth dimension: we truly pass this way but once.

Page: 37, Location: 558-562

Note: Intuition And logic


2   Explore/Exploit: The Latest vs. the Greatest

Page: 39, Location: 595-597

Note: C.


Seizing a day and seizing a lifetime are two entirely different endeavors. We have the expression “Eat, drink, and be merry, for tomorrow we die,” but perhaps we should also have its inverse: “Start learning a new language or an instrument, and make small talk with a stranger, because life is long, and who knows what joy could blossom over many years’ time.” When balancing favorite experiences and new ones, nothing matters as much as the interval over which we plan to enjoy them.

Page: 43, Location: 655-659

Note: Inversion of seizing the day


So explore when you will have time to use the resulting knowledge, exploit when you’re ready to cash in. The interval makes the strategy.

Page: 44, Location: 668-669

Note: The strategy


intransigence.”

Page: 47, Location: 711-712

Note: T.


For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number—which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index—suggests an obvious strategy on the casino floor: always play the arm with the highest index.

Page: 49, Location: 743-746

Note: Gittins index


The success of Upper Confidence Bound algorithms offers a formal justification for the benefit of the doubt. Following the advice of these algorithms, you should be excited to meet new people and try new things—to assume the best about them, in the absence of evidence to the contrary. In the long run, optimism is the best prevention for regret.

Page: 56, Location: 850-852

Note: T.


In some cases, it’s unlikely that any pair of users will have the exact same experience.

Page: 58, Location: 877-877

Note: A/B testing


In general, it seems that people tend to over-explore—to favor the new disproportionately over the best.

Page: 65, Location: 983-984

Note: Study result


What Carstensen and her colleagues found is that the shrinking of social networks with aging is due primarily to “pruning” peripheral relationships and focusing attention instead on a core of close friends and family members. This process seems to be a deliberate choice: as people approach the end of their lives, they want to focus more on the connections that are the most meaningful.

Page: 70, Location: 1065-1067

Note: Important point


Being sensitive to how much time you have left is exactly what the computer science of the explore/exploit dilemma suggests. We think of the young as stereotypically fickle; the old, stereotypically set in their ways. In fact, both are behaving completely appropriately with respect to their intervals. The deliberate honing of a social network down to the most meaningful relationships is the rational response to having less time to enjoy them.

Page: 71, Location: 1075-1078

Note: Age and people


Big-O: A Yardstick for the Worst Case

Page: 77, Location: 1170-1171

Note: T.


It gets worse from there. There’s “exponential time,” O(2ⁿ), where each additional guest doubles your work. Even worse is “factorial time,” O(n!), a class of problems so truly hellish that computer scientists only talk about it when they’re joking—as we were in imagining shuffling a deck until it’s sorted—or when they really, really wish they were.

Page: 80, Location: 1220-1224

Note: T.


Breaking the Quadratic Barrier: Divide and Conquer

Page: 82, Location: 1250-1250

Note: T.


In Bucket Sort, items are grouped together into a number of sorted categories, with no regard for finer, intracategory sorting; that can come later. (In computer science the term “bucket” simply refers to a chunk of unsorted data, but some of the most powerful real-world uses of Bucket Sort, as at the KCLS, take the name entirely literally.)

Page: 86, Location: 1316-1318

Note: Bucket Sort
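
A minimal Python sketch of the idea (my own illustration, not the book's; the value range and bucket count are assumed parameters):

```python
def bucket_sort(items, num_buckets=10, lo=0, hi=100):
    """Group items into ordered buckets; fine-grained sorting can come later."""
    buckets = [[] for _ in range(num_buckets)]
    width = (hi - lo) / num_buckets
    for item in items:
        index = min(int((item - lo) / width), num_buckets - 1)
        buckets[index].append(item)
    # Concatenating the buckets in order gives a roughly sorted list;
    # sort within each bucket only if and when full order is needed.
    return [x for bucket in buckets for x in sorted(bucket)]

print(bucket_sort([42, 7, 99, 18, 3, 64]))  # [3, 7, 18, 42, 64, 99]
```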


But in fact it isn’t Bubble Sort that emerges as the single best algorithm in the face of a noisy comparator. The winner of that particular honor is an algorithm called Comparison Counting Sort. In this algorithm, each item is compared to all the others, generating a tally of how many items it is bigger than. This number can then be used directly as the item’s rank. Since it compares all pairs, Comparison Counting Sort is a quadratic-time algorithm, like Bubble Sort. Thus it’s not a popular choice in traditional computer science applications, but it’s exceptionally fault-tolerant.

Page: 97, Location: 1477-1481

Note: Robustness is sometimes preferred over efficiency
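
That description translates almost directly into code; here's a toy version of mine:

```python
def comparison_counting_sort(items):
    """Compare every pair; each item's rank is how many items it beats."""
    n = len(items)
    result = [None] * n
    for i, item in enumerate(items):
        rank = sum(other < item for other in items)       # items it beats
        rank += sum(items[j] == item for j in range(i))   # break ties by position
        result[rank] = item
    return result

print(comparison_counting_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
```

Since every pairwise comparison is made independently, one faulty comparison can shift an item's tally (and thus its rank) by at most one; that locality is where the fault tolerance comes from.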


Blood Sort: Pecking Orders and Dominance Hierarchies

Page: 98, Location: 1488-1489

Note: T.


Much as we bemoan the daily rat race, the fact that it’s a race rather than a fight is a key part of what sets us apart from the monkeys, the chickens—and, for that matter, the rats.

Page: 103, Location: 1573-1575

Note: Important Point


4   Caching: Forget About It

Page: 105, Location: 1608-1609

Note: C.


In an ideal world, they wrote, the machine would of course have limitless quantities of lightning-fast storage, but in practice this wasn’t possible. (It still isn’t.) Instead, the trio proposed what they believed to be the next best thing: “a hierarchy of memories, each of which has greater capacity than the preceding but which is less quickly accessible.” By having effectively a pyramid of different forms of memory—a small, fast memory and a large, slow one—maybe we could somehow get the best of both.

Page: 108, Location: 1643-1646

Note: Memory Hierarchy


if you want to store anything else, and in computer science this making of room is called “cache replacement” or “cache eviction.” As Wilkes wrote, “Since the [cache] can only be a fraction of the size of the main memory, words cannot be preserved in it indefinitely, and there must be wired into the system an algorithm by which they are progressively overwritten.” These algorithms are known as “replacement policies” or “eviction policies,” or simply as caching algorithms.

Page: 110, Location: 1681-1685

Note: Cache eviction
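
The eviction policy this chapter goes on to favor, Least Recently Used (LRU), evicts whatever has gone untouched the longest. A minimal sketch of mine, using Python's OrderedDict:

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-size cache that evicts the Least Recently Used entry."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # touching a key makes it "recent"
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # touch "a", so "b" is now least recent
cache.put("c", 3)       # evicts "b"
print(cache.get("b"))   # None
```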


Turning the Library Inside Out

Page: 113, Location: 1728-1728

Note: T.


Here’s why: if temporal locality holds, then the rough-sorting shelves contain the most important books in the whole building. These are the books that were most recently used, so they are the ones that patrons are most likely to be looking for. It seems a crime that arguably the juiciest and most browseworthy shelf of the libraries’ miles of stacks is both hidden away and constantly eroded by earnest library staff just doing their jobs.

Page: 114, Location: 1742-1745

Note: Temporal Locality


A quarter of all Internet traffic at present is handled by a single corporation, one that manages to stay almost entirely out of the headlines. This Massachusetts-based company is called Akamai, and they’re in the caching business.

Page: 116, Location: 1766-1767

Note: Akamai


In short, the mathematics of self-organizing lists suggests something radical: the big pile of papers on your desk, far from being a guilt-inducing fester of chaos, is actually one of the most well-designed and efficient structures available. What might appear to others to be an unorganized mess is, in fact, a self-organizing mess. Tossing things back on the top of the pile is the very

Page: 123, Location: 1880-1882

Note: Noguchi filing system and LRU


The key to a good human memory then becomes the same as the key to a good computer cache: predicting which items are most likely to be wanted in the future.

Page: 125, Location: 1908-1909

Note: Practice and recall


Why don’t they make the whole plane out of that black box stuff? —STEVEN WRIGHT

Page: 127, Location: 1933-1934

Note: Wow


Recent work by a team of psychologists and linguists led by Michael Ramscar at the University of Tübingen has suggested that what we call “cognitive decline”—lags and retrieval errors—may not be about the search process slowing or deteriorating, but (at least partly) an unavoidable consequence of the amount of information we have to navigate getting bigger and bigger. Regardless of whatever other challenges aging brings, older brains—which must manage a greater store of memories—are literally solving harder computational problems with every passing day. The old can mock the young for their speed: “It’s because you don’t know anything yet!”

Page: 128, Location: 1955-1960

Note: Important


5   Scheduling: First Things First

Page: 130, Location: 1989-1991

Note: C.


Spending Time Becomes a Science

Page: 132, Location: 2010-2010

Note: T.


Priority Inversion and Precedence Constraints

Page: 141, Location: 2153-2153

Note: T.


The comedian Mitch Hedberg recounts a time when “I was at a casino, I was minding my own business, and this guy came up and said, ‘You’re gonna have to move, you’re blocking the fire exit.’ As though if there was a fire, I wasn’t gonna run.” The bouncer’s argument was priority inversion; Hedberg’s rebuttal was priority inheritance. Hedberg lounging casually in front of a fleeing mob puts his low-priority loitering ahead of their high-priority running for their lives—but not if he inherits their priority. And an onrushing mob has a way of making one inherit their priority rather quickly. As Hedberg explains, “If you’re flammable and have legs, you are never blocking a fire exit.”

Page: 142, Location: 2172-2177

Note: Priority inversion and inheritance


“Things which matter most must never be at the mercy of things which matter least,”

Page: 143, Location: 2181-2181

Note: Very important


When a certain task can’t be started until another one is finished, scheduling theorists call that a “precedence constraint.”

Page: 143, Location: 2183-2184

Note: Term


Drop Everything: Preemption and Uncertainty

Page: 146, Location: 2234-2234

Note: T.


Preemption Isn’t Free: The Context Switch

Page: 148, Location: 2265-2266

Note: T.


only. The truth is, there’s always overhead—time lost to metawork, to the logistics of bookkeeping and task management. This is one of the fundamental tradeoffs of scheduling. And the more you take on, the more overhead there is. At its nightmarish extreme, this turns into a phenomenon called thrashing.

Page: 150, Location: 2294-2296

Note: T.


But if a task requires keeping track of so many things that they won’t all fit into memory, then you might well end up spending more time swapping information in and out of memory than doing the actual work. What’s more, when you switch tasks, the newly active task might make space for its working set by evicting portions of other working sets from memory. The next task, upon reactivation, would then reacquire parts of its working set from the hard disk and muscle them back into memory, again displacing others. This problem—tasks stealing space from each other—can get even worse in systems with hierarchies of caches between the processor and the memory.

Page: 152, Location: 2319-2324

Note: Cache fighting


This is thrashing: a system running full-tilt and accomplishing nothing at all.

Page: 152, Location: 2327-2328

Note: Term. Thrashing


If you find yourself doing a lot of context switching because you’re tackling a heterogeneous collection of short tasks, you can also employ another idea from computer science: “interrupt coalescing.” If you have five credit card bills, for instance, don’t pay them as they arrive; take care of them all in one go when the fifth bill comes.

Page: 157, Location: 2397-2399

Note: Term.


6   Bayes’s Rule: Predicting the Future

Page: 161, Location: 2454-2455

Note: C.


Reasoning Backward with the Reverend Bayes

Page: 162, Location: 2475-2475

Note: T.


In fact, for any possible drawing of w winning tickets in n attempts, the expectation is simply the number of wins plus one, divided by the number of attempts plus two: (w+1)⁄(n+2).

Page: 166, Location: 2534-2537

Note: Laplace law


Laplace’s Law, and it is easy to apply in any situation where you need to assess the chances of an event based on its history. If you make ten attempts at something and five of them succeed, Laplace’s Law estimates your overall chances to be 6/12 or 50%, consistent with our intuitions. If you try only once and it works out, Laplace’s estimate of 2/3 is both more reasonable than assuming you’ll win every time, and more actionable than Price’s guidance (which would tell us that there is a 75% metaprobability of a 50% or greater chance of success).

Page: 166, Location: 2538-2541

Note: Important
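
The rule is a one-liner in code (the zero-data case is my extra example):

```python
def laplace_estimate(wins, attempts):
    """Laplace's Law: (w + 1) / (n + 2) after w wins in n attempts."""
    return (wins + 1) / (attempts + 2)

print(laplace_estimate(5, 10))  # 0.5    -- five successes in ten tries
print(laplace_estimate(1, 1))   # 0.667  -- one try, one success
print(laplace_estimate(0, 0))   # 0.5    -- no data at all: even odds
```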


Bayes’s Rule and Prior Beliefs

Page: 167, Location: 2551-2551

Note: T.


The mathematical formula that describes this relationship, tying together our previously held ideas and the evidence before our eyes, has come to be known—ironically, as the real heavy lifting was done by Laplace—as Bayes’s Rule. And it gives a remarkably straightforward solution to the problem of how to combine preexisting beliefs with observed evidence: multiply their probabilities together.

Page: 168, Location: 2568-2571

Note: Bayes rule


And if we assume that we’re arriving precisely halfway into something’s duration, the best guess we can make for how long it will last into the future becomes obvious: exactly as long as it’s lasted already. Gott saw the Berlin Wall eight years after it was built, so his best guess was that it would stand for eight years more. (It ended up being twenty.) This straightforward reasoning, which Gott named the Copernican Principle, results in a simple algorithm that can be used to make predictions about all sorts of topics.

Page: 170, Location: 2594-2598

Note: Copernician principle


When Bayes’s Rule combines all these probabilities—the more-probable short time spans pushing down the average forecast, the less-probable yet still possible long ones pushing it up—the Copernican Principle emerges: if we want to predict how long something will last, and have no other knowledge about it whatsoever, the best guess we can make is that it will continue just as long as it’s gone on so far.

Page: 172, Location: 2629-2632

Note: Important


The Copernican Principle seems reasonable exactly in those situations where we know nothing at all—such as looking at the Berlin Wall in 1969, when we’re not even sure what timescale is appropriate. And it feels completely wrong in those cases where we do know something about the subject matter. Predicting that a 90-year-old man will live to 180 years seems unreasonable precisely because we go into the problem already knowing a lot about human life spans—and so we can do better.

Page: 173, Location: 2639-2642

Note: Important


The richer the prior information we bring to Bayes’s Rule, the more useful the predictions we can get out of it.

Page: 173, Location: 2642-2643

Note: T.


Human life spans are clearly in the former category. They roughly follow what’s termed a “normal” distribution—also known as the “Gaussian” distribution, after the German mathematician Carl Friedrich Gauss, and informally called the “bell curve” for its characteristic shape.

Page: 173, Location: 2645-2647

Note: Term.


This kind of pattern typifies what are called “power-law distributions.” These are also known as “scale-free distributions” because they characterize quantities that can plausibly range over many scales: a town can have tens, hundreds, thousands, tens of thousands, hundreds of thousands, or millions of residents, so we can’t pin down a single value for how big a “normal” town should be.

Page: 174, Location: 2655-2658

Note: Term.


And for any power-law distribution, Bayes’s Rule indicates that the appropriate prediction strategy is a Multiplicative Rule: multiply the quantity observed so far by some constant factor. For an uninformative prior, that constant factor happens to be 2, hence the Copernican prediction; in other power-law cases, the multiplier will depend on the exact distribution you’re working with.

Page: 175, Location: 2677-2680

Note: Term. Important


When we apply Bayes’s Rule with a normal distribution as a prior, on the other hand, we obtain a very different kind of guidance. Instead of a multiplicative rule, we get an Average Rule: use the distribution’s “natural” average—its single, specific scale—as your guide.

Page: 176, Location: 2687-2689

Note: Term.


Between those two extremes, there’s actually a third category of things in life: those that are neither more nor less likely to end just because they’ve gone on for a while. Sometimes things are simply … invariant.

Page: 177, Location: 2700-2701

Note: Term.


The Danish mathematician Agner Krarup Erlang, who studied such phenomena, formalized the spread of intervals between independent events into the function that now carries his name: the Erlang distribution.

Page: 177, Location: 2701-2703

Note: Term.


The Erlang distribution gives us a third kind of prediction rule, the Additive Rule: always predict that things will go on just a constant amount longer.

Page: 177, Location: 2710-2711

Note: Term.


These three very different patterns of optimal prediction—the Multiplicative, Average, and Additive Rules—all result directly from applying Bayes’s Rule to the power-law, normal, and Erlang distributions, respectively.

Page: 178, Location: 2722-2723

Note: Important
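
A compact sketch of the three rules (the factor of 2 is the uninformative Copernican case; real multipliers and constants depend on the particular distribution, and the example numbers are mine):

```python
def multiplicative_rule(age, factor=2.0):
    # Power-law prior: predict a constant multiple of the age so far.
    return factor * age

def average_rule(age, natural_average):
    # Normal prior: predict the distribution's natural scale,
    # never less than what has already elapsed.
    return max(natural_average, age)

def additive_rule(age, constant):
    # Erlang prior: always predict a constant amount longer.
    return age + constant

print(multiplicative_rule(8))    # Berlin Wall at age 8 -> 16 years total
print(average_rule(90, 76))      # a 90-year-old -> about 90, not 180
print(additive_rule(10, 5))      # 15
```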


Our judgments betray our expectations, and our expectations betray our experience. What we project about the future reveals a lot—about the world we live in, and about our own past.

Page: 182, Location: 2779-2780

Note: Good point


Priors in the Age of Mechanical Reproduction

Page: 184, Location: 2810-2811

Note: T.


He is careful of what he reads, for that is what he will write. He is careful of what he learns, for that is what he will know. —ANNIE DILLARD

The best way to make good predictions, as Bayes’s Rule shows us, is to be accurately informed about the things you’re predicting.

Page: 184, Location: 2813-2816

Note: Important


There’s a curious tension, then, between communicating with others and maintaining accurate priors about the world. When people talk about what interests them—and offer stories they think their listeners will find interesting—it skews the statistics of our experience. That makes it hard to maintain appropriate prior distributions. And the challenge has only increased with the development of the printing press, the nightly news, and social media—innovations that allow our species to spread language mechanically.

Page: 185, Location: 2826-2830

Note: Base of availability bias


If you want to be a good intuitive Bayesian—if you want to naturally make good predictions, without having to think about what kind of prediction rule is appropriate—you need to protect your priors. Counterintuitively, that might mean turning off the news.

Page: 186, Location: 2837-2839

Note: Point


7   Overfitting: When to Think Less

Page: 187, Location: 2858-2860

Note: C.


If the study were repeated with different people, producing slight variations on the same essential pattern, the one- and two-factor models would remain more or less steady—but the nine-factor model would gyrate wildly from one instance of the study to the next. This is what statisticians call overfitting.

Page: 191, Location: 2927-2929

Note: Term.
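
A classic way to see this in code (my illustration, not the study in the text): fit polynomials of increasing degree to noisy points drawn from a simple line. The low-degree fit stays steady across repeats; the high-degree fit chases the noise and its predictions swing with every resampling.

```python
import numpy as np

for seed in (0, 1, 2):               # three "repeats of the study"
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 1, 9)
    y = 2 * x + rng.normal(scale=0.2, size=x.size)  # truly linear, plus noise
    for degree in (1, 8):
        pred = np.polyval(np.polyfit(x, y, degree), 0.5)
        print(f"seed {seed}, degree {degree}: f(0.5) ~ {pred:.3f}")
```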


The Idolatry of Data

Page: 192, Location: 2932-2932

Note: T.


we had copious data, drawn from a perfectly representative sample, completely mistake-free, and representing exactly what we’re trying to evaluate, then using the most complex model available would indeed be the best approach. But if we try to perfectly fit our model to the data when any of these factors fails to hold, we risk overfitting.

Page: 192, Location: 2932-2935

Note: Imp.


How to Combat Overfitting: Penalizing Complexity

Page: 198, Location: 3028-3028

Note: T.


Occam’s razor principle, which suggests that, all things being equal, the simplest possible hypothesis is probably the correct one.

Page: 198, Location: 3034-3035

Note: Term.


Computer scientists refer to this principle—using constraints that penalize models for their complexity—as Regularization.

Page: 199, Location: 3038-3039

Note: Term.


Language forms yet another natural Lasso: complexity is punished by the labor of speaking at greater length and the taxing of our listener’s attention span. Business plans get compressed to an elevator pitch; life advice becomes proverbial wisdom only if it is sufficiently concise and catchy. And anything that needs to be remembered has to pass through the inherent Lasso of memory.

Page: 200, Location: 3057-3059

Note: Imp.


As a species, being constrained by the past makes us less perfectly adjusted to the present we know but helps keep us robust for the future we don’t.

Page: 204, Location: 3115-3116

Note: Imp.


8   Relaxation: Let It Slide

Page: 208, Location: 3185-3186

Note: C.


This is an instance of what’s known to mathematicians and computer scientists as a “constrained optimization” problem: how to find the single best arrangement of a set of variables, given particular rules and a scorekeeping measure.

Page: 211, Location: 3224-3225

Note: Term.


But like the secretary problem, it emerged in the mid-twentieth century, a period unmistakably evoked by its canonical name: “the traveling salesman problem.”

Page: 211, Location: 3227-3228

Note: Term.


One of the simplest forms of relaxation in computer science is known as Constraint Relaxation. In this technique, researchers remove some of the problem’s constraints and set about solving the problem they wish they had. Then, after they’ve made a certain amount of headway, they try to add the constraints back in. That is, they make the problem temporarily easier to handle before bringing it back to reality.

Page: 214, Location: 3271-3274

Note: Term.


Just a Speeding Ticket: Lagrangian Relaxation

Page: 218, Location: 3341-3341

Note: T.


When an optimization problem’s constraints say “Do it, or else!,” Lagrangian Relaxation replies, “Or else what?” Once we can color outside the lines—even just a little bit, and even at a steep cost—problems become tractable that weren’t tractable before.

Page: 219, Location: 3353-3355

Note: Term.


There are many ways to relax a problem, and we’ve seen three of the most important. The first, Constraint Relaxation, simply removes some constraints altogether and makes progress on a looser form of the problem before coming back to reality. The second, Continuous Relaxation, turns discrete or binary choices into continua: when deciding between iced tea and lemonade, first imagine a 50–50 “Arnold Palmer” blend and then round it up or down. The third, Lagrangian Relaxation, turns impossibilities into mere penalties, teaching the art of bending the rules (or breaking them and accepting the consequences). A rock band deciding which songs to cram into a limited set, for instance, is up against what computer scientists call the “knapsack problem”—a puzzle that asks one to decide which of a set of items of different bulk and importance to pack into a confined volume. In its strict formulation the knapsack problem is famously intractable, but that needn’t discourage our relaxed rock stars. As demonstrated in several celebrated examples, sometimes it’s better to simply play a bit past the city curfew and incur the related fines than to limit the show to the available slot. In fact, even when you don’t commit the infraction, simply imagining it can be illuminating.

Page: 222, Location: 3396-3404

Note: Relaxation types


F. Scott Fitzgerald once wrote that “the test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function.”

Page: 227, Location: 3473-3474

Note: Quote.


Ulam developed the idea further with John von Neumann, and worked with Nicholas Metropolis, another of the physicists from the Manhattan Project, on implementing the method on the Los Alamos computer. Metropolis named this approach—replacing exhaustive probability calculations with sample simulations—the Monte Carlo Method, after the Monte Carlo casino in Monaco, a place equally dependent on the vagaries of chance.

Page: 228, Location: 3490-3493

Note: Term.


For millennia, the study of prime numbers was believed to be, as G. H. Hardy put it, “one of the most obviously useless branches” of mathematics.

Page: 230, Location: 3514-3515

Note: Hardy


what if we only needed to be mostly sure this URL was new to us? That’s where the Bloom filter comes in. Named for its inventor, Burton H. Bloom, a Bloom filter works much like the Rabin-Miller primality test: the URL is entered into a set of equations that essentially check for “witnesses” to its novelty. (Rather than proclaim “n is not prime,” these equations say “I have not seen n before.”) If you’re willing to tolerate an error rate of just 1% or 2%, storing your findings in a probabilistic data structure like a Bloom filter will save you significant amounts of both time and space. And the usefulness of such filters is not confined to search engines: Bloom filters have shipped with a number of recent web browsers to check URLs against a list of known malicious websites, and they are also an important part of cryptocurrencies like Bitcoin.

Page: 239, Location: 3654-3661

Note: Term. Bloom filter
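
A toy Bloom filter sketch (the hashing scheme is my own choice): k hash functions set bits in an m-bit array, and a lookup answers either "definitely new" or "probably seen before."

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)   # m one-bit "witness" slots

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False: definitely never added. True: probably added (a small
        # false-positive rate, tunable via m and k).
        return all(self.bits[pos] for pos in self._positions(item))

seen = BloomFilter()
seen.add("http://example.com/")
print(seen.might_contain("http://example.com/"))   # True
print(seen.might_contain("http://example.org/"))   # almost surely False
```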


This is an algorithm known as Hill Climbing—since the search through a space of solutions, some better and some worse, is commonly thought of in terms of a landscape with hills and valleys, where your goal is to reach the highest peak.

Page: 241, Location: 3685-3687

Note: Term.


for instance, the physics of annealing, the way that materials change state as they are heated and cooled.

Page: 243, Location: 3724-3725

Note: Term.


Over the next few decades, that paper would be cited a whopping thirty-two thousand times. To this day, simulated annealing remains one of the most promising approaches to optimization problems known to the field.

Page: 245, Location: 3756-3757

Note: Term.
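
A minimal simulated-annealing sketch (the objective function and cooling schedule are stand-ins of mine): hill climbing, but with downhill moves accepted at a probability that shrinks as the temperature cools.

```python
import math
import random

def simulated_annealing(f, x, steps=10_000, temp=1.0, cooling=0.999):
    """Maximize f, escaping local peaks via occasional random setbacks."""
    best_x = x
    for _ in range(steps):
        candidate = x + random.uniform(-1, 1)
        delta = f(candidate) - f(x)
        # Accept improvements always; accept setbacks with probability
        # exp(delta / temp), which falls toward zero as temp cools.
        if delta > 0 or random.random() < math.exp(delta / temp):
            x = candidate
            if f(x) > f(best_x):
                best_x = x
        temp *= cooling
    return best_x

bumpy = lambda x: math.sin(5 * x) - 0.1 * x * x    # many local maxima
print(simulated_annealing(bumpy, x=4.0))           # often near x = 0.31, the global peak
```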


it’s a common enough phenomenon that a word was invented to capture it: in 1754, Horace Walpole coined the term “serendipity,” based on the fairy tale adventures of The Three Princes of Serendip (Serendip being the archaic name of Sri Lanka), who “were always making discoveries, by accidents and sagacity, of things they were not in quest of.”

Page: 247, Location: 3780-3782

Note: Word.


Being randomly jittered, thrown out of the frame and focused on a larger scale, provides a way to leave what might be locally good and get back to the pursuit of what might be globally optimal.

Page: 249, Location: 3816-3817

Note: Impo.


Protocol is how we get on the same page; in fact, the word is rooted in the Greek protokollon, “first glue,” which referred to the outer page attached to a book or manuscript.

Page: 255, Location: 3901-3902

Note: Term.


Exponential Backoff: The Algorithm of Forgiveness

Page: 262, Location: 4017-4018

Note: T.


Since the maximum delay length (2, 4, 8, 16…) forms an exponential progression, it’s become known as Exponential Backoff.

Page: 264, Location: 4047-4048

Note: Term.
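
A sketch of that pattern as retry logic (the `send` callable and the slot length are placeholders of mine):

```python
import random
import time

def send_with_backoff(send, max_tries=8, slot=0.01):
    """Retry with Exponential Backoff: the delay ceiling doubles per failure."""
    for attempt in range(max_tries):
        if send():
            return True
        # Wait a random delay of up to 2, 4, 8, 16, ... slots before retrying.
        time.sleep(random.uniform(0, 2 ** (attempt + 1)) * slot)
    return False

print(send_with_backoff(lambda: random.random() < 0.3))  # a flaky "send"
```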


Flow Control and Congestion Avoidance

Page: 267, Location: 4087-4088

Note: T.


Packet switching is radically different. The phone system gets full; the mail system gets slow. There’s nothing in the network to explicitly tell a sender how many other senders there are, or how congested the network is at any given moment, and the amount of congestion is constantly changing. Therefore, the sender and receiver must not only communicate but metacommunicate: they need to figure out how fast the data should be sent. Somehow, assorted packet flows—without explicit management or coordination—must both get out of each other’s way and quickly take advantage of any newly available space.

Page: 268, Location: 4101-4105

Note: Imp.


Backchannels: Flow Control in Linguistics

Page: 272, Location: 4160-4160

Note: T.


When a networking buffer fills up, what typically happens is called Tail Drop: an unceremonious way of saying that every packet arriving after that point is simply rejected, and effectively deleted. (Turning new customers away from the crêpe stand once the line gets too long would be a version of Tail Drop in a human context.) Given the postal metaphor for packet switching,

Page: 276, Location: 4229-4231

Note: Term.


The most prevalent critique of modern communications is that we are “always connected.” But the problem isn’t that we’re always connected; we’re not. The problem is that we’re always buffered. The difference is enormous.

Page: 279, Location: 4264-4266

Note: Imp.


The feeling that one needs to look at everything on the Internet, or read all possible books, or see all possible shows, is bufferbloat.

Page: 279, Location: 4266-4267

Note: Yolo


There’s also a proposal for a new backchannel for TCP, the first such modification in many years: Explicit Congestion Notification, or ECN. Fully extricating the Internet from bufferbloat will draw on all of these changes and require the patience of many years. “This is a long-term swamp,” says Gettys.

Page: 280, Location: 4289-4291

Note: Term.


11  Game Theory: The Minds of Others

Page: 282, Location: 4314-4315

Note: C.


Computer science illustrates the fundamental limitations of this kind of reasoning with what’s called the “halting problem.”

Page: 284, Location: 4349-4350

Note: Term.


Dominant Strategies, for Better or Worse

Page: 290, Location: 4433-4433

Note: T.


In fact, this makes defection not merely the equilibrium strategy but what’s known as a dominant strategy. A dominant strategy avoids recursion altogether, by being the best response to all of your opponent’s possible strategies—so you don’t even need to trouble yourself getting inside their head at all. A dominant strategy is a powerful thing.

Page: 291, Location: 4448-4451

Note: Term.


But from a congestion standpoint, the fact that anarchy is only 4/3 as congested as perfect coordination means that perfectly coordinated commutes will only be 3/4 as congested as they are now. It’s a bit like the famous line by James Branch Cabell: “The optimist proclaims that we live in the best of all possible worlds; and the pessimist fears this is true.” Congestion will always be a problem solvable more by planners and by overall demand than by the decisions of individual drivers, human or computer, selfish or cooperative.

Page: 292, Location: 4473-4477

Note: Imp.


The Tragedy of the Commons

Page: 293, Location: 4482-4482

Note: T.


In theory, all the villagers should graze only as many animals as would leave some grass for everyone. In practice, though, the benefits of grazing a little bit more than that accrue directly to you, while the harms seem too small to be of consequence. Yet if everyone follows this logic of using just slightly more of the commons than they should, a dreadful equilibrium results: a completely devastated lawn, and no grass for anyone’s livestock thereafter. Hardin called this the “tragedy of the commons,”

Page: 293, Location: 4485-4489

Note: Tragedy of commons


Mechanism Design: Change the Game

Page: 295, Location: 4523-4523

Note: T.


If the forest could only somehow agree to a kind of truce, the ecosystem could enjoy the photosynthetic bounty without the wood-making arms race wasting it all. But as we’ve seen, good outcomes in these scenarios tend only to arise in the context of an authority outside the game—someone changing the payoffs from the top down. It would seem as though in nature, then, there is simply no way of establishing good equilibria between individuals.

Page: 300, Location: 4591-4594

Note: Imp.


Just as with the tragedy of the commons, this failure is not necessarily the players’ fault. An enormously influential paper by the economists Sushil Bikhchandani, David Hirshleifer, and Ivo Welch has demonstrated that under the right circumstances, a group of agents who are all behaving perfectly rationally and perfectly appropriately can nonetheless fall prey to what is effectively infinite misinformation. This has come to be known as an “information cascade.”

Page: 307, Location: 4702-4705

Note: Term.


Named for Nobel Prize–winning economist William Vickrey, the Vickrey auction, just like the first-price auction, is a “sealed bid” auction process. That is, every participant simply writes down a single number in secret, and the highest bidder wins. However, in a Vickrey auction, the winner ends up paying not the amount of their own bid, but that of the second-place bidder. That is to say, if you bid $25 and I bid $10, you win the item at my price: you only have to pay $10.

Page: 311, Location: 4755-4758

Note: Term.
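
The mechanism in miniature (the bidder names and amounts echo the example above):

```python
def vickrey_auction(bids):
    """Sealed-bid second-price auction: the highest bidder wins,
    but pays the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

print(vickrey_auction({"you": 25, "me": 10}))  # ('you', 10)
```

Because the price you pay is set by the other bidders rather than by your own bid, shading your bid gains you nothing; bidding your true valuation is the safe strategy.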


In fact, a game-theoretic principle called “revenue equivalence” establishes that over time, the average expected sale price in a first-price auction will converge to precisely the same as in a Vickrey auction. Thus the Vickrey equilibrium involves the same bidder winning the item for the same price—without any strategizing by any of the bidders whatsoever. As Tim Roughgarden tells his Stanford students, the Vickrey auction is “awesome.”

Page: 312, Location: 4771-4774

Note: Term.


In fact, the lesson here goes far beyond auctions. In a landmark finding called the “revelation principle,” Nobel laureate Roger Myerson proved that any game that requires strategically masking the truth can be transformed into a game that requires nothing but simple honesty.

Page: 312, Location: 4779-4781

Note: Term.


Introduction

Page: 5, Location: 76-76


How are you to know that an apartment is indeed the best unless you have a baseline to judge it by? And how are you to establish that baseline unless you look at (and lose) a number of apartments? The more information you gather, the better you’ll know the right opportunity when you see it—but the more likely you are to have already passed it by.

Page: 6, Location: 86-89


Thirty-seven percent. If you want the best odds of getting the best apartment, spend 37% of your apartment hunt (eleven days, if you’ve given yourself a month for the search) noncommittally exploring options. Leave the checkbook at home; you’re just calibrating. But after that point, be prepared to immediately commit—deposit and all—to the very first place you see that beats whatever you’ve already seen. This is not merely an intuitively satisfying compromise between looking and leaping. It is the provably optimal solution.

Page: 7, Location: 94-98


But an algorithm is just a finite sequence of steps used to solve a problem, and algorithms are much broader—and older by far—than the computer. Long before algorithms were ever used by machines, they were used by people.

Page: 8, Location: 122-123


The word “algorithm” comes from the name of Persian mathematician al-Khwārizmī, author of a ninth-century book of techniques for doing mathematics by hand. (His book was called al-Jabr wa’l-Muqābala—and the “al-jabr” of the title in turn provides the source of our word “algebra.”)

Page: 9, Location: 123-126


Optimal stopping tells us when to look and when to leap. The explore/exploit tradeoff tells us how to find the balance between trying new things and enjoying our favorites. Sorting theory tells us how (and whether) to arrange our offices. Caching theory tells us how to fill our closets. Scheduling theory tells us how to fill our time.

Page: 9, Location: 133-136


As Carl Sagan put it, “Science is a way of thinking much more than it is a body of knowledge.”

Page: 9, Location: 137-137


Alan Turing defined the very notion of computation by an analogy to a human mathematician who carefully works through the steps of a lengthy calculation, yielding an unmistakably right answer.

Page: 10, Location: 147-149


Though all Christians start a wedding invitation by solemnly declaring their marriage is due to special Divine arrangement, I, as a philosopher, would like to talk in greater detail about this … —JOHANNES KEPLER

Page: 13, Location: 197-199


It’s such a common phenomenon that college guidance counselors even have a slang term for it: the “turkey drop.” High-school sweethearts come home for Thanksgiving of their freshman year of college and, four days later, return to campus single.

Page: 14, Location: 202-204


The nature of serial monogamy, writ large, is that its practitioners are confronted with a fundamental, unavoidable problem. When have you met enough people to know who your best match is? And what if acquiring the data costs you that very match? It seems the ultimate Catch-22 of the heart.

Page: 14, Location: 208-210


In any optimal stopping problem, the crucial dilemma is not which option to pick, but how many options to even consider. These problems turn out to have implications not only for lovers and renters, but also for drivers, homeowners, burglars, and beyond.

Page: 14, Location: 212-215


Whence 37%? In your search for a secretary, there are two ways you can fail: stopping early and stopping late. When you stop too early, you leave the best applicant undiscovered. When you stop too late, you hold out for a better applicant who doesn’t exist. The optimal strategy will clearly require finding the right balance between the two, walking the tightrope between looking too much and not enough.

Page: 17, Location: 249-252


As a result, best-yet applicants will become steadily more impressive as the search continues (by definition, again, they’re better than all those who came before)—but they will also become more and more infrequent.

Page: 17, Location: 257-258


the optimal solution takes the form of what we’ll call the Look-Then-Leap Rule: You set a predetermined amount of time for “looking”—that is, exploring your options, gathering data—in which you categorically don’t choose anyone, no matter how impressive. After that point, you enter the “leap” phase, prepared to instantly commit to anyone who outshines the best applicant you saw in the look phase.

Page: 18, Location: 264-267


As the applicant pool grows, the exact place to draw the line between looking and leaping settles to 37% of the pool, yielding the 37% Rule: look at the first 37% of the applicants, choosing none, then be ready to leap for anyone better than all those you’ve seen so far.

Page: 19, Location: 280-283


As it turns out, following this optimal strategy ultimately gives us a 37% chance of hiring the best applicant; it’s one of the problem’s curious mathematical symmetries that the strategy itself and its chance of success work out to the very same number.

Page: 19, Location: 285-286
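
That symmetry is easy to check by simulation (a quick script of mine, not from the book):

```python
import random

def look_then_leap(n=100, look_fraction=0.37, trials=100_000):
    """Estimate how often the 37% Rule lands the single best applicant."""
    cutoff, wins = int(n * look_fraction), 0
    for _ in range(trials):
        applicants = random.sample(range(n), n)   # quality ranks, shuffled
        benchmark = max(applicants[:cutoff])      # best seen while "looking"
        chosen = next((a for a in applicants[cutoff:] if a > benchmark),
                      applicants[-1])             # else forced to take the last
        wins += (chosen == n - 1)
    return wins / trials

print(look_then_leap())  # approximately 0.37
```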


It’s true that you’re unlikely to find the needle the majority of the time, but optimal stopping is your best defense against the haystack, no matter how large.

Page: 20, Location: 295-297


I married the first man I ever kissed. When I tell this to my children they just about throw up. —BARBARA BUSH

Page: 20, Location: 299-301


Both Kepler and Trick—in opposite ways—experienced firsthand some of the ways that the secretary problem oversimplifies the search for love.

Page: 21, Location: 320-321


In the classical secretary problem, applicants always accept the position, preventing the rejection experienced by Trick. And they cannot be “recalled” once passed over, contrary to the strategy followed by Kepler.

Page: 21, Location: 321-323


For example, assume an immediate proposal is a sure thing but belated proposals are rejected half the time. Then the math says you should keep looking noncommittally until you’ve seen 61% of applicants, and then only leap if someone in the remaining 39% of the pool proves to be the best yet. If you’re still single after considering all the possibilities—as Kepler was—then go back to the best one that got away. The symmetry between strategy and outcome holds in this case once again, with your chances of ending up with the best applicant under this second-chances-allowed scenario also being 61%.

Page: 22, Location: 335-339


It’s this fact that gives rise to the unavoidable “look” phase, in which we risk passing up a superb early applicant while we calibrate our expectations and standards. Mathematicians refer to this genre of optimal stopping problems as “no-information games.”

Page: 23, Location: 349-350


We can instead use the Threshold Rule, where we immediately accept an applicant if she is above a certain percentile. We don’t need to look at an initial group of candidates to set this threshold—but we do, however, need to be keenly aware of how much looking remains available.

Page: 24, Location: 363-365


The math shows that when there are a lot of applicants left in the pool, you should pass up even a very good applicant in the hopes of finding someone still better than that—but as your options dwindle, you should be prepared to hire anyone who’s simply better than average. It’s a familiar, if not exactly inspiring, message: in the face of slim pickings, lower your standards. It also makes clear the converse: with more fish in the sea, raise them. In both cases, crucially, the math tells you exactly by how much.

Page: 24, Location: 366-369


being more choosy the more applicants are left.

Page: 25, Location: 374-374


The chance of ending up with the single best applicant in this full-information version of the secretary problem comes to 58%—still far from a guarantee, but considerably better than the 37% success rate offered by the 37% Rule in the no-information game. If you have all the facts, you can succeed more often than not, even as the applicant pool grows arbitrarily large.

Page: 25, Location: 376-378


The full-information game thus offers an unexpected and somewhat bizarre takeaway. Gold digging is more likely to succeed than a quest for love. If you’re evaluating your partners based on any kind of objective criterion—say, their income percentile—then you’ve got a lot more information at your disposal than if you’re after a nebulous emotional response (“love”) that might require both experience and comparison to calibrate.

Page: 25, Location: 380-384


But if neither concern leads us to believe that our backs are against the wall, then we can simply focus on a cost-benefit analysis of the waiting game.

Page: 27, Location: 405-406


First, if the cost of waiting is trivial, we’re able to be almost infinitely choosy.

Page: 27, Location: 414-415


Finally, if waiting costs half or more of our expected range of offers—in this case, $50,000—then there’s no advantage whatsoever to holding out; we’ll do best by taking the very first offer that comes along and calling it done. Beggars can’t be choosers.

Page: 28, Location: 417-419


The critical thing to note in this problem is that our threshold depends only on the cost of search. Since the chances of the next offer being a good one—and the cost of finding out—never change, our stopping price has no reason to ever get lower as the search goes on, regardless of our luck. We set it once, before we even begin, and then we quite simply hold fast.

Page: 28, Location: 421-424


But in house selling and job hunting, even if it’s possible to reconsider an earlier offer, and even if that offer is guaranteed to still be on the table, you should nonetheless never do so. If it wasn’t above your threshold then, it won’t be above your threshold now. What you’ve paid to keep searching is a sunk cost. Don’t compromise, don’t second-guess. And don’t look back.

Page: 29, Location: 433-436


The key impact that occupancy rate has on parking strategy becomes clear once we recognize that parking is an optimal stopping problem. As you drive along the street, every time you see the occasional empty spot you have to make a decision: should you take this spot, or go a little closer to your destination and try your luck?

Page: 31, Location: 462-464


When to Quit

Page: 32, Location: 483-483


the number of robberies you should carry out is roughly equal to the chance you get away, divided by the chance you get caught. If you’re a skilled burglar and have a 90% chance of pulling off each robbery (and a 10% chance of losing it all), then retire after 90/10 = 9 robberies. A ham-fisted amateur with a 50/50 chance of success? The first time you have nothing to lose, but don’t push your luck more than once.

Page: 33, Location: 503-506
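
The rule can be sanity-checked directly: with per-robbery success probability p, the expected haul from planning n robberies is n·pⁿ (you keep the loot only if you are never caught), and that product peaks at roughly n = p/(1 − p). A quick check of mine:

```python
def expected_take(p, n, reward=1.0):
    # Keep n rewards only if all n attempts succeed; one arrest loses all.
    return n * reward * p ** n

p = 0.9
for n in (5, 9, 15):
    print(n, round(expected_take(p, n), 3))
# n = 9 (= 0.9 / 0.1) comes out on top, matching the rule above.
```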


Always Be Stopping

Page: 35, Location: 524-525


So the irresistible question is whether—by evolution or education or intuition—we actually do follow the best strategies. At first glance, the answer is no. About a dozen studies have produced the same result: people tend to stop early, leaving better applicants unseen.

Page: 35, Location: 531-533


But in practice, when the clock—or the ticker—is ticking, few aspects of decision-making (or of thinking more generally) are as important as this one: when to stop.

Page: 37, Location: 563-564


Every day we are constantly forced to make decisions between options that differ in a very specific dimension: do we try new things or stick with our favorite ones? We intuitively understand that life is a balance between novelty and tradition, between the latest and the greatest, between taking risks and savoring what we know and love.

Page: 40, Location: 602-604


But the reality is not so simple. Remembering that every “best” song and restaurant among your favorites began humbly as something merely “new” to you is a reminder that there may be yet-unknown bests still out there—and thus that the new is indeed worthy of at least some of our attention.

Page: 40, Location: 608-610


Computer scientists have been working on finding this balance for more than fifty years. They even have a name for it: the explore/exploit tradeoff.

Page: 40, Location: 613-614


Explore/Exploit

Page: 41, Location: 615-615


Simply put, exploration is gathering information, and exploitation is using the information you have to get a known good result.

Page: 41, Location: 616-618


What’s more, exploration can be a curse. Part of what’s nice about music, for instance, is that there are constantly new things to listen to. Or, if you’re a music journalist, part of what’s terrible about music is that there are constantly new things to listen to. Being a music journalist means turning the exploration dial all the way to 11, where it’s nothing but new things all the time.

Page: 41, Location: 622-625


Journalists are martyrs, exploring so that others may exploit.

Page: 42, Location: 630-631


In computer science, the tension between exploration and exploitation takes its most concrete form in a scenario called the “multi-armed bandit problem.” The odd name comes from the colloquial term for a casino slot machine, the “one-armed bandit.”

Page: 42, Location: 631-633


People tend to treat decisions in isolation, to focus on finding each time the outcome with the highest expected value. But decisions are almost never isolated, and expected value isn’t the end of the story. If you’re thinking not just about the next decision, but about all the decisions you are going to make about the same options in the future, the explore/exploit tradeoff is crucial to the process.

Page: 43, Location: 647-650


Seize the Interval

Page: 43, Location: 653-653


A sobering property of trying new things is that the value of exploration, of finding a new favorite, can only go down over time, as the remaining opportunities to savor it dwindle. Discovering an enchanting café on your last night in town doesn’t give you the opportunity to return.

Page: 44, Location: 664-666


Win-Stay

Page: 45, Location: 687-687


Robbins specifically considered the case where there are exactly two slot machines, and proposed a solution called the Win-Stay, Lose-Shift algorithm: choose an arm at random, and keep pulling it as long as it keeps paying off. If the arm doesn’t pay off after a particular pull, then switch to the other one. Although this simple strategy is far from a complete solution, Robbins proved in 1952 that it performs reliably better than chance.

Page: 46, Location: 692-695
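
A sketch of Robbins's scheme for two machines (the payoff probabilities are made up by me):

```python
import random

def win_stay_lose_shift(payoffs=(0.6, 0.4), pulls=100_000):
    """Keep pulling an arm while it pays; switch arms after any loss."""
    arm = random.randrange(2)
    total = 0
    for _ in range(pulls):
        win = random.random() < payoffs[arm]
        total += win
        if not win:
            arm = 1 - arm   # shift to the other machine
    return total / pulls

print(win_stay_lose_shift())  # about 0.52, reliably better than chance (0.50)
```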


Good options shouldn’t be penalized too strongly for being imperfect.

Page: 46, Location: 700-701


For these reasons, the multi-armed bandit problem effectively stayed unsolved. In Whittle’s words, “it quickly became a classic, and a byword for intransigence.”

Page: 47, Location: 710-712


The Gittins Index

Page: 47, Location: 712-712


Economists refer to this idea, of valuing the present more highly than the future, as “discounting.”

Page: 48, Location: 726-726


The Gittins index, then, provides a formal, rigorous justification for preferring the unknown, provided we have some opportunity to exploit the results of what we learn from exploring.

Page: 51, Location: 777-778


The old adage tells us that “the grass is always greener on the other side of the fence,” but the math tells us why: the unknown has a chance of being better, even if we actually expect it to be no different, or if it’s just as likely to be worse. The untested rookie is worth more (early in the season, anyway) than the veteran of seemingly equal ability, precisely because we know less about him. Exploration in itself has value, since trying new things increases our chances of finding the best. So taking the future into account, rather than focusing just on the present, drives us toward novelty.

Page: 51, Location: 778-782


Regret and Optimism

Page: 52, Location: 795-795


For myself I am an optimist. It does not seem to be much use being anything else. —WINSTON CHURCHILL

Page: 52, Location: 797-798


In the memorable words of management theorist Chester Barnard, “To try and fail is at least to learn; to fail to try is to suffer the inestimable loss of what might have been.”

Page: 53, Location: 802-803


Regret is the result of comparing what we actually did with what would have been best in hindsight.

Page: 53, Location: 813-813


In general we can’t realistically expect someday to never have any more regrets. But if we’re following a regret-minimizing algorithm, every year we can expect to have fewer new regrets than we did the year before.

Page: 54, Location: 825-827


Of the ones they’ve discovered, the most popular are known as Upper Confidence Bound algorithms.

Page: 54, Location: 828-829


In a multi-armed bandit problem, an Upper Confidence Bound algorithm says, quite simply, to pick the option for which the top of the confidence interval is highest.

Page: 55, Location: 833-834
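
One well-known member of that family is UCB1, whose bound is the arm's observed mean plus an uncertainty bonus that shrinks as the arm gets pulled. A sketch:

```python
import math

def ucb1_choice(counts, rewards, t):
    """Pick the arm with the highest upper confidence bound (UCB1).

    counts[i]: pulls of arm i so far; rewards[i]: its total payoff;
    t: total pulls so far (t >= 1). Untried arms score infinity,
    so every arm gets the benefit of the doubt at least once.
    """
    def bound(i):
        if counts[i] == 0:
            return math.inf
        mean = rewards[i] / counts[i]
        return mean + math.sqrt(2 * math.log(t) / counts[i])
    return max(range(len(counts)), key=bound)

# The barely-tried arm wins despite a lower observed mean: optimism in action.
print(ucb1_choice(counts=[10, 2], rewards=[6.0, 1.0], t=12))  # 1
```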


Bandits Online

Page: 56, Location: 852-852


Clinical Trials on Trial

Page: 59, Location: 900-900


Clinical testing of new drugs and treatments, the report notes, often requires risking harm to some patients, even if steps are taken to minimize that risk.

Page: 60, Location: 914-915


The widespread difficulty with accepting results from adaptive clinical trials might seem incomprehensible. But consider that part of what the advent of statistics did for medicine, at the start of the twentieth century, was to transform it from a field in which doctors had to persuade each other in ad hoc ways about every new treatment into one where they had clear guidelines about what sorts of evidence were and were not persuasive. Changes to accepted standard statistical practice have the potential to upset this balance, at least temporarily.

Page: 64, Location: 970-973


The Restless World

Page: 64, Location: 979-979


In that case, people chose to observe 505 times, on average, placing bets the other 495 times. But the math says they should have started to bet after just 38 observations—leaving 962 chances to cash in.

Page: 65, Location: 991-992


One of the curious things about human beings, which any developmental psychologist aspires to understand and explain, is that we take years to become competent and autonomous. Caribou and gazelles must be prepared to run from predators the day they’re born, but humans take more than a year to make their first steps.

Page: 68, Location: 1031-1033


What we take to be the caprice of children may be wiser than we know.

Page: 69, Location: 1051-1051


I had reached a juncture in my reading life that is familiar to those who have been there: in the allotted time left to me on earth, should I read more and more new books, or should I cease with that vain consumption—vain because it is endless—and begin to reread those books that had given me the intensest pleasure in my past. —LYDIA DAVIS

Page: 69, Location: 1052-1055


In an experiment testing this hypothesis, Carstensen and her collaborator Barbara Fredrickson asked people to choose who they’d rather spend thirty minutes with: an immediate family member, the author of a book they’d recently read, or somebody they had met recently who seemed to share their interests. Older people preferred the family member; young people were just as excited to meet the author or make a new friend. But in a critical twist, if the young people were asked to imagine that they were about to move across the country, they preferred the family member too.

Page: 70, Location: 1067-1071


The explore/exploit tradeoff also tells us how to think about advice from our elders. When your grandfather tells you which restaurants are good, you should listen—these are pearls gleaned from decades of searching. But when he only goes to the same restaurant at 5:00 p.m. every day, you should feel free to explore other options, even though they’ll likely be worse.

Page: 71, Location: 1082-1084


Shifting the bulk of one’s attention to one’s favorite things should increase quality of life. And it seems like it does: Carstensen has found that older people are generally more satisfied with their social networks, and often report levels of emotional well-being that are higher than those of younger adults.

Page: 71, Location: 1088-1090


The basic summary of this section: git while the Gittins’s good.

Page: 72, Location: 1096-1096


3   Sorting Making Order

Page: 72, Location: 1101-1102


The Ecstasy of Sorting

Page: 73, Location: 1119-1119


The first code ever written for a “stored program” computer was a program for efficient sorting.

Page: 74, Location: 1132-1132


The truncated top of an immense, sorted list is in many ways the universal user interface.

Page: 75, Location: 1146-1147


The Agony of Sorting

Page: 76, Location: 1155-1155


“To lower costs per unit of output, people usually increase the size of their operations,” wrote J. C. Hosken in 1955, in the first scientific article published on sorting. This is the economy of scale familiar to any business student.

Page: 76, Location: 1155-1156


But with sorting, size is a recipe for disaster: perversely, as a sort grows larger, “the unit cost of sorting, instead of falling, rises.” Sorting involves steep diseconomies of scale, violating our normal intuitions about the virtues of doing things in bulk.

Page: 76, Location: 1156-1159


This is the first and most fundamental insight of sorting theory. Scale hurts.

Page: 76, Location: 1161-1162


All we need are about 80,658,175,170,943,878,571,660,636,856,403,766,975,289,505,440,883,277,824,000,000,000,000 attempts at the title. This number, a bit over 80 unvigintillion, is 52 factorial, or “52!” in mathematical notation—the number of ways that a deck of 52 cards can possibly be ordered.

Page: 77, Location: 1176-1178


Moreover, a computer scientist would want to know the worst sort time. Worst-case analysis lets us make hard guarantees: that a critical process will finish in time, that deadlines won’t be blown.

Page: 78, Location: 1186-1188


This is the rosiest class of problems there is: called “Big-O of one,” written O(1), it is also known as “constant time.”

Page: 78, Location: 1195-1196


Now, the time required to pass the roast around the table will be “Big-O of n,” written O(n), also known as “linear time”—with twice the guests, you’ll wait twice as long for the dish to come around.

Page: 79, Location: 1198-1201


This turns out to be “Big-O of n-squared,” written O(n²) and also known as “quadratic time.” Here again, we only care about the basic contours of the relationship between n and time. There’s no O(2n²) for two hugs apiece, or O(n² + n) for hugs plus passing the food around, or O(n² + 1) for hugs plus home cleaning. It’s all quadratic time, so O(n²) covers everything.

Page: 79, Location: 1210-1217
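
A toy rendering of the dinner-party analogy in Python (my own, not the book's), just counting the "work" each ritual requires as the guest list grows:

```python
from itertools import combinations

def toast(guests):
    # O(1): a single toast covers the whole table, however large.
    return 1

def pass_the_roast(guests):
    # O(n): the dish visits each guest once.
    return len(guests)

def exchange_hugs(guests):
    # O(n^2): one hug per pair of guests, n*(n-1)/2 in all.
    return sum(1 for _ in combinations(guests, 2))

for n in (5, 10, 20):
    guests = list(range(n))
    print(n, toast(guests), pass_the_roast(guests), exchange_hugs(guests))
```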


The Squares: Bubble Sort and Insertion Sort

Page: 80, Location: 1224-1224


Obama was right to eschew Bubble Sort, an algorithm which has become something of a punching bag for computer science students: it’s simple, it’s intuitive, and it’s extremely inefficient.

Page: 81, Location: 1228-1229


Computer scientists call this, appropriately enough, Insertion Sort. The good news is that it’s arguably even more intuitive than Bubble Sort and doesn’t have quite the bad reputation.

Page: 82, Location: 1245-1247
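
Textbook-style sketches of the two quadratic sorts, for reference (my code, not the book's):

```python
def bubble_sort(items):
    """Sweep the list repeatedly, swapping adjacent out-of-order pairs,
    until a full sweep makes no swaps."""
    items = list(items)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items

def insertion_sort(items):
    """Grow a sorted prefix, sliding each new item back to its place."""
    result = []
    for item in items:
        i = len(result)
        while i > 0 and result[i - 1] > item:
            i -= 1
        result.insert(i, item)
    return result
```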


Breaking the Quadratic Barrier: Divide and Conquer

At this point, having seen two entirely sensible approaches fall into unsustainable quadratic time, it’s natural to wonder whether faster sorting is even possible.

Page: 82, Location: 1250-1251


This approach is known today as Mergesort, one of the legendary algorithms in computer science. As a 1997 paper put it, “Mergesort is as important in the history of sorting as sorting in the history of computing.”

Page: 84, Location: 1282-1284


The power of Mergesort comes from the fact that it indeed ends up with a complexity between linear and quadratic time—specifically, O(n log n), known as “linearithmic” time.

Page: 84, Location: 1284-1286


No wonder it’s the method of choice for large-scale industrial sorting problems.

Page: 85, Location: 1292-1292
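
A compact recursive Mergesort (again a standard textbook rendering, not the book's code): split the list, sort each half, then merge the two sorted halves in one linear pass.

```python
def merge_sort(items):
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left, right = merge_sort(items[:mid]), merge_sort(items[mid:])
    # Merge the two sorted halves in a single linear pass.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```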


Computer science, as undergraduates are taught, is all about tradeoffs.

Page: 88, Location: 1346-1346


And one of the most central tradeoffs is between sorting and searching. The basic principle is this: the effort expended on sorting materials is just a preemptive strike against the effort it’ll take to search through them later. What the precise balance should be depends on the exact parameters of the situation, but thinking about sorting as valuable only to support future search tells us something surprising: Err on the side of messiness. Sorting something that you will never search is a complete waste; searching something you never sorted is merely inefficient.

Page: 88, Location: 1347-1352


The verdict is clear: ordering your bookshelf will take more time and energy than scanning through it ever will.

Page: 89, Location: 1364-1365


Computer science shows that the hazards of mess and the hazards of order are quantifiable and that their costs can be measured in the same currency: time. Leaving something unsorted might be thought of as an act of procrastination—passing the buck to one’s future self, who’ll have to pay off with interest what we chose not to pay up front. But the whole story is subtler than that. Sometimes mess is more than just the easy choice. It’s the optimal choice.

Page: 90, Location: 1374-1378


Charles Lutwidge Dodgson

Page: 90, Location: 1380-1380


“The present method of assigning prizes is, except in the case of the first prize, entirely unmeaning.” Said plainly, the silver medal is a lie. “As a mathematical fact,” he continued, “the chance that the 2nd best Player will get the prize he deserves is only 16/31sts; while the chance that the best 4 shall get their proper prizes is so small, that the odds are 12 to 1 against its happening!”

Page: 92, Location: 1396-1399
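
Where does 16/31 come from? The second-best player wins silver only if the draw keeps them away from the best player until the final, which means landing in the opposite half of the bracket: 16 of the 31 remaining slots. A quick simulation (my own; it assumes a random 32-player draw in which the stronger player always wins) agrees:

```python
import random

def single_elimination(players):
    """Run one bracket; the stronger player always wins each match."""
    while len(players) > 2:
        players = [max(players[i], players[i + 1]) for i in range(0, len(players), 2)]
    return max(players), min(players)   # (champion, runner-up)

trials, silver_to_second_best = 100_000, 0
for _ in range(trials):
    bracket = list(range(32))           # skill 0..31; 31 is best, 30 second best
    random.shuffle(bracket)
    _, runner_up = single_elimination(bracket)
    silver_to_second_best += (runner_up == 30)

print(silver_to_second_best / trials, "vs", 16 / 31)   # both about 0.516
```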


One of the most familiar algorithms in sports is the Round-Robin format, where each of n teams eventually plays every one of the other n − 1 teams. While arguably the most comprehensive, it’s also one of the most laborious. Having every team grapple with every other team is like having guests exchange hugs at our dinner party: the dreaded O(n²), quadratic time.

Page: 92, Location: 1405-1409


Ladder tournaments—popular in sports like badminton, squash, and racquetball—put players in a linear ranking, with each player allowed to issue a direct challenge to the player immediately above them, exchanging places if they prevail. Ladders are the Bubble Sorts of the athletic world and are thus also quadratic, requiring O(n²) games to reach a stable ranking.

Page: 92, Location: 1410-1413


Computer scientists call this phenomenon noise. All of the sorting algorithms that we’ve considered thus far assume perfect, flawless, foolproof comparisons, ones that never mess up and mistakenly judge the lesser of two quantities to be the greater. Once you allow for a “noisy comparator,” some of computer science’s most hallowed algorithms go out the window—and some of its most maligned have their day of redemption.

Page: 96, Location: 1462-1465


That Comparison Counting Sort is the single most robust sorting algorithm known, quadratic or better, should offer something very specific to sports fans: if your team doesn’t make the playoffs, don’t whine.

Page: 97, Location: 1483-1485
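
A sketch of Comparison Counting Sort (my rendering; the function name follows the text): each item is compared against every other, and its number of "wins" becomes its rank. Quadratic, but a single faulty comparison can shift an item's rank by at most one spot, which is the source of its robustness to noise.

```python
def comparison_counting_sort(items):
    """Rank each item by how many others it beats; ties broken by position."""
    result = [None] * len(items)
    for i, item in enumerate(items):
        wins = sum(1 for j, other in enumerate(items)
                   if other < item or (other == item and j < i))
        result[wins] = item
    return result

print(comparison_counting_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```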


Put differently, if your team is eliminated early in the postseason, it’s tough luck. But if your team fails to get to the postseason, it’s tough truth. You may get sports-bar sympathy from your fellow disappointed fans, but you won’t get any from a computer scientist.

Page: 97, Location: 1486-1488


Displacement happens when an animal uses its knowledge of the hierarchy to determine that a particular confrontation simply isn’t worth it. In many animal societies, resources and opportunities—food, mates, preferred spaces, and so forth—are scarce, and somehow it must be decided who gets what. Establishing an order ahead of time is less violent than coming to blows every time a mating opportunity or a prime spot of grass becomes available. Though we may cringe when we see creatures turning their claws and beaks on each other, biologists tend to think of pecking orders as the violence that preempts violence.

Page: 99, Location: 1511-1516


We’ve now seen two separate downsides to the desire of any group to sort itself. You have, at minimum, a linearithmic number of confrontations, making everyone’s life more combative as the group grows—and you also oblige every competitor to keep track of the ever-shifting status of everyone else, otherwise they’ll find themselves fighting battles they didn’t need to. It taxes not only the body but the mind.

Page: 101, Location: 1535-1537


The only caveat is that the time required for the event is determined by its slowest competitors. This sporting contest is the marathon, and it suggests something critical: a race is fundamentally different from a fight.

Page: 101, Location: 1540-1542


This move from “ordinal” numbers (which only express rank) to “cardinal” ones (which directly assign a measure to something’s caliber) naturally orders a set without requiring pairwise comparisons. Accordingly, it makes possible dominance hierarchies that don’t require direct head-to-head matchups.

Page: 101, Location: 1548-1550


It’s possible for the individuals to resent the basis of this hierarchy, but not really to contest its verdict. As a result, individual pairwise interactions take place with a minimum of jockeying for status. By and large, any pair of people can tell, without needing to negotiate, who is supposed to show what level of respect to whom. Everyone knows where to meet.

Page: 102, Location: 1556-1559


in practice one straightforward principle determines which ships give way to which: the “Law of Gross Tonnage.” Quite simply, the smaller ship gets out of the way of the larger one.

Page: 102, Location: 1559-1561


Linearithmic numbers of fights might work fine for small-scale groups; they do in nature. But in a world where status is established through pairwise comparisons—whether they involve exchanging rhetoric or gunfire—the amount of confrontation quickly spirals out of control as society grows. Operating at industrial scale, with many thousands or millions of individuals sharing the same space, requires a leap beyond. A leap from ordinal to cardinal.

Page: 103, Location: 1570-1573


4   Caching Forget About It

Page: 105, Location: 1608-1609


In the practical use of our intellect, forgetting is as important a function as remembering. —WILLIAM JAMES

Page: 105, Location: 1609-1611


But just as the tidiness of a scholar’s desk may hide the messiness of their mind, so does the apparent tidiness of a computer’s file system obscure the highly engineered chaos of how data is actually being stored underneath the nested-folder veneer.

Page: 107, Location: 1628-1630


The Memory Hierarchy

Page: 107, Location: 1633-1633


In 1946, Arthur Burks, Herman Goldstine, and John von Neumann, working at the Institute for Advanced Study in Princeton, laid out a design proposal for what they called an electrical “memory organ.”

Page: 108, Location: 1641-1643


Wilkes’s proposal was implemented in the IBM 360/85 supercomputer later in the 1960s, where it acquired the name of the “cache.”

Page: 109, Location: 1662-1663


Likewise, a factory that doubles its manufacturing speed each year—but has the same number of parts shipped to it from overseas at the same sluggish pace—will mean little more than a factory that’s twice as idle.

Page: 109, Location: 1671-1673


For a while it seemed that Moore’s Law was yielding little except processors that twiddled their thumbs ever faster and ever more of the time. In the 1990s this began to be known as the “memory wall.”

Page: 110, Location: 1673-1674


Modern consumer laptops, tablets, and smartphones have on the order of a six-layer memory hierarchy, and managing memory smartly has never been as important to computer science as it is today.

Page: 110, Location: 1675-1677


Eviction and Clairvoyance

Page: 110, Location: 1678-1678


Bélády’s 1966 paper on caching algorithms would become the most cited piece of computer science research for fifteen years.

Page: 111, Location: 1690-1691


As it explains, the goal of cache management is to minimize the number of times you can’t find what you’re looking for in the cache and must go to the slower main memory to find it; these are known as “page faults” or “cache misses.” The optimal cache eviction policy—essentially by definition, Bélády wrote—is, when the cache is full, to evict whichever item we’ll need again the longest from now.

Page: 111, Location: 1691-1694


The hypothetical all-knowing, prescient algorithm that would look ahead and execute the optimal policy is known today in tribute as Bélády’s Algorithm. Bélády’s Algorithm is an instance of what computer scientists call a “clairvoyant” algorithm: one informed by data from the future.

Page: 111, Location: 1695-1698


We could just try Random Eviction, adding new data to the cache and overwriting old data at random.

Page: 111, Location: 1701-1702


Another simple strategy is First-In, First-Out (FIFO), where you evict or overwrite whatever has been sitting in the cache the longest (as in Martha Stewart’s question “How long have I had it?”).

Page: 112, Location: 1704-1706


third approach is Least Recently Used (LRU): evicting the item that’s gone the longest untouched (Stewart’s “When was the last time I wore it or used it?”).

Page: 112, Location: 1706-1707


The LRU principle is effective because of something computer scientists call “temporal locality”: if a program has called for a particular piece of information once, it’s likely to do so again in the near future.

Page: 112, Location: 1710-1711
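
A minimal LRU cache in Python, built on the standard library's OrderedDict (my sketch; the class name and capacity handling are mine):

```python
from collections import OrderedDict

class LRUCache:
    """Evict the least recently used entry once capacity is exceeded."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None                    # a cache miss ("page fault")
        self.data.move_to_end(key)         # touching an item makes it most recent
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the longest-untouched item
```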


LRU teaches us that the next thing we can expect to need is the last one we needed, while the thing we’ll need after that is probably the second-most-recent one. And the last thing we can expect to need is the one we’ve already gone longest without. Unless we have good reason to think otherwise, it seems that our best guide to the future is a mirror image of the past. The nearest thing to clairvoyance is to assume that history repeats itself—backward.

Page: 113, Location: 1724-1727


Turning the Library Inside Out

Page: 113, Location: 1728-1728


ordered. Here’s why: if temporal locality holds, then the rough-sorting shelves contain the most important books in the whole building. These are the books that were most recently used, so they are the ones that patrons are most likely to be looking for. It seems a crime that arguably the juiciest and most browseworthy

Page: 114, Location: 1741-1744


Says Akamai’s chief architect, Stephen Ludin, “It’s our belief—and we build the company around the fact—that distance matters.”

Page: 117, Location: 1781-1782


Caching is just as useful when it’s proximity, rather than performance, that’s the scarce resource.

Page: 117, Location: 1784-1785


“I have to emphasize,” says Noguchi, “that a very fundamental principle in my method is not to group files according to content.”

Page: 120, Location: 1840-1841


But computer science gives us something that most efficiency gurus don’t: guarantees.

Page: 122, Location: 1858-1859


LRU tells us that when we add something to our cache we should discard the oldest item—but it doesn’t tell us where we should put the new item.

Page: 122, Location: 1859-1861


In short, the mathematics of self-organizing lists suggests something radical: the big pile of papers on your desk, far from being a guilt-inducing fester of chaos, is actually one of the most well-designed and efficient structures available. What might appear to others to be an unorganized mess is, in fact, a self-organizing mess. Tossing things back on the top of the pile is the very best you can do, shy of knowing the future.

Page: 123, Location: 1880-1883


His results mapped out a graph of how memory fades over time, known today by psychologists as “the forgetting curve.” Ebbinghaus’s results established the credibility of a quantitative science of human memory, but they left open something of a mystery.

Page: 124, Location: 1892-1894


The key idea behind Anderson’s new account of human memory is that the problem might be not one of storage, but of organization. According to his theory, the mind has essentially infinite capacity for memories, but we have only a finite amount of time in which to search for them. Anderson made the analogy to a library with a single, arbitrarily long shelf—the Noguchi Filing System at Library of Congress scale. You can fit as many items as you want on that shelf, but the closer something is to the front the faster it will be to find.

Page: 125, Location: 1903-1908


The Tyranny of Experience

Page: 126, Location: 1931-1931


A big book is a big nuisance. —CALLIMACHUS (305–240 BC), LIBRARIAN AT ALEXANDRIA

Page: 126, Location: 1931-1932


The fastest cache on current computers, for instance, is made with what’s called SRAM, which costs roughly a thousand times as much per byte as the flash memory in solid-state drives.

Page: 127, Location: 1936-1937


Unavoidably, the larger a memory is, the more time it takes to search for and extract a piece of information from it.

Page: 127, Location: 1945-1945


And when it comes to episodic memory, well, every year adds a third of a million waking minutes to one’s total lived experience. Considered this way, it’s a wonder that the two of us—or anyone—can mentally keep up at all. What’s surprising is not memory’s slowdown, but the fact that the mind can possibly stay afloat and responsive as so much data accumulates.

Page: 128, Location: 1951-1954


Caching gives us the language to understand what’s happening. We say “brain fart” when we should really say “cache miss.”

Page: 129, Location: 1968-1969


Though we always manage to find some way to order the things we do in our days, as a rule we don’t consider ourselves particularly good at it—hence the perennial bestseller status of time-management guides.

Page: 131, Location: 2001-2003


William James, the “father of American psychology,” asserts that “there’s nothing so fatiguing as the eternal hanging on of an uncompleted task,”

Page: 131, Location: 2007-2008


This practice would be built upon by Taylor’s colleague Henry Gantt, who in the 1910s developed the Gantt charts that would help organize many of the twentieth century’s most ambitious construction projects, from the Hoover Dam to the Interstate Highway System. A century later, Gantt charts still adorn the walls and screens of project managers at firms like Amazon, IKEA, and SpaceX.

Page: 132, Location: 2017-2020


Thus you can keep the total amount of time spent doing laundry to the absolute minimum. Johnson’s analysis had yielded scheduling’s first optimal algorithm: start with the lightest wash, end with the smallest hamper.

Page: 133, Location: 2034-2035


This is a sufficiently fundamental and counterintuitive point that it’s worth repeating. If you have only a single machine, and you’re going to do all of your tasks, then any ordering of the tasks will take you the same amount of time.

Page: 134, Location: 2043-2045


Thus we encounter the first lesson in single-machine scheduling literally before we even begin: make your goals explicit.

Page: 134, Location: 2045-2046


If you’re concerned with minimizing maximum lateness, then the best strategy is to start with the task due soonest and work your way toward the task due last. This strategy, known as Earliest Due Date, is fairly intuitive. (For instance, in a service-sector context, where each arriving patron’s “due date” is effectively the instant they walk in the door, it just means serving customers in order of arrival.)

Page: 134, Location: 2054-2057
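
Earliest Due Date reduces to a single sort. A sketch (mine; the (name, duration, due) tuple format is an assumption for illustration):

```python
def earliest_due_date(tasks):
    """tasks: (name, duration, due) tuples. Sorting by due date minimizes
    the maximum lateness of any single task."""
    schedule = sorted(tasks, key=lambda t: t[2])
    now, worst_lateness = 0, 0
    for name, duration, due in schedule:
        now += duration
        worst_lateness = max(worst_lateness, now - due)
    return schedule, worst_lateness
```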


Earliest Due Date is optimal for reducing maximum lateness, which means it will minimize the rottenness of the single most rotten thing you’ll have to eat; that may not be the most appetizing metric to eat by.

Page: 135, Location: 2065-2067


Moore’s Algorithm says that we start out just like with Earliest Due Date—by scheduling out our produce in order of spoilage date, earliest first, one item at a time. However, as soon as it looks like we won’t get to eating the next item in time, we pause, look back over the meals we’ve already planned, and throw out the biggest item (that is, the one that would take the most days to consume).

Page: 135, Location: 2069-2072
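
Moore's Algorithm in sketch form (my rendering, using a max-heap to find the biggest job scheduled so far; the tuple layout matches the sketch above):

```python
import heapq

def moores_algorithm(tasks):
    """tasks: (name, duration, due) tuples. Maximizes the number of tasks
    finished on time by discarding the longest job whenever we fall behind."""
    kept, discarded, now = [], [], 0       # kept is a max-heap keyed on duration
    for name, duration, due in sorted(tasks, key=lambda t: t[2]):  # EDD order
        heapq.heappush(kept, (-duration, name))
        now += duration
        if now > due:                      # about to miss this deadline...
            longest, victim = heapq.heappop(kept)
            now += longest                 # ...so drop the biggest job so far
            discarded.append(victim)
    return sorted(name for _, name in kept), discarded  # on-time set, dropped set
```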


Getting Things Done

Page: 136, Location: 2079-2080


Minimizing the sum of completion times leads to a very simple optimal algorithm called Shortest Processing Time: always do the quickest task you can.

Page: 137, Location: 2090-2091


Its sum-of-completion-times metric can be expressed another way: it’s like focusing above all on reducing the length of your to-do list. If each piece of unfinished business is like a thorn in your side, then racing through the easiest items may bring some measure of relief.

Page: 137, Location: 2095-2097


this strategy nonetheless offers a nice rule of thumb: only prioritize a task that takes twice as long if it’s twice as important.

Page: 138, Location: 2106-2106
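
Both versions again reduce to a sort: by duration alone, or by importance per unit time. A sketch (my code; the tuple layouts are assumptions):

```python
def schedule_spt(tasks):
    """tasks: (name, duration) pairs. Minimizes the sum of completion times."""
    return sorted(tasks, key=lambda t: t[1])

def schedule_weighted_spt(tasks):
    """tasks: (name, duration, importance). Order by importance per unit time:
    a task that takes twice as long must matter twice as much to go first."""
    return sorted(tasks, key=lambda t: t[2] / t[1], reverse=True)
```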


the “debt avalanche.” This debt-reduction strategy says to ignore the number and size of your debts entirely, and simply funnel your money toward the debt with the single highest interest rate. This corresponds rather neatly to working through jobs in order of importance per unit time. And it’s the strategy that will reduce the total burden of your debt as quickly as possible.

Page: 138, Location: 2113-2115


It’s said that “a man with one watch knows what time it is; a man with two watches is never sure.”

Page: 139, Location: 2122-2123


Computer scientists would call this a “ping attack” or a “denial of service” attack: give a system an overwhelming number of trivial things to do, and the important things get lost in the chaos.

Page: 139, Location: 2129-2131


Live by the metric, die by the metric. If all tasks are indeed of equal weight, then that’s exactly what we should be doing. But if we don’t want to become slaves to minutiae, then we need to take measures toward that end.

Page: 140, Location: 2146-2147


Once JPL engineers had identified the Pathfinder problem as a case of priority inversion, they wrote up a fix and beamed the new code across millions of miles to Pathfinder. What was the solution they sent flying across the solar system? Priority inheritance. If a low-priority task is found to be blocking a high-priority resource, well, then all of a sudden that low-priority task should momentarily become the highest-priority thing on the system, “inheriting” the priority of the thing it’s blocking.

Page: 142, Location: 2169-2172


A commitment to fastidiously doing the most important thing you can, if pursued in a head-down, myopic fashion, can lead to what looks for all the world like procrastination.

Page: 143, Location: 2179-2180


this problem belongs to a class that most computer scientists believe has no efficient solution—it’s what the field calls “intractable.”

Page: 145, Location: 2213-2215


The drawing of the borders of scheduling theory continues to this day. A recent survey showed that the status of about 7% of all problems is still unknown, scheduling’s terra incognita. Of the 93% of the problems that we do understand, however, the news isn’t great: only 9% can be solved efficiently, and the other 84% have been proven intractable.

Page: 146, Location: 2228-2231


The best time to plant a tree is twenty years ago. The second best time is now. —PROVERB

Page: 146, Location: 2234-2236


But there is one twist that can make it easier: being able to stop one task partway through and switch to another. This property, “preemption,” turns out to change the game dramatically.

Page: 146, Location: 2236-2238


In fact, the weighted version of Shortest Processing Time is a pretty good candidate for best general-purpose scheduling strategy in the face of uncertainty.

Page: 147, Location: 2253-2254


It offers a simple prescription for time management: each time a new piece of work comes in, divide its importance by the amount of time it will take to complete. If that figure is higher than for the task you’re currently doing, switch to the new one; otherwise stick with the current task. This algorithm is the closest thing that scheduling theory has to a skeleton key or Swiss Army knife, the optimal strategy not just for one flavor of problem but for many.

Page: 147, Location: 2254-2257
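
The prescription fits in a few lines (my sketch; "density," meaning importance per unit of remaining time, is my name for the ratio):

```python
def should_switch(current, incoming):
    """Each task is (importance, time_to_complete). Switch to the new task
    only if its density beats the task you're working on now."""
    return incoming[0] / incoming[1] > current[0] / current[1]

print(should_switch((4, 2), (3, 1)))  # 3/1 > 4/2 -> True, switch
print(should_switch((4, 2), (1, 1)))  # 1/1 < 4/2 -> False, stick
```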


When the future is foggy, it turns out you don’t need a calendar—just a to-do list.

Page: 148, Location: 2265-2265


The hurrieder I go / The behinder I get —NEEDLEPOINT SEEN IN BOONVILLE, CA

Page: 148, Location: 2266-2267


Psychologists have shown that for us, the effects of switching tasks can include both delays and errors—at the scale of minutes rather than microseconds.

Page: 149, Location: 2283-2284


Brian, for his part, thinks of writing as a kind of blacksmithing, where it takes a while just to heat up the metal before it’s malleable.

Page: 150, Location: 2288-2289


Thrashing

Page: 150, Location: 2296-2296


Computers multitask through a process called “threading,” which you can think of as being like juggling a set of balls. Just as a juggler only hurls one ball at a time into the air but keeps three aloft, a CPU only works on one program at a time, but by swapping between them quickly enough (on the scale of ten-thousandths of a second) it appears to be playing a movie, navigating the web, and alerting you to incoming email all at once.

Page: 150, Location: 2299-2303


Think again about our image of a juggler. With one ball in the air, there’s enough spare time while that ball is aloft for the juggler to toss some others upward as well. But what if the juggler takes on one more ball than he can handle? He doesn’t drop that ball; he drops everything. The whole system, quite literally, goes down.

Page: 151, Location: 2311-2314


As Peter Zijlstra, one of the head developers on the Linux operating system scheduler, puts it, “The caches are warm for the current workload, and when you context switch you pretty much invalidate all caches. And that hurts.”

Page: 152, Location: 2324-2326


At the extreme, a program may run just long enough to swap its needed items into memory, before giving way to another program that runs just long enough to overwrite them in turn. This is thrashing: a system running full-tilt and accomplishing nothing at all.

Page: 152, Location: 2326-2328


Interrupt Coalescing

Page: 154, Location: 2359-2359


Part of what makes real-time scheduling so complex and interesting is that it is fundamentally a negotiation between two principles that aren’t fully compatible. These two principles are called responsiveness and throughput: how quickly you can respond to things, and how much you can get done overall.

Page: 154, Location: 2359-2362


Establishing a minimum amount of time to spend on any one task helps to prevent a commitment to responsiveness from obliterating throughput entirely: if the minimum slice is longer than the time it takes to context-switch, then the system can never get into a state where context switching is the only thing it’s doing. It’s also a principle that is easy to translate into a recommendation for human lives.

Page: 155, Location: 2375-2378


Methods such as “timeboxing” or “pomodoros,” where you literally set a kitchen timer and commit to doing a single task until it runs out, are one embodiment of this idea.

Page: 156, Location: 2378-2379


So the rule that computer operating systems follow when deciding how long they can afford to dedicate themselves to some task is simple: as long as possible without seeming jittery or slow to the user.

Page: 156, Location: 2385-2386


When we humans leave the house to run a quick errand, we might say something like, “You won’t even notice I’m gone.” When our machines context-switch into a computation, they must literally return to us before we notice they’re gone.

Page: 156, Location: 2386-2388


The moral is that you should try to stay on a single task as long as possible without decreasing your responsiveness below the minimum acceptable limit. Decide how responsive you need to be—and then, if you want to get things done, be no more responsive than that.

Page: 157, Location: 2395-2396


Perhaps the patron saint of the minimal-context-switching lifestyle is the legendary programmer Donald Knuth. “I do one thing at a time,” he says. “This is what computer scientists call batch processing—the alternative is swapping in and out. I don’t swap in and out.”

Page: 158, Location: 2414-2416


But often the problems most germane to daily human life are at the opposite extreme. Our days are full of “small data.” In fact, like Gott standing at the Berlin Wall, we often have to make an inference from the smallest amount of data we could possibly have: a single observation.

Page: 162, Location: 2471-2473


More than 250 years ago, the question of making predictions from small data weighed heavily on the mind of the Reverend Thomas Bayes, a Presbyterian minister in the charming spa town of Tunbridge Wells, England.

Page: 162, Location: 2478-2479


And in either 1746, ’47, ’48, or ’49 he would write one of the most influential papers in all of mathematics, abandon it unpublished, and move on to other things.

Page: 162, Location: 2484-2485


The essay concerned exactly the kind of raffle problem under discussion: Let us then imagine a person present at the drawing of a lottery, who knows nothing of its scheme or of the proportion of Blanks to Prizes in it. Let it further be supposed, that he is obliged to infer this from the number of blanks he hears drawn compared with the number of prizes; and that it is enquired what conclusions in these circumstances he may reasonably make.

Page: 163, Location: 2491-2496


Bayes’s critical insight was that trying to use the winning and losing tickets we see to figure out the overall ticket pool that they came from is essentially reasoning backward. And to do that, he argued, we need to first reason forward from hypotheticals. In other words, we need to first determine how probable it is that we would have drawn the tickets we did if various scenarios were true. This probability—known to modern statisticians as the “likelihood”—gives us the information we need to solve the problem.

Page: 163, Location: 2496-2500


This is the crux of Bayes’s argument. Reasoning forward from hypothetical pasts lays the foundation for us to then work backward to the most probable one.

Page: 165, Location: 2516-2517


Laplace’s Law

Page: 165, Location: 2523-2524


In 1774, completely unaware of the previous work by Bayes, Laplace published an ambitious paper called “Treatise on the Probability of the Causes of Events.” In it, Laplace finally solved the problem of how to make inferences backward from observed effects to their probable causes.

Page: 165, Location: 2526-2528


Count the number of times it has happened in the past plus one, then divide by the number of opportunities plus two.

Page: 167, Location: 2548-2548
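
Laplace's Law as a one-liner, with two illustrative calls (my sketch): a single success in a single attempt gives an estimate of 2/3 rather than certainty, and something never yet observed still gets a nonzero chance.

```python
def laplace(successes, attempts):
    """Laplace's Law: (successes + 1) / (attempts + 2)."""
    return (successes + 1) / (attempts + 2)

print(laplace(1, 1))    # 1 win in 1 try -> 2/3, not 100%
print(laplace(0, 10))   # never seen it happen -> 1/12, not impossible
```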


This sense of what was “in the bag” before the coin flip—the chances for each hypothesis to have been true before you saw any data—is known as the prior probabilities, or “prior” for short. And Bayes’s Rule always needs some prior from you, even if it’s only a guess.

Page: 168, Location: 2575-2577


The Copernican Principle

Page: 169, Location: 2583-2583


It’s difficult to make predictions, especially about the future. —DANISH PROVERB

Page: 169, Location: 2584-2585


And if a municipal transit system cannot afford the incredibly useful but expensive real-time signs that tell riders when the next bus is going to arrive, the Copernican Principle suggests that there might be a dramatically simpler and cheaper alternative. Simply displaying how long it’s been since the previous bus arrived at that stop offers a substantial hint about when the next one will.

Page: 170, Location: 2607-2610


And it turns out that the Copernican Principle is exactly what results from applying Bayes’s Rule using what is known as an uninformative prior.

Page: 171, Location: 2617-2619


but if the wall were going to be around for a million years, it would be a big coincidence that we happened to bump into it so very close to the start of its existence. Therefore, even though enormously long life spans cannot be ruled out, neither are they very likely.

Page: 172, Location: 2627-2629


Real-World Priors

Page: 173, Location: 2643-2644


In the broadest sense, there are two types of things in the world: things that tend toward (or cluster around) some kind of “natural” value, and things that don’t.

Page: 173, Location: 2644-2645


It’s often lamented that “the rich get richer,” and indeed the process of “preferential attachment” is one of the surest ways to produce a power-law distribution.

Page: 174, Location: 2665-2667


The most popular websites are the most likely to get incoming links; the most followed online celebrities are the ones most likely to gain new fans; the most prestigious firms are the ones most likely to attract new clients; the biggest cities are the ones most likely to draw new residents. In every case, a power-law distribution will result.

Page: 174, Location: 2667-2669


And for any power-law distribution, Bayes’s Rule indicates that the appropriate prediction strategy is a Multiplicative Rule: multiply the quantity observed so far by some constant factor. For an uninformative prior, that constant factor happens to be 2, hence the Copernican prediction; in other power-law cases, the multiplier will depend on the exact distribution you’re working with. For the grosses of movies, for instance, it happens to be about 1.4. So if you hear a movie has made $6 million so far, you can guess it will make about $8.4 million overall; if it’s made $90 million, guess it will top out at $126 million. This multiplicative rule is a direct consequence of the fact that power-law distributions do not specify a natural scale for the phenomenon they’re describing. The only thing that gives us a sense of scale for our prediction, therefore, is the single data point we have—such as the fact that the Berlin Wall has stood for eight years. The larger the value

Page: 175, Location: 2677-2684
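
The rule is a one-line computation (my sketch; a multiplier of 2 is the Copernican, uninformative-prior case, and 1.4 is the movie-gross figure from the passage above):

```python
def predict_total(observed, multiplier=2.0):
    """Multiplicative Rule for power-law quantities: scale up what you've seen."""
    return observed * multiplier

print(predict_total(8))          # Berlin Wall at 8 years -> guess 16 in total
print(predict_total(6.0, 1.4))   # $6M gross so far -> about $8.4M
print(predict_total(90.0, 1.4))  # $90M so far -> about $126M
```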


Indeed, distributions that yield the same prediction, no matter their history or current state, are known to statisticians as “memoryless.”

Page: 178, Location: 2719-2720


In a power-law distribution, the longer something has gone on, the longer we expect it to continue going on. So a power-law event is more surprising the longer we’ve been waiting for it—and maximally surprising right before it happens.

Page: 178, Location: 2725-2727


In a normal distribution, events are surprising when they’re early—since we expected them to reach the average—but not when they’re late. Indeed, by that point they seem overdue to happen, so the longer we wait, the more we expect them.

Page: 178, Location: 2728-2729


And in an Erlang distribution, events by definition are never any more or less surprising no matter when they occur. Any state of affairs is always equally likely to end regardless of how long it’s lasted. No wonder politicians are always thinking about their next election.

Page: 178, Location: 2730-2732


In “The Gambler,” Kenny Rogers famously advised that you’ve got to “Know when to walk away / Know when to run”—but for a memoryless distribution, there is no right time to quit. This may in part explain these games’ addictiveness.

Page: 179, Location: 2739-2741


Whether we know it or not, we appear to carry around in our heads surprisingly accurate priors about movie grosses and running times, poem lengths, and political terms of office, not to mention human life spans. We don’t need to gather them explicitly; we absorb them from the world.

Page: 181, Location: 2763-2765


Good predictions require good priors.

Page: 182, Location: 2778-2779


Learning self-control is important, but it’s equally important to grow up in an environment where adults are consistently present and trustworthy.

Page: 184, Location: 2809-2810


He is careful of what he reads, for that is what he will write. He is careful of what he learns, for that is what he will know. —ANNIE DILLARD

Page: 184, Location: 2813-2814


The best way to make good predictions, as Bayes’s Rule shows us, is to be accurately informed about the things you’re predicting.

Page: 184, Location: 2815-2816


If you want to be a good intuitive Bayesian—if you want to naturally make good predictions, without having to think about what kind of prediction rule is appropriate—you need to protect your priors. Counterintuitively, that might mean turning off the news.

*There’s a certain irony here: when it comes to time, assuming that there’s nothing special about our arrival does result in us imagining ourselves at the very center after all.

Page: 186, Location: 2837-2845


Q.E.D.” Quod erat demonstrandum,

Page: 187, Location: 2864-2865


The question of how hard to think, and how many factors to consider, is at the heart of a knotty problem that statisticians and machine-learning researchers call “overfitting.”

Page: 188, Location: 2882-2883


The Case Against Complexity

Page: 189, Location: 2885-2885


And every prediction, crucially, involves thinking about two distinct things: what you know and what you don’t.

Page: 189, Location: 2889-2890


If the study were repeated with different people, producing slight variations on the same essential pattern, the one- and two-factor models would remain more or less steady—but the nine-factor model would gyrate wildly from one instance of the study to the next.

Page: 191, Location: 2927-2928


So one of the deepest truths of machine learning is that, in fact, it’s not always better to use a more complex model, one that takes a greater number of factors into account. And the issue is not just that the extra factors might offer diminishing returns—performing better than a simpler model, but not enough to justify the added complexity. Rather, they might make our predictions dramatically worse.

Page: 191, Location: 2929-2932


Fundamentally, overfitting is a kind of idolatry of data, a consequence of focusing on what we’ve been able to measure rather than what matters.

Page: 192, Location: 2942-2943


Overfitting Everywhere

Page: 193, Location: 2951-2951


The original goal of fencing was to teach people how to defend themselves in a duel, hence the name: “defencing.”

Page: 194, Location: 2965-2966


It’s as exciting a sport as ever, but as athletes overfit their tactics to the quirks of scorekeeping, it becomes less useful in instilling the skills of real-world swordsmanship.

Page: 194, Location: 2970-2972


Avinash Kaushik, digital marketing evangelist at Google, warns that trying to get website users to see as many ads as possible naturally devolves into trying to cram sites with ads:

Page: 195, Location: 2983-2984


Kaushik’s conclusion: “Friends don’t let friends measure Page Views. Ever.”

Page: 195, Location: 2988-2988


Mistakes like these are known in law enforcement and the military as “training scars,” and they reflect the fact that it’s possible to overfit one’s own preparation.

Page: 196, Location: 2997-2998


Detecting Overfitting: Cross-Validation

Page: 196, Location: 3000-3000


Research in machine learning has yielded several concrete strategies for detecting overfitting, and one of the most important is what’s known as Cross-Validation.

Page: 196, Location: 3005-3006


Simply put, Cross-Validation means assessing not only how well a model fits the data it’s given, but how well it generalizes to data it hasn’t seen. Paradoxically, this may involve using less data.

Page: 197, Location: 3007-3008
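
A small numpy sketch of the idea (mine, not the book's; the data is synthetic, with a genuinely linear pattern plus noise): fit models of increasing complexity on half the points, score them on the held-out half, and the most complex model will typically do worst on the data it hasn't seen.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.2, size=x.size)    # real pattern: a straight line

train, test = np.arange(0, 30, 2), np.arange(1, 30, 2)   # hold out every other point
for degree in (1, 2, 9):
    coeffs = np.polyfit(x[train], y[train], degree)      # fit on training half
    error = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)  # score on held-out half
    print(f"degree {degree}: held-out error {error:.4f}")
```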


How to Combat Overfitting: Penalizing Complexity

If you can’t explain it simply, you don’t understand it well enough. —ANONYMOUS

We’ve seen some of the ways that overfitting can rear its head, and we’ve looked at some of the methods to detect and measure it.

Page: 198, Location: 3028-3031


From a statistics viewpoint, overfitting is a symptom of being too sensitive to the actual data we’ve seen. The solution, then, is straightforward: we must balance our desire to find a good fit against the complexity of the models we use to do so.

Page: 198, Location: 3031-3033


So what do these complexity penalties look like? One algorithm, discovered in 1996 by biostatistician Robert Tibshirani, is called the Lasso and uses as its penalty the total weight of the different factors in the model.* By putting this downward pressure on the weights of the factors, the Lasso drives as many of them as possible completely to zero. Only the factors that have a big impact on the results remain in the equation—thus potentially transforming, say, an overfitted nine-factor model into a simpler, more robust formula with just a couple of the most critical factors.

Page: 199, Location: 3040-3045
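
A sketch using scikit-learn's Lasso (the data is synthetic and mine: nine candidate factors, only two of which actually matter; alpha sets the strength of the penalty):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))                                   # nine candidate factors...
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, size=100)    # ...only two real ones

model = Lasso(alpha=0.1).fit(X, y)
print(np.round(model.coef_, 2))   # the penalty drives most weights to zero
```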


Neuroscientists have suggested, for instance, that brains try to minimize the number of neurons that are firing at any given moment—implementing the same kind of downward pressure on complexity as the Lasso.

Page: 200, Location: 3055-3056


The Upside of Heuristics

Page: 200, Location: 3059-3060


When it comes to portfolio management, it turns out that unless you’re highly confident in the information you have about the markets, you may actually be better off ignoring that information altogether.

Page: 201, Location: 3069-3070


The Weight of History

Page: 202, Location: 3085-3085


For example, the oddly cross-wired arrangement of our nervous system (the left side of our body controlled by the right side of our brain and vice versa) reflects the evolutionary history of vertebrates. This phenomenon, called “decussation,”

Page: 203, Location: 3103-3104


This kind of setup—where more time means more complexity—characterizes a lot of human endeavors. Giving yourself more time to decide about something does not necessarily mean that you’ll make a better decision. But it does guarantee that you’ll end up considering more factors, more hypotheticals, more pros and cons, and thus risk overfitting.

Page: 205, Location: 3130-3132


The greater the uncertainty, the bigger the gap between what you can measure and what matters, the more you should watch out for overfitting—that is, the more you should prefer simplicity, and the earlier you should stop.

Page: 206, Location: 3149-3151


When you’re truly in the dark, the best-laid plans will be the simplest. When our expectations are uncertain and the data are noisy, the best bet is to paint with a broad brush, to think in broad strokes.

Page: 206, Location: 3151-3152


The upshot of Early Stopping is that sometimes it’s not a matter of choosing between being rational and going with our first instinct. Going with our first instinct can be the rational solution. The more complex, unstable, and uncertain the decision, the more rational an approach that is.

Page: 207, Location: 3160-3162


nuptial

Page: 208, Location: 3190-3190


But as computer scientists have discovered over the past few decades, there are entire classes of problems where a perfect solution is essentially unreachable, no matter how fast we make our computers or how cleverly we program them. In fact, no one understands as well as a computer scientist that in the face of a seemingly unmanageable challenge, you should neither toil forever nor give up, but—as we’ll see—try a third thing entirely.

Page: 210, Location: 3216-3219


In fact, it’s the most famous optimization problem of them all.

Page: 211, Location: 3225-3226


They asserted what’s now known as the Cobham–Edmonds thesis: an algorithm should be considered “efficient” if it runs in what’s called “polynomial time”—that is, O(n2), O(n3), or in fact n to the power of any number at all. A problem, in turn, is considered “tractable” if we know how to solve it using an efficient algorithm. A problem we don’t know how to solve in polynomial time, on the other hand, is considered “intractable.” And at anything but the smallest scales, intractable problems are beyond the reach of solution by computers, no matter how powerful.

Page: 212, Location: 3247-3253


As scheduling expert Jan Karel Lenstra told us, “When the problem is hard, it doesn’t mean that you can forget about it, it means that it’s just in a different status. It’s a serious enemy, but you still have to fight it.”

Page: 213, Location: 3263-3264


If you can’t solve the problem in front of you, solve an easier version of it—and then see if that solution offers you a starting point, or a beacon, in the full-blown problem. Maybe it does.

Page: 215, Location: 3295-3296


The message is simple but profound: if we’re willing to accept solutions that are close enough, then even some of the hairiest problems around can be tamed with the right techniques.

Page: 216, Location: 3299-3300


And it’s also the problem that epidemiologists study in thinking about, say, the minimum number of people in a population—and which people—to vaccinate to protect a society from communicable diseases.

Page: 217, Location: 3319-3321


Ever since Cobham and Edmonds’s work, this chasm between “polynomials” (n-to-the-something) and “exponentials” (something-to-the-n) has served as the field’s de facto out-of-bounds marker.

Page: 224, Location: 3428-3430


9   Randomness When to Leave It to Chance

Page: 224, Location: 3435-3436


F. Scott Fitzgerald once wrote that “the test of a first-rate intelligence is the ability to hold two opposing ideas in mind at the same time and still retain the ability to function.”

Page: 227, Location: 3473-3474


In modern encryption, for instance, secret primes known only to the sender and recipient get multiplied together to create huge composite numbers that can be transmitted publicly without fear, since factoring the product would take any eavesdropper way too long to be worth attempting. Thus virtually all secure communication online—be it commerce, banking, or email—begins with a hunt for prime numbers.

Page: 230, Location: 3518-3520


Though you may have never heard of the Miller-Rabin test, your laptop, tablet, and phone know it well. Several decades after its discovery, it is still the standard method used to find and check primes in many domains. It’s working behind the scenes whenever you use your credit card online, and almost any time secure communications are sent through the air or over wires.

Page: 232, Location: 3556-3559
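
A standard textbook Miller-Rabin in Python (my rendering): each random "witness" a that fails to behave as it would for a prime proves n composite, and enough passing witnesses give near-certainty.

```python
import random

def is_probably_prime(n: int, rounds: int = 40) -> bool:
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):     # dispatch tiny cases directly
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:                  # write n - 1 as d * 2**r with d odd
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False               # a is a witness: n is definitely composite
    return True                        # survived every round: almost surely prime

print(is_probably_prime(2**61 - 1))    # a known Mersenne prime -> True
```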


Here again randomness offers a way forward: just generate some random xs and plug them in. If the two expressions are not the same, it would be a big coincidence if they gave the same answer for some randomly generated input. And an even bigger coincidence if they also gave identical answers for a second random input. And a bigger coincidence still if they did it for three random inputs in a row. Since there is no known deterministic algorithm for efficiently testing polynomial identity, this randomized method—with multiple observations quickly giving rise to near-certainty—is the only practical one we

Page: 233, Location: 3568-3573
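
The same pattern of stacking random checks, sketched for polynomial identity testing (function names mine):

```python
import random

def probably_identical(f, g, trials=10, bound=10**9):
    """Two distinct low-degree polynomials can agree at only a handful of
    points, so matching on several random inputs is damning evidence that
    they are the same expression in disguise."""
    return all(f(x) == g(x) for x in random.sample(range(bound), trials))

print(probably_identical(lambda x: (x + 1) ** 2, lambda x: x * x + 2 * x + 1))  # True
print(probably_identical(lambda x: (x + 1) ** 2, lambda x: x * x + 1))          # False (almost surely)
```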


The polynomial identity test shows that sometimes our effort is better spent checking random values—sampling from the two expressions we want to know about—than trying to untangle their inner workings. To some extent this seems reasonably intuitive.

Page: 234, Location: 3573-3575


When we need to make sense of, say, national health care reform—a vast apparatus too complex to be readily understood—our political leaders typically offer us two things: cherry-picked personal anecdotes and aggregate summary statistics.

Page: 236, Location: 3609-3611


There is no such thing as absolute certainty, but there is assurance sufficient for the purposes of human life. —JOHN STUART MILL

Page: 238, Location: 3639-3641


This is an example of a so-called greedy algorithm, which you can also think of as a “myopic algorithm”: one that shortsightedly takes the best thing available every step of the way.

Page: 240, Location: 3677-3678


This technique, developed by the same Los Alamos team that came up with the Monte Carlo Method, is called the Metropolis Algorithm. The Metropolis Algorithm is like Hill Climbing, trying out different small-scale tweaks on a solution, but with one important difference: at any given point, it will potentially accept bad tweaks as well as good ones.

Page: 242, Location: 3710-3713
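
A sketch of the acceptance rule (mine; the exp(-delta/temp) form is the classic Metropolis criterion, and letting temp shrink each step folds in the Simulated Annealing advice that appears a little later):

```python
import math
import random

def metropolis(cost, tweak, start, steps=10_000, temp=1.0, cooling=0.999):
    """Hill Climbing that sometimes accepts a worse solution: a bad tweak is
    taken with probability exp(-delta / temp), so the worse the move, the
    less likely we follow it. Cooling front-loads the randomness."""
    current, current_cost = start, cost(start)
    for _ in range(steps):
        candidate = tweak(current)
        delta = cost(candidate) - current_cost
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
        temp *= cooling                 # ever less randomness as time goes on
    return current

# Toy usage: minimize a bumpy one-dimensional function.
bumpy = lambda x: x * x + 3 * math.sin(5 * x)
print(round(metropolis(bumpy, lambda x: x + random.uniform(-0.5, 0.5), start=4.0), 2))
```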


apocryphal)

Page: 247, Location: 3779-3779


with Mach going so far as to declare that “thus are to be explained the statements of Newton, Mozart, Richard Wagner, and others, when they say that thought, melodies, and harmonies had poured in upon them, and that they had simply retained the right ones.”

Page: 249, Location: 3806-3808


But perhaps it’s just a case of a little knowledge being a dangerous thing. If the Dice Man had only had a deeper grasp of computer science, he’d have had some guidance. First, from Hill Climbing: even if you’re in the habit of sometimes acting on bad ideas, you should always act on good ones. Second, from the Metropolis Algorithm: your likelihood of following a bad idea should be inversely proportional to how bad an idea it is. Third, from Simulated Annealing: you should front-load randomness, rapidly cooling out of a totally random state, using ever less and less randomness as time goes on, lingering longest as you approach freezing. Temper yourself—literally.

Page: 251, Location: 3835-3839


“Once you got somewhere you were happy,” he told the Guardian, “you’d be stupid to shake it up any further.”

Page: 251, Location: 3843-3844


10  Networking How We Connect

Page: 253, Location: 3879-3880


portent—

Page: 254, Location: 3886-3886


The long-distance telegraph began with a portent—Samuel F. B. Morse, standing in the chambers of the US Supreme Court on May 24, 1844, wiring his assistant Alfred Vail in Baltimore a verse from the Old Testament: “WHAT HATH GOD WROUGHT.” The first thing we ask of any new connection is how it began, and from that origin can’t help trying to augur its future.

Page: 254, Location: 3886-3888


The first telephone call in history, made by Alexander Graham Bell to his assistant on March 10, 1876, began with a bit of a paradox. “Mr. Watson, come here; I want to see you”—a simultaneous testament to its ability and inability to overcome physical distance.

Page: 254, Location: 3888-3890


The cell phone began with a boast—Motorola’s Martin Cooper walking down Sixth Avenue on April 3, 1973, as Manhattan pedestrians gawked, calling his rival Joel Engel at AT&T: “Joel, I’m calling you from a cellular phone. A real cellular phone: a handheld, portable, real cellular phone.” (“I don’t remember exactly what he said,” Cooper recalls, “but it was really quiet for a while. My assumption was that he was grinding his teeth.”)

Page: 254, Location: 3891-3894


And the text message began, on December 3, 1992, with cheer: Neil Papworth at Sema Group Telecoms wishing Vodafone’s Richard Jarvis an early “Merry Christmas.”

Page: 254, Location: 3894-3895


The beginnings of the Internet were, somehow fittingly, much humbler and more inauspicious than all of that. It was October 29, 1969, and Charley Kline at UCLA sent to Bill Duvall at the Stanford Research Institute the first message ever transmitted from one computer to another via the ARPANET. The message was “login”—or would have been, had the receiving machine not crashed after “lo.”

Page: 255, Location: 3895-3898


What we now think of as “the Internet” is actually a collection of many protocols, but the chief among them (so much so that it’s often referred to more or less synonymously with the Internet) is what’s known as Transmission Control Protocol, or TCP. It was born from a 1973 talk and a 1974 paper by Vinton “Vint” Cerf and Robert “Bob” Kahn, who laid out a proposal for the language of—as they imagined calling it—an “internetwork.”

Page: 255, Location: 3909-3912


Phone calls use what’s called “circuit switching”: the system opens a channel between the sender and the receiver, which supplies constant bandwidth between the parties in both directions as long as the call lasts.

Page: 256, Location: 3914-3915


In a packet-switched network, rather than using a dedicated channel for each connection, senders and receivers atomize their messages into tiny shards known as “packets,” and merge them into the communal flow of data—a bit like postcards moving at the speed of light.

Page: 257, Location: 3929-3930


In circuit-switched networks, a call fails if any one of its links gets disrupted—which means that reliability goes down exponentially as a network grows larger. In packet switching, on the other hand, the proliferation of paths in a growing network becomes a virtue: there are now that many more ways for data to flow, so the reliability of the network increases exponentially with its size.

Page: 257, Location: 3941-3944


Computer scientists know this concept as the “Byzantine generals problem.” Imagine two generals, on opposite sides of a valley that contains their common enemy, attempting to coordinate an attack. Only by perfect synchronization will they succeed; for either to attack alone is suicide. What’s worse, any messages from one general to the other must be delivered by hand across the very terrain that contains the enemy, meaning there’s a chance that any given message will never arrive.

Page: 259, Location: 3961-3964


Communication is one of those delightful things that work only in practice; in theory it’s impossible.

Page: 259, Location: 3968-3969


so it’s considered enough for a session to begin with what’s called a “triple handshake.” The visitor says hello, the server acknowledges the hello and says hello back, the visitor acknowledges that, and if the server receives this third message, then no further confirmation is needed and they’re off to the races.

Page: 259, Location: 3970-3973
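
The exchange described here is what TCP implementations call the three-way handshake, with messages conventionally named SYN, SYN-ACK, and ACK. A toy version with lossy delivery (the loss model and numbers are mine):

```python
import random

# Toy triple handshake: the session opens only if all three messages land.
# SYN / SYN-ACK / ACK are TCP's conventional names; the loss rate is made up.
def triple_handshake(loss=0.1):
    for msg in ("SYN: hello", "SYN-ACK: hello back", "ACK: got it"):
        if random.random() < loss:
            return f"stalled, lost '{msg}'"
    return "session established"

random.seed(0)
print([triple_handshake() for _ in range(5)])
```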


A report from the second half of 2014 showed that almost 10% of upstream Internet traffic during peak hours was due to Netflix—which we tend to think of as sending data almost exclusively downstream, to users. But all that video generates an awful lot of ACKs.

Page: 260, Location: 3987-3989


For this reason, phone services that automatically reduce background noise to silence are doing their users a major disservice. Background static is a continual reassurance that the call is still connected and that any silence is a deliberate choice by the other party. Without it, one must constantly confront the possibility that the call has dropped, and constantly offer reassurances that it has not.

Page: 262, Location: 4006-4008


Ironically, one of the few exceptions to this is in transmitting the human voice. Real-time voice communications, such as Skype, typically do not use TCP, which underlies most of the rest of the Internet. As researchers discovered in the early days of networking, using reliable, robust protocols—with all their ACKs and retransmission of lost packets—to transmit the human voice is overkill. The humans provide the robustness themselves. As Cerf explains, “In the case of voice, if you lose a packet, you just say, ‘Say that again, I missed something.’”

Page: 261, Location: 4001-4005


Every message could be the last, and there is often no telling the difference between someone taking their time to respond and someone who has long since ended the conversation.

Page: 262, Location: 4010-4011


The world’s most difficult word to translate has been identified as “ilunga,” from the Tshiluba language spoken in south-eastern DR Congo.… Ilunga means “a person who is ready to forgive any abuse for the first time, to tolerate it a second time, but never a third time.” —BBC NEWS

Page: 263, Location: 4018-4021


Clearly the ALOHAnet protocol was going to need to tell competing signals how to give each other space, how to yield and make way for one another.

Page: 263, Location: 4031-4031


The first thing that the senders need to do here is what’s called “breaking symmetry.” As any sidewalk pedestrian knows, dodging right as an oncoming walker dodges left, and then having both of you simultaneously dodge back the other way, doesn’t solve anything.

Page: 263, Location: 4032-4034


As one influential paper puts it, “For a transport endpoint embedded in a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations, only one scheme has any hope of working—exponential backoff.”

Page: 265, Location: 4050-4052


Beyond just collision avoidance, Exponential Backoff has become the default way of handling almost all cases of networking failure or unreliability. For instance, when your computer is trying to reach a website that appears to be down, it uses Exponential Backoff—trying again one second later, again a few seconds after that, and so forth. This is good for everyone: it prevents a host server that’s down from getting slammed with requests as soon as it comes back online, and it prevents your own machine from wasting too much effort trying to get blood from a stone.

Page: 265, Location: 4052-4056
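
The pattern is easy to carry around. A minimal sketch of a backoff retry loop, assuming `attempt` is any zero-argument callable that returns True on success (the names and parameters are mine, not a quoted implementation):

```python
import random
import time

# Minimal Exponential Backoff retry loop (my sketch). The wait ceiling doubles
# after each failure; the random jitter breaks symmetry, so two colliding
# senders are unlikely to keep retrying in lockstep.
def call_with_backoff(attempt, max_tries=6, base=1.0):
    for n in range(max_tries):
        if attempt():
            return True
        time.sleep(random.uniform(0, base * 2 ** n))
    return False
```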


The first efforts at computer networking focused on establishing reliable transmissions over unreliable links. These efforts proved to be so successful that a second concern immediately arose: making sure that an overloaded network could avoid catastrophic meltdown. No sooner had TCP solved the problem of getting data from point A to point B than it was confronted with the problem of gridlock.

Page: 267, Location: 4088-4091


At the heart of TCP congestion control is an algorithm called Additive Increase, Multiplicative Decrease, or AIMD. Before AIMD kicks in, a new connection will ramp up its transmission rate aggressively: if the first packet is received successfully it sends out two more, if both of those get through it sends out a batch of four, and so on. But as soon as any packet’s ACK does not come back to the sender, the AIMD algorithm takes over. Under AIMD, any fully received batch of packets causes the number of packets in flight not to double but merely to increase by 1, and dropped packets cause the transmission rate to cut back by half (hence the name Additive Increase, Multiplicative Decrease). Essentially, AIMD takes the form of someone saying, “A little more, a little more, a little more, whoa, too much, cut way back, okay a little more, a little more…” Thus it leads to a characteristic bandwidth shape known as the “TCP sawtooth”—steady upward climbs punctuated by steep drops.

Page: 268, Location: 4107-4114
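
A toy trace makes the sawtooth visible. This sketch uses my own parameters and skips the initial slow-start doubling the passage mentions:

```python
import random

# Toy AIMD trace (my parameters; slow start omitted). "window" is the number
# of packets in flight: a clean round adds one, a drop cuts it in half.
random.seed(1)
window, trace = 1.0, []
for _ in range(60):
    trace.append(round(window))
    if random.random() < 0.1:          # pretend ~10% of rounds drop a packet
        window = max(1.0, window / 2)  # multiplicative decrease
    else:
        window += 1                    # additive increase
print(trace)                           # steady climbs, punctuated by halvings
```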


And it may be that human communications themselves mirror the very protocols that transmit them: every text message or email reply encourages yet another, while every unreturned message stanches the flow.

Page: 270, Location: 4131-4132


Some fifty years before Peter’s formulation, Spanish philosopher José Ortega y Gasset in 1910 voiced the same sentiment. “Every public servant should be demoted to the immediately lower rank,” he wrote, “because they were advanced until they became incompetent.”

Page: 270, Location: 4139-4141


Some organizations have attempted to remediate the Peter Principle by simply firing employees who don’t advance. The so-called Cravath System, devised by leading law firm Cravath, Swaine & Moore, involves hiring almost exclusively recent graduates, placing them into the bottom ranks, and then routinely either promoting or firing them over the following years. In 1980, the US Armed Forces adopted a similar “up or out” policy with the Defense Officer Personnel Management Act. The United Kingdom has likewise pursued what they call “manning control,” to great controversy.

Page: 271, Location: 4141-4145


As Laurence J. Peter saw it, the insidious Peter Principle arises in corporations because of “the first commandment of hierarchical life: the hierarchy must be preserved.” TCP, in contrast, teaches the virtues of flexibility. Companies speak of “flat” hierarchies and “tall” hierarchies, but they might consider speaking of dynamic ones. Under an AIMD system, no one is long anxious about being overtaxed, nor long resentful about a missed promotion; both are temporary and frequent correctives, and the system hovers near its equilibrium despite everything changing all the time. Perhaps one day we’ll speak not of the arc of one’s career, but rather of its sawtooth.

Page: 271, Location: 4155-4160


In TCP, as we’ve seen, there’s no such thing as a one-way transmission: without consistent feedback, the sender will slow down almost immediately.

Page: 272, Location: 4163-4164


As linguist Victor Yngve would write in 1970, “In fact, both the person who has the turn and his partner are simultaneously engaged in both speaking and listening. This is because of the existence of what I call the back channel, over which the person who has the turn receives short messages such as ‘yes’ and ‘uh-huh’ without relinquishing the turn.”

Page: 272, Location: 4170-4172


We’ve all had the experience of talking to someone whose eyes drifted away—to their phone, perhaps—making us wonder whether our lackluster storytelling was to blame. In fact, it’s now clear that the cause and effect are often the reverse: a poor listener destroys the tale.

Page: 273, Location: 4179-4181


As the saying goes, “the most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’ but ‘That’s funny.’”

Page: 274, Location: 4200-4201


A buffer is essentially a queue whose function is to smooth out bursts. If you walked into a doughnut shop at roughly the same time as another customer, it wouldn’t do for the very momentarily overwhelmed cashier to make one of you leave the store and come back another time. Customers wouldn’t have it, of course, but neither would management: such a policy is virtually guaranteed to underutilize the cashier. Putting the customers in a queue instead ensures that the average throughput of the store approaches its maximum throughput. That’s a good thing.

Page: 275, Location: 4208-4213
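
A tiny simulation of that doughnut-shop queue (the numbers are mine): arrivals come in bursts that average one customer per tick, and the cashier serves one customer per tick.

```python
import random

# Toy single-cashier queue (my numbers): bursty arrivals averaging one
# customer per tick, service of one customer per tick.
random.seed(2)
queue = served = waiting = 0
for tick in range(1000):
    queue += random.choice((0, 0, 3))  # bursts of three; most ticks are quiet
    if queue:
        queue -= 1                     # the cashier serves one customer
        served += 1
    waiting += queue                   # everyone still in line waits a tick
print(f"throughput: {served / 1000:.2f}/tick, mean queue: {waiting / 1000:.1f}")
```

Throughput comes out near the cashier's maximum, and the cost shows up as a standing line, which is exactly the next highlight's point.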


This superior resource utilization comes with a very real cost, however: delay.

Page: 275, Location: 4213-4213


It was like trying to have a conversation where every time you say “uh-huh” it is delayed by ten or twenty seconds. The speaker is going to slow way down, assuming you aren’t comprehending them, and there’s nothing you can do about it.

Page: 276, Location: 4227-4228


Dropped packets are the Internet’s primary feedback mechanism.

Page: 277, Location: 4233-4234


Fundamentally, buffers use delay—known in networking as “latency”—in order to maximize throughput. That is, they cause packets (or customers) to wait, to take advantage of later periods when things are slow.

Page: 277, Location: 4236-4237


One of the fundamental principles of buffers, be they for packets or patrons, is that they only work correctly when they are routinely zeroed out.

Page: 277, Location: 4240-4241


Photons that miss the retina aren’t queued for later viewing. In real life, packet loss is almost total.

Page: 278, Location: 4262-4262


The most prevalent critique of modern communications is that we are “always connected.” But the problem isn’t that we’re always connected; we’re not. The problem is that we’re always buffered.

Page: 279, Location: 4264-4265


It used to be that people knocked on your door, got no response, and went away. Now they’re effectively waiting in line when you come home.

Page: 279, Location: 4268-4269


We used to request dedicated circuits with others; now we send them packets and wait expectantly for ACKs. We used to reject; now we defer.

Page: 279, Location: 4275-4277


The much-lamented “lack of idleness” one reads about is, perversely, the primary feature of buffers: to bring average throughput up to peak throughput. Preventing idleness is what they do. You check email from the road, from vacation, on the toilet, in the middle of the night. You are never, ever bored. This is the mixed blessing of buffers, operating as advertised.

Page: 279, Location: 4277-4280


in fact, gaming is so sensitive to latency that all important gaming honors are still contested in person, with players boarding airplanes to gather and compete over a network serving just a single room.

Page: 281, Location: 4294-4295


And indeed, bringing latencies down is one of the current frontiers of networking research, and it will be interesting to see what that brings.

Page: 281, Location: 4305-4306


Schoolchildren are taught to conceive of literary plots as belonging to one of several categories: man vs. nature, man vs. self, man vs. man, man vs. society. Thus far in this book we have considered primarily cases in the first two categories—that is to say, computer science has thus far been our guide to problems created by the fundamental structure of the world, and by our limited capacities for processing information.

Page: 282, Location: 4324-4327


Arguably the most influential economist of the twentieth century, John Maynard Keynes, once said that “successful investing is anticipating the anticipations of others.”

Page: 283, Location: 4337-4338


In this way, the value of a stock isn’t what people think it’s worth but what people think people think it’s worth.

Page: 284, Location: 4340-4341


As Alan Turing proved in 1936, a computer program can never tell you for sure whether another program might end up calculating forever without end—except by simulating the operation of that program and thus potentially going off the deep end itself. (Accordingly, programmers will never have automated tools that can tell them whether their software will freeze.)

Page: 284, Location: 4350-4352
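
The shape of Turing's argument fits in a few lines. A sketch, where `halts` is the hypothetical oracle and `trouble` is the standard diagonal construction (illustration only; nothing here performs a real analysis):

```python
# Illustration only: 'halts' is the hypothetical oracle Turing ruled out,
# and 'trouble' is the standard diagonal construction that defeats it.
def halts(program, argument):
    """Hypothetical: True iff program(argument) would eventually halt."""
    raise NotImplementedError  # no total, always-correct version can exist

def trouble(program):
    if halts(program, program):
        while True:       # told it halts? then loop forever
            pass
    return "done"         # told it loops? then halt immediately

# trouble(trouble) halts exactly when halts() says it doesn't: contradiction.
```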


“I don’t know if this is an actual game-theory term,” says the world’s top-rated poker player, Dan Smith, “but poker players call it ‘leveling.’ Level one is ‘I know.’ Two is ‘you know that I know.’ Three, ‘I know that you know that I know.’”

Page: 285, Location: 4359-4361


Reaching Equilibrium

Page: 286, Location: 4383-4383


Mathematicians analyzing these games seek to identify a so-called equilibrium: that is, a set of strategies that both players can follow such that neither player would want to change their own play, given the play of their opponent. It’s called an equilibrium because it’s stable—no amount of further reflection by either player will bring them to different choices. I’m content with my strategy, given yours, and you’re content with your strategy, given mine.

Page: 287, Location: 4387-4391


the mathematician John Nash proved in 1951 that every two-player game has at least one equilibrium. This major discovery would earn Nash the Nobel Prize in Economics in 1994 (and lead to the book and film A Beautiful Mind, about Nash’s life). Such an equilibrium is now often spoken of as the “Nash equilibrium”—the “Nash” that Dan Smith always tries to keep track of.

Page: 287, Location: 4400-4403


In a game-theory context, knowing that an equilibrium exists doesn’t actually tell us what it is—or how to get there. As UC Berkeley computer scientist Christos Papadimitriou writes, game theory “predicts the agents’ equilibrium behavior typically with no regard to the ways in which such a state will be reached—a consideration that would be a computer scientist’s foremost concern.”

Page: 288, Location: 4414-4417


And so, the original field of game theory begat algorithmic game theory.

Page: 289, Location: 4418-4419


MIT’s Scott Aaronson agrees. “In my opinion,” he says, “if the theorem that Nash equilibria exist is considered relevant to debates about (say) free markets versus government intervention, then the theorem that finding those equilibria is [intractable] should be considered relevant also.”

Page: 289, Location: 4429-4431


Even when we can reach an equilibrium, just because it’s stable doesn’t make it good. It may seem paradoxical, but the equilibrium strategy—where neither player is willing to change tack—is by no means necessarily the strategy that leads to the best outcomes for the players. Nowhere is that better illustrated than in game theory’s most famous, provocative, and controversial two-player game: “the prisoner’s dilemma.”

Page: 290, Location: 4433-4437


Here’s the problem. No matter what your accomplice does, it’s always better for you to defect.

Page: 290, Location: 4444-4444


This has emerged as one of the major insights of traditional game theory: the equilibrium for a set of players, all acting rationally in their own interest, may not be the outcome that is actually best for those players.

Page: 291, Location: 4454-4455
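
Both ideas, finding an equilibrium by mutual best response and seeing that it can be collectively lousy, fit in a brute-force check of the prisoner's dilemma. The payoffs below are the standard textbook years in prison (my choice of numbers, lower is better):

```python
# Brute-force search for pure-strategy Nash equilibria in the prisoner's
# dilemma. Strategy 0 = stay silent, 1 = defect (testify).
payoff = {
    (0, 0): (1, 1),    # both silent
    (0, 1): (10, 0),   # I stay silent, you testify
    (1, 0): (0, 10),
    (1, 1): (8, 8),    # both testify
}

def years(player, a, b):
    return payoff[(a, b)][player]

def is_equilibrium(a, b):
    # Neither player can cut their own sentence by switching unilaterally.
    row_ok = all(years(0, a, b) <= years(0, a2, b) for a2 in (0, 1))
    col_ok = all(years(1, a, b) <= years(1, a, b2) for b2 in (0, 1))
    return row_ok and col_ok

print([cell for cell in payoff if is_equilibrium(*cell)])
# -> [(1, 1)]: mutual defection is the only equilibrium, even though
#    (0, 0) would leave both players strictly better off.
```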


Surprisingly, Tim Roughgarden and Cornell’s Éva Tardos proved in 2002 that the “selfish routing” approach has a price of anarchy that’s a mere 4/3. That is, a free-for-all is only 33% worse than perfect top-down coordination.

Page: 292, Location: 4465-4466
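
The standard illustration behind that 4/3 figure is Pigou's two-road network (a textbook example, not the book's own numbers): one road always takes an hour, the other takes x hours when a fraction x of the traffic uses it.

```python
# Pigou's example. One unit of traffic splits between road A (always 1 hour)
# and road B (x hours when a fraction x of drivers use it).
def total_cost(x):                     # x = fraction of traffic on road B
    return (1 - x) * 1 + x * x

# Selfish routing: road B costs x <= 1, never worse than A, so every driver
# takes B and the average trip costs a full hour.
selfish = total_cost(1.0)

# Coordinated optimum: grid-search the split (calculus gives x = 1/2).
optimal = min(total_cost(x / 1000) for x in range(1001))   # 0.75

print(selfish / optimal)   # 1.333..., i.e. the 4/3 price of anarchy
```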


Every corporation (and, to some degree, every nation) is better off being a bit more reckless than their peers for the sake of competitive advantage. Yet if they all act more recklessly, it leads to a ravaged Earth, and all for nothing: there’s no economic advantage for anyone relative to where they started.

Page: 294, Location: 4494-4496


Imagine two shopkeepers in a small town. Each of them can choose either to stay open seven days a week or to be open only six days a week, taking Sunday off to relax with their friends and family. If both of them take a day off, they’ll retain their existing market share and experience less stress. However, if one shopkeeper decides to open his shop seven days a week, he’ll draw extra customers—taking them away from his competitor and threatening his livelihood. The Nash equilibrium, again, is for everyone to work all the time.

Page: 295, Location: 4511-4515


either the two-party prisoner’s dilemma, or the multi-party tragedy of the commons?

Page: 295, Location: 4520-4520


This brings us to a branch of game theory known as “mechanism design.” While game theory asks what behavior will emerge given a set of rules, mechanism design (sometimes called “reverse game theory”) works in the other direction, asking: what rules will give us the behavior we want to see?

Page: 296, Location: 4532-4534


For the small-town shopkeepers, a verbal truce to take Sundays off would be unstable: as soon as either shopkeeper needed some extra cash he’d be liable to violate it, prompting the other to start working Sundays as well so as not to lose market share. This would land them right back in the bad equilibrium where they get the worst of both worlds—they’re exhausted and don’t get any competitive advantage for it. But they might be able to act as their own don by signing a legally binding contract to the effect that, say, any proceeds earned by either shop on a Sunday go to the other shop. By worsening the unsatisfactory equilibrium, they’d make a new and better one.

Page: 297, Location: 4543-4547
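
This is mechanism design in miniature: the contract doesn't change anyone's rationality, it changes the payoffs. A sketch with made-up numbers chosen to match the story:

```python
# The shopkeepers' game before and after the contract (payoff numbers are
# mine; higher is better). Strategy 0 = take Sundays off, 1 = open seven days.
before = {
    (0, 0): (6, 6),   # both rest: shared market, low stress
    (0, 1): (3, 7),   # only my rival opens Sundays: he poaches my customers
    (1, 0): (7, 3),
    (1, 1): (4, 4),   # both open: exhausted, market share unchanged
}

# The contract hands Sunday proceeds to the *other* shop, so the temptation
# payoffs swap: opening alone now rewards your rival, not you.
after = {**before, (0, 1): (7, 3), (1, 0): (3, 7)}

def equilibria(game):
    def happy(player, a, b):  # no profitable unilateral deviation?
        mine = game[(a, b)][player]
        alts = [(s, b) for s in (0, 1)] if player == 0 else [(a, s) for s in (0, 1)]
        return all(mine >= game[cell][player] for cell in alts)
    return [(a, b) for (a, b) in game if happy(0, a, b) and happy(1, a, b)]

print("before:", equilibria(before))  # [(1, 1)]: everyone works Sundays
print("after: ", equilibria(after))   # [(0, 0)]: resting is now stable
```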


Scaling up this logic results in a potent argument for the role of government. In fact, many governments do have laws on the books mandating minimum vacations and limiting shop hours. And while the United States is one of the only developed nations without federal requirements for paid vacation, Massachusetts, Maine, and Rhode Island do have state-level prohibitions on Thanksgiving commerce.

Page: 298, Location: 4567-4569


God happens to be even better than government in this respect, since omniscience and omnipotence provide a particularly strong guarantee that taking bad actions will have dire consequences. It turns out there’s no Godfather quite like God the Father.

Page: 299, Location: 4574-4576


Mechanism Design by Evolution

Page: 299, Location: 4579-4579


The heart has its reasons which reason knows nothing of. —BLAISE PASCAL

Page: 299, Location: 4583-4584


Whether prompted by a shoddy business or a petty thief, outrage can override rationality. And in these instances, it may be that the hand of evolution has done what it would otherwise have taken an authority outside the game to accomplish.

Page: 301, Location: 4611-4613


Emotion, for the bitter, retaliatory consumer and for the convenience-store hero alike, is our own species taking over the controls for a minute. “Morality is herd instinct in the individual,” wrote Nietzsche.

Page: 302, Location: 4617-4618


As Cornell economist Robert Frank puts it, “If people expect us to respond irrationally to the theft of our property, we will seldom need to, because it will not be in their interests to steal it. Being predisposed to respond irrationally serves much better here than being guided only by material self-interest.”

Page: 302, Location: 4621-4624


In both cases this so-called commitment problem can be at least partially addressed by a contract. But game theory suggests that in the case of dating, the voluntary bonds of the law are less relevant to an enduring partnership than the involuntary bonds of love itself.

Page: 303, Location: 4633-4635


As Robert Frank puts it, “The worry that people will leave relationships because it may later become rational for them to do so is largely erased if it is not rational assessment that binds them in the first place.”

Page: 303, Location: 4635-4636


Yes, people search for objective characteristics they care about. Everybody wants somebody who’s kind and intelligent and interesting and healthy and maybe physically attractive, good earning power, the whole laundry list of features, but that’s the first pass.… After you’ve spent enough time together, it’s not those things that make you want to stay together. It’s just the fact that it’s that particular person—that is what’s valuable to you, so you don’t really need the contract so much as you need a feeling that makes you not want to separate, even though objectively there might be a better option available to you.

Page: 303, Location: 4637-4641


Playwright George Bernard Shaw once wrote of marriage that “If the prisoner is happy, why lock him in? If he is not, why pretend that he is?” Game theory offers a subtle answer to this particular riddle. Happiness is the lock.

Page: 303, Location: 4643-4645


One of the simplest auction formats has each participant write down their bid in secret, and the one whose bid is highest wins the item for whatever price they wrote down. This is known as a “sealed-bid first-price auction,”

Page: 305, Location: 4672-4673


Another classic auction format, the “Dutch auction” or “descending auction,” gradually lowers an item’s price until someone is willing to buy it. The name references the Aalsmeer Flower Auction, the largest flower auction in the world, which takes place daily in the Netherlands—but Dutch auctions are more prevalent than they might initially seem.

Page: 306, Location: 4679-4681


The inverse of a Dutch or descending auction is what’s known as an “English auction” or “ascending auction”—the most familiar auction format. In an English auction, bidders alternate raising the price until all but one of them drop out.

Page: 306, Location: 4686-4688
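
A toy ascending auction shows the format's signature: the strongest bidder wins, but the price stops near the runner-up's limit (the simulation and numbers are mine):

```python
# Toy English (ascending) auction: bidders stay in while the next price is
# still at or below their private valuation.
def english_auction(valuations, increment=1):
    price, active = 0, list(range(len(valuations)))
    while len(active) > 1:
        still_in = [i for i in active if valuations[i] >= price + increment]
        if not still_in:          # everyone balked at once; last price stands
            break
        if len(still_in) > 1:
            price += increment
        active = still_in
    return active[0], price       # (winner's index, final price)

print(english_auction([60, 85, 100]))  # -> (2, 85): the runner-up's limit
```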


It’s easy to imagine a bunch of people all going over a cliff together because “everyone else” was acting as though it’d all be fine—when in reality each person had qualms, but suppressed them because of the apparent confidence of everyone else in the group.

Page: 307, Location: 4700-4702


Investors are said to fall into two broad camps: “fundamental” investors, who trade on what they perceive as the underlying value of a company, and “technical” investors, who trade on the fluctuations of the market.

Page: 309, Location: 4730-4731


For one, be wary of cases where public information seems to exceed private information, where you know more about what people are doing than why they’re doing it, where you’re more concerned with your judgments fitting the consensus than fitting the facts. When you’re mostly looking to others to set a course, they may well be looking right back at you to do the same.

Page: 309, Location: 4737-4740


Second, remember that actions are not beliefs; cascades get caused in part when we misinterpret what others think based on what they do. We should be especially hesitant to overrule our own doubts—and if we do, we might want to find some way to broadcast those doubts even as we move forward, lest others fail to distinguish the reluctance in our minds from the implied enthusiasm in our actions.

Page: 310, Location: 4740-4743


Last, we should remember from the prisoner’s dilemma that sometimes a game can have irredeemably lousy rules. There may be nothing we can do once we’re in it, but the theory of information cascades may help us to avoid such a game in the first place.

Page: 310, Location: 4743-4744


And to an algorithmic game theorist in particular, one property especially stands out: the participants are incentivized to be honest.

Page: 311, Location: 4759-4760


French existentialist philosopher Jean-Paul Sartre famously wrote that “Hell is other people.” He didn’t mean that others are inherently malicious or unpleasant, but rather that they complicate our own thoughts and beliefs:

Page: 313, Location: 4796-4798


The road to hell is paved with intractable recursions, bad equilibria, and information cascades. Seek out games where honesty is the dominant strategy. Then just be yourself.

Page: 314, Location: 4807-4808


Binmore adds another insight: games like the prisoner’s dilemma seemingly obliterate Immanuel Kant’s argument that rationality consists of what he called the “categorical imperative,” acting the way you wish everyone else would act. The categorical imperative would give us a better outcome in the prisoner’s dilemma than the equilibrium strategy, but there’s no getting around the fact that this outcome isn’t a stable one.

Page: 315, Location: 4819-4822