My research is on understanding the relationship between algorithms and information. Algorithms are the computational processes by which we process information and use it to make decisions. To illustrate the type of research I became excited about, I should first tell another story. My major at Cornell was math. I went to graduate school at MIT to study pure math. My PhD is in math. After my second year at MIT, I decided to take a summer job. The year was 1999. Boston was full of tech start-ups, many of them spin-offs from MIT. In particular there was a spin-off from my home department called Akamai Technologies, which has become one of the big players in internet services. It was founded by an MIT professor who became my thesis adviser.
The company provides a service that makes other websites more scalable, fast, and secure. They accomplish this with a massive distributed network of tens of thousands of servers around the world. (We have some of them on the Cornell campus, for example.) If you are an Akamai customer, the content on your website gets spread out across these thousands of locations so that when people download it, it comes to them from a computer that is much closer to them. If an attacker tries to take down your site, instead of taking down one computer, they have to take down 10,000. Similarly, a power failure in one place cannot take out a site that is distributed over a thousand locations.
I worked in a group called the network mapping group. If you have servers in thousands of locations around the world, and you are supposed to direct every user to the one that is best suited for serving their request, this is a gigantic optimization problem to solve—millions and millions of users and thousands of locations. My group, the mapping group, was in charge of supplying the input to that optimization problem. We wanted to be able to tell the decision-making apparatus the information it needed to know in order to make the best possible decisions.
Traditionally, one would take for granted that this information is easy to obtain and that the hard part is processing it and figuring out where each person should be routed. In reality, it is an incredibly tough engineering challenge to measure the relevant information for making this decision. Doing so exhaustively would require measuring the network conditions between each of our servers and every client. We would be performing billions of measurements every second, which is not technologically feasible.
What information do we actually need in order to make the best decisions, and what is a smart and scalable strategy for acquiring this information? This is what I mean when I talk about the relationship between algorithms and information. We live in an age where most of the algorithms that ordinary people care about would need, in order to make optimal decisions, more information than it is technologically feasible to obtain.