Diseases and Networks

We did a rapid survey of diseases and the role that networks play in the spread of diseases. We started with a brief discussion of the plagues that besieged Europe and parts of Asia and Africa in the middle ages (14th century). We then talked about John Snow and the 1854 Broad Street Cholera outbreak as an early example of modern methods applied to the study of disease spread. John Snow is now known as the “father” of epidemiology, which is the study of how diseases spread in a population. We then moved from 19th century England to 20th century US and the early history of HIV  in the US. In the early 80s, at a time when AIDS was referred to as the “gay cancer” there was a great deal of discrimination and political inaction associated with AIDS. We also talked about the first book on AIDS “And the Band Played On” by investigative reported Randy Shilts. Shilts himself died from AIDS a few years after the publication of his book. This book promoted the idea (based on a 1984 American Journal of Medicine article) that a French-Canadian Flight Attendant names Gaetan Dugas was “Patient Zero” for AIDS in the US. In other words, the claim was that Dugas had brought AIDS from Africa and then spread it to cities through the innumerable sexual partners he had. This theory has been questioned and in a 2007 article in the Proceedings of the National Academy of Science (PNAS), and it is claimed that AIDS moved from Africa to Haiti and then to the US in the 60s. After discussing AIDS, we also talked about a more recent disease-outbreak, namely the 2003 outbreak of SARS.
Our discussion then moved on to how epidemiologists (and mathematicians, statisticians, and computer scientists) model the spread of disease. I showed a game designed by Princeton University mathematician John Conway, called the Game of Life. Here are some applets that let you play the game and here is John Conway on youtube talking about the complexity that his simple game can lead to. Models for disease-spread are similar to Conway’s Game of Life, in that the disease-status of an individual in each time step depends on the disease-status of “neighboring” individuals. Here is an example of a  Disease spread simulation involving individuals that can, in each time step, be susceptible, infected, or recovered. Finally, we talked about this Disease spread strategy game.
Homework:
Describe a future (imagined) world-wide disease outbreak, providing as much detail as possible.  Include biological details such as whether the disease is viral or bacterial, how it spreads (cough/sneeze vs touch), whether it involves carriers such as rodents, etc. Also, describe a typical individual’s reaction to the infection, including whether the individual can be infectious before showing symptoms and if so for how long. Also, describe  how the disease affects the very young and the old.
Then describe where the disease starts and how it spreads across the world. Include the effects of various transportation networks such as airlines, cargo ships, etc., and also the highway system and rail networks within countries. Also, include the effects of social networks induced by colleges, schools, work places with large number of employees, etc.
In order to make your story as realistic as possible read up on the spread of HIV, SARS, etc. by following the links posted above. You might also want to watch the movie Contagion, that was released last year.
I would like your final description to be no more than 2 typed, single-spaced pages using a 11 pt font. On Tuesday, in class I would like each of you to share with the class a first draft of this description.

Is obesity contagious?

We started discussing work by Nicholas Christakis and James Fowler that first appeared in the New England Journal of Medicine in July 2007. The title of this paper is “The Spread of Obesity in a Large Social Network Over 32 Years.” Christakis and Fowler have followed this up with a series of papers claiming that other aspects of our human condition: happiness, loneliness, decrease in addictive behaviors (e.g., smoking), divorce, etc. can all be shown to spread through networks of friends. Christakis and Fowler have received lots of press and recognition for this work. Some examples I am familiar with are:

  • This TED talk by Christakis titled “The hidden influence of social networks.”
  • This article in the “Wired” magazine.
  • This interview by Stephen Colbert!

There has been some push back against the main conclusions that Christakis and Fowler derive from their data and analysis. For example, statisticians have argued that it may be impossible to distinguish between an effect that has spread over the edges of the network from the possibility that individuals may have formed relationships (edges) with other individuals having similar habits or tastes. This latter tendency is called homophily by social scientists and what some statisticians are arguing is that what Christakis and Fowler are seeing is their data might just be homophily. For example, international students arriving at the University of Iowa tend to associate with other international students. We definitely don’t think of being “international” as a contagion that spread on a network! A couple of somewhat technical articles by statisticians and a mathematician criticizing the Christakis-Fowler methods and conclusions are:

As you can tell from the snarky title of the latter article, Prof. Lyons is quite upset by what he claims is flawed analysis. This push back has been picked up by the popular press as well. Here are some examples:

Christakis and Fowler have since responded to these criticisms in a paper titled: “Social Contagion Theory: Examining Dynamic Social Networks and Human Behavior,” which you can find under publications at Fowler’s webpage. I suspect this is not the last we will hear of this fascinating debate.

Preferential Attachment

In class on Oct 23rd, two students were brave enough to show off their “Facebook report” from Wolfram Alpha. Thank you! In both cases, I was struck by how highly clustered the set of friends was. In other words, a large fraction of the friends were also friends of each other, implying a very high clustering coefficient for each of the two students.

After the “Facebook report” activity, I showed a demonstration of the preferential attachment model (also called the Barábasi-Albert model) in Netlogo. Here is a pretty clear description of how the model works (copied from Netlogo’s page on preferential attachment):

The model starts with two nodes connected by an edge. At each step, a new node is added. A new node picks an existing node to connect to randomly, but with some bias. More specifically, a node’s chance of being selected is directly proportional to the number of connections it already has, or its “degree.” This is the mechanism which is called “preferential attachment.”

Homework for Tuesday, Oct 30: Read “The Seventh Link” on “Rich Get Richer.” Come prepared to discuss this reading in class. Also, play with the Netlogo demonstration of the preferential attachment model and answer the following questions:

  • Suppose you use the model to build a graph with 1000 nodes. What is the average degree of a node in this graph?
  • What is the frequency of nodes of degree 1, degree 2, degree 3, and degree 4?

Heavy-Tailed Degree Distributions

We discussed how some distributions (e.g., human heights, scores on standardized tests, etc.) have the “bell curve” shape (normal or Gaussian distribution) whereas others (e.g., populations of cities, sizes of earthquakes, etc.) have a very different, heavy tailed shape. In such distributions there are lots of items with tiny values and a few items with enormous values. The “heavy” (sometimes called “fat” or “long”) tail of the distribution refers to the fact that a small, but not insignificant number of items in the distribution take on extremely large values relative to the mean. See these slides for more details.

The reason we are interested in shapes of distributions is that real-world networks seem to have a heavy-tailed degree distribution. In other words, real-world networks seem to contain lots of nodes with very small degree, but also a small number of nodes (aka “hubs”) with very high degree. The Computational Epidemiology research group at Iowa has built a “contact networks” for health-care workers at the  University of Iowa Hospitals and Clinics (UIHC) and these networks show a clear heavy-tailed degree distribution. See these slides for a description of how these networks were built and for plots that clear show the heavy-tailed shape of the degree distributions of these networks.

The fact that many real-world networks have heavy-tailed degree distributions means that random graph models for generating real-world networks need to able to generate nodes with these degree distributions. This is the main motivation for the preferential attachment models proposed by Barabasi and Albert. More on this next week.

Homework (Due by e-mail on Friday, Oct 26th): Wolfram Alpha is a “computational engine” on the web that I use a fair bit. If you have not used it you should spend some time playing with it! You can type “weather in Iowa” or “human height distribution” or “Solve x^2 + 9x = 12” or anything else that strikes your fancy and see what output is produced. You should also try clicking on “Examples” to read more about all the kinds the queries Wolfram Alpha is designed to answer.

Recently Wolfram Alpha added capabilities that allow it to do “personal analytics” using your Facebook data. See this blog post by Stephen Wolfram and this web page for more details. In Part I of the homework I would like you to use Wolfram Alpha to generate a “Facebook report” for yourself and take a close look at this report before class on Tuesday, Oct 23rd. In class, I would like to discuss your findings and I will also ask for volunteers to show their “Facebook report” in class (using my laptop). If you are uncomfortable doing this you can opt out by sending an e-mail before class. In Part II of the homework I want you to write a report (max: 2 typed pages) for the class on what you learned about your Facebook friends and the structure of your Facebook friends network from this report. Describe any aspects of the report you found surprising. Feel free to include plots/graphs produced by Wolfram Alpha in your report.

 

The Watts-Strogatz Model

This is a much delayed post on our meeting from September 25th.

We examined the Watts-Strogatz random graph model in detail, based on its description in this paper. The Watts-Strogatz model starts with nodes distributed evenly on a circle (Question: is it important that the nodes be on a circle? Could other geometric configurations also work?). For some positive integer parameter k, each node is connected to k nearest neighbors on either side. This yields a network with high clustering coefficient (Why?), but also high diameter and average path length. Then, with some probability p each original edge e is “rewired.” Note that in the initial network, each edge connects pairs of nodes that are relatively close to each other on the circle. The process of “rewiring” deletes an edge and replaces it with one having randomly chosen endpoints. When p is small only a few edges are “rewired” and when p is close to 1, most edges are “rewired.” The main point of the Watts-Strogatz paper is that using a very small p does not disturb the clustering coefficient very much and the clustering coefficient continues to stay high. However, rewiring even a small number of edges causes the diameter and average path length to dramatically fall. Thus by using a small p, we have constructed a random network with high clustering coefficient and small average path length.

Homework: Due Oct 16th.

1. Here is a “demo” of the Watts-Strogatz model from Netlogo. Play with this and answer the following questions (bring your answers to class).

  • Set the number of nodes to the maximum possible (i.e., 300). What is the clustering coefficient and average-path-length for the network with 300 nodes before you do any rewiring?
  • Using what you know about the definition of clustering coefficient, explain the value of the clustering coefficient you see.
  • Try “rewiring” the network with probabilities p = 0.05 and 0.1. Report the clustering coefficient and average-path-length for these rewiring probabilities.

2. In 1-2 paragraphs, explain whether you think the Watts-Strogatz model is appropriate for real-world networks. In what ways does this model capture key aspects of how networks form and in what ways does it fail to do so? Do  you think the Watts-Strogatz model is appropriate for the network consisting of you and your “friends” on Facebook?

Clustering in Networks

Clustering in networks is a phenomenon that we all understand intuitively. Paraphrasing Barabási (Page 46), two of your friends are much more likely to know each other than a “gondolier from Venice and an Eskimo fisherman.” In other words, it is reasonable to expect that your friends would “cluster” into a few tightly connected groups.

This notion of clustering is made precise using the network measure called clustering coefficient. Roughly speaking the clustering coefficient of a node v is the fraction of pairs of friends of v that are friends of each other. The clustering coefficient of a network is simply the average of the clustering coefficients of the nodes in the network. Here are slides in which I define the notion of clustering coefficient of a node as well as clustering coefficient of a network.

Once we start focusing on clustering in social networks, it is easy to see the inadequacy of Erdös-Rényi random graphs as models of social networks. Going back to the earlier example, in an Erdös-Rényi random graph the gondolier from Venice and the Eskimo fisherman have the same probability of being friends as two of my friends. This problem with the Erdös-Rényi model was pointed out in a widely cited paper titled “Collective dynamics of `small-world’ networks” by Watts and Strogatz (Nature, Vol 393, June 1998). The paper considers three empirical examples of small-world networks: (i) Films actors network, (ii) Power grid network, and (iii) C.elegans network and shows that in each of these networks the clustering coefficient is several orders of magnitude larger than the clustering coefficient of the corresponding (i.e., with the same size and expected number of edges) Erdös-Rényi graphs. Also, worth noting is that Erdös-Rényi graphs do get the average path length right, it is just that they significantly underestimate the clustering coefficient.

Homework for next class (9/25): The main contribution of the Watts-Strogatz 1998 Nature paper is a simple random graph model, which we will call the Watts-Strogatz model, that has both the “small world” property and the high clustering property. Your homework for next week is to understand this model by:

  • reading its description in the Watts-Strogatz Nature paper; see Figure 1 and its caption,
  • reading Chapter 4 on “Small Worlds” from the Barabási book. Section 4 in this chapter provides a somewhat informal description of the Watts-Strogatz model.

An additional resource is this article at Scholarpedia.

Degrees of Separation

The focus of this class was a discussion of Stanley Milgram’s Experiment from  1967 that is widely credited for suggesting that the “social distance” between individuals is quite small. In other words, the “degree of separation” between individuals is small. There are many problems with the experiment, but the experiment has been repeated a number of times with different parameters and in other settings and has produced similar results. For a recent example, look at this news item claiming that the the degree of separation in Facebook is 4.74! In fact, if you want to participate in a current “small-world experiment”, you might want to enroll in Yahoo!’s small-world experiment.

These ideas depend on the notion of “distance” in a graph. Here are slides in which I define distance, diameter, and average path length in graphs. We spent a few minutes in class calculating these measures for a small, example graph. My hope is that this will help you better understand some of your readings on “small world” networks.

Homework for next class (9/19): Read “The Third Link” on “Six Degrees of Separation” from the textbook. Be prepared to discuss your reading.

Giant Component Emergence in Erdös-Rényi graphs

I was hoping to discuss the notion of “small-world” networks, but we ended up spending most our time  discussing a couple of sentences from the “The Random Universe” (Chapter 2) in Barabási’s book. The sentences in question (bottom of Page 17) are:

But when you add enough links such that each node has an average of one link, a miracle happens: A unique giant cluster emerges.

We tried an activity in class to make this idea concrete. Notice that for an n-node Erdös-Rényi graph to have expected degree 1, p would have to be 1/(n-1). I asked each of you to build a 9-node Erdös-Rényi graph with p = 1/8. To simulate the random process of adding edges, you would need an 8-sided die and since I did not have any handy, I gave each of you a pre-generated sequence of 36 rolls of an 8-sided die. Why 36? This is because a 9-node graph has 9*8/2 = 36 distinct node-pairs. Here are the sequences I passed around and here is the Python program that I used to generate these sequences.

After completing this activity, students answered three questions:

  1. Is your graph connected?
  2. How many connected components does it have?
  3. What are the sizes of the connected components?

I polled a few (4) students and here are their answers:

  • No; 3; 6, 2, 1
  • No; 3; 7, 1, 1
  • No; 4; 5, 2, 1, 1
  • No; 4; 5, 2, 1, 1

Even with small values of n we start getting a glimpse of the “miracle” that Barabási is alluding to. In fact, with p fixed at 1/(n-1), as n grows, the “miracle” becomes more clearly visible as the graph essentially consists of one “giant” connected component along with a small number of tiny components.

Homework (due: 9/11)
Netlogo is a library of models from a variety of disciplines. I noticed that they had a model called Giant Component that turned out to be just what we were discussing in class this past Tuesday. Your homework for next Tuesday (9/11) is basically to play with the Giant Component model and report on your exploration.

Here is how you get started:

  • Follow the Giant Component link and then click on “Run Giant Component in your browser.”
  • Adjust the slider on “num-nodes” to a number you want
  • Click on “setup” and then click on “go” or “go once”
  • You can click on “setup” to restart the process

Your machine needs Java for the model to work in your browser. If you don’t have it, you can download Java here.

Play with the model until you understand all the controls and the answer the following questions:

  1. For number of nodes equal to (about) 250, what is the size of the giant component when the connections per node is (about) 1?
  2. For number of nodes equal to (about) 250, what does the number of connections per node have to become for the giant component to get to 90% of the network?

Since these are random experiments, the answers will vary from attempt to attempt. To make sure that your answers are “robust,” do the experiment 10 times and report the means and medians.

Erdös-Rényi Graphs

We started with a discussion of the early Christian Paul and “Mafia Boy” and the supposed connection between how Paul helped spread Christianity and how “Mafia Boy” launched a denial-of-service attack on Yahoo! in 2000. Barabasi uses the phrase “masters of the network” to describe both Paul and “Mafia Boy.” While this is an evocative phrase, I must say that I am skeptical of there being more than a superficial connection between the early spread of Christianity and this attack on Yahoo!’s servers. It is easy to imagine that there were external factors that had nothing to do with Paul’s “mastery” of his networks that had a bigger impact on the early spread of Christianity.

Our discussion then shifted to the collaboration of two Hungarian mathematicians, Paul Erdös and Alfréd Rényi, who studied mathematical properties of random graphs obtained by a simple random process in which pairs of nodes are connected by edges independently and with a fixed probability p. In our first class we created an Erdös-Rényi graph with nodes being the 17 students in class and pairs of students connected by edges (“hand shakes”) with probability p = 1/2. We looked at the degree distribution of this network and chatted about how this graph, and Erdös-Rényi graphs in general, are unlikely to contain nodes whose degrees are too high or too low, relative to the expected degree. This seems to make Erdös-Rényi graphs somewhat different from “real-world” networks in which it is common to have lots of nodes with very small degree, relative to the expected degree.

Slides from the class.

Homework for next class (9/4): Read Chapter 2 (“The Random Universe”) from the textbook.

First Class

After getting to know each other a little bit, we chatted about what a “network” means, at least in the context of the seminar. A couple of you are in the seminar hoping to learn more about computer networks. At least one of you is interested in the seminar because I mentioned the twitter network in the course description. All of these are valid reasons to be part of this group and to emphasize the rather broad scope of the seminar I showed 4 rather different examples of networks.

Then everyone got into the activity of creating a random student network for our class.There were 17 students in attendance and for each pair of students, one of the students in the pair tossed a coin to determine if the pair would shake hands. This resulted in plenty of hand shaking, which is of course perfect for the first day of class and it also resulted in a random graph that we will discuss and analyze in our next class.

Homework for next class (8/25):

  1. Get a copy of the course textbook by Barabasi and read “The First Link” (Chapter 1).
  2. Chapter 1 briefly describes the case of “Mafiaboy.” Read more about this case in this 2001 article in PC World.

Come prepared to discuss these readings in class. See you on Tuesday!