We started discussing work by Nicholas Christakis and James Fowler that first appeared in the New England Journal of Medicine in July 2007. The title of this paper is “The Spread of Obesity in a Large Social Network Over 32 Years.” Christakis and Fowler have followed this up with a series of papers claiming that other aspects of the human condition, such as happiness, loneliness, quitting addictive behaviors (e.g., smoking), and divorce, can also be shown to spread through networks of friends. Christakis and Fowler have received lots of press and recognition for this work. Some examples I am familiar with are:
- This TED talk by Christakis titled “The hidden influence of social networks.”
- This article in Wired magazine.
- This interview by Stephen Colbert!
There has been some pushback against the main conclusions that Christakis and Fowler derive from their data and analysis. For example, statisticians have argued that it may be impossible to distinguish an effect that has spread over the edges of the network from the possibility that individuals formed relationships (edges) with other individuals having similar habits or tastes. Social scientists call this latter tendency homophily, and some statisticians argue that what Christakis and Fowler are seeing in their data might just be homophily. For example, international students arriving at the University of Iowa tend to associate with other international students. We definitely don’t think of being “international” as a contagion that spreads on a network! A couple of somewhat technical articles by statisticians and a mathematician criticizing the Christakis-Fowler methods and conclusions are:
- “Homophily and Contagion Are Generically Confounded in Observational Social Network Studies” by Shalizi and Thomas. See this blog post by Shalizi for an informal description of the paper.
- “The Spread of Evidence-Poor Medicine via Flawed Social-Network Analysis” by Russell Lyons.
As you can tell from the snarky title of the latter article, Prof. Lyons is quite upset by what he claims is flawed analysis. This pushback has been picked up by the popular press as well. Here are some examples:
- “Catching Obesity From Friends May Not Be So Easy,” a 2011 article in NYT.
- “Disconnected?” a 2011 article in Slate.
Christakis and Fowler have since responded to these criticisms in a paper titled “Social Contagion Theory: Examining Dynamic Social Networks and Human Behavior,” which you can find under publications on Fowler’s webpage. I suspect this is not the last we will hear of this fascinating debate.
In class on Oct 23rd, two students were brave enough to show off their “Facebook report” from Wolfram Alpha. Thank you! In both cases, I was struck by how highly clustered the set of friends was. In other words, a large fraction of the friends were also friends of each other, implying a very high clustering coefficient for each of the two students.
After the “Facebook report” activity, I showed a demonstration of the preferential attachment model (also called the Barabási-Albert model) in NetLogo. Here is a pretty clear description of how the model works (copied from NetLogo’s page on preferential attachment):
The model starts with two nodes connected by an edge. At each step, a new node is added. A new node picks an existing node to connect to randomly, but with some bias. More specifically, a node’s chance of being selected is directly proportional to the number of connections it already has, or its “degree.” This is the mechanism which is called “preferential attachment.”
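To make the growth rule concrete, here is a minimal Python sketch of the mechanism quoted above. It is not NetLogo's code; it assumes, as the quoted description says, that each new node attaches to exactly one existing node, and the function names `preferential_attachment` and `degree_counts` are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n, seed=None):
    """Grow a graph on nodes 0..n-1 (n >= 2) by preferential attachment.

    Start with nodes 0 and 1 joined by an edge; every later node links to
    exactly one existing node, chosen with probability proportional to its
    current degree. Returns the list of edges.
    """
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Every node appears in `endpoints` once per edge it touches, so a
    # uniform pick from this list selects a node with probability
    # proportional to its degree.
    endpoints = [0, 1]
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degree_counts(edges):
    """Map each node to its degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


edges = preferential_attachment(1000, seed=1)
deg = degree_counts(edges)
# A handful of early nodes tend to become hubs ("rich get richer").
print("five largest degrees:", sorted(deg.values(), reverse=True)[:5])
```

If you replace `rng.choice(endpoints)` with `rng.randrange(new)` (a uniform choice among the existing nodes), the bias toward high-degree nodes disappears, which is a handy way to see how much the hubs owe to preferential attachment.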
Homework for Tuesday, Oct 30: Read “The Seventh Link” on “Rich Get Richer.” Come prepared to discuss this reading in class. Also, play with the NetLogo demonstration of the preferential attachment model and answer the following questions:
- Suppose you use the model to build a graph with 1000 nodes. What is the average degree of a node in this graph?
- What is the frequency of nodes of degree 1, degree 2, degree 3, and degree 4?
We discussed how some distributions (e.g., human heights, scores on standardized tests, etc.) have the “bell curve” shape (normal or Gaussian distribution), whereas others (e.g., populations of cities, sizes of earthquakes, etc.) have a very different, heavy-tailed shape. In such heavy-tailed distributions there are lots of items with tiny values and a few items with enormous values. The “heavy” (sometimes called “fat” or “long”) tail of the distribution refers to the fact that a small but not insignificant number of items in the distribution take on extremely large values relative to the mean. See these slides for more details.
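One way to feel the difference between the two shapes is to sample from each and compare the largest value to the mean. The Python sketch below uses the standard library's `random.gauss` for a bell curve and `random.paretovariate` for a heavy tail; the parameters (mean 170, standard deviation 10, Pareto shape 1.5) are illustrative choices, not fitted to any real data.

```python
import random
import statistics


def max_to_mean(samples):
    """How many times larger the biggest sample is than the average."""
    return max(samples) / statistics.mean(samples)


rng = random.Random(42)
n = 100_000
# Bell curve: e.g. heights in cm (illustrative parameters only).
heights = [rng.gauss(170, 10) for _ in range(n)]
# Heavy tail: Pareto with shape 1.5 (illustrative parameter only).
pareto = [rng.paretovariate(1.5) for _ in range(n)]

print(f"normal: max/mean = {max_to_mean(heights):.2f}")
print(f"pareto: max/mean = {max_to_mean(pareto):.2f}")
```

For the normal samples the ratio stays close to 1, because no one is many times taller than average; for the Pareto samples it is typically orders of magnitude larger, which is exactly the “few items with enormous values” behavior described above.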
The reason we are interested in shapes of distributions is that real-world networks seem to have heavy-tailed degree distributions. In other words, real-world networks seem to contain lots of nodes with very small degree, but also a small number of nodes (aka “hubs”) with very high degree. The Computational Epidemiology research group at Iowa has built “contact networks” for health-care workers at the University of Iowa Hospitals and Clinics (UIHC), and these networks show a clear heavy-tailed degree distribution. See these slides for a description of how these networks were built and for plots that clearly show the heavy-tailed shape of their degree distributions.
The fact that many real-world networks have heavy-tailed degree distributions means that random graph models of real-world networks need to be able to generate networks with such degree distributions. This is the main motivation for the preferential attachment model proposed by Barabási and Albert. More on this next week.
Homework (Due by e-mail on Friday, Oct 26th): Wolfram Alpha is a “computational engine” on the web that I use a fair bit. If you have not used it, you should spend some time playing with it! You can type “weather in Iowa” or “human height distribution” or “Solve x^2 + 9x = 12” or anything else that strikes your fancy and see what output is produced. You should also try clicking on “Examples” to read more about all the kinds of queries Wolfram Alpha is designed to answer.
Recently Wolfram Alpha added capabilities that allow it to do “personal analytics” using your Facebook data. See this blog post by Stephen Wolfram and this web page for more details. In Part I of the homework I would like you to use Wolfram Alpha to generate a “Facebook report” for yourself and take a close look at this report before class on Tuesday, Oct 23rd. In class, I would like to discuss your findings and I will also ask for volunteers to show their “Facebook report” in class (using my laptop). If you are uncomfortable doing this you can opt out by sending an e-mail before class. In Part II of the homework I want you to write a report (max: 2 typed pages) for the class on what you learned about your Facebook friends and the structure of your Facebook friends network from this report. Describe any aspects of the report you found surprising. Feel free to include plots/graphs produced by Wolfram Alpha in your report.
This is a much delayed post on our meeting from September 25th.
We examined the Watts-Strogatz random graph model in detail, based on its description in this paper. The Watts-Strogatz model starts with nodes distributed evenly on a circle (Question: is it important that the nodes be on a circle? Could other geometric configurations also work?). For some positive integer parameter k, each node is connected to its k nearest neighbors on either side. This yields a network with a high clustering coefficient (Why?), but also a large diameter and a large average path length. Then each original edge is “rewired” independently with some probability p. Note that in the initial network, each edge connects a pair of nodes that are relatively close to each other on the circle. “Rewiring” an edge keeps one of its endpoints and reconnects the other end to a node chosen uniformly at random (avoiding self-loops and duplicate edges), so a rewired edge typically becomes a long-range shortcut across the circle. When p is small only a few edges are rewired, and when p is close to 1 most edges are rewired. The main point of the Watts-Strogatz paper is that a very small p barely disturbs the clustering coefficient, which stays high. However, rewiring even a small number of edges causes the diameter and average path length to fall dramatically. Thus, by using a small p, we obtain a random network with a high clustering coefficient and a small average path length.
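Here is a minimal Python sketch of the construction described above, under a few stated assumptions: k counts the neighbors on *each* side (the Watts-Strogatz paper instead uses k for the total number of neighbors), edges are visited nearest neighbors first, then next-nearest, and so on, nodes with fewer than two neighbors contribute 0 to the clustering coefficient, and the average path length is taken only over pairs of nodes that can reach each other. All function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """n nodes on a circle, each joined to its k nearest neighbors on either side."""
    assert n > 2 * k, "need n > 2k so that lattice edges are distinct"
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def rewire(adj, k, p, seed=None):
    """Rewire each original lattice edge with probability p (modifies adj in place).

    Edge (v, v+j) keeps endpoint v; its other end moves to a uniformly random
    node w, skipping w == v and nodes already adjacent to v.
    """
    rng = random.Random(seed)
    n = len(adj)
    for j in range(1, k + 1):      # nearest neighbors first, then next-nearest, ...
        for v in range(n):         # one lap around the circle
            u = (v + j) % n        # this lattice edge is still present: only its own visit removes it
            if rng.random() < p:
                choices = [w for w in range(n) if w != v and w not in adj[v]]
                if choices:
                    w = rng.choice(choices)
                    adj[v].remove(u)
                    adj[u].remove(v)
                    adj[v].add(w)
                    adj[w].add(v)
    return adj


def clustering(adj):
    """Average over nodes of (linked pairs of neighbors) / (pairs of neighbors)."""
    total = 0.0
    for v, nbrs in adj.items():
        d = len(nbrs)
        if d < 2:
            continue  # assumed convention: such nodes contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += links / (d * (d - 1) / 2)
    return total / len(adj)


def avg_path_length(adj):
    """Mean BFS distance over ordered pairs of distinct, mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


n, k = 300, 3
for p in (0.0, 0.01, 0.1):
    g = rewire(ring_lattice(n, k), k, p, seed=1)
    print(f"p={p}: C={clustering(g):.3f}, L={avg_path_length(g):.2f}")
```

Because rewiring removes one edge and adds one, the number of edges never changes; only the long-range shortcuts differ between the three runs.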
Homework: Due Oct 16th.
- Set the number of nodes to the maximum possible (i.e., 300). What is the clustering coefficient and average-path-length for the network with 300 nodes before you do any rewiring?
- Using what you know about the definition of clustering coefficient, explain the value of the clustering coefficient you see.
- Try “rewiring” the network with probabilities p = 0.05 and 0.1. Report the clustering coefficient and average-path-length for these rewiring probabilities.
2. In 1-2 paragraphs, explain whether you think the Watts-Strogatz model is appropriate for real-world networks. In what ways does this model capture key aspects of how networks form and in what ways does it fail to do so? Do you think the Watts-Strogatz model is appropriate for the network consisting of you and your “friends” on Facebook?
Clustering in networks is a phenomenon that we all understand intuitively. Paraphrasing Barabási (Page 46), two of your friends are much more likely to know each other than a “gondolier from Venice and an Eskimo fisherman.” In other words, it is reasonable to expect that your friends would “cluster” into a few tightly connected groups.
This notion of clustering is made precise using the network measure called the clustering coefficient. Roughly speaking, the clustering coefficient of a node v is the fraction of pairs of friends of v that are themselves friends of each other. The clustering coefficient of a network is simply the average of the clustering coefficients of the nodes in the network. Here are slides in which I define the clustering coefficient of a node as well as the clustering coefficient of a network.
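Here is a small Python sketch of these two definitions, applied to a hypothetical four-person network in which A, B, and C are all friends with each other and D is friends only with C. A node with fewer than two friends has no pairs of friends, so its coefficient is undefined; the sketch assigns such nodes 0, which is one common convention (some tools instead leave them out of the average).

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves neighbors."""
    nbrs = adj[v]
    d = len(nbrs)
    if d < 2:
        return 0.0  # assumed convention for nodes with 0 or 1 neighbors
    linked = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return linked / (d * (d - 1) / 2)


def network_clustering(adj):
    """Average of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Hypothetical example network: triangle A-B-C plus D attached to C.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

for v in sorted(adj):
    print(v, local_clustering(adj, v))
print("network:", network_clustering(adj))
```

A and B each get 1, C gets 1/3 (of its three pairs of friends, only A and B are friends), D gets 0, and the network's clustering coefficient is (1 + 1 + 1/3 + 0)/4 = 7/12, about 0.58.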
Once we start focusing on clustering in social networks, it is easy to see the inadequacy of Erdős-Rényi random graphs as models of social networks. Going back to the earlier example, in an Erdős-Rényi random graph the gondolier from Venice and the Eskimo fisherman have the same probability of being friends as two of my friends. This problem with the Erdős-Rényi model was pointed out in a widely cited paper titled “Collective dynamics of ‘small-world’ networks” by Watts and Strogatz (Nature, Vol. 393, June 1998). The paper considers three empirical examples of small-world networks, (i) the film actors network, (ii) the power grid network, and (iii) the C. elegans neural network, and shows that in each of these networks the clustering coefficient is much larger than that of the corresponding Erdős-Rényi graph (i.e., one with the same number of nodes and the same expected number of edges); for the film actors network the gap is more than three orders of magnitude. It is also worth noting that Erdős-Rényi graphs do get the average path length roughly right; it is just that they significantly underestimate the clustering coefficient.
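A back-of-the-envelope calculation shows why the Erdős-Rényi clustering coefficient is so small. In an Erdős-Rényi graph on n nodes, every pair of nodes is linked independently with some probability q (written q here to avoid a clash with the rewiring probability p above), so two friends of a node are linked with probability q as well, regardless of the rest of the network. Since the average degree is $\bar{d} = q(n-1)$, the expected clustering coefficient is approximately

$$C_{\text{ER}} \approx q = \frac{\bar{d}}{n-1}.$$

For example, with $n = 10^6$ people who each have about $\bar{d} = 100$ friends, $C_{\text{ER}} \approx 100/999{,}999 \approx 10^{-4}$: only about 1 in 10,000 pairs of a person's friends would know each other, which is nothing like the clustering we see among real friends.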
Homework for next class (9/25): The main contribution of the Watts-Strogatz 1998 Nature paper is a simple random graph model, which we will call the Watts-Strogatz model, that has both the “small world” property and the high clustering property. Your homework for next week is to understand this model by:
- reading its description in the Watts-Strogatz Nature paper; see Figure 1 and its caption,
- reading Chapter 4 on “Small Worlds” from the Barabási book. Section 4 in this chapter provides a somewhat informal description of the Watts-Strogatz model.
An additional resource is this article at Scholarpedia.
The focus of this class was a discussion of Stanley Milgram’s experiment from 1967, which is widely credited with suggesting that the “social distance” between individuals is quite small. In other words, the “degree of separation” between individuals is small. There are many problems with the experiment, but it has been repeated a number of times with different parameters and in other settings and has produced similar results. For a recent example, look at this news item claiming that the degree of separation in Facebook is 4.74! In fact, if you want to participate in a current “small-world experiment”, you might want to enroll in Yahoo!’s small-world experiment.
These ideas depend on the notion of “distance” in a graph. Here are slides in which I define distance, diameter, and average path length in graphs. We spent a few minutes in class calculating these measures for a small example graph. My hope is that this will help you better understand some of your readings on “small world” networks.
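In a graph without edge weights, distances can be computed with breadth-first search (BFS) from each node. The Python sketch below computes the diameter (the largest distance between two nodes) and the average path length (the mean distance over all pairs of distinct nodes) for a hypothetical five-node graph, not the one from class. It assumes the graph is connected, so every distance is defined.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def diameter_and_apl(adj):
    """Diameter and average path length of a connected, undirected graph."""
    nodes = list(adj)
    longest, total, pairs = 0, 0, 0
    for i, s in enumerate(nodes):
        dist = bfs_distances(adj, s)
        for t in nodes[i + 1:]:  # each unordered pair counted once
            longest = max(longest, dist[t])
            total += dist[t]
            pairs += 1
    return longest, total / pairs


# Hypothetical example: path 1-2, triangle 2-3-4, and pendant 4-5.
edges = [(1, 2), (2, 3), (2, 4), (3, 4), (4, 5)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(diameter_and_apl(adj))  # (3, 1.6)
```

The diameter is 3 (from node 1 to node 5), and the ten pairwise distances add up to 16, giving an average path length of 1.6.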
Homework for next class (9/19): Read “The Third Link” on “Six Degrees of Separation” from the textbook. Be prepared to discuss your reading.