Monday, December 9, 2013

Verification of Zipf's Law for Cities in Iran

Zipf's law describes a simple relationship between the size (or population) of a set of N items and their rank. The original study by Zipf in 1935* proposed that the frequency of any word in a natural language is inversely proportional to its rank in the frequency table. Before Zipf, others including Auerbach (1913)** had proposed a similar law for cities: the size distribution of cities in a country can be approximated by a Pareto distribution. In other words, if you list the cities of a country and rank them by population, the population of each city is inversely related to its rank. If you're interested in learning more, see the following articles:
Jiang B, Jia T (2011). Zipf’s Law for All the Natural Cities in the United States: A Geospatial Perspective. International Journal of Geographical Information Science, 25(8), pp. 1269-1281.
Cristelli M, Batty M, Pietronero L (2012). There is more than a power law in Zipf. Scientific Reports, 2, pp. 1-7.
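For readers who prefer symbols, the rank-size relationship described above is commonly written as a power law. The notation here is my own shorthand rather than anything taken from the papers: P(r) is the population of the city with rank r, C is a constant (roughly the size of the largest city), and alpha is the exponent, which is close to 1 in the classic form of the law.

```latex
P(r) = \frac{C}{r^{\alpha}}
\quad\Longleftrightarrow\quad
\log P(r) = \log C - \alpha \log r
```

The second form is why a log-log plot is the usual check: if the law holds, the points fall roughly on a straight line with slope -alpha.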
I decided to check whether Zipf's law holds for cities in Iran (my birth country). The following bar chart shows the 20 largest cities in Iran, sorted from largest to smallest, based on population data from 2006. Tehran has the largest population, at nearly 8 million, followed by Mashhad (my hometown), Isfahan, Tabriz, Karaj, and Shiraz.

Now let's plot log(population) against log(rank). The results imply that Zipf's law holds (approximately) for these cities (R-squared = 0.9752). One could therefore estimate the population of a city from its rank in the country, or vice versa. Note that running a simple regression to get the coefficients of Zipf's law is not strictly correct. Better methods for estimating the Zipf coefficients exist in the literature, but I do not discuss them in this post. In my opinion, the regression performed here gives a reasonable approximation.

The underlying mechanism of Zipf's law is not yet fully understood, especially in the context of cities. It would be interesting to see how this plot has evolved over time as cities shift in rank and their populations grow or decline. Does Zipf's law hold for other self-formed communities (e.g. at the neighborhood level)? And most importantly, why do we see what we see here? Honestly, I am a little skeptical about Zipf's law and its use in predicting city populations. I think there is something there that we're missing. The recent paper by Cristelli et al. (listed above), published in Nature's Scientific Reports, sheds some light on it.
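For anyone who wants to try the same check, here is a minimal sketch of the log-log regression described above, written in plain Python. It is not the exact procedure or data behind the chart in this post: the function name fit_zipf is made up for illustration, and the input is synthetic (an exact 1/rank series with a hypothetical top value of 8,000,000), not the 2006 census figures. As noted above, ordinary least squares is only a rough way to estimate the Zipf exponent.

```python
import math


def fit_zipf(populations):
    """Fit log(population) = log_c - alpha * log(rank) by ordinary least squares.

    Returns (alpha, log_c, r_squared). Hypothetical helper for illustration.
    """
    # Rank 1 is the largest city, so sort from largest to smallest.
    sizes = sorted(populations, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(size) for size in sizes]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, intercept, r_squared


# Synthetic input (an assumption, not real data): 20 "cities" that follow
# an exact 1/rank law with a made-up largest value of 8,000,000.
synthetic = [8_000_000 / rank for rank in range(1, 21)]

alpha, log_c, r_squared = fit_zipf(synthetic)
print(f"alpha = {alpha:.4f}, C = {math.exp(log_c):.0f}, R^2 = {r_squared:.4f}")
```

With real city populations, alpha and R-squared will differ from the idealized values this synthetic series produces. Given a fit, the estimated population at rank r is math.exp(log_c) / r ** alpha.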
* Zipf G. K. (1935). The Psychobiology of Language. Houghton-Mifflin.
** Auerbach F. (1913). Das Gesetz der Bevölkerungskonzentration (The Law of Population Concentration). Petermanns Geographische Mitteilungen, 59, pp. 74–76.
