Scale Effects – What We Can Learn From National Football Teams (by Stephen Lewis)

What determines the quality of a national football team? Other things being equal, we would expect countries with a large population to produce stronger teams than those with a smaller population. They have more people to select from. It is therefore quite intuitive that football team quality must, to at least some extent, be positively impacted by population size.

This intuition seems to be borne out if we consider pairs of countries that have markedly different population sizes but are similar along other relevant dimensions. For example, take Italy and San Marino. Italy has a population of 60 million, while San Marino has a population of less than 50,000. The countries are otherwise (broadly) similar with respect to other factors that might determine football team quality, such as length of football tradition, the cultural significance of football, the relative popularity of alternative sports, climate, etc. Italy last played San Marino in 2017 and won 8-0 (having won all previous encounters on record). Results like this certainly cast doubt on any claim that there is no link between population and football team quality. There may even be a “minimum efficient scale” below which a national football team cannot credibly compete with leading football nations (and perhaps San Marino is below that scale).

But the question is how strong is the link between population size and football team quality and how small is any minimum efficient scale? Answer: surprisingly weak and surprisingly small. This is obvious from a cursory review of the international football landscape. The two most populous countries on the planet, China and India, have qualified for one World Cup between them (China in 2002). Meanwhile, Croatia has achieved an all-time FIFA ranking high of 3rd (in 1999) and reached the World Cup final in 2018. Croatia’s population is 4 million – smaller than that of the United Arab Emirates (10 million), which recently beat India 6-0.

Even ignoring the high-leverage outliers of India and China and considering clusters of countries in relatively close geographic proximity where football has a similar level of cultural significance, the effect of population on performance seems remarkably weak above a certain size. Uruguay (population 3.5 million, FIFA ranking 9) is a match for much larger Argentina (population 45 million, FIFA ranking 8), which in turn is a match for much larger Brazil (population 220 million, FIFA ranking 3). Similarly, Belgium (population 12 million, FIFA ranking 1) is evenly matched with France (population 65 million, FIFA ranking 2). Indeed, today’s top 10 ranked teams include four countries with populations of 12 million or fewer (Belgium, Portugal, Uruguay and Denmark), while Germany (population 84 million) for the time being languishes in position 12.
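For readers who like to see the numbers, here is a rough sketch of a rank correlation using only the six teams for which the paragraph above quotes both a population and a FIFA ranking. It is a hand-picked handful of countries, so it illustrates the point rather than proving anything; the helper names (`ranks`, `spearman`) are my own.

```python
# Figures quoted in the paragraph above: (population in millions, FIFA ranking).
# Portugal and Denmark are left out because the text gives no figures for them.
teams = {
    "Uruguay": (3.5, 9),
    "Argentina": (45, 8),
    "Brazil": (220, 3),
    "Belgium": (12, 1),
    "France": (65, 2),
    "Germany": (84, 12),
}


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes there are no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


populations = [pop for pop, _ in teams.values()]
fifa_positions = [pos for _, pos in teams.values()]

# A value near -1 would mean bigger countries hold better (lower) positions.
rho = spearman(populations, fifa_positions)
print(f"Spearman rho, population vs FIFA position: {rho:.2f}")
```

On these six teams the coefficient comes out at roughly 0.09, which is about as close to "no relationship" as such a small sample can get.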

And even amongst those countries with a very low population there are some standout national football teams, suggesting that if there is a minimum efficient scale, it may be very small indeed. With a population of around 300,000, Iceland knocked England (population 55 million) out of Euro 2016, and reached an impressive FIFA ranking of 18 in 2018.

Quantitative studies support the view that population has weak explanatory power for football team quality.

A 2010 PWC study performed a statistical analysis in which total World Cup points were regressed against population, average income levels and a count variable based on the number of times a country has hosted the competition (with values 0, 1 or 2). This included only 52 countries that have played at least 5 World Cup finals matches (so excluded China and India). Even among this football-playing-country sample, population is insignificant once these other variables are included.

Gelade (2007) finds that the relationship between FIFA ratings and (linear) total population is “vanishingly small”: in a sample of 204 countries, only 1% of the variation in FIFA ratings is explained by total population. The study notes that this counterintuitive finding has also been reported elsewhere.
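To put the 1% figure in perspective, the arithmetic can be sketched as follows. I am assuming a simple linear fit with population as the only explanatory variable, in which case the share of variation explained is just the square of the correlation coefficient; the variable names are illustrative.

```python
import math

# Share of variation in FIFA ratings explained by population, as quoted above.
r_squared = 0.01

# Assumption: one explanatory variable, so R-squared equals correlation squared.
implied_abs_correlation = math.sqrt(r_squared)
unexplained_share = 1 - r_squared

print(f"Implied correlation (absolute value): {implied_abs_correlation:.2f}")
print(f"Share of variation left unexplained: {unexplained_share:.0%}")
```

In other words, a correlation of about 0.1, with 99% of the variation in ratings left to be explained by something other than population.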

The discussion above has focused on the Men’s game but considering the relative performance of teams in Women’s football reinforces the idea that factors other than population size are important for explaining football team quality. For example, the US is ranked 1st in the Women’s FIFA ranking and 20th in the Men’s, whereas the comparative advantage arising from having a large total population to select from is equivalent for both the Men’s and Women’s teams.

Now imagine a strange parallel universe where the only two countries are Brazil and Australia. Brazil is 10 times bigger than Australia and consistently wins when they play football. In this parallel universe, researchers are tempted to conclude that the relationship between population and football team quality is very strong. Not only are there sound a priori grounds for believing a larger population should translate into better football team quality, but this seems to be borne out by the only two observations available. But this inference is not valid. Brazil and Australia differ along various dimensions that are critical determinants of football team quality, such as footballing tradition and competition for athletic talent from other sports (football is the national sport of Brazil but football in Australia has to compete with other ball sports such as cricket, Aussie rules, rugby league and rugby union). Of course, this would be obvious in a world with hundreds of observations available; far less so in our parallel universe with two.
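The trap can be made concrete with a deliberately extreme toy world. Everything in the sketch below is hypothetical: I assume team quality depends only on a "tradition" score and not at all on population, and I compare what two observations suggest with what a fuller set of countries shows.

```python
def team_quality(tradition):
    """Hypothetical rule: quality depends on tradition only, never on population."""
    return 10 * tradition


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# The two-country universe: the country that is 10 times bigger (one unit
# larger on a log10 scale) also happens to have the deeper tradition.
small = {"log10_population": 1, "tradition": 0}
large = {"log10_population": 2, "tradition": 2}
quality_gap = team_quality(large["tradition"]) - team_quality(small["tradition"])
naive_effect = quality_gap / (large["log10_population"] - small["log10_population"])

# A fuller world: every combination of population size and tradition.
world = [(log_pop, trad) for log_pop in (0, 1, 2) for trad in (0, 1, 2)]
corr = pearson([lp for lp, _ in world], [team_quality(t) for _, t in world])

print(f"Apparent quality gain per tenfold population (two countries): {naive_effect}")
print(f"Population-quality correlation across nine countries: {abs(corr):.2f}")
```

With only two observations the researcher "finds" a large population effect, even though by construction there is none; once tradition varies independently of population, the correlation is exactly zero.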

What has this got to do with online search engines?

I should start by making clear that I make no claim that the apparent weakness of population scale effects in national football has any bearing at all on the strength or otherwise of any scale effects affecting search engine quality. The lesson from the football analogy is that researchers could be fooled into thinking that they can see a strong scale effect if they compare a small number of subjects that differ in scale and quality and do not take account of other factors that also affect quality.

My claim is that when it comes to analysing the effect of scale on search quality, competition authorities have not got far beyond the following reasoning:

Query data is used to produce search results (people are used to produce football teams). More query data is better than less query data (more people to select from is better than fewer people to select from). Google has many times more queries than Bing (Brazil has many times more people than Australia). Google has much higher search quality than Bing (Brazil has a much better football team than Australia). Therefore, query scale is a crucial determinant of search quality (population is a crucial determinant of national football team quality).

Some competition authorities have gone deeper than others, for example, by examining query level datasets to gain a better understanding of differences in the range and volumes of the distinct queries each search engine sees. But a query level comparison of Google and Bing just confirms the obvious – Google has a scale advantage over Bing. This, entirely unsurprisingly, implies that for any given distinct query, Google is likely to receive higher query volumes than Bing. It follows that queries that are rare for Bing are not rare for Google, while the converse tends not to be true. But this just supports the existence of a scale advantage. It does not shed light on how this translates to quality and the relative importance of scale compared to other factors. This would be like a researcher going to some lengths to establish that not only does England have a higher population than Iceland, but also that for every left-footed person who can run fast (and who would therefore on paper make a good left wing back) in Iceland, there are 100 such individuals in England, and that for every tall agile person (who would on paper make a good goalkeeper) in Iceland, there are 100 in England. This deeper assessment of the nature of the scale advantage should not be confused with an assessment of the explanatory power of scale for performance.

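Yet the chain of reasoning set out above (more query data is better, Google has more of it, Google is better, therefore scale is crucial) is clearly faulty.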

Companies, much like countries, differ in their histories, cultures and priorities. Just as national football team quality may be better explained by length of football tradition, cultural factors and presence of competing sports than by population size, the quality of a company’s search engine may be better explained by length of time trying to make incremental improvements to search algorithms, the importance of experimentation and measurable improvement in a company’s culture, and the general strategic centrality of search to the company as a whole, which impacts among other things investment and hiring priorities.

These factors clearly cannot be assumed to be similar across Google and Microsoft. This means that the extent to which scale advantages drive quality requires some unpicking. But no competition authority to date has made a serious attempt to do this unpicking.

So why is Google better than Bing in a given national market for search, say, Belgium? Of course, data-scale could in principle be a factor that explains the difference in quality, and it could be an important factor. But there’s another plausible story: it is about how many engineering hours the company has poured into improving its search engine.

Google entered Belgium in March 2002, launching a localised version of its search engine with French and Dutch language capabilities. Bing entered Belgium in October 2013, over 11 years later. If search engine quality in Belgium is a function of how many Wednesday-morning-meetings search engineers have had to discuss improving search quality in Belgium, then Google might be better than Bing simply because its engineers have had about 600 more Wednesday-morning-meetings than Bing.
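The "about 600" figure is easy to check. The text gives only the month of each launch, so the sketch below assumes the first day of the month in both cases; the variable names are mine.

```python
from datetime import date

google_entry = date(2002, 3, 1)  # "March 2002"; the day of the month is an assumption
bing_entry = date(2013, 10, 1)   # "October 2013"; the day of the month is an assumption

head_start_days = (bing_entry - google_entry).days
head_start_weeks = head_start_days // 7  # one Wednesday-morning-meeting per week

print(f"Head start: {head_start_days} days, or {head_start_weeks} full weeks")
```

That gives 4,232 days, or 604 full weeks, so roughly 600 weekly meetings whichever days of the month the launches actually fell on.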

So there are competing theories as to why Google is better than Bing in Belgium – is it data or is it the number of Wednesday-morning-meetings? Both are consistent with a scale gap (under one theory the scale gap drives a quality difference and under the other it is caused by a quality difference). Analysis of the extent of the scale advantage, even when based on granular query level data, cannot distinguish between these two competing theories.

Indeed, trying to unpick which theory is more plausible (or how much weight to place on each) is an area where competition authorities have yet to really scratch the surface. They are still trying to make inferences on the importance of population for football team quality by comparing Brazil and Australia.

Written by Alfonso Lamadrid

21 June 2021 at 4:30 pm


2 Responses


I love the football analogy, but when it comes to search engines (or almost any other machine learning algorithm), the size of the data (number of inquiries and following answers) is a direct factor in the quality of the algorithm. The right analogy is the amount of training each football team receives. I agree it’s not the only factor, maybe not even the most important one, but the analogy to population size just does not hold. I enjoyed the read!

Amit Zac

21 June 2021 at 9:24 pm

Like competition, the disclaimer also seems to be “one click away”…

https://www.rbbecon.com/our-experts/stephen-lewis/

Antoine

22 June 2021 at 6:20 pm


Chillin'Competition