In July 2005, S.N. Dorogovtsev, J.F.F. Mendes and J.G. Oliveira published the results of an experiment in which they entered every number from 0 to 100,000 into Google and noted the number of results returned. The paper states that this research was (amazingly, to Sci7) partially supported by grant money. Citation: Physica A 360 (2006) 548–556

The findings are largely unsurprising: the popularity of numbers generally decreases with size, powers of ten are more common than their neighbours, and various “special” numbers are particularly common. The groups of special numbers were identified as:

  • Powers of 10
  • Multiples of 10 and 5
  • Easy to remember or symmetric numbers eg. 666 and 131313
  • Powers of 2
  • Numbers with strong associations eg. 666
  • Popular zip codes eg. 78701
  • Toll free telephone number prefixes eg. 866, 877
  • Important historical dates eg. 1812
  • Serial numbers of popular products eg. 747, 8086
  • Initial parts of mathematical constants eg. 314159

The data was collected in the second week of December 2004. The number “2004” was present at a particularly high frequency (3,030,000,000 pages), with a rapid fall-off in popularity for future years.

At the beginning of the second week of December 2005, Sci7 obtained a page count for each of the years 1990-2015:

1990 268,000,000
1991 213,000,000
1992 246,000,000
1993 239,000,000
1994 359,000,000
1995 588,000,000
1996 562,000,000
1997 566,000,000
1998 658,000,000
1999 795,000,000
2000 1,440,000,000
2001 1,250,000,000
2002 1,340,000,000
2003 1,610,000,000
2004 2,140,000,000
2005 6,680,000,000
2006 1,020,000,000
2007 157,000,000
2008 103,000,000
2009 52,600,000
2010 93,400,000
2011 24,500,000
2012 39,300,000
2013 16,200,000
2014 13,500,000
2015 29,100,000

The top result for every year to date is the Wikipedia article for that year. The top results for the years to come vary, and include the official london2012 site for 2012 and 2015.com for 2015. The popularity of the current year and the fall-off in future years are also seen in the above table. It is interesting to note that the page count for “2004” in December 2005 is 0.7 of what it was in December 2004. This could be interpreted as suggesting that the web, as reflected in Google’s index, is being purged of outdated information, or it could reflect the number of sites which display the current year for various reasons. 1992 is slightly anomalous in that there are currently more pages on Google referring to it than to 1993.
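The arithmetic behind these observations can be checked with a short Python sketch. The counts are transcribed from the table above and the December 2004 figure for “2004” is the one quoted from the paper. The variable names are illustrative only, and this is a quick check rather than the method used to gather the data.

```python
# Page counts per year as obtained by Sci7 in December 2005 (from the table above).
counts_dec_2005 = {
    1990: 268_000_000, 1991: 213_000_000, 1992: 246_000_000,
    1993: 239_000_000, 1994: 359_000_000, 1995: 588_000_000,
    1996: 562_000_000, 1997: 566_000_000, 1998: 658_000_000,
    1999: 795_000_000, 2000: 1_440_000_000, 2001: 1_250_000_000,
    2002: 1_340_000_000, 2003: 1_610_000_000, 2004: 2_140_000_000,
    2005: 6_680_000_000, 2006: 1_020_000_000, 2007: 157_000_000,
    2008: 103_000_000, 2009: 52_600_000, 2010: 93_400_000,
    2011: 24_500_000, 2012: 39_300_000, 2013: 16_200_000,
    2014: 13_500_000, 2015: 29_100_000,
}

# Page count for "2004" reported in the paper (December 2004).
count_2004_in_dec_2004 = 3_030_000_000

# How much of the December 2004 count for "2004" remains a year later.
ratio_2004 = counts_dec_2005[2004] / count_2004_in_dec_2004

# The year with the largest page count in the December 2005 data.
peak_year = max(counts_dec_2005, key=counts_dec_2005.get)

# Fall-off from the current year (2005) to the following year (2006).
falloff_2006 = counts_dec_2005[2006] / counts_dec_2005[2005]

print(f"2004 count, Dec 2005 / Dec 2004: {ratio_2004:.2f}")  # 0.71
print(f"Most common year: {peak_year}")                      # 2005
print(f"2006 count / 2005 count: {falloff_2006:.2f}")        # 0.15
print(f"1992 above 1993: {counts_dec_2005[1992] > counts_dec_2005[1993]}")  # True
```

The ratio comes out at roughly 0.71, consistent with the figure of 0.7 quoted above.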

Remarkably, the paper does not discuss the accuracy of Google’s reported page counts. Nor does it discuss Google’s progress as of 2004 towards its stated aim of making the entirety of the world’s information searchable, or the bias of the incomplete index of human knowledge which Google currently holds.

Sci7 is able to produce datasets such as those used for the research discussed here (and those with much greater complexity) from a wide variety of sources.

A free full text PDF of the original article is available:
http://arxiv.org/pdf/physics/0504185

Cambridge UK