
I came up with these six conjectures about wikipedia one night. They kept me up. I'll admit that some of them are somewhat opaque to a casual reader. I might write some commentary later.

By the way – I have no idea if any of them are true.

Conjecture 1. That the distance between any two randomly chosen wikipedia pages, as measured by wikilinks, is on average 6.

Conjecture 2. That wikipedia is sufficiently formal and complete that you could build a useful general purpose AI knowledge base using it.

Conjecture 3. That wikipedia has low information entropy.

Conjecture 4. That the development of a wikipedia article over time occurs in a manner consistent with the biological evolution of a species.

Conjecture 5. That the relationship between the amount of material in wikipedia and the number of article views is exponential.

Conjecture 6. That wikipedia is, on average, factually accurate.
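Conjecture 1 is at least testable in principle: sample random page pairs and run a breadth-first search over the link graph. Here's a minimal sketch of the distance measurement, using a tiny hypothetical link graph in place of real wikipedia data (the titles and links are made up for illustration):

```python
from collections import deque

def shortest_link_distance(graph, start, goal):
    """Breadth-first search for the number of wikilink hops from start to goal.

    graph: dict mapping a page title to the list of titles it links to.
    Returns the hop count, or None if goal is unreachable.
    """
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for neighbor in graph.get(page, []):
            if neighbor == goal:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

# Toy link graph with hypothetical titles -- not real wikipedia data.
toy_graph = {
    "Philosophy": ["Logic", "Science"],
    "Logic": ["Mathematics"],
    "Science": ["Physics"],
    "Mathematics": ["Physics"],
    "Physics": [],
}

print(shortest_link_distance(toy_graph, "Philosophy", "Physics"))  # 2
```

Averaging this over many random pairs (on the real link graph, which you'd have to fetch yourself) would test the "on average 6" claim. Note that wikilinks are directed, so distance A→B need not equal B→A.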

Motivational questions:

  1. Social networks conform to the "six degrees of separation" principle. If wikipedia does, what does that say about its social roots / the way it's constructed?
  2. See Cyc and others. Is there enough formally coded information in wikipedia? What about the semantic relationship between the source sentence containing a link and the summary of the linked article?
  3. What does "low entropy" mean anyway? More structured? Simpler? More redundant? More readable? What about the entropy across wikilinks?
  4. Does an article behave like DNA?
  5. Can we "prove" Reed's law? How do you measure the size of the content?
  6. Is it more accurate than the average reference work? Can you predict the accuracy of an article?