the ethos of the internet

at least it’s an ethos

This viddie is a rather boring demonstration of Wolfram Alpha. It does basically what it has claimed to be able to do: it can process data in a variety of domains, answer natural-language queries that pertain to the data, and present answers and other relevant or useful information in a human-readable form. The internet has been hyping and/or cynically doubting Alpha for the last few weeks, and although it looks like it works pretty well, I don’t think it deserves either.

The fervor Alpha has generated is really due to a misunderstanding of what Alpha is. Alpha is a systematic attempt to formalize the ontologies of certain scientific domains in order to query that data for specific kinds of information. It is an attempt, Wolfram suggests, at making science computable. This is a big project, and certainly worthwhile (if just a little wide-eyed). But it is also something that Wolfram has been working on for decades, and it appears to be a legitimate attempt.

Alpha is not a foundation for a semantic web.

Look: the semantic web is going to happen one way or another. It is the looming peak in the distance, and someone will scale it, and I imagine it will happen fairly soon. But this is not it. I have lots of complaints about the vision here, but my biggest complaint is certainly this: Alpha requires expert humans to explicitly build ontologies and pour in the data. This works well in certain scientific domains, but it’s not the sort of thing you can lay on top of the internet to create SmartGoogle, which is what everyone expects from the semantic web.

Ontologies cannot be planned in advance. Ontologies are not pure formal properties that bind together a domain through pure logical syntax. Ontology is not a priori.

This is probably a profound philosophical point, but come on, Google figured this out a decade ago. Ontologies have to be built organically, by paying attention to the behavior of the users of that ontology. This is the only way you can systematically and automatically learn the ontology of any new domain. The alternative Alpha takes is to attempt to cram concepts into ill-fitting man-made containers, but we know that there is no shelf, and so the Alpha model is doomed to obsolescence before it is even born. Until the process of ontology-building is both automated and democratized (i.e., put in the hands of its users, as with Google), it cannot hope to deal with the problem of systematizing knowledge.

Wolfram himself doesn’t seem to have a clue about how to do this, and seemed genuinely troubled and confused by questions along these lines. Wolfram is interested in domains where the ontology is established and more or less stable, like mathematics. That’s fine and useful, but it simply won’t scale to cover the internet as a repository of human knowledge.

I like the idea of “curating data” as a description of epistemology (learning, science), and I appreciate the epistemological challenges that Alpha overcomes. But “curation” is antithetical to the ethos of the Internet. The internet has no galleries; it chokes on the idea that some human would select only choice aspects of it to display. The internet is akin to giving people access to the basements of museums, allowing them to go through the shelves and drawers and inspect all of their contents.

That doesn’t mean, as Wolfram is clearly worried, that you need to give away proprietary data for free, or at least not in bulk. But it does mean that you can’t let the act of ontology-building be kept in the hands of Wolfram’s own ‘experts’, because no matter how good and trustworthy those experts are, and how honest and faithful their efforts are, you can’t anticipate the needs, interests, and purposes that motivate the kinds of queries a SmartGoogle would receive.

Here’s a simple example. Lally asked me today, “What is the name of the senator on The Wire?”

Finding an answer was easy: I went to Wikipedia, searched for The Wire’s characters, and found the list of political characters, of which Clay Davis was the first entry. It took 5 seconds of work, and absolutely no cleverness on my part.

If you type the original question into Google with quotes, the search completely fails, which shows that this is a good example of the kind of problem a SmartSemanticSearch should be able to solve that is out of reach of our best DumbSearch models. Google is helpful enough to give the results for the same search without quotes, but a simple glance at the first page won’t itself give you an answer; most of the results are about real-world senators in the news, although there are some articles on David Simon, the creator of The Wire. Ask.com, Yahoo, and MSN Live don’t do any better. (If you search “… in The Wire” in Yahoo, you get the name of the actor on the front page but not the name of the character himself.)

So what sort of ontology do you need to have in order to answer this question? Well, you need to know about fictional characters, and that some of those characters play senators, and so on. You also need to know when I am asking about a fictional world, which is implied by the question but not entirely explicit. But it’s easy to see that you can get around these troubles in this specific case.

The problem isn’t that such knowledge can’t be built; it’s that you can’t expect to design and manage such knowledge from the top down and in the abstract, especially if you expect to stray even a bit from hard quantitative data. You need to let these ontologies grow from the bottom up, where the “bottom” is grounded in the use of the system by interested users. That’s why the semantic web requires tagging and other forms of participation; that’s why Google and Wikipedia and Facebook and so on get better the more we use them. Our machines adjust to our standards as we use them. Our machines learn, by playing with us. You need to be able to ask Alpha a question, and if it doesn’t know the answer you should be able to tell it, and show it how you found out, and it should use this behavioral data to better handle the next query. That’s exactly how you automate epistemology, and it’s exactly what Alpha doesn’t do.

Alpha’s learning process is conducted behind the scenes, in a computer lab by only a few hundred people. The Internet is the largest repository of human knowledge that has ever existed, and possibly that will ever exist. If you are at all disappointed with the outcome of this match, you have completely underestimated the game.
