wikipedia is becoming the source for answers

From John Battelle’s Searchblog: Wikipedia and Search

A nice piece penned by Max Kalehoff.

A ranking of all Web sites based on the total volume of traffic received directly from search engines placed Wikipedia at 146 in June 2004. But in September 2004 it jumped in the ranking to 93; 71 in December 2004; and in March 2005, it was the 33rd most popular site in terms of visits received from search engines.

That means Wikipedia is impacting not only the trivial results of our Internet searches, but increasingly what content we consume and the types of answers we find to larger questions. This is a profound statement for anyone competing in the marketplace for attention to content and ideas.

Interesting. Wikis took off like weeds at work last year. The most valuable thing about the team wiki is that it’s been a quick place to host a page that you need to share and possibly collaborate on. We *didn’t* get a beautifully tended garden like Wikipedia. I think it helps that Wikipedia is modelled on an encyclopedia: the structure is simple, and there are clear ground rules for what an entry should look like.

Due Diligence: State of Machine Translation

Over at Due Diligence, they’ve posted a nice overview of the current state of machine translation. Not surprisingly, it doesn’t seem to have improved much since my cognitive science days in the early 90s. As with other problems in AI, our effortless use of language makes it seem like an easy problem, whereas the truth is that we’re dealing with a problem that our brain is highly specialized to solve. In fact, the problem our brain is specialized for isn’t the obvious one (i.e. communicating information) but something more subtle (e.g. reading the goals and intentions of others from their actions, or predicting how our actions will affect the thoughts and actions of others).

In chess, they’ve been able to match and even beat the best human players, not by solving the problem the way people would solve it, but by using brute force combined with specialized optimizations in various domains to trim the scope of the brute-force search. It sounds like MT is headed in a similar direction. I’m less hopeful that this approach will succeed for MT the way it has for chess. The domain of language is much more subtle and complex than chess.
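To make "brute force plus trimming" concrete, here is a minimal sketch of one well-known trimming technique, alpha-beta pruning, next to plain minimax. The tiny game tree and its scores are made up for illustration; real chess programs use far more elaborate machinery, and nothing here is taken from any particular engine.

```python
def minimax(node, maximizing, counter):
    """Plain brute force: score every leaf of the tree."""
    if isinstance(node, int):          # a leaf holds a position score
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, counter,
              alpha=float("-inf"), beta=float("inf")):
    """Same answer as minimax, but skips branches that cannot matter."""
    if isinstance(node, int):
        counter[0] += 1
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:          # the opponent would never allow this line
                break
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:              # we already have a better option elsewhere
            break
    return best


# Hypothetical two-ply game: we pick a move, then the opponent replies.
TREE = [[3, 5], [2, 9], [0, 1]]

full_count, pruned_count = [0], [0]
full_value = minimax(TREE, True, full_count)
pruned_value = alphabeta(TREE, True, pruned_count)
print(full_value, full_count[0])       # value and leaves scored by brute force
print(pruned_value, pruned_count[0])   # same value, fewer leaves scored
```

Both searches agree on the best achievable score, but the pruned one looks at fewer positions. The trick works because chess has a crisp scoring rule to prune against; it is much less obvious what the equivalent would be for language.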

grid sweet spot?

Grid computing seems to be climbing the hype curve. I can’t claim to be an expert on it, but I have to say it has the smell of improbability.

Distributed computing architectures are in most cases more art than science. While some problems are easily decomposed into tiny pieces that can be solved in parallel, others are clearly not. Also, the critical bottlenecks are often communication, not computation: you spend your time finding the right ways to package up the problem and connect the pieces to avoid moving data around unnecessarily.
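A back-of-the-envelope sketch of the communication point, under a deliberately crude model I am assuming for illustration: the computation splits perfectly across workers, but all the data has to cross one shared link first. The function names and the numbers are hypothetical, not measurements of any real grid.

```python
def parallel_time(compute_s, data_mb, workers, bandwidth_mb_s):
    """Estimated wall-clock seconds under the assumed model:
    compute divides evenly across workers, data transfer does not."""
    transfer_s = data_mb / bandwidth_mb_s
    return compute_s / workers + transfer_s


def speedup(compute_s, data_mb, workers, bandwidth_mb_s):
    """How much faster than one machine that already holds the data."""
    return compute_s / parallel_time(compute_s, data_mb, workers, bandwidth_mb_s)


# Hypothetical job A: lots of computation, almost no data to move.
compute_bound = speedup(compute_s=1000, data_mb=10,
                        workers=100, bandwidth_mb_s=10)

# Hypothetical job B: same computation, but 5 GB must be shipped out first.
communication_bound = speedup(compute_s=1000, data_mb=5000,
                              workers=100, bandwidth_mb_s=10)

print(round(compute_bound, 1))
print(round(communication_bound, 1))
```

With a hundred workers, the first job gets most of the theoretical speedup while the second barely doubles, because nearly all of its time goes to shipping data. Packaging the problem so that less data moves is the "art" part.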

You can certainly develop algorithms for distributing the computation that adaptively address these problems; however, any time you ask a computer to do something that has a degree of “art” to it, you have to accept a certain amount of slop.

Grid computing’s sweet spot seems to lie on an improbable location in the terrain — a place where computers are too cheap to be managed directly by people but too expensive to be left idle. The improbability of this location is increased by the as-yet steady march of Moore’s law and the inevitable friction you’d have to accept in the system.

When you add to this the challenges of trying to develop, debug, and manage such a grid, I think you are forced to conclude grid is likely to see more theory than practice.

I think the reason it’s so popular is that it is technically cool and it appeals to certain ideals widely held by engineers (e.g. efficient use of resources).

microsoft envy

Always On interviewed Bill Gates this week. They talked about a number of things (Linux, security, stock vs. options). I thought this was the most interesting quote:

Gates: … I think that jealousy has driven my competitors to more mistakes than any other factor I can name.

So true.

Coveting Microsoft’s position on the desktop has led to a host of bad decisions. Competitors all scream bloody murder when they see Microsoft (ab)using its power, all the while running around trying to copy the strategies and tactics that got Microsoft where it is. And they waste millions on futile attempts to take the desktop from Microsoft.

For example, you may recall that Java, when it was originally pitched, was all about applets. The idea was that your desktop software would be delivered to you over the network, in a form that could run on any machine. This was Sun’s attempt to take the desktop. How this would have benefited them is completely unclear. They got lucky that “write once, run anywhere” resonated with enterprise developers, who quickly co-opted the technology for servlets so they could stop porting their software from platform to platform. The only way this currently benefits Sun, as far as I can tell, is that they get some license revenue via their certification and trademarking programs. Sun certainly doesn’t seem to have benefited much from the generation of enterprise software that can run as easily on Intel or HP as it does on Sun.

Apple pursued its attempts to compete with Microsoft much longer than it should have, trying to regain the desktop it once held. It tried to beat Microsoft at its own game by producing a better operating system and application suite long after the market had tipped in Microsoft’s favor. It even went as far as starting to commoditize its hardware (remember those Mac compatibles that were available for a year or two?).

From my point of view, this is the fundamental thing Steve Jobs did after his return: he convinced Apple it didn’t need to be Microsoft, that it could be great without beating Microsoft. By doing so, he’s been able to get Apple onto sound strategic footing (control the hardware to reduce the amount of hardware supported; move the OS to one better able to harness open source efforts) and steer it into niche markets where it was uniquely positioned to compete. Apple may never be as big as Microsoft, but it will continue to exist and may even thrive.

That’s more than I can say for Sun, whose big announcement this week was the Java Desktop.

shirky debunks the semantic web

In The Semantic Web, Syllogism, and Worldview, Clay Shirky debunks the notion of the semantic web. The short of it is that the problem of semantics is hard; the trivial use cases that have been proposed to date don’t scale and aren’t that useful anyway. Spend a couple of months studying the history of knowledge representation (from artificial intelligence, back through math and philosophy) and you’ll find that this is a problem with a long history and little progress.

One thing in Shirky’s article isn’t quite right. He gives this example of how syllogisms fail:

Consider the following assertions:
- Count Dracula is a Vampire
- Count Dracula lives in Transylvania
- Transylvania is a region of Romania
- Vampires are not real

You can draw only one non-clashing conclusion from such a set of assertions — Romania isn’t real.

You wouldn’t conclude Romania isn’t real unless all the predicates had been “is a”. This does, however, highlight another problem of semantics: how do you come to a shared, complete, and consistent set of predicates with well-defined inferential properties?
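Here is a minimal sketch of that distinction, using my own toy encoding rather than anything from Shirky’s article or any semantic web toolkit. The predicate names (`is_a`, `lives_in`, `region_of`) and the two reasoning modes are assumptions for illustration: the naive mode lets “not real” leak across every link, as if every predicate were “is a”, while the typed mode only lets a thing inherit the property from what it “is a”.

```python
# Shirky's assertions as (subject, predicate, object) triples.
# The predicate names are my own labels, not a standard vocabulary.
FACTS = [
    ("Count Dracula", "is_a", "Vampire"),
    ("Count Dracula", "lives_in", "Transylvania"),
    ("Transylvania", "region_of", "Romania"),
]
NOT_REAL = {"Vampire"}  # "Vampires are not real"


def unreal_things(facts, seeds, inheriting_predicates=None):
    """Return everything inferred to be 'not real'.

    inheriting_predicates=None is the naive mode: the property spreads
    across every link, in both directions, whatever the predicate says.
    Otherwise the property only passes from object to subject, and only
    along the named predicates.
    """
    unreal = set(seeds)
    changed = True
    while changed:
        changed = False
        for subj, pred, obj in facts:
            if inheriting_predicates is None:
                if (subj in unreal) != (obj in unreal):
                    unreal |= {subj, obj}
                    changed = True
            elif (pred in inheriting_predicates
                  and obj in unreal and subj not in unreal):
                unreal.add(subj)
                changed = True
    return unreal


naive = unreal_things(FACTS, NOT_REAL)
typed = unreal_things(FACTS, NOT_REAL, inheriting_predicates={"is_a"})

print("Romania" in naive)   # the naive reasoner loses Romania
print("Romania" in typed)   # the typed reasoner keeps it
print(sorted(typed))
```

The typed reasoner only concludes that Count Dracula isn’t real, which is the sensible answer. The hard part is the one the sketch assumes away: getting everyone to agree on which predicates carry which inferences.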