Freebasing The Web… Or Just Making A Tired Idea Sound Fresh Again?

from the trying-to-figure-this-out dept

It seems like the buzz of the day is Danny Hillis’ new company, Metaweb Technologies, which is releasing a product called Freebase. The description is basically a mix between Wikipedia and the Open Directory Project, or perhaps it’s simply Wikipedia-with-metadata. It’s yet another shot at Tim Berners-Lee’s vision of the semantic web, where there’s metadata about data, making it easier for computers to actually make use of the data. Of course, many people have pointed out that there are tremendous problems with the idea of a semantic web, the biggest one being how you actually get all that metadata connected to the data in the first place. From what’s being discussed so far, it sounds like Metaweb is simply hoping that everyone will come in and do it for them, in the same spirit as Wikipedia (and, in fact, they’ve already sucked in much of Wikipedia’s data to start with). In some ways, this project also reminds me of Cyc, Cycorp’s big attempt to feed a computer all sorts of information while hoping that artificial intelligence would emerge. If you can make it into a fun game, people will do all sorts of things for you, but there doesn’t appear to be much of a game designed here.

Another concern is that this is simply creating yet another data silo. While they do appear to have made it open so that others can make use of the data, you still have to put the data into Freebase in the first place. However, perhaps the biggest problem with this concept is the very idea that you can accurately explain data with metadata. While the examples being given aren’t too complicated (defining the name of a company as being a company, having an address be an address), this can get very complicated very fast, and forcing metadata structures on existing data can often confuse things by imposing categories where they don’t quite apply. Either that, or you get so much metadata that it’s effectively useless. So, consider us skeptical, but intrigued. There are some very smart people working on this (and others whose opinions we trust seem impressed and awed by the project), but so far, it’s just not clear how all the metadata will keep getting classified and how useful it will be once that’s set.


Comments on “Freebasing The Web… Or Just Making A Tired Idea Sound Fresh Again?”

|333173|3|_||3 says:

What might be more useful would be to produce a sort of programming language for text, so that each word has a specific meaning and part of speech, and a fixed, rigid structure enables the subject and object of verbs to be identified, and logical sentences to be built. A compiler would then be needed to produce text output in your language of choice, which could be produced as a Firefox extension and LaTeX package. Not only would this make life simpler for international web 2.0 projects like Wikipedia (removing both the issues of “the German version says X, the English version says Y, which is correct?” and the argument over Commonwealth and US spellings in English pages), but it would also make it easier for AI projects to understand the page, and for search engines to get more relevant results.
I do appreciate that there would be considerable difficulties with this, such as the need for huge dictionaries for each language, and the fact that it would be hard to take into account changes in the way language is used. It would also be hard to produce good sentences, although it would probably be better than most machine translations now. Other problems would be finding a source of fixed definitions (perhaps the OED, with numbers for each of the subsidiary definitions, could be used for standard words, and a scientific dictionary for technical terms). The structure would also have to be less forgiving of mistakes than English: at the moment, if someone makes a mess of a sentence on Wikipedia, you can try to figure out what is going on, whereas the translator would just have to escape the entire sentence and skip to the start of the next. The largest problem, and probably the killer, is that it would be hard for people to learn it (think of Esperanto, which no-one uses) and even harder to keep within its structures (think of the illiterate crap that gets posted here every day – yes, I know that I and the other |333173|3|_||3 [there is at least one other person using my name] post).
Finally, there is the issue of evolution. As new words are needed, someone is going to have to create them, which means that you could end up with the situation in France where they have the language officially defined. Preferably the creators of the project should define the standards, until it becomes large enough for ISO to take over.

Brad says:


Why the hell is this thing called “Freebase”? Yeah, there’s a small “free” and “base” association with the product…but really?

That’d be like calling my new mass-consumer product “Crackpipe” because it contains a data security tool.

Drug references are all well and good…until you actually try to sell them.
