Freebasing The Web… Or Just Making A Tired Idea Sound Fresh Again?

from the trying-to-figure-this-out dept

Fri, Mar 9th 2007 03:30pm - Mike Masnick

It seems like the buzz of the day is Danny Hillis’ new company, Metaweb Technologies, which is releasing a product called Freebase. The description is basically a mix between Wikipedia and the Open Directory Project, or perhaps it’s simply Wikipedia-with-metadata. It’s yet another shot at Tim Berners-Lee’s vision of the semantic web, where there’s metadata about data, making it easier for computers to actually make use of the data. Of course, many people have pointed out that there are tremendous problems with the idea of a semantic web — the biggest one being how do you actually get all that metadata connected to the data in the first place. From what’s being discussed so far, it sounds like Metaweb is simply hoping that everyone will come in and do it for them, in the same spirit as Wikipedia (and, in fact, they’ve already sucked in much of Wikipedia’s data to start with). In some ways, this project also reminds me of Cycorp, the big attempt to feed a computer all sorts of information while hoping that artificial intelligence would emerge. If you can make it into a fun game, people will do all sorts of things for you — but there doesn’t appear to be much of a game designed here.

Another concern is that this is simply creating yet another data silo. While they do appear to have made it open so that others can make use of the data, you still have to put the data into Freebase in the first place. However, perhaps the biggest problem with this concept is the very idea that you can accurately explain data with metadata. While the examples being given aren’t too complicated (defining the name of a company as being a company, having an address be an address) that can get very complicated very fast — and forcing metadata structures on existing data can often confuse things by forcing categories on things where they don’t quite apply. Either that, or you get so much metadata that it’s effectively useless. So, consider us skeptical, but intrigued. There are some very smart people working on this (and others whose opinions we trust seem impressed and awed by the project), but so far, it’s just not clear how all the metadata will keep getting classified and how useful it will be once that’s set.
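To make the categorization worry concrete, here is a minimal sketch of what "typed" metadata might look like and where it gets awkward. Everything in it is an assumption for illustration: the `Topic` class, the `topics_of_type` helper, and the type names are hypothetical and are not taken from Freebase's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A piece of data plus the metadata describing it (hypothetical model)."""
    name: str
    types: set = field(default_factory=set)  # e.g. {"company"}


def topics_of_type(topics, type_name):
    """Return the names of all topics tagged with the given type."""
    return [t.name for t in topics if type_name in t.types]


topics = [
    # The easy case from the examples: a company is tagged as a company.
    Topic("Metaweb Technologies", {"company"}),
    # Less tidy: whoever entered this one had to pick from the types on offer.
    Topic("Wikipedia", {"website", "encyclopedia"}),
    # The awkward case: a bare name that could be a company, a fruit or a
    # record label, but was only ever tagged one way.
    Topic("Apple", {"company"}),
]

companies = topics_of_type(topics, "company")
fruit = topics_of_type(topics, "fruit")
```

With this data, the query for `"company"` finds both Metaweb Technologies and Apple, while the query for `"fruit"` finds nothing. The computer can only use whatever categories somebody bothered to enter, which is exactly the part that remains unclear.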

Comments on “Freebasing The Web… Or Just Making A Tired Idea Sound Fresh Again?”

What might be more useful would be to produce a sort of programming language for text, so that each word has a specific meaning and part of speech, and a fixed, rigid structure enables the subject and object of verbs to be identified, and logical sentences to be built. A compiler would then be needed to produce text output in your language of choice, which could be produced as a Firefox extension and LaTeX package. Not only would this make life simpler for international web 2.0 projects like Wikipedia (removing both the issues of “the German version says X, the English version says Y, which is correct?” and the argument over Commonwealth and US spellings in English pages), but it would also make it easier for AI projects to understand the page, and for search engines to get more relevant results.
I do appreciate that there would be considerable difficulties with this, such as the need for huge dictionaries for each language, and the fact that it would be hard to take into account changes in the way language is used. It would also be hard to produce good sentences, although it would probably be better than most machine translations now. Other problems would be finding a source of fixed definitions (perhaps the OED, with numbers for each of the subsidiary definitions, could be used for standard words, and a scientific dictionary for technical terms). The structure would also have to be less forgiving of mistakes than English, meaning that while at the moment, if someone makes a mess of a sentence on Wikipedia, you can try to figure out what is going on, the translator would just have to escape the entire sentence and skip to the start of the next. The largest problem, and probably the killer, is that it would be hard for people to learn it (think of Esperanto, which no-one uses) and even harder to keep within its structures (think of the illiterate crap that gets posted here every day – yes, I know that I and the other |333173|3|_||3 [there is at least one other person using my name] post).
Finally, there is the issue of evolution. As new words are needed, someone is going to have to create them, which means that you could end up with the situation in France where they have the language officially defined. Preferably the creators of the project should define the standards, until it becomes large enough for ISO to take over.

Chris

March 9, 2007 at 4:30 pm

well..

In instances like this I like to say at least someone’s trying. But you don’t hold the same regard for the kid trying to put the square block in the triangle hole. Will be interesting to see what develops.

|333173|3|_||3

March 10, 2007 at 1:27 am

Brad

March 12, 2007 at 5:42 pm

Hmm....

Why the hell is this thing called “Freebase”? Yeah, there’s a small “free” and “base” association with the product…but really?

That’d be like calling my new mass-consumer product “Crackpipe” because it contains a data security tool.

Drug references are all well and good…until you actually try to sell them.
