How Newspapers Can Make Their Data More Useful: Uncovering The Semantic Newspaper

from the stuff-to-think-about dept

Earlier this week, when we wrote about yet another weak strategy that newspaper industry-types were discussing as a plan to “fight back” against the internet, a few people complained in the comments that we only seem to focus on the negative side of what newspapers do, and never highlight the positives or come up with any suggestions of our own. Part of this may be because so few newspapers seem to be doing much right. However, it’s also not entirely true. In the past, we’ve discussed ways that newspapers can better customize and also why newspapers should recognize that their role has shifted from being just an information deliverer to being an enabling party that helps its own readers spread the news, something sites like Digg have shown many people want to do. Techmeme has pointed us to another interesting idea, this time suggested by Adrian Holovaty, who has worked for many years on the digital side of various newspapers. Rather than coming up with vague statements about blogs, tags or whatever the latest buzzword is, Holovaty points out how newspapers need to fundamentally shift how they think about the data they create. That is, they need to recognize that it’s data they produce. Rather than focus on each “story” as a blackbox, they should be willing to break it up into chunks of useful metadata. That is, each story is likely to have certain consistent attributes, and making sure the newspaper database understands those attributes allows the newspaper to become a data source, rather than just a collection of news articles. This doesn’t mean getting rid of the story itself, but at least making sure the database recognizes the different data attributes.
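To make the idea of a story with "consistent attributes" a little more concrete, here is a minimal sketch in Python. It is only an illustration, not anything Holovaty or any newspaper actually uses: the `Story` class, its field names, the `find_stories` helper and the sample stories are all hypothetical and made up for this example. The point is simply that the full prose stays intact while a few structured fields sit alongside it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Story:
    """A news story that keeps its prose and adds structured attributes."""
    headline: str
    body: str            # the full narrative, untouched
    event_type: str      # hypothetical category label, e.g. "fire"
    event_date: date
    location: str
    attributes: dict = field(default_factory=dict)  # fields specific to the event type


def find_stories(stories, event_type, location=None):
    """Return stories of one event type, optionally limited to one location."""
    return [
        s for s in stories
        if s.event_type == event_type
        and (location is None or s.location == location)
    ]


# Made-up sample data, purely for illustration.
stories = [
    Story("Warehouse fire on Elm Street", "Full text of the story...",
          "fire", date(2006, 9, 4), "Elm Street",
          {"injuries": 0, "stations_responding": 2}),
    Story("Council approves budget", "Full text of the story...",
          "council_vote", date(2006, 9, 5), "City Hall",
          {"votes_for": 5, "votes_against": 2}),
    Story("Kitchen fire displaces family", "Full text of the story...",
          "fire", date(2006, 9, 6), "Oak Avenue",
          {"injuries": 1, "stations_responding": 1}),
]

# Once the attributes exist, questions the prose alone cannot answer become simple queries.
fires = find_stories(stories, "fire")
total_injuries = sum(s.attributes["injuries"] for s in fires)
```

Nothing here replaces the article itself; `body` still holds the reporter's story. The structured fields just let the database answer things like "how many fires were reported, and how many people were hurt?" without anyone rereading the archive.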

This is a very powerful idea that may bring to mind Tim Berners-Lee’s idea of the semantic web, where there’s a lot more metadata for computers to understand. Of course, the big stumbling block for the semantic web over the years has often been that it involves setting up too rigid a structure, eliminating much of what made the web so useful in the first place. It forces people to make choices and to assign specific labels or categories when they might just want to put the full content out there. In fact, Holovaty acknowledges some of this, when he complains that too many in the newspaper industry just see the content management system as the fastest means possible of delivering their story. They just want to be able to dump the story in and have it published. However, as Holovaty has also seen, some are beginning to see the light, and given the consistency of certain types of news stories, there’s really very little need for the “flexibility” that often holds back attempts at the semantic web. Just last month, for example, we pointed out that Thomson Financial is trying to automate the process of writing certain stories, such as those on earnings releases. That takes the same concept from a different angle, easing the labor side, but at the same time inherently recognizing the metadata involved.

While some journalists may protest this attempt to “chunkify” their stories, there’s nothing in this process that needs to take anything away from their traditional journalism. The story is still filed and is still important. What the additional data (or the classification/categorization of that data) does is open up a goldmine of additional information and services a newspaper can provide. Rather than just focusing on the qualitative angle, the data is exposed and can be used in a variety of ways — many of which may not be obvious at first, but will come to light later. Holovaty uses an example of being able to break up a ton of useful weather forecast data, and easily combine it with a system for keeping track of little league games (where weather info is important). That’s just a small example, but making news data, rather than stories, useful has plenty of other benefits that could revitalize the news business. As an example of how such things could be useful, I was going to point to the ChicagoCrime website that maps where crimes have occurred in Chicago — and in looking it up, only now realized that it was actually created by Holovaty as well (no wonder). So the good news is that there are some really good ideas out there for improving the value of traditional news organizations. It’s just a matter of getting more in the industry to embrace them.
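The weather and little league example mentioned above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not Holovaty's actual system: the data layout, the `attach_forecasts` and `games_at_risk` helpers, the `BAD_WEATHER` set and all of the sample forecasts and games are hypothetical and invented for this example.

```python
from datetime import date

# Made-up forecast data, keyed by (day, field). A real feed would look different.
forecasts = {
    (date(2006, 9, 9), "Northside Park"): {"condition": "rain", "high_f": 61},
    (date(2006, 9, 9), "Lakeview Field"): {"condition": "sunny", "high_f": 70},
}

# Made-up little league schedule.
games = [
    {"teams": "Tigers vs. Hawks", "day": date(2006, 9, 9), "field": "Northside Park"},
    {"teams": "Bears vs. Owls", "day": date(2006, 9, 9), "field": "Lakeview Field"},
    {"teams": "Lions vs. Wolves", "day": date(2006, 9, 10), "field": "Lakeview Field"},
]

# Hypothetical list of conditions that might postpone a game.
BAD_WEATHER = {"rain", "thunderstorm", "snow"}


def attach_forecasts(games, forecasts):
    """Pair each game with the forecast for its day and field (None if unknown)."""
    merged = []
    for game in games:
        forecast = forecasts.get((game["day"], game["field"]))
        merged.append({**game, "forecast": forecast})
    return merged


def games_at_risk(merged):
    """Return the matchups whose forecast condition is in BAD_WEATHER."""
    return [
        g["teams"] for g in merged
        if g["forecast"] is not None
        and g["forecast"]["condition"] in BAD_WEATHER
    ]


schedule = attach_forecasts(games, forecasts)
at_risk = games_at_risk(schedule)
```

The combination only works because both data sets share structured fields (a date and a place). If the forecast lived only inside a paragraph of prose, there would be nothing to match the schedule against.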

Comments on “How Newspapers Can Make Their Data More Useful: Uncovering The Semantic Newspaper”

dorpus says:

Freak Accident Categories?

How often do newsrooms get shot up by psychos?

How often do tow trucks get towed?

How often do novice skydivers go into a panic, entangle their instructor’s parachute, and kill both of them?

How often do fire trucks get stolen?

How often do lifeguards drown?

How often do serial killer nurses kill doctors?

How often do Arab terrorists accidentally kill other terrorists while building bombs or practicing firearms?

Clifford VanMeter says:


I personally like the idea of the Semantic approach, but there is an underlying obstacle to dealing with all that meta-data. Money.

How does the newspaper monetize the meta-data, and how do the journalists who are collecting this information get paid? It’s easy enough to pay 2¢ per word, but journalists already want to be paid again for web publication of articles they created for print publications. How about photographers? Editors?

Who gets paid when two or more journalists’ “chunks” get mashed up into the result for a single custom query? These are the real-world problems that must be dealt with BEFORE we can even consider moving in that direction.

Maybe the real question should be this — If every journalist can create and distribute their own content directly to a mass audience, what real purpose do newspapers serve anymore? Aren’t they really little more than content aggregators? A kind of manual RSS feed?

Phil Wolff says:

Re: News

Providing structured data doesn’t address the revenue issue, at least not directly.

I’ll agree that it would be a value added service, and that you would readily synthesize new products to sell. The act of publishing structured data won’t solve the problem of a business (news) and an occupation (journalism) with falling barriers to entry and a long line of new entrants.

Anonymous Coward says:

Re: Re: News

If the structured data were used correctly, I think it could generate revenue in a similar way to how ESPN Insider works, as well as possibly attracting online advertisers.

For example, it’s local election night and a reporter, in addition to writing the story about Suzie Q winning the election and hugging her nephew and crying at city hall upon hearing the results, actually inputs those results into a Web GUI back in the office that feeds a database. That database can then be maintained over every election cycle and made available online to readers who pay a premium price (or who subscribe to the print version). The same could be done for campaign contributions, local crime data, school testing scores, etc.

The problem: convincing crotchety editors and crotchety reporters, who would rather tell stories about the old Linotype machines, that this could actually be a good thing that serves their readers well.

Anonymous Coward says:

semantic spam

The semantic web is a dream that will only work within very trusted sites (maybe a newspaper site would qualify). We’ve already had a taste of the semantic web, when people used to put META tags in their HTML headers to list the subject matter of their page. False information was stuffed in there to show up on certain search engines.
The semantic web seems a wonderful thing – for spam-style search engine hijackers.

Anonymous Coward says:

To Mike

This is about your reply on the other newspaper post in which I was complaining. You stated that the purpose of a newspaper is to draw people to the classifieds.

Do you seriously think that’s what newspapers are for? I certainly hope not. People pay the newspaper for space to advertise things they’re selling and for various other purposes. Classifieds are advertisements from individual people. The keyword being advertisements. Is the purpose of televised news for people to watch the commercials? The purpose is the news; the advertisements are the financial means to allow for distribution of the news.

I am glad to finally see suggestions from Techdirt though.

Joe Wroblewski says:

Only the innovators will survive

I find myself reading the newspaper less and less all the time. I have often wondered if publishers don’t understand the basic shift that is taking place or they don’t know what to do about it.

I think Adrian’s idea is exactly the type of innovation the newspaper industry needs. It will surely provide a lot of value to readers and it will do it by leveraging new and evolving technologies.

There just isn’t a need for so many newspapers anymore, so only the innovators will survive.

It's me again says:

Um, so who's going to pay for REAL news?

I, for one, would like to get my news from a REAL reporter, not EveryBlogger in his skivvies typing his opinions into a laptop. And if you want REAL reporters, someone’s going to have to pay them.

Just recently, I read this and thought it was great: “If newspapers were invented tomorrow, people would think they were the latest, greatest, newest thing! Portable news! Take it with you where you want! Read it when you want!”

It’s a loose paraphrase, and if someone knows who wrote that, I’d love to know.

Anonymous Coward says:

“That is, they need to recognize that it’s data they produce. Rather than focus on each “story” as a blackbox, they should be willing to break it up into chunks of useful metadata…”

So who would benefit the most from this … oh yes, the bloggers; they would then not have to recycle 3rd or 4th hand interpretations of newspaper stories, they could go direct to the news source and build their contrived and predictable opinions directly on the “facts”.

If it ever happens you will soon find facts getting in the way of a good blog.
