DailyDirt: The Ever-Growing Growth Of Data…

from the urls-we-dig-up dept

Thu, Jan 24th 2013 05:00pm - Michael Ho

There are a lot of reasons to be optimistic about the future. Some folks will always predict doom and gloom, but we say, “The Sky Is Rising!” (loud and proud — and again with sequel The Sky Is Rising 2). The advent of digital information has created an enormous wealth of data, and the amount of this digital awesomeness seems to be growing all the time. Here are just a few more examples of the amazing abundance of media that surrounds us.

If you’d like to read more awesome and interesting stuff, check out this unrelated (but not entirely random!) Techdirt post.

Comments on “DailyDirt: The Ever-Growing Growth Of Data…”

Anonymous Coward

January 24, 2013 at 7:55 pm

Since the beginning of time until 2003, humans generated about 5 billion gigabytes of data… and now we generate that much every 2 days. And that rate is accelerating (but humans are not exclusively generating all that data).

I would find the prospect enlightening if I didn’t suspect that the majority of that new information isn’t being generated by humans.

Rekrul

January 24, 2013 at 11:18 pm

The Internet Archive...

I don’t know if they’ve changed it, but the Internet Archive used to go too far in obeying the robots.txt files. I once tried to access the archive of a site that no longer existed and was told that the site had been blocked due to a robots file. I was sure this was a mistake as the site was very simple and open while it was up. I was told that whoever owned the domain now, and who had put up one of those parking pages, had probably included a standard robots.txt file.

In other words: Person puts up site. Internet Archive makes copy of site. Site goes bust. New owner puts up generic site with robots.txt file. IA sees robots.txt file and disables access to existing backup of old site.

When I asked if they couldn’t manually override this for sites that are obviously not the same anymore, I was told that it was impossible.
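
The sequence above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration of how a blanket robots.txt on a parked domain would deny a crawler, not the Internet Archive's actual logic. The `ia_archiver` user-agent string, the `example.com` URL, the helper name `allowed`, and the contents of both robots.txt files are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt of the original, open site: nothing is disallowed.
ORIGINAL_ROBOTS = "User-agent: *\nDisallow:\n"

# Hypothetical generic robots.txt shipped with a domain-parking page.
PARKED_ROBOTS = "User-agent: *\nDisallow: /\n"

AGENT = "ia_archiver"                   # assumed crawler name
URL = "http://example.com/index.html"  # placeholder address

was_allowed = allowed(ORIGINAL_ROBOTS, AGENT, URL)
now_allowed = allowed(PARKED_ROBOTS, AGENT, URL)
print(was_allowed, now_allowed)
```

If an archive applies the current answer to every stored snapshot, the copy made while the old file was in force becomes unreachable even though nothing about the old site changed.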

Mr. Applegate

January 25, 2013 at 4:00 am

Data Generation

“Since the beginning of time until 2003, humans generated about 5 billion gigabytes of data… and now we generate that much every 2 days. And that rate is accelerating (but humans are not exclusively generating all that data).”

Now how about telling us how much of that data we actually understand, or even utilize today? I seem to recall that number was extremely small (like a single-digit percentage).

The problem is we have all this data, but most of it is locked away so that it can’t be used by the masses, poorly organized so that even if you have access you can’t find the data you need, and much of it is inaccurate or incomplete.

So we are nothing more than a bunch of pack rats!

Steph

January 25, 2013 at 4:33 am

Way to go!

“and this virtual backup of the web doesn’t even touch sites that have a login or a robots.txt file that blocks the Wayback Machine.”

Nice. Now people who didn’t know you could do that will start.

Don’t give it away, people!

bob

January 25, 2013 at 8:02 am

generated data incorrect, maybe

“Since the beginning of time until 2003, humans generated about 5 billion gigabytes of data… and now we generate that much every 2 days. And that rate is accelerating (but humans are not exclusively generating all that data).”

but.. how much of that is redundant data?
I see the same posts on multiple sites, and news churns through thousands of news sites, and millions of blogs..
retweets by the really giant bucket, the same movie 50 times per torrent site, the same song in a billion drop boxes, etc.
so, removing the echoes, how much data is actually generated?
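
One rough way to put a number on that question is to hash every item and count each distinct hash only once. The sketch below assumes byte-for-byte identical copies. The function name `unique_bytes` and the sample posts are made up for illustration, and a re-encoded movie or a lightly edited repost would not be caught this way.

```python
import hashlib


def unique_bytes(items):
    """Return (total_bytes, unique_bytes) for an iterable of bytes objects.

    An item counts as a duplicate only if its SHA-256 digest has been
    seen before, i.e. it is an exact copy of an earlier item.
    """
    seen = set()
    total = 0
    unique = 0
    for data in items:
        total += len(data)
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique += len(data)
    return total, unique


# Made-up sample: one story echoed three times plus one original post.
posts = [
    b"same news story",
    b"same news story",
    b"an original blog post",
    b"same news story",
]

total, unique = unique_bytes(posts)
print(f"{unique} of {total} bytes are unique ({unique / total:.0%})")
```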

Anonymous Coward

January 25, 2013 at 3:24 pm

I wondered how come Data started looking older and heavier.

