FCC Releases All Net Neutrality Comments As Giant XML Files For Data Analysis
from the open-records dept
While it had trouble keeping its site up during times of intense commenting, the FCC’s IT team is now working to make all the submitted comments on its “open internet” net neutrality proposals available to download in a bunch of XML files:
Because of the sheer number of comments and the great public interest in what they say, Chairman Wheeler has asked the FCC IT team to make the comments available to the public today in a series of six XML files, totaling over 1.4 GB of data, approximately two and half times the amount of plain-text data embodied in the Encyclopedia Britannica. The release of the comments as Open Data in this machine-readable format will allow researchers, journalists and others to analyze and create visualizations of the data so that the public and the FCC can discuss and learn from the comments we’ve received. Our hope is that these analyses will contribute to an even more informed and useful reply comment period, which ends on September 10. We will make available additional XML files covering reply comments after that date.
While the more cynical among you may see this as more of a statement on the rather weak search capabilities of the FCC’s current comment system, it’s still nice to see at least a move towards openness and transparency in sharing this data for others to search through. As we’ve noted, we’ve been digging into some of the data on the comments, and hopefully this will make the process much easier.
Filed Under: comments, fcc, net neutrality, nprm, open data, open internet, xml
Comments on “FCC Releases All Net Neutrality Comments As Giant XML Files For Data Analysis”
Where's the...
…magnet link? BitTorrent would be the way to distribute those……oh, wait!
Re: Where's the...
Sorry, the comment got misplaced. See here.
Since it’s government public data, it should be in the public domain. So in theory no one should stop anyone from taking and redistributing these any way they want.
Re: Re:
Since it’s government public data, it should be in the public domain. So in theory no one should stop anyone from taking and redistributing these any way they want.
That’s not accurate. The rules only apply to works produced by government employees. That is not the case here. Any copyrights remain with the creators of the content.
Re: Re: EULA?
Do you know what the terms and conditions were for submitted comments?
I’m guessing it was probably something like “Grant the FCC a perpetual license to copy and display this comment.” However, I don’t know for sure. I figure you all would have picked up on it if there was a copyright assignment clause, though.
I hope none of the people analyzing this data work for places with legal departments who don’t understand fair use, especially since that’s the only way that anyone reporting on this can directly quote any of the comments.
Another little aside is that while it might be legal to download these huge files to your personal PC, it’s almost certainly illegal to give a copy of them to anyone else. They have to go use up FCC bandwidth by obtaining the files from an “authorized source”.
Re: Re: Re:
I’m pretty sure that correspondence with a government entity is in the public domain unless it hits one of the exclusion cases. This information does not.
Re: Re: Re:
I guess then it can depend on conditions of the submission. If for example they include relinquishing rights into the public domain, then the whole archive is in public domain. For example contributing to Wikipedia binds that to Creative Commons and so on.
What’s an “Encyclopedia Britannica”?
Re: Re:
I feel old.
Re: Re:
The original Wikipedia in expensive hardcover.
>:P
Might have to grab this one… I lost the manual for my washing machine.
A million monkeys typing on a million keyboards...
… though not particularly randomly.
I wonder why they compare the size of XML marked up data to raw text? I suspect the submission data would drop up to half of its size if displayed as raw text. Also, boo to offering XML files for download without compression! They serve their web pages gzip-encoded (compressed), why not their XML files? Or at least pre-compressed versions… the 5th file zips from 100M down to 5.6M, and 7zip takes it down to 3.5M! Anyway, I guess I wouldn’t expect the FCC to know anything about the internet by now.
Also, I like this from their release:
While you’re helping us to do our work, would you mind conforming to the standards that we have to adhere to?
Joking aside, I’m glad they’re making this available to everyone. Even if we were to trust that they know exactly what they’re doing, if they were doing all the work themselves they’d only analyse the submissions in whatever ways they can both imagine and implement in time. Giving this to the internet lets them benefit from novel ways of parsing the data that they might not have thought about, and it puts the onus on everyone else to try to justify whatever they claim to find in the data.
If the FCC was to make all of the findings itself without making the data public, anyone who disagreed with the result would simply say that the FCC was selectively parsing the results. Now, whenever anyone tries to make any claims from the data, everyone else will be able to verify those claims… and if someone tries to make a claim without saying how they came to that conclusion, then that will be worth about as much as 1.5Gb of uncompressed XML text.
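The two size questions raised in this comment (how much of an XML file is markup rather than text, and how well repetitive XML compresses) can be checked with a few lines of Python. This is only a rough sketch: the `<comments>`, `<comment>`, `<id>` and `<text>` element names and the generated sample are made up for illustration and are not the FCC's actual schema, and real files will give different ratios.

```python
import gzip
import lzma
import xml.etree.ElementTree as ET


def size_report(xml_bytes: bytes) -> dict:
    """Return byte counts for raw XML, its text content only, and compressed forms."""
    root = ET.fromstring(xml_bytes)
    # Concatenate all text nodes, dropping the tags themselves.
    text_only = "".join(root.itertext()).encode("utf-8")
    return {
        "raw_xml": len(xml_bytes),
        "text_only": len(text_only),
        "gzip": len(gzip.compress(xml_bytes)),
        "lzma": len(lzma.compress(xml_bytes)),  # the LZMA family is what 7-Zip uses by default
    }


# Hypothetical sample: many near-identical records, similar in spirit to form-letter comments.
record = (
    b"<comment><id>%d</id>"
    b"<text>Net neutrality is vitally important to me.</text></comment>\n"
)
sample = b"<comments>\n" + b"".join(record % i for i in range(10000)) + b"</comments>\n"

report = size_report(sample)
for name, size in report.items():
    print(f"{name:>10}: {size} bytes")
```

On highly repetitive input like this, both compressors shrink the data dramatically, which is consistent with the 100M-to-5.6M figure reported above for one of the real files.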
Re: A million monkeys typing on a million keyboards...
Unless they added a slew of comments from their corporate masters afterwards.
I was disappointed to see the XML files don’t include IP addresses, though that’s not surprising given the comment submission form didn’t exactly demand users verify themselves.
This means separation by city and state. It also means a possibility of gross inaccuracies regarding the data. Geez, this could get bad.
Oops, my cynicism is showing itself again. I suppose I could take the data on face value. Though, it’ll be difficult to determine a margin of error without the IP address.
Don’t take this as a fact, because I’ve not counted the responses in the actual files yet, but cursory scans seem to show a majority favoring classification as common carrier.
In addition to this, the commentary also seems to be sparse, as though people simply voted without leaving comment.
Once I do this, I’ll sit back and wait for others to post the results so I can determine who’s lying and who’s honest.
The FCC did good by releasing these files.
Re: IP addy
su – bob
All the comments in favor came from 8.8.8.8.
Obama is slipping
The FEDERAL Communications Commission is releasing unredacted information without a FOIA request being filed? That’s not the way this administration is supposed to work...
Re: Obama is slipping
That is the way all American (and hopefully other) administrations are supposed to work. They just happen to behave in as opposite a way as possible usually.
Well, at least they didn’t release it in Lotus Notes only filetypes.
they used this huge file size to slow the dl process
My guess
My guess is that data analysis will reveal that commenters favored the FCC going common carrier or implementing strong net neutrality rules by something like 90% to 10%.
Then the FCC will totally ignore the will of the people and implement “fast” (and by default, slow) lanes anyway.
Downloaded it...
And just from a quick look there are lots of entries in which the respondent copied and pasted the “Net neutrality is the First Amendment of the Internet, the principle that Internet service providers (ISPs) treat all data equally. As an Internet user, net neutrality is vitally important to me. …” text. So someone’s campaign appeared to work!
Re: Downloaded it...
And a quick search shows 111651 comments out of ~447K comments included that first sentence as part of their comment.
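A count like this can be reproduced without loading a whole file into memory by streaming it. The sketch below is one possible approach, not the method the commenter used; the `comment` record tag and the tiny in-memory sample are assumptions for illustration, so the tag name would need to be changed to whatever the FCC files actually use.

```python
import io
import xml.etree.ElementTree as ET


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case, so line breaks don't hide a match."""
    return " ".join(text.split()).casefold()


def count_matching(source, phrase: str, record_tag: str = "comment"):
    """Stream an XML file and return (records containing phrase, total records).

    record_tag is a hypothetical element name, not the FCC's real schema.
    """
    needle = normalize(phrase)
    matched = total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            total += 1
            if needle in normalize("".join(elem.itertext())):
                matched += 1
            elem.clear()  # release the record's contents once it has been counted
    return matched, total


# Tiny made-up sample standing in for one of the large files.
sample = io.BytesIO(
    b"<comments>"
    b"<comment><text>Net neutrality is the First Amendment of the Internet, and I agree.</text></comment>"
    b"<comment><text>Please reclassify ISPs as common carriers.</text></comment>"
    b"<comment><text>net neutrality is the first amendment of\n the internet</text></comment>"
    b"</comments>"
)

matched, total = count_matching(sample, "Net neutrality is the First Amendment of the Internet")
print(f"{matched} of {total} comments contain the phrase")
```

For a real file, `source` can be a path or an open binary file object instead of the `io.BytesIO` sample.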
I came across this article after searching for why my comments wouldn’t load on the FCC site.
I just came from trying to make a comment on the FCC comment site. Despite it being made for lawyers and law firms and not regular people, the site was still unable to take my comment, saying that it “could not add the text to the file” and, after I turned it into a PDF and submitted it through their Expert submission, that the “disk quota is full”. No matter how big this XML file is, it doesn’t represent half the comments people want to share about net neutrality.
Emailed comments too?
When the dishwasher manual was exposed here, the site did not appear to include comments that were sent to the email address, only comments posted to the web page. Does this dump have the email comments?