FCC Releases All Net Neutrality Comments As Giant XML Files For Data Analysis

from the open-records dept

While it had trouble keeping its site up during times of intense commenting, the FCC's IT team is now working to make all the submitted comments on its "open internet" net neutrality proposals available to download in a bunch of XML files:
Because of the sheer number of comments and the great public interest in what they say, Chairman Wheeler has asked the FCC IT team to make the comments available to the public today in a series of six XML files, totaling over 1.4 GB of data – approximately two and a half times the amount of plain-text data embodied in the Encyclopedia Britannica. The release of the comments as Open Data in this machine-readable format will allow researchers, journalists and others to analyze and create visualizations of the data so that the public and the FCC can discuss and learn from the comments we’ve received. Our hope is that these analyses will contribute to an even more informed and useful reply comment period, which ends on September 10. We will make available additional XML files covering reply comments after that date.
While the more cynical among you may see this as more of a statement on the rather weak capabilities of the current FCC's system for searching through the submitted comments, it's still nice to see at least a move towards openness and transparency in sharing this data for others to search through. As we've noted, we've been digging into some of the data on the comments, and hopefully this will make the process much easier.

Reader Comments

    Anonymous Anonymous Coward, Aug 5th, 2014 @ 8:39pm

    Where's the...

    ...magnet link? BitTorrent would be the way to distribute those......oh, wait!

    Shmerl, Aug 5th, 2014 @ 8:59pm

    Since it's government public data, it should be in the public domain. So in theory no one should stop anyone from taking and redistributing these any way they want.

      Mike Masnick (profile), Aug 6th, 2014 @ 5:06am

      Re:

      Since it's government public data, it should be in the public domain. So in theory no one should stop anyone from taking and redistributing these any way they want.

      That's not accurate. The public domain rule only applies to works produced *by government employees*. That is not the case here. Any copyrights remain with the creators of the content.

        Arthur Moore (profile), Aug 6th, 2014 @ 6:04am

        EULA?

        Do you know what the terms and conditions were for submitted comments?

        I'm guessing it was probably something like "Grant the FCC a perpetual license to copy and display this comment." However, I don't know for sure. I figure you all would have picked up on it if there was a copyright assignment clause, though.

        I hope none of the people analyzing this data work for places with legal departments that don't understand fair use, especially since that's the only way anyone reporting on this can directly quote any of the comments.

        Another little aside is that while it might be legal to download these huge files to your personal PC, it's almost certainly illegal to give a copy of them to anyone else. They have to go use up FCC bandwidth by obtaining the files from an "authorized source".

        Anonymous Coward, Aug 6th, 2014 @ 12:45pm

        Re: Re:

        I'm pretty sure that correspondence with a government entity is in the public domain unless it hits one of the exclusion cases. This information does not.

        Shmerl, Aug 6th, 2014 @ 3:33pm

        Re: Re:

        I guess it depends on the conditions of the submission, then. If, for example, they include relinquishing rights to the public domain, then the whole archive is in the public domain. Contributing to Wikipedia, for example, licenses your contribution under Creative Commons, and so on.

    Anonymous Coward, Aug 6th, 2014 @ 12:03am

    What's an "Encyclopedia Britannica"?

    Anonymous Coward, Aug 6th, 2014 @ 12:30am

    Might have to grab this one... I lost the manual for my washing machine.

    MrTroy (profile), Aug 6th, 2014 @ 1:30am

    A million monkeys typing on a million keyboards...

    ... though not particularly randomly.

    I wonder why they compare the size of XML-marked-up data to plain text. I suspect the submission data would shrink by up to half if it were converted to plain text. Also, boo to offering XML files for download without compression! They serve their web pages gzip-encoded (compressed), so why not their XML files? Or at least offer pre-compressed versions... the 5th file zips from 100M down to 5.6M, and 7zip takes it down to 3.5M! Anyway, I guess I wouldn't expect the FCC to know anything about the internet by now.

    Also, I like this from their release:
    Finally, we hope that whatever visualizations are developed using this open data will comply with the standards that allow use and access by differently-abled individuals. The Chairman and the FCC CIO are committed to ensuring accessible web content in multiple forms for all.

    While you're helping us to do our work, would you mind conforming to the standards that we have to adhere to?

    Joking aside, I'm glad they're making this available to everyone. Even if we were to trust that they know exactly what they're doing, if they were doing all the work themselves they'd only analyse the submissions in whatever ways they can both imagine and implement in time. Giving this to the internet lets them benefit from novel ways of parsing the data that they might not have thought about, and it puts the onus on everyone else to try to justify whatever they claim to find in the data.

    If the FCC were to make all of the findings itself without making the data public, anyone who disagreed with the result would simply say that the FCC was selectively parsing the results. Now, whenever anyone makes a claim based on the data, everyone else will be able to verify it... and if someone makes a claim without saying how they came to that conclusion, it will be worth about as much as 1.5GB of uncompressed XML text.
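
The compression point above is easy to check for yourself. Below is a minimal Python sketch that streams an XML file through gzip (the encoding the commenter mentions for the FCC's web pages, not the zip or 7zip tools behind his figures) and reports the compressed size as a fraction of the original. The function name `gzip_file` and any file names you pass it are hypothetical, and ratios on the real files will differ from the numbers quoted above.

```python
import gzip
import os
import shutil


def gzip_file(src_path: str, dst_path: str, level: int = 9) -> float:
    """Write a gzip-compressed copy of src_path to dst_path.

    Returns the compressed size as a fraction of the original size.
    """
    # copyfileobj moves the data in fixed-size chunks, so even a file of
    # hundreds of megabytes never has to fit in memory at once.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    original = os.path.getsize(src_path)
    compressed = os.path.getsize(dst_path)
    # Guard against dividing by zero for an empty input file.
    return compressed / original if original else 0.0
```

For example, `gzip_file("comments_5.xml", "comments_5.xml.gz")`, run on a hypothetical local copy of one of the files, writes the compressed copy next to it and returns the size ratio. Streaming the data rather than reading it whole keeps memory use flat regardless of file size.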

      Anonymous Coward, Aug 6th, 2014 @ 6:35am

      Re: A million monkeys typing on a million keyboards...

      Unless they added a slew of comments from their corporate masters afterwards.

    Violynne (profile), Aug 6th, 2014 @ 4:06am

    I was disappointed, though not surprised, to see that the XML files don't include IP addresses, given that the comment submission form didn't exactly demand that users verify themselves.

    This means the data can only be separated by the city and state that commenters entered. It also means the data may contain gross inaccuracies. Geez, this could get bad.

    Oops, my cynicism is showing itself again. I suppose I could take the data at face value, though it'll be difficult to determine a margin of error without the IP addresses.

    Don't take this as fact, because I haven't counted the responses in the actual files yet, but a cursory scan seems to show a majority favoring classification as a common carrier.

    In addition, the commentary seems sparse, as though people simply voted without leaving a comment.

    Once I've done the count, I'll sit back and wait for others to post their results so I can determine who's lying and who's honest.

    The FCC did good by releasing these files.

    beech, Aug 6th, 2014 @ 5:31am

    Obama is slipping

    The FEDERAL Communications Commission is releasing information unredacted without a FOIA request being filed? That's not the way this administration is supposed to work...

      Killer_Tofu (profile), Aug 7th, 2014 @ 1:09pm

      Re: Obama is slipping

      That is the way all American (and hopefully other) administrations are supposed to work. They just usually happen to behave in as opposite a way as possible.

    Anonymous Coward, Aug 6th, 2014 @ 5:48am

    Well, at least they didn't release it in Lotus Notes-only file types.

    Anonymous Coward, Aug 6th, 2014 @ 6:46am

    They used this huge file size to slow down the download process.

    JWW (profile), Aug 6th, 2014 @ 7:44am

    My guess

    My guess is that data analysis will reveal that commenters favored the FCC going common carrier or implementing strong net neutrality rules by something like 90% to 10%.

    Then the FCC will totally ignore the will of the people and implement "fast" (and by default, slow) lanes anyway.

    David, Aug 6th, 2014 @ 8:03am

    Downloaded it...

    And just from a quick look, there are lots of entries in which the respondent copied and pasted the "Net neutrality is the First Amendment of the Internet, the principle that Internet service providers (ISPs) treat all data equally. As an Internet user, net neutrality is vitally important to me. ..." text. So someone's campaign appears to have worked!

      David, Aug 6th, 2014 @ 8:06am

      Re: Downloaded it...

      And a quick search shows that 111,651 of the ~447K comments included that first sentence.
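
David doesn't say how he ran that search. One way to reproduce a count like this without loading a whole file into memory is a streaming parse with Python's `xml.etree.ElementTree.iterparse`. This is a sketch under assumptions: the FCC's XML schema isn't described here, so it assumes each comment is wrapped in a single element whose tag you pass as `record_tag` (a placeholder; namespaced files would need the `{namespace}tag` form), and it matches the phrase case-insensitively with whitespace collapsed. `count_phrase` is a hypothetical helper, and its results need not match the figures above.

```python
import xml.etree.ElementTree as ET
from typing import IO, Tuple, Union


def _normalize(text: str) -> str:
    """Collapse runs of whitespace and casefold, so line breaks or
    capitalization differences don't hide a match."""
    return " ".join(text.split()).casefold()


def count_phrase(source: Union[str, IO[bytes]], record_tag: str, phrase: str) -> Tuple[int, int]:
    """Stream through an XML file and count records whose text contains phrase.

    record_tag is an assumption about the schema: the tag of the element
    that wraps one comment. Returns (matching_records, total_records).
    """
    needle = _normalize(phrase)
    matches = total = 0
    # iterparse yields each element once its closing tag has been read,
    # so the file is processed incrementally rather than loaded whole.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        total += 1
        # itertext() walks the text of the record and all of its children.
        if needle in _normalize(" ".join(elem.itertext())):
            matches += 1
        # Drop the finished record's children and text to keep memory low.
        elem.clear()
    return matches, total
```

Calling `count_phrase(path, "doc", "Net neutrality is the First Amendment of the Internet")`, with `"doc"` standing in for whatever the real record element is called, returns a `(matches, total)` pair. Clearing each record after it is checked is what keeps memory use low across a large file.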

    TRL, Aug 6th, 2014 @ 10:20am

    I came across this article after searching for why my comments wouldn't load on the FCC site.

    I just came from trying to make a comment on the FCC comment site. Despite the site being made for lawyers and law firms rather than regular people, it was still unable to take my comment, saying that it "could not add the text to the file". After I turned the comment into a PDF and submitted it through their Expert submission, it said "disk quota is full". No matter how big this XML file is, it doesn't represent half the comments people want to share about net neutrality.

    Eli the Bearded, Aug 6th, 2014 @ 2:12pm

    Emailed comments too?

    When the dishwasher manual was exposed here, the site did not appear to include comments that were sent to the email address, only comments posted to the web page. Does this dump have the email comments?
