Seven Years Ago, CERN Gave Open Access A Huge Boost; Now It's Doing The Same For Open Data

from the tim-berners-lee-would-be-proud dept

Mon, Jan 4th 2021 08:22pm - Glyn Moody

Techdirt readers will be very familiar with CERN, the European Organization for Nuclear Research (the acronym comes from the French name of its provisional forerunner, the Conseil Européen pour la Recherche Nucléaire). It's best known for two things: being the birthplace of the World Wide Web, and being home to the Large Hadron Collider (LHC), the world's largest and most powerful particle accelerator. Over 12,000 scientists of 110 nationalities, from institutes in more than 70 countries, work at CERN. Between them, they produce a huge quantity of scientific papers. That made CERN's decision in 2013 to release nearly all of its published articles as open access one of the most important milestones in the field of academic publishing. Since 2014, CERN has published 40,000 open access articles.

But as Techdirt has noted, open access is just the start. As well as the final reports on academic work, what is also needed is the underlying data. Making that data freely available allows others to check the analysis, and to use it for further investigation, for example by combining it with data from elsewhere. The push for open data has been underway for a while, and has just received a big boost from CERN:

The four main LHC collaborations (ALICE, ATLAS, CMS and LHCb) have unanimously endorsed a new open data policy for scientific experiments at the Large Hadron Collider (LHC), which was presented to the CERN Council today. The policy commits to publicly releasing so-called level 3 scientific data, the type required to make scientific studies, collected by the LHC experiments. Data will start to be released approximately five years after collection, and the aim is for the full dataset to be publicly available by the close of the experiment concerned. The policy addresses the growing movement of open science, which aims to make scientific research more reproducible, accessible, and collaborative.

The level 3 data released can contribute to scientific research in particle physics, as well as research in the field of scientific computing, for example to improve reconstruction or analysis methods based on machine learning techniques, an approach that requires rich data sets for training and validation.

CERN's open data portal already contains 2 petabytes of data, a figure that is likely to rise rapidly, since LHC experiments typically generate massive quantities of data. However, the raw data will generally not be released. The open data policy document (pdf) explains why:

This is due to the complexity of the data, metadata and software, the required knowledge of the detector itself and the methods of reconstruction, the extensive computing resources necessary and the access issues for the enormous volume of data stored in archival media. It should be noted that, for these reasons, general direct access to the raw data is not even available to individuals within the collaboration, and that instead the production of reconstructed data (i.e. Level-3 data) is performed centrally. Access to representative subsets of raw data — useful for example for studies in the machine learning domain and beyond — can be released together with Level-3 formats, at the discretion of each experiment.

There will also be Level 2 data, “provided in simplified, portable and self-contained formats suitable for educational and public understanding purposes”. CERN says that it may create “lightweight” environments to allow such data to be explored more easily. Virtual computing environments for the Level 3 data will be made available to aid the re-use of this primary research material. Although the data is being released using a Creative Commons CC0 waiver, acknowledgements of the data’s origin are required, and any new publications that result must be clearly distinguishable from those written by the original CERN teams.

As with the move to open access in 2013, the new open data policy is unlikely to have much of a direct impact on people outside the high energy physics community. But it does represent an extremely strong and important signal that CERN believes open data must and will become the norm.

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.

Filed Under: experiments, knowledge, open access, open data, science, sharing
Companies: cern

Comments on “Seven Years Ago, CERN Gave Open Access A Huge Boost; Now It's Doing The Same For Open Data”

Anonymous Coward

January 4, 2021 at 9:07 pm

This is a disaster! How will scientists be motivated to collect data if their great-grandchildren can’t cash in on the copyrights? How will they pay for their supercolliders, supercomputers, and vacation homes? How can they keep individuals from inferior races from doing science also?

And just imagine, some of the data might be chanted by a rapper without attribution. Or used to remote-control a John Deere tractor.

Stand up and stop the madness! Send your anti-proton to CERN now!

Christenson

January 5, 2021 at 6:26 pm

More! More!

I saw two issues:
a) It’s reasonable for CERN to not want random people with no qualifications implying they are associated with them, just as Techdirt wouldn’t want just anyone implying they do work for Techdirt — but it should be framed as a Trademark issue over confusion, not "must attribute this data".
b) Releasing a reasonable quantity of samples of the basic data from the sensors should be required. This allows important independent checks of the data reduction algorithms to happen. Anyone else remember an ozone hole that was made invisible by certain satellite data reduction algorithms assuming what was seen was a sensor problem?

Given the huge volume of raw data, CERN would be really smart to co-locate and possibly allow guests to run their own data reduction at the time the data is taken.
