Class Action Lawsuit Hopes To Hold GitHub Responsible For Hosting Data From Capital One Breach

from the into-the-breach dept

As soon as the Capital One breach was announced, you knew the lawsuits would follow. Handling the sensitive info of millions of people carelessly is guaranteed to net the handler a class-action lawsuit or two, but this one, filed by law firm Tycko & Zavareei, adds a new twist.

The 28-page lawsuit filed Thursday in the U.S. District Court for the Northern District of California asserted that GitHub “actively encourages (at least) friendly hacking.”

It notes that the hacked Capital One information was posted online for months and alleges that the company violated state law to remove the information. “GitHub had an obligation, under California law, to keep off (or to remove from) its site Social Security numbers and other Personal Information,” the suit says.

Weird legal theory, but one that could possibly be stretched to target some of the $7.5 billion Microsoft paid to acquire GitHub. But it takes a lot of novel legal arguments to hold a third party responsible for content posted by a user, even if the content contained a ton of sensitive personal info.

The lawsuit [PDF] alleges GitHub knew about the contents of this posting since the middle of April, but did not remove it until the middle of July after being notified of its contents by another GitHub user. The theory the law firm is pushing is that GitHub was obligated to scan uploads for “sensitive info” and proactively remove third-party content. The lawsuit argues GitHub is more obligated than most because (gasp!) it encourages hacking and hackers.

GitHub knew or should have known that obviously hacked data had been posted to GitHub.com. Indeed, GitHub actively encourages (at least) friendly hacking as evidenced by, inter alia, GitHub.com’s “Awesome Hacking” page.

GitHub had an obligation, under California law, to keep off (or to remove from) its site Social Security numbers and other Personal Information.

Further, pursuant to established industry standards, GitHub had an obligation to keep off (or to remove from) its site Social Security numbers and other Personal Information.

The “industry standards” the lawsuit references are voluntary moderation efforts engaged in by social media platforms. Certainly no platform would want to be known as the habitual host of exfiltrated credit card data, but removing offensive or plainly illegal content is one thing. Removing strings of numbers from a site hosting an unusually large number of strings of numbers is quite another. The law firm feels this assertion helps its case. It probably doesn’t.

Moreover, Social Security numbers are readily identifiable: they are nine digits in the XXX-XX-XXXX sequence. Individuals’ contact information such as addresses are similarly readily identifiable.

Thus, it is substantially easier to identify—and remove—such sensitive data. GitHub nonetheless chose not to.

Nine digits in a sequence. Oh, like phone numbers. And phone numbers tend to be found near addresses, especially when coders and developers are using GitHub as an offshoot of LinkedIn, posting their personal info for employers to find. Even long lists of personal info wouldn’t necessarily be innately suspicious. Employers and recruiters looking for people with certain skills have probably compiled all of this freely-provided personal info for easy reference. It’s not as easy to moderate content as the litigants believe.

But this belief, if backed by a judge, could add GitHub’s money to the pool of damages. Things will get a lot more interesting once GitHub responds to unintentionally hilarious assertions like these:

GitHub knew or should have known that the Personal Information of Plaintiffs and the Class was sensitive information that is valuable to identity thieves and cyber criminals. GitHub also knew of the serious harms that could result through the wrongful disclosure of the Personal Information of Plaintiffs and the Class.

As an entity that not only allows for such sensitive information to be instantly, publicly displayed, but one that also arguably encourages it, GitHub is morally culpable, given the prominence of security breaches today, particularly in the financial industry.

Well, we’ll see how “morally culpable” stands up in court, where “legally culpable” is the actual standard. GitHub will rely on Section 230 to be dismissed from this case and rightly so. The person responsible for posting sensitive data exfiltrated from Capital One is, unsurprisingly, the person who posted the sensitive data exfiltrated from Capital One. Capital One has a duty to protect the information it gathers from customers. A third-party site with hosting capabilities does not, and it’s not nearly as easy to moderate and proactively remove content as this lawsuit says it is.

Companies: capital one, github

Comments on “Class Action Lawsuit Hopes To Hold GitHub Responsible For Hosting Data From Capital One Breach”

aerinai (profile) says:

How dare Github allow numeric digits to be used in code!

I think it would be safer if GitHub moderated every line of code and checked for copyright as well. How many infringing people are out there? We don’t know until GitHub finds those nefarious codethiefs! And don’t let them use numbers, those could be sensitive.

You know what… just to be safe… let’s make Github just stop letting 3rd parties post to their site. Only employees of GitHub should be allowed to post code to that site, like the publishers that they are! That seems like a sensible solution to solve these rogue hackers from hacking things.


Deputy Dickwad (profile) says:

Re: Re: Re: How dare Github allow numeric digits to be used in c

End all user content because Citizens don’t know what is best for themselves.

That is why I have to keep shooting these idiots!

Just imagine if they knew how to code, then I’d have to shoot their computers, phones, & cars just to keep ’em safe!

I can’t afford that many bullets!

Help me out here! Get rid of 230!!!

Anonymous Coward says:

Re: Re: Re:3 What is the relevant law?

If there’s absolutely no basis then the lawsuit will be dismissed

This is the most likely outcome, yes.

there really isn’t much point in Tim posting these types of stories.

I’m sorry, what? This is newsworthy for several reasons:

  1. It’s an abuse of our court system and rule of law. The law says you only hold responsible those people who are actually responsible.
  2. These lawyers are people who should know and understand that, yet they are doing it anyway. Either they are deliberately ignoring it or they don’t understand the law. Either possibility is not good for someone who is supposedly an expert in the law.
  3. This is part of a pattern of attacks on the law in attempting to hold large companies liable for actions they didn’t commit, just because the actual responsible parties either can’t be found/prosecuted or are too poor to pay out the massive amounts of money the people suing think they are owed.
  4. There are other people out there (non-lawyers) who don’t understand the problems involved and think that this behavior is perfectly acceptable, some of whom may in fact be the people who hired the lawyers to bring the suit in the first place. Writing about these types of stories hopefully educates people who would otherwise not understand why this is stupid or bad practice.
  5. Capital One, a major credit card company, was breached and personally identifiable information was stolen on thousands of people. The information was then publicly posted on the Microsoft-owned code repository, GitHub. This lawsuit is a (stupid) continuation of that series of events involving several large, major companies. This is the definition of newsworthy.

Given all that, why should he NOT report on these types of stories?

Gary (profile) says:

Re: Re: Re:3 What is the relevant law?

Reading the suit now. They say Github violated state law, CALIFORNIA CIVIL CODE § 1798.85

  1. Plaintiff Zielicke individually and on behalf of the California Subclass, repeats and alleges all paragraphs above, as if fully alleged herein.
  2. Plaintiff Zielicke alleges this claim individually and on behalf of the California Subclass.
  3. The California Civil Code § 1798.85 provides, inter alia, that an entity may not “[p]ublicly post or publicly display in any manner an individual’s social security number.”
  4. The statute defines “publicly post” or “publicly display” as “intentionally communicate or otherwise make available to the general public.”
  5. By engaging in the conduct alleged herein and/or by failing to act as alleged herein, GitHub has publicly posted or publicly displayed Plaintiff Zielicke’s and the California Subclass members’ Social Security numbers within the meaning of the statute.
  6. As a direct and proximate result of GitHub’s having publicly posted or publicly displayed this Personal Information, Plaintiff Zielicke and the California Subclass members sustained actual losses and damages as described herein, and will continue to suffer damages for, potentially, years to come.

Some nice points:

  • They claim that GitHub failed to uphold their own terms of service. The TOS in question? "GitHub reserves the right to remove anyone at any time for any reason." So since GitHub has failed to remove Everyone, they haven’t upheld their own TOS. Sounds legit!
  • Claims GitHub should have notified everyone involved about the breach. Clearly spurious – it wasn’t a breach of GitHub data/users.
  • Claims GitHub should have just plain known that the data was there. Also, clearly silly.
  • Super silly – claims GitHub is in violation of Federal law regarding safe storage of personal data. Again, not their data!
  • Oh – violation of federal Wiretap laws! I guess "Throw all the charges at the wall, see what sticks" is what they are going for.

Anonymous Coward says:

Re: Re: What is the relevant law?

Do you even want GitHub to have an upload filter? There could easily be legitimate code that uses Social Security numbers. For example, someone could post open source code showing how to filter Social Security numbers from a website. The example number would be a fake SSN, but any filter that blocks the pattern would likely block the upload too: if the example number isn’t in the code itself, it’s probably in the comments.

Anonymous Coward says:

Re: Re: Re:2 What is the relevant law?

Sure. I’d be okay with that.

Why? It’s a code hosting website. How do you even BEGIN to make an upload filter for code?

Everybody likes to imagine the extreme version of what that might look like and what the consequences would be, but that’s probably not what would happen.

As many others have pointed out, it’s code, and inevitably code is going to contain strings of numbers that match things like SSNs. For example, many programmers will define variables and test data that contain junk data such as "123456789" which would match those filters. As such, it’s impossible to filter it out without catching legitimate code. And it’s not a case of "well, the number of false positives will be small". No, it’s going to be ANY code with strings of digits, which is likely most code. That kind of a filter would likely render Github useless.
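
To make the false-positive problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `NAIVE_SSN` pattern, the `flag_lines` helper, and the sample snippet are all made up, and nothing here reflects any filter GitHub actually runs.

```python
import re

# Hypothetical naive filter: flag any run of exactly nine digits,
# with or without the 3-2-4 dashes.
NAIVE_SSN = re.compile(r"(?<![0-9])(?:[0-9]{3}-[0-9]{2}-[0-9]{4}|[0-9]{9})(?![0-9])")


def flag_lines(source: str) -> list[str]:
    """Return every line of `source` that the naive filter would flag."""
    return [line for line in source.splitlines() if NAIVE_SSN.search(line)]


# Ordinary-looking code and made-up test data; none of it is a real SSN.
SAMPLE = '''\
TEST_ID = "123456789"
MAX_RECORDS = 250000000
assert mask("987-65-4321") == "***-**-4321"
timeout_ms = 5000
'''

flagged = flag_lines(SAMPLE)
for line in flagged:
    print("FLAGGED:", line)
```

Three of the four harmless lines get flagged, including the unit test for a masking function, which is exactly the kind of legitimate code a filter like this would keep off the site.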

NZgeek (profile) says:

Re: Re: Re:4 What is the relevant law?

It actually could be valid.

Wikipedia contains some good information about the structure of SSNs. The rules are fairly loose, and there’s no check digit to ensure that it’s not just a nonsense value.

The only public rules around SSNs are:

  • they’re made up of 9 digits, typically grouped 3-2-4
  • none of the 3 groups can be made up only of zeroes
  • the first digit cannot be 9
  • the first group cannot be 666

None of these rules would prevent 123-45-6789 from being issued.

Under the old issuing scheme (retired in June 2011), that number would be a completely valid SSN issued in New York. It would be area 123, group 45, serial 6789.

The newer scheme randomly generates numbers. It’s unlikely that this number will be generated, but it’s possible.
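
Those four rules are simple enough to write down. The following is only a sketch, assuming the rules listed above are the whole story; `could_be_ssn` is a hypothetical helper name, and passing the check says nothing about whether a number was ever issued.

```python
import re


def could_be_ssn(candidate: str) -> bool:
    """Check a string against the four public rules only; there is no check digit."""
    # Accept either the dashed 3-2-4 form or nine bare digits.
    if re.fullmatch(r"[0-9]{3}-[0-9]{2}-[0-9]{4}", candidate):
        digits = candidate.replace("-", "")
    else:
        digits = candidate
    if not re.fullmatch(r"[0-9]{9}", digits):
        return False

    area, group, serial = digits[:3], digits[3:5], digits[5:]
    if area == "000" or group == "00" or serial == "0000":
        return False  # none of the three groups can be all zeroes
    if area[0] == "9":
        return False  # the first digit cannot be 9
    if area == "666":
        return False  # the first group cannot be 666
    return True


print(could_be_ssn("123-45-6789"))  # passes every rule
print(could_be_ssn("666-12-3456"))  # rejected: first group is 666
```

With so few rules and no checksum, the vast majority of nine-digit strings pass, which is why a format check alone cannot tell an SSN from junk test data.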

Cdaragorn (profile) says:

Re: Re: Re: Re:

I think the problem is that you’re assuming the format is relevant. It isn’t.

SSN’s are easily and often represented without those dashes. Phone numbers are also represented in many different formats. So the only way an SSN could be declared as obviously so is if you decided that any 9 digit number must be an SSN. This is clearly false.

Thad (profile) says:

Re: Re: Re:2 Re:

I think the problem is that you’re assuming the format is relevant.

I’m not the one assuming that; that’s the claim explicitly made in the lawsuit, as quoted in the article.

The filing says:

Moreover, Social Security numbers are readily identifiable: they are nine digits in the XXX-XX-XXXX sequence.

Tim responds:

Nine digits in a sequence. Oh, like phone numbers.

Which, no, that’s not what it says; it refers, explicitly, to "nine digits in the XXX-XX-XXXX sequence." That pattern does not, to the best of my knowledge, match any locale’s phone number.

There are plenty of reasons to criticize the lawsuit’s implication that Github should have used some kind of automated system to watch for SSNs, from false positives to the ease of circumventing such a filtering system. I don’t disagree with Tim’s overall point at all. I just disagree with his seeing the phrase "nine digits in the XXX-XX-XXXX sequence" and saying that could be a phone number. To the best of my knowledge, no, it couldn’t.

TFG says:

Re: Re: Re:3 Re:

There’s a problem with filtering based on the dashes: it’s incredibly easy to circumvent by simply removing the dashes, which is itself pretty easy to automate. Searching specifically for the xxx-xx-xxxx format is not a great plan, accordingly. A more general filter looking for just the dashes will run smack dab into the phone number issue.

If you’re trying to specifically filter out SSNs, you’ll be less easily circumvented by looking for strings of 9 numbers in a row – but that only makes sense in a context where you expect to be dealing with SSNs and have a specific need to filter it.

Even then, you’re going to run into false positives – there are street addresses in the US of A that wind up with nine numbers in a row (44546 2500 Street, is the format I’ve seen) and it’s all too easy for those to get snapped up in the problem.

Tim’s calling out of phone numbers really isn’t all that far off-base. What is incredibly off-base is the lawsuit’s understanding of … well, anything.
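
The trade-off described above can be sketched in a few lines of Python. Both filters are hypothetical (the names `strict_hit` and `loose_hit` are made up for this illustration): one looks only for the dashed format, the other ignores spaces and dashes and looks for nine digits in a row.

```python
import re

DASHED = re.compile(r"(?<![0-9])[0-9]{3}-[0-9]{2}-[0-9]{4}(?![0-9])")
NINE_DIGITS = re.compile(r"(?<![0-9])[0-9]{9}(?![0-9])")


def strict_hit(text: str) -> bool:
    """Match only the dashed xxx-xx-xxxx format."""
    return DASHED.search(text) is not None


def loose_hit(text: str) -> bool:
    """Drop spaces and dashes first, then look for a run of exactly nine digits."""
    squeezed = re.sub(r"[ -]", "", text)
    return NINE_DIGITS.search(squeezed) is not None


for text in ["123-45-6789", "123456789", "123 45 6789", "44546 2500 Street"]:
    print(f"{text!r}: strict={strict_hit(text)} loose={loose_hit(text)}")
```

The strict filter misses the number as soon as the dashes are removed or replaced with spaces, while the loose one catches those variants but also flags the street address.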

Anonymous Coward says:

Re: Re: Re:3 Re:

Store SSN’s in any relevant database and you will find they are stored as 9 digit values, generally with no dashes or separations (the most efficient use of what used to be very limited space).

So while 9 digits in that specific format "may" resemble a SSN, that doesn’t guarantee that it is a SSN and not some other type of number.

urza9814 (profile) says:

Re: Re: Re:4 Re:

When I took my introductory programming classes, we were taught that something like an SSN should always be stored as a string, since you wouldn’t typically do any manipulation on that object. Storing as a number might save a bit of space, but ideally it should be a string and you can (probably should) include those dashes. Although plenty of devs will still use a number to save space or ensure consistent formatting.

The bigger issue IMO is that there’s plenty of numbers which are designed to be compatible with SSNs. Penn State University student numbers are the biggest use case I’ve experienced, but that alone is probably tens or even hundreds of thousands of people/numbers. See, the software was originally designed to just identify students by SSN. But they’d use those numbers, for example, if a professor wanted to post test scores outside their office — so students could check their score, could compare it to others, but couldn’t easily see what another specific student scored. But they eventually realized that posting a big public list of SSNs wasn’t a great idea, so they started generating new numbers. They’re still formatted like SSNs though because the software and workflows were all designed to use SSNs. Technically the numbers they assign aren’t valid (they start with 9) but I can’t imagine they’re the only ones with SSN clones as ID numbers.

Anonymous Hero says:

No they’re not, but I assume that’s what Timmy was going for because the article was about the leakage of USA social security numbers. USA Social Security numbers are restricted to citizens of the USA.

That’s why I said "USA" phone numbers have 10 digits (formerly wrote "a 10th digit", hence the "a 10 digits" typo in the original post of mine).

TFG says:

Re: Re:

Which doesn’t make a difference in terms of filtering data. That the stolen data is sourced from inside the US is irrelevant when the place it was posted to deals with data from multiple countries – upload filters will either be easily circumvented if they are geographically restricted (upload from overseas IP Address) or will have a massive false positive rate (9 digit phone numbers).

It’s not just phone numbers. There are street addresses that wind up with nine digits in a row (44345 2500 street, par exemple). And if you decide to filter based on format alone (xxx-yy-zzzz) you set yourself to be easily bypassed by removing the dash format.

Additionally, a string of nine digits could easily crop up in code itself – and heaven forbid the code happens to deal with physical or mailing address formats. The lawsuit’s assertions are made from the perspective of someone who doesn’t know a goddamn thing about what they are stating should be easy.

Shufflepants (profile) says:

Re: Re: Re:

Additionally, a string of nine digits could easily crop up in code itself – and heaven forbid the code happens to deal with physical or mailing address formats. The lawsuit’s assertions are made from the perspective of someone who doesn’t know a goddamn thing about what they are stating should be easy.

Or hell, the GitHub project could be some code to specially handle social security numbers and have mock social security numbers in some unit test class!

Anonymous Coward says:

afaik – the social security number was not to be used for identification purposes other than that of the Social Security system.

Perhaps the method(s) for obtaining credit, loans, etc should be made more secure. If your business is the victim of fraud, I am in no way responsible for it just because some of my personal info was used.

Shufflepants (profile) says:

Fun fact, since social security numbers are only 9 digit numbers, there are only 1,000,000,000 possible numbers. With each numeral being a single byte, nine numerals per number, and a billion bytes in a gigabyte, it’d be fairly trivial to produce a file that contains all valid and all unused social security numbers. Uncompressed, the file would only be ~9GB, and it would probably compress very well.
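
For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. It assumes plain ASCII text with one number per line, and it borrows the four public validity rules quoted elsewhere in this thread; the variable names are made up for this illustration.

```python
DIGITS = 9

# Every nine-digit string, 000000000 through 999999999.
possible = 10 ** DIGITS

# Plain text: nine ASCII digit bytes plus one newline per number.
bytes_per_line = DIGITS + 1
text_size_gb = possible * bytes_per_line / 1e9

# Alternative encoding: each number packed as a 32-bit integer.
packed_size_gb = possible * 4 / 1e9

# Numbers passing the public rules: first group 001-899 except 666,
# second group 01-99, third group 0001-9999.
plausible = (899 - 1) * 99 * 9999

print(f"{possible:,} possible strings")
print(f"{text_size_gb:.0f} GB as text, {packed_size_gb:.0f} GB as packed integers")
print(f"{plausible:,} pass the public rules")
```

Whichever encoding is picked, the full list fits comfortably on an ordinary disk, so the point stands that the numbers themselves are not secret in any meaningful sense.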

ECA (profile) says:

THIS is going to go on forever..

I mentioned before about the out breaks.
Every state, every nation, every one will want to create LAWS/RULES/REGULATIONS to control the net..

there is no Standard for what is/will be passed.
No group/consolidation of Anything, as each will be interpreted ANY WAY THEY WANT IT..

They will backdoor, and go around anything created Just to cause problems..

By the time we/they/it has settled, WE might as well be china/asia/middle east and restrict access from other nations as well as Cut ourselves off from others. And the Corps will love it, because THEN, every game will need to have locations in Every nation just to be played/used/enjoyed.
Wow, what a way to control the game industry..

The internet. The Biggest experiment in total opinionation(new word??)..
Love the Thought(not fulfilled) that we are an open and non-opinionated nation. That we are the Dream of so many people…but we ACT as bad as the Worst nations.

Vidiot (profile) says:

Be wise... capitalize!

PACER needs an automated filter, too… to detect when Random Words in a Lawsuit have been Capitalized (see what I did there?), after which a red-inked rubber stamp marks the document "BULLSHIT". Surely, sharing Personal Information is a far greater crime than sharing personal information.

Of course, capitalization is the text equivalent OF SHOUTING, which brings to mind the immortal quotation from one Squidward Tentacles:

Squidward: People talk loud when they wanna act smart, right?

Plankton (shouts): CORRECT!

urza9814 (profile) says:

Lots of similar numbers...

When I went to college, the university (PSU) assigned us all a nine digit student number. The reason these were nine digit numbers is because they originally used SSNs, until a decade or two ago when they realized that was a bit of a security issue. All of their existing systems were designed around using SSNs though, so they created new numbers for everyone which used the same formatting so their existing workflows wouldn’t need to be modified.

I can’t imagine that they’re the only place which did something like that. So if you’re filtering nine digit numbers that look like SSNs, you’re probably going to get a lot of nine digit numbers that aren’t SSNs but are designed to look similar, which is going to cause a lot of additional problems…

Anonymity says:

I was compromised

So I get a lot of people’s honor and loyalty to GitHub, but I just found out about my info being compromised, and the news comes just over a week or so after finding odd transactions being made on my account. They followed the protocol they were supposed to and sent me a new card. In my inbox today is the official notice that my sensitive info had been compromised by the hack and the unauthorized transactions were a result of said hack attack. That being said, I am quite pissed and feel violated by this incident, and quite frankly, if there indeed was knowledge of it being uploaded and accessible from this site or any other prior to the greater public being made aware, then yes, I do think all involved in not making sure these things were prevented should be held accountable.
Sorry, but the fact that GitHub is a subsidiary of Microsoft doesn’t make them untouchable or unaccountable; it in fact makes them more liable and responsible to ensure private information is protected. They are a company whose main focal point in any of their marketed products is safety and data protection. Even if GitHub is its own entity, it should be structured under the same protection umbrella as the rest of Microsoft’s invested assets and interests.
It’s pretty pathetic that the hack even happened, but if you help a wanted murderer in any way, shape, or form with knowledge of their crimes, you are considered an accessory to the crime and are held liable. This is no different, except GitHub’s excuse is they didn’t know what types of information it was. Then they should have more stringent measures in place to sniff out such sensitive info. A Snowden-type algorithm of sorts. I don’t know, but that’s a failure against their TOS and their obligation to make sure their publicly accessed site isn’t a safe haven for such individuals and their criminal activity. If not, then we might as well call it the darkgit and send it to the darkest region of the Internet.

TFG says:

Re: I was compromised

That Github is now owned by Microsoft is irrelevant. Github removed the content as soon as they were made aware of it.

You are asking for magic. You are telling people to Nerd Harder. It is not possible to do what is being asked of them, for multiple reasons already discussed in other comments. If you were compromised, be angry at the idiots who let your data be taken away as opposed to the guys who immediately took down the bulletin board with your information on it as soon as they knew it existed.
