Data Leakage

GitHub Hosted Details on More Than a Million Items of Leaked Data

By Andre Luiz R. Silva

On July 19th, 2019, Capital One confirmed that the personal information of 100 million Americans and 6 million Canadians had been exposed after its systems were hacked. The intrusion took place on March 22nd and 23rd, but the situation then got even worse: content describing the hacking methods became available on GitHub starting April 21st, and stayed there for almost three months before it was detected and removed.


How the leakage occurred

Capital One announced the data leak ten days after the breach was discovered, and was quite transparent regarding the approximately 106 million customers affected. The announcement stated that the leakage included the data of customers who had applied for credit cards.

The company revealed that the leaked data included credit scores and limits, transaction and payment histories, and contact information. The announcement also mentioned that a smaller number of Social Security and bank account numbers had been exposed.

The leakage was discovered after a security researcher notified the company on July 17th. Two days later, the company confirmed the breach and established the actual dates of the unauthorized access: March 22nd and 23rd.

The lawsuit against GitHub

A new lawsuit now charges that GitHub shares the blame with Capital One for this sad state of affairs. According to the suit, the credit card and Social Security numbers had been exposed since April 21st, and the platform did nothing.

According to the document, GitHub “encourages hackers”; it also alleges that the leaked information should have been easy to identify, since it contained a recognizable pattern of numbers. Both GitHub and Capital One, however, state that only information about the hacker’s method of access to the financial system was hosted on the site.

In any case, neither of the parties has denied that the “key” that could cause a lot more damage was, in fact, hosted on GitHub. It’s also undeniable that the data should have been identified more quickly and efficiently. But how—and by whom?


GitHub: one location for monitoring digital risks

Considering the size of GitHub’s community, it’s critical for companies that care about sensitive data, both their own and that of their clients, to have the platform monitored for leaks of credit cards, credentials and source code.

GitHub is one of the five sites where Axur monitors and removes the most exposed sensitive data. That is no coincidence: the platform contains countless mentions of brands and companies, some of them in official GitHub profiles. So it’s not at all rare to find, in the midst of so much code and text, infringements that can cause financial damage. The same is true of many other sites on the Internet that are built around collaboration between users.


What are the appropriate actions to take in a case like this?

Capital One’s data leakage brings up two primary problems: the lack of adequate legislation regarding data protection in the US, and the lack of adequate monitoring and response to cybercriminal activities.

Data protection legislation

It’s shocking how often leaks as significant as Capital One’s occur. It’s also evident that new laws such as Europe’s General Data Protection Regulation (GDPR), and many others taking effect worldwide, are holding companies responsible for their clients’ data.

Other than California’s Consumer Privacy Act of 2018 (yet to be implemented), such legislation is still nonexistent in the US. According to specialists, the lack of appropriate action in response to the equally gigantic Equifax leak in 2017 demonstrates that this type of problem tends to recur. And companies must not simply wait for a regulation before they take responsibility for their clients’ data.

Effective monitoring and response

Appropriate tools for detecting code and other types of leakage on a platform such as GitHub are indispensable for companies that want to be proactive in identifying and removing these threats. Regardless of legislation, preventing a breach from remaining exposed for months demonstrates real concern for your customers’ journey in the online environment.
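To make the idea of a "recognizable pattern of numbers" concrete, here is a minimal Python sketch of pattern-based detection. It is only an illustration under stated assumptions, not a description of how Axur, GitHub or any real scanner works: the function names `luhn_valid` and `find_leak_candidates` are hypothetical, the regular expressions cover only simple formats (13 to 16 digit card numbers and hyphenated US Social Security numbers), and the Luhn checksum is used to discard digit runs that cannot be valid card numbers.

```python
import re

# Assumption: candidate card numbers are 13 to 16 digits,
# optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Assumption: SSNs appear in the common hyphenated 3-2-4 form.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_leak_candidates(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for card-like and SSN-like strings."""
    findings = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):  # drop digit runs that fail the checksum
            findings.append(("card", m.group()))
    for m in SSN_RE.finditer(text):
        findings.append(("ssn", m.group()))
    return findings


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test card number.
    sample = "card 4111 1111 1111 1111 and ssn 078-05-1120"
    for kind, value in find_leak_candidates(sample):
        print(kind, value)
```

A sketch like this only flags candidates. In practice each match would still need human or contextual review, since a string shaped like a card number or SSN is not proof of a real leak.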

At Axur, our robots are at work 24/7 to prevent any kind of wrongful exposure of a brand’s data. With our solution for Programming Code, you can see and request removal of all trademark infringements and data leaks in places like GitHub, using Axur One (a simple and intuitive platform). It will save you from future headaches.



Eduardo Schultze, CSIRT Coordinator at Axur, holds a degree in Information Security from UNISINOS – Universidade do Vale do Rio dos Sinos. He has worked since 2010 on fraud targeting the Brazilian market, mainly phishing and malware.


Andre Luiz R. Silva

A journalist working as Content Creator at Axur, in charge of Deep Space and press activities. I have also analyzed lots of data and frauds here as a Brand Protection team member. Summing up: working with technology, information and knowledge together is one of my biggest passions!