Your Private GitHub Repos Aren't as Private as You Think
Summary
TL;DR: This script describes a GitHub behavior in which data from deleted forks, deleted repositories, and even private repositories remains accessible indefinitely through the fork network. Even after deletion, commits, including sensitive information like API keys, are retrievable by anyone who knows the commit hash. GitHub states that this is an intentional design decision, not a bug, which contradicts user expectations of privacy and of data destruction upon deletion. The script urges GitHub to reconsider this design to align with security best practices and user trust.
Takeaways
- GitHub's design allows anyone to access data from deleted forks, deleted repositories, and even private repositories, forever.
- Users may mistakenly believe that deleting a repository removes all associated data from public access, but this is not the case on GitHub.
- The behavior stems from GitHub's repository network structure, where commits made anywhere in the network stay reachable from the other repositories in it, even after the repository they were pushed to is deleted.
- Commit hashes, even partial ones, can be used to access deleted or private data, posing a risk to sensitive information.
- GitHub's response to the issue confirms it as an intentional design feature, not a bug, and it's documented in their security considerations.
- The average user perceives the separation of private and public repositories as a security boundary, which this feature contradicts.
- The GitHub Archive website stores every public event on GitHub, including commit hashes, making it possible to find and access deleted or private data.
- Deleting a repository does not securely delete the data; it remains accessible through any existing forks.
- Changing an upstream repository from private to public splits it into two repository networks; commits made to private forks before the change become visible through the public one.
- For open-source projects, any code committed to a private fork before the project was made public remains publicly accessible.
- Rotating API keys or other secrets is crucial if they are accidentally committed to a repository, as simple deletion is not enough to secure the data.
Q & A
What is the main concern raised in the blog post mentioned in the script?
-The main concern is that anyone can access data from deleted forks, deleted repositories, and even private repositories on GitHub, and this data remains accessible forever, which is an intentional design feature of GitHub, not a bug.
How can one access data from a deleted fork on GitHub?
-One can access data from a deleted fork by using the commit hash. Even if the fork is deleted, the commit is still accessible as long as the commit hash is known.
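As a rough illustration of the lookup described above, the sketch below builds the kind of commit URL that the video pastes into the browser. The pattern `https://github.com/<owner>/<repo>/commit/<sha>` is GitHub's usual commit page address; the function name `commit_url` and the placeholder hash are hypothetical and exist only for this example.

```python
def commit_url(owner: str, repo: str, sha: str) -> str:
    """Build the web URL for a commit as viewed through a given repository."""
    sha = sha.strip().lower()
    # SHA-1 commit hashes are 40 hex characters; short prefixes are also accepted.
    is_hex = all(c in "0123456789abcdef" for c in sha)
    if not (4 <= len(sha) <= 40) or not is_hex:
        raise ValueError("expected 4 to 40 hexadecimal characters")
    return f"https://github.com/{owner}/{repo}/commit/{sha}"


# Placeholder hash, not a real commit: the point is that the URL goes through
# the upstream repository even though the commit was made in a (deleted) fork.
print(commit_url("yt-dlp", "yt-dlp", "abc123"))
```

The function only formats a URL and makes no network request; whether a given hash actually resolves depends on GitHub.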
What is the vulnerability with accessing deleted fork data?
-The vulnerability lies in the fact that even after a fork is deleted, the code with potentially sensitive information remains accessible using the commit hash, contrary to what one might expect after deletion.
Why is the commit hash important in accessing deleted data?
-The commit hash is crucial because it uniquely identifies a commit. Knowing even a part of the commit hash can allow access to the commit's data, even if the repository or fork has been deleted.
How secure is the commit hash against brute force attacks?
-The commit hash is not secure against brute force. In the video's demonstration, six characters were the minimum needed to resolve the commit (16^6, about 16.7 million possibilities), and Git itself accepts short hashes of as few as four characters (16^4 = 65,536 possibilities), which is well within brute-force range.
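The arithmetic behind the brute-force concern can be checked with a few lines. This is only a back-of-the-envelope sketch: the helper name `short_hash_keyspace` is made up for the example, and it counts candidate prefixes without saying anything about how quickly they could actually be tried.

```python
def short_hash_keyspace(length: int) -> int:
    """Number of possible hexadecimal prefixes of the given length."""
    return 16 ** length


# 4 is Git's minimum short-hash length, 6 is what the demo needed,
# and 40 is a full SHA-1 commit hash.
for length in (4, 6, 40):
    print(f"{length:>2} hex characters: {short_hash_keyspace(length):,} candidates")
```

The gap between the four- or six-character figures and the full 40-character hash is what makes short prefixes the weak point.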
What is GitHub Archive and how does it relate to the vulnerability?
-GitHub Archive is a website that archives every event that happens on GitHub, including commits. This means that commit hashes for almost every commit on every repository that was once public are available, potentially exposing private information.
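To make the link between archived events and commit hashes concrete, here is a minimal sketch that pulls hashes out of a single push event. It assumes the record is one JSON object per line with the field names used by GitHub's public event format (`type`, `payload.commits[].sha`, `payload.head`); the sample record and all of its values are invented, and real archive records may omit the per-commit list.

```python
import json


def shas_from_event(line: str) -> list[str]:
    """Return commit hashes mentioned in one JSON event line, if it is a push."""
    event = json.loads(line)
    if event.get("type") != "PushEvent":
        return []
    payload = event.get("payload", {})
    # Assumed layout: a list of pushed commits, each carrying its own hash.
    shas = [c["sha"] for c in payload.get("commits", []) if "sha" in c]
    # The head of the push is kept as a fallback when no commit list is present.
    head = payload.get("head")
    if head and head not in shas:
        shas.append(head)
    return shas


# Hypothetical record shaped like a public push event; every value is made up.
sample = json.dumps({
    "type": "PushEvent",
    "repo": {"name": "example-user/example-fork"},
    "payload": {
        "head": "b" * 40,
        "commits": [{"sha": "a" * 40}, {"sha": "b" * 40}],
    },
})
print(shas_from_event(sample))
```

Applied to every line of an archive file, a loop over this function would yield the hashes that the video says can then be looked up through any repository in the same network.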
What happens when a public upstream repository that has been forked is deleted?
-When a public upstream repository is deleted, instead of deleting the whole tree, GitHub reassigns the root node to one of the downstream forks, making all commits from the upstream repository still accessible via any fork.
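The root reassignment can be pictured with a toy model. The class below is purely illustrative and is not how GitHub is implemented: `RepoNetwork`, its methods, the rule for picking a new root, and the behavior once no repository is left are all assumptions made for the sketch. What it does capture is the point from the video that commits belong to the network rather than to the repository they were pushed to.

```python
class RepoNetwork:
    """Toy model: one shared commit store for an upstream repo and its forks."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.repos = {root}
        self.commits: set[str] = set()  # shared by every repo in the network

    def fork(self, name: str) -> None:
        self.repos.add(name)

    def commit(self, repo: str, sha: str) -> None:
        if repo not in self.repos:
            raise KeyError(repo)
        self.commits.add(sha)

    def delete(self, repo: str) -> None:
        self.repos.discard(repo)
        if repo == self.root:
            # Assumption: any surviving fork may become the new root.
            self.root = min(self.repos) if self.repos else None
        if not self.repos:
            # Assumption: with no repository left, nothing is reachable.
            self.commits.clear()

    def reachable(self, repo: str, sha: str) -> bool:
        return repo in self.repos and sha in self.commits


net = RepoNetwork("upstream")
net.fork("fork")
net.commit("upstream", "abc123")  # pushed after the fork was created
net.delete("upstream")
print(net.root, net.reachable("fork", "abc123"))
```

The same model covers the deleted-fork case: a commit made in a fork stays in the shared store after the fork is deleted, so it is still reachable through the upstream.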
Why does GitHub keep the data accessible even after deletion?
-According to the presenter, GitHub was designed from the start for open code collaboration, where everything is publicly visible. Rather than deleting the whole fork network when an upstream repository is removed, GitHub keeps the forks, and with them the upstream's commits, accessible.
What is the implication of this feature for users who open source a tool on GitHub?
-The implication is that any code committed to a private fork before making the upstream repository public is also accessible to the public, as the commits are part of the same repository network.
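The timing rule in this answer can be written down as a small filter. This sketch only encodes the rule as the video states it (private-fork commits made before the upstream goes public are exposed, later ones are not); the function name, the short hashes, and the dates are all hypothetical.

```python
from datetime import datetime


def exposed_fork_commits(fork_commits, made_public_at):
    """Return hashes of private-fork commits made before the upstream went public."""
    return [sha for sha, when in fork_commits if when < made_public_at]


# Hypothetical timeline; hashes and dates are placeholders.
made_public_at = datetime(2024, 3, 1)
fork_commits = [
    ("1111111", datetime(2024, 2, 10)),  # before publication: publicly reachable
    ("2222222", datetime(2024, 3, 15)),  # after publication: stays private
]
print(exposed_fork_commits(fork_commits, made_public_at))
```

Anything the filter returns would, per the video, need to be treated as already public, which is why secrets in those commits should be rotated rather than just removed.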
What does GitHub's response to the vulnerability reveal about their stance on this feature?
-GitHub's response indicates that this is an intentional design decision and is working as expected. They do not have any immediate plans to change this functionality, as it is documented in their security considerations.
What are the takeaways for users regarding the security of their repositories on GitHub?
-The main takeaway is that any commit to a repository network, including the upstream repo or downstream forks, will exist forever and cannot be deleted or hidden. Users must be cautious about committing sensitive information and consider rotating any accidentally committed private keys immediately.
Outlines
GitHub's Persistent Data Access Issue
This paragraph discusses a significant vulnerability in GitHub that allows anyone to access data from deleted forks, deleted repositories, and even private repositories indefinitely. The author demonstrates how deleted fork data can still be accessed by using commit hashes, which, although difficult to guess, are not entirely secure against brute force attacks. The GitHub Archive is mentioned as a place where commit hashes are stored, potentially exposing sensitive information that was mistakenly committed and thought to be deleted.
The Risks of Deleted and Private Repository Data
The second paragraph delves into the consequences of GitHub's design where data from deleted forks and repositories can still be accessed if at least one fork remains. It explains how users have unknowingly exposed API keys by hardcoding them into example files within forks, which they later delete, under the false assumption of privacy. The paragraph also covers how private commits made to a repository before it becomes public are accessible through the public repository network, highlighting a common yet risky workflow that many developers might not be aware of.
GitHub's Response to Data Accessibility Concerns
In the final paragraph, the author shares GitHub's response to the reported vulnerability, stating that the persistent data access is an intentional design feature, not a bug. The author argues that while GitHub's documentation mentions this functionality, most users are unaware of it and expect a higher level of privacy. The paragraph concludes with a call for GitHub to reconsider its security model, given the potential for misuse and the discrepancy between user expectations and actual privacy protections.
Keywords
GitHub
Repository
Fork
Commit
API Key
Vulnerability
Brute Force Attack
GitHub Archive
Repository Network
Security Model
Deletion
Highlights
Data from deleted Forks, deleted repositories, and even private repositories on GitHub can be accessed by anyone.
This data access is an intentional feature of GitHub, not a bug.
Deleted Fork data remains accessible forever, contrary to what users might expect.
A vulnerability exists where private information in commits can be accessed even after deletion.
Users mistakenly believe that deleting a repository removes all associated data from public access.
Accessing deleted Fork data requires knowing the commit hash, but a short prefix is enough, and short prefixes can be brute-forced.
GitHub archive exposes commit hashes, making it possible to access private information.
40 valid API keys were discovered from deleted Forks, indicating a real-world impact of this vulnerability.
Even if a public repository is deleted, its data remains accessible if at least one Fork exists.
GitHub's repository network structure means that deleting an Upstream repo does not delete its history.
A major tech company's private key was exposed due to misunderstanding GitHub's deletion and privacy features.
Any code committed to a public repository may be permanently accessible if there is at least one Fork.
When a private repo is made public, commits made to its private Forks before the change become visible to everyone.
GitHub acknowledges this feature in their documentation, but many users are unaware.
The security implications of GitHub's design are significant and may require a change in user expectations.
GitHub's response to the findings indicates that they do not plan to change this feature in the near future.
The blog post suggests that GitHub should revisit its security model to better align with user expectations.
Transcripts
if you thought your private GitHub
repositories were safe from prying eyes
think again this blog post caught my
attention today and I'm kind of
surprised that no one's talking about it
because this seems like a big deal
anyone can access deleted and private
repository data on GitHub specifically
you can access data from deleted Forks
deleted repositories and even private
repositories on GitHub and it's
available Forever This is known by
GitHub and intentionally designed that
way that's right this is a feature not a
bug so what's the vulnerability here how
can you access this data here's how the
vulnerability Works accessing deleted
Fork data so let's say you have a fork
of a public repository you then commit
code to your fork and you delete your
fork so it would look something like
this you would create the fork commit
something let's say that code has some
private information that you don't want
people seeing for example you might have
accidentally put an API key or a
password or something like that in there
and then you delete it now a reasonable
person would say that okay this
repository has been deleted this
information should no longer be
available so it's fine no big deal
however that is a wrong assumption the
code is actually still accessible even
though it shouldn't be right you deleted
it but it is and it's accessible forever
out of your control there is absolutely
nothing that you can do to remove that
data from the public record here's
how you would do that so I'm going to go
to GitHub uh let's find some repository
yt-dlp that looks good let me just
create a fork of
that let's edit this README
file
secret commit the
changes
oops and now let's say I have this
information here this secret that I
don't want to be
public let me just copy that URL and
let's say okay I noticed that I
accidentally committed some information
that I shouldn't have so I'm just going
to delete the repository thinking that
that will remove this information from
the public record so I'll go to
settings delete this repository I want
to delete this repository
yes okay now it's gone that information
should no longer be publicly available
right well maybe let's go back to the
yt-dlp
repository and if I paste in this URL
that I had before which contains the
commit hash let's see what we get so
notice that here it says this commit
does not belong to any branch on this
repository and may belong to a fork
outside the repository oh look here is
that commit that I made where I deleted
all of this information from the readme
and added a secret so this information
that I thought would no longer be
accessible actually is hm interesting
notice however that we needed to
actually know the commit ID to do this
and the commit ID is a pretty long hash
so trying to brute force that is
actually quite difficult so you might
think okay well at least there's some
kind of safety there but you don't need
the full hash let's see how far we can
get let's delete all that yep still
accessible let's go back one more let's
go back two more characters yep still
accessible we delete the B4 okay then
it's not found B still not found okay
looks like in this case the minimum that
we need is six characters which isn't
enough to be safe against a Brute Force
attack like that is not a huge number I
think each one of these is 16 since it's
hexadecimal and so 16 to the power of
6 it's a large number over 16 million but
it's not large enough that I would
consider this to be secure so anyone
that knows the commit hashes and as I
mentioned earlier even the short commit
hashes which can possibly be brute
forced then they'd be able to access all
that private information the minimum
number of characters that git requires
for a short commit hash is actually four
so instead of 16 to the 6 the minimum is
actually 16 to the 4 which is
65,536 very much in the realm of being
brute forcible and in fact the commit
hash is actually discoverable remember
how I mentioned that you need the commit
hash to access that private information
well here's a place where you can find
it GitHub archive this is a website that
basically archives every single event
that happens on GitHub there are 15 plus
event types which I won't go into the
details of but basically this is a
massive store of information about every
single thing that happens on GitHub
which includes commits this means that
the hashes for just about every commit
on every repository that was once
public are available on this
website and how often can we find data
from deleted Forks well the person that
wrote this blog post Joe Leon from
Truffle Security found 40 valid
API keys from deleted Forks in which
apparently users did something like this
first they forked the repo then they
hardcoded an API key into an example
file then they did some changes and then
they deleted the fork like this this is
something that a new user might want to
do they'd see a placeholder in some
example file showing how to use the
program that's in a certain repository
and they'll just change the example file
to contain the API key that seems like a
reasonable way to do things as a new
user especially if you know that you're
going to delete the fork later
unfortunately this vulnerability shows
that no you absolutely should not do
that because even if you delete the
fork you cannot trust GitHub to actually
securely delete it however that's not
the only vulnerability here it gets
worse accessing deleted repo data so
consider this situation you have a
public repository on GitHub some user
Forks your repo you commit data after
they Fork it and they never sync their
fork with your updates and you delete
the entire repo in this case the code
that you committed after they forked is
still accessible so as long as at least
one fork exists then that information
will be publicly accessible forever so I
mentioned earlier that this is a feature
instead of a bug and why is that well
let's go into the details so GitHub
stores repositories and forks in a kind
of repository network with the original
Upstream repository acting as the root
node it's like a tree the way that git
itself is a tree right you have an
initial commit and then you have
branches on top of that and you have a
whole history that comes back to this
root however in this GitHub repository
Network when a public Upstream
repository that has been forked is
deleted instead of just deleting the
whole tree because well GitHub you know
probably shouldn't do that you wouldn't
really want your fork to be deleted if
the Upstream goes away the way that
GitHub solves this issue is by
reassigning the root node to one of the
downstream Forks however notice what
happens here all of the commits from the
Upstream repository still exist and are
accessible via any fork and according to
the author this isn't some hypothetical
scenario and apparently this just
happened last week quote I submitted a
P1 vulnerability to a major tech company
showing that they accidentally committed
a private key for an employee's GitHub
account that had significant access to
their entire GitHub organization so
obviously that is a pretty big security
vulnerability the company should first
of all get rid of that API key and
second of all probably remove that from
the history if possible well what they
did is they immediately deleted the
repository but since it had been forked
you could still access the commit
containing sensitive data via a fork despite
the fork never syncing with the original
Upstream repository which is very scary
that seems like a huge violation of the
trust that users have in GitHub the
implication here is that any code
committed to any public repository may
be accessible forever as long as there
is at least one fork of that repository
so even if that fork doesn't have some
commits that are on your Upstream
version or on your private version or
anywhere as long as one public Fork
exists every commit in that repository
network is public forever but it gets
worse accessing private repo data okay
so consider this common workflow for
open sourcing a new tool on GitHub so
step one is you create a private repo
that will eventually be made public you
know you might not want to create it
publicly right off the bat because it's
still in a very early State and maybe it
just doesn't make sense to have people
looking into it and you know you're just
not ready to manage the community yet
perfectly reasonable afterwards you
create a private internal version of the
repo via forking and commit additional
code for features you're not going to
make public again that makes sense let's
say that this is something that you're
trying to make money from well that
seems like a reasonable way to do it you
might have a public version that is
fully open source that has all of its
code accessible and then you add some
kind of Enterprise features that you
want to charge money for in a private
fork okay seems reasonable and step
three you make your Upstream repository
public and keep your fork private
this seems like a fairly common workflow
and you might think that the private
features that you added to your private
Fork are inaccessible to the public but
guess what they are 100% viewable by
anyone any code committed between the
time you created an internal Fork of
your tool and when you open source the
tool those commits are accessible on the
public repository so just to clarify any
commits you made to the private Fork
after you make the Upstream repository
public are not viewable and the
reason for that is because changing the
visibility of a private Upstream
repository results in two repository
networks one for the private version and
one for the public version so looking at
this graph these commits that are on
this private Fork of the tool are public
and again here's a demo video I'm not
going to play it if you want to see the
details check out the link it'll be in
the description and this is a fairly
common workflow right like creating
something private and then creating a
private Fork of it and then making the
original public that seems like a
totally reasonable thing that many
people would do and would assume that
everything that is in the private Fork
stays private and everything that's in
the public Upstream version is public
but no apparently with GitHub that's not
the case so what does GitHub have to say
about this well the author submitted
their findings to GitHub via their bug
program and here's the response thanks
for the submission this is an
intentional design decision and is
working as expected as noted in our
documentation we may make this
functionality more strict in the future
but don't have anything to announce
right now so it's a feature not a bug
and it's pretty clear actually in their
documentation that it was designed to
work this way under important security
considerations you can see commits to
any repository in a fork Network can be
accessed from any repository in the same
Fork Network including the Upstream
repository and under the section
changing a private repository to a
public repository they say when you
change a private repository to public
all the commits in that repository
including any commits made in the
repositories that it was forked into
will be visible to everyone so there we
go it's in the docs GitHub says it's a
feature and that's how it's intended to
work however I don't think users really
understand that or at least most users
don't at least to me I haven't looked
into these pages in detail so you know
what maybe that's entirely my fault but
I think most users would assume that if
you delete something that information
will no longer be available and then
information in a private Fork will
always remain private so perhaps it's
time for GitHub to revisit this and the
author of the blog post agrees with me
the average user views the separation of
private and public repositories as a
security boundary and understandably
believes that any data located in a
private repository cannot be accessed by
public users unfortunately as we
documented above that is not always true
what's more the action of deletion
implies the destruction of data however
as we saw deleting a repository or Fork
does not mean that your commit data is
actually deleted now before any of you
FSF shills jump in and say oh well
clearly this is Microsoft and
ifying the product and well if they
hadn't acquired GitHub this wouldn't
have happened and to that I say no I
don't think so GitHub was designed this
way from the start where it was supposed
to be a place for open code
collaboration where everything is
visible publicly and all of this
information is available to anyone that
wants to see it so I would actually say
that if anything Microsoft didn't Focus
hard enough on what Enterprises want and
that is privacy and the ability to hide
information from people I think if they
focused a bit harder on that this issue
actually wouldn't have happened because
this functionality would have been
removed or changed earlier I'm not
against open source but I'm just saying
in this case it does seem like they're
taking open architecture a bit too far
so what are the takeaways from this well
the main one is as long as a fork exists
any commit to that Repository Network
which includes commits on the Upstream
repo or Downstream forks will exist
forever you cannot delete them you
cannot hide them they will always be
publicly accessible this means that
simply deleting a commit that
accidentally added some private
information is not enough if a private
API key was committed for example you
must immediately rotate that API key if
some private information such as maybe a
person's name and social security number
were committed too bad that information
will always be publicly accessible
forever the second takeaway is that
perhaps GitHub should change this now
GitHub has a reputation for being very
good about security right they put a lot
of work into making sure that their
products are very secure that things
that are private are intended to be
private right they have a very Advanced
security program they have you know all
these certifications by all these
standards and basically they've put a
lot of work into making sure that all of
the information that is intended to be
private is private so perhaps it's time
to change the way that these repository
networks work because security is only
as good as the users and if users don't
understand the security model and
clearly they don't then perhaps it's
time to change the security model but
what do you think did you know that
GitHub works this way cuz I sure as heck
didn't let me know in the comments