Downloading an Application's Entire Source Code Through an Exposed GIT Directory

Posted on February 19, 2016 by Roberto Salgado

It is very common during a penetration test on a web application to use automated tools such as DIRB to find sensitive files and directories. DIRB takes a dictionary-based approach to look for files and directories which are "hidden", that is, not linked anywhere on the site.

DIRB is included in Kali, but can also be downloaded from http://dirb.sourceforge.net/.


To run the tool, just provide the URL to the website you would like to test:

 

# dirb https://www.example.com

 

Running this tool against our target website turned up a ".git" folder, which is where the source code management system of the same name, Git, stores its data. The folder had been exposed to the Internet, possibly because it was forgotten about or because the web admin never expected anyone to find it.

For those who are unfamiliar with Git, it is a popular source code management system which allows developers to keep track of every change made to the files in a repository. Because Git is used to manage source code, the ".git" folder contains a copy of the application's source code, stored as Git objects together with its change history.

In our case, our target server was also misconfigured to allow for directory listings, making our job much easier.

With the directory listings flaw and the tool "wget", it is pretty straightforward to recursively download every file from the repository.


# wget -r https://www.example.com/.git
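
The plain recursive download works, but it can be tightened a little. The following is only a sketch under a few assumptions: the server exposes a directory listing for ".git", and the helper name mirror_git_dir is ours, not part of wget. The --no-parent option keeps wget from climbing above the ".git" folder, -nH skips the host-name directory, and --reject should discard the generated listing pages once wget has parsed them for links.

```shell
# Hypothetical helper (not from the original post): mirror an exposed .git
# directory that is served with directory listings enabled.
# Only use it against systems you are authorized to test.
mirror_git_dir() {
    base_url=$1
    if [ -z "$base_url" ]; then
        echo "usage: mirror_git_dir https://www.example.com" >&2
        return 2
    fi
    # -r           follow the links found in the directory listing
    # --no-parent  never ascend above /.git/
    # -nH          do not create a www.example.com/ host directory
    # --reject     remove the listing pages after their links are extracted
    wget -r --no-parent -nH --reject 'index.html*' "${base_url%/}/.git/"
}
```

Run it from an empty working directory, for example "mirror_git_dir https://www.example.com", so that the downloaded ".git" folder ends up directly in the current directory.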

 

While wget is running, something interesting can be observed. The repository contains ".pack" files of several megabytes in size. Pack files, in simple terms, are archives in which Git stores many objects (file contents, directory trees and commits) in compressed form, each accompanied by an ".idx" index file used to locate the objects inside it. More information about pack files can be found at https://schacon.github.io/gitbook/7_the_packfile.html.
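
To get a feel for what a pack file holds, Git's own "verify-pack" command can list the objects stored inside it. This is a minimal sketch: list_pack_objects is a hypothetical helper name, and it assumes the packs and their ".idx" files have been downloaded completely.

```shell
# Hypothetical helper: list the objects stored in every pack of a repository.
# $1 is the path to a .git directory (defaults to ./.git).
list_pack_objects() {
    git_dir=${1:-.git}
    for idx in "$git_dir"/objects/pack/pack-*.idx; do
        [ -e "$idx" ] || continue   # glob did not match: no packs present
        # One line per object: ID, type, size, size in pack, offset
        # (plus delta depth and base object ID for deltified objects)
        git verify-pack -v "$idx"
    done
}
```

Lines of type "blob" correspond to file contents, so their number gives a rough idea of how much source code the repository holds.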

 

Once wget has finished downloading, we are left with a local copy of the ".git" folder.

If we try to run a Git command against this copy, for example "git status", it will return an error indicating that certain files have an invalid filename. This is because wget also saved the HTML index pages generated by the directory listing (e.g. index.html?C=D, index.html?C=M) for each folder and its sub-folders.

 

To use Git normally, it is necessary to eliminate these extra HTML files that don't belong. We can do so recursively with the "find" command and then try the git command again.

# find .git -type f -name 'index.htm*' -delete
# git status

 

The Git repository is functional again, but because our "project" is empty, git will list all the files as if they had been deleted. However, we can recover all the "deleted" files using the following command:

# git checkout -- .

 

All of the application's missing files are then restored to the working directory.
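
The clean-up and checkout steps can also be chained, with an integrity check in between. The sketch below assumes it is given the folder that contains the downloaded ".git" directory; restore_worktree is a hypothetical helper name. "git fsck" reports missing or corrupt objects, which is a useful hint that the download was incomplete.

```shell
# Hypothetical helper: clean a wget-mirrored repository and restore its files.
# $1 is the folder containing the .git directory (defaults to the current one).
restore_worktree() {
    repo_dir=${1:-.}
    (
        cd "$repo_dir" || exit 1
        # Remove the directory-listing pages that wget saved inside .git;
        # the pattern does not match Git's own "index" file
        find .git -type f -name 'index.htm*' -delete
        # Verify the object database; a failure suggests a partial download
        git fsck --full || echo "warning: repository may be incomplete" >&2
        # Recreate the tracked files from the index
        git checkout -- .
    )
}
```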

 

With the full copy of the application's source-code, the possibilities are endless. We can look for the database credentials, do static-analysis on the code, find hidden debugging parameters or vulnerabilities, and much more. We have basically converted a blackbox penetration test into a whitebox one.
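
As one illustration, credentials are worth searching for in every commit and not just in the current files, since a secret that was later removed still lives in the history. This is only a sketch: grep_history is a hypothetical helper, and the default pattern is an illustrative guess that should be adapted to the application at hand.

```shell
# Hypothetical helper: search every commit reachable from any ref.
# $1 is an extended regular expression, $2 the repository folder.
grep_history() {
    pattern=${1:-'password|passwd|secret|api_key'}
    repo_dir=${2:-.}
    git -C "$repo_dir" rev-list --all | while read -r commit; do
        # Matches are printed as <commit>:<path>:<line number>:<line>;
        # "|| true" keeps the loop going when a commit has no match
        git -C "$repo_dir" grep -n -i -E -e "$pattern" "$commit" || true
    done
}
```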

 

Impact

• Proprietary and sensitive information such as the source-code can be obtained.
• The attack surface increases drastically since we know the entire file and folder structure of the web application.
• By analyzing the source-code, it is easier to find vulnerabilities (SQLi, file uploaders, RCE, etc.), even ones that might not have been possible to find otherwise.

 

Remediation

The most direct solution is simply not to have the ".git" folder in a folder that is exposed to the Internet. However, if it is necessary to have it there, then there are other solutions available:

• Disable directory listing on the server.
• Configure mod_rewrite to disallow access to the ".git" directory (RewriteRule "^(.*/)?\.git/" - [F,L]).
• Assign the ".git" directory to a different owner with strict permissions, so that the user the web server runs as cannot read it.
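
Whichever measure is chosen, it is worth confirming from the outside that the repository can no longer be read. A minimal sketch, assuming curl is installed: request "/.git/HEAD" and check whether the response looks like a Git HEAD file. Both function names are hypothetical.

```shell
# Reads a response body on stdin and succeeds if its first line looks like
# a .git/HEAD file: a symbolic ref or a 40-hex-digit commit ID.
looks_like_git_head() {
    head -n 1 | grep -Eq '^(ref: refs/|[0-9a-f]{40}$)'
}

# Hypothetical helper: returns 1 if <base_url>/.git/HEAD is readable.
check_git_exposed() {
    base_url=$1
    if curl -s --max-time 10 "${base_url%/}/.git/HEAD" | looks_like_git_head; then
        echo "EXPOSED: ${base_url%/}/.git/HEAD is readable"
        return 1
    fi
    echo "OK: no readable .git/HEAD"
}
```

For example, "check_git_exposed https://www.example.com" should print the OK line once access has been blocked, because an error page does not match the HEAD format.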

 

This post was originally written by Lenin Alevsk and translated by Roberto Salgado. The original post in Spanish can be found here.

