Git: scan repositories for secrets using Gitleaks

By | 08/16/2021
 

Leaking confidential data such as RDS keys or passwords into a Git repository, even a private GitHub repository, is a serious problem, so it is worth checking your repositories regularly to find out whether any developer has pushed a commit containing such data.

Scanning utilities

To check Git repositories for a leak, at first glance there are a lot of utilities:

  • Gittyleaks – looks interesting, but the last update was 2 years ago
  • Repo Supervisor – has a WebUI, uses AWS Lambda, fully integrated with Github, can be checked later
  • Truffle Hog – CLI only, looks decent
  • Git Hound – a plugin for git, can only scan before commits, not remote repositories
  • Gitrob – the last update was three years ago
  • Watchtower – looks interesting, even has a WebUI, but the pricing is not published on their website, so it is out of the race
  • GitGuardian – a really good solution, but overpriced
  • gitleaks – CLI only, the one we will use in this post

So, from the list above it’s worth trying Truffle Hog and gitleaks, but I didn’t like the Truffle Hog documentation.

Repo Supervisor looks promising too; I will check it in a following post.

From those two:

  • Gitleaks: just a scanner – give it the URL of a repository, and it will generate a JSON report with findings
  • Repo Supervisor: can be used in two ways:
    • just to scan a local directory
    • scan a remote repository on PullRequest/push/etc

So, for Gitleaks we can create a cron job in Jenkins or Kubernetes that takes a list of repositories to check and then sends a report to a Slack channel.

Also, Gitleaks can be used with Github Actions, see more here>>>, but not all our developer teams use Actions. Another way can be pre-commit hooks.

Planning

So, for now, let’s try a solution with Jenkins, although there are various ways to run it:

  • trigger a job with the GitHub Pull Request Builder
  • trigger a job via GitHub hook trigger for GITScm polling or Poll SCM
  • run it simply as a cron task

At first, we will create a simple job that runs on a schedule, and then look at the other options.

What do we have in our project:

  • around 200 Github repositories
  • around 10 developers teams – backend, frontend, analytics, iOS, and Android mobile applications, gaming, devops.

What we can do with Gitleaks:

  • create a Jenkins job for every team
  • the job will accept a parameter with a list of repositories of the team
  • will create a dedicated Slack channel for every team
  • once a day will run scanning and will send reports to a corresponding Slack channel

At first, let’s run Gitleaks manually to see how it works, and then automate it with a job.

Gitleaks – manual run

Install it. On Arch Linux, it can be installed from AUR:

[simterm]

$ yay -S gitleaks

[/simterm]

Github token

Next, we need to create a token to access the GitHub organization’s repositories.

Go to your GitHub user’s settings and create a token:

Give it repo permissions:

 

And run Gitleaks with the token and the repository’s URL, add --verbose, and save the results to a file:

[simterm]

$ gitleaks --access-token=ghp_C6h***3z5 --repo-url=https://github.com/example/BetterBI --verbose --report=analytics-repo.json
...
INFO[0036] scan time: 32 seconds 756 milliseconds 672 microseconds 
INFO[0036] commits scanned: 1893                        
WARN[0036] leaks found: 111

[/simterm]

Check the report:

[simterm]

$ less analytics-repo.json

[/simterm]

And an example from the findings:

...
 {
  "line": "  \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADA***CCaM=\\n-----END PRIVATE KEY-----\\n\",",
  "lineNumber": 5,
  "offender": "-----BEGIN PRIVATE KEY-----",
  "offenderEntropy": -1,
  "commit": "0f047f0cca3994b3465821ef133dbd3c8b55ee7a",
  "repo": "BetterBI",
  "repoURL": "https://github.com/example/BetterBI",
  "leakURL": "https://github.com/example/BetterBI/blob/0f047f0cca3994b3465821ef133dbd3c8b55ee7a/adslib/roas_automation/example-service-account.json#L5",
  "rule": "Asymmetric Private Key",
  "commitMessage": "DT-657 update script for subs and add test\n\nDT-657 create test for check new json (add new subs)",
  "author": "username",
  "email": "[email protected]",
  "file": "adslib/roas_automation/example-service-account.json",
  "date": "2021-05-11T19:46:46+03:00",
  "tags": "key, AsymmetricPrivateKey"
 },
...

Here:

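  • line: the full line of the file in which the secret was found
  • offender: the exact string that matched the rule
  • rule: the name of the rule that was triggered by this finding
  • commit: the ID of the commit that contains the secret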
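
With more than a hundred findings, it helps to group the report by rule before reading it in detail. Below is a minimal Python sketch, assuming the report is a JSON array of objects with the fields shown above. The summarize_findings() helper and the inline sample data (including the second file name) are hypothetical and used only for illustration; in practice the text would be read from analytics-repo.json.

```python
import json
from collections import Counter

# Hypothetical sample in the same shape as the finding shown above.
SAMPLE_REPORT = """
[
  {"rule": "Asymmetric Private Key",
   "file": "adslib/roas_automation/example-service-account.json",
   "lineNumber": 5,
   "commit": "0f047f0cca3994b3465821ef133dbd3c8b55ee7a"},
  {"rule": "Asymmetric Private Key",
   "file": "hypothetical/other-key.json",
   "lineNumber": 2,
   "commit": "0f047f0cca3994b3465821ef133dbd3c8b55ee7a"}
]
"""


def summarize_findings(report_text):
    """Return a Counter mapping rule name to the number of findings."""
    findings = json.loads(report_text)
    return Counter(finding["rule"] for finding in findings)


# Print the rules, most frequent first.
for rule, count in summarize_findings(SAMPLE_REPORT).most_common():
    print(f"{rule}: {count}")
```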

Findings are detected with regular expressions, defined in default.go.

Also, you can create your own configuration file and pass it to Gitleaks.

For example, the private key above was found by the Asymmetric Private Key rule:

[[rules]]
    description = "Asymmetric Private Key"
    regex = '''-----BEGIN ((EC|PGP|DSA|RSA|OPENSSH) )?PRIVATE KEY( BLOCK)?-----'''
    tags = ["key", "AsymmetricPrivateKey"]

So, we can create a dedicated config file for each team or repository and pass them via Kubernetes ConfigMap or as a file in a Jenkins job.
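
Before adding a rule to a per-team config, it can be handy to try its regular expression against a few sample strings. The sketch below does this in Python for the Asymmetric Private Key pattern. Gitleaks itself is written in Go, so this assumes that Python's re module treats this particular pattern the same way, which should hold for the simple groups used here. The has_private_key() helper is a hypothetical name.

```python
import re

# Same pattern as the "Asymmetric Private Key" rule above.
PRIVATE_KEY_RE = re.compile(
    r"-----BEGIN ((EC|PGP|DSA|RSA|OPENSSH) )?PRIVATE KEY( BLOCK)?-----"
)


def has_private_key(text):
    """Return True if the text contains a private key header."""
    return PRIVATE_KEY_RE.search(text) is not None


samples = [
    "-----BEGIN PRIVATE KEY-----",
    "-----BEGIN RSA PRIVATE KEY-----",
    "-----BEGIN PGP PRIVATE KEY BLOCK-----",
    "-----BEGIN PUBLIC KEY-----",
]
for sample in samples:
    print(sample, "->", has_private_key(sample))
```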

Jenkins job

Now that we’ve seen how Gitleaks is run, let’s add a Jenkins job to run it periodically.

Pipeline script

So, for each team, we will create a dedicated Jenkins job with a parameter that holds the list of the team’s repositories.

Loops in Groovy

Some time ago I built a similar solution in Golang, see the Go: checking public repositories list in Github. Go slices comparison post for details; there it was a bit simpler to loop over a list. With Groovy, I had to google a bit.

Create a new Jenkins job, set its type to the Pipeline:

In the job’s settings, create a string parameter with a list of the team’s repositories; only two are used here:

Next, go to the Jenkins script.

Set a variable named repos_list that takes the value of the TEAM_REPOS environment variable and splits it into separate items with the split() method.

Then iterate over them with a for loop:

node('master') {

  def repos_list = "${env.TEAM_REPOS}".split(',')

  for (repo in repos_list) {
    println repo
  }

}
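
One thing to watch with a comma-separated parameter is stray whitespace or a trailing comma, which would end up in the repository URL later. As an illustration only, here is the same splitting logic as a Python sketch that also trims the items; parse_repos() and the second repository name are hypothetical.

```python
def parse_repos(raw):
    """Split a comma-separated list, dropping whitespace and empty items."""
    return [name.strip() for name in raw.split(",") if name.strip()]


# A value typed by hand into the job parameter, with a space and a trailing comma.
for repo in parse_repos("BetterBI, other-repo,"):
    print(repo)
```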

Run the job:

Jenkins Docker plugin

Our default approach to run Jenkins builds is by using a Docker container to keep the hosts’ system clean.

Add another parameter of the Password type and save the GitHub token in it:

Using the Jenkins Docker Plugin, start a Docker container with Gitleaks and pass it the token, the repository URL, and the report file. Note that the report file name contains the repository name:

node('master') {
      
  def repos_list = "${env.TEAM_REPOS}".split(',')

  for (repo in repos_list) {
    stage("Repository ${repo}") {
      docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
        sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
      }
    }
  }
}

Here, for every repository name from the repos_list list, we will create a dedicated Jenkins Pipeline Stage that also will use a repository’s name.
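
The command line assembled in the sh step follows a fixed pattern per repository, which can be sketched as a small helper. This Python version is only an illustration under assumptions: gitleaks_command() and its org and team parameters are hypothetical names, and it uses only the options already shown in this post. Building an argument list instead of one string means a repository name is never interpreted by a shell.

```python
def gitleaks_command(org, repo, token, team="analytics"):
    """Build the gitleaks argument list for one repository."""
    return [
        "gitleaks",
        f"--access-token={token}",
        f"--repo-url=https://github.com/{org}/{repo}",
        "--verbose",
        # The report name contains the repository name, as in the pipeline.
        f"--report={team}-{repo}-repo.json",
    ]


# Placeholder token; a real one would come from the job's Password parameter.
print(" ".join(gitleaks_command("example", "BetterBI", "<GITHUB_TOKEN>")))
```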

Run and check:

Um… And here is an issue: the scan stops right after the first findings in the first scanned repository, as Gitleaks found a leak and returned exit code 1, so the job was stopped immediately:

Ignoring errors in a Jenkins stage{}

To solve it, we can use try/catch: each stage runs inside its own try block, and if it fails, the error is caught by catch and the build proceeds to the next repository:

node('master') {
  
  def repos_list = "${env.TEAM_REPOS}".split(',')
  def build_ok = true
  
  for (repo in repos_list) {
    try {
      stage("Repository ${repo}") {
        docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
          sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
        }
      }
    } catch(e) {
        currentBuild.result = 'FAILURE'
    }
  }
}

Run it:

Good: now all stages run regardless of the previous stage’s result.
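
The same continue-on-failure idea, stripped of the Jenkins specifics, can be sketched in Python. It assumes the scanner is started by a function that raises subprocess.CalledProcessError on a non-zero exit code, much like the sh step fails the stage. scan_all() and fake_scan() are hypothetical names, and fake_scan() only simulates a scan instead of running Gitleaks.

```python
import subprocess


def scan_all(repos, run_scan):
    """Scan every repository, collecting the failed ones instead of stopping."""
    failed = []
    for repo in repos:
        try:
            run_scan(repo)
        except subprocess.CalledProcessError:
            # A leak was found (or the scan failed): remember it and go on.
            failed.append(repo)
    return failed


def fake_scan(repo):
    """Stand-in for a real scan: pretend that only BetterBI has leaks."""
    if repo == "BetterBI":
        raise subprocess.CalledProcessError(returncode=1, cmd=["gitleaks"])


print(scan_all(["BetterBI", "clean-repo"], fake_scan))
```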

Slack notifications from Jenkins

The next step for us is to configure sending alarms to a Slack workspace.

Let’s use the Slack Notification plugin for this. See its documentation here>>>.

Create a Slack Bot

Go to the Slack Apps, create a new application:

Go to the Permissions:

Add the following:

  • files:write
  • chat:write

Go to the OAuth & Permissions, install the bot to the Slack workspace:

Save the token:

Jenkins credentials

Add the token to Jenkins: go to Manage Jenkins > Manage Credentials:

Add a new one:

Set its type to Secret text:

In the Slack workspace, create a new channel:

Invite the bot to the channel:

Add a new function to the Jenkins script – notifySlack(), and run it from the catch{} to send alarms if any secrets were found during the scan:

def notifySlack(String buildStatus = 'STARTED') {

    // Build status of null means success.
    buildStatus = buildStatus ?: 'SUCCESS'

    def color
    // change this for another Slack channel
    def token = 'gitleaks-slack-bot'

    if (buildStatus == 'STARTED') {
        color = '#D4DADF'
    } else if (buildStatus == 'SUCCESS') {
        color = '#BDFFC3'
    } else if (buildStatus == 'UNSTABLE') {
        color = '#FFFE89'
    } else {
        color = '#FF9FA1'
    }

    def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
    slackSend(color: color, message: msg, tokenCredentialId: token, channel: "#devops-alarms-gitleaks-analytics")
}


node('master') {
  
  def repos_list = "${env.TEAM_REPOS}".split(',')
  
  for (repo in repos_list) {
    try {
      stage("Repository ${repo}") {
        docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
          sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
        }
      }   
    } catch(e) {
        currentBuild.result = 'FAILURE'
        notifySlack(currentBuild.result)
    } 
  }     
}

jenkins.plugins.slack.StandardSlackService postToSlack Response Code: 404

Run the build, and get the following error:

12:42:39 ERROR: Slack notification failed. See Jenkins logs for details.

Check the Jenkins logs at https://<JENKINS_URL>/log/all:

Go to Manage Jenkins > Configure System, find the Slack plugin’s options, and enable Custom slack app bot user:

The credentials here are default, we are overriding them from the pipeline.

In the Advanced remove the Override URL, if it was set:

Run again and now everything is working:

File upload to Slack

Now, let’s add a report file upload with findings to the message in the Slack channel by using the slackUploadFile() function:

def notifySlack(String buildStatus = 'STARTED', reportFile) {

    // Build status of null means success.
    buildStatus = buildStatus ?: 'SUCCESS'

    def color
    // change this for another Slack channel
    def token = 'gitleaks-slack-bot'

    if (buildStatus == 'STARTED') {
        color = '#D4DADF'
    } else if (buildStatus == 'SUCCESS') {
        color = '#BDFFC3'
    } else if (buildStatus == 'UNSTABLE') {
        color = '#FFFE89'
    } else {
        color = '#FF9FA1'
    }

    def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
    slackSend(color: color, message: msg, tokenCredentialId: token, channel: "#devops-alarms-gitleaks-analytics")
    slackUploadFile(credentialId: token, channel: "#devops-alarms-gitleaks-analytics", filePath: "${reportFile}")
}


node('master') {
  
  def repos_list = "${env.TEAM_REPOS}".split(',')
  
  for (repo in repos_list) {
    try {
      stage("Repository ${repo}") {
        docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
          sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
        }
      }   
    } catch(e) {
        currentBuild.result = 'FAILURE'
        notifySlack(currentBuild.result, "analytics-${repo}-repo.json")
    } 
  }     
}

The channel can be moved to a job parameter later.

Here, we’ve added another parameter to notifySlack(), reportFile, and when calling notifySlack() we pass the report file name as the second argument.

Run the job, check the Slack channel:

And the final thing is to set a schedule to run the job:

Gitleaks configuration

Commits to check

At this moment, Gitleaks performs a full scan of the repository: all commits, all history.

If we run it every day, then each day we will get messages about the same old problematic commits.

As a way to mitigate it, we can create two jobs: in the first job, we will do a full scan, and in the second one will perform a kind of incremental scan for changes, made during the last 24 hours.

I.e. the “incremental” job will be run daily at 12:00 pm when all developers are in the office, and the job will check commits for the last day only.

To do so, Gitleaks has the --commit-since option. Let’s add a new variable called yesterday that holds yesterday’s date, obtained with the previous() method of the Date class, and pass it to --commit-since:

...

node('master') { 
  
  def repos_list = "${env.TEAM_REPOS}".split(',')
  // call previous() on the Date itself: on a formatted String it only decrements the last character
  def yesterday = new Date().previous().format( 'yyyy-MM-dd' )

  println yesterday
  
  for (repo in repos_list) {
    try {
      stage("Repository ${repo}") {
        docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
          sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --commit-since=${yesterday}"
        }
      }
    } catch(e) {
        currentBuild.result = 'FAILURE'
        notifySlack(currentBuild.result, "analytics-${repo}-repo.json")
    }
  }
}
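
The date passed to --commit-since has to stay valid across month and year boundaries, so the subtraction should happen on a date object and the formatting only afterwards. A small Python sketch of that calculation follows; commit_since() is a hypothetical helper name, and the yyyy-MM-dd format is the one used in the pipeline above.

```python
from datetime import date, timedelta


def commit_since(today):
    """Return the previous day formatted as yyyy-MM-dd."""
    return (today - timedelta(days=1)).strftime("%Y-%m-%d")


print(commit_since(date(2021, 8, 16)))  # 2021-08-15
print(commit_since(date(2021, 9, 1)))   # 2021-08-31
```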

Gitleaks configuration file

Another thing is to create a dedicated rules file for Gitleaks.

This can be done with the --repo-config-path option, so each repository can have its own configuration file.

Add some default rules there; in addition, I’d like to check for passwords committed in plaintext:

...

[[rules]]
    description = "Plaintext password"
    regex = '''(?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|']'''
    tags = ["password", "PlainTextPassword"]

[allowlist]
    description = "Allowlisted files"
    files = ['''^\.?gitleaks.config$''']

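With the (?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|'] regular expression, we are looking, case-insensitively, for pas followed by any number of s characters and then exactly five letters (which is how it matches password), then an optional “:”, “=” or “|” symbol (inside square brackets the pipe is a literal character, not an alternation), then one or more spaces, then a quote mark, then any text, and a quote mark again. Note that the space is mandatory, so password="secret" written without a space will not be matched.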

It seems to work:
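
The rule can also be checked against a few sample lines in code. The following Python sketch uses the same pattern; it assumes that Python's re module and the Go regexp engine used by Gitleaks agree on this expression, which should be the case for the constructs it contains. The find_password() helper and the sample values are hypothetical.

```python
import re

# Same pattern as the "Plaintext password" rule above.
PLAINTEXT_PASSWORD_RE = re.compile(r'''(?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|']''')


def find_password(line):
    """Return the quoted value if the line matches the rule, else None."""
    match = PLAINTEXT_PASSWORD_RE.search(line)
    return match.group(1) if match else None


samples = [
    'password: "hunter2"',    # matched
    "PASSWORD= 'hunter2'",    # matched, the rule is case-insensitive
    'password="hunter2"',     # not matched: no space before the quote
    'password = "hunter2"',   # not matched: space before the "=" sign
]
for sample in samples:
    print(repr(sample), "->", find_password(sample))
```

The last two samples show the limits of the expression: it only catches assignments where the separator directly follows the word and is itself followed by a space.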

Save it to the repository as .github/gitleaks.config, and in the job add another option pointing to this file:

...
        docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
          sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --commit-since=${yesterday} --repo-config-path=.github/gitleaks.config"
        }
...

That’s all for now.