Leaking confidential data such as RDS keys or passwords to a Git repository, even a private GitHub one, is a serious problem, so it’s worth checking your repositories to see whether any developer has pushed a commit with such data.
Scanning utilities
To check Git repositories for a leak, at first glance there are a lot of utilities:
- Gittyleaks – looks interesting, but the last update was 2 years ago
- Repo Supervisor – has a WebUI, uses AWS Lambda, fully integrated with Github, can be checked later
- Truffle Hog – CLI only, looks not bad
- Git Hound – a plugin for git, can scan before commits only, but not remote repositories
- Gitrob – the last update was three years ago
- Watchtower – looks interesting, even has a WebUI, but they didn’t even post about their pricing on the website, so out of the race
- GitGuardian – a really good solution, but overpriced
- gitleaks – CLI only, the one we will use in this post
So, from the list above it’s worth trying Truffle Hog and gitleaks, but I didn’t like the Truffle Hog documentation.
Repo Supervisor looks promising too; I will check it in a following post.
Comparing Gitleaks and Repo Supervisor:
- Gitleaks: just a scanner – give it the URL of a repository, and it will generate a JSON report with findings
- Repo Supervisor: can be used in two ways:
  - just to scan a local directory
  - to scan a remote repository on a pull request, push, etc.
So, for Gitleaks we can create a cron job in Jenkins or Kubernetes that takes a list of repositories to check and then sends a report to a Slack channel.
Gitleaks can also be used with GitHub Actions, but not all of our developer teams use Actions. Another option is pre-commit hooks.
Planning
So, for now, let’s try a solution with Jenkins, although there are various ways to run it:
- trigger a job with the GitHub Pull Request Builder
- trigger a job with the GitHub hook trigger for GITScm polling or Poll SCM
- run it just as a cron task
First, we will create a simple job that runs on a schedule, and then look at the other options.
What do we have in our project:
- around 200 Github repositories
- around 10 developer teams – backend, frontend, analytics, iOS and Android mobile applications, gaming, devops.
What we can do with Gitleaks:
- create a Jenkins job for every team
- the job will accept a parameter with a list of repositories of the team
- create a dedicated Slack channel for every team
- once a day, run a scan and send the report to the corresponding Slack channel
First, let’s run Gitleaks manually to see how it works, and then automate it.
Gitleaks – manual run
Install it. On Arch Linux, it can be installed from the AUR:
[simterm]
$ yay -S gitleaks
[/simterm]
Github token
Next, we need to create a token to access the GitHub organization’s repositories.
Go to your GitHub user’s settings and create a token:
Give it the repo permissions:
Then run Gitleaks with the token and the repository’s URL, add --verbose, and save the results to a file:
[simterm]
$ gitleaks --access-token=ghp_C6h***3z5 --repo-url=https://github.com/example/BetterBI --verbose --report=analytics-repo.json
...
INFO[0036] scan time: 32 seconds 756 milliseconds 672 microseconds
INFO[0036] commits scanned: 1893
WARN[0036] leaks found: 111
[/simterm]
Check the report:
[simterm]
$ less analytics-repo.json
[/simterm]
And an example from the findings:
...
{
    "line": " \"private_key\": \"-----BEGIN PRIVATE KEY-----\\nMIIEvQIBADA***CCaM=\\n-----END PRIVATE KEY-----\\n\",",
    "lineNumber": 5,
    "offender": "-----BEGIN PRIVATE KEY-----",
    "offenderEntropy": -1,
    "commit": "0f047f0cca3994b3465821ef133dbd3c8b55ee7a",
    "repo": "BetterBI",
    "repoURL": "https://github.com/example/BetterBI",
    "leakURL": "https://github.com/example/BetterBI/blob/0f047f0cca3994b3465821ef133dbd3c8b55ee7a/adslib/roas_automation/example-service-account.json#L5",
    "rule": "Asymmetric Private Key",
    "commitMessage": "DT-657 update script for subs and add test\n\nDT-657 create test for check new json (add new subs)",
    "author": "username",
    "email": "[email protected]",
    "file": "adslib/roas_automation/example-service-account.json",
    "date": "2021-05-11T19:46:46+03:00",
    "tags": "key, AsymmetricPrivateKey"
},
...
Here:
- line: the line in which the secret was found
- offender: the string that triggered the rule for this finding
- commit: the ID of the commit with the secret
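With 111 findings, the raw JSON is hard to read by eye. Below is a minimal Python sketch that counts findings per rule and file. It assumes the report is a JSON array of objects shaped like the finding above; the helper name summarize_report and the sample entries are hypothetical and are not part of Gitleaks.

```python
import json
from collections import Counter


def summarize_report(leaks):
    """Count findings per (rule, file) pair in a list of Gitleaks findings."""
    counts = Counter()
    for leak in leaks:
        # "rule" and "file" are keys seen in the report example above.
        key = (leak.get("rule", "unknown"), leak.get("file", "unknown"))
        counts[key] += 1
    return counts


# Hypothetical sample data in the same shape as the report entry above.
sample = json.loads("""
[
  {"rule": "Asymmetric Private Key", "file": "adslib/roas_automation/example-service-account.json"},
  {"rule": "Asymmetric Private Key", "file": "adslib/roas_automation/example-service-account.json"},
  {"rule": "Example Rule", "file": "config/settings.py"}
]
""")

for (rule, path), count in summarize_report(sample).most_common():
    print(count, rule, path)
```

For a real report, the list would be loaded from the file instead, for example with json.load() on analytics-repo.json, provided the file really is a JSON array.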
Findings are detected with regular expressions, described in default.go.
Also, you can create your own configuration file and pass it to Gitleaks.
For example, the private RSA key above was found by the Asymmetric Private Key rule:
[[rules]]
    description = "Asymmetric Private Key"
    regex = '''-----BEGIN ((EC|PGP|DSA|RSA|OPENSSH) )?PRIVATE KEY( BLOCK)?-----'''
    tags = ["key", "AsymmetricPrivateKey"]
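To see what this rule does and does not catch, the pattern can be tried outside Gitleaks. This is a small Python sketch, assuming Python's re module treats this particular pattern the same way as the Go regexp engine used by Gitleaks (the pattern uses only basic syntax). The function name has_private_key is made up for the example.

```python
import re

# Pattern copied from the "Asymmetric Private Key" rule above.
PRIVATE_KEY_RE = re.compile(
    r"-----BEGIN ((EC|PGP|DSA|RSA|OPENSSH) )?PRIVATE KEY( BLOCK)?-----"
)


def has_private_key(text):
    """Return True if the text contains a private key header."""
    return PRIVATE_KEY_RE.search(text) is not None


print(has_private_key('"private_key": "-----BEGIN PRIVATE KEY-----\\nMIIE..."'))  # True
print(has_private_key("-----BEGIN RSA PRIVATE KEY-----"))                        # True
print(has_private_key("-----BEGIN PGP PRIVATE KEY BLOCK-----"))                  # True
print(has_private_key("-----BEGIN PUBLIC KEY-----"))                             # False
```

The rule matches only the header line of the key, which is why the offender field in the report contains just the BEGIN marker.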
So, we can create a dedicated config file for each team or repository and pass them via Kubernetes ConfigMap or as a file in a Jenkins job.
Jenkins job
Now that we’ve seen how Gitleaks can be started, let’s add a Jenkins job to run it periodically.
Pipeline script
So, for each team, we will create a dedicated Jenkins job that will have a parameter with a repositories list of the team.
Loops in Groovy
Some time ago I built a similar solution in Golang (see the Go: checking public repositories list in Github. Go slices comparison post for details), and there it was a bit simpler to loop over the list. With Groovy, I had to google a bit.
Create a new Jenkins job and set its type to Pipeline:
In the job’s settings, create a string parameter with a list of the team’s repositories; only two are used here:
Next, go to the pipeline script.
Define a variable named repos_list that takes the value of the TEAM_REPOS environment variable and uses the split() method to divide it into separate items.
Then iterate over them with a for loop:
node('master') {
    def repos_list = "${env.TEAM_REPOS}".split(',')
    for (repo in repos_list) {
        println repo
    }
}
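One thing to keep in mind with a plain split on commas: a value such as "BetterBI, example-repo" leaves a leading space in the second name, and a trailing comma produces an empty item, and both would later end up in the repository URL. The same parsing logic, sketched in Python for illustration only (parse_repos is a hypothetical helper, not part of the pipeline):

```python
def parse_repos(raw):
    """Split a comma-separated repository list, dropping spaces and empty items."""
    return [name.strip() for name in raw.split(",") if name.strip()]


print(parse_repos("BetterBI, example-repo,,"))  # ['BetterBI', 'example-repo']
```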
Run the job:
Jenkins Docker plugin
Our default approach is to run Jenkins builds in a Docker container to keep the host system clean.
Add another parameter with the Password type and save the GitHub token there:
Using the Jenkins Docker Plugin, create a Docker container with Gitleaks and pass it the token, the URL, and the report file. Note that the report file name contains the repository’s name:
node('master') {
    def repos_list = "${env.TEAM_REPOS}".split(',')
    for (repo in repos_list) {
        stage("Repository ${repo}") {
            docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
                sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
            }
        }
    }
}
Here, for every repository name from the repos_list list, we create a dedicated Jenkins Pipeline stage that also uses the repository’s name.
Run and check:
Um… And here is an issue: the scan stops right after the first findings in the first scanned repo, because Gitleaks found a leak and returned exit code 1, and the job was immediately stopped:
Ignoring errors in a Jenkins stage{}
To solve it, we can use try/catch: each stage runs in its own try block, any errors are caught with catch, and the build proceeds:
node('master') {
    def repos_list = "${env.TEAM_REPOS}".split(',')
    def build_ok = true
    for (repo in repos_list) {
        try {
            stage("Repository ${repo}") {
                docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
                    sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
                }
            }
        } catch(e) {
            currentBuild.result = 'FAILURE'
        }
    }
}
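The control flow here is "scan everything, remember what failed, fail the build at the end". A minimal Python sketch of the same idea follows. It is an illustration only: fake_scan is a hypothetical stand-in for the real gitleaks call and simply raises for repositories whose names start with "leaky".

```python
def scan_all(repos, run_scan):
    """Run run_scan(repo) for every repo and return the repos that failed."""
    failed = []
    for repo in repos:
        try:
            run_scan(repo)
        except RuntimeError as err:
            # Record the failure and keep going, like catch(e) in the pipeline.
            print(f"{repo}: {err}")
            failed.append(repo)
    return failed


def fake_scan(repo):
    # Hypothetical stand-in for the gitleaks call: "leaky" repos raise.
    if repo.startswith("leaky"):
        raise RuntimeError("leaks found (exit code 1)")


failed = scan_all(["leaky-one", "clean", "leaky-two"], fake_scan)
print("FAILURE" if failed else "SUCCESS", failed)
```

Every repository is scanned even when an earlier one had findings, and the overall result is marked as failed if at least one scan raised.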
Run it:
Good – now all stages run regardless of the previous stage’s result.
Slack notifications from Jenkins
The next step for us is to configure sending alarms to a Slack workspace.
Let’s use the Slack Notification plugin for this.
Create a Slack Bot
Go to the Slack Apps, create a new application:
Go to the Permissions:
Add the following:
files:write
chat:write
Go to the OAuth & Permissions, install the bot to the Slack workspace:
Save the token:
Jenkins credentials
Add the token to Jenkins: go to Manage Jenkins > Manage Credentials:
Add a new one:
Set its type to Secret text:
In the Slack workspace, create a new channel:
Invite the bot to the channel:
Add a new function to the Jenkins script, notifySlack(), and call it from the catch{} block to send alarms if any secrets were found during the scan:
def notifySlack(String buildStatus = 'STARTED') {
    // Build status of null means success.
    buildStatus = buildStatus ?: 'SUCCESS'

    def color
    // change for another Slack channel
    def token = 'gitleaks-slack-bot'

    if (buildStatus == 'STARTED') {
        color = '#D4DADF'
    } else if (buildStatus == 'SUCCESS') {
        color = '#BDFFC3'
    } else if (buildStatus == 'UNSTABLE') {
        color = '#FFFE89'
    } else {
        color = '#FF9FA1'
    }

    def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
    slackSend(color: color, message: msg, tokenCredentialId: token, channel: "#devops-alarms-gitleaks-analytics")
}

node('master') {
    def repos_list = "${env.TEAM_REPOS}".split(',')
    for (repo in repos_list) {
        try {
            stage("Repository ${repo}") {
                docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
                    sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
                }
            }
        } catch(e) {
            currentBuild.result = 'FAILURE'
            notifySlack(currentBuild.result)
        }
    }
}
jenkins.plugins.slack.StandardSlackService postToSlack Response Code: 404
Run the build, and get the following error:
12:42:39 ERROR: Slack notification failed. See Jenkins logs for details.
Check the Jenkins logs at https://<JENKINS_URL>/log/all:
Go to Manage Jenkins > Configure System, find the Slack plugin’s options, and enable Custom slack app bot user:
The credentials here are the defaults; we are overriding them from the pipeline.
In the Advanced section, remove the Override URL if it was set:
Run again and now everything is working:
File upload to Slack
Now, let’s upload the report file with the findings to the Slack channel along with the message, using the slackUploadFile() function:
def notifySlack(String buildStatus = 'STARTED', reportFile) {
    // Build status of null means success.
    buildStatus = buildStatus ?: 'SUCCESS'

    def color
    // change for another Slack channel
    def token = 'gitleaks-slack-bot'

    if (buildStatus == 'STARTED') {
        color = '#D4DADF'
    } else if (buildStatus == 'SUCCESS') {
        color = '#BDFFC3'
    } else if (buildStatus == 'UNSTABLE') {
        color = '#FFFE89'
    } else {
        color = '#FF9FA1'
    }

    def msg = "${buildStatus}: `${env.JOB_NAME}` #${env.BUILD_NUMBER}:\n${env.BUILD_URL}"
    slackSend(color: color, message: msg, tokenCredentialId: token, channel: "#devops-alarms-gitleaks-analytics")
    slackUploadFile(credentialId: token, channel: "#devops-alarms-gitleaks-analytics", filePath: "${reportFile}")
}

node('master') {
    def repos_list = "${env.TEAM_REPOS}".split(',')
    for (repo in repos_list) {
        try {
            stage("Repository ${repo}") {
                docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
                    sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json"
                }
            }
        } catch(e) {
            currentBuild.result = 'FAILURE'
            notifySlack(currentBuild.result, "analytics-${repo}-repo.json")
        }
    }
}
The channel can be moved to a job parameter later.
Here, we’ve added another parameter to notifySlack(), reportFile, and when calling notifySlack() we pass the report file as the second argument.
Run the job, check the Slack channel:
And the final thing is to set a schedule to run the job:
Gitleaks configuration
Commits to check
At the moment, Gitleaks performs a full scan of the repository: all commits, all history.
If we run it every day, then each day we will get messages about the same old problematic commits.
To mitigate this, we can create two jobs: the first does a full scan, and the second performs a kind of incremental scan of the changes made during the last 24 hours.
I.e. the “incremental” job will be run daily at 12:00 pm when all developers are in the office, and the job will check commits for the last day only.
To do so, Gitleaks has the --commit-since option. Let’s add a new variable called yesterday with yesterday’s date, obtained with the previous() method of the Date class, and then pass this date to --commit-since:
...
node('master') {
    def repos_list = "${env.TEAM_REPOS}".split(',')
    // Take the previous day from the Date object first, then format it;
    // calling previous() on the formatted string would only decrement its last character.
    def yesterday = new Date().previous().format('yyyy-MM-dd')
    println yesterday
    for (repo in repos_list) {
        try {
            stage("Repository ${repo}") {
                docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
                    sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --commit-since=${yesterday}"
                }
            }
        } catch(e) {
            currentBuild.result = 'FAILURE'
            notifySlack(currentBuild.result, "analytics-${repo}-repo.json")
        }
    }
}
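Whatever language computes it, the value for --commit-since has to come from real date arithmetic so that month and year boundaries work: on 2021-06-01 the previous day is 2021-05-31, not "2021-06-00". A minimal Python sketch of that calculation, with commit_since as a hypothetical helper name:

```python
from datetime import date, timedelta


def commit_since(today):
    """Return the day before `today` formatted as YYYY-MM-DD."""
    return (today - timedelta(days=1)).isoformat()


print(commit_since(date(2021, 6, 1)))  # 2021-05-31
print(commit_since(date(2021, 1, 1)))  # 2020-12-31
```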
Gitleaks configuration file
Another thing to do is to create a dedicated rules file for Gitleaks.
This can be done with the --repo-config-path option, so each repository can have its own configuration file.
Add some default rules there, plus I’d like to check for passwords passed as plaintext to commits:
...
[[rules]]
    description = "Plaintext password"
    regex = '''(?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|']'''
    tags = ["password", "PlainTextPassword"]

[allowlist]
    description = "Allowlisted files"
    files = ['''^\.?gitleaks.config$''']
With the (?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|'] regular expression, we are looking, case-insensitively, for a string that starts with pas followed by any number of s characters and exactly five more letters (so password matches), then an optional “:”, “|” or “=” symbol, then at least one space, then a quote mark, then any text, and a quote mark again.
It seems to work:
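The pattern can also be checked against a few sample lines outside Gitleaks. This is a quick Python sketch, assuming Python's re module behaves like the Go regexp engine for this pattern (it uses only basic syntax). The sample lines are invented for the test.

```python
import re

# Pattern copied from the "Plaintext password" rule above.
PASSWORD_RE = re.compile(r"""(?i)pass*[a-z]{5}[:|=]? +["|'](.*)["|']""")

samples = [
    'password: "hunter2"',   # separator, space, quoted value
    "PASSWORD= 'hunter2'",   # upper case, single quotes
    'password="hunter2"',    # no space before the quote
    'password = "hunter2"',  # space before the separator
    'passwd: "hunter2"',     # fewer than five letters after "pas"
]

for line in samples:
    print(bool(PASSWORD_RE.search(line)), line)
```

Only the first two lines match. The rule as written needs at least one space directly before the opening quote and no space before the separator, so assignments such as password="..." or password = "..." would need a looser pattern.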
Save it to the repository as .github/gitleaks.config, and in the job add another option that uses this file:
...
docker.image('zricethezav/gitleaks').inside('--entrypoint=""') {
    sh "gitleaks --access-token=${GITHUB_TOKEN} --repo-url=https://github.com/example/${repo} --verbose --report=analytics-${repo}-repo.json --commit-since=${yesterday} --repo-config-path=.github/gitleaks.config"
}
...
That’s all for now.