Automating Your Data Pipeline with GitHub and Jenkins
Jenkins and GitHub are powerful tools for automating your data pipeline. In this post, we'll walk through the steps for setting up a Jenkins job that runs a script at a scheduled time using GitHub as the source code repository.
Before you start, you'll need:
A GitHub repository that contains the script you want to run
A Jenkins server with the Git plugin installed
Step 1: Set Up SSH Access to GitHub for Jenkins
The first step is to allow Jenkins to access your GitHub repository using SSH keys. Here's how to do it:
1. Generate an SSH key pair on your Jenkins server:
Log in to your Jenkins server as the user that will run Jenkins.
Open a terminal or command prompt.
Type ssh-keygen and press Enter.
Accept the default location and file name for the key pair.
Enter a passphrase for the private key (optional but recommended).
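Before moving on, it helps to confirm which files ssh-keygen actually wrote, because the default key type (and so the file name) depends on the OpenSSH version: recent releases create an Ed25519 key (id_ed25519), older ones an RSA key (id_rsa). It is also worth checking that the private key is readable only by its owner, since ssh refuses to use a private key that other users can read. The Python sketch below does both; find_key_pair, mode_is_private and describe_keys are hypothetical helper names for this walkthrough, not part of OpenSSH or Jenkins.

```python
import stat
from pathlib import Path
from typing import List, Optional, Tuple

# Default file names ssh-keygen uses, depending on the key type it creates.
DEFAULT_KEY_NAMES = ("id_ed25519", "id_ecdsa", "id_rsa")


def find_key_pair(ssh_dir: Path) -> Optional[Tuple[Path, Path]]:
    """Return (private_key, public_key) for the first default-named pair found, else None."""
    for name in DEFAULT_KEY_NAMES:
        private_key = ssh_dir / name
        public_key = ssh_dir / f"{name}.pub"
        if private_key.is_file() and public_key.is_file():
            return private_key, public_key
    return None


def mode_is_private(mode: int) -> bool:
    """True when no group or other permission bits are set (e.g. 0o600 or 0o400)."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def describe_keys(ssh_dir: Path) -> List[str]:
    """Build a short report about the key pair in ssh_dir (hypothetical helper)."""
    pair = find_key_pair(ssh_dir)
    if pair is None:
        return [f"No default-named key pair found in {ssh_dir}"]
    private_key, public_key = pair
    mode = stat.S_IMODE(private_key.stat().st_mode)
    messages = [f"Private key: {private_key} (mode {mode:o})", f"Public key: {public_key}"]
    if not mode_is_private(mode):
        messages.append("Warning: other users can read the private key; tighten it with chmod 600.")
    return messages
```

Run as the Jenkins user, print("\n".join(describe_keys(Path.home() / ".ssh"))) shows which file names to use in the cat commands in the next steps.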
2. Add the public key to your GitHub account:
Log in to your GitHub account.
Click on your profile icon and select "Settings".
Click on "SSH and GPG keys".
Click on "New SSH key".
Give the key a title (e.g., "Jenkins SSH Key") and paste the contents of the public key file into the "Key" field. In most cases, you can display the public key by running cat ~/.ssh/id_rsa.pub (or cat ~/.ssh/id_ed25519.pub if ssh-keygen created an Ed25519 key) as the user that generated the key.
Click "Add SSH key".
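After you add the key, GitHub lists it on the "SSH and GPG keys" page together with a SHA256 fingerprint. Comparing that fingerprint with the one for the file on your Jenkins server is a quick way to confirm you pasted the right key; ssh-keygen -lf ~/.ssh/id_rsa.pub prints it directly. The Python sketch below (openssh_fingerprint is a hypothetical helper name) shows how that value is derived: base64-decode the key blob, hash it with SHA-256, and base64-encode the digest without padding.

```python
import base64
import hashlib


def openssh_fingerprint(public_key_line: str) -> str:
    """Compute the SHA256 fingerprint of an OpenSSH public key line ('<type> <base64> [comment]')."""
    fields = public_key_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<key-type> <base64-blob> [comment]'")
    blob = base64.b64decode(fields[1])  # the binary key blob after the key type
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 with the trailing '=' padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")
```

Passing it the text of your .pub file should give the same string that ssh-keygen -lf reports and GitHub displays.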
3. Configure the SSH credentials in Jenkins:
Log in to Jenkins as an administrator.
Click on "Manage Jenkins" and then "Credentials" (older Jenkins versions show "Credentials" directly in the left-hand menu).
Click on "System" and then "Global credentials (unrestricted)".
Click on "Add Credentials".
Select "SSH Username with private key" as the kind.
Enter your GitHub username as the username.
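Under "Private Key", select "Enter directly" and paste the contents of the private key file. In most cases, you can display the private key by running cat ~/.ssh/id_rsa (or cat ~/.ssh/id_ed25519) as the user that generated the key. If you set a passphrase in step 1, enter it in the "Passphrase" field.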
Give the credential a meaningful ID and description.
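A common slip at this step is pasting the public key (the .pub file) where Jenkins expects the private key. The two are easy to tell apart: private keys start with a "-----BEGIN ... PRIVATE KEY-----" header, while public keys start with the key type, such as ssh-ed25519 or ssh-rsa. The Python sketch below is a hypothetical check (classify_key_text is not part of Jenkins) based only on those headers.

```python
import re

# Headers written by OpenSSH and OpenSSL at the top of private key files,
# e.g. "-----BEGIN OPENSSH PRIVATE KEY-----" or "-----BEGIN PRIVATE KEY-----".
PRIVATE_KEY_HEADER = re.compile(r"-----BEGIN (?:(?:OPENSSH|RSA|EC|DSA|ENCRYPTED) )?PRIVATE KEY-----")

# Key-type prefixes that begin a line in an OpenSSH .pub file.
PUBLIC_KEY_PREFIXES = ("ssh-rsa ", "ssh-ed25519 ", "ecdsa-sha2-")


def classify_key_text(text: str) -> str:
    """Return 'private', 'public', or 'unknown' for pasted key material (hypothetical helper)."""
    stripped = text.strip()
    if PRIVATE_KEY_HEADER.match(stripped):
        return "private"
    if stripped.startswith(PUBLIC_KEY_PREFIXES):
        return "public"
    return "unknown"
```

Only a result of "private" belongs in the Jenkins "Private Key" field; a passphrase-protected OpenSSH key still carries the OPENSSH header.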
4. Configure the Jenkins job to use the SSH credentials:
Open the Jenkins job you want to configure.
Click on "Configure" on the left-hand side menu.
Scroll down to the "Source Code Management" section.
Select "Git" as the type.
Enter the SSH URL of your GitHub repository (e.g., git@github.com:username/repo.git) in the "Repository URL" field.
Select the SSH credential you added in step 3 from the "Credentials" dropdown.
Save your changes.
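The key only comes into play with the SSH form of the repository URL; an HTTPS URL such as https://github.com/username/repo.git authenticates differently. If you copied the address from your browser, the Python sketch below converts it to the SSH form. to_ssh_url is a hypothetical helper, and it assumes the repository lives on github.com.

```python
import re

# Accepted forms (owner and repository names are placeholders):
#   git@github.com:owner/repo.git      SSH URL, returned unchanged
#   https://github.com/owner/repo.git  HTTPS clone URL
#   https://github.com/owner/repo      browser address
_PATTERNS = (
    re.compile(r"^git@github\.com:(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)(?:\.git)?$"),
    re.compile(r"^https://github\.com/(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+?)(?:\.git)?/?$"),
)


def to_ssh_url(url: str) -> str:
    """Return the git@github.com:owner/repo.git form of a GitHub repository URL."""
    for pattern in _PATTERNS:
        match = pattern.match(url.strip())
        if match:
            return f"git@github.com:{match['owner']}/{match['repo']}.git"
    raise ValueError(f"not a recognised GitHub repository URL: {url!r}")
```

The non-greedy repository group lets an optional trailing .git be stripped and re-added, so every accepted form maps to the same SSH URL.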
Once you've completed these steps, Jenkins can access your GitHub repository using the SSH keys.
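To check the key from the Jenkins server itself, you can run ssh -T git@github.com as the Jenkins user. Run it interactively once first so you can accept GitHub's host key. On success GitHub replies "Hi username! You've successfully authenticated, but GitHub does not provide shell access." and ssh exits with status 1, because GitHub refuses to open a shell. The Python sketch below wraps that check; parse_github_greeting and check_github_ssh are hypothetical names, and the check uses the key in the current user's ~/.ssh directory, not the copy stored in Jenkins credentials.

```python
import re
import subprocess
from typing import Optional

# GitHub's reply to `ssh -T git@github.com` when a key is accepted.
GREETING = re.compile(r"Hi (?P<user>[^!]+)! You've successfully authenticated")


def parse_github_greeting(output: str) -> Optional[str]:
    """Return the GitHub account name from the greeting, or None if authentication failed."""
    match = GREETING.search(output)
    return match.group("user") if match else None


def check_github_ssh(timeout: float = 30.0) -> Optional[str]:
    """Run `ssh -T git@github.com` as the current user and return the authenticated account."""
    # ssh exits with status 1 even on success, so inspect the message, not the exit code.
    # BatchMode=yes makes ssh fail instead of prompting for a passphrase or host key.
    result = subprocess.run(
        ["ssh", "-T", "-o", "BatchMode=yes", "git@github.com"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return parse_github_greeting(result.stdout + result.stderr)
```

A None result usually means the key was not found, was rejected, or needs a passphrase that BatchMode prevents ssh from asking for.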
Step 2: Schedule the Jenkins Job
Now that Jenkins has access to your GitHub repository, you can schedule the job to run the script at a specific time. Here's how to do it:
Open your Jenkins instance and navigate to the job you want to schedule.
Click on "Configure" to open the job's configuration page.
Scroll to the "Build Triggers" section and select the "Build periodically" option.
In the "Schedule" text box, enter the cron expression that defines when the job should run. Jenkins expects exactly five fields separated by spaces, each restricting a time or date value.
The format of a cron expression is as follows:
* * * * *
| | | | |
| | | | ----- Day of the week (0 - 7) (Sunday is 0 or 7)
| | | ------- Month (1 - 12)
| | --------- Day of the month (1 - 31)
| ----------- Hour (0 - 23)
------------- Minute (0 - 59)
Note: Unlike some other schedulers, Jenkins does not accept a sixth field for seconds or the year.
For example, if you want to schedule the job to run every day at 3:30 AM, you would enter the following cron expression:
30 3 * * *
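To make the five fields concrete, here is a small Python sketch that checks whether a given time matches an expression. It is illustrative, not Jenkins's own parser: it handles only *, single numbers, ranges (1-5), lists (1,15) and steps (*/15), it does not understand Jenkins's H symbol, and it requires every field to match, whereas classic Unix cron matches when either day field matches if both day of month and day of week are restricted. The names parse_field and matches are hypothetical.

```python
from __future__ import annotations

from datetime import datetime

# Inclusive bounds for the five cron fields, in order:
# minute, hour, day of month, month, day of week.
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def parse_field(field: str, low: int, high: int) -> set[int]:
    """Expand one field such as '*', '30', '1-5', '*/15' or '1,15' into the values it allows."""
    values: set[int] = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if step < 1 or start < low or end > high or start > end:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def matches(expression: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) falls on the schedule `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("a Jenkins cron expression has exactly five fields")
    minutes, hours, days, months, weekdays = (
        parse_field(field, low, high) for field, (low, high) in zip(fields, FIELD_BOUNDS)
    )
    if 7 in weekdays:  # 7 is an alias for Sunday (0)
        weekdays = (weekdays - {7}) | {0}
    # datetime.weekday() counts Monday=0..Sunday=6; cron counts Sunday=0..Saturday=6.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        when.minute in minutes
        and when.hour in hours
        and when.day in days
        and when.month in months
        and cron_weekday in weekdays
    )
```

For example, matches("30 3 * * *", datetime(2024, 1, 15, 3, 30)) is True, while the same call at 4:30 is False.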
Step 3: Set Up the Pipeline in Jenkins
1. Scroll down to the "Build Steps" section, click "Add build step", and select "Execute shell". In the "Command" field, enter the command that runs your script. Jenkins runs this command from the job's workspace, where the repository is checked out, so if your script is called "myscript.sh" and sits at the root of the repository, you can enter the following command:
bash myscript.sh
2. Click "Save" to save the job's configuration.
Your Jenkins job will run your script according to your specified cron expression.
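Jenkins marks the build as failed when the command in an "Execute shell" step exits with a non-zero status, so a pipeline script should report problems through its exit code rather than only printing them. If your script is written in Python instead of shell, a minimal sketch of that pattern might look like the following; the file name pipeline_step.py and the id, timestamp and value columns are hypothetical placeholders for your own data.

```python
import csv
import sys
from typing import Dict, Iterable, List, Optional

# Hypothetical schema: every row must have a non-empty value in these columns.
REQUIRED_COLUMNS = ("id", "timestamp", "value")


def find_problems(rows: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Return one message per data row that is missing a required column."""
    problems = []
    for index, row in enumerate(rows, start=1):
        missing = [col for col in REQUIRED_COLUMNS if not (row.get(col) or "").strip()]
        if missing:
            problems.append(f"row {index}: missing {', '.join(missing)}")
    return problems


def main(argv: List[str]) -> int:
    """Validate the CSV file named in argv and return a process exit code."""
    if len(argv) != 1:
        print("usage: pipeline_step.py INPUT.csv", file=sys.stderr)
        return 2
    with open(argv[0], newline="") as handle:
        problems = find_problems(csv.DictReader(handle))
    for problem in problems:
        print(problem, file=sys.stderr)
    # A non-zero exit code makes Jenkins mark the build as failed.
    return 1 if problems else 0
```

To run it from the build step, end the file with the usual entry point, if __name__ == "__main__": sys.exit(main(sys.argv[1:])), and use a command such as python3 pipeline_step.py data/input.csv, where the CSV path is a placeholder.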
Automating your data pipeline with GitHub and Jenkins saves you from running scripts by hand and reduces the risk of forgotten or inconsistent runs. By checking out your script from GitHub and scheduling it in Jenkins, you can keep your data refreshed on a predictable schedule. With these steps, you can start automating your data pipeline today!