4 Git and GitHub: Version Control and Collaboration
Git is a version control system that records changes to your code and enables seamless collaboration. GitHub is a web-based platform that hosts Git repositories and offers integrated tools for collaboration, issue tracking, code review, and project management. Together, Git and GitHub are essential for research projects, especially in team settings, because they enhance reproducibility, traceability, transparency, and collective productivity.
4.1 Getting Started
Prof. Vincent Grégoire (HEC Montréal) has written a helpful blog post, with an embedded YouTube video tutorial, that introduces Git and GitHub for researchers: Using GitHub for Academic Research. Refer to his materials for an introduction to Git and GitHub.
4.2 Git Workflows
The collaboration workflow discussed during the workshop is the Feature Branch Workflow. Here is a brief overview of the steps involved:
- Create a new branch: When starting work on a new feature or bug fix, create a new branch from the main branch. (Bonus: start by creating an issue on GitHub to track the work.)
- Make changes: Work on the new feature or bug fix in the new branch. Commit changes frequently with descriptive messages.
- Push the branch: Once the feature or bug fix is complete, push the branch to the remote repository on GitHub.
- Create a pull request: Open a pull request on GitHub to merge the new branch into the main branch. This allows for code review and discussion.
- Review and merge: After the pull request has been reviewed and approved, resolve any conflicts if necessary and merge the new branch into the main branch. Delete the feature branch if it’s no longer needed. (If you started with an issue on GitHub, you can now close the issue.)
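The five steps above can be sketched with plain Git commands. The script below is a minimal, offline sketch rather than a prescribed procedure: it assumes Git 2.28 or later (for git init -b and git switch) and uses a local bare repository, origin.git, as a stand-in for the GitHub remote. The branch name feature/clean-data and the file names are hypothetical. Opening and approving the pull request happens on GitHub, so the script substitutes a local merge for that step.

```shell
#!/usr/bin/env bash
# Offline sketch of the Feature Branch Workflow.
# ASSUMPTION: a local bare repository ("origin.git") stands in for GitHub;
# branch and file names are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for the repository hosted on GitHub.
git init --quiet --bare -b main origin.git

# Local project with an initial commit on main, pushed to the "remote".
git init --quiet -b main project
cd project
git config user.name "Example Researcher"
git config user.email "researcher@example.com"
git remote add origin ../origin.git
echo "# My research project" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet -u origin main

# 1. Create a new branch from main for the feature or bug fix.
git switch --quiet -c feature/clean-data

# 2. Make changes and commit them with a descriptive message.
echo "print('cleaning data')" > clean_data.py
git add clean_data.py
git commit --quiet -m "Add data-cleaning script"

# 3. Push the feature branch to the remote.
git push --quiet -u origin feature/clean-data

# 4. On GitHub you would now open a pull request for review.
# 5. After approval, merge into main. --no-ff always records a merge commit,
#    roughly what GitHub's "Create a merge commit" option produces.
git switch --quiet main
git merge --quiet --no-ff -m "Merge branch 'feature/clean-data'" feature/clean-data
git push --quiet origin main

# Delete the feature branch locally and on the remote.
git branch --quiet -d feature/clean-data
git push --quiet origin --delete feature/clean-data

# Show the resulting history (hashes will differ on every run).
git log --oneline --graph
```

On a real project you would skip the setup lines, clone the repository from GitHub instead, and open the pull request in the browser (or with the GitHub CLI's gh pr create) before merging on GitHub.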
4.3 A Few Tips
Below are some tips to help you use Git and GitHub effectively:
- Commit often: Make small, frequent commits with descriptive messages to keep track of changes and make it easier to revert if needed.
- Use branches: Create separate branches for new features or experiments to keep the main branch stable.
- Pull regularly: If you’re collaborating with others, pull changes from the remote repository frequently to stay up to date.
- Resolve conflicts carefully: When merging branches, carefully review and resolve any conflicts to ensure code integrity.
- Don’t ignore the .gitignore file: It specifies files and directories that Git should ignore, such as temporary files, build artifacts, and sensitive information.
- Don’t check in large files: Avoid checking large files (e.g., datasets, binaries) directly into your Git repository. Because Git keeps every committed version of every file in its history, large files make the repository large and slow to clone, even after they are deleted in a later commit. Instead, consider using other file hosting services (or Git Large File Storage).
- Don’t check in files that you can easily regenerate: Files such as compiled outputs and virtual environment folders can be rebuilt from your source, so leaving them out keeps your repository clean and reduces its size. In general, only check in source code and essential files needed to run your project. As a rule of thumb, anything it wouldn’t make sense to run git diff on should probably not be checked in.
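The ignore-related tips above (the .gitignore file, large files, and regenerable files) can be applied together. The script below is a minimal sketch that writes a .gitignore for a hypothetical Python-based research project and then asks Git which files it will skip. The directory layout and patterns (data/raw/, output/, .venv/, .env) are assumptions for illustration; adapt them to your own project.

```shell
#!/usr/bin/env bash
# Sketch: a .gitignore for a hypothetical research project, then a check
# of which files Git will skip. All paths and patterns are illustrative.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Patterns for files that are large, easy to regenerate, or sensitive.
cat > .gitignore <<'EOF'
# Large raw data: host elsewhere (or use Git LFS)
data/raw/
# Regenerable artifacts: bytecode, virtual environments, generated output
__pycache__/
*.pyc
.venv/
output/
# Sensitive information such as API keys
.env
EOF

# Create example files, most of which should be ignored.
mkdir -p src data/raw output .venv
echo "print('analysis')" > src/analysis.py
echo "id,value" > data/raw/survey.csv
echo "placeholder figure" > output/figure1.pdf
echo "API_KEY=placeholder" > .env
touch .venv/pyvenv.cfg

# For each ignored path, print the .gitignore file, line number, and pattern.
git check-ignore -v data/raw/survey.csv output/figure1.pdf .env .venv/pyvenv.cfg

# Only .gitignore and src/analysis.py should be listed as untracked (??).
git status --porcelain --untracked-files=all
```

git check-ignore -v is also handy when a file unexpectedly does or does not appear in git status. Note that .gitignore only affects untracked files: a file that has already been committed must first be removed from the index (for example with git rm --cached) before an ignore rule takes effect.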
4.4 Learning Resources
To learn more about Git and GitHub, consider the following resources:
- Using Git source control in VS Code: A guide on how to use Git and GitHub within Visual Studio Code.
- Git Cheat Sheet: A one-page cheat sheet for common Git commands by Wizard Zines.