What is version control?
Version Control
Version control is a system that records all changes and modifications to files for tracking purposes. Developers also use the term source control or source code management.
The primary goal of any version control system is to keep track of changes. It achieves this by allowing developers access to the entire change history with the ability to revert or roll back to a previous state or point in time.
There are different types of changes, such as:
- adding new files
- modifying or updating files
- deleting files
The version control system is the source of truth across all code, assets, and the team itself.
For developers, especially those working in a team, there are many benefits associated with version control. These include:
- revision history
- identity
- collaboration
- automation
- efficiency
Revision history
Revision history provides a record of all changes in a project. It provides developers with the ability to revert to a stable point in time in cases where code edits cause issues or bugs.
Keeping a record of changes is great, but it doesn’t have as much value if you don’t know who is responsible for adding or changing a record.
Identity
All changes made are always recorded with the identity of the user that made them.
Teams can also analyze the editing, creation, and the deletion of files on the control system.
Another important aspect of a version control system is something called a peer review. Developers working on a task create a peer review once the code is ready for inspection.
- Code inspection
- Code reviewed by others
Peer review
The peer review aims to get other developers on your team to review the code and provide feedback where necessary.
Version control helps keep track of all changes. It plays an integral role in the explosion of development operations, or DevOps, as it’s commonly called.
DevOps
DevOps is a set of practices, philosophies, and tools that increase an organization’s ability to deliver applications or services to a high quality and velocity.
Version control is a key tool in this process, and it is used not only to track all changes but also to aid in software quality release and deployments.
You and your team may work using processes from the agile methodology.
Agile methodology
In an agile process, a team normally plan and execute two weeks of work to complete, which is called an iteration. Each iteration has a list of tasks to complete before the two weeks ends.
These tasks, while complex in some cases, are aided by having version control in place.
Testing and having some level of automation on every task introduced allows the team to be more efficient. It also ensures more confidence that any new feature being introduced will not break any existing flows.
Version Control Git terminology
Systems of version control and tools
There are many different version control systems available. For example:
- Subversion,
- Perforce,
- AWS Code Commit,
- Mercurial,
- and Git to name a few.
Version control systems can be split into two types or categories.
- Centralized version control systems
- Distributed version control systems
Centralized version control systems
Centralized version control systems, or CVCS for short, contain a server and a client. The server contains the main repository that houses the full history of versions of the code base.
Developers working on projects using a centralized system need to pull down the code from the server to their local machine. This gives the user their own working copy of the code base. The server holds the full history of changes. The client has the latest code, but every operation needs to have a connection to the server itself.
In a centralized version control system, the server is the central copy of the project. After making changes to the code, the developer needs to push the changes to the central server so that other developers can see them. This essentially means that viewing the history of changes requires that you are connected to the server in order to retrieve and view them.
Distributed version control systems
Distributed version control systems or DVCS for short, are similar to the centralized model. You still need to pull code down from the server to view the latest changes.
The key difference is that every user is essentially a server and not a client. This means that every time you pull down code from the distributed model, you have the entire history of changes on your local system.
Advantages and Disadvantages of CVCS
The advantage of CVCS is that they are considered easier to learn than their distributed counterparts. They also give more access controls to users.
The disadvantage is that they can be slower given that you need to establish a connection to the server to perform any actions.
Advantages and Disadvantages of DVCS
With DVCS, you don’t need to be connected to the server to add your changes or view a file’s history. It works as if you are actually connected to the server directly but on your own local machine. You only ever need to connect to the server to pull down the latest changes or to push your own changes. It essentially allows users to work in an offline state.
Speed and performance are also better than its CVCS counterpart.
DVCS is a key factor in improving developer operations and the software development life cycle.
A history of version control
As you know by now, version control is a system that records changes to a file or set of files over time so that you can access specific versions later. In software development, Version Control Systems (VCS) allows developers to manage changes to their code and track who made each change. But how did this software come about?
Version Control has a long history going back to the 1980s. In fact, version control systems were created before the Internet!
One of the first significant Version Control Systems was the Concurrent Versions System (CVS). It was first developed in 1986 by Walter F. Tichy at Purdue University and released publicly in 1990.
CVS stores information about every file in a folder structure, including the name of the file, its location in the folder structure, who last modified it, and when it was last modified. The CVS also stores information about folders, including their names and who created them.
It was popular for many years; however, it has some significant flaws in its design. CVS does not include integrity checks which means your data can become corrupted. When you update or submit changes to the system, if an error occurs, the system accepts the partial or corrupted files. Additionally, the system was designed mainly for text files, not binary files such as images or videos.
The main successor to CVS was Subversion (SVN).
CollabNet developed Subversion in 2000 and solved many of the issues present in CVS. To ensure data integrity, it included integrity checks in its design. It also supported the versioning of binary files better than CVS. Thanks to these improvements, SVN became popular in the open-source community with free hosting being offered for open-source projects by Google and SourceForge.
However, Subversion used a centralized VCS model. This means that all operations have to be done using a centralized server. If the server were down or slow, this would impede development.
In 2005, two new projects were started to develop distributed version control systems; Mercurial and Git. Both projects were created in response to an event involving the Linux kernel development.
Previously, the Linux kernel was using a proprietary VCS known as BitKeeper. BitKeeper was one of the first distributed version control systems initially released in 2000. BitKeeper had originally provided a free license to Linus Torvalds to support Linux’s development. However, in 2005, the license was revoked. This controversy led to the creation of the Mercurial and Git projects.
Mercurial was developed by Olivia Mackall. It is developed as a high-performance distributed VCS. Many platforms offering Subversion hosting began to offer Mercurial hosting too. It became popular as Subversion users found it easy to transition to a Mercurial repository, thanks to the hosting providers and its small learning curve.
Git was developed by Linus Torvalds to host the Linux kernel’s source code. Like Mercurial, it is a distributed VCS. Its first public release came in 2007.
Git became popular in the open-source community due to its distributed VCS design and Github offering free Git hosting for open-source projects. Git has since become the selected version control system for many open-source and proprietary software projects.
Version control in professional software development
Version Control plays a crucial part in software development. As a developer, you’ll work with other developers on projects to deliver software to customers. Depending on the role, you could be working with a small team of 2 or 3 developers in a single project or a large team spanning multiple projects. In either scenario, Version Control will be a crucial tool to help your team succeed.
However, Version Control must be complemented by other tools and procedures to ensure quality and efficiency throughout the software development process. In this lesson, we’ll explore some of the common tools and strategies developers use in conjunction with Version Control.
Workflow
Using Version Control without a proper workflow is like building a city without traffic lights; without appropriate management, everything will turn into chaos.
For example, let’s say you’re working on a big project and editing a file. Another developer also starts editing a file. Both of you submit the file to the VCS at the same time. Now there’s a conflict! How should the conflict be resolved? A good workflow will have a process for resolving conflicts.
Another example is when a new junior developer is joining your team. If the project code is used for a critical system, it is risky to allow them to submit code changes directly. To solve this, many developers use a peer review system where another developer must review code before it can be merged in.
Workflows are essential to ensure code is managed correctly and reduce mistakes from happening. Different projects will have different workflows. In this course, you’ll learn some common workflows using the Git Version Control System.
Continuous Integration
Continuous Integration, or CI, is used to automate the integration of code changes from multiple developers into a single main stream. Using a workflow whereby small changes are merged frequently, often many times per day, will reduce the number of merge conflicts.
This process is widespread in test-driven software development strategies. CI is often used to automatically compile the project and run tests on every code change to ensure that the build remains stable and prevent regressions in functionality.
Continuous Delivery
Continuous Delivery is an extension of Continuous Integration. Once the changes have been merged into the main stream, a Continuous Delivery system automatically packages the application and prepares it for deployment. This helps avoid human error when packaging the application.
Continuous Deployment
Continuous Deployment is an extension of Continuous Delivery. The goal of Continuous Deployment is to deploy and release software to customers frequently and safely. The strategy commonly involves automatically deploying to a test (also known as staging) environment first to validate the deployment package and software changes. Once validated, it can automatically deploy to the live (also known as production) environment for customers.
Conclusion
With these tools and procedures, it is possible to understand how software starts from a developer writing code to being deployed live for customers to use. Of course, there is much more to running a live software service, but that is a lesson for another day.
A history of revisions
When working with a team of developers, it’s essential for the codebase to have a source of truth that has all historical changes.
Version control systems play an integral part in aiding this process by providing a full history of changes of every single file added to its repository.
It makes collaboration across a team easier. Also aids in working toward a common goal. Whether it is adding new features and following the flow of how they were implemented, or discovering where a potential issue may have been introduced.
Being able to accurately pinpoint
- the who,
- the when,
- and the what of those changes is paramount.
The revision history will record the essential data points so any developer or team member can walk through the entire project from start to its current state.
Every change that has occurred on the project should be easily accessible either by a simple command or integration into the developer’s IDE.
It’s important for organizations to define standards for how developers communicate changes they make. Developers need to know prior to looking at the code, what the lead developers’ aims are. The more information the better. This creates a stronger team environment that is more transparent and open.
After commit, the developer will push their changes to the repository and create something called a pull request. Developers will then peer review the poll request to approve request changes or decline.
When working on a single project, there’s usually some level of crossover between the developers. When this does occur, the history of revisions can help aid the developers in seeing the full life cycle of changes that have occurred.
It is also essential for merging conflicts where multiple developers have made changes that may need to be resolved prior to the code being approved. The history will show who made the change, for what reason, the code itself and its changes, and the date and time of when they occurred.
Typically on a new project, you will encounter changes in one task that may cause potential issues or conflicts with another. The history of revisions will give the team the ability to manage and merge these changes and deliver the business objectives in a timely manner.
Staging vs. Production
Development Environments
Every development team prior to releasing their new features or changes needs to verify that the code they do release is not going to cause any issues or bugs. In order to achieve this, they normally set up multiple environments for different ways to test and verify. A common practice is for teams to have a developer environment, a UAT or QA environment, and a staging environment. The main purpose of this flow is to find any potential issues that may arise due to changes or new features being added to the codebase. The more ways to test the changes the less likely bugs will be introduced.
Staging
The staging environment should mimic your production environment. The reason for this is because you want to test the code in an environment that matches what you have in production. This allows teams to spot or find any potential issues prior to them getting to production. The closer the staging environment is to your production, the more accurate your testing is going to be. Staging environments can also be used for testing and verifying new features and allow other teams including QA or stakeholders to see and use those features as a pre-trial. Staging should also cover all areas of the architecture of the application including the database and any other services that may be required. Areas that benefit from staging environments include:
New Features
Developers submitting new features along with feature flags for turning them on and off should always do a testing round in a staging environment. They allow teams to verify that the feature works, it can be turned on and off via configuration flags and also that it does not break or interfere with existing functionality.
Testing
As the staging environment mimics your production environment, it’s also a great place to run tests. QA teams will normally use it to verify new features, configuration changes or software updates/patching. The types of testing covered will be Unit testing, Integration testing and performance testing. All except performance testing can also be carried out in production. Performance can also be completed in production but only at specific times - usually out of hours as it will have a drastic effect on the user experience.
Sometimes it is not always feasible to have an exact replication either due to costs or time. Certain areas can be cut back - for example, if your service is load balanced on 10 virtual machines in production, you could still have 4 virtual machines in staging. The underlying architecture is the same but the overall performance may be different.
Migrations
Staging is a perfect place to test and verify data migrations. Snapshots can be taken from production and used to test your migration scripts to confirm your changes will not break anything. If in the case it does cause an issue, you simply rollback and try again. Doing something like a migration in production is extremely risky and error-prone.
Configuration Changes
Configuration can also cause headaches for teams, especially in a large cloud-based architecture. Having a staging environment will allow you to spot any potential issues or bottlenecks.
Production
Production is live. It’s out there for people to see and/or interact with. Any issues or problems you may have had should have been caught and fixed in the staging environment. The staging area gives the team a safety net to catch these possible issues. Any code that is deployed to production should have been tested and verified before the deployment itself.
Downtime
Downtime for any service especially customer facing will most likely be revenue impacting. If customers can not access or use your website or app to its full capabilities, it will most likely have a cost involved. Take for example an e-commerce company that allows users to buy goods and services online. If they release a new feature to their shopping cart which actually breaks the payment process, this will have an impact on customers not being able to buy goods online.
Vulnerabilities
Cyber-security should also play a big role in what gets released in production. Any updates to software such as patching or moving to the latest version should be checked and verified. This is also the same rule for not upgrading software when critical updates are released.
Reputation
Downtime or issues in production is damaging for a company as it does not instill confidence in end users. If something is down or broken it can cause the company to lose potential customers.
Additional Resources
About Version Control
https://git-scm.com/book/en/v2/Getting-Started-About-Version-Control
List of Version Control Software
https://en.wikipedia.org/wiki/List_of_version-control_software
The benefits of a distributed version control system
https://about.gitlab.com/topics/version-control/benefits-distributed-version-control-system/
What is Cloning?
https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository
Previous one → 10.Testing Tools | Next one → 2.Unix Commands