Create a Shared Python Dev Environment for your Team — Part 1: GitHub Codespaces & Project Utilities

Eshwaran Venkat
10 min read · Feb 25, 2023


Photo by Desola Lanre-Ologun on Unsplash

This is Part-1 of a two-part series on creating a shared Python development environment for a team. In both these articles, we shall introduce and use Devcontainers to create a shared development environment. This will cover configuration and basic setup syntax. We shall also cover GitHub Codespaces as a viable option for a shared workspace on the cloud using devcontainers. Pros and cons are also highlighted using Python packages as examples. In Part-2, we get into the nitty-gritty details and use a hypothetical team scenario to set up a shared Devcontainer configuration from scratch. If you’re familiar with GitHub Codespaces as a concept, then I’d recommend skipping to Part 2.

Link to Part-2: Configuring a Devcontainer

Table of Contents for Part 1

1. The Problem
2. Goals of a Shared Development Environment
3. What is a GitHub Codespace
4. Codespace States
5. Codespace Management from VSCode
6. Shortcomings of Codespaces
7. Project Setup Best-Practices & Utilities

The Problem

Docker and related containerization products are widely used to ship and test production code uniformly. Besides making scaling easier, they also ensure that the code running on prod is in a stable, deterministic and consistent environment. There is, however, no prevailing consistency in environment during development, and this is a good thing. Developers, data scientists, researchers, engineers and other related professionals prefer to have control over the customization of the environment they’re building and testing on, as it enables freedom and ownership of the nuts & bolts used to bring their workspace together.

There are however situations where it makes sense to abstract away the build and config process of a development environment and distribute a common environment among a team should they need it. Essentially, “to pick up an environment from off the shelf and get started”. To list a couple of these scenarios:

  • Onboarding new team members. Onboarding pipelines can sometimes be extensive when bringing fresh blood into a team. The bottleneck is usually the time spent understanding internal systems and projects. These are often not introduced all at once, whether for reasons of complexity, compliance, etc. This process also becomes difficult to standardize in a startup environment where tools and configs can change frequently at the beginning, with new gizmos coming in all the time. In other scenarios, new colleagues start out with a “fog-of-war”-type view of their work or project. As a result, this could affect what tools (and versions) they set up for themselves for development, and they end up having to reconfigure or retrofit in the short and medium term when a dependency is not met, or breaks.
  • Teams with skill variance. Not all team members are developers, and not all developers are made equal. This diversity is often a strength, because no-one chooses the entire squad to be mages when playing DnD, do they? The problem is that helping with configuration or setup is sometimes frustrating for someone ‘with experience’ to share with someone ‘without’. It’s natural to want to spend your time solving the real problems, and not 2 hours realizing that two projects used different versions of a library. This introduces friction in asking for and offering support to set up shop. Unbeknownst to most, this builds an invisible barrier around an org’s tech stack & projects.

Goals of a Shared Development Environment

So, let’s outline some goals we want to achieve with a shared development environment:

  • Comes in-built with CLI tools, extensions and other technologies that are used for development and general productivity
  • Code repositories required for a specific project or team are made available and installed for building, debugging and testing, with minimal action required from a developer to set up or initialize.
  • Allows one to switch the type of machine based on the workload of their development, and further configure the machine or machine template for sharing.
  • Standardizes automated best practices such as linting, formatting, etc.
  • Is easy to access (allowing for some security-based friction).

Now that we’ve established the problem and goals, we need only explore the tools available to bring a shared development environment to life. This occurs in two stages.

  • The first stage is the workspace itself, such as making a machine available to work on, with team-wide configs and productivity tools in-place.
  • The second stage is the project-level configurations that make it easy to get started building and modifying one or more projects within the workspace from stage 1.

What is a GitHub Codespace

Let’s get the trivialities out of the way. I’m not a huge fan of the GitHub docs so I’ve attempted to simplify a lot of the bare-bones setup.

GitHub Codespaces are a product offered by GitHub to develop your code on the cloud. This means that you receive a full-fledged virtual machine that comes pre-configured with a repository of your choosing. You need to have a GitHub account to access and use Codespaces. Think of a “Codespace” as meaning a “VM” (Virtual Machine).

For all intents and purposes, when you think “Codespace”, just think “VM asterisk”

Creating a Codespace for Mission Dotlas repository from GitHub UI. Image by Author.

This will open a new tab and display a VSCode IDE on the web with the repository loaded-in and ready for development.

An in-browser VSCode IDE from the Mission Dotlas Codespace. Image by Author.

Notice how the codespace is indicated in the bottom-left of the IDE.

Remote connection to codespace in browser VSCode IDE. Image by Author.

Codespaces are priced based on storage used and hours consumed. GitHub offers a free tier that caps how much of each can be used before you are charged. Refer to GitHub docs for more info.

Codespace States

  • Creating a Codespace means you request GitHub to get you a VM with the repository loaded-in. GitHub will allocate a machine from one of their server farms (usually the one closest to you by latency) and spin up a machine for you to use.
  • A Codespace is Deleted when the user requests it. All saved data or work in the Codespace that has not been pushed to the remote is then lost, including data on disk.
Delete option for the created Codespace. Image by Author.
  • A Codespace is Stopped when the user requests it to stop, or if the Codespace is idle for a specific amount of time. Data saved on disk (such as new files in the repository) will persist when the Codespace is stopped, but anything held in RAM is lost (i.e., running jobs are terminated).
  • A Codespace is Started or Resumed when it goes from a stopped state to running.
  • Multiple Codespaces can be set up on a single branch.
  • When the machine type of a Codespace is changed (i.e., modifying CPUs or RAM), the Codespace is stopped and then resumed with the new system configuration.
  • TL;DR: A Codespace is created or deleted once. It can be started or stopped multiple times while persisting data on disk.
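To make the lifecycle concrete, here is a toy Python model of the states listed above. It is only a sketch for building intuition: the `Codespace` class, the `State` enum and the action names are my own hypothetical inventions and have nothing to do with the actual GitHub API.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    DELETED = "deleted"


# Allowed (state, action) pairs and the state each one leads to.
TRANSITIONS = {
    (State.RUNNING, "stop"): State.STOPPED,
    (State.STOPPED, "start"): State.RUNNING,
    (State.RUNNING, "delete"): State.DELETED,
    (State.STOPPED, "delete"): State.DELETED,
}


class Codespace:
    """Toy model of a Codespace (hypothetical, not the GitHub API)."""

    def __init__(self):
        # Creating a Codespace hands you a running machine.
        self.state = State.RUNNING
        self.disk = set()       # files on disk: survive stop/start
        self.processes = set()  # running jobs: live in RAM only

    def apply(self, action):
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state.value} codespace")
        self.state = TRANSITIONS[key]
        if action in ("stop", "delete"):
            self.processes.clear()  # running jobs are terminated
        if action == "delete":
            self.disk.clear()       # unpushed work on disk is lost


cs = Codespace()
cs.disk.add("notes.md")
cs.processes.add("pytest")
cs.apply("stop")   # the job is gone, notes.md is still on disk
cs.apply("start")  # back to running with the same disk
```

Note that nothing leads out of the deleted state: once a Codespace is deleted, the only way forward is to create a new one.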

Codespace Management from VSCode

While we used the browser in the previous section to create and manage Codespaces, this can also be achieved from inside the VSCode desktop application. Download and install the GitHub Codespaces extension, or search for it by the extension ID github.codespaces.

Official GitHub Codespaces extension available on VSCode. Image by Author.

This makes Codespaces available through the command palette to manage, create, stop and resume our machines.

Ctrl / Cmd + Shift + P and type `Codespaces` to see options. Image by Author.

Shortcomings of Codespaces

Codespaces are not great if you want to run large jobs on the cloud and are just looking for a VM to do so. One reason is that the maximum idle timeout is 4 hours. Another is that lower tiers of machines on Codespaces offer very little disk space (and remember that the OS also has to be loaded in, along with swap space for memory). Codespaces are primarily meant to provide as much compute and memory as is necessary to develop and push code, which can then go run large jobs on the cloud elsewhere.

GitHub Codespaces Machine Types as of Feb 2023. Image by Author.

The idle timeout can also be a reason to use Codespaces, as it can lead to significant cost savings compared with running VMs when no-one on the team is using them. Codespaces are also useful for situations when you want to spend less time on setup and infrastructure. GitHub Enterprise lets you manage Codespace creation and maintenance policies across an organization.

Project Setup Best-Practices & Utilities

In a shared development environment like GitHub Codespaces (or devcontainers), it is important to have consistent tools and dependencies to ensure that all developers are working in the same environment. There are five supporting pillars to make a great, healthy and reliable Python package project. They are Tests, Linting, Documentation, Hooks and Releases. Let’s explore each of these briefly.

Tests: Any Python project or general software project that’s worth its weight has tests that have good code coverage. This means that different functions have a wide variety of test-cases and these tests span the entire codebase such that different lines of code that are conditionals execute based on each test-case. The ideal set of tests will cover every possible route your program takes. Pytest is a battle-tested testing library, and you can use Coverage along with pytest to build a reliable testing suite.

Sample Pytest run with a code coverage report. Image by Author.
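As a small illustration of tests that exercise every branch, here is a sketch with a made-up `clamp` function (hypothetical, not from any of the projects mentioned here). It has three possible routes, so the tests cover one case per route.

```python
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value


# pytest collects functions whose names start with "test_".
def test_clamp_below_range():
    assert clamp(-5, 0, 10) == 0


def test_clamp_above_range():
    assert clamp(42, 0, 10) == 10


def test_clamp_within_range():
    assert clamp(7, 0, 10) == 7
```

Saved as a file such as test_clamp.py (the name is just an example), this can be run with `coverage run -m pytest` followed by `coverage report -m` to see which lines the tests reached. Remove any one of the three tests and the report should show the corresponding return line as missed.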

Linting & Formatting: Linting and formatting are two essential processes for maintaining high-quality Python code. Linting involves analyzing the source code for potential errors, inconsistencies, and style violations, while formatting ensures that the code is written in a consistent, readable, and maintainable style. By performing these processes, developers can detect and fix potential issues early, reduce code review time, and improve code readability and maintainability, ultimately leading to better software quality and developer productivity.

There are several tools available for linting and formatting Python code, such as PyLint, Flake8, Black, and autopep8. PyLint is a widely used static analysis tool that checks for errors, coding standards, and potential bugs. Flake8 wraps PyFlakes, pycodestyle, and McCabe into a single command for checking errors, style, and complexity, and it can be easily integrated with most text editors and IDEs. Black is an opinionated code formatter that automatically reformats Python code into a consistent, PEP 8 compliant style, while autopep8 is a tool that automatically fixes PEP 8 violations in Python code.
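To get a feel for what "analyzing the source code" means, here is a deliberately tiny lint check built on Python's standard `ast` module. It reports imports that are never used, which is one of the many checks real linters perform. The function name `find_unused_imports` is my own, and this sketch is far cruder than what PyLint or Flake8 actually do (it ignores scopes, `__all__`, star imports and more).

```python
import ast


def find_unused_imports(source):
    """Return a sorted list of imported names never referenced in source."""
    tree = ast.parse(source)
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import a.b" binds the name "a"; "import a as x" binds "x".
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                imported.add(alias.asname or alias.name)
        elif isinstance(node, ast.Name):
            # Every bare identifier in the code, e.g. the "os" in os.getcwd().
            used.add(node.id)
    return sorted(imported - used)


sample = "import os\nimport sys\nprint(os.getcwd())\n"
print(find_unused_imports(sample))
```

The key point is that the code under inspection is never executed. The linter only reads its structure, which is why such checks are fast enough to run on every save or commit.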

Documentation: Documentation is essential for creating maintainable and understandable Python code. One common method for documenting Python code is by using function docstrings, which are triple-quoted strings that provide a detailed description of the code’s purpose and how to use it. There are several styles of docstrings, including Google-style, reStructuredText, and NumPy-style, among others. Tools like MkDocs, a static site generator, can be used to create visually appealing and easy-to-navigate documentation sites, with the MkDocs Material theme being a popular choice. By following a consistent style and using docstrings to document code, developers can ensure that their code is easy to understand and maintain, both for themselves and for others.

Sample Documentation page on Mkdocs with Material theme. Image by author.
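For reference, this is roughly what a Google-style docstring looks like. The `moving_average` function is a made-up example; what matters is the layout of the `Args`, `Returns` and `Raises` sections.

```python
def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Args:
        values: Numbers to average, in order.
        window: Number of consecutive items per average. Must be at least 1.

    Returns:
        A list with one average per full window, so its length is
        ``len(values) - window + 1`` (empty if ``values`` is shorter
        than ``window``).

    Raises:
        ValueError: If ``window`` is smaller than 1.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

The docstring is available at runtime as `moving_average.__doc__` and through `help(moving_average)`, which is what documentation tooling builds on when it generates reference pages from your code.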

Hooks: Pre-commit is a Python package that automates the process of running code checks before committing changes to a version control system like Git. It supports a wide range of hooks for checking code quality, including formatters like Black, linters like Flake8, and static type checkers like Mypy. By using pre-commit hooks, developers can ensure that their code meets quality standards and is ready for review before it is merged into the codebase. Once installed into a repository, the hooks run automatically on every commit. Husky is a similar tool for Node.js projects.

Pre-commit hooks running before a commit is finalized. Image by author.

Releases & Conventional Commits: Conventional Commits is a specification that provides a structured way of committing code changes, making it easier to generate changelogs and automate the release process. A changelog is a document that records the changes made to a software project between different versions or releases. It typically includes information about new features, bug fixes, and other changes that may affect how the software behaves or how it should be used. By using Conventional Commits, developers can categorize their code changes into types such as feat for new features, fix for bug fixes, and chore for maintenance tasks, among others. This structure makes it easier to understand the scope and purpose of each commit, simplifying the process of generating release notes and creating release cycles.

Adding conventional commits through commitizen (cz). Image by author.
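Because the commit header follows a fixed shape (`type(scope)!: description`, with the scope and the `!` being optional), it is easy for tools to parse. Below is a minimal sketch of that idea. The regex and the function names `parse_header` and `suggested_bump` are my own simplification, not how commitizen or any release tool is implemented, and it only looks at the header (the spec also allows a `BREAKING CHANGE:` footer, which this ignores).

```python
import re

# type, optional (scope), optional "!" for breaking changes, then ": description"
HEADER = re.compile(
    r"^(?P<type>[a-z]+)"
    r"(?:\((?P<scope>[^()]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>.+)$"
)


def parse_header(header):
    """Split a commit header into its parts, or return None if it doesn't match."""
    match = HEADER.match(header)
    return match.groupdict() if match else None


def suggested_bump(header):
    """Map a commit header to a SemVer bump: 'major', 'minor', 'patch' or None."""
    parsed = parse_header(header)
    if parsed is None:
        return None
    if parsed["breaking"]:
        return "major"
    return {"feat": "minor", "fix": "patch"}.get(parsed["type"])


for message in ("feat(api): add pagination", "fix: handle empty input", "chore: update deps"):
    print(message, "->", suggested_bump(message))
```

The mapping of feat to a minor bump, fix to a patch bump and breaking changes to a major bump comes from the Conventional Commits specification itself. Other types such as chore carry no implied version change, so the sketch returns None for them.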

In the Python ecosystem, Google’s release-please tool is a popular choice for automating the release process using Conventional Commits. Release-please uses commit messages to generate release notes, update version numbers, and create release branches and tags automatically. By using Conventional Commits and release-please, Python developers can automate the tedious parts of the release process and focus on writing code instead. Additionally, Conventional Commits and release-please can help maintainers of Python packages ensure that their releases are consistent and well-documented, improving the overall quality of the package and its adoption by other developers.

Cookiecutter Template with Utilities

The open source Dotlas cookiecutter for Python packages comes pre-configured with all of the tools mentioned above to manage your package end-to-end right from the get-go. Give it a spin by running this from your terminal (with cookiecutter installed — pip install cookiecutter):

cookiecutter gh:dotlas/cookiecutter-pypackage

Continue Reading — Link to Part 2: Configuring a Devcontainer.

Shoutout to Kelvin for being an advocate of the best practices mentioned above!
