Create a Shared Python Dev Environment for your Team — Part 2: Configuring a Devcontainer

Eshwaran Venkat
12 min read · Feb 25, 2023


Photo by Sigmund on Unsplash

This is Part 2 of a two-part series on creating a shared Python development environment for a team. Across both articles, we introduce and use devcontainers to create a shared development environment, covering configuration and basic setup syntax. We also cover GitHub Codespaces as a viable option for a shared workspace in the cloud using devcontainers, and highlight pros and cons using Python packages as examples. In Part 1, we introduced GitHub Codespaces along with some utilities and best practices to manage shared Python package projects. If you’re familiar with GitHub Codespaces as a concept, then I’d recommend skipping Part 1.

Link to Part-1: Introduction to GitHub Codespaces & Project Utilities

Table of Contents for Part 2

1. Setting up Codespaces for Teams
2. Introduction to Devcontainers
3. Creating a Setup Script
4. Creating a Codespace Prebuild
5. Jupyter Notebook on a Codespace

Setting up Codespaces for Teams

The general template for Codespaces involves getting a machine set up to develop on a single repository. This may still not be of much use to a team once variables like conda environments, package management and git hooks come into play. You still need to go into a new Codespace and spend some time performing an initial setup before you can use the repository.

Also, by default GitHub blocks private repositories from being cloned in a Codespace. If you would like to switch between several of your team’s private repositories in a single Codespace, you need to set up additional configuration. Otherwise, you will be working on one Codespace per repository. While there are use-cases to work on different projects independently, we’ll consider a different scenario as given below.

Let’s create a shared Python development environment in GitHub Codespaces based on the following scenario / requirements.

  • Access and work with 2 (inter-dependent) repositories in a single Codespace. Both repositories are internal (or private) organization repositories and are Python packages. This means that a direct pip install <package_name>, for instance, won’t work, as the public internet (and PyPI) does not know that these packages exist outside your organization. Let the first package be a dependency of the second package repository. This means that the second package won’t work in an environment where the first is not installed.

For example, let the first package be apple and the second package be apple-juice. apple can have public dependencies like pandas, Django, etc. apple-juice can also have public dependencies like fastapi, but it additionally depends on apple. You can import apple and import apple_juice (hyphens are not valid in Python module names, so the import name uses an underscore) inside some Python application. apple-juice will import from apple implicitly.

  • A single conda (or equivalently a mamba ) environment is available on starting the Codespace where both the package repositories are installed in an editable form. This means that live changes to the package code will affect its installed functionality, thereby having no separation between development and usage.

For example, say apple has a function apple.squeeze() which is used by apple-juice. Then editing this function inside the apple codebase will affect what apple-juice sees of that function live.

Illustration of Conda Environments for managing dependencies. Image by NBIS (National Bioinformatics Infrastructure Sweden)

  • Utilities such as awscli, npm, pnpm, black, flake8, isort, etc. are also installed and available for usage. awscli is used to interface with Amazon Web Services from the terminal. npm and pnpm are package managers for JavaScript bundles (which can sometimes be useful even inside Python projects, as we’ll see). The remaining utilities assist with linting and formatting in Python.
  • VSCode IDE Settings and Recommended Extensions come pre-installed. This ensures that productivity hacks are shared between colleagues.

For example, if your organization supports GitHub Copilot subscriptions, then having the Codespace initialize with the Copilot extension is a cherry on top.

I’ve used Copilot for over a year now, and can’t recommend it enough. It allows me to focus on the meat of the problem when programming and avoid writing boilerplate. Fun fact: Copilot can also write documentation when prompted.

Introduction to Devcontainers

A devcontainer (development container) is a development environment that is defined and packaged as a Docker container, which can be used by developers to ensure that everyone working on a project is using the same development environment. Devcontainers can be used with various programming languages, including Python, and can include tools like code editors, compilers, linters, and other development tools. By defining a development environment as a devcontainer, developers can ensure that everyone working on a project has access to the same dependencies and tools, making it easier to collaborate and debug code issues.

Devcontainers can be defined using a configuration file, which specifies the tools and dependencies needed for a particular development environment. Visual Studio Code, a popular code editor, supports devcontainers through its Dev Containers extension, allowing developers to easily create and use a devcontainer for their projects. In the context of Python development, a devcontainer can include tools like a specific version of Python, the pip package manager, and tools like Black and Flake8 for code formatting and style checks. Overall, devcontainers are a powerful tool for ensuring consistency in development environments and improving collaboration among developers.

Note that devcontainers can be used without GitHub Codespaces. It is an independent project. You can create shared Python development environments by distributing a devcontainer file to your team, and they can work from their local systems. GitHub Codespaces only provides a managed infrastructure service for running devcontainers.

To launch a development environment using a devcontainer locally, you will need to have Docker installed on your local machine. Here are the steps to launch a Python development environment using a devcontainer in Visual Studio Code:

  1. Open Visual Studio Code and navigate to the root directory of your Python project.
  2. Create a .devcontainer directory at the root of your project, if it doesn't already exist.
  3. Create a devcontainer.json file inside the .devcontainer directory, and specify the Docker image and any additional tools and dependencies that you want to include in the devcontainer.
  4. Open the Command Palette in Visual Studio Code (using Ctrl+Shift+P on Windows/Linux or Cmd+Shift+P on macOS) and select "Dev Containers: Reopen in Container" (listed as "Remote-Containers: Reopen in Container" in older versions of the extension).
  5. Visual Studio Code will build the Docker container based on the devcontainer.json file and reopen the window inside the container. You can now use this window to develop your Python code using the specified tools and dependencies.
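
Steps 2 and 3 above can also be scripted. The following is a minimal sketch, assuming a hypothetical helper named scaffold_devcontainer and an illustrative "name" value; the image is the one used later in this article, and your project may need a different one.

```shell
#!/bin/bash
# Sketch: create .devcontainer/devcontainer.json in a project directory.
# scaffold_devcontainer and the "python-dev" name are illustrative assumptions.
scaffold_devcontainer() {
    local project_dir="${1:-.}"
    local config="$project_dir/.devcontainer/devcontainer.json"

    mkdir -p "$project_dir/.devcontainer"

    # Write the file only if it does not exist, so an existing
    # configuration is never overwritten.
    if [ ! -f "$config" ]; then
        cat >"$config" <<'EOF'
{
    "name": "python-dev",
    "image": "mcr.microsoft.com/devcontainers/universal:2"
}
EOF
    fi

    echo "devcontainer config at: $config"
}

# Example call (run from anywhere):
#   scaffold_devcontainer /path/to/your/project
```

After running it against your project root, continue with step 4 to reopen the folder inside the container.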

Note that the specific steps for creating and launching a devcontainer may vary depending on your project’s specific requirements and the tools and dependencies you want to include in the devcontainer. However, using devcontainers can be an efficient way to set up a consistent development environment for your Python project and ensure that all developers are using the same tools and dependencies.

GitHub Codespaces lets a repository hold more than one devcontainer.json file, each describing a different devcontainer configuration. One of these can be chosen when creating a Codespace.

In the example of apple and apple-juice, we can either add the same devcontainer at the repository root of both codebases, or we can create a third repository called orchard, into which the Codespace will clone and install both apple and apple-juice.

Here’s the format of a devcontainer for a Codespace that enables the apple and apple-juice repositories. Let’s assume these are part of the dotlas organization. For example: github.com/dotlas/apple
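
To see which configurations a repository offers, you can list the files in the locations where devcontainer tooling looks for them. This is a rough sketch, assuming the conventional locations (.devcontainer.json, .devcontainer/devcontainer.json, and one sub-folder level under .devcontainer); list_devcontainer_configs is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print every devcontainer.json found in the conventional locations.
# list_devcontainer_configs is an illustrative name, not an official tool.
list_devcontainer_configs() {
    local root="${1:-.}"
    local f
    for f in "$root/.devcontainer.json" \
             "$root/.devcontainer/devcontainer.json" \
             "$root"/.devcontainer/*/devcontainer.json; do
        # An unmatched glob stays literal, so keep only real files.
        [ -f "$f" ] && echo "${f#"$root"/}"
    done
    return 0
}

# Example call from the repository root:
#   list_devcontainer_configs .
```

Each path printed corresponds to one configuration that can be selected when the Codespace is created.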

{
    "image": "mcr.microsoft.com/devcontainers/universal:2",
    "customizations": {
        "codespaces": {
            "repositories": {
                "dotlas/apple": {
                    "permissions": "write-all"
                },
                "dotlas/apple-juice": {
                    "permissions": "write-all"
                }
            }
        }
    }
}

Note that the Codespace will gain write permissions to the repositories in question. These permissions can be changed based on the level of authorization given to various members of a team. To view more information about permissions, view the GitHub docs on managing repo access to codespaces.

The image key is used to pull and build a basic container. In this case we are using Microsoft’s universal image for Codespaces (which comes with tools like conda, git, nvm, etc. pre-installed). There is a marketplace of images from various open source organizations and others which can also be used.

If a team member who will use the Codespace does not have permissions to a specific repository mentioned in devcontainer.json, then they will not be able to clone that repository within the Codespace, despite the write permission at the container level.

Let’s say you want some VSCode extensions pre-configured when the Codespace starts up. You can specify these under customizations -> vscode, where customizations is a top-level JSON key as seen above.

{
    "image": "mcr.microsoft.com/devcontainers/universal:2",
    "customizations": {
        "vscode": {
            "extensions": [
                // Basic Extensions
                "github.codespaces",
                "github.copilot",

                // Python Extensions
                "ms-python.python", // Python Language Support
                "ms-python.vscode-pylance", // Python Language Support
                "njpwerner.autodocstring", // Function Docstring Templater
                "bungcip.better-toml", // TOML Syntax Highlighting

                // Shell Extensions
                "bmalehorn.shell-syntax", // Shell Syntax Highlighting

                // Helpers
                "grapecity.gc-excelviewer", // Previewing CSVs
                "randomfractalsinc.geo-data-viewer", // View Maps in VSCode
                "yzhang.markdown-all-in-one", // Markdown helpers
                "gruntfuggly.todo-tree", // TODO Tree - for viewing code tasks
                "streetsidesoftware.code-spell-checker", // Natural Language Spell Checker
                "amazonwebservices.aws-toolkit-vscode", // Viewing AWS Resources
                "mark-tucker.aws-cli-configure" // Configuring AWS CLI made easy
            ]
        }
    }
}

Adding this devcontainer allows us to create a Codespace where we can clone the repositories we require and also have the extensions working in the VSCode session we’re using to remotely access this Codespace. While this doesn’t yet meet the goals of the scenario we described earlier, we are certainly getting close.

Creating a Setup Script

While a new Codespace does come with some tools installed, and has the ability to clone the repositories we need, we can still automate the setup process. For example, the ideal scenario is to enter the Codespace and just get to work! No conda initialization, no installing packages or initializing hooks, etc. With the devcontainer setup so far, we would still need to enter the Codespace and run some commands to make our workspace feel like home.

A devcontainer can be configured to run a script (bash script, Python script, etc.) at various stages of creating a Codespace. View the Lifecycle Docs for more information on the various creation stages. Note that a creation-stage script runs only when the Codespace is created, not every time it is resumed. In our scenario, we’ll use the onCreateCommand key in our devcontainer. This signals to the Codespace that once it receives the image and builds the barebones of the container, we want the script to run immediately after that. A postCreateCommand, by contrast, runs the script after the Codespace is created and the user session has begun within the remote VSCode IDE. We shall use a bash script to set up the remaining parts of our shared development environment. We will store it in the scripts directory at the project repository’s root. For example: github.com/dotlas/orchard/main/scripts/setup.sh

{
    "image": "mcr.microsoft.com/devcontainers/universal:2",
    // Setup - Runs on creation and installs all dependencies
    "onCreateCommand": "/workspaces/orchard/scripts/setup.sh"
}
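
Because onCreateCommand runs the script unattended, it can help to make the script stop at the first failing step and leave a log behind. A minimal sketch of such a skeleton follows; the run_step helper and the LOG_FILE path are hypothetical names and are not part of the setup.sh described in this article.

```shell
#!/bin/bash
# Sketch of a fail-fast skeleton for an unattended setup script.
# run_step and LOG_FILE are illustrative assumptions.
set -eu

LOG_FILE="${LOG_FILE:-/tmp/setup.log}"

# Run one named step and append its output to the log.
# Returns the step's exit status so the caller (or set -e) can react.
run_step() {
    local name="$1"
    shift
    echo "[setup] $name" | tee -a "$LOG_FILE"
    if "$@" >>"$LOG_FILE" 2>&1; then
        echo "[setup] $name: ok" | tee -a "$LOG_FILE"
    else
        local status=$?
        echo "[setup] $name: failed ($status), see $LOG_FILE" | tee -a "$LOG_FILE"
        return "$status"
    fi
}

# Placeholder step; a real script would call its own commands here.
run_step "print working directory" pwd
```

With set -e active, a step that fails outside of an if or || construct ends the script, and the log shows which step it was.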

The setup.sh bash script initializes various aspects of our development environment. In our example scenario, it creates a conda environment called cider which will hold our packages.

#!/bin/bash

# declare globals

# directory where all the repos will be downloaded
MAIN_DIR="/workspaces"

# directory where the setup scripts will be stored
SCRIPT_DIR="$MAIN_DIR/orchard/scripts"

# list of repos to download and optionally install
REPOS=("apple" "apple-juice")

# name of the conda environment
ENV_NAME="cider"

# python version
PYTHON_VERSION="3.10"

Note how apple precedes apple-juice since apple is its dependency and needs to be downloaded and installed first. We shall check if the environment exists, and create it if it does not.

# Check if the environment already exists (match the exact name in the first column)
if conda env list | grep -qE "^${ENV_NAME}[[:space:]]"; then
    echo "$ENV_NAME environment already exists"
else
    echo "Creating $ENV_NAME environment"
    conda create --name "$ENV_NAME" python="$PYTHON_VERSION" -y
fi

# Activate the environment through the conda shell
source /opt/conda/etc/profile.d/conda.sh
conda activate "$ENV_NAME"

# Update the pip version for installing editable packages
python -m pip install --upgrade pip

A few points of note:

  • The Codespace can run the command conda env list because conda comes pre-installed with the Microsoft Universal image for Codespaces.
  • We source conda.sh because conda activate only works in a bash / zsh shell after conda has been initialized, which normally requires an initialization step followed by a restart of the terminal. Sourcing conda.sh lets the script activate the environment right away. The “new launch” of a terminal then happens when the user enters the Codespace for the first time, with the conda environment ready to go.
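
One detail of the environment check is worth a closer look: a plain substring match on the output of conda env list would also accept an environment named cider-old. The sketch below compares the first column exactly. It assumes a hypothetical env_in_list helper and made-up sample output, so it can run without conda installed.

```shell
#!/bin/bash
# Sketch: succeed only if an environment with exactly this name appears
# in `conda env list`-style output read from stdin.
# env_in_list and the sample text are illustrative assumptions.
env_in_list() {
    local name="$1"
    # Skip comment lines and compare the first column exactly.
    awk -v name="$name" '$0 !~ /^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Sample text shaped like `conda env list` output; the paths are made up.
SAMPLE="# conda environments:
#
base                  *  /opt/conda
cider-old                /opt/conda/envs/cider-old"

if printf '%s\n' "$SAMPLE" | env_in_list "cider"; then
    echo "cider exists"
else
    # A plain `grep cider` would have matched cider-old here.
    echo "cider missing"
fi
```

In the setup script itself, the same helper could be fed from the real command, as in conda env list | env_in_list "$ENV_NAME".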

Next, we clone the repositories into the workspace directory.

# Download (clone) the repos

### REPOS=("apple" "apple-juice") declared previously
### MAIN_DIR="/workspaces" declared previously

cd "$MAIN_DIR"

for repo in "${REPOS[@]}"; do
    # Test for the directory itself; `ls | grep apple` would also match apple-juice
    if [ -d "$repo" ]; then
        echo "$repo already exists - download skipped"
    else
        echo "Downloading $repo"
        git clone "https://github.com/dotlas/$repo"
    fi
done

Finally, we can install the repositories in an editable fashion. We first check if the repos are already installed, and only install them if they are not found in the current conda environment.

# Check if the repos are installed, and install them if not

### REPOS=("apple" "apple-juice") declared previously
### MAIN_DIR="/workspaces" declared previously

for repo in "${REPOS[@]}"; do
    # pip show exits with a non-zero status when the package is not installed
    if python -m pip show "$repo" >/dev/null 2>&1; then
        echo "$repo already installed"
    else
        echo "$repo not installed"

        # Reset pip cache to avoid errors (skipped if there is no cache yet)
        if [ -d ~/.cache/pip ]; then
            rm -rf ~/.cache/pip.bk
            mv ~/.cache/pip ~/.cache/pip.bk
        fi

        # Install the repo in editable mode
        echo "Installing $repo"
        python -m pip install -e "$MAIN_DIR/$repo"
    fi
done

Now we can install any additional dependencies as well:

# initialize python linters and formatters
python -m pip install black isort ruff pre-commit bandit

# install pnpm package manager for JS bundles
npm i -g pnpm

# Install AWS CLI
python -m pip install awscli boto3

Finally, remember to initialize the conda environment in the shell so that the user can view and use the conda environments.

# initialize conda environment to bash, so that new terminal has `base` environment loaded
conda init bash >/dev/null

All of the above snippets constitute setup.sh, broken down by functionality. Feel free to spruce up setup.sh with additional tools, technologies or configurations. You can also make secrets available as environment variables when your Codespace is created. This is done by storing a Codespaces secret at the repository level or organization level in GitHub, through the GitHub UI or API.

For example, if you save ORCHARD_AWS_KEY in GitHub as a secret, then it can be accessed within your Codespace through the environment variable $ORCHARD_AWS_KEY. Secrets like this can be used to make additional configurations.
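
A setup script that depends on such a secret can check for it before using it. Here is a small sketch, assuming a hypothetical require_env helper; ORCHARD_AWS_KEY is the example secret name from above.

```shell
#!/bin/bash
# Sketch: report which expected environment variables are unset or empty.
# require_env is an illustrative helper name.
require_env() {
    local name
    local missing=0
    for name in "$@"; do
        # printenv prints the value of an exported variable, or nothing if unset.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

if require_env ORCHARD_AWS_KEY; then
    echo "secrets present, continuing with AWS configuration"
else
    echo "skipping AWS configuration"
fi
```

Whether a missing secret should skip a step or abort the whole setup is a design choice; the sketch skips so the rest of the environment is still usable.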

Once your setup script and devcontainer are ready, you can create a Codespace and wait a few minutes while GitHub gets it ready for you. Anyone from your team can now open this repository (orchard) and use it to work on apple and apple-juice. The setup script runs when they create the Codespace for the first time; after that, syncing with the repositories is done through git pull from within the Codespace.

Debugging devcontainers can be a bit slow when done through Codespaces. It might make sense to debug locally and, once you have a working container, add it to the repository to create a Codespace from. Remember to enter the VSCode session once changes are made and try out a few commands from the terminal to ensure that everything is working as intended.
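
The "try out a few commands" step can be turned into a quick smoke test. The following is a rough sketch with a hypothetical check_tools helper; the tool list mirrors the scenario in this article and should be adjusted to whatever your own setup installs.

```shell
#!/bin/bash
# Sketch: report which expected command-line tools are missing from PATH.
# check_tools is an illustrative helper name.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools taken from the scenario described earlier (awscli provides `aws`).
check_tools conda git aws npm pnpm black isort || echo "some tools are missing"
```

Running it in a fresh terminal inside the container gives a quick view of what the setup script did and did not install.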

Creating a Codespace Prebuild

A prebuild lets you create a Codespace from a devcontainer (which in turn may set up shop using a setup script as described in the previous section) and store it so that new Codespace “creations” do not take as much time. It’s a way to cache your built development environment so that it can spawn quickly. A prebuild is created through a GitHub Actions workflow (CI) and can be configured to run manually, on a schedule, on every push, or on a devcontainer configuration change.

Option to set up prebuilds from the GitHub UI. Image by Author.

Jupyter Notebook on a Codespace

Despite creating a Jupyter Notebook on Localhost, Codespaces points you to the machine’s web UI. Image by Author.

One way to run Jupyter notebooks on a Codespace is by installing Jupyter in the conda environment created earlier and running jupyter notebook or jupyter lab from there. A new browser window / tab will take you to your Jupyter session. You can also optionally start a Codespace purely on Jupyter, similar to the way JupyterHub works. This means you don’t SSH into the machine but access it only through the Jupyter UI; terminal commands are first received over HTTP and passed to the underlying OS through the Jupyter system.

A Jupyter Notebook template option provided by GitHub to use as a Codespace. Image by Author.

It is, however, recommended to use JupyterHub if you’re running large ML or data engineering jobs. JupyterHub allows you to share Jupyter Notebook instances across your team by orchestrating variable-size Docker containers, or by simply deploying on VMs. The setup time required to get JupyterHub up and running is significantly greater than using GitHub Codespaces. That is of course, more of an apples to oranges comparison. Ba-dum-tss! (cause Jupyter is orange, and we used the apple repository for Codespaces, get it? :P)

Nevertheless, you could also run Jupyter notebooks locally when devcontainers are used to build containers on local systems instead of on the cloud with Codespaces.

Thanks for reading!

P.S. Don’t go looking for dotlas/apple and dotlas/apple-juice, since they regrettably do not exist.
