Beginner's guide to data manipulation for routinely collected data using Python

About this resource

These introductory coding resources are provided as Jupyter notebooks with a set of associated example data files. Jupyter is free software, available here: Project Jupyter | Home, or through Anaconda Navigator: Anaconda Navigator | Anaconda.

A useful introduction to using Jupyter notebooks can be found here: How to Use Jupyter Notebook: A Beginner’s Tutorial – Dataquest.

The notebooks include annotation, cells containing Python code, and example output. They can be used simply as reference documents, or the code can be edited and run using the example data. Some of the example data provided (Encounters2.csv and medications2.csv) are edited versions of the Synthea synthetic patient data, which are available here: Home | Synthea. The notebooks are intended as a very basic guide to getting started, and include links to further resources.

To run the code in the Jupyter notebooks you will need to download the example data files, and then edit the file paths given in the notebooks so that they point to wherever you have saved the data.
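
As an illustration of the file path step, here is a minimal sketch. It assumes the pandas library is installed and that the files were saved to a folder called example_data in your home directory. Both the folder name and the helper function load_example_file are hypothetical, and the notebooks themselves may load the data differently.

```python
from pathlib import Path

import pandas as pd

# Hypothetical folder: change this to wherever you saved the example data.
DATA_DIR = Path.home() / "example_data"


def load_example_file(file_name, data_dir=DATA_DIR):
    """Read one example CSV file into a DataFrame.

    Raises a clear error if the file is not where the path says it is,
    which is the most common problem when running the notebooks.
    """
    file_path = Path(data_dir) / file_name
    if not file_path.exists():
        raise FileNotFoundError(
            f"Could not find {file_path}; check the folder path."
        )
    return pd.read_csv(file_path)
```

With the folder set correctly, a call such as load_example_file("Encounters2.csv") should return the data as a DataFrame. Building the path with pathlib rather than typing it as one string avoids most problems with backslashes on Windows.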

The notebooks are available in two formats: HTML (useful if you only want to view the code) and the Jupyter .IPYNB file format (needed to run the code). The notebooks give an introduction to:

Merging data files
Subsetting and selecting data
Working with dates
Sorting, aggregating and reshaping data
Recoding data
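
To give a flavour of these topics, the sketch below runs each operation on two tiny made-up tables using pandas. It is an assumption that the notebooks use pandas. The column names (patient_id, start_date, encounter_class, cost, sex) are illustrative only and are not taken from the example files.

```python
import pandas as pd

# Tiny made-up tables; column names are illustrative, not those of the example files.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "sex": ["F", "M", "F"],
})
encounters = pd.DataFrame({
    "patient_id": [1, 1, 2, 3],
    "start_date": ["2020-01-05", "2020-03-10", "2020-02-20", "2021-07-01"],
    "encounter_class": ["inpatient", "outpatient", "outpatient", "emergency"],
    "cost": [1200.0, 80.0, 95.0, 400.0],
})

# Merging data files: attach patient details to each encounter.
merged = encounters.merge(patients, on="patient_id", how="left")

# Working with dates: convert text to datetime, then extract the year.
merged["start_date"] = pd.to_datetime(merged["start_date"])
merged["year"] = merged["start_date"].dt.year

# Subsetting and selecting data: keep 2020 encounters and a few columns.
in_2020 = merged.loc[
    merged["year"] == 2020,
    ["patient_id", "start_date", "encounter_class", "cost"],
]

# Sorting and aggregating: order by date, then total the cost per patient.
in_2020 = in_2020.sort_values("start_date")
cost_per_patient = in_2020.groupby("patient_id", as_index=False)["cost"].sum()

# Reshaping: one row per patient, one column per encounter class (counts).
class_counts = merged.pivot_table(
    index="patient_id",
    columns="encounter_class",
    values="cost",
    aggfunc="count",
    fill_value=0,
)

# Recoding data: map encounter classes to a broader grouping.
merged["setting"] = merged["encounter_class"].map({
    "inpatient": "hospital",
    "emergency": "hospital",
    "outpatient": "community",
})
```

Each step leaves the original tables unchanged and stores its result under a new name, so the intermediate results can be inspected one at a time in a notebook cell.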

The resources can be found below:

Example data files

Jupyter notebooks