Coding

Python for journalists

Instructor: Winny de Jong


About this course

The Python for Journalists course is meant for journalists who want to learn the most common uses of Python for data journalism. It consists of four modules. In the first, you'll set up Python and all Python-related tools on your own computer. In the second, you'll learn how to clean up messy datasets using the Pandas library. In the third, you'll learn how to analyse data, again using Pandas. In the fourth and final module, you'll dabble in web scraping, using the Requests and Beautiful Soup libraries to download data from the web automatically.

This Python for Journalists course is meant for those who have dabbled in Python but somehow didn't persevere, and for those who can't wait to dive in head first. Though no programming knowledge is required, it helps if you know what a terminal or command prompt is and if you are familiar with Excel.

For all modules except module 1, Set Up, there is a Jupyter Notebook available to follow along during the course. Each notebook contains exercises and explanations. Alongside these notebooks, you'll find cheatsheets with the Python commands used in modules 2, 3 and 4. Happy Pythoning!

1 Set Up

This module revolves around installing the right tools on your laptop. To follow along in the coming modules, you'll need Python 3 and several Python libraries, such as Requests, Pandas and BeautifulSoup. Jupyter Notebooks come highly recommended. The easiest route is to install all of this software in one go, using the Anaconda distribution. This first module does not include a Jupyter Notebook.

On your computer:

Install the Anaconda distribution to get Python 3, the Requests, Pandas and BeautifulSoup libraries, and Jupyter Notebooks on your computer all at once. Note: choose the Anaconda installation that includes Python 3; at the time of writing that would be Python 3.6.

Extra preparation: If you want to make sure you have a solid foundation to build on, you might want to learn about Python syntax first. Here are some places where you can learn about the different data types in Python, which might help before continuing with this course. (Since the following tutorials overlap, choosing one is highly recommended.)

Online beginner tutorials at LearnPython.org
Digital book Python for you and me
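Once Anaconda is installed, one way to confirm that the set-up worked is to check from Python itself whether the libraries named above can be found. The snippet below is a minimal sketch, not part of the course material: the helper name `missing_libraries` is made up for this illustration, and it assumes BeautifulSoup is installed under its usual import name `bs4`.

```python
import importlib.util
import sys

# Import names of the libraries the course mentions.
# Assumption: BeautifulSoup is installed as the package imported with "bs4".
REQUIRED = ["requests", "pandas", "bs4"]


def missing_libraries(names):
    """Return the names that cannot be found in the current Python environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Show which Python version is running (the course expects Python 3).
print("Python version:", sys.version_info.major, sys.version_info.minor)

# List any required library that is not installed.
missing = missing_libraries(REQUIRED)
print("Missing libraries:", missing if missing else "none")
```

If the last line prints "none", the environment should be ready for the following modules.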

2 Clean data

In this second module we'll show you how to get into your Python conda environment and how to start a Jupyter Notebook. Once that's out of the way, you'll learn how to import a CSV file into your Jupyter Notebook, to get ready for some data cleaning. Among other things, you'll learn how to search and replace values inside a column, how to change the datatype of a column, and how to extract data from a column to populate a new column. This module includes both a Jupyter Notebook (empty and completed) and a cheatsheet - all named 'clean data'.
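To give a flavour of the cleaning steps listed above, here is a minimal sketch using Pandas. It is not taken from the course notebook: the column names and values are made up for illustration, and the CSV is read from an in-memory string so the example runs without a file (with a real file you would pass its path to `pd.read_csv`).

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real file on disk.
raw_csv = io.StringIO(
    "town,population,period\n"
    "Northtown,12 500,Q1 2019\n"
    "Southville,8 300,Q2 2019\n"
    "Eastport,101 250,Q4 2018\n"
)

# Import the CSV into a DataFrame.
df = pd.read_csv(raw_csv)

# Search and replace inside a column: remove the space used as a thousands separator.
df["population"] = df["population"].str.replace(" ", "", regex=False)

# Change the datatype of a column from text to whole numbers.
df["population"] = df["population"].astype(int)

# Extract data from a column to populate a new column:
# pull the four-digit year out of the "period" text.
df["year"] = df["period"].str.extract(r"(\d{4})", expand=False).astype(int)

print(df)
```

After these steps the `population` and `year` columns hold numbers rather than text, so they can be summed or sorted in the analysis module.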

3 Analyse data

In this third module, you'll learn how to analyse data using the Pandas library. You'll learn how to explore your dataset, looking at summary statistics - count, median, mean, percentiles, standard deviation etc. - for each column. Next we'll look into how to sort, filter, sum and count values in columns. Finally you'll learn how to group data, creating (for those familiar with Excel) pivot tables, using the Pandas library. This module includes both a Jupyter Notebook (empty and completed) and a cheatsheet - all named 'analyse data'.
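The analysis steps described above could look roughly like the sketch below. It is an illustration under assumptions rather than the course's own notebook: the dataset, column names and numbers are invented.

```python
import pandas as pd

# Invented example data: number of incidents per region and year.
df = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "year": [2018, 2019, 2018, 2019, 2019],
    "incidents": [10, 14, 7, 9, 3],
})

# Explore: summary statistics (count, mean, std, percentiles) for numeric columns.
print(df.describe())

# Sort: highest number of incidents first.
top = df.sort_values("incidents", ascending=False)

# Filter: keep only the rows for one region.
south = df[df["region"] == "South"]

# Sum and count values in columns.
total = df["incidents"].sum()
rows_per_region = df["region"].value_counts()

# Group: total incidents per region.
per_region = df.groupby("region")["incidents"].sum()

# Pivot table, similar to Excel: regions as rows, years as columns.
pivot = df.pivot_table(index="region", columns="year",
                       values="incidents", aggfunc="sum")
print(pivot)
```

The pivot table sums the two 2019 rows for the South region into a single cell, which is the same aggregation an Excel pivot table would perform.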

4 Scrape data

The final module revolves around scraping data using both the Requests and the BeautifulSoup libraries. Though in practice you'll likely want to scrape data first, and clean and analyse those numbers later, this module comes last for training purposes. The modules on cleaning and analysing data introduced you to Python, Pandas and Jupyter Notebooks, paving the way for some basic web scraping, including a for loop to collect data as efficiently as possible. After finishing this module, you should be able to write some basic web scrapers to collect data from the internet. This module includes both a Jupyter Notebook (empty and completed) and a cheatsheet - all named 'scrape data'.
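As a rough idea of what a basic scraper with a for loop looks like, here is a minimal sketch. It is not the course's own code: the helper `fetch_html` and the HTML table are hypothetical, and the HTML is included as a string so the example runs offline. In a real scraper you would call `fetch_html` with the address of the page you want to collect.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with Requests and return its HTML as text.

    Hypothetical helper; it is defined but not called here, to keep the example offline.
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early if the server reports an error
    return response.text


# Made-up HTML standing in for a downloaded page.
html = """
<table>
  <tr><th>Name</th><th>Score</th></tr>
  <tr><td>Alpha</td><td>12</td></tr>
  <tr><td>Beta</td><td>7</td></tr>
</table>
"""

# Parse the HTML with Python's built-in parser.
soup = BeautifulSoup(html, "html.parser")

# Loop over the table rows and collect the cell values.
rows = []
for tr in soup.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if cells:  # the header row only has <th> cells, so it gives an empty list
        rows.append({"name": cells[0], "score": int(cells[1])})

print(rows)
```

The resulting list of dictionaries can be handed to `pandas.DataFrame(rows)` for the cleaning and analysis steps from the earlier modules.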

In a nutshell

Skill level

Beginner

Duration

4 hr 24 min

Subtitles

English

Students enrolled

N/A

Course introduction


Meet your instructor: Winny de Jong


Winny de Jong

Data journalist and data journalism trainer

Biography

Winny de Jong is a data journalist. She works at the Dutch national broadcast, NOS. When not at work, she shares the things she loves in her Data Journalism Newsletter.

Similar courses

Covering issues
Doing Journalism with Data: First Steps, Skills and Tools

Paul Bradshaw, Steve Doig, Simon Rogers, Nicolas Kayser-Bril, Alberto Cairo

Data visualisation
Charting Tools for the Newsroom

Maarten Lambrechts

Resources
Managing Data Journalism Projects

Jacopo Ottaviani

Data analysis
Mistakes We Made So You Don't Have To: Data Visualisation, Journalism and the Web

Jonathon Berlin

Best practices
Bulletproof Data Journalism

Stijn Debrouwere

Data collection
Cleaning Data in Excel

Maarten Lambrechts
