Text Analysis Pedagogy Institute

Welcome to the Text Analysis Pedagogy Institute

The ability to read and understand text-based data is essential to future success in academics and employment. The Text Analysis Pedagogy (TAP) Institute helps instructors and librarians learn and teach text analysis and data skills in any discipline.

The TAP Institute offers a FREE series of events and classes for anyone interested in teaching text analysis. The courses are taught using ITHAKA’s Constellate and are designed to be progressive, so you will benefit from taking a single course or the entire series, no matter your skill level.

The courses

Taught by leading text-analysis experts, these free courses are designed as open educational resources for a collaborative community of librarians and instructors. A certificate of completion is available after each course.

Python Basics

If you’ve never programmed before, this course is a great introduction. Taught from a non-technical perspective, this course will help you start writing your first code and unlock the potential of text analysis.

Instructor: Nathan Kelber, Director of the Text Analysis Pedagogy (TAP) Institute and Education Manager at Constellate.

Note: This course is offered in identical sessions at different times to accommodate time zone variations. Please choose accordingly.

July 10 – 14
Registration closed

Pandas Basics

Pandas is a Python library designed for working with tabular data. In this three-day course, you will learn about the two fundamental objects in Pandas: the Series and the DataFrame. You will also learn how to index, select, slice, and filter data in a DataFrame, and how to handle missing values.

Instructor: Zhuo Chen, Text Analysis Instructor at Constellate. She holds a Ph.D. in Linguistics from the Graduate Center of the City University of New York.

July 17, 19, 21
Registration closed

Pandas Intermediate

Pandas is a Python library designed for working with tabular data. After completing Pandas Basics, take this course to learn how to use the summary functions and maps in Pandas, how to group and sort data, how to work with time series data, and how to create simple plots in Pandas.

Instructor: Zhuo Chen, Text Analysis Instructor at Constellate. She holds a Ph.D. in Linguistics from the Graduate Center of the City University of New York.

July 24, 26, 28
Registration closed

spaCy 1

spaCy is one of the more widely used libraries for natural language processing (NLP) in Python. It lets you use pre-designed pipelines for processing texts or create custom pipelines tailored to your own use case. This course introduces the basics of spaCy. It is designed to get students up and running with the library’s core features so that they can continue with either spaCy 2 or spaCy 3 at the TAP Institute. By the end of this course, students will have a basic understanding of the spaCy library and how to use it on their own texts.

Instructor: William Mattingly, a postdoc at the Smithsonian Institution’s Data Science Lab, where he applies machine learning and natural language processing to archival records at the Smithsonian and United States Holocaust Memorial Museum.

July 17, 19, 21
Registration closed

spaCy 2

In this course, we will learn about spaCy’s EntityRuler and SpanRuler for traditional, rule-based text analysis. We will also learn about building custom spaCy components. By the end of this course, students will have the tools necessary to begin building their own pipelines in spaCy.

Instructor: William Mattingly, a postdoc at the Smithsonian Institution’s Data Science Lab, where he applies machine learning and natural language processing to archival records at the Smithsonian and United States Holocaust Memorial Museum.

July 24, 26, 28
Registration closed

spaCy 3

In this course, we will learn about machine learning conceptually and how to train custom models in a spaCy pipeline. We will specifically learn about training a custom named entity recognition model and a textcat model for performing text classification. By the end of this course, students will have a basic understanding of machine learning, best practices for annotating data, and how to train a machine learning model with spaCy.

Instructor: William Mattingly, a postdoc at the Smithsonian Institution’s Data Science Lab, where he applies machine learning and natural language processing to archival records at the Smithsonian and United States Holocaust Memorial Museum.

July 31, August 2, 4
Registration closed

Finding word meaning through context

J.R. Firth once wrote, “You shall know a word by the company it keeps.” In this workshop, we will learn how to find the company that words keep. We will introduce basic methods for working with text files in Python, as well as some common statistical measures for determining which words are characteristic of which texts. Then we will learn how to find “collocates”, the words that appear near a given term in a text.
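
To give a sense of what finding collocates can look like, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the function name `collocates`, the `window` parameter, and the sample sentences are all hypothetical, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter


def collocates(text, term, window=2):
    """Count words found within `window` tokens of each occurrence of `term`.

    Hypothetical helper for illustration; not part of any course material.
    """
    # Crude tokenizer: lowercase the text and keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != term:
            continue
        # Take up to `window` tokens on each side, leaving out the term itself.
        # Note: this simple window ignores sentence boundaries.
        start = max(0, i - window)
        neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
        counts.update(neighbors)
    return counts


# Made-up sample text, used only to exercise the function.
sample = (
    "The old bank by the river flooded. "
    "She walked along the river bank at dawn. "
    "The bank approved the loan."
)
result = collocates(sample, "bank", window=2)
print(result.most_common(3))
```

Raw counts like these tend to be dominated by very common words such as "the", which is one reason statistical measures of characteristic words are usually applied on top of them.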

Instructor: J.D. Porter, Digital Humanities Specialist at the Price Lab for Digital Humanities at the University of Pennsylvania.

July 17, 19, 21
Registration closed

Web-scraping toolkit

The data we need is often stored on web pages or behind APIs, and getting access to it involves tools beyond core Python. This course provides an overview of common strategies, tools, and skills needed for a variety of web scraping projects. Whether you are using non-coding tools like Google Sheets or more advanced Python modules, this course will give you the skills to start a web scraping project.

Instructor: Elizabeth Wickes, lecturer at the School of Information Sciences at the University of Illinois, where she teaches programming, data curation, and information technology courses.

July 24, 26, 28
Registration closed

Tools for taming text: RegEx and XPath

XPath and Regular Expressions are two powerful techniques that can aid data extraction from text, websites, XML documents, and more. Regular Expressions are ideal for matching patterns within free text, while XPath is an expression language for extracting content from HTML and XML documents. This course introduces learners to the appropriate context for using each tool, foundational-to-intermediate syntax for each, and how to use them together within Python. Learners will also receive take-home challenges to practice their skills after the course.
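
As a rough illustration of how the two tools can complement each other, the sketch below selects elements with an XPath expression and then pulls dates out of their free text with a regular expression. It is an assumption-laden example, not course material: the XML snippet and its element names are invented, and it uses the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath (a third-party library such as lxml would be needed for full XPath).

```python
import re
import xml.etree.ElementTree as ET

# Invented sample document; the element and attribute names are hypothetical.
xml_doc = """<records>
  <record type="letter"><note>Received 1921-03-04, answered 1921-03-19.</note></record>
  <record type="diary"><note>Entry dated 1922-11-30.</note></record>
  <record type="letter"><note>No date given.</note></record>
</records>"""

root = ET.fromstring(xml_doc)

# XPath step: navigate the document structure and keep only the <note>
# elements that sit inside records whose type attribute is "letter".
notes = root.findall("./record[@type='letter']/note")

# Regex step: match a pattern inside the free text of each selected note.
# This pattern looks for ISO-style dates (YYYY-MM-DD).
date_pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
dates = [d for note in notes for d in date_pattern.findall(note.text or "")]

print(dates)
```

The division of labor is the point of the sketch: XPath handles the structured part of the document, and the regular expression handles the unstructured text inside it.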

Instructor: Elizabeth Wickes, lecturer at the School of Information Sciences at the University of Illinois, where she teaches programming, data curation, and information technology courses.

July 31, August 2, 4
Registration closed

Teaching data literacy

An introduction to teaching data literacy for librarians, faculty, and staff. By the end of this course, you will design a ready-to-offer webinar or class for your institution based on a course of your choice from the TAP Institute.

Instructor: Nathan Kelber, Director of the Text Analysis Pedagogy (TAP) Institute and Education Manager at Constellate.

Note: This course is offered in identical sessions at different times to accommodate time zone variations. Please choose accordingly.

August 7, 9, 11
Registration closed

Raves from past TAP Institute attendees

“I enjoyed the hands-on approach and technical skills workshops offered… It was incredibly rewarding to begin my education.”
“Having access to lessons and materials was very useful! It makes things less intimidating when you have model codes to work with.”
“I never would have thought I would learn so much in so little time!!”

About the Text Analysis Pedagogy Institute

The Text Analysis Pedagogy (TAP) Institute is hosted by ITHAKA and supported by free access to Constellate. Constellate is a text analysis platform that integrates scholarly content and open educational resources into a cloud-based lab, helping instructors learn and teach text analysis and data skills more easily and effectively.

Constellate is part of ITHAKA’s portfolio of non-profit services, along with trusted resources like JSTOR and Portico. ITHAKA’s services are all aligned around a shared mission to improve access to knowledge for people around the world as affordably and sustainably as possible.