Natural Language Processing in the Social Sciences

Course Description

Digital communications (including social media) are the largest data sets of our time, and most of them are text. Social scientists need to be able to digest small and large data sets alike, process them, and extract psychological insight. This applied, project-focused course introduces students to a Python codebase developed to facilitate text analysis in the social sciences (see dlatk.wwbp.org). The goal is to practice these methods in guided tutorials and project-based work so that students can apply them to their own research contexts and be prepared to write up the results for publication. The course will provide best practices, as well as access to and familiarity with a Linux-based server environment for processing text, including the extraction of words and phrases, topics, and psychological dictionaries. We will also practice using machine learning on text data for psychological assessment, and the further statistical analysis of language variables in R. The course has no computer science prerequisites. Familiarity with Python, SSH, and basic Linux is helpful but not required; they will be minimally introduced in the course, as will SQL (databases) and Jupyter notebooks. An understanding of regression, basic familiarity with R, and the ability to wrangle your data into spreadsheet form are expected. For more information, please see psych290.stanford.edu, where you can access the Google form to apply for the class.

Grading Basis

ROP - Letter or Credit/No Credit

Min

3

Max

3

Course Repeatable for Degree Credit?

No

Course Component

Seminar

Enrollment Optional?

No

Programs

SYMSYS195T is a completion requirement for:
  • (from the following course set: )