Natural Language Processing in the Social Sciences
Course Description
Digital communications (including social media) are among the largest data sets of our time, and most of them consist of text. Social scientists need to be able to digest small and big data sets alike, process them, and extract psychological insights from them. This applied, project-focused course introduces students to a Python codebase developed to facilitate text analysis in the social sciences (see dlatk.wwbp.org). The goal is to practice these methods in guided tutorials and project-based work so that students can apply them in their own research contexts and are prepared to write up the results for publication. The course will provide best practices, as well as access to and familiarity with a Linux-based server environment for processing text, including the extraction of words and phrases, topics, and psychological dictionaries. We will also practice using machine learning on text data for psychological assessment, and the further statistical analysis of language variables in R. The course has no computer science prerequisites. Familiarity with Python, SSH, and basic Linux is helpful but not required; these will be introduced briefly in the course, as will SQL (databases) and Jupyter notebooks. Students are expected to understand regression, have basic familiarity with R, and be able to wrangle their data into spreadsheet form. For more information, please see psych290.stanford.edu, where you can access the Google Form to apply for the class.
Cross Listed Courses
Grading Basis: ROP - Letter or Credit/No Credit
Min: 3
Max: 3
Course Repeatable for Degree Credit? No
Course Component: Seminar
Enrollment Optional? No
Programs
SYMSYS195T is a completion requirement for:
- (from the following course set: )