Machine Learning Methods for Computational Social Science

Aug 20, 2021, 10:00 am
Zoom Meeting:
Event Description

Contributing to the rising popularity of computational social science, this dissertation presents new methods grounded in machine learning for solving several important problems in political science.

In Chapter 2, adapted from coauthored work in Fifield et al. (2020a), we present a new algorithm for sampling redistricting plans from arbitrary distributions. We formulate redistricting as a graph-cut problem and adapt an image segmentation algorithm from the computer vision literature to construct a Metropolis-Hastings-style algorithm for sampling graph partitions. We then validate our algorithm using a small-scale map for which all possible redistricting plans can be enumerated, finding that our method samples from the true distribution. Lastly, we apply our algorithm to a more realistic redistricting problem using data from New Hampshire.
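To make the validation idea concrete, here is a minimal sketch, not the algorithm from Chapter 2. It runs a plain Metropolis-Hastings sampler with a single-node "flip" proposal (a simpler stand-in for the image-segmentation-based proposal) on a toy 2x3 grid split into two contiguous districts. The grid, the `log_target` size penalty, and all function names are assumptions made for illustration. Because the map is tiny, every valid plan can be enumerated and the exact target distribution computed for comparison.

```python
import itertools
import math
import random

# Toy 2x3 grid "map" (hypothetical): nodes 0..5, edges join adjacent cells.
ROWS, COLS = 2, 3
NODES = list(range(ROWS * COLS))
NEIGHBORS = {v: set() for v in NODES}
for r in range(ROWS):
    for c in range(COLS):
        v = r * COLS + c
        if c + 1 < COLS:
            NEIGHBORS[v].add(v + 1)
            NEIGHBORS[v + 1].add(v)
        if r + 1 < ROWS:
            NEIGHBORS[v].add(v + COLS)
            NEIGHBORS[v + COLS].add(v)


def is_connected(members):
    """True if the node set is non-empty and induces a connected subgraph."""
    members = set(members)
    if not members:
        return False
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in NEIGHBORS[v] & members:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == members


def is_valid(plan):
    """A plan is a tuple of 0/1 labels; both districts must be contiguous."""
    return all(is_connected(v for v in NODES if plan[v] == d) for d in (0, 1))


def log_target(plan, beta=1.0):
    """Unnormalized log-density penalizing unequal district sizes (illustrative)."""
    return -beta * abs(plan.count(0) - len(NODES) / 2)


def enumerate_plans():
    """Brute-force list of every valid plan; feasible only for tiny maps."""
    return [p for p in itertools.product((0, 1), repeat=len(NODES)) if is_valid(p)]


def exact_probs(beta=1.0):
    """Exact target probabilities obtained by normalizing over all valid plans."""
    plans = enumerate_plans()
    weights = [math.exp(log_target(p, beta)) for p in plans]
    total = sum(weights)
    return {p: w / total for p, w in zip(plans, weights)}


def mh_sample(n_steps, beta=1.0, seed=0):
    """Metropolis-Hastings with a symmetric single-node flip proposal."""
    rng = random.Random(seed)
    plan = tuple(0 if v % COLS == 0 else 1 for v in NODES)  # left column vs. rest
    samples = []
    for _ in range(n_steps):
        v = rng.choice(NODES)
        proposal = plan[:v] + (1 - plan[v],) + plan[v + 1:]
        if is_valid(proposal):  # invalid proposals are rejected outright
            log_ratio = log_target(proposal, beta) - log_target(plan, beta)
            if rng.random() < math.exp(min(0.0, log_ratio)):
                plan = proposal
        samples.append(plan)
    return samples
```

Running `mh_sample` for many steps and tabulating how often each plan appears gives empirical frequencies that can be set against `exact_probs()`. This is the same kind of check described above, on a map small enough to enumerate.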

In Chapter 3, adapted from coauthored work with June Hwang and Kosuke Imai, we develop a fully automated video processing system for encoding information in political campaign advertisement videos. Our approach applies state-of-the-art algorithms to replicate a subset of variables in the human-labeled Wesleyan Media Project (WMP) data, performing tasks including video summarization, facial recognition, text recognition, speech recognition, audio classification, and text classification. We validate our method using the WMP data from the 2012 and 2014 election cycles, finding that machine coding is competitive with human coding for most of the variables considered in our study.
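The abstract does not say which metrics were used to compare machine and human coding. As one possible illustration, the sketch below computes raw agreement and Cohen's kappa between two label sequences for a single binary variable. The function names and the toy labels are hypothetical and are not taken from the WMP data.

```python
from collections import Counter


def percent_agreement(human, machine):
    """Share of items on which the two codings assign the same label."""
    if len(human) != len(machine) or not human:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohens_kappa(human, machine):
    """Agreement corrected for the level expected by chance."""
    n = len(human)
    p_observed = percent_agreement(human, machine)
    h_counts, m_counts = Counter(human), Counter(machine)
    labels = set(h_counts) | set(m_counts)
    # Chance agreement: product of the two coders' marginal label frequencies.
    p_expected = sum(h_counts[k] * m_counts[k] for k in labels) / (n * n)
    if p_expected == 1.0:  # both coders used one identical label throughout
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical codings of eight ads for one binary variable.
human_labels = [1, 1, 0, 0, 1, 0, 1, 0]
machine_labels = [1, 1, 0, 1, 1, 0, 0, 0]
agreement = percent_agreement(human_labels, machine_labels)
kappa = cohens_kappa(human_labels, machine_labels)
```

Kappa is useful alongside raw agreement because a variable that is almost always coded the same way can show high agreement by chance alone.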

In Chapter 4, adapted from coauthored work in Tarr and Imai (2021), we adapt the support vector machine (SVM) algorithm to address the balancing problem in causal inference. We first establish SVM as a kernel balancing method by showing that the soft-margin SVM dual problem computes weights that balance functions in a reproducing kernel Hilbert space. We then show that the SVM cost parameter controls a trade-off between balance and sample size, allowing us to use path algorithms to give exact characterizations of how balance and causal effect estimates change over the path. We validate our method using simulation data, showing that our algorithm is competitive with leading balancing methods. Finally, we conduct an empirical study using the right heart catheterization data from Connors et al. (1996).
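For readers unfamiliar with the connection, the standard soft-margin SVM dual can be written so that the balancing interpretation is visible. The notation below is an assumption made for illustration and may differ from the dissertation's: $y_i = +1$ for treated units and $y_i = -1$ for controls, $K$ is a kernel with feature map $\phi$ into an RKHS $\mathcal{H}$, and $C > 0$ is the cost parameter.

```latex
% Standard soft-margin SVM dual (notation assumed, see lead-in).
\[
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \\
\text{s.t.}\quad & 0 \le \alpha_i \le C \quad (i = 1, \dots, n),
  \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
\end{aligned}
\]

% Because K(x_i, x_j) = <phi(x_i), phi(x_j)>, the quadratic term is a squared RKHS norm.
\[
\sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
  = \Bigl\| \sum_{i:\, y_i = +1} \alpha_i \phi(x_i)
          - \sum_{i:\, y_i = -1} \alpha_i \phi(x_i) \Bigr\|_{\mathcal{H}}^{2} .
\]
```

Read this way, the equality constraint forces the treated and control weights to have the same total. The quadratic term penalizes the gap between the weighted treated and control feature means in $\mathcal{H}$, while the linear term rewards total weight and $C$ caps each individual weight. This is consistent with the balance versus sample size trade-off described above.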