MInDS: Using Large Language Models to Screen for Depression

Published in IEEE MIT Undergraduate Research Technology Conference, 2024

Abstract

Depression is a debilitating yet underdiagnosed mental illness, due in part to the subjectivity of current screening processes and other time and resource constraints. Large language models (LLMs) can potentially address these difficulties. Using the Extended Distress Analysis Interview Corpus dataset, containing 105 interview transcripts, we propose MInDS, an automated and modular LLM inferencing pipeline to optimize depression screening. Our results indicate that LLMs can effectively screen for depression, achieving a balanced accuracy of 0.8. LLM predictions generated from a shortened transcript perform similarly to predictions generated from the entire transcript. Our findings may aid the future development of LLMs for depression screening.

About this Paper

This work was completed during the Summer 2024 WPI Data Science Research Experience for Undergraduates (REU) program. I had the privilege of working with Prof. Elke Rundensteiner, Ph.D. students Becks Lopez and Avantika Shrestha, and fellow undergraduate researchers Nicholas Small (Providence College) and Janya Bhaskar (University of Colorado, Boulder). As my formal introduction to academic research, this REU was an incredibly transformative experience. This program cemented my desire to pursue graduate studies and a career in research.

Throughout this program, we examined how LLMs were being used in the depression screening process and noticed that new LLMs were being released faster than the field could test their depression screening capabilities. To address this, we first developed an automated, modular, parallelized, and scalable tool called MInDS to help researchers efficiently test a variety of LLMs in the depression screening pipeline. By maintaining consistent conditions across tests, our tool helps produce more robust and comparable results when evaluating multiple LLMs.
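To give a rough idea of what this kind of modular, parallel screening pipeline can look like, here is a minimal Python sketch. It is illustrative only: `screen_transcripts`, `ScreeningResult`, the prompt template, and the model callables are placeholder names and are not the actual MInDS implementation or prompts.

```python
# A minimal sketch of a modular, parallel LLM screening pipeline.
# All names here are illustrative placeholders, not the MInDS code.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# One fixed prompt keeps conditions consistent across every model being compared.
PROMPT_TEMPLATE = (
    "Below is a clinical interview transcript. Answer only 'yes' or 'no': "
    "does the participant screen positive for depression?\n\n{transcript}"
)

@dataclass
class ScreeningResult:
    transcript_id: str
    model_name: str
    prediction: str  # raw model output, e.g. "yes" / "no"

def screen_transcripts(
    transcripts: dict[str, str],               # transcript_id -> transcript text
    models: dict[str, Callable[[str], str]],   # model_name -> (prompt -> completion)
    max_workers: int = 4,
) -> list[ScreeningResult]:
    """Run every model on every transcript under identical prompting conditions."""
    jobs = [(tid, text, name, fn)
            for tid, text in transcripts.items()
            for name, fn in models.items()]

    def run(job):
        tid, text, name, fn = job
        completion = fn(PROMPT_TEMPLATE.format(transcript=text))
        return ScreeningResult(tid, name, completion.strip().lower())

    # Parallelize across (transcript, model) pairs so adding a new model
    # only means adding one entry to the `models` dict.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, jobs))
```

The design point this sketch tries to capture is the one described above: each LLM is just a swappable callable, so new models can be evaluated under the same prompt and the same data without changing the rest of the pipeline.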

We then used this tool, in combination with the Extended Distress Analysis Interview Corpus (E-DAIC) dataset, to test the out-of-the-box screening performance of Llama 3 and Gemma 2. Initial accuracy scores were promising, but inference times were very long. We therefore developed a greedy search algorithm that recursively selects the most important questions in the dataset, shortening the transcripts to roughly half their original length. A more detailed methodology can be found in the complete text.
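As a rough illustration of the greedy idea, here is a short Python sketch that, at each step, keeps the question whose addition most improves a scoring function (for example, balanced accuracy of LLM predictions made from transcripts reduced to the selected questions). The function names, the `evaluate` callback, and the stopping rule are assumptions for illustration; the exact procedure is described in the full paper.

```python
# A hedged sketch of greedy question selection; not the paper's exact algorithm.
from typing import Callable, Sequence

def greedy_select_questions(
    questions: Sequence[str],
    evaluate: Callable[[list[str]], float],  # selected question ids -> score (higher is better)
    budget: int,
) -> list[str]:
    selected: list[str] = []
    remaining = list(questions)
    best_score = float("-inf")
    while remaining and len(selected) < budget:
        # Score each candidate question when added to the current selection.
        scored = [(evaluate(selected + [q]), q) for q in remaining]
        score, choice = max(scored, key=lambda pair: pair[0])
        if score <= best_score:  # stop once no candidate improves the score
            break
        selected.append(choice)
        remaining.remove(choice)
        best_score = score
    return selected
```

In this framing, `evaluate` would re-run the screening pipeline on transcripts trimmed to the candidate question set, so the selection directly trades transcript length against screening accuracy.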

In addition, I continued this research with Prof. Elke Rundensteiner and Avantika Shrestha as an independent study project, where we used the MInDS tool to compare the depression screening performance of out-of-the-box pretrained LLMs to that of models we fine-tuned.

Find our paper archived here

Download Paper | Download BibTeX