Understanding psychiatric illness through natural language processing (UNDERPIN): Rationale, design, and methodology

the UNDERPIN Collaborators

Research output: Contribution to journalArticlepeer-review

2 Citations (Scopus)


Introduction: Psychiatric disorders are diagnosed through observations of psychiatrists according to diagnostic criteria such as the DSM-5. Such observations, however, are mainly based on each psychiatrist's level of experience and often lack objectivity, potentially leading to disagreements among psychiatrists. In contrast, specific linguistic features can be observed in some psychiatric disorders, such as a loosening of associations in schizophrenia. Some studies explored biomarkers, but biomarkers have yet to be used in clinical practice. Aim: The purposes of this study are to create a large dataset of Japanese speech data labeled with detailed information on psychiatric disorders and neurocognitive disorders to quantify the linguistic features of those disorders using natural language processing and, finally, to develop objective and easy-to-use biomarkers for diagnosing and assessing the severity of them. Methods: This study will have a multi-center prospective design. The DSM-5 or ICD-11 criteria for major depressive disorder, bipolar disorder, schizophrenia, and anxiety disorder and for major and minor neurocognitive disorders will be regarded as the inclusion criteria for the psychiatric disorder samples. For the healthy subjects, the absence of a history of psychiatric disorders will be confirmed using the Mini-International Neuropsychiatric Interview (M.I.N.I.). The absence of current cognitive decline will be confirmed using the Mini-Mental State Examination (MMSE). A psychiatrist or psychologist will conduct 30-to-60-min interviews with each participant; these interviews will include free conversation, picture-description task, and story-telling task, all of which will be recorded using a microphone headset. In addition, the severity of disorders will be assessed using clinical rating scales. Data will be collected from each participant at least twice during the study period and up to a maximum of five times at an interval of at least one month. Discussion: This study is unique in its large sample size and the novelty of its method, and has potential for applications in many fields. We have some challenges regarding inter-rater reliability and the linguistic peculiarities of Japanese. As of September 2022, we have collected a total of >1000 records from >400 participants. To the best of our knowledge, this data sample is one of the largest in this field. Clinical Trial Registration: Identifier: UMIN000032141.

Original languageEnglish
Article number954703
JournalFrontiers in Psychiatry
Publication statusPublished - 2022 Dec 1


  • biomarker
  • language
  • machine learning
  • natural language processing (computer science)
  • neurocognitive disorders
  • psychiatric disorders

ASJC Scopus subject areas

  • Psychiatry and Mental health


Dive into the research topics of 'Understanding psychiatric illness through natural language processing (UNDERPIN): Rationale, design, and methodology'. Together they form a unique fingerprint.

Cite this