Natural Language Processing (NLP)

Research

Language is our primary and most convenient means of communication. Textual content, which ranges from social media discussions to product reviews to private physician notes, presents naturally occurring data that can be used to build computational models that better represent, enhance, and ultimately understand such data.

We are currently working on several key problems to reach the above goal:

Curriculum Learning for Natural Language Processing

Deep neural networks can effectively tackle many tasks, from recognizing and reasoning about objects in images to playing strategy games to modeling valid sequences of words in human language. However, these models can be computationally expensive to train, even with fast hardware. In addition, statistical and machine learning models suffer from spurious data (instances with potentially wrong labels), resulting in biased predictions and catastrophic errors. How can we efficiently train models that are robust to the biases imposed by spurious data? The importance of error-free resources cannot be overstated, as errors can adversely affect interpretations of the data, models developed from the data, and decisions made based on the data. We are currently investigating schedulers that dynamically schedule training data points for more efficient and effective training and that detect spurious instances in datasets. Such schedulers can uncover the salient characteristics of the learners (networks) and their learning materials (training instances), and can improve the quality of existing resources, which is important for accurate and fair benchmarking.

Relevant publications

Ling-CL: Understanding NLP Models through Linguistic Curricula
Mohamed Elgaar, Hadi Amiri. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP’23). [PDF]

Complexity-Guided Curriculum Learning for Text Graphs
Nidhi Vakil, Hadi Amiri. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP’23) (Findings). [PDF]

HuCurl: Human-induced Curriculum Discovery
Mohamed Elgaar, Hadi Amiri. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23). [PDF]

Curriculum Learning for Graph Neural Networks: A Multiview Competence-based Approach
Nidhi Vakil, Hadi Amiri. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23). [PDF]

Generic and Trend-aware Curriculum Learning for Relation Extraction in Graph Neural Networks
Nidhi Vakil, Hadi Amiri. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL’22). [PDF]

Repeat before Forgetting: Spaced Repetition for Efficient and Effective Training of Neural Networks
Hadi Amiri, et al. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP’17).

Spotting Spurious Data with Neural Networks
Hadi Amiri, et al. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL’18).

Neural Self-Training through Spaced Repetition
Hadi Amiri. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL’19).

Clinical Decision Support

Most medical information, from referral letters to physician notes to scientific articles, is locked in unstructured text and is not readily accessible. NLP and machine learning techniques offer potential means to support clinicians with evidence and insight extracted from such data. We are investigating novel decision support systems to accelerate diagnosis for undiagnosed patients. Using patient data from the Undiagnosed Diseases Network (UDN), a nationwide program established by the National Institutes of Health to facilitate research on undiagnosed and rare diseases, we are exploring new, disease-agnostic deep learning technology to classify and triage patient applications and to pinpoint disease-causing gene variants through effective representation of multimodal patient data and reference materials about rare diseases.

Relevant publications

MedDec: A Dataset for Extracting Medical Decisions from Discharge Summaries
Mohamed Elgaar, Jiali Cheng, Nidhi Vakil, Hadi Amiri, Leo Anthony Celi. In Findings of the Association for Computational Linguistics: ACL 2024 (Findings of ACL’24). [PDF]

CogniVoice: Multimodal and Multilingual Fusion Networks for Mild Cognitive Impairment Assessment from Spontaneous Speech
Jiali Cheng, Mohamed Elgaar, Nidhi Vakil, Hadi Amiri. In Proceedings of INTERSPEECH 2024 (INTERSPEECH’24). [PDF]

Attentive Multiview Text Representation for Differential Diagnosis
Hadi Amiri, Mitra Mohtarami, Isaac S. Kohane. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL’21).

Machine Learning of Patient Characteristics to Predict Admission Outcomes in the Undiagnosed Diseases Network
Hadi Amiri and Isaac S. Kohane. In the Journal of the American Medical Association (JAMA). 2021.

Social Media Surveillance

User-generated content (UGC) can be used to obtain low-cost, high-resolution views into population behavior. We are investigating effective online surveillance systems that can monitor population health and behavior at scale to detect (health-related) trends and outbreaks and to identify opportunities for decision making or intervention. The results can complement the knowledge available in national surveys and inform policy evaluation and improvement.

Relevant publications

Emerging Topic Detection for Organizations from Microblogs
Chen Yan, et al. In Proceedings of ACM SIGIR conference on research and development in Information Retrieval (SIGIR’13).

Short Text Representation for Detecting Churn in Microblogs
Hadi Amiri and Hal Daumé III. In Proceedings of the Thirtieth Conference on Artificial Intelligence (AAAI’16).

Toward Large-scale and Multi-facet Analysis of First-person Alcohol Drinking
Hadi Amiri, et al. In Proceedings of American Medical Informatics Association (AMIA’18).

Online Searching and Social Media to Detect Alcohol Use Risk at Population Scale
Elissa R. Weitzman, et al. In American Journal of Preventive Medicine (AJPM’20).