NLP NLU AI Semantics

About Me

I’m an NLP engineer with a strong academic background in Linguistics, Math, and Computer Science. I hold a PhD from the University of Rochester, where I was advised by Len Schubert. My research focused on mining commonsense knowledge and organizing it in a structured way that reflects the relationships between entities as well as between knowledge and action. I also worked on annotating, detecting, and recovering ellipsis and anaphoric expressions, mentored by Aaron Steven White.

Projects

Lexical knowledge and object schema acquisition

Jo wanted to hang up the frame. Should we expect him to go looking for a nail or a TV?

I worked on obtaining commonly assumed knowledge about objects from various sources, such as online dictionaries and large language models, and representing the acquired knowledge in a formalized knowledge representation, called a schema. My work focused on capturing lexical knowledge—such as hypernyms, parts, materials, and usage information—to enable human-like reasoning in AI.

Commonsense reasoning dataset

Which is less likely to be made, at least partially, of a material that is a constituent of a ping pong paddle: a computer mouse or a tuning fork?

I created a manually curated dataset of binary-choice questions about shared materials between objects. Each entry in the dataset was carefully reviewed to challenge language models to consider detailed information on artifact parts and material compositions.

Meaning in ellipsis and anaphora

What do you mean by 'you think so'?

I also worked on making the underlying meaning of semantically compressed forms explicit in text. The scope of the project extends beyond elliptical utterances, such as verb phrase ellipsis, to include non-elliptical referring expressions, such as null complement anaphora.

Neg(ation)-raising

megaattitude.io/projects/mega-negraising

Jo doesn't think that Bo left. Does Jo think that Bo didn't leave?

As part of FACTS.lab’s MegaAttitude Project, I built lexicon-scale datasets through crowdsourcing and studied the semantic contribution of lexical items and syntactic structures.

Papers

2026

Hannah YoungEun An. (2026). Surfacing implicit knowledge for language understanding. Doctoral dissertation, University of Rochester, United States.

2025

Hannah Y. An and Lenhart K. Schubert. (2025). Large Language Models as a Tool for Mining Object Knowledge. paper

Jiacan Yu, Hannah Y. An, and Lenhart K. Schubert. (2025). Language Models Benefit from Preparation with Elicited Knowledge. dataset paper

2024

William Gantt, Shabnam Behzad, Hannah YoungEun An, Yunmo Chen, Aaron Steven White, Benjamin Van Durme, and Mahsa Yarmohammadi. (2024). MultiMUC: Multilingual Template Filling on MUC-4. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 349-368, St. Julian’s, Malta. Association for Computational Linguistics. paper

2023

James Allen, Hannah An, Ritwik Bose, Will de Beaumont, and Choh Man Teng. (2023). COLLIE: a broad-coverage ontology and lexicon of verbs in English. In Language Resources and Evaluation, 57(1), pages 57-86. Springer Nature. paper

2020

James Allen, Hannah An, Ritwik Bose, Will de Beaumont, and Choh Man Teng. (2020). A Broad-Coverage Deep Semantic Lexicon for Verbs. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3243–3251, Marseille, France. European Language Resources Association. paper

Hannah Youngeun An and Aaron Steven White. (2020). The lexical and grammatical sources of neg-raising inferences. In Proceedings of the Society for Computation in Linguistics, pages 386-399, New York, New York. Association for Computational Linguistics. paper poster

Education

University of Rochester

PhD Computer Science

2019 - 2026

Specializing in Artificial Intelligence, I worked on making implicit content explicit and acquiring lexical knowledge for and/or from AI systems to support the reasoning process. I was advised by Lenhart Schubert.

University of Rochester

MS Computer Science

2019 - 2021

I worked on aligning and developing ontologies with a deep semantic lexicon to support reasoning about commonsense knowledge, supervised by James Allen.

University of Rochester

MS Computational Linguistics

2017 - 2019

During my first two years in Rochester, I took courses that taught me key skills for NLP and ML. As part of the master’s degree, I completed a final project, ‘Neg-raising inference and its syntactic contexts’, supervised by Aaron Steven White.

University of Washington

MA Linguistics

2012 - 2017

I graduated with Departmental Honors in Linguistics. My honors thesis, ‘Characteristic determiners vs. adjectives in Korean’, was supervised by Toshiyuki Ogihara.

University of Washington

BS Applied and Computational Mathematical Sciences

2012 - 2017

My undergraduate degree, with a specialization in Scientific Computing and Numerical Algorithms, gave me a strong background in Math. Alongside my major program, I also completed a minor in Philosophy.

Teaching

Teaching Assistant: CSC 247/447 Natural Language Processing (Spring 2021, University of Rochester)
Teaching Assistant: CSC 442 Introduction to Artificial Intelligence (Fall 2020, University of Rochester)
Teaching Assistant: CSC 247/447 Natural Language Processing (Spring 2020, University of Rochester)
Teaching Assistant: LIN 110 Introduction to Linguistic Analysis (Spring 2018, University of Rochester)
Korean Tutor @ CLUE Academic Support Program (Fall 2016 - Spring 2017, University of Washington)

Hannah YoungEun An