A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning

2024, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Building advanced Vision and Language (V&L) systems can offer significant societal benefits. For instance, V&L systems with visual question answering capabilities enable visually impaired individuals to perform daily tasks more independently; multimodal web agents streamline our daily activities, such as booking flights or shopping online; embodied robots enhance the efficiency and automation of manufacturing systems. However, developing such sophisticated V&L models is challenging due to the need for an integrated understanding of visual and linguistic information. This integration is particularly complex as it requires models not only to recognize and interpret detailed visual cues but also to understand and generate contextually relevant text. At its core, data plays an essential role in learning such integrated understanding. The effectiveness of V&L systems relies on how well data is curated, represented, and utilized for learning. In this dissertation, we thus aim to advance V&L systems through the lens of data. First, we discuss “data curation” to enrich training materials and benchmarks for V&L models. Second, we delve into “data representation” to encode visual and linguistic information from data into meaningful representations. Third, we explore “data learning” to enable models to acquire V&L knowledge from data. In short, we investigate three different aspects (i.e., curation, representation, and learning) of data to improve V&L understanding. We believe this comprehensive study greatly contributes to the development of advanced V&L models, ultimately providing substantial benefits to our society.
Wei-Lun Chao, Dr. (Advisor)
Yu Su, Dr. (Committee Member)
Andrew Perrault, Dr. (Committee Member)
231 p.

Recommended Citations

  • Kil, J. (2024). A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572

    APA Style (7th edition)

  • Kil, Jihyung. A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning. 2024. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572.

    MLA Style (8th edition)

  • Kil, Jihyung. "A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning." Doctoral dissertation, Ohio State University, 2024. http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572

    Chicago Manual of Style (17th edition)