Files
Jihyung_Kil_Dissertation.pdf (29.64 MB)
A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning
Author Info
Kil, Jihyung
ORCID® Identifier
http://orcid.org/0009-0005-7044-2781
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572
Abstract Details
Year and Degree
2024, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
Building advanced Vision and Language (V&L) systems can offer significant societal benefits. For instance, V&L systems with visual question answering capabilities enable visually impaired individuals to perform daily tasks more independently; multimodal web agents streamline our daily activities, such as booking flights or shopping online; embodied robots enhance the efficiency and automation of manufacturing systems. However, developing such sophisticated V&L models is challenging due to the need for an integrated understanding of visual and linguistic information. This integration is particularly complex as it requires models not only to recognize and interpret detailed visual cues but also to understand and generate contextually relevant text. At its core, data plays an essential role in learning such integrated understanding. The effectiveness of V&L systems relies on how well data is curated, represented, and utilized for learning. In this dissertation, we thus aim to advance V&L systems through the lens of data. First, we discuss “data curation” to enrich training materials and benchmarks for V&L models. Second, we delve into “data representation” to encode visual and linguistic information from data into meaningful representations. Third, we explore “data learning” to enable models to acquire V&L knowledge from data. In short, we investigate three different aspects (i.e., curation, representation, and learning) of data to improve V&L understanding. We believe this comprehensive study greatly contributes to the development of advanced V&L models, ultimately providing substantial benefits to our society.
Committee
Wei-Lun Chao, Dr. (Advisor)
Yu Su, Dr. (Committee Member)
Andrew Perrault, Dr. (Committee Member)
Pages
231 p.
Subject Headings
Computer Engineering; Computer Science
Keywords
Vision and Language, Data Curation, Data Representation, Learning from Data
Recommended Citations
Kil, J. (2024). A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572
APA Style (7th edition)
Kil, Jihyung. A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning. 2024. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572.
MLA Style (8th edition)
Kil, Jihyung. "A Closer Look at the Triad in Data-Driven Vision and Language: Curation, Representation, and Learning." Doctoral dissertation, Ohio State University, 2024. http://rave.ohiolink.edu/etdc/view?acc_num=osu1721217385422572
Chicago Manual of Style (17th edition)
Document number: osu1721217385422572
Download Count: 91
Copyright Info
© 2024, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.