Files
44064.pdf (7.31 MB)
Title-based video summarization using attention networks
Author Info
Li, Changwei
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147
Abstract Details
Year and Degree
2022, MS, University of Cincinnati, Engineering and Applied Science: Electrical Engineering.
Abstract
Rapid advances in video storage, processing, and streaming services, faster cellular communication, better mobile phone cameras, and increased social media engagement have led to explosive growth in the number of videos generated every minute. As a result, content-based video search, browsing, and information retrieval technologies that can cope with this volume of video have received significant attention in recent years. Video summarization is one such methodology: it helps users browse videos quickly and retrieve information more efficiently, either by extracting key frames or key segments alone or by further assembling the important segments into video skims, highlights, or summaries. In this research, the current video summarization pipeline, the existing datasets, and the related evaluation metrics are reviewed. Furthermore, several video summarization models that fuse video-title and visual features using attention networks are proposed and evaluated on publicly available datasets:
1. A baseline video summarization model that uses an attention network to capture the correlation among the visual features of video frames is studied. Its training procedure and evaluation metrics are compared against similar recent studies.
2. Video-title embeddings are extracted using pre-trained language models, and various methods for integrating the title information into the baseline model are studied and evaluated. By reshaping self-attention into cross-attention, a model that exploits the correlation between the video title and the frame visual features is proposed. Because the correlation among visual frames in long sequences does not necessarily reflect the video's storyline, fusing title information into the proposed model improved video summarization performance, as expected.
3. Finally, to further improve the performance of the proposed model, the loss function is modified to combine the accuracy of frame-level and segment-level score predictions. By optimizing the proposed loss, the model tries to predict frame scores as accurately as possible without deviating from the desired segment importance scores. The proposed model increased summarization performance (F1-score) by 1.1% on the TVSum dataset and 2.2% on the SumMe dataset.
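The self-attention baseline and the title-based cross-attention described in items 1 and 2 can be illustrated with a minimal, pure-Python sketch of scaled dot-product attention. This is not the thesis's implementation. It assumes that frame features and title-token embeddings have already been projected to a shared dimension, it omits the learned query/key/value projections, and it assumes that frames act as queries while title tokens supply the keys and values in the cross-attention variant. The `frame_scores` head and its weight vector `w` are hypothetical and untrained.

```python
import math
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    keys_t = [list(col) for col in zip(*keys)]  # transpose K to (d x n_keys)
    scores = matmul(queries, keys_t)             # (n_queries x n_keys)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, values)               # (n_queries x d_values)

def frame_scores(context: Matrix, w: List[float]) -> List[float]:
    """Hypothetical scoring head: sigmoid(w . c) for each frame's context vector."""
    return [1.0 / (1.0 + math.exp(-sum(wi * ci for wi, ci in zip(w, c)))) for c in context]

# Toy inputs: 4 frames and 2 title tokens, assumed already projected to d = 3.
frames = [[0.1, 0.3, 0.5], [0.2, 0.1, 0.0], [0.9, 0.8, 0.7], [0.4, 0.4, 0.4]]
title = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
w = [0.5, -0.2, 0.3]  # hypothetical, untrained scoring weights

# Baseline: self-attention, where frames attend to other frames.
self_ctx = attention(frames, frames, frames)
# Title fusion: cross-attention, where frames (queries) attend to title tokens (keys/values).
cross_ctx = attention(frames, title, title)

print(frame_scores(self_ctx, w))
print(frame_scores(cross_ctx, w))
```

The only change between the two variants is where the keys and values come from, which is one plausible reading of "re-shaping self-attention to cross-attention"; in the cross-attention case each frame's context vector is a weighted mix of title-token embeddings rather than of other frames.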
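The combined frame-level and segment-level loss in item 3 can be sketched as follows. The exact formulation used in the thesis is not given in this abstract, so this version is an assumption throughout: it uses mean squared error at both levels, defines a segment's score as the mean of its frame scores over half-open `[start, end)` frame ranges, and weights the segment term by a hypothetical coefficient `lam`.

```python
from typing import List, Tuple

def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length score lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def segment_means(scores: List[float], segments: List[Tuple[int, int]]) -> List[float]:
    """Average the frame scores inside each [start, end) segment (assumed definition)."""
    return [sum(scores[s:e]) / (e - s) for s, e in segments]

def combined_loss(pred: List[float], target: List[float],
                  segments: List[Tuple[int, int]], lam: float = 0.5) -> float:
    """Frame-level MSE plus a lam-weighted segment-level MSE (hypothetical weighting)."""
    frame_term = mse(pred, target)
    segment_term = mse(segment_means(pred, segments), segment_means(target, segments))
    return frame_term + lam * segment_term

target = [0.0, 0.6, 1.0, 0.6]
segments = [(0, 2), (2, 4)]

# Frame errors cancel inside each segment, so only the frame term contributes.
loss_a = combined_loss([0.2, 0.4, 0.9, 0.7], target, segments)
# The first segment's average drifts from its target, so the segment term adds a penalty.
loss_b = combined_loss([0.2, 0.0, 0.9, 0.7], target, segments)
print(loss_a, loss_b)
```

Under these assumptions, the segment term penalizes predictions whose per-segment averages stray from the ground-truth segment importance even when the frame-level error is moderate, which matches the stated goal of predicting frame scores accurately "while not deviating from the segment importance scores desired."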
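The F1-scores reported on SumMe and TVSum are commonly computed by comparing a binary per-frame key-shot selection against each annotator's reference summary. The following minimal sketch follows that widely used protocol. It omits the step that converts predicted scores into a key-shot selection (for example under a summary-length budget), and the function names are hypothetical. Whether the thesis reduces per-user scores in exactly this way is an assumption.

```python
from typing import List

def f1_score(pred: List[int], user: List[int]) -> float:
    """F1 between two binary per-frame selections (1 = frame is in the summary)."""
    overlap = sum(p & u for p, u in zip(pred, user))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred)  # fraction of selected frames the user also chose
    recall = overlap / sum(user)     # fraction of the user's frames that were selected
    return 2 * precision * recall / (precision + recall)

def evaluate(pred: List[int], user_summaries: List[List[int]], mode: str) -> float:
    """Reduce per-user F1: 'max' (commonly used for SumMe) or 'avg' (commonly used for TVSum)."""
    scores = [f1_score(pred, u) for u in user_summaries]
    if mode == "max":
        return max(scores)
    if mode == "avg":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown mode: {mode}")

pred = [1, 1, 0, 0, 1, 0]
users = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
print(evaluate(pred, users, "max"), evaluate(pred, users, "avg"))
```

Here the first user yields an F1 of 2/3 and the second 0.4, so the "max" reduction gives 2/3 and the "avg" reduction gives 8/15.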
Committee
Mehdi Norouzi, Ph.D. (Committee Member)
Xuefu Zhou, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)
Pages
71 p.
Subject Headings
Electrical Engineering
Keywords
Supervised video summarization; Key-frame extraction; Text-visual cross-attention; Key-shot extraction; Query based Summarization; Self-Attention
Recommended Citations
APA Style (7th edition)
Li, C. (2022). Title-based video summarization using attention networks [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147
MLA Style (8th edition)
Li, Changwei. Title-based video summarization using attention networks. 2022. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147.
Chicago Manual of Style (17th edition)
Li, Changwei. "Title-based video summarization using attention networks." Master's thesis, University of Cincinnati, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147
Document number: ucin1659518397170147
Download Count: 110
Copyright Info
© 2022, some rights reserved.
Title-based video summarization using attention networks by Changwei Li is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by University of Cincinnati and OhioLINK.