Title-based video summarization using attention networks


2022, MS, University of Cincinnati, Engineering and Applied Science: Electrical Engineering.
The rapid advances in video storage, processing, and streaming services, improvements in cellular communication speed, better mobile phone cameras, and growing social media engagement have led to explosive growth in the number of videos generated every minute. Content-based video searching, browsing, and information-retrieval technologies have therefore received significant attention in recent years. Video summarization techniques help users browse videos quickly and retrieve information more efficiently, either by extracting key frames/segments alone or by further assembling the important segments into video skims, highlights, or summaries. This research reviews the current video summarization pipeline, the collected datasets, and the related evaluation metrics. It then proposes several video summarization models that fuse video-title and visual features using attention networks, and evaluates them on publicly available datasets:
1. A baseline video summarization model that applies an attention network to the correlations among the visual features of video frames is studied. Its training procedure and evaluation metrics are compared against similar recent studies.
2. After extracting video-title embeddings with pre-trained language models, various methods of integrating title information into the baseline model are studied and evaluated. By reshaping self-attention into cross-attention, a model that exploits the correlation between the video title and frame visual features is proposed. Because the correlation among visual frames in long sequences does not necessarily capture the video storyline, fusing title information into the proposed model improved summarization performance, as expected.
3. Finally, to further improve the proposed model, the loss function is modified to combine the accuracy of frame-level score predictions with that of segment-level score predictions. Optimizing this loss, the model predicts frame scores as accurately as possible while not deviating from the desired segment importance scores. The proposed model improved summarization performance (F1-score) by 1.1% on the TVSum dataset and 2.2% on the SumMe dataset.
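The two ideas above — scoring frames by cross-attending from a title embedding to frame features, and a loss that combines frame-level and segment-level accuracy — can be sketched as follows. This is a minimal single-head NumPy illustration under assumed shapes, not the thesis implementation; the function names, the weighting factor `lam`, and the segment-boundary representation are all assumptions made for exposition.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(title_emb, frame_feats):
    """Single-head scaled dot-product cross-attention: the title
    embedding acts as the query, frame features as keys/values.

    title_emb:   (d,)   title embedding (e.g. from a language model)
    frame_feats: (T, d) per-frame visual features
    Returns attention weights over frames and the fused context vector.
    """
    d = frame_feats.shape[-1]
    scores = frame_feats @ title_emb / np.sqrt(d)  # (T,) title-frame affinity
    weights = softmax(scores)                      # attention over frames
    context = weights @ frame_feats                # (d,) title-conditioned summary
    return weights, context

def combined_loss(frame_pred, frame_true, seg_bounds, lam=0.5):
    """MSE on frame scores plus a weighted MSE on per-segment mean scores,
    so frame predictions stay close to the desired segment importances.

    seg_bounds: list of (start, end) index pairs partitioning the frames.
    lam: assumed weighting between the two terms.
    """
    frame_loss = np.mean((frame_pred - frame_true) ** 2)
    seg_pred = np.array([frame_pred[s:e].mean() for s, e in seg_bounds])
    seg_true = np.array([frame_true[s:e].mean() for s, e in seg_bounds])
    seg_loss = np.mean((seg_pred - seg_true) ** 2)
    return frame_loss + lam * seg_loss
```

In this sketch the attention weights themselves can be read as per-frame importance scores; the combined loss then penalizes both frame-level errors and drift of each segment's average score from its target.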
Mehdi Norouzi, Ph.D. (Committee Member)
Xuefu Zhou, Ph.D. (Committee Member)
Wen-Ben Jone, Ph.D. (Committee Member)
71 p.

Recommended Citations

  • Li, C. (2022). Title-based video summarization using attention networks [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147

    APA Style (7th edition)

  • Li, Changwei. Title-based video summarization using attention networks. 2022. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147.

    MLA Style (8th edition)

  • Li, Changwei. "Title-based video summarization using attention networks." Master's thesis, University of Cincinnati, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1659518397170147

    Chicago Manual of Style (17th edition)