
Caption Analysis and Recognition

For Building Video Indexing Systems

Fu Chang†, Guey-Ching Chen†, Chin-Chin Lin†‡ and Wen-Hsiung Lin†

† Institute of Information Science, Academia Sinica

‡ Dept. of Electrical Engineering, National Taipei University of Technology, Taipei, Taiwan

E-Mail: {fchang, ching64, erikson, bowler}@iis.sinica.edu.tw


In this paper, we propose several methods for analyzing and recognizing Chinese video captions, which constitute a very useful information source for video content.

Image binarization, which combines a global threshold method with a window-based method, is used to obtain clearer images of characters; a caption-tracking scheme is used to locate caption regions and detect caption changes.

The separation of characters from possibly complex backgrounds is achieved by using size and color constraints, and by cross-examination of multi-frame images. To segment individual characters, we use a dynamic split and merge strategy. Finally, we propose a character recognition process using a prototype classification method, supplemented by a disambiguation process using support vector machines, to improve recognition outcomes. This is followed by a post-process that integrates multiple recognition results. The overall accuracy rate for the entire process applied to test video films is 94.11%.

Key Words: background removal, caption tracking, character segmentation, character recognition, image binarization, support vector machines, prototype classification

1 Introduction

The rapid increase in the use of digital videos in recent years has raised the need for an archival storage system. Such a system, in turn, requires an effective indexing technique for retrieving and browsing video content (Aslandogan and Yu 1999). Since video captions are rich sources of information, caption-based retrieval has become a popular focus for research into video content retrieval.

The aim of this paper is to provide systematic methods for analyzing and recognizing video captions. We limit ourselves to finding horizontal captions of a single color that stay the same throughout a number of successive frames. The video captions, moreover, are in Chinese and most characters in the captions are either light with dark borders, or dark with light borders. Chinese characters are unique in three ways. First, there are thousands of commonly used characters in Chinese, compared to the 26 capital and 26 lower-case letters in English.

Second, Chinese characters are much more complicated than English letters. Third, many Chinese characters are composed of more than one component.

To deal with these complexities, we must solve the following problems. First, we must locate text regions in video frames. Second, to separate text from background, we must first binarize video images; that is, turn colored pixels into black and white pixels. Third, because captions extend across frames, we need to detect caption changes within successive frames.

Fourth, as video captions are often embedded in complicated backgrounds, we need to solve the background-foreground separation problem. Fifth, character segmentation is crucial for satisfactory character recognition. Sixth, we require a classifier that recognizes each individual character obtained in the segmentation process. Finally, after character recognition, we need a post-processing method to select the best possible recognition results from successive frames that contain the same text. In general, Chinese characters require delicate treatment in the image binarization, character segmentation, and recognition processes. However, Chinese captions are easier to locate than Western language captions because Chinese characters are often more complicated than Western letters.

The remainder of this paper is organized as follows. In Section 2, we discuss related works and the main concepts of our proposed methods. In Section 3, we propose a caption-tracking scheme for acquiring spatial and temporal information from video frames. Section 4 introduces a method for removing backgrounds, and Section 5 describes character segmentation. This is followed, in Section 6, by a description of character recognition. In Section 7, we present a post-processing scheme that integrates multiple recognition results. Section 8 contains the end-to-end performance of the entire caption analysis and recognition process. Finally, in Section 9, we present our conclusion and discuss four major types of error in our results.

2 Background

In this section, we discuss related works for analyzing and recognizing video captions and present our main ideas for solving the seven major problems listed in the introduction.

For an overview of caption analysis and recognition, readers can refer to the survey papers of Lienhart (2003) and Doermann et al. (2003). Other references can be found on the following website: http://www.videoanalysis.org/Research_Topics/Text_Localization__Text_Segmen/text_localization__text_segmen.html.

There are a number of ways to find text in video frames, including those that classify textual and non-textual regions by their textural features (Kim et al. 2000; Li et al. 2000; Lienhart and Wernicke 2002), by connected components (Antani et al. 2000; Lin et al. 2001; Jain and Yu 1998; Shim et al. 1998), and by the spatial distribution of object edges (Kuwano et al. 2000; Smith and Kanade 1997; Wu et al. 1999). Kuwano et al. (2000) propose a method to identify text in individual video frames that overcomes some shortcomings of the method proposed by Smith and Kanade (1997). In this method, they first calculate color gradients between neighboring pixels to obtain edges, and then examine adjacent edges with opposite gradient signs to obtain edge-pairs. Text regions can be detected by checking the spatial properties of edge-pairs. Meanwhile, Wu et al. (1999) use Gaussian filters to extract character strokes from text regions and then group these strokes into rectangular bounding boxes. This approach is appropriate for high-resolution scanned images, but not for video images, which are often of low resolution. Kuwano et al.'s method, on the other hand, is suitable when a significant contrast exists between characters and their background. We focus on this aspect because dark Chinese characters usually have light fringes, and vice versa. Furthermore, we use the edge detection method proposed by Kuwano et al. to locate text regions because of its lower processing cost and high accuracy rate.

To transform color video images into binary images, some works combine locally adaptive thresholding and global thresholding to obtain better binary results (Kamada and Fujimoto 1999; Li and Doermann 1999). These works, however, do not have a quantitative basis for evaluating their performance. Our solution to the binarization problem combines the use of the global thresholding method proposed by Otsu (1979) and a window-based thresholding method proposed by one of the authors of this paper (Chang et al. 1999; Chang 2001). To evaluate the performance of our hybrid thresholding method, we derive two classifiers from our machine learning method. One classifier uses as training samples the caption characters binarized using Otsu's method, and the other uses the characters binarized using our hybrid method. We apply each classifier to a set of test samples that has been binarized using the corresponding thresholding method; that is, either Otsu's method or the hybrid method. Recognition accuracy constitutes the quantitative basis for evaluating both Otsu's and our binarization methods.
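The global stage of the hybrid can be illustrated by Otsu's method itself, which picks the gray level that maximizes the between-class variance of the resulting foreground/background split. The sketch below is a standard textbook formulation, not the authors' implementation, and omits the window-based stage and the hybrid combination entirely:

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) maximizing between-class variance."""
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg, w_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]            # pixels at or below t (background)
        if w_bg == 0:
            continue
        w_fg = total - w_bg        # pixels above t (foreground)
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor of 1/total^2).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

On a strongly bimodal caption region (light strokes on a dark fringe), the returned threshold falls between the two modes, so thresholding at it cleanly separates strokes from background.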

For detecting caption changes, Sato et al. (1999) propose a temporal segmentation method, while Lin et al. (2001) propose a method that compares character contours across successive frames. Li et al. (2000) compare intensity and texture across successive frames, and Lienhart and Wernicke (2002) compare vertical and horizontal projection profiles. The last two approaches track stationary as well as moving captions. Our approach, however, is limited to stationary captions. For this purpose, we improve on Lin et al.'s method by using images derived from both character contours and edge-pairs, since these two features are well preserved by the above-mentioned sharp contrast between characters and their fringes.

For background removal, some methods use spatial characteristics to filter non-text regions (Kuwano et al. 2000; Wong and Chen 2000; Wu et al. 1999), while others remove non-text regions by means of the temporal redundancy of captions (Hua et al. 2002; Sato et al. 1999). Although some approaches combine spatial characteristics and temporal redundancy to remove non-textual objects, they have difficulty dealing with non-textual objects that are similar to textual objects in spatial characteristics (Lin et al. 2001; Shim et al. 1998). Lienhart and Wernicke (2002) combine text color and temporal redundancy to remove non-textual objects. In this paper, we employ a three-stage process that gradually removes background pixels via spatial characteristics, color constraints, and multi-frame information. The accuracy rate increases dramatically, from 30.27% to 92.85%, as we proceed from the first to the last stage.

For character segmentation, Lu (1995) describes algorithms for character segmentation, some of which rely solely on pixel projection profiles. These algorithms achieve good segmentation results for Western letters. Lee et al. (1996) use recognition results to select the most suitable segmented points on gray-scale images. Sato et al. (1999) first segment characters by way of a vertical projection profile, and then use a character recognizer to refine the segmented candidates. For Chinese characters that are composed of an indefinite number of components, we propose an algorithm that performs dynamic split and merge operations based on certain parameters derived from the textlines containing those characters. Although this method does not require any recognition process, it achieves a very high accuracy rate of 97.87%.
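The split-and-merge algorithm itself is given later in the paper; as a rough illustration of the idea, the sketch below splits a textline at empty columns of its vertical projection profile and then merges adjacent segments that together fit within one expected character width (taken here to be roughly the textline height, since Chinese glyphs are nearly square). All names and parameters are ours, not the paper's:

```python
def split_at_gaps(profile):
    """Split a textline at runs of empty columns in its vertical
    projection profile (foreground-pixel count per column).
    Returns half-open (start, end) column intervals."""
    segments, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

def merge_components(segments, char_width):
    """Merge adjacent segments when the merged span still fits within
    the expected character width, so left/right components of one
    Chinese character are rejoined."""
    merged = []
    for s, e in segments:
        if merged and (e - merged[-1][0]) <= char_width:
            merged[-1][1] = e
        else:
            merged.append([s, e])
    return [(s, e) for s, e in merged]
```

For a character whose two components are separated by a small internal gap, the split stage produces two narrow segments and the merge stage rejoins them, while a wide inter-character gap keeps neighboring characters apart.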

Most works on video captions do not provide a classification method for recognizing caption characters. Many rely on a commercial recognizer to conduct recognition (Lienhart and Effelsberg 2000; Lienhart and Wernicke 2002; Sato et al. 1999; Wu et al. 1999). To build a classifier based on machine learning methods using caption characters as part of the training data, we propose a two-stage method that combines a prototype classification method with the support vector machine (SVM) method (Vapnik 1995). SVM is an effective classification method, but its training process is extremely slow when the number of class types (character types, in our case) is large. The two-stage method rectifies this by decomposing the original problem into two sub-problems. Experimental results show that this method achieves accuracy rates on test data comparable to the SVM and the nearest neighbor (NN) method (Dasarathy 1991), but has a much lower training cost than SVM and runs faster than both SVM and NN in the testing process.
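The first (prototype) stage can be sketched as nearest-prototype matching: each class is summarized by a few prototype feature vectors, and a test sample is assigned the label of the closest one. This is a minimal illustration under our own assumed data layout; the second-stage SVM disambiguation among confusable top candidates, and the actual feature extraction, are omitted:

```python
def nearest_prototype(feature, prototypes):
    """Stage one of a two-stage scheme: assign `feature` the label of
    the closest prototype by squared Euclidean distance.

    prototypes: dict mapping label -> list of prototype vectors
    (a hypothetical layout; the paper's prototype construction differs).
    Ambiguous top candidates would then go to a pairwise SVM stage.
    """
    best_label, best_dist = None, float("inf")
    for label, vecs in prototypes.items():
        for v in vecs:
            d = sum((a - b) ** 2 for a, b in zip(feature, v))
            if d < best_dist:
                best_dist, best_label = d, label
    return best_label
```

Because only a handful of prototypes per class are stored, the test-time cost is far below full nearest-neighbor search over all training samples, which is the speed advantage the two-stage design exploits.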

There are two methods for character recognition post-processing: selection of the best possible recognition results from successive frames (Mita and Hori 2001), and refinement of recognition outcomes via a dictionary (Sato et al. 1999). The first method works for our application, but the second is not so easily adapted to Chinese without first solving the word-segmentation problem, since Chinese text does not separate words with white space as Western languages do. The first method assumes that a recognition score exists; in our adaptation, we use the ranks of candidates rather than their recognition scores.
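Rank-based selection across frames can be sketched as follows. Each frame that shows the same character contributes a ranked candidate list; summing ranks over frames (with a worst-case penalty for candidates absent from a frame's list) and keeping the minimum is our assumption about how ranks stand in for recognition scores, not the paper's exact rule:

```python
def vote_by_rank(frame_candidates):
    """frame_candidates: one ranked candidate-label list per frame for
    the same character position (index 0 = best). Returns the label
    with the lowest total rank across frames; a label missing from a
    frame's list is charged that list's length as a penalty rank."""
    labels = set()
    for ranked in frame_candidates:
        labels.update(ranked)

    def total_rank(label):
        return sum(ranked.index(label) if label in ranked else len(ranked)
                   for ranked in frame_candidates)

    return min(labels, key=total_rank)
```

A label that is ranked first in most frames accumulates the smallest total and wins, even if a single noisy frame demotes it.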

As well as providing solutions to the stated problems, this paper provides a systematic solution for detecting, segmenting, and recognizing video captions. Not many works in the literature present such a complete solution; to the best of our knowledge, the only exceptions are Sato et al. (1999) and Lin et al. (2001). Sato et al. focus on the Roman alphabet, so their solutions for character segmentation and recognition are not suitable for Chinese characters. Although Lin et al.'s work focuses on Chinese captions, it lacks a systematic method for character segmentation, and its character recognizer is restricted to characters of a single font. Our method has a multiple-font recognizer constructed from machine learning techniques, and achieves a 94.11% recognition rate on 26 test films containing 114,590 characters, compared to Lin et al.'s 84.1% on 6 test films containing 7,818 characters.

3 Caption Tracking

Caption tracking comprises three stages: the first locates text regions within single video frames using spatial edge information; the second uses a binarization method to classify pixels within a text region as either foreground or background; and the third uses the previously obtained binarized images and edge information to detect caption changes across successive frames.

3.1 First Stage – Locating Caption Regions

We choose the edge-detection method proposed by Kuwano et al. (2000) to locate caption regions. This method calculates color gradients between neighboring pixels. If a gradient is larger than a threshold T (T = 140), it is regarded as an edge. When two adjacent edges with opposite gradient signs (i.e., one positive and one negative) are found within a certain distance D, they form an edge-pair. Parameter D is set at (frame width) × (20/352); the factor 20/352 stems from the requirement that if the frame width is 352, D must be 20. When a horizontal scan line contains at least M edge-pairs, it is said to be an M-line, where M is set at (frame height) × (6/240). When more than N (N = 10) contiguous M-lines exist, the area consisting of these M-lines is regarded as a caption region.
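The per-scan-line test can be sketched as follows. For simplicity the sketch works on grayscale values rather than color gradients, and the pairing rule (greedily consuming the nearest opposite-signed edge within D pixels) is our assumption, since the matching procedure is not spelled out above:

```python
def count_edge_pairs(scanline, T=140, D=20):
    """Count edge-pairs on one scan line of gray values.

    An edge is a neighboring-pixel gradient with |g| > T; two edges of
    opposite sign within D pixels form an edge-pair, the signature of
    a character stroke against its contrasting fringe.
    """
    grads = [scanline[i + 1] - scanline[i] for i in range(len(scanline) - 1)]
    edges = [(i, g) for i, g in enumerate(grads) if abs(g) > T]

    pairs, last = 0, None
    for i, g in edges:
        if last is not None and g * last[1] < 0 and i - last[0] <= D:
            pairs += 1
            last = None          # both edges of the pair are consumed
        else:
            last = (i, g)
    return pairs
```

A scan line crossing a caption accumulates many such pairs (one per stroke crossing), so lines with at least M pairs mark candidate caption rows, and a run of more than N such rows marks a caption region.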

Figure 1a is an image captured from one of our test video frames. The area of dense edge-pairs is marked with a white rectangle. The number of edge-pairs per scan line is shown in Figure 1b.
