
ISSN (Online): 2320-9801

ISSN (Print): 2320-9798

International Journal of Innovative Research in Computer and Communication Engineering

(An ISO 3297: 2007 Certified Organization)

Vol. 3, Issue 10, October 2015

Email Summarization: Extracting Main Content from the Mail

Mubashir Alam¹, Mohit Kakkar²

¹M.Tech Student, Dept. of CSE, Desh Bhagat University, Mandi Gobindgarh, Punjab, India

²Assistant Professor, Dept. of CSE, Desh Bhagat University, Mandi Gobindgarh, Punjab, India

ABSTRACT: A summary of a document is a shorter text that conveys the most important information from the source(s), so it must retain the key content of the original documents. This paper presents the design and implementation of a system to summarize email messages. The system uses the subject and content of each email message to classify emails according to the user's activities and to generate a summary of each incoming message.

KEYWORDS: Email, Email summarization, Summarization, Incoming messages, Spam filtering

I. INTRODUCTION

A summary can be defined as a text that is produced from one or more texts, that contains a significant portion of the information in the original text(s), and that is no longer than half of the original text(s). According to one definition, text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks). Summarization thus reduces a text document to a summary that retains its most important points. As information overload has grown with the quantity of available data, so has interest in automatic summarization. Technologies that produce a coherent summary take into account variables such as length, writing style and syntax. Search engines such as Google are one example of summarization technology in use.

With the ever-increasing popularity of email, email overload has become a major problem for many users, who spend a lot of time reading, replying to and organizing their messages. To help users organize their email folders, many forms of support have been proposed, including spam filtering, email classification and email visualization. In this paper, we discuss a different form of support: email summarization. The goal is to provide a concise, informative summary of the emails contained in a folder, saving the user from browsing through each email one by one. Email summarization can also be valuable for users who read email on mobile devices. Given the small screen size of handheld devices, efforts have been made to redesign the user interface; however, providing a concise summary may be just as important.


One common existing approach is to archive messages manually into folders in order to reduce the number of information objects a user must process at any given time. However, this is an insufficient solution: folder names do not necessarily reflect their content, and creating and maintaining folders can impose a significant burden on the user [1]. Several email classifiers attempt to sort mail into folders semi-automatically, for example:

• Ishmail [3]: It automatically sorts email messages into folders and orders them by importance.

• Commercial email clients [4]: Popular email clients such as Eudora, Mozilla Thunderbird, Microsoft Outlook and Outlook Express, as well as the mail filter Procmail, also support filing messages according to user-defined rule sets.

• IBM's MailCat [5]: It adapts dynamically to the user's observed mail-filing habits and suggests the three folders most likely to be appropriate for a given message.
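The rule-based filing supported by the clients above, and a MailCat-style shortlist of likely folders, can be illustrated with a minimal Python sketch. The rule format, the folder names and the word-overlap score used to rank folders are hypothetical assumptions made for illustration; they are not the configuration syntax of any of these clients, and MailCat's own classifier is not described here.

```python
import re
from collections import Counter

# Hypothetical user-defined rules: (header field, regex pattern, target folder).
RULES = [
    ("from", r"@bank\.example$", "Finance"),
    ("subject", r"\binvoice\b", "Finance"),
    ("subject", r"\bmeeting\b", "Meetings"),
]


def file_by_rules(message, rules=RULES, default="Inbox"):
    """Return the folder of the first rule whose pattern matches, else the default."""
    for field, pattern, folder in rules:
        if re.search(pattern, message.get(field, ""), re.IGNORECASE):
            return folder
    return default


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


def suggest_folders(message, folder_history, k=3):
    """Rank folders by word overlap between the message and mail previously
    filed there (a stand-in for a learned classifier) and return the top k."""
    words = Counter(tokens(message.get("subject", "") + " " + message.get("body", "")))
    scores = {}
    for folder, texts in folder_history.items():
        filed = Counter(w for text in texts for w in tokens(text))
        scores[folder] = sum(min(count, filed[w]) for w, count in words.items())
    # Highest overlap first; ties broken alphabetically for a stable result.
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]
```

For example, `file_by_rules({"from": "alerts@bank.example", "subject": "Statement"})` returns `"Finance"` because the first rule matches the sender. "First matching rule wins" is one simple evaluation policy, assumed here for brevity.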


Jan Ulrich, Giuseppe Carenini, Gabriel Murray and Raymond Ng, 2009, Regression-Based Summarization [03].

In this paper we present a regression-based machine learning approach to email thread summarization. The regression model is able to take advantage of multiple gold-standard annotations for training purposes, in contrast to most work with binary classifiers. We also investigate the usefulness of novel features such as speech acts. This paper also introduces a newly created and publicly available email corpus for summarization research. We show that regression-based classifiers perform better than binary classifiers because they preserve more information about annotator judgments. In our comparison between different regression-based classifiers, we found that Bagging and Gaussian Processes have the highest weighted recall.
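Why a regression target preserves more of the annotators' judgments can be shown on toy data. In this minimal sketch, which assumes three annotators, two hypothetical binary sentence features and plain least-squares linear regression (not the Bagging or Gaussian Process models named above), each sentence's target is the fraction of annotators who selected it, while a binary classifier would see only the majority vote.

```python
def regression_targets(annotations):
    """annotations[i][j] is 1 if annotator j selected sentence i, else 0.
    Returns (fraction of annotators per sentence, majority-vote binary label)."""
    fractions = [sum(row) / len(row) for row in annotations]
    majority = [1 if f > 0.5 else 0 for f in fractions]
    return fractions, majority


def fit_linear_regression(features, targets, lr=0.1, epochs=2000):
    """Ordinary least-squares linear regression fitted by batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, targets):
            error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for k in range(d):
                grad_w[k] += error * x[k] / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Toy data: four sentences, three annotators, two hypothetical binary features.
annotations = [[1, 1, 1], [1, 0, 0], [1, 1, 0], [0, 0, 0]]
features = [[1, 1], [0, 1], [1, 0], [0, 0]]
fractions, majority = regression_targets(annotations)
weights, bias = fit_linear_regression(features, fractions)
```

Here the second and fourth sentences both get majority label 0, yet their regression targets are 1/3 and 0; the fitted weights come out close to 2/3 and 1/3, so the model ranks the second sentence above the fourth.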

Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou, 2007, Summarizing Email Conversations with Clue Words [07].

In this paper, we propose a new framework for email summarization. One novelty is to use a fragment quotation graph to try to capture an email conversation. The second novelty is to use clue words to measure the importance of sentences in conversation summarization. Based on clue words and their scores, we propose a method called CWS, which is capable of producing a summary of any length as requested by the user. We provide a comprehensive comparison of CWS with various existing methods on the Enron data set. Preliminary results suggest that CWS provides better summaries than existing methods.
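The clue-word idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the CWS implementation: fragments are matched on exact lower-cased words rather than stems or synonyms, the stopword list is a tiny hypothetical one, and the fragment quotation graph is given by hand as a mapping from each fragment to the fragments it replies to. In this sketch a word in a fragment counts as a clue word when it also appears in a parent or child fragment, and a sentence's score is the total frequency of its words in those neighbouring fragments.

```python
import re
from collections import Counter

# Tiny hypothetical stopword list; CWS itself also matches stems and synonyms.
STOPWORDS = {"the", "a", "an", "to", "we", "i", "is", "on", "for", "of", "and", "can"}


def words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def clue_word_scores(fragments, parents):
    """fragments: {node: text}; parents: {node: [nodes it quotes/replies to]}.
    Returns (score, node, sentence) triples, highest score first."""
    children = {}
    for node, quoted in parents.items():
        for p in quoted:
            children.setdefault(p, []).append(node)
    counts = {node: Counter(words(text)) for node, text in fragments.items()}
    scored = []
    for node, text in fragments.items():
        neighbours = parents.get(node, []) + children.get(node, [])
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            # Each word contributes its frequency in every neighbouring fragment,
            # so only words shared with a parent or child (clue words) add to the score.
            score = sum(counts[n][w] for w in words(sentence) for n in neighbours)
            scored.append((score, node, sentence))
    return sorted(scored, key=lambda t: -t[0])
```

Given fragment a ("Can we move the budget meeting to Friday? The room is booked on Thursday."), fragment b replying to a ("Friday works for the budget meeting.") and fragment c replying to b ("Thanks. I will send the agenda."), the first sentence of a and the sentence in b each score 3, while every other sentence scores 0.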





Vishal Gupta, 2010, A Survey of Text Summarization Extractive Techniques [04].

This survey paper concentrates on extractive summarization methods. An extractive summary is a selection of important sentences from the original text, where the importance of each sentence is decided from its statistical and linguistic features. Many variations of the extractive approach have been tried in the last ten years. However, it is hard to say how much greater interpretive sophistication, at the sentence or text level, contributes to performance. Without the use of NLP, the generated summary may lack cohesion and semantics, and if a text contains multiple topics, the generated summary might not be balanced. Choosing a proper weight for each feature is very important, as the quality of the final summary depends on it, so more time should be devoted to deciding feature weights.
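The role of feature weights can be made concrete with a minimal sketch that scores each sentence as a weighted sum of statistical features. The three features (average word frequency, position, and length relative to the longest sentence) and the default weights are illustrative assumptions, not the feature set recommended by the survey; changing `weights` changes which sentences are selected, which is the point made above.

```python
import re
from collections import Counter


def split_sentences(text):
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    return re.findall(r"[a-z]+", sentence.lower())


def extractive_summary(text, n_sentences=2, weights=(0.6, 0.2, 0.2)):
    """Score sentences by a weighted sum of three features and return the
    top n_sentences in their original order."""
    sentences = split_sentences(text)
    if not sentences:
        return []
    # Raw word frequencies over the whole text (no stopword removal here).
    freq = Counter(w for s in sentences for w in tokenize(s))
    max_freq = max(freq.values(), default=1)
    max_len = max(len(tokenize(s)) for s in sentences) or 1
    w_freq, w_pos, w_len = weights
    scores = []
    for i, s in enumerate(sentences):
        toks = tokenize(s)
        f_freq = sum(freq[t] for t in toks) / (len(toks) * max_freq) if toks else 0.0
        f_pos = 1.0 - i / len(sentences)  # earlier sentences score higher
        f_len = len(toks) / max_len       # longer sentences score higher
        scores.append(w_freq * f_freq + w_pos * f_pos + w_len * f_len)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

On the text "The server crashed at noon. The server was restarted by the admin. Lunch was good." the function returns the first two sentences; the third scores lowest on all three features.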

Kirti Bhatia, Dr. Rajendar Chhillar, 2014, A Statistical Approach to perform Web Based Summarization [05].

This research focuses on developing a statistical approach to automatic text summarization, based on the K-mixture probabilistic model, to enhance the quality of summaries. Sentences are ranked and extracted according to the significance values of their semantic relationships. The objective of this research is thus to propose a statistical approach to text summarization.

In the present work, we define a feature-based evaluation approach to perform document summarization and connect it with web page extraction. In the feature phase, statistical information is extracted to perform the summarization.
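The K-mixture model named above is Katz's two-parameter model of how many times a term occurs in a document: P(k) = (1 − α)·δ(k, 0) + (α / (β + 1))·(β / (β + 1))^k, where δ(k, 0) is 1 for k = 0 and 0 otherwise. A minimal sketch of the standard parameter estimates from collection statistics (λ = cf/N, β = (cf − df)/df, α = λ/β) follows. How Bhatia and Chhillar turn these probabilities into sentence scores is not described here, so the sketch deliberately stops at the term distribution.

```python
def k_mixture_params(cf, df, n_docs):
    """Estimate Katz's K-mixture parameters for one term.
    cf: total occurrences of the term in the collection,
    df: number of documents containing it, n_docs: collection size."""
    if cf <= df:
        raise ValueError("K-mixture needs cf > df (beta would be zero)")
    lam = cf / n_docs        # mean occurrences per document
    beta = (cf - df) / df    # equals lam * 2**idf - 1, with idf = log2(n_docs / df)
    alpha = lam / beta
    return alpha, beta


def k_mixture_prob(k, alpha, beta):
    """Probability that the term occurs exactly k times in a document."""
    p = alpha / (beta + 1) * (beta / (beta + 1)) ** k
    if k == 0:
        p += 1 - alpha
    return p
```

For a term occurring 20 times in 10 of 100 documents, α = 0.2 and β = 1, giving P(0) = 0.9 and P(1) = 0.05; the expected count per document, αβ, equals λ = 0.2.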

III. PROBLEM FORMULATION

One common existing approach is to archive messages manually into folders in order to reduce the number of information objects a user must process at any given time. However, this is an insufficient solution: folder names do not necessarily reflect their content, and creating and maintaining folders can impose a significant burden on the user.

Several email classifiers attempt to sort mail into folders semi-automatically, for example:

• Ishmail: It automatically sorts email messages into folders and orders them by importance.

• Commercial email clients: Popular email clients such as Eudora, Mozilla Thunderbird, Microsoft Outlook and Outlook Express, as well as the mail filter Procmail, also support filing messages according to user-defined rule sets.


• IBM's MailCat [5]: It adapts dynamically to the user's observed mail-filing habits and suggests the three folders most likely to be appropriate for a given message.

• Magi [6]: This system records each email interaction and uses a learning algorithm to classify new messages based on the user's prior behavior.

Email summarization can be viewed as a special case of multi-document (MD) summarization. Radev et al. develop MEAD, which scores each sentence by its similarity to the TF-IDF centroid of the whole document set and by other properties such as its position in a document, sentence length and inter-sentence similarity. Erkan et al. develop LexPageRank to rank sentences by eigenvector centrality: they compute a sentence linkage matrix from sentence similarity and use this matrix with the well-known PageRank algorithm. Wan et al. generate an affinity graph from multiple documents and use this graph for summarization, considering both information richness and sentence novelty based on the sentence affinity graph. However, MD summarization methods, when applied to emails, do not take into account the key differences between emails and conventional documents, including the referential structure of conversations, the existence of hidden emails and the high variability of writing styles. Carenini et al. [07] compare CWS with MEAD for email summarization.

Rambow et al. apply a machine learning approach to email summarization, using RIPPER as a classifier to determine which sentences should be included in a summary. Features used for learning include linguistic features, and features describing the email and the threading structure.

Such an approach requires a large number of positive examples and cannot produce summaries of varying length based on the user's request. It is also not clear how this approach can handle hidden emails. Carenini et al. [07] compare CWS with RIPPER.

Wan et al. study decision-making summarization for email conversations using email threading. Among the various sets of features they explored, their experiments show that a centroid-based method is effective.

In earlier work, Carenini et al. focused on the reconstruction of hidden emails, using a precedence graph for this purpose; the focus of CWS is different, namely generating summaries of conversations regardless of whether hidden emails are present. The fragment quotation graph differs from a precedence graph in at least two ways. First, the nodes are different, as a fragment quotation graph creates nodes for both new and hidden fragments. More importantly, the edges in a precedence graph capture the textual ordering of the nodes within one hidden email, whereas the edges in a fragment quotation graph reflect the referential relationship among multiple emails. As for extracting conversations, Yeh et al. study how to use quotation matching to construct email threads. Their experiments show a higher recall than the header-based threading method, which supports the use of the quotation graph as a representation of an email conversation.
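Quotation matching for threading can be sketched minimally as below. This is an assumed simplification, not Yeh et al.'s algorithm: bodies are plain text, quoted lines carry the conventional '>' prefix, quotes reproduce the parent's lines exactly, and each reply is linked to the earlier email whose own lines best match its first-level quotes. Real mail would also need normalization of line wrapping, signatures and attribution lines.

```python
def quoted_lines(body, depth=1):
    """Lines quoted exactly `depth` levels deep ('>' markers), markers removed."""
    out = []
    for line in body.splitlines():
        text, level = line.strip(), 0
        while text.startswith(">"):
            level += 1
            text = text[1:].lstrip()
        if level == depth and text:
            out.append(text)
    return out


def new_lines(body):
    """Lines the author wrote: non-empty lines without a quote marker."""
    return [l.strip() for l in body.splitlines()
            if l.strip() and not l.strip().startswith(">")]


def build_thread(emails):
    """emails: list of (email_id, body), oldest first.
    Returns {reply_id: parent_id}, choosing as parent the earlier email whose own
    lines overlap most with the reply's first-level quotes."""
    parent = {}
    for i, (eid, body) in enumerate(emails):
        quotes = set(quoted_lines(body))
        best, best_overlap = None, 0
        for pid, pbody in emails[:i]:
            overlap = len(quotes & set(new_lines(pbody)))
            if overlap > best_overlap:
                best, best_overlap = pid, overlap
        if best is not None:
            parent[eid] = best
    return parent
```

With m1 asking "Can we meet on Friday?", m2 quoting that line and replying "Friday is fine for me.", and m3 quoting m2 at the first level and m1 at the second, `build_thread` returns `{"m2": "m1", "m3": "m2"}`; restricting matching to first-level quotes is what keeps m3 from being attached to m1.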

Shrestha et al. propose methods to automatically identify question-answer pairs in an email thread; their method may be useful for building the conversation structure for email summarization. Agrawal et al. extract social networks from newsgroups. Stolfo et al. study the behaviour model of email users based on social network analysis of email correspondence. They develop an email mining toolkit and use it to identify target emails without analysing the email content.

We have designed and developed a system that summarizes email messages and also groups emails into activities.
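The two sentence-ranking schemes described earlier in this section for MEAD (similarity to the TF-IDF centroid) and LexPageRank (PageRank over a sentence-similarity graph) can be sketched as follows. The sketch assumes pre-tokenized sentences, computes IDF over the sentences themselves, omits MEAD's position and length features, and uses an unweighted similarity graph with a threshold of 0.1 and a damping factor of 0.85; these are common but arbitrary choices, and neither function is the reference implementation of either system.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """sentences: list of token lists; IDF is computed over these sentences."""
    n = len(sentences)
    df = Counter(t for s in sentences for t in set(s))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # +1 keeps shared terms non-zero
    return [{t: c * idf[t] for t, c in Counter(s).items()} for s in sentences]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid_scores(sentences):
    """MEAD-style score: cosine similarity of each sentence to the TF-IDF centroid
    (MEAD's position and length features are omitted here)."""
    vecs = tfidf_vectors(sentences)
    centroid = {}
    for v in vecs:
        for t, w in v.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(vecs)
    return [cosine(v, centroid) for v in vecs]


def lexpagerank_scores(sentences, threshold=0.1, damping=0.85, iterations=100):
    """PageRank over an unweighted graph linking sentences whose cosine
    similarity exceeds `threshold` (eigenvector centrality by power iteration)."""
    vecs = tfidf_vectors(sentences)
    n = len(vecs)
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Mass held by sentences with no links is spread evenly over all sentences.
        dangling = sum(scores[j] for j in range(n) if degree[j] == 0)
        scores = [(1 - damping) / n
                  + damping * (sum(scores[j] * adj[j][i] / degree[j]
                                   for j in range(n) if degree[j])
                               + dangling / n)
                  for i in range(n)]
    return scores
```

For the token lists ["budget", "meeting", "friday"], ["budget", "meeting", "moved", "friday"], ["meeting", "room", "booked"] and ["lunch", "was", "good"], both functions rank the off-topic fourth sentence last; in the similarity graph it has no links, so LexPageRank gives it the lowest score.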

IV. OBJECTIVE

Summarization is a hard problem of Natural Language Processing because, to do it properly, one has to really understand the point of a text. This requires semantic analysis, discourse processing, and inferential interpretation (grouping of the content using world knowledge). The last step, especially, is complex, because systems without a great deal of world knowledge simply cannot do it. Therefore, attempts so far at performing true abstraction (creating abstracts as summaries) have not been very successful. Fortunately, an approximation called extraction is more feasible today. To create an extract, a system needs simply to identify the most important, topical or central topic(s) of the text and return them to the reader. Although the summary is not necessarily coherent, the reader can form an opinion of the content of the original. Most automated summarization systems today produce extracts only.

Summarist is an attempt to develop robust extraction technology as far as it can go and then continue research and development of techniques to perform abstraction. This work faces the depth vs. robustness tradeoff: either systems analyze and interpret the input deeply enough to produce good summaries (but are limited to small application domains), or they work robustly over more or less unrestricted text (but cannot analyze deeply enough to fuse the input into a true summary, and hence perform only topic extraction). In particular, symbolic techniques, using parsers, grammars and semantic representations, do not scale up to real-world size, while Information Retrieval and other statistical techniques, being based on word counting and word clustering, cannot create true summaries because they operate at the word (surface) level instead of at the concept level. To date, Summarist produces extract summaries in five languages (and has been linked to translation engines for these languages in the Must system). Work is underway both to extend the extract-based capabilities of Summarist and to build up the large knowledge collection required for inference-based abstraction.

V. METHODOLOGY

The steps in the methodology are as follows:

• The system first parses the query in natural language and identifies the major parts of the string.

• It then looks for the table and parses the string.

• After parsing, it constructs the parse tree of the abstracted symbols.

• Once the parse tree is generated, the system analyzes the priority and frequency of the abstracted symbols.

• All these symbols and keywords are recorded in a table.

• Next, the user's summarization requirement is analyzed.

• Finally, all sentences containing these keywords are extracted according to their priority and the user's requirement.
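The steps above can be sketched in simplified form. This sketch assumes the message subject and body are plain text, replaces the parse-tree stage with simple tokenization, uses a hypothetical stopword list, gives each subject word a hypothetical bonus weight of 2.0 as its priority, and models the user's requirement as a maximum number of sentences; none of these choices is specified in the paper.

```python
import re
from collections import Counter

# Hypothetical stopword list; the paper does not specify one.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "on", "for",
             "we", "you", "i", "it", "be", "will", "please", "at", "by", "this"}


def tokenize(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def keyword_table(subject, body, subject_weight=2.0):
    """Keyword -> priority: frequency in the body plus a bonus per subject occurrence."""
    table = Counter(tokenize(body))
    for w in tokenize(subject):
        table[w] += subject_weight
    return table


def summarize_email(subject, body, max_sentences=2, top_keywords=5):
    """Extract up to max_sentences body sentences that contain high-priority
    keywords, keeping their original order."""
    table = keyword_table(subject, body)
    keywords = {w for w, _ in table.most_common(top_keywords)}
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", body.strip()) if s]
    # Score each sentence by the summed priority of the top keywords it contains.
    scored = [(sum(table[w] for w in set(tokenize(s)) if w in keywords), i)
              for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:max_sentences]
    chosen = sorted(i for score, i in best if score > 0)
    return " ".join(sentences[i] for i in chosen)
```

For a message with subject "Server outage tonight" whose body mentions the database server going down tonight, the outage start time, a lunch menu and a reminder to save work before the server outage, the sketch returns the first and last sentences; the lunch sentence contains no keyword and is never selected.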


