Keyword Extraction Using A Graph-Based Approach

Posted on:2020-08-17
Degree:Master
Type:Thesis
Country:China
Candidate:R K Tarique Khan
GTID:2428330575956326
Subject:Electronics and Communications Engineering
Abstract/Summary:
As the volume of text on the Internet grows, it becomes increasingly difficult to retrieve the information that is relevant to a given user. To address this problem, considerable research has been carried out in information retrieval and text analytics, and keyword extraction in particular is an active research topic. Many types of data are available for observation and analysis, including graphical data. Data can also be user-generated, for example on social media, Wikipedia, or other resources. Many people generate such data on Twitter, a social media platform that is one of the most popular sources for crawling short text because each tweet is limited to 140 characters.

Keyword extraction is a process in which a text is given to the computer and the computer returns a set of keywords: topical words and phrases drawn from the content of the documents. Keyword extraction helps the reader grasp the summary, or at least the core idea, of a document without reading it in full. As a result, prospective readers do not waste valuable time reading irrelevant documents from start to finish. In general, users can find posts related to an event by searching for its keywords. Keyword extraction methods are applied in many areas, and especially in information retrieval, where they are of particular interest because people retrieve significant information based on keywords.

In this thesis, we apply a graph-based keyword extraction algorithm to four different datasets collected from Twitter on different terms. The datasets are preprocessed with NLTK to obtain cleaner data, from which a co-occurrence graph is generated. We also examine whether studying co-occurrences makes it possible to keep track of the structure of each text; this approach, however, is more tedious to handle and often leads to messy visualizations. Many libraries exist for visualization, and Python is a reliable choice for plotting because it provides many ready-made libraries. TextRank is a graph-based keyword extraction algorithm modeled on Google's PageRank algorithm; it differs in that its nodes are words and its links are relations between words rather than web pages and hyperlinks. TextRank calculates a score for every relevant word, and from those scores the most important words of the corpus can be identified; the precision of those words is computed as well. Word clouds are also increasingly popular as a visualization, and many different styles of word cloud can be found on the Internet. The experimental evaluation of the proposed work uses real datasets crawled from Twitter.
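The pipeline described in the abstract (preprocess the tweets, build a word co-occurrence graph, score the words with TextRank) can be illustrated with a minimal sketch. This is not the thesis implementation. The regex tokenizer and the tiny stopword list stand in for the NLTK preprocessing, and the window size, damping factor, iteration count, and all function names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption; the thesis preprocesses with NLTK).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on",
             "and", "for", "it", "this", "that", "with", "rt"}


def tokenize(text):
    """Lowercase, drop URLs and @mentions, keep alphabetic non-stopword tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return [w for w in re.findall(r"[a-z]+", text) if w not in STOPWORDS]


def cooccurrence_graph(docs, window=2):
    """Undirected weighted graph over words.

    The weight of an edge counts how often the two words occur within
    `window` tokens of each other (window=2 is an assumed setting).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        tokens = tokenize(doc)
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style iteration over the word graph."""
    scores = {w: 1.0 for w in graph}
    # Total edge weight attached to each word; every node has at least one edge.
    strength = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    for _ in range(iterations):
        new_scores = {}
        for w in graph:
            # Each neighbour v passes on its score in proportion to edge weight.
            rank = sum(graph[v][w] / strength[v] * scores[v] for v in graph[w])
            new_scores[w] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def top_keywords(docs, k=5, window=2):
    """Return the k highest-scoring words, ties broken alphabetically."""
    scores = textrank(cooccurrence_graph(docs, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Made-up example texts, not data from the thesis.
tweets = [
    "Keyword extraction with graph ranking",
    "Graph ranking finds keyword candidates",
    "TextRank is graph ranking for keyword extraction",
]
keywords = top_keywords(tweets, k=3)
```

Words that co-occur with many other words, and with high-scoring words in particular, accumulate the highest scores, which is the intuition TextRank borrows from PageRank. In practice the regex tokenizer could be replaced by NLTK tokenization and stopword removal, as the abstract describes.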
Keywords/Search Tags:Graph-Based