News Graph

Key information extraction from text and graph visualization. Inspired by TextGrapher.

Project Introduction

Representing a text in a simple way is a challenging problem. This project extracts key information from text using NLP methods, including named entity recognition (NER), relation detection, keyword extraction, and frequent-word extraction, and then displays that information as a graph.

How to use

Named Entity Extraction: Uses spaCy to extract named entities such as persons, organizations, and locations from news articles.

Relationship Extraction: Identifies relationships between entities using a pretrained model and generates JSON files representing node-edge relationships.

The model and the NER code each generate a separate JSON file.
We then compute the difference between the two JSON files and generate a final JSON file.
The final JSON file contains the NER tag, the node label, and the relationships between nodes.
It is used for BFS searching and for plotting the graph.
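The difference and BFS steps above can be sketched as follows. This is only an illustration: the edge schema (`source`, `target`, `relation`), the function names, and the reading of "difference" as "edges produced by one extractor but not the other" are assumptions, and the repository's difference.py and BFS.py may work differently.

```python
import json
from collections import deque

def edge_key(edge):
    """Identify an edge by its endpoints and relation label (assumed schema)."""
    return (edge["source"], edge["target"], edge["relation"])

def edge_difference(edges_a, edges_b):
    """Return the edges of edges_a that do not occur in edges_b."""
    seen = {edge_key(e) for e in edges_b}
    return [e for e in edges_a if edge_key(e) not in seen]

def bfs(edges, start):
    """Return the nodes reachable from start in breadth-first order.

    Edges are treated as undirected for the purpose of the search.
    """
    adjacency = {}
    for e in edges:
        adjacency.setdefault(e["source"], []).append(e["target"])
        adjacency.setdefault(e["target"], []).append(e["source"])
    order, queue, visited = [], deque([start]), {start}
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order

# Hypothetical outputs of the two extractors.
model_edges = [
    {"source": "Alice", "target": "Acme", "relation": "works for"},
    {"source": "Acme", "target": "Paris", "relation": "based in"},
]
ner_edges = [
    {"source": "Alice", "target": "Acme", "relation": "works for"},
]

only_in_model = edge_difference(model_edges, ner_edges)
print(json.dumps(only_in_model, indent=2))
print(bfs(model_edges, "Alice"))
```

Keying edges by a tuple keeps the comparison independent of dictionary ordering in the JSON files.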

from news_graph import NewsMining
content = 'Input your text here'
Miner = NewsMining()
Miner.main(content)

This generates the file graph.html.

Overview of model and NER extraction

Example Demo

Node coloring

Red: Location
Blue: Person
Green: Organization
Grey: Other
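One way to express the legend above in code is a lookup table from NER label to color. The label names are spaCy's standard English entity labels; which labels count as "location" (here both GPE and LOC) and the names `LABEL_COLORS` and `node_color` are assumptions made for this sketch.

```python
# Colors follow the legend above; the grouping of spaCy labels is an assumption.
LABEL_COLORS = {
    "GPE": "red",      # countries, cities, states
    "LOC": "red",      # other locations
    "PERSON": "blue",
    "ORG": "green",
}
DEFAULT_COLOR = "grey"  # every other label, and nodes without a label

def node_color(ner_label):
    """Return the display color for a node with the given NER label."""
    return LABEL_COLORS.get(ner_label, DEFAULT_COLOR)

for label in ("PERSON", "ORG", "GPE", "DATE"):
    print(label, node_color(label))
```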

Loading the spaCy Model

The following line loads the spaCy model for English language processing:

nlp = spacy.load('en_core_web_lg')

The model loaded here is 'en_core_web_lg', which is a large English language model trained on web text data.
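As a rough illustration of what the loaded model provides, the sketch below groups the entities of a parsed document by label. `spacy.load`, `doc.ents`, `ent.text`, and `ent.label_` are standard spaCy API; the helper names `group_entities` and `extract_entities` are hypothetical and do not come from this repository.

```python
from collections import defaultdict

def group_entities(doc):
    """Group entity texts by NER label, keeping first-seen order without duplicates."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)

def extract_entities(text, model_name="en_core_web_lg"):
    """Parse text with a spaCy model and return its entities grouped by label."""
    # Imported here so that group_entities stays usable without spaCy installed.
    import spacy
    nlp = spacy.load(model_name)
    return group_entities(nlp(text))
```

Calling `extract_entities("Some news text")` requires spaCy and the en_core_web_lg model to be installed; the exact entities returned depend on the model version.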

Defining the NewsMining Class

The code defines a Python class named NewsMining, encapsulating functionality related to news mining.

Initializing the NewsMining Class

The constructor method (__init__) initializes various attributes of the NewsMining class.

Additional Methods

The code snippet also includes additional methods such as clean_spaces, remove_noisy, and collect_ners, which perform tasks like cleaning text, removing noisy characters, and collecting named entities, respectively.
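The README does not show these helpers, so the following is only a guess at what text-cleaning functions of this kind might look like. It assumes that "noisy characters" means parenthesised asides and that cleaning spaces means collapsing whitespace; the actual methods in news_graph.py may differ.

```python
import re

def clean_spaces(text):
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def remove_noisy(text):
    """Drop parenthesised asides, e.g. ticker symbols or agency tags."""
    return re.sub(r"\([^)]*\)", "", text)

raw = "Acme Corp (NYSE: ACME)\n\topened an office   in Paris."
print(clean_spaces(remove_noisy(raw)))  # Acme Corp opened an office in Paris.
```

Removing the asides first and normalising whitespace afterwards avoids leaving a double space where the parenthesis used to be.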

Explanation of Python Code for News Mining

Extracting Triples

The extract_triples method takes a sentence as input and returns Subject-Verb-Object (SVO) triples:

def extract_triples(self, sent):
    svo = []
    tuples = self.syntax_parse(sent)
    child_dict_list = self.build_parse_chile_dict(sent, tuples)
    for tup in tuples:
        # The last field of each parse tuple is its dependency relation.
        rel = tup[-1]
        if rel in self.SUBJECTS:
            subj = tup[1]
            verb_wd = tup[3]
            # obj is empty when complete_VOB finds nothing for this verb.
            obj = self.complete_VOB(verb_wd, child_dict_list)
            verb = verb_wd.text
            if not obj:
                svo.append([subj, verb])
            else:
                svo.append([subj, verb + ' ' + obj])
    return svo
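Because extract_triples relies on helper methods that are not shown here, the following self-contained sketch illustrates the same idea on a pre-parsed sentence. The `Token` structure, the dependency label sets, and `extract_svo` are assumptions for illustration; the labels are taken from spaCy's English dependency scheme, not from the repository's SUBJECTS attribute.

```python
from collections import namedtuple

# A minimal stand-in for a parsed token: position, text, dependency
# relation, and the position of its syntactic head.
Token = namedtuple("Token", "index text dep head")

SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj"}   # assumed subject relations
OBJECT_DEPS = {"dobj", "attr", "dative", "oprd"}  # assumed object relations

def extract_svo(tokens):
    """Return [subject, 'verb object'] pairs, or [subject, verb] if no object exists."""
    by_index = {t.index: t for t in tokens}
    triples = []
    for tok in tokens:
        if tok.dep not in SUBJECT_DEPS:
            continue
        verb = by_index[tok.head]
        # Objects are the verb's direct children with an object relation.
        objects = [t.text for t in tokens
                   if t.head == verb.index and t.dep in OBJECT_DEPS]
        if objects:
            triples.append([tok.text, verb.text + " " + " ".join(objects)])
        else:
            triples.append([tok.text, verb.text])
    return triples

parsed = [
    Token(0, "Alice", "nsubj", 1),
    Token(1, "founded", "ROOT", 1),
    Token(2, "Acme", "dobj", 1),
]
print(extract_svo(parsed))  # [['Alice', 'founded Acme']]
```

The output shape mirrors the method above: the verb and its object are joined into one string, so each entry has two elements rather than three.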

Extracting Keywords

The extract_keywords method extracts the top 10 keywords from a list of word-postag pairs:

def extract_keywords(self, words_postags):
    return self.textranker.extract_keywords(words_postags, 10)
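The textranker object is not shown in this README. As a hedged illustration of the general TextRank idea, the sketch below ranks words by running a PageRank-style iteration over a co-occurrence graph. The function name, the part-of-speech filter, and the window, damping, and iteration values are all assumptions and may not match textrank.py.

```python
from collections import defaultdict

def textrank_keywords(words_postags, top_n=10, window=2,
                      allowed_tags=("NOUN", "PROPN", "ADJ"),
                      damping=0.85, iterations=30):
    """Rank candidate words with a simple, unweighted TextRank."""
    words = [w for w, tag in words_postags if tag in allowed_tags]

    # Link each word to the words that follow it within the window.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)

    # PageRank-style updates: a word scores highly when it is linked
    # to other high-scoring words.
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbours[n]) for n in neighbours[w])
            for w in neighbours
        }

    # Sort by descending score; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]

pairs = [("graph", "NOUN"), ("news", "NOUN"), ("graph", "NOUN"),
         ("entity", "NOUN"), ("the", "DET"), ("graph", "NOUN"),
         ("model", "NOUN")]
print(textrank_keywords(pairs, top_n=3, window=1))
```

With a window of 1, "graph" is linked to every other candidate in this sample, so it ranks first.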

Main Method for News Mining

The main method is a placeholder for the main functionality of news mining.

Getting Extracted Events and NER Results

The get_events method returns the extracted events and Named Entity Recognition (NER) results:

def get_events(self):
    return self.events, self.result_dict
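To show how results like these could feed a node-edge JSON file, here is a minimal sketch. It assumes that events are (source, relation, target) triples and that the NER results map each node label to its tag. Neither structure is confirmed by this README, and `events_to_graph` and the JSON field names are hypothetical.

```python
import json

def events_to_graph(events, ner_tags):
    """Build a node-edge dictionary from event triples and NER tags."""
    nodes, edges = {}, []
    for source, relation, target in events:
        for label in (source, target):
            # Nodes without a known tag fall back to "OTHER".
            nodes.setdefault(
                label, {"label": label, "ner": ner_tags.get(label, "OTHER")})
        edges.append({"source": source, "target": target, "relation": relation})
    return {"nodes": list(nodes.values()), "edges": edges}

# Hypothetical extraction results.
events = [("Alice", "founded", "Acme"), ("Acme", "opened", "office")]
ner_tags = {"Alice": "PERSON", "Acme": "ORG"}

graph = events_to_graph(events, ner_tags)
print(json.dumps(graph, indent=2))
```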

Instantiating the NewsMining Class

The NewsMining class is instantiated as news_miner:

Call the class object

news_miner = NewsMining()


kernel-loophole/KG-graph
