Data Source: Articles from the HSBC website
Step 1: Grab the text from the example URL
Find the article at https://insights.hsbc.co.uk/content/hsbc/gb/en_gb/wealth/insights/macro-outlook/china-insights/china-insights-2019-09-11/. In this HTML file, clean up the tags and extract the useful information (the content of the article). Some tab content from the header and footer was picked up as well, but I removed it from the dataframe. Finally, I split the whole article into sentences and made a list of sentences.
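A minimal sketch of this step, assuming requests and BeautifulSoup for scraping and a naive full-stop split for sentences; the exact tags dropped and the sentence splitter used in the original notebook may differ:

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = ("https://insights.hsbc.co.uk/content/hsbc/gb/en_gb/wealth/insights/"
       "macro-outlook/china-insights/china-insights-2019-09-11/")

html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, "html.parser")

# drop obvious header/footer/navigation markup before extracting text
for tag in soup(["header", "footer", "nav", "script", "style"]):
    tag.decompose()

# keep the paragraph text (the article body) in a dataframe
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
df = pd.DataFrame({"text": [p for p in paragraphs if p]})

# naive sentence split on a full stop followed by a space
article = " ".join(df["text"])
sentences = [s.strip() for s in article.split(". ") if s.strip()]
print(len(sentences), "sentences extracted")
```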
Step 2: Text pre-processing
In the pre-processing procedure, I set some basic rules to make the text format consistent: for example, converting all text to lowercase, removing extra whitespace, and stemming. I didn't remove numbers and punctuation here because the article contains percentage figures. After pre-processing, I generate a list of words for POS tagging.
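A sketch of these rules, assuming NLTK's word_tokenize and PorterStemmer (the notebook's actual tokenizer and stemmer are not stated); numbers and punctuation such as "6.2%" are kept:

```python
import re
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)   # newer NLTK releases may also need "punkt_tab"

stemmer = PorterStemmer()

def preprocess(sentence):
    text = sentence.lower()                    # lowercase everything
    text = re.sub(r"\s+", " ", text).strip()   # collapse extra whitespace
    # keep numbers and punctuation (e.g. "6.2%"), then stem each token
    return [stemmer.stem(tok) for tok in word_tokenize(text)]

# `sentences` comes from Step 1
words = [preprocess(s) for s in sentences]
```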
Step 3: Visualize the text tree
Construct the parse tree of the text and make sure the POS tags and the dependency arcs point in the right direction. We can also visualize the relationships layer by layer.
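The write-up does not say which library drew the tree; as one hedged possibility, a simple spaCy dependency walk prints the tree layer by layer and lets you check the POS tags and arc directions:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def print_tree(token, depth=0):
    # indent each token by its depth in the dependency tree
    print("  " * depth + f"{token.text} ({token.pos_}, {token.dep_})")
    for child in token.children:
        print_tree(child, depth + 1)

# illustrative sentence, not taken from the article
doc = nlp("China's retail sales growth slowed in August.")
for sent in doc.sents:
    print_tree(sent.root)   # walk from the ROOT down through each layer
```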
Step 4: Named Entity Recognition
Using spaCy, build an NLP pipeline with 'en_core_web_sm'. Run the passage through the model to perform NER (Named Entity Recognition), and finally generate the label and dependency of each entity.
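A sketch of this step; the passage below is illustrative, not a quote from the article:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# illustrative passage, not quoted from the article
passage = ("China's consumer price inflation rose to 2.8% in August, "
           "according to the National Bureau of Statistics.")
doc = nlp(passage)

# label of each named entity found by the model
for ent in doc.ents:
    print(ent.text, "->", ent.label_)

# dependency relation and syntactic head of each token
for tok in doc:
    print(tok.text, tok.dep_, tok.head.text)
```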
Step 5: Recognize the named entities in the whole article
Use spaCy's displacy function to visualize the dependencies and the label of every named entity across the article.
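Roughly, assuming `nlp` and the scraped `article` string from the earlier steps (displacy renders inline in a notebook):

```python
from spacy import displacy

doc = nlp(article)   # `nlp` and `article` come from the earlier steps

# highlight named-entity labels across the whole article
displacy.render(doc, style="ent", jupyter=True)

# draw the dependency arcs sentence by sentence
displacy.render(list(doc.sents), style="dep", jupyter=True)
```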
Step 6: Set up get_entities and get_relation functions to get the nodes and edges of the knowledge graph
(1) get_entities: In this function, extract the entity names from the sentence. I check whether a token is part of a compound noun or a modifier, so that no part of a name is missed, then identify the subject and object and put them into a list.
(2) get_relation: In this function, extract the dependency pattern and use it as the relation between the entities found above. A sketch of both functions is shown below.
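A minimal sketch of the two functions, following the common subject–relation–object pattern built on spaCy's dependency labels and Matcher; the variable names and exact rules here are assumptions, not the notebook's verbatim code:

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")

def get_entities(sent):
    """Return the [subject, object] pair of a sentence."""
    ent1, ent2 = "", ""
    prv_tok_dep, prv_tok_text = "", ""   # dependency tag / text of the previous token
    prefix, modifier = "", ""

    for tok in nlp(sent):
        if tok.dep_ == "punct":
            continue
        # keep compound nouns together so no part of a name is lost
        if tok.dep_ == "compound":
            prefix = tok.text if prv_tok_dep != "compound" else prv_tok_text + " " + tok.text
        # keep modifiers (amod, nmod, ...) attached to the noun
        if tok.dep_.endswith("mod"):
            modifier = tok.text if prv_tok_dep != "compound" else prv_tok_text + " " + tok.text
        # the subject becomes the first node
        if "subj" in tok.dep_:
            ent1 = " ".join(p for p in (modifier, prefix, tok.text) if p)
            prefix, modifier = "", ""
        # the object becomes the second node
        if "obj" in tok.dep_:
            ent2 = " ".join(p for p in (modifier, prefix, tok.text) if p)
        prv_tok_dep, prv_tok_text = tok.dep_, tok.text

    return [ent1.strip(), ent2.strip()]

def get_relation(sent):
    """Return the ROOT-based dependency pattern linking the two entities."""
    doc = nlp(sent)
    matcher = Matcher(nlp.vocab)
    # match the ROOT verb, optionally followed by a preposition, agent, or adjective
    pattern = [{"DEP": "ROOT"},
               {"DEP": "prep", "OP": "?"},
               {"DEP": "agent", "OP": "?"},
               {"POS": "ADJ", "OP": "?"}]
    matcher.add("relation", [pattern])
    matches = matcher(doc)
    start, end = matches[-1][1], matches[-1][2]
    return doc[start:end].text
```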
Step 7: Construct the knowledge graph
Set up a dataframe to store the nodes and edges of the graph, then draw the knowledge graph with the NetworkX library.
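A sketch of the final step, assuming the network library is NetworkX and that `sentences`, `get_entities`, and `get_relation` come from the previous steps:

```python
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

# one row per sentence: subject node, object node, and the relation (edge label)
pairs = [get_entities(s) for s in sentences]
kg_df = pd.DataFrame({
    "source": [p[0] for p in pairs],
    "target": [p[1] for p in pairs],
    "edge": [get_relation(s) for s in sentences],
})

# build a directed multigraph from the edge list and draw it
G = nx.from_pandas_edgelist(kg_df, source="source", target="target",
                            edge_attr="edge", create_using=nx.MultiDiGraph())

plt.figure(figsize=(12, 12))
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color="skyblue", node_size=1500)
plt.show()
```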