
Natural Language Processing with Python NLTK part 6 - Named Entity Recognition

Natural Language Processing - NER

Named entities are specific references to things such as people, organizations, and locations. As part of analyzing text, NLTK lets us perform named entity recognition and identify certain types of entities. Those types are as follows.

NE Type        Examples
ORGANIZATION   Georgia-Pacific Corp., WHO
PERSON         Eddy Bonte, President Obama
LOCATION       Murray River, Mount Everest
DATE           June, 2008-06-29
TIME           two fifty a m, 1:30 p.m.
MONEY          175 million Canadian Dollars, GBP 10.40
PERCENT        twenty pct, 18.75 %
FACILITY       Washington Monument, Stonehenge
GPE            South East Asia, Midlothian

A simple example of NER:

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

para = "America is a country. John is a name."

sent = sent_tokenize(para)

for s in sent:
    word = word_tokenize(s)
    tag = nltk.pos_tag(word)
    namedEntity = nltk.ne_chunk(tag)
    namedEntity.draw()

Each sentence is tokenized, PoS tagged, and then the named entities are identified by the Python NLTK library. The result looks like this.



NLTK identifies America and John as named entities. But what if a named entity spans two words, like Sri Lanka or Saudi Arabia? There is a simple way to capture such a phrase as a single named entity: enabling binary mode.

For this, I have changed the sentence in the previous code:

para = "Saudi Arabia is a country. John Peters is a name."

sent = sent_tokenize(para)

for s in sent:
    word = word_tokenize(s)
    tag = nltk.pos_tag(word)
    # binary=True groups multi-word named entities into a single NE chunk
    namedEntity = nltk.ne_chunk(tag, binary=True)
    namedEntity.draw()

The output without enabling binary is as follows.





The result after setting binary=True looks like this.



You can see that all the named entities are grouped together.
