
Natural Language Processing with Python NLTK part 4 - PoS tagging

Natural Language Processing 


PoS tagging, or Part of Speech tagging, is a commonly used step in NLP. It lets NLTK label each word in your corpus with a tag that describes its grammatical role. NLTK's default English tagger uses the Penn Treebank tag set, and here is the list of tags.

Number  Tag    Description
1       CC     Coordinating conjunction
2       CD     Cardinal number
3       DT     Determiner
4       EX     Existential there
5       FW     Foreign word
6       IN     Preposition or subordinating conjunction
7       JJ     Adjective
8       JJR    Adjective, comparative
9       JJS    Adjective, superlative
10      LS     List item marker
11      MD     Modal
12      NN     Noun, singular or mass
13      NNS    Noun, plural
14      NNP    Proper noun, singular
15      NNPS   Proper noun, plural
16      PDT    Predeterminer
17      POS    Possessive ending
18      PRP    Personal pronoun
19      PRP$   Possessive pronoun
20      RB     Adverb
21      RBR    Adverb, comparative
22      RBS    Adverb, superlative
23      RP     Particle
24      SYM    Symbol
25      TO     to
26      UH     Interjection
27      VB     Verb, base form
28      VBD    Verb, past tense
29      VBG    Verb, gerund or present participle
30      VBN    Verb, past participle
31      VBP    Verb, non-3rd person singular present
32      VBZ    Verb, 3rd person singular present
33      WDT    Wh-determiner
34      WP     Wh-pronoun
35      WP$    Possessive wh-pronoun
36      WRB    Wh-adverb
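Many of the tags in the table above come in families that share a prefix (NN for nouns, VB for verbs, JJ for adjectives, RB for adverbs). As a rough sketch of how that can be used, the helper below maps a tag to a coarse word class. The names `COARSE_CLASSES` and `coarse_class` are made up for this illustration and are not part of NLTK.

```python
# Coarse word classes keyed by tag prefix, based on the tag table above.
# Hypothetical helper for illustration; not an NLTK API.
COARSE_CLASSES = {
    "NN": "noun",
    "VB": "verb",
    "JJ": "adjective",
    "RB": "adverb",
}


def coarse_class(tag):
    """Return a coarse word class for a tag, or None if it is in no family."""
    for prefix, name in COARSE_CLASSES.items():
        if tag.startswith(prefix):
            return name
    return None


print(coarse_class("NNPS"))  # noun
print(coarse_class("VBZ"))   # verb
print(coarse_class("DT"))    # None
```

Note that a plain prefix check is deliberately simple: WRB (Wh-adverb), for example, does not start with RB, so it is not counted as an adverb here.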

The Python code is as simple as it was in the earlier parts of this series.



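import nltk
from nltk.tokenize import word_tokenize

# The tokenizer and tagger models must be downloaded once with nltk.download();
# the exact resource names depend on your NLTK version.

# Tag every token of a two-sentence text
sent = "This is about the life. Life is awesome."
sent_words = word_tokenize(sent)
print(nltk.pos_tag(sent_words))

# Tag a few verbs with no surrounding context
sent_two = "run work give shoot"
print(nltk.pos_tag(word_tokenize(sent_two)))

# Tag a mix of verbs, a pronoun and wh-words
sent_three = "is am I are who when"
print(nltk.pos_tag(word_tokenize(sent_three)))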

Each print call outputs a list of (word, tag) tuples, one tuple per token.
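Because `nltk.pos_tag` returns a plain list of (word, tag) tuples, the tagged words are easy to filter. The sketch below picks out the words whose tag starts with a given prefix. The function name `words_with_tag_prefix` is hypothetical, and the sample list is written by hand in the same shape as the tagger output; it is not an actual tagger result.

```python
def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with the given prefix, e.g. "NN"."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Hand-written sample in the shape pos_tag returns; not real tagger output.
sample = [("This", "DT"), ("is", "VBZ"), ("about", "IN"),
          ("the", "DT"), ("life", "NN"), (".", ".")]

print(words_with_tag_prefix(sample, "NN"))  # ['life']
print(words_with_tag_prefix(sample, "VB"))  # ['is']
```

With NLTK installed, the same function should work on real output, for example `words_with_tag_prefix(nltk.pos_tag(word_tokenize(sent)), "NN")`.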


