Natural Language Processing
Part-of-Speech (PoS) tagging is a commonly used technique. It lets NLTK assign each word in your corpus a tag that describes its grammatical role, such as noun, verb or adjective. NLTK's default tagger uses the 36 tags of the Penn Treebank tagset, listed below.
Number | Tag | Description
--- | --- | ---
1. | CC | Coordinating conjunction
2. | CD | Cardinal number
3. | DT | Determiner
4. | EX | Existential there
5. | FW | Foreign word
6. | IN | Preposition or subordinating conjunction
7. | JJ | Adjective
8. | JJR | Adjective, comparative
9. | JJS | Adjective, superlative
10. | LS | List item marker
11. | MD | Modal
12. | NN | Noun, singular or mass
13. | NNS | Noun, plural
14. | NNP | Proper noun, singular
15. | NNPS | Proper noun, plural
16. | PDT | Predeterminer
17. | POS | Possessive ending
18. | PRP | Personal pronoun
19. | PRP$ | Possessive pronoun
20. | RB | Adverb
21. | RBR | Adverb, comparative
22. | RBS | Adverb, superlative
23. | RP | Particle
24. | SYM | Symbol
25. | TO | to
26. | UH | Interjection
27. | VB | Verb, base form
28. | VBD | Verb, past tense
29. | VBG | Verb, gerund or present participle
30. | VBN | Verb, past participle
31. | VBP | Verb, non-3rd person singular present
32. | VBZ | Verb, 3rd person singular present
33. | WDT | Wh-determiner
34. | WP | Wh-pronoun
35. | WP$ | Possessive wh-pronoun
36. | WRB | Wh-adverb
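Many of the tags above share a prefix: every noun tag starts with NN, every verb tag with VB, every adjective tag with JJ and every adverb tag with RB. The sketch below uses that observation to collapse fine-grained tags into coarse word classes. The helper `coarse_class` and the `COARSE_PREFIXES` mapping are hypothetical illustrations, not part of NLTK, and the chosen grouping is only one reasonable assumption.

```python
# Hypothetical helper (not an NLTK API): collapse fine-grained
# Penn Treebank tags into a few coarse word classes by tag prefix.
COARSE_PREFIXES = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS (but not WRB or RP)
    "PRP": "pronoun",   # PRP, PRP$
    "WP": "pronoun",    # WP, WP$
}


def coarse_class(tag: str) -> str:
    """Return a coarse word class for a Penn Treebank tag, or 'other'."""
    for prefix, word_class in COARSE_PREFIXES.items():
        if tag.startswith(prefix):
            return word_class
    return "other"


for tag in ["NNS", "VBD", "JJR", "RBS", "PRP$", "WRB", "DT", "IN"]:
    print(tag, "->", coarse_class(tag))
```

Tags without a listed prefix, such as DT, IN or WRB, fall through to "other"; extend the mapping if your application needs finer groups.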
The Python code is as simple as in the earlier cases.
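```python
import nltk
from nltk.tokenize import word_tokenize

# word_tokenize and pos_tag need NLTK's tokenizer and tagger models;
# download them once with nltk.download() if they are missing.

# Tag the words of a short two-sentence text
sent = "This is about the life. LIfe is awesome."
sent_words = word_tokenize(sent)
print(nltk.pos_tag(sent_words))

# Tag a list of verbs given without any sentence context
sent_two = "run work give shoot"
print(nltk.pos_tag(word_tokenize(sent_two)))

# Tag a mix of verbs, a pronoun and wh-words
sent_three = "is am I are who when"
print(nltk.pos_tag(word_tokenize(sent_three)))
```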
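`nltk.pos_tag` returns a list of `(word, tag)` tuples, so its output is easy to post-process with plain Python. The sketch below counts how often each tag appears and pulls out all words whose tag starts with a given prefix (for example "NN" for every kind of noun). The function names are hypothetical, and the `sample` list is hand-written for illustration only; it is not real tagger output.

```python
from collections import Counter


def tag_counts(tagged):
    """Count how often each tag occurs in a list of (word, tag) pairs,
    the same shape that nltk.pos_tag returns."""
    return Counter(tag for _word, tag in tagged)


def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with `prefix`, e.g. 'NN' for nouns."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Hand-written (word, tag) pairs for illustration; not actual NLTK output.
sample = [("dogs", "NNS"), ("run", "VBP"), ("fast", "RB"), ("Paris", "NNP"), (".", ".")]

print(tag_counts(sample))
print(words_with_tag_prefix(sample, "NN"))
```

With NLTK and its models installed, you could pass `nltk.pos_tag(word_tokenize(sent))` in place of `sample`.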
The result will be as follows: