Natural Language Processing - Lemmatizing
Stemming and lemmatizing go hand in hand. Both reduce a word to a base form, but in different ways. Stemming simply cuts off the last part of the word, so the result is not always a meaningful word. Lemmatizing is more concerned with getting a meaningful word: it removes the inflectional part and returns the vocabulary (dictionary) form of the word. Let's understand this with a simple example.
from nltk.stem import PorterStemmer, WordNetLemmatizer

# The lemmatizer needs the WordNet corpus: run nltk.download("wordnet") once.
# Create the stemmer and lemmatizer once and reuse them in every loop.
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# stemming and lemmatizing verbs
words_verbs = ["run", "ran", "running", "gave", "took", "shot"]
print("*************Stemming verbs********************")
for w in words_verbs:
    print(stemmer.stem(w))
print("*************Lemmatizing verbs********************")
for w in words_verbs:
    # pos="v" tells the lemmatizer to treat the word as a verb
    print(lemmatizer.lemmatize(w, pos="v"))

# stemming and lemmatizing nouns
words_nouns = ["goons", "clocks", "machines", "wolves", "shelves"]
print("*************Stemming nouns********************")
for x in words_nouns:
    print(stemmer.stem(x))
print("*************Lemmatizing nouns********************")
for x in words_nouns:
    print(lemmatizer.lemmatize(x, pos="n"))

# lemmatizing adjectives
words_adjective = ["better", "slower", "slowest", "strongest", "busiest"]
print("*************Lemmatizing adjectives********************")
for x in words_adjective:
    print(lemmatizer.lemmatize(x, pos="a"))
In the example above I have lemmatized verbs, nouns and adjectives separately to show the effect. The result was as follows. You can now clearly see the difference between stemming and lemmatizing.
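To see why the two behave differently, here is a minimal toy sketch that needs no NLTK at all. It is not how PorterStemmer or WordNetLemmatizer work internally: the suffix list, the lookup table, and the names toy_stem and toy_lemmatize are all made up for illustration. It only shows the idea that a stemmer applies blind suffix rules, while a lemmatizer looks the word up for a given part of speech.

```python
# Toy illustration only: these rules and this table are invented for the
# example and are far simpler than PorterStemmer or WordNet.
SUFFIXES = ("ing", "es", "s")

# Hypothetical mini-vocabulary keyed by (word, part of speech).
LEMMA_TABLE = {
    ("ran", "v"): "run",
    ("running", "v"): "run",
    ("took", "v"): "take",
    ("wolves", "n"): "wolf",
    ("better", "a"): "good",
}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos):
    """Look the word up for the given part of speech; fall back to the word."""
    return LEMMA_TABLE.get((word, pos), word)


for word, pos in [("running", "v"), ("took", "v"), ("wolves", "n"), ("better", "a")]:
    print(word, "->", toy_stem(word), "|", toy_lemmatize(word, pos))
```

In this sketch "running" stems to "runn" and "wolves" to "wolv", neither of which is a word, while the lookup returns "run" and "wolf". Irregular forms such as "took" pass through the toy stemmer unchanged because no suffix rule matches them. The pos argument matters too: toy_lemmatize("better", "n") just returns "better", because the table only knows it as an adjective.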