Natural Language Processing with Python NLTK part 2 - Stop Words

Stop words are words we ignore because they add little meaning to a sentence. Words like "the", "is", and "at" can be removed to make the meaning of a sentence easier to extract. NLTK ships a list of stop words that we can easily use as a filter. Let's see how it works.

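# One-time setup if the NLTK data is missing:
#   import nltk; nltk.download('stopwords'); nltk.download('punkt')
#   (recent NLTK releases ask for 'punkt_tab' instead of 'punkt')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize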

sent = "As you can see this is the blog of myself which is written by Anjula"

w = word_tokenize(sent)

# set English stop words
stop_words = set(stopwords.words('english'))

# list of standard stop words in English
print(stop_words)

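# empty lists to hold the stop words and the remaining words
stop_words_in_sent = []
non_stop_words = []

# Sort each token into one of the two lists. NLTK's stop words are
# lowercase, so compare the lowercased token ("As" then matches "as")
for x in w:
    if x.lower() not in stop_words:
        non_stop_words.append(x)
    else:
        stop_words_in_sent.append(x)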

# print result
print(non_stop_words)
print(stop_words_in_sent)

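The code is as simple as that. Running it prints the full set of English stop words first, then the words from the sentence that are not stop words, and finally the stop words found in the sentence.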
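
The same filtering can also be written more compactly with list comprehensions. The sketch below is only an illustration and makes a few assumptions of its own: SAMPLE_STOP_WORDS is a small hand-picked set (NLTK's real English list is much longer), split_stop_words is a hypothetical helper name, and str.split() stands in for word_tokenize so that the snippet runs without downloading any NLTK data.

```python
# Hypothetical, hand-picked stop words for illustration only;
# stopwords.words('english') from NLTK contains many more entries.
SAMPLE_STOP_WORDS = {"as", "you", "can", "this", "is", "the",
                     "of", "myself", "which", "by"}


def split_stop_words(tokens, stop_words):
    """Return (non_stop_words, stop_words_found), matching case-insensitively."""
    non_stop = [t for t in tokens if t.lower() not in stop_words]
    found = [t for t in tokens if t.lower() in stop_words]
    return non_stop, found


sent = "As you can see this is the blog of myself which is written by Anjula"

# str.split() is a stand-in for word_tokenize (assumption: no punctuation here)
non_stop_words, stop_words_in_sent = split_stop_words(sent.split(), SAMPLE_STOP_WORDS)

print(non_stop_words)       # ['see', 'blog', 'written', 'Anjula']
print(stop_words_in_sent)
```

To use it with NLTK, pass word_tokenize(sent) and set(stopwords.words('english')) instead of the stand-ins. The result may then differ from the one shown, because NLTK's list covers more words than this sample set.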
