
Natural Language Processing with Python NLTK part 5 - Chunking and Chinking

Natural Language Processing


Using regular expression modifiers, we can chunk the part-of-speech (PoS) tagged words from the earlier example. Chunking groups consecutive tagged words into phrases according to a chunk rule, which is a regular expression written over PoS tags. Chinking is the opposite: a chink rule defines which tags to exclude from a chunk.
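To make the idea concrete, here is a minimal pure-Python sketch of a chunk rule. It writes the tag sequence as one string such as "<DT><JJ><NN>" and runs an ordinary regular expression over it. NLTK's RegexpParser works on roughly the same idea, but the helper `chunk` and the hand-assigned tags below are hypothetical and only for illustration.

```python
import re

# Hand-tagged sentence (tags assumed for illustration, not produced by a tagger)
tagged = [("The", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Chunk rule: optional determiner, any number of adjectives, then a noun.
# [^<>]* stands for "any further tag characters" (so NN also matches NNS, NNP, ...)
chunk_rule = re.compile(r"(<DT>)?(<JJ[^<>]*>)*<NN[^<>]*>")


def chunk(tagged, rule):
    """Hypothetical helper: return the word groups whose tags match rule."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    chunks = []
    for m in rule.finditer(tag_string):
        # Every tag starts with "<", so counting them turns a character
        # offset into a token index.
        start = tag_string.count("<", 0, m.start())
        end = tag_string.count("<", 0, m.end())
        chunks.append([word for word, _ in tagged[start:end]])
    return chunks


print(chunk(tagged, chunk_rule))
```

Working on a single tag string keeps the rule readable and lets the regular expression engine do all the matching; the only extra step is mapping character offsets back to token positions.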

Here is a list of regular expression modifiers in Python:

  • {1,3} = expect 1 to 3 repetitions of the preceding item, e.g. \d{1,3} matches 1 to 3 digits
  • + = match 1 or more repetitions
  • ? = match 0 or 1 repetitions
  • * = match 0 or more repetitions
  • $ = matches the end of a string
  • ^ = matches the start of a string
  • | = matches either/or. Example: x|y will match either x or y
  • [] = character class; matches any one character in the set or range, e.g. [a-z]
  • {x} = expect exactly x repetitions of the preceding item
  • {x,y} = expect x to y repetitions of the preceding item
source: https://pythonprogramming.net/regular-expressions-regex-tutorial-python-3/
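To see these modifiers in action, here is a small sketch using Python's built-in re module. The sample strings are made up for illustration; each entry exercises one modifier from the list.

```python
import re

# One small example per modifier; re.findall returns every non-overlapping match
examples = {
    "{1,3}": re.findall(r"\d{1,3}", "7 42 1234"),    # 1 to 3 digits per match
    "+": re.findall(r"ab+", "a ab abbb"),            # 'a' then one or more 'b'
    "?": re.findall(r"colou?r", "color colour"),     # optional 'u'
    "*": re.findall(r"ab*", "a ab abbb"),            # 'a' then zero or more 'b'
    "$": re.findall(r"\w+$", "hello big world"),     # word at the end of the string
    "^": re.findall(r"^\w+", "hello big world"),     # word at the start of the string
    "|": re.findall(r"cat|dog", "cat bird dog"),     # either alternative
    "[]": re.findall(r"[a-c]", "abcdef"),            # any one character in the range
    "{x}": re.findall(r"\d{4}", "in 2019 and 19"),   # exactly 4 digits
    "{x,y}": re.findall(r"\d{2,3}", "1 12 123"),     # 2 to 3 digits
}

for modifier, matches in examples.items():
    print(f"{modifier:>6} -> {matches}")
```

Note that "1234" yields "123" and "4" under {1,3}: the quantifier is greedy, so each match takes as many digits as allowed and the scan resumes after it.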

Chunking


import nltk
from nltk.tokenize import word_tokenize

# word_tokenize and pos_tag need NLTK's tokenizer and tagger data;
# if it is missing, run nltk.download() once to fetch it

# POS tagging
sent = "This will be chunked. This is for Test. World is awesome. Hello world."
tagged = nltk.pos_tag(word_tokenize(sent))
print(tagged)

# chunk rule: an optional run of nouns (NN, NNS, NNP, NNPS)
# followed by an optional run of verbs (VB, VBD, VBG, VBN, VBP, VBZ)
chunkRule = r"""chunk: {<NN.*>*<VB.*>*}"""

My_parser = nltk.RegexpParser(chunkRule)
chunked = My_parser.parse(tagged)

print(chunked)


This will give the output as follows:


You can also view the tree graphically, which is easier to understand. Tree.draw() opens the tree in a separate window; it uses Tkinter (not matplotlib), so it needs a Python installation with Tkinter and a display.

My_parser = nltk.RegexpParser(chunkRule)
chunked = My_parser.parse(nltk.pos_tag(word_tokenize(sent)))

# draw the tree in a new window (blocks until the window is closed)
chunked.draw()
print(chunked)

This will draw the tree as follows.
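When no display is available, or when you want the chunks as data rather than a picture, the tree can be inspected in code. The sketch below skips the tagger and feeds RegexpParser a hand-tagged sentence (the tags are assumed; a real tagger may assign different ones). It collects the words of each chunk with Tree.subtrees() and flattens the tree into (word, tag, IOB) triples with tree2conlltags from nltk.chunk, where B- marks the first word of a chunk, I- a word inside it and O a word outside any chunk.

```python
import nltk
from nltk.chunk import tree2conlltags

# Hand-tagged tokens (tags assumed for illustration; no tagger model needed)
tagged = [("World", "NN"), ("is", "VBZ"), ("awesome", "JJ"), (".", "."),
          ("Hello", "NNP"), ("world", "NN"), (".", ".")]

chunked = nltk.RegexpParser(r"chunk: {<NN.*>*<VB.*>*}").parse(tagged)

# 1) Collect the words of every subtree labelled "chunk"
chunks = [
    [word for word, tag in subtree.leaves()]
    for subtree in chunked.subtrees(filter=lambda t: t.label() == "chunk")
]
print(chunks)

# 2) Flatten the tree into (word, tag, IOB) triples
iob = tree2conlltags(chunked)
for word, tag, label in iob:
    print(word, tag, label)
```

The IOB form is handy when chunks need to be stored line by line or compared against another chunker's output.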





Chinking

Chinking is the process of excluding tokens from a chunk. In this example we first chunk all the tags and then chink (exclude) the nouns, which splits the chunk wherever a noun appears.
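The idea can be sketched in plain Python, assuming hand-assigned tags: start with one chunk that holds every token, then cut out each token whose tag should be chinked, so the remaining runs become separate chunks. The helper `chink` is hypothetical and only illustrates the concept.

```python
# Hand-tagged sentence (tags assumed for illustration)
tagged = [("I", "PRP"), ("will", "MD"), ("chink", "VB"), ("all", "PDT"),
          ("the", "DT"), ("nouns", "NNS"), ("today", "NN"), ("quickly", "RB")]


def chink(tagged, is_chink):
    """Hypothetical helper: treat the whole sentence as one chunk, remove
    tokens for which is_chink(tag) is true, and return the remaining runs."""
    chunks, current = [], []
    for word, tag in tagged:
        if is_chink(tag):
            # A chinked token closes the current run of words
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks


# Chink every noun tag (NN, NNS, NNP, NNPS)
print(chink(tagged, lambda tag: tag.startswith("NN")))
```

Adjacent chinked tokens ("nouns today") simply produce one gap, which matches the behaviour of a chink rule with a + quantifier.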

import nltk
from nltk.tokenize import word_tokenize

sent = "This will be the day that I will chink all the nouns. Everything will be there. Except nouns."
tagged = nltk.pos_tag(word_tokenize(sent))
print(tagged)

# first chunk everything ({<.*>+}), then chink (remove) every noun tag;
# <NN.*> matches NN, NNS, NNP and NNPS
chunkRule = r"""Chunk: {<.*>+}
                        }<NN.*>+{"""

MyParser = nltk.RegexpParser(chunkRule)
chunked = MyParser.parse(tagged)

chunked.draw()
print(chunked)
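To check the chinking result without opening a window, the sketch below runs the same kind of grammar on a hand-tagged version of the first sentence (the tags are assumed; the tagger may choose differently) and lists the words left in each chunk. Every noun is removed from the chunk and stays at the top level of the tree, so the original single chunk is split into pieces.

```python
import nltk

# Hand-tagged tokens (tags assumed for illustration; no tagger model needed)
tagged = [("This", "DT"), ("will", "MD"), ("be", "VB"), ("the", "DT"),
          ("day", "NN"), ("that", "IN"), ("I", "PRP"), ("will", "MD"),
          ("chink", "VB"), ("all", "PDT"), ("the", "DT"), ("nouns", "NNS"),
          (".", ".")]

grammar = r"""Chunk: {<.*>+}
                     }<NN.*>+{"""
chunked = nltk.RegexpParser(grammar).parse(tagged)

# Each remaining chunk is a stretch of words between chinked nouns
chunks = [
    [word for word, tag in subtree.leaves()]
    for subtree in chunked.subtrees(filter=lambda t: t.label() == "Chunk")
]
print(chunks)

# The chinked nouns are left outside any chunk
outside = [leaf for leaf in chunked if not isinstance(leaf, nltk.Tree)]
print(outside)
```

Note that the final "." ends up as a chunk of its own, because chinking "nouns" leaves it isolated at the end of the sentence.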

Outputs:






