Natural Language Toolkit (NLTK)

To work on Natural Language Processing (NLP) in Python, we need a library. One popular choice is the Natural Language Toolkit, or NLTK. As the NLTK website puts it:

NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

Some of these terms, such as stemming and semantic reasoning, are new to me, and I'll explore them in the future. For now, I wanted to test the library, so I took the example provided on the website.

First, I imported the nltk library:

import nltk

Then I formed a sample sentence:

sentence = "I'm learning Natural Language Processing. Natural Language Processing is also called as NLP"

Then I tokenized the text into words:

tokens = nltk.word_tokenize(sentence)

When I ran this code, I got this error:

LookupError: ***************************************************

Resource punkt not found. Please use the NLTK Downloader to obtain the resource:

import nltk 

nltk.download('punkt') 

Attempted to load tokenizers/punkt/english.pickle

***************************************************

It turned out that NLTK could not find the punkt resource, the pre-trained tokenizer model that word_tokenize relies on. So, as the error message suggested, I ran the download command below, and that resolved the error:

nltk.download('punkt')
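Calling nltk.download every time the script runs is unnecessary once the resource is installed. Here is a minimal sketch of a guard that downloads only when the lookup fails. The helper name ensure_nltk_resource is my own and not part of NLTK; it assumes the resource path and download name are taken from the error message (newer NLTK releases may ask for a differently named resource such as punkt_tab, so use whatever name your error shows).

```python
import nltk


def ensure_nltk_resource(resource_path, download_name):
    """Download an NLTK resource only if it is not already installed.

    Hypothetical helper, not part of NLTK. Both arguments are assumptions
    supplied by the caller, e.g. the names shown in the LookupError.
    Returns True if a download was attempted, False otherwise.
    """
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find(resource_path)
        return False
    except LookupError:
        # quiet=True suppresses the downloader's progress output.
        nltk.download(download_name, quiet=True)
        return True


if __name__ == "__main__":
    # Names taken from the error message shown above.
    ensure_nltk_resource("tokenizers/punkt", "punkt")
```

The first run downloads the resource; later runs find it locally and skip the download.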

Finally, the contents of the tokens variable look like this:

['I',
 "'m",
 'learning',
 'Natural',
 'Language',
 'Processing',
 '.',
 'Natural',
 'Language',
 'Processing',
 'is',
 'also',
 'called',
 'as',
 'NLP']
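To see what the tokenizer did beyond splitting on spaces, the output above can be compared with Python's built-in str.split(). This is only a small sketch: the tokens list is copied from the output above rather than recomputed, so it runs without NLTK, and the variable names naive and only_in_tokens are my own.

```python
sentence = ("I'm learning Natural Language Processing. "
            "Natural Language Processing is also called as NLP")

# Naive approach: split on whitespace only.
naive = sentence.split()

# Tokens copied from the word_tokenize output shown above.
tokens = ['I', "'m", 'learning', 'Natural', 'Language', 'Processing', '.',
          'Natural', 'Language', 'Processing', 'is', 'also', 'called', 'as',
          'NLP']

# Whitespace splitting gives 13 items, the tokenizer gave 15.
print(len(naive), len(tokens))

# Items the tokenizer produced that whitespace splitting did not.
only_in_tokens = [t for t in tokens if t not in naive]
print(only_in_tokens)  # ['I', "'m", '.']
```

The difference comes from the contraction "I'm" and the full stop, which whitespace splitting leaves attached to the neighbouring words.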

As you can see, the tokenizer has put every word of the sentence into a list. It also split the contraction "I'm" into 'I' and "'m" and kept the full stop as its own token. Essentially, I did some basic text processing using the NLTK library. Of course, this is just the beginning. There is a long way to go, and I need to explore many more features of this library and how they can be applied to NLP. I'm happy to have taken another step forward in the world of ML! As I continue to explore NLP, I'll keep sharing what I learn here.
