Natural Language Processing (NLP)

So far I have learnt how to apply Machine Learning to specific problems. I have primarily used classifier examples to understand ML and how it is applied. In all these examples, I worked on "clean data", meaning the data had already been processed and was ready for ML algorithms. In reality, this will rarely be the case. We have to extract the data and do a lot of processing before it can be fed to ML models. This pre-processing turns out to be a major chunk of the work in an ML project.

So as a next step, I need to start learning the techniques involved in data extraction, cleansing, etc. There is specific terminology for these processes, which I will start using as I go along. For now, I'll keep it in layman's language.

When I discussed this with a colleague who is an ML engineer, he suggested that I get into NLP. He said, 'If you get into NLP, you will end up working on end-to-end projects, from data extraction to prediction/classification.' It was a sound piece of advice! With that, I started learning NLP!

What is NLP, or Natural Language Processing? Until now, the input data fed to ML models was discrete and numeric in nature. But what if you want to use human speech and natural text as input data? ML models cannot understand these as they are, and on top of that, human speech and natural text are among the most complex kinds of data. So how do we deal with such data so that a machine can understand it? That's exactly what NLP is for! Of course, I have tried to put this in layman's language, but I hope you understand (maybe at a 10,000 ft level) what NLP is all about!

Reviews in Google Maps: an example of natural language text
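To make "so that a machine can understand it" a bit more concrete, here is a minimal sketch of one of the simplest ways to turn text into numbers: a bag-of-words count vector. This is my own illustration, not a specific library's method. The two sample reviews and the function names `tokenize` and `bag_of_words` are hypothetical. Real NLP pipelines do much more than this, such as smarter tokenization, stemming and stop-word removal.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical helper: lowercase the text and keep runs of letters,
    # digits and apostrophes as tokens (punctuation is dropped)
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(docs):
    # Hypothetical helper: build a sorted vocabulary from all documents,
    # then represent each document as a list of word counts
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        # One count per vocabulary word, in vocabulary order
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

# Made-up sample reviews, used only for illustration
reviews = [
    "Great food, great service!",
    "The service was slow.",
]
vocab, vectors = bag_of_words(reviews)
print(vocab)    # ['food', 'great', 'service', 'slow', 'the', 'was']
print(vectors)  # [[1, 2, 1, 0, 0, 0], [0, 0, 1, 1, 1, 1]]
```

Each review becomes a fixed-length list of numbers, which is the kind of discrete, numeric input the classifiers from earlier posts expect. Word order is lost in this representation, which is one reason more advanced NLP techniques exist.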

In the next series of posts, you will see articles related to NLP. I'm excited to get my hands dirty in the world of NLP!
