
This repository offers a beginner's guide to Natural Language Processing using NLTK in Python, covering essential techniques from text segmentation to named entity recognition.


ThomasHeim11/NLP-Beginner-Guide


Introduction 🌟

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans through natural language 💬. In this repository, we explore basic NLP tasks using the NLTK (Natural Language Toolkit) library in Python 🐍.


📚 In this repository, you'll find code examples that demonstrate various NLP techniques using NLTK. The examples cover the following topics:

Segmentation ✂️

In NLP, breaking text into sentences and words is a common first step. NLTK provides `sent_tokenize` for splitting text into sentences and `word_tokenize` for splitting it into words.

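```python
import nltk
nltk.download('punkt')      # Punkt sentence tokenizer models
nltk.download('punkt_tab')  # resource name used by NLTK 3.9+
from nltk.tokenize import sent_tokenize, word_tokenize

text = "Millions of people across the UK and beyond have celebrated..."
sentences = sent_tokenize(text)      # list of sentence strings
# Words of the third sentence; assumes the full text has at least three sentences
words = word_tokenize(sentences[2])
print(sentences)
print(words)
```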
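To see why a trained sentence tokenizer is worth downloading, compare it with a naive splitter. The sketch below uses a hypothetical helper, `naive_sent_split`, built only on the standard `re` module; it is not how NLTK works. It splits after every `.`, `!` or `?`, so it wrongly breaks after an abbreviation like "Dr.", which Punkt's pre-trained English model is designed to recognize.

```python
import re

def naive_sent_split(text):
    """Hypothetical helper: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]  # drop empty strings (e.g. for empty input)

sample = "Dr. Smith arrived. The crowd cheered!"
print(naive_sent_split(sample))
# ['Dr.', 'Smith arrived.', 'The crowd cheered!']
```

The abbreviation "Dr." ends up as its own "sentence", which is exactly the kind of mistake a learned model helps avoid.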

Punctuation Removal ✨

Removing punctuation is often useful before counting or comparing words, so that "celebrated," and "celebrated" are treated as the same word.

```python
import re

# Replace every character that is not an ASCII letter or digit with a space
text = re.sub(r"[^a-zA-Z0-9]", " ", sentences[2])
print(text)
```
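The regular expression above replaces every character outside `a-zA-Z0-9` with a space, so it also removes apostrophes and accented letters, not just punctuation. A minimal comparison, assuming a made-up sample string, with an alternative that deletes only the characters in the standard library's `string.punctuation`:

```python
import re
import string

sample = "Café's opening: a huge success!"

# Regex approach: anything that is not an ASCII letter or digit becomes a space
regex_clean = re.sub(r"[^a-zA-Z0-9]", " ", sample)

# Alternative: delete only ASCII punctuation characters, keep accented letters
table = str.maketrans("", "", string.punctuation)
translate_clean = sample.translate(table)

print(repr(regex_clean))      # 'Caf  s opening  a huge success '
print(repr(translate_clean))  # 'Cafés opening a huge success'
```

Which behavior is right depends on the task; the regex is stricter, while `str.translate` keeps non-ASCII letters intact.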

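Tokenization 🧙‍♂️

Tokenization splits text into individual tokens, usually words and punctuation marks. Here it runs on the punctuation-free text produced in the previous step.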

```python
from nltk.tokenize import word_tokenize

# Tokenize the cleaned text from the punctuation-removal step
words = word_tokenize(text)
print(words)
```
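`word_tokenize` does more than split on spaces: it also separates punctuation from the words around it. As a rough illustration of that idea (not NLTK's actual rules), here is a hypothetical `simple_tokenize` helper built on the standard `re` module. NLTK's Treebank-style rules handle cases this sketch does not, such as splitting "don't" into "do" and "n't".

```python
import re

def simple_tokenize(text):
    """Hypothetical stand-in for word_tokenize: runs of word characters
    become tokens, and every other non-space character is its own token."""
    return re.findall(r"\w+|[^\w\s]", text)

print(simple_tokenize("Crowds gathered, waving flags."))
# ['Crowds', 'gathered', ',', 'waving', 'flags', '.']
print(simple_tokenize("don't"))
# ['don', "'", 't']
```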

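Removal of Stop Words 🔇

Stop words are very common words such as "the", "of", and "and" that carry little meaning on their own, so they are often removed before analysis. NLTK ships a stop-word list for English and several other languages.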

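```python
nltk.download('stopwords')
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))  # build the lookup set once
# NLTK's list is lowercase, so compare the lowercased word
words = [w for w in words if w.lower() not in stop_words]
print(words)
```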
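The filtering step itself is just a membership test. A minimal sketch, assuming a tiny hypothetical stop-word list (NLTK's English list is much longer), showing why lowercasing matters: without it, a capitalized "The" would slip through.

```python
# Tiny, hypothetical stop-word list for illustration only
STOP_WORDS = {"the", "of", "and", "across", "have", "a"}

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens whose lowercase form is in the stop-word set."""
    # A set gives constant-time membership checks; lowercasing catches "The" too
    return [t for t in tokens if t.lower() not in stop_words]

tokens = ["The", "people", "of", "the", "UK", "have", "celebrated"]
print(remove_stop_words(tokens))
# ['people', 'UK', 'celebrated']
```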

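Stemming and Lemmatization 🌱

Stemming and lemmatization both reduce words to a base form. Stemming strips suffixes using heuristic rules (the Porter stemmer turns "running" into "run"), while lemmatization looks words up in a dictionary such as WordNet, so it can also map irregular forms like "mice" to "mouse".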

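```python
nltk.download('wordnet')
from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
print([stemmer.stem(w) for w in words])          # rule-based suffix stripping
print([lemmatizer.lemmatize(w) for w in words])  # WordNet lookup; treats words as nouns unless pos= is given
```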
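To make the difference concrete, here is a toy version of each approach. The suffix rules and the lookup table below are made up for illustration and are far simpler than `PorterStemmer` or `WordNetLemmatizer`; the point is that a stemmer can produce non-words and cannot handle irregular forms, while a dictionary lookup can.

```python
# Toy illustration only: these rules and this table are hypothetical
SUFFIXES = ("ing", "ed", "es", "s")
LEMMAS = {"mice": "mouse", "ran": "run"}

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in a dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)

for w in ["celebrated", "running", "mice", "ran"]:
    print(w, "->", toy_stem(w), "/", toy_lemmatize(w))
# celebrated -> celebrat / celebrated
# running -> runn / running
# mice -> mice / mouse
# ran -> ran / run
```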

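Part of Speech Tagging 🏷️

Part-of-speech tagging labels each word with its grammatical category, such as noun, verb, or adjective. NLTK's `pos_tag` uses Penn Treebank tags such as `NN` (noun), `NNP` (proper noun), and `VBD` (past-tense verb).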

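```python
nltk.download('averaged_perceptron_tagger')      # perceptron tagger model
nltk.download('averaged_perceptron_tagger_eng')  # resource name used by NLTK 3.9+
pos_tags = nltk.pos_tag(words)  # list of (word, tag) tuples
print(pos_tags)
```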
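Because `pos_tag` returns plain `(word, tag)` tuples, the output is easy to filter. A minimal sketch, assuming hypothetical tagged data in that shape, that pulls out all nouns (every Penn Treebank noun tag starts with `NN`) or all verbs (`VB...`):

```python
# Hypothetical tagged output, in the same (word, tag) shape pos_tag returns
tagged = [("Millions", "NNS"), ("people", "NNS"), ("UK", "NNP"),
          ("celebrated", "VBD")]

def words_with_tag_prefix(tagged, prefix):
    """Return the words whose tag starts with prefix, e.g. 'NN' for all nouns."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

print(words_with_tag_prefix(tagged, "NN"))  # ['Millions', 'people', 'UK']
print(words_with_tag_prefix(tagged, "VB"))  # ['celebrated']
```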

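Named Entity Recognition 🌟

Named Entity Recognition (NER) finds names in text, such as people, organizations, and places, and labels each one with its type. NLTK's `ne_chunk` works on the POS-tagged words from the previous step.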

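```python
nltk.download('maxent_ne_chunker')      # named-entity chunker model
nltk.download('maxent_ne_chunker_tab')  # resource name used by NLTK 3.9+
nltk.download('words')                  # word list used by the chunker
from nltk import ne_chunk

ner_tree = ne_chunk(pos_tags)  # nltk.Tree; entities appear as labeled subtrees
print(ner_tree)
```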
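A common next step is to turn the tree into a flat list of entities. NLTK's `nltk.chunk.tree2conlltags` converts a chunk tree into `(word, POS tag, IOB tag)` triples, where `B-` marks the first word of an entity, `I-` a continuation, and `O` a word outside any entity. The sketch below assumes hypothetical triples in that shape (the labels are made up, not actual NLTK output) and merges consecutive entity words with a hypothetical `group_entities` helper.

```python
# Hypothetical (word, POS, IOB) triples; labels are illustrative only
triples = [
    ("Elon", "NNP", "B-PERSON"), ("Musk", "NNP", "I-PERSON"),
    ("arrived", "VBD", "O"), ("at", "IN", "O"), ("the", "DT", "O"),
    ("Staples", "NNP", "B-FACILITY"), ("Center", "NNP", "I-FACILITY"),
]

def group_entities(triples):
    """Merge consecutive B-/I- tokens into (entity text, label) pairs."""
    entities = []
    current_words, current_label = [], None
    for word, _pos, iob in triples:
        starts_new = iob.startswith("B-") or (
            iob.startswith("I-") and iob[2:] != current_label)
        if starts_new:
            if current_words:  # close the previous entity
                entities.append((" ".join(current_words), current_label))
            current_words, current_label = [word], iob[2:]
        elif iob.startswith("I-"):  # continue the current entity
            current_words.append(word)
        else:  # "O": outside any entity
            if current_words:
                entities.append((" ".join(current_words), current_label))
            current_words, current_label = [], None
    if current_words:  # entity running to the end of the input
        entities.append((" ".join(current_words), current_label))
    return entities

print(group_entities(triples))
# [('Elon Musk', 'PERSON'), ('Staples Center', 'FACILITY')]
```

Treating a stray `I-` tag with a new label as the start of an entity keeps the helper tolerant of slightly inconsistent tagging.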

Examples 🌠

Here is Named Entity Recognition applied to a new sentence:

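```python
from nltk import ne_chunk, pos_tag, word_tokenize

text = "Twitter CEO Elon Musk arrived at the Staples Center..."
ner_tree = ne_chunk(pos_tag(word_tokenize(text)))  # tokenize, tag, then chunk entities
print(ner_tree)
```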

Feel free to explore and extend these examples to deepen your understanding of NLP concepts and of the NLTK library 📚✨

Happy learning!

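Getting Started 🚀

To run the code examples in this repository, make sure you have Python and NLTK installed. You can install NLTK using the following command:

```bash
pip install nltk
```

The data each example needs (tokenizer models, stop-word list, tagger, and chunker) is fetched by the `nltk.download(...)` calls in that example.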

Thank you! 🙌

If you appreciated this, feel free to follow! 🌟🔮

Thomas Heim (LinkedIn)
