This project implements a sentiment analysis model for Twitter posts, built on a bidirectional LSTM (Long Short-Term Memory) neural network.
The model analyzes Twitter posts and classifies them into four sentiment categories:
- Positive
- Negative
- Neutral
- Irrelevant
The project uses a Twitter entity sentiment analysis dataset containing:
- Training data: 74,681 tweets
- Testing data: 999 tweets
Each tweet is labeled with a sentiment category and includes the text content.
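Since the model is trained with categorical crossentropy and has a 4-unit softmax output, the text labels need to become one-hot vectors at some point. The snippet below is a minimal sketch of that step, not the notebook's actual code. The `LABELS` ordering and the `one_hot_labels` helper are assumptions made for illustration.

```python
import numpy as np

# Assumed label order; the notebook may index the classes differently.
LABELS = ["Positive", "Negative", "Neutral", "Irrelevant"]
LABEL_TO_INDEX = {name: index for index, name in enumerate(LABELS)}


def one_hot_labels(labels):
    """Map sentiment names to one-hot rows of shape (n_samples, 4)."""
    indices = np.array([LABEL_TO_INDEX[label] for label in labels])
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(len(LABELS), dtype="float32")[indices]


encoded = one_hot_labels(["Neutral", "Positive"])
print(encoded.shape)
```

An unknown label raises `KeyError`, which is usually preferable to silently assigning a wrong class.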
The model uses a bidirectional LSTM architecture with the following components:
- Embedding layer
- Dropout layer (0.5)
- Bidirectional LSTM layer (150 units)
- Dense layer with ReLU activation (32 units)
- Output layer with softmax activation (4 units)
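The layer stack above could be written in Keras roughly as follows. This is a sketch under assumptions: the README does not state the vocabulary size, embedding dimension, or padded sequence length, so `VOCAB_SIZE`, `EMBEDDING_DIM`, and `MAX_LENGTH` are placeholder values, and `build_model` is a hypothetical helper name.

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (
    LSTM, Bidirectional, Dense, Dropout, Embedding, Input,
)

# Placeholder values (assumptions); the README does not specify them.
VOCAB_SIZE = 20000
EMBEDDING_DIM = 100
MAX_LENGTH = 50


def build_model(vocab_size=VOCAB_SIZE, embedding_dim=EMBEDDING_DIM,
                max_length=MAX_LENGTH):
    """Stack the layers in the order listed in the README."""
    return Sequential([
        Input(shape=(max_length,)),             # padded token-id sequences
        Embedding(vocab_size, embedding_dim),   # embedding layer
        Dropout(0.5),                           # dropout layer (0.5)
        Bidirectional(LSTM(150)),               # bidirectional LSTM, 150 units
        Dense(32, activation="relu"),           # dense layer, 32 units
        Dense(4, activation="softmax"),         # one probability per class
    ])


model = build_model()
model.summary()
```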
The text preprocessing pipeline includes:
- Removing extra whitespace
- Removing special characters
- Removing single characters
- Converting to lowercase
- Tokenization
- Lemmatization
- Stopword removal
- Removing words shorter than 3 characters
- Removing duplicates
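The steps above can be sketched in plain Python as shown below. This is an illustration, not the notebook's code: the tiny stopword set and the identity lemmatizer are stand-ins so the example runs without downloading NLTK data, and `clean_tweet` and `clean_corpus` are hypothetical names. With NLTK installed, `WordNetLemmatizer().lemmatize` and `set(stopwords.words("english"))` could be passed in instead. "Removing duplicates" is interpreted here as dropping tweets whose cleaned text is identical, which is an assumption.

```python
import re

# Tiny stand-in stopword list (assumption); swap in NLTK's English list.
DEFAULT_STOPWORDS = frozenset(
    {"the", "and", "this", "that", "with", "for", "are", "was"}
)


def clean_tweet(text, lemmatize=lambda word: word,
                stopwords=DEFAULT_STOPWORDS):
    """Return the cleaned tokens of one tweet."""
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # remove special characters
    text = re.sub(r"\b[A-Za-z]\b", " ", text)   # remove single characters
    # Whitespace is collapsed after the removals, which leave gaps behind.
    text = re.sub(r"\s+", " ", text).strip()
    text = text.lower()                          # convert to lowercase
    tokens = text.split()                        # whitespace tokenization
    tokens = [lemmatize(token) for token in tokens]
    tokens = [token for token in tokens if token not in stopwords]
    return [token for token in tokens if len(token) >= 3]


def clean_corpus(texts):
    """Clean every tweet and drop empty or duplicate results."""
    seen = set()
    cleaned = []
    for text in texts:
        key = " ".join(clean_tweet(text))
        if key and key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


print(clean_tweet("I LOVE   this game!!! It is a blast"))
print(clean_corpus(["Great game!", "great   GAME", "the"]))
```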
The model was trained with:
- Learning rate: 0.0001
- Optimizer: Adam
- Loss function: Categorical crossentropy
- Batch size: Default (32 for Keras `model.fit`)
- Epochs: 40
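As a rough sketch, these settings map onto Keras `compile` and `fit` calls like the ones below. The one-layer model and the random arrays are placeholders (assumptions) so the example runs on its own. The 25% validation split is also an assumption, and the demo runs a single epoch instead of the 40 listed above.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.optimizers import Adam

EPOCHS = 40        # value from the README
DEMO_EPOCHS = 1    # shortened for this sketch only

# Stand-in model (assumption); the real one is the bidirectional LSTM.
model = Sequential([Input(shape=(8,)), Dense(4, activation="softmax")])

model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

# Random placeholder data: 64 samples, 8 features, 4 one-hot classes.
rng = np.random.default_rng(0)
features = rng.random((64, 8), dtype=np.float32)
targets = np.eye(4, dtype="float32")[rng.integers(0, 4, size=64)]

# batch_size is left unset, so Keras falls back to its default of 32.
history = model.fit(
    features, targets,
    epochs=DEMO_EPOCHS,
    validation_split=0.25,
    verbose=0,
)
print(sorted(history.history))
```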
The model achieved:
- Training accuracy: ~95%
- Validation accuracy: ~88%
The project requires:
- Python 3.x
- TensorFlow
- Keras
- NLTK
- Pandas
- NumPy
- scikit-learn
To use the project:
- Install the required dependencies
- Prepare your dataset in the same format as the example
- Run the Jupyter notebook to train and evaluate the model
- Use the trained model to predict sentiment on new tweets
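For the prediction step, the softmax output has to be mapped back to a category name. The sketch below shows one way to do that, assuming the probabilities come from a call such as `model.predict(...)` on preprocessed, padded tweets. The `LABELS` order and the `decode_predictions` helper are assumptions and must match whatever encoding was used during training.

```python
import numpy as np

# Assumed class order; it must match the encoding used for training.
LABELS = ["Positive", "Negative", "Neutral", "Irrelevant"]


def decode_predictions(probabilities):
    """Pick the most probable category for each row of softmax output."""
    probabilities = np.asarray(probabilities)
    return [LABELS[index] for index in probabilities.argmax(axis=1)]


# Hand-written probabilities standing in for real model output.
example_output = [
    [0.10, 0.70, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
]
print(decode_predictions(example_output))
```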
This project is open source and available under the MIT License.
Created by Yossef Mohammed
- Email: [email protected]
- Phone: +20 112 607 8938