This project classifies SQL queries as either malicious or non-malicious using several machine learning algorithms. The models were trained on the SQL Injection Dataset from Kaggle.
The following algorithms were implemented and evaluated for classification:
- Long Short-Term Memory (LSTM)
- Support Vector Machine (SVM)
- Random Forest
- Decision Trees
- Logistic Regression
To preprocess the data, we used scikit-learn's CountVectorizer to tokenize the SQL queries and convert them into token-count vectors. The dataset consists of three columns:
- Serial Number
- Query
- Label (0 for non-malicious, 1 for malicious)
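The vectorization step can be sketched roughly as follows. This is a minimal illustration, not the project's actual preprocessing code: the four queries are made-up stand-ins for the Query and Label columns of the Kaggle data, and CountVectorizer is used with its default settings, which is an assumption.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy rows standing in for the Query and Label columns (not real dataset rows).
queries = [
    "SELECT name FROM users WHERE id = 5",
    "SELECT * FROM users WHERE id = 1 OR 1=1 --",
    "UPDATE orders SET status = 'shipped' WHERE id = 12",
    "' UNION SELECT username, password FROM accounts --",
]
labels = [0, 1, 0, 1]  # 0 = non-malicious, 1 = malicious

# Default settings (an assumption): lowercase the text and keep tokens
# of two or more word characters.
vectorizer = CountVectorizer()

# X is a sparse matrix with one row per query and one column per token.
X = vectorizer.fit_transform(queries)

print(X.shape)
print(sorted(vectorizer.vocabulary_)[:10])
```

With the default token pattern, punctuation such as quotes and comment markers is discarded, so the classifier only sees word-level counts.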
After evaluating the performance of different algorithms, the Support Vector Machine (SVM) classifier achieved the highest accuracy. Therefore, SVM was selected as the final model for deployment.
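A comparison of this kind could look roughly like the sketch below. It is an illustration under several assumptions: the data is a tiny made-up sample, the hyperparameters (including the linear SVM kernel) are guesses rather than the project's settings, and the LSTM is left out because it needs a deep learning framework. Accuracies on toy data like this say nothing about the real results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Made-up sample queries; the real project uses the Kaggle dataset.
queries = [
    "SELECT name FROM users WHERE id = 5",
    "UPDATE orders SET status = 'shipped' WHERE id = 12",
    "INSERT INTO logs (message) VALUES ('login ok')",
    "SELECT title, price FROM products ORDER BY price",
    "SELECT * FROM users WHERE id = 1 OR 1=1 --",
    "' UNION SELECT username, password FROM accounts --",
    "admin' OR '1'='1",
    "1; DROP TABLE users --",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = non-malicious, 1 = malicious

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    queries, labels, test_size=0.25, stratify=labels, random_state=0
)

# Hyperparameters here are illustrative assumptions.
candidates = {
    "SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, classifier in candidates.items():
    # Each candidate gets the same CountVectorizer preprocessing.
    model = make_pipeline(CountVectorizer(), classifier)
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.2f}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary fitted on the training split only, which avoids leaking test data into the features.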
The web application was built using Streamlit, allowing users to input SQL queries and receive predictions on their maliciousness. The application has also been deployed on Streamlit Cloud for accessibility.
To run this application locally, follow these steps:
- Clone this repository to your local machine:
  git clone https://github.com/thegeek36/sql-injection-classifier.git
- Navigate to the project directory:
  cd sql-injection-classifier
- Create and activate a virtual environment (recommended):
  python -m venv venv
  - Windows:
    venv\Scripts\activate
  - macOS and Linux:
    source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Run the Streamlit application:
  streamlit run app.py
This project provided valuable insights into machine learning techniques for classifying SQL queries. Although it was a small-scale project, it served as an opportunity to revisit and reinforce fundamental concepts in machine learning.
Thank you for your interest in this project!