simple_object_localization_app

This project localizes and classifies an object in an image. Note: it only detects cucumber, eggplant, and mushroom, because the dataset I used contains only those objects. I use Flask as the backend to create an API and HTML as the interface to turn it into a web app.

Dataset

You can get the dataset from Kaggle - Image Localization Dataset. The dataset contains object images in JPG format and XML files containing the annotations for the corresponding images.

image

Notebook

I built the model in an .ipynb file using Google Colab. Here is an explanation of the notebook:

  1. First, I tested plotting an image with its bounding box. Using the xml.etree.ElementTree library, I extracted xmin, ymin, xmax, and ymax from the XML file that matches the image, then drew the bounding box on the image with cv2.rectangle() using those coordinates. This is the result:

image
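The XML extraction in step 1 can be sketched as follows (a minimal sketch assuming a Pascal VOC-style annotation layout; the helper name `parse_annotation` is hypothetical):

```python
import xml.etree.ElementTree as ET

def parse_annotation(xml_string):
    # Extract the label and box corners from a Pascal VOC-style annotation
    root = ET.fromstring(xml_string)
    label = root.find(".//name").text
    box = root.find(".//bndbox")
    coords = tuple(int(box.find(tag).text)
                   for tag in ("xmin", "ymin", "xmax", "ymax"))
    return label, coords
```

The box can then be drawn on the loaded image with `cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 255, 0), 2)`.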

  2. Then I read all the XML files to extract the label, xmin, ymin, xmax, and ymax and appended them to a list. I encoded the categorical labels as numbers ({"cucumber": 0, "eggplant": 1, "mushroom": 2}). I also read all the image files and appended the images to a list.
  3. I used np.array() to convert the lists of images and outputs (label, xmin, ymin, xmax, and ymax) into arrays.
  4. Then I split the input and output arrays into x_train, x_test, y_train, and y_test using sklearn.model_selection.train_test_split() with test_size = 0.3 and random_state = 42.
  5. Because y_train and y_test contain five values each (label, xmin, ymin, xmax, and ymax), I separated the label from the other values (the coordinates xmin, ymin, xmax, and ymax used to build the bounding box), because the model has two outputs (label and bounding-box coordinates) and one input (the image array).
  6. I one-hot encoded the labels using tf.keras.utils.to_categorical().
  7. For the base model I used the pretrained MobileNetV2 with input_shape = (224, 224, 3), weights = 'imagenet', and include_top = False (the task has 3 classes).
  8. Then I added my own layers on top of the pretrained model and compiled it with the Adam optimizer (lr = 1e-4) and two losses: categorical_crossentropy for classification and mse for the bounding box. I also used two metrics: accuracy for classification and mse for the bounding box. Then I fit the model for 50 epochs and got this result:

image
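Steps 2–5 above can be sketched with dummy data (the variable names `images` and `outputs` are assumptions, and `np.eye` stands in for `tf.keras.utils.to_categorical` to keep the sketch self-contained):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real data: N images plus N output rows of
# [label, xmin, ymin, xmax, ymax], with the label already encoded as 0/1/2
N = 10
images = np.random.rand(N, 224, 224, 3)
outputs = np.column_stack([
    np.random.randint(0, 3, N),           # encoded label
    np.random.randint(0, 224, (N, 4)),    # xmin, ymin, xmax, ymax
])

# 70/30 split, as in the notebook
x_train, x_test, y_train, y_test = train_test_split(
    images, outputs, test_size=0.3, random_state=42)

# Split the 5-value output into the two targets the model expects
train_labels = np.eye(3)[y_train[:, 0].astype(int)]   # one-hot, like to_categorical
train_boxes = y_train[:, 1:].astype("float32")         # box coordinates
```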

  9. I saved the model for use in the API later.
  10. I tested the model on an image and got the following predicted object localization:

image
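A sketch of the two-output model described in steps 7–8 (the head layers on top of MobileNetV2 are assumptions, since the notebook does not list them here; the `weights` argument is parameterized so the sketch can be run without downloading the ImageNet weights):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_model(num_classes=3, weights="imagenet"):
    # Pretrained backbone without its classification head
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    x = layers.GlobalAveragePooling2D()(base.output)
    # Two heads: class probabilities and bounding-box coordinates
    class_out = layers.Dense(num_classes, activation="softmax",
                             name="class_output")(x)
    box_out = layers.Dense(4, name="box_output")(x)
    model = Model(inputs=base.input, outputs=[class_out, box_out])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss={"class_output": "categorical_crossentropy",
              "box_output": "mse"},
        metrics={"class_output": "accuracy", "box_output": "mse"})
    return model

# model.fit(x_train, {"class_output": train_labels, "box_output": train_boxes},
#           epochs=50)
# model.save("model.h5")  # assumed filename; saved for the Flask API
```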

Web APP

       For the web app I have:

  1. app.py for the backend and the API
  2. a static folder to store static files such as uploaded images and predicted images
  3. a template folder for the HTML front end
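A minimal sketch of what app.py could look like (the route, template name, and the `decode_prediction` helper are assumptions, not the repository's actual code; the label order matches the encoding used in the notebook):

```python
import numpy as np
from flask import Flask, render_template, request

app = Flask(__name__)
LABELS = ["cucumber", "eggplant", "mushroom"]  # encoding from the notebook

def decode_prediction(class_pred, box_pred):
    # Turn raw model outputs into a label and integer box corners
    label = LABELS[int(np.argmax(class_pred))]
    xmin, ymin, xmax, ymax = (int(v) for v in box_pred)
    return label, (xmin, ymin, xmax, ymax)

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Here the real app would read the upload from request.files,
        # resize it to (224, 224), run model.predict, draw the box with
        # cv2.rectangle, save the result under static/, and render it.
        pass
    return render_template("index.html")
```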

       Here's the result

image
