
Not sure how to interpret the output #9

Open
bartbutenaers opened this issue May 14, 2023 · 14 comments

Comments

@bartbutenaers

Hi @jveitchmichaelis,

Thank you for sharing these models!

When I run a COCO SSD model (from the coral.ai site) via tfjs, I get, as expected, 4 output tensors as the prediction result: scores, classes, bboxes, and detected object count.

I can load your models (i.e. edgetpu.tflite) without problems in tfjs, but afterwards the object detection output contains only one tensor. That tensor contains an array of 3087 sub-arrays (each containing 85 integers):

[screenshot]

I see that you use Python instead of TensorFlow.js, so I assume you don't use tfjs... But do you perhaps have an idea how I can interpret this output, or what I might be doing wrong? The tfjs-tflite package has status "work in progress", so perhaps it contains some bugs...

Thanks!!!

Bart

@jveitchmichaelis
Owner

jveitchmichaelis commented May 14, 2023 via email

@bartbutenaers
Author

Hi Josh,

Thanks for the pointers to the code!!

Although I am not very familiar with Python, the code was very illuminating. I am getting closer to a solution, but my bounding boxes do not fit my objects yet, so I would appreciate it if you could help me a bit more with this...

Can you please confirm whether my assumptions are correct:

  1. I have 1 tensor in the output, whose data array contains this information?

    [screenshot]

  2. The bounding boxes are in the format [center_x, center_y, width, height]?

  3. The bounding box coordinates are relative to the resized image dimensions (like the model requires), so I need to transform the coordinates like this: coordinate * original_input_image_dimension / resized_model_image_dimension?

  4. The confidence numbers are percentages, so not values between 0 and 1?

  5. I get a score per class (for each of the 80 classes) and I need to find the class with the highest score, which is the class of the detected object.

Or perhaps you have any other info that could help me determine why my bounding boxes do not match my objects inside the image...
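For reference, the assumptions above can be sketched in plain JavaScript. This is a hypothetical decoder, not code from the repo: the row layout ([center_x, center_y, width, height, objectness, ...80 class scores]), the model-input-pixel coordinate convention, and treating scores as 0..1 floats rather than percentages are all assumptions based on this thread.

```javascript
// Hypothetical sketch (not the repo's code): decode one row of the
// [3087, 85] output, assuming the layout
// [center_x, center_y, width, height, objectness, ...80 class scores]
// with box coordinates in model-input pixels and scores as 0..1 floats.
function decodeDetection(row, modelSize, origW, origH) {
  const [cx, cy, w, h, objectness] = row;
  const classScores = row.slice(5);

  // assumption 5: the detected class is the one with the highest score
  let bestClass = 0;
  for (let i = 1; i < classScores.length; i++) {
    if (classScores[i] > classScores[bestClass]) bestClass = i;
  }

  // overall confidence = objectness * best class score
  const score = objectness * classScores[bestClass];

  // assumptions 2 + 3: center/size -> corners, rescaled to the original image
  const sx = origW / modelSize;
  const sy = origH / modelSize;
  return {
    x1: (cx - w / 2) * sx,
    y1: (cy - h / 2) * sy,
    x2: (cx + w / 2) * sx,
    y2: (cy + h / 2) * sy,
    class: bestClass,
    score: score,
  };
}
```

This does not yet include NMS, so many overlapping boxes for the same object are still expected at this stage.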

@bartbutenaers
Author

@jveitchmichaelis,

I can't get the bounding boxes fixed when running your model in tfjs ;-(

When debugging your code, I see that x contains all floats:

[screenshot]

But when I debug my own code, I see that your model works with int32 data:

[screenshot]

And all values in my output are indeed integers...

I would appreciate it a lot if you could give me some advice based on your knowledge!! I think you do some extra postprocessing - besides the things I listed above - but not all of the Python code is entirely clear to me.

Thanks!
Bart

@jveitchmichaelis
Owner

jveitchmichaelis commented May 19, 2023 via email

@bartbutenaers
Author

Hi Josh,
That is very kind of you!

For example, if I use this image, then I get this output tensor:

[screenshot]

In the following file you can find the tensor's data array:
tensordata.txt

@bartbutenaers
Author

Hi @jveitchmichaelis,

I tried lots of things, but unfortunately I can't get this running...
I assume I am doing something really wrong, because I even get probability percentages above 100:

[screenshot]

It would be very appreciated if you could find some free time to have a look at my tensordata.txt file above.
Thanks!!!

@jveitchmichaelis
Owner

jveitchmichaelis commented Aug 22, 2023

Hi Bart, One thing that seems obviously wrong is that your tensor data is all integers. It should be float data - the values (mostly) represent probabilities. Are you casting somewhere?

The model does use integer math but there is some scaling that happens to the output tensor which converts back to float.

# Scale output

And see a few lines above, you should also scale the input to the model.

If you're just using the weights on their own (without scaling the input image or the predictions) I guess this won't work.

The shape looks good though!
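The two conversions described here can be sketched in plain JavaScript. The scale and zeroPoint values below are placeholders for illustration; the real ones are the calibration constants stored with the model, and the input and output tensors each have their own pair.

```javascript
// Hedged sketch of TFLite affine quantization. The scale/zeroPoint
// values are placeholder examples; the real ones are stored with the model.
const scale = 1 / 255;     // example scale
const zeroPoint = -128;    // example zero point for an int8 tensor

// real value -> quantized integer: q = round(x / scale) + zeroPoint
function quantize(x) {
  return Math.round(x / scale) + zeroPoint;
}

// quantized integer -> real value: x = (q - zeroPoint) * scale
function dequantize(q) {
  return (q - zeroPoint) * scale;
}
```

Round-tripping a value through quantize/dequantize only recovers it to within one quantization step, which is why the raw integer outputs look so coarse before rescaling.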

@bartbutenaers
Author

Hello Josh,
Yes, indeed it is all integers instead of floats. I don't know how that happens; I 'assume' tfjs does it somewhere under the covers...

The only input processing I do is a bilinear resizing of my input image tensor, to resize the image to the resolution requirements of your model.

I also have a normalization preprocessing part, but that is not being executed for your model, since both the input image tensor and your model's expected input type are int32:

[screenshot]

If I force this normalization to be executed for your model, then my prediction throws an exception:

[screenshot]

So I assume your input scaling does something different? It would be nice if you could explain it in pseudocode, so I can convert it to JavaScript, because unfortunately my Python knowledge is a bit lacking...

@jveitchmichaelis
Owner

jveitchmichaelis commented Aug 23, 2023

Sure, the process is described here: https://www.tensorflow.org/lite/performance/post_training_quantization

https://www.tensorflow.org/lite/performance/quantization_spec

See the bit at the bottom about the representation for quantized tensors. You need to apply the scaling both to the image going in (float -> int) and to the tensor coming out (int -> float). So going in, rearrange for the int8 term, and going out, you want the real-value term. If you're getting an int output, then tfjs isn't doing it for you, I think.

The scaling parameters are stored with the model as they're calibrated by running a bunch of images through. The Python code here reads them from the checkpoint.

This is not the usual 1/255 scaling you might do for a normal CNN. You need to do that first, and then apply the conversion to a scaled integer. I guess you could also roll it into one scale factor, but it's sensible to separate them for clarity.

We do the 1/255 here:

def get_image_tensor(img, max_size, debug=False):

Another example here https://www.tensorflow.org/lite/performance/post_training_integer_quant

In this example they check for a uint8 input (i.e. the model spec, not whether the image is 8-bit!) and if so they apply scaling. I've not had much luck looking for a tfjs version, but there may be one somewhere. Note they read the "quantization" parameter to get the scaling values.

See at the top where test_images is defined, they also scale by 255.

I guess tfjs doesn't have an 8-bit datatype, so you get int32?

Something like:

// note: the input and output tensors each have their own scale/zero_point
const norm_image = image.cast('float32').div(255);
const scaled_image = norm_image.div(scale).add(zero_point);

const pred = model.infer(scaled_image.cast('int32'));

// dequantize: real = (quantized - zero_point) * scale
const float_pred = pred.cast('float32').sub(zero_point).mul(scale);

// now do NMS, filter low probs etc.

@bartbutenaers
Author

Thanks for the clarification! I had never heard of quantization before...

I only had time last night to implement scaling on the output tensor (i.e. detectionResult):

let max = imageTensor.max().cast('float32');
let min = imageTensor.min().cast('float32');
let qmax = detectionResult.max().cast('float32');
let qmin = detectionResult.min().cast('float32');
let scaleFactor = max.sub(min).div(qmax.sub(qmin)).cast('float32');
let zeroPoint = qmin.sub(min.div(scaleFactor)).cast('float32');
let scaledDetectionResult = detectionResult.add(zeroPoint).mul(scaleFactor).cast('float32');

When applying this, the bounding boxes already start making sense (when I filter out the ones with a low score):

[screenshot]

Although there are still some things I don't get:

  • I have 19 detections for the same person.

  • Some of the scores have a value above 1:

    [screenshot]

  • Some scores (see previous screenshot) seem to be identical, which looks like duplicates to me. I'm not sure whether this is caused somewhere by my own code...

  • The bounding boxes don't enclose the person exactly. They seem to be a bit shifted in both directions.

Perhaps this is caused because I didn't scale the input yet. Will try to find some time tonight for that...
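On the duplicate detections: many overlapping boxes for one person are expected before non-maximum suppression (the "now do NMS" step in the snippet above). A minimal IoU-based sketch, assuming boxes are plain {x1, y1, x2, y2, score} objects in corner coordinates (tfjs also ships tf.image.nonMaxSuppressionAsync, which operates on tensors and could be used instead):

```javascript
// Minimal sketch of IoU-based non-maximum suppression. Assumption:
// boxes are {x1, y1, x2, y2, score} objects with corner coordinates.
function iou(a, b) {
  const ix = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
  const iy = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
  const inter = ix * iy;
  const areaA = (a.x2 - a.x1) * (a.y2 - a.y1);
  const areaB = (b.x2 - b.x1) * (b.y2 - b.y1);
  return inter / (areaA + areaB - inter);
}

function nms(boxes, iouThreshold = 0.45) {
  // sort by descending score, then greedily keep boxes that don't
  // overlap an already-kept box by more than the threshold
  const sorted = [...boxes].sort((a, b) => b.score - a.score);
  const kept = [];
  for (const box of sorted) {
    if (kept.every((k) => iou(k, box) < iouThreshold)) kept.push(box);
  }
  return kept;
}
```

With this applied per class (after filtering low scores), the 19 detections of the same person should collapse to one.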

@bartbutenaers
Author

About scaling the input image tensor: how can you determine the scale factor that you use in your code snippet BEFORE the prediction is executed? To determine the scaleFactor you need qmin and qmax, which are based on the prediction result that you don't have yet. Hmm, I assume my factor calculation is not correct :-(
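For what it's worth, the scale/zero-point pairs are not derived from the min/max of a particular prediction: per the TFLite quantization spec linked above, they are fixed calibration constants stored with the model, one pair per tensor, so they are known before inference. A sketch with hypothetical values (the numbers below are made up for illustration):

```javascript
// The quantization parameters are per-tensor constants baked into the
// model at conversion time. These values are hypothetical examples.
const inputQuant  = { scale: 1 / 255, zeroPoint: 0 };
const outputQuant = { scale: 0.005, zeroPoint: 4 };

// quantize a normalized [0, 1] pixel before inference
function quantizeInput(x) {
  return Math.round(x / inputQuant.scale + inputQuant.zeroPoint);
}

// dequantize a raw integer prediction after inference
function dequantizeOutput(q) {
  return (q - outputQuant.zeroPoint) * outputQuant.scale;
}
```

Reading these constants out of the model is the remaining tfjs-specific question; in the Python interpreter they come from the tensor's "quantization" metadata.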

@jveitchmichaelis
Owner

jveitchmichaelis commented Aug 24, 2023 via email

@jveitchmichaelis
Owner

jveitchmichaelis commented Aug 24, 2023 via email

@bartbutenaers
Author

BTW I could not find a way to get the quantization parameters from the model in tfjs, so I have asked for help on the TensorFlow forum. Hopefully somebody in their community can give me the golden tip...
