Results using ST-P3 and VAD metrics #166

Open · wljungbergh opened this issue Feb 27, 2024 · 10 comments

@wljungbergh (Contributor)

Hi,

As mentioned here, you are not using the same definition of the L2 metric as other works (e.g., VAD or ST-P3).

However, you present UniAD's numbers using your new metric against ST-P3's numbers using the old metric. This caused us (and surely many others) to assume that UniAD uses the same definition as ST-P3 (for example, VAD makes this assumption when comparing to your method; see here).

Unfortunately, this greatly underrepresents UniAD's performance. To reduce confusion, it would greatly help if you presented your numbers using both definitions.

We ran the numbers for you, and not so surprisingly, your scores improve considerably, especially at the longer horizons. We include VAD's numbers to highlight that the definition makes a big difference when comparing methods against each other.

Here are your displacement values when using both metric definitions:

| Method | L2 (m) 1s | L2 (m) 2s | L2 (m) 3s |
|---|---|---|---|
| ST-P3 | 1.33 | 2.11 | 2.90 |
| UniAD (your metric) | 0.48 | 0.96 | 1.65 |
| UniAD (old ST-P3 metric) | 0.42 | 0.64 | 0.91 |
| VAD-Tiny | 0.46 | 0.76 | 1.12 |
| VAD-Base | 0.41 | 0.70 | 1.05 |

FYI, we simply changed the code from

```python
for i in range(len(value)):
    # report the raw value at timestep i (non-cumulative)
    row_value.append("%.4f" % float(value[i]))
```

to

```python
for i in range(len(value)):
    # report the mean over timesteps 0..i (cumulative, as in ST-P3/VAD)
    row_value.append("%.4f" % float(value[:i+1].mean()))
```
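
To make the difference concrete, here is a minimal sketch of the two conventions (the array values below are made up for illustration; `value` in the snippets above plays the same role as `l2_per_step`):

```python
import numpy as np

# Hypothetical per-timestep L2 errors in meters, one entry per 0.5 s
# up to the 3 s horizon (values are illustrative, not measured).
l2_per_step = np.array([0.20, 0.42, 0.66, 0.96, 1.28, 1.65])

# UniAD convention (non-cumulative): report the error at the horizon itself.
uniad_l2_3s = float(l2_per_step[-1])    # 1.65

# ST-P3/VAD convention (cumulative): average the errors over all
# timesteps up to the horizon.
stp3_l2_3s = float(l2_per_step.mean())  # ~0.86
```
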
@YTEP-ZHI (Collaborator)

Hi @wljungbergh, thank you so much for sharing this valuable information. Could you please make a pull request for the code change? I'd like to present your verification in the README of this project for clarification, especially the table of displacement performance. Your efforts are greatly appreciated. Thanks again.

@wljungbergh (Contributor, Author)

No worries, glad to help. I'd be happy to make a pull request, but I will take this up again sometime after March 7th for obvious reasons. ✍️ 📄

@wljungbergh (Contributor, Author)

Alright! Now I have some time on my hands. Do you want to change the evaluation to match that of ST-P3 and VAD, or do you want it to be configurable?

Also, where would be the most appropriate place to present the results: in the TRAIN_EVAL.md file or in the main README?

@YTEP-ZHI YTEP-ZHI self-assigned this Mar 18, 2024
@YTEP-ZHI YTEP-ZHI added the enhancement New feature or request label Mar 18, 2024
@YTEP-ZHI (Collaborator) commented Mar 18, 2024

Hi @wljungbergh! I'd prefer it to be configurable. How about distinguishing the two evaluation protocols by paper name? One could specify the evaluation strategy using `evaluation_strategy='stp3'` or `'uniad'` in their config, with `'uniad'` as the default.
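
For example, the formatting loop from above could branch on that flag, roughly like this (the helper name and signature are illustrative, not actual UniAD code; only the `evaluation_strategy` values come from the proposal):

```python
def format_metric_row(value, evaluation_strategy="uniad"):
    """Format per-timestep metrics under the chosen evaluation protocol."""
    row_value = []
    for i in range(len(value)):
        if evaluation_strategy == "stp3":
            # ST-P3/VAD protocol: cumulative mean over timesteps 0..i.
            row_value.append("%.4f" % float(value[: i + 1].mean()))
        elif evaluation_strategy == "uniad":
            # Original UniAD protocol: value at timestep i only.
            row_value.append("%.4f" % float(value[i]))
        else:
            raise ValueError(f"Unknown evaluation_strategy: {evaluation_strategy}")
    return row_value
```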

@YTEP-ZHI (Collaborator)

I will update the README and other instruction files accordingly. No need to worry about it.

@YTEP-ZHI (Collaborator) commented Mar 18, 2024

Also, I want to know whether the misalignment issue also affects the collision rate metric: is the ST-P3 metric cumulative there as well, while UniAD's is non-cumulative? Have you looked into that?

@wljungbergh (Contributor, Author)

Yes, they do the same for L2 and collision rate (i.e., cumulative). As you can see here, they create one planning metric for each of the timesteps they evaluate and then take the mean over the values added so far (see here). You compute this non-cumulatively, just as with the L2 definition.

And yes, if we change that definition too, UniAD achieves a lower collision rate than other works.
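
The same cumulative-vs-non-cumulative split can be sketched for the collision rate as well (the values below are made up for illustration, not measured):

```python
import numpy as np

# Hypothetical per-timestep collision rates in %, one entry per 0.5 s
# up to the 3 s horizon.
cr_per_step = np.array([0.02, 0.10, 0.12, 0.15, 0.40, 0.61])

# Non-cumulative (current UniAD): the rate at each horizon as-is.
cr_uniad = cr_per_step

# Cumulative (ST-P3/VAD): running mean up to each horizon.
cr_stp3 = np.cumsum(cr_per_step) / np.arange(1, len(cr_per_step) + 1)
```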

@YTEP-ZHI (Collaborator)

Understood. After modifying the code, could you please make a complete table for both the L2 and CR metrics under the `stp3` and `uniad` strategies, like the one you posted?

It would be of great help if the table included L2 and CR results for ST-P3, UniAD, and VAD under the two different evaluation strategies.

@wljungbergh (Contributor, Author) commented Mar 19, 2024

Added PR #173. Cross-posting the table here:

| Method | L2 1s (m) | L2 2s (m) | L2 3s (m) | CR 1s (%) | CR 2s (%) | CR 3s (%) |
|---|---|---|---|---|---|---|
| ST-P3 (legacy ST-P3 metric) | 1.33 | 2.11 | 2.90 | 0.23 | 0.62 | 1.27 |
| UniAD (your current metric) | 0.51 | 0.98 | 1.65 | 0.10 | 0.15 | 0.61 |
| UniAD (legacy ST-P3 metric) | 0.42 | 0.64 | 0.91 | 0.07 | 0.10 | 0.22 |
| VAD-Tiny (legacy ST-P3 metric) | 0.46 | 0.76 | 1.12 | 0.21 | 0.35 | 0.58 |
| VAD-Base (legacy ST-P3 metric) | 0.41 | 0.70 | 1.05 | 0.07 | 0.17 | 0.41 |

@YTEP-ZHI (Collaborator)

Great! Thanks for your help.
