Semantic segmentation is performed on the whole image over five different classes. Because background pixels dominate the images, we do not use the total (pixel) accuracy as a measure, but instead use the mean intersection over union (mean IoU).
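To illustrate why the total accuracy is misleading when one class dominates, here is a minimal sketch in plain Python. The function names, the integer class indices and the choice to skip classes that appear in neither map are assumptions made for this example, not the benchmark's official evaluation code.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the ground truth."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)


def mean_iou(pred, target, num_classes=5):
    """Unweighted mean of per-class IoU for two equally sized 2D label maps.

    Assumption: classes absent from both maps are skipped rather than
    counted as IoU 0 or 1.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                intersection[p] += 1
                union[p] += 1
            else:
                # A mismatch enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy 2x4 image: class 0 plays the role of background, class 3 of a person.
target = [[0, 0, 0, 0],
          [0, 0, 3, 3]]
pred = [[0, 0, 0, 0],
        [0, 0, 0, 3]]

accuracy = pixel_accuracy(pred, target)  # 7 of 8 pixels are correct
miou = mean_iou(pred, target)            # mean of 6/7 and 1/2
```

In this toy case the accuracy is 0.875 although half of the person pixels are missed, while the mean IoU drops to roughly 0.68 because every class contributes equally.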

Infants and infant seats, as well as children and child seats, are treated as separate instances, i.e. the model should learn to separate the child from the child seat. Adults, children and infants, on the other hand, should all be classified as a person, i.e. they share a single label.
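The labelling rule above can be sketched as a simple lookup table. The fine-grained label names on the left are hypothetical and only serve to illustrate the merge; the dataset's actual annotation format may differ.

```python
# Hypothetical fine-grained labels mapped to the five segmentation classes.
FINE_TO_CLASS = {
    "background": "background",
    "infant_seat": "infant seat",
    "child_seat": "child seat",
    "adult": "person",
    "child": "person",
    "infant": "person",
    "everyday_object": "everyday object",
}


def merge_labels(label_map):
    """Replace every fine-grained label in a 2D label map by its class."""
    return [[FINE_TO_CLASS[label] for label in row] for row in label_map]


# An infant in an infant seat next to a child in a child seat.
fine = [["infant", "infant_seat"],
        ["child", "child_seat"]]
merged = merge_labels(fine)
num_classes = len(set(FINE_TO_CLASS.values()))
```

After the merge, the seats keep their own classes while the infant and the child both become a person, which leaves five classes in total.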

Below is the public leaderboard for semantic segmentation for different training data and vehicles. We use the following abbreviations for the classes:

BG = background
IS = infant seat
CS = child seat
Person = adult passenger, child or infant
Object = everyday object

A train car of "all" means that one model was trained on each vehicle. Each model is evaluated on the test sets of the other vehicles, i.e. the test performance on the vehicle it was trained on is left out. The overall performance of the method is then the mean of the mean performances across all vehicles.

If a single car is listed as the train car, then a single model was trained on that car only, and it is evaluated on the test images of all unseen vehicles. The reported score is the mean of the mean performances across all vehicles, leaving out the vehicle the model was trained on.
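The two evaluation protocols can be sketched as follows. This assumes that the inner mean is the mean IoU on one test vehicle and that the outer mean runs over vehicles; the text does not spell this out, so treat it as one possible reading. The vehicle names car_a and car_b and all numbers are made up for illustration.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def score_single_car(per_vehicle_miou, train_car):
    """Mean of the per-vehicle mean IoUs, leaving out the training vehicle."""
    return mean(v for car, v in per_vehicle_miou.items() if car != train_car)


def score_all_cars(results):
    """One model per training vehicle; average their unseen-vehicle scores.

    results[train_car][test_car] is the mean IoU of the model trained on
    train_car when evaluated on the test set of test_car.
    """
    return mean(score_single_car(per_vehicle, train_car)
                for train_car, per_vehicle in results.items())


# Made-up mean IoU values (in percent) for three vehicles.
results = {
    "X5":    {"X5": 90.0, "car_a": 40.0, "car_b": 50.0},
    "car_a": {"X5": 30.0, "car_a": 88.0, "car_b": 50.0},
    "car_b": {"X5": 44.0, "car_a": 36.0, "car_b": 92.0},
}

x5_score = score_single_car(results["X5"], "X5")  # uses car_a and car_b only
all_score = score_all_cars(results)
```

The high scores on the diagonal (a model tested on its own training vehicle) never enter either result, so both numbers measure generalization to unseen vehicle interiors.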


	Name	Train Car	mean IoU	IoU (per class)	Paper	Code	RGB	Gray	Depth	Additional	Team	Title	Conference
	SVIRO-Team	X5	43.71	BG: 85.70 IS: 17.91 CS: 38.61 Person: 67.69 Object: 8.63			No	Yes	No	Yes
	MDSP	X5	48.17	BG: 90.68 IS: 38.33 CS: 25.97 Person: 76.58 Object: 9.30			No	Yes	No	Yes	Hochschule Mannheim		Under review
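As a quick sanity check on the table, the reported mean IoU values appear to be the unweighted average of the five per-class IoUs. The sketch below recomputes them from the per-class strings copied from the table; parse_per_class is a hypothetical helper written for this example.

```python
def parse_per_class(text):
    """Turn 'BG: 85.70 IS: 17.91 ...' into {'BG': 85.70, 'IS': 17.91, ...}."""
    tokens = text.split()
    return {tokens[i].rstrip(":"): float(tokens[i + 1])
            for i in range(0, len(tokens), 2)}


# (reported mean IoU, per-class IoU string) copied from the table above.
rows = {
    "SVIRO-Team": (43.71, "BG: 85.70 IS: 17.91 CS: 38.61 Person: 67.69 Object: 8.63"),
    "MDSP": (48.17, "BG: 90.68 IS: 38.33 CS: 25.97 Person: 76.58 Object: 9.30"),
}

recomputed = {}
for name, (reported, per_class_text) in rows.items():
    per_class = parse_per_class(per_class_text)
    recomputed[name] = sum(per_class.values()) / len(per_class)
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.2f}")
```

Both entries agree with the reported values to two decimals, which also shows how strongly the weak infant seat, child seat and object classes pull the mean down compared with the background score.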