All submitted evaluations are processed automatically. Consequently, it is important to follow the guidelines below to ensure smooth and correct processing of your submitted files. Feel free to contact us if you discover problems or have questions or suggestions for improvement.

You can download an example submission file for classification (data is completely randomized).

Make sure that the predictions are given in the pixel coordinates of the provided images. If you resize the images during training and evaluation, you need to map your results back to the non-resized versions.

For some tasks, not all images are used for the evaluation: for object detection and instance segmentation we skip images containing no objects, and for keypoint estimation we skip images containing no humans.

Submission file

For all the tasks, you will need to upload a .zip file containing all your results. You can submit results evaluated on the test set for a single vehicle or for multiple vehicles. The structure of your .zip file needs to be in the following format:

new_sota_results.zip
│
│
├──────────── aclass
│             ├────────────────────────── 0.json
│             ├────────────────────────── 1.json
│             ├────────────────────────── 2.json
│             .
│             .
│             .
│             └────────────────────────── 499.json
│
│
│
├──────────── hilux
│             ├────────────────────────── 0.json
│             ├────────────────────────── 1.json
│             ├────────────────────────── 2.json
│             .
│             .
│             .
│             └────────────────────────── 499.json
│
│
│
└──────────── x5
              ├────────────────────────── 0.json
              ├────────────────────────── 1.json
              ├────────────────────────── 2.json
              .
              .
              .
              └────────────────────────── 499.json

The .zip file should not contain a main folder in which the different vehicle subfolders are located. If you want to check this manually: double click on your zip file and make sure that you only see the vehicle folders at the top level. An easy way to create such a zip file (on Linux, assuming you have zip installed) is to go into the folder containing the predictions for the different vehicles and run the following command, which creates a file called predictions.zip from all folders in the directory:

 zip -r predictions.zip *
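If you prefer to create the archive from Python (e.g. on Windows, where the zip command is not available), the standard-library zipfile module can produce the same flat layout. This is only a sketch; zip_predictions is a hypothetical helper, and predictions_dir is a placeholder for your own path:

```python
import zipfile
from pathlib import Path

def zip_predictions(predictions_dir, zip_path):
    """Zip all vehicle folders so they sit at the top level of the archive."""
    predictions_dir = Path(predictions_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for file_path in sorted(predictions_dir.rglob("*")):
            if file_path.is_file():
                # store paths relative to predictions_dir, so there is
                # no enclosing main folder inside the zip
                zf.write(file_path, file_path.relative_to(predictions_dir))
```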

Make sure that the vehicle folders are named as follows (without the double quotes):

"aclass", "escape", "gsf", "hilux", "i3", "model3", "tiguan", "tucson", "x5", "zoe"

The test images for each vehicle are enumerated from 0 to 499. For each test image, create a submission file, e.g. 5.json, 5.png or 5.npz (depending on the task). Do not use leading zeroes, i.e. 5.json is correct, but 005.json would be wrong.
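Before zipping, a small check like the following can catch misnamed folders or files with leading zeroes. This is a sketch, not part of the benchmark tooling; the valid vehicle names mirror the list above:

```python
from pathlib import Path

VALID_VEHICLES = {"aclass", "escape", "gsf", "hilux", "i3",
                  "model3", "tiguan", "tucson", "x5", "zoe"}

def check_submission_folder(predictions_dir):
    """Return a list of human-readable problems found in the folder layout."""
    problems = []
    for folder in Path(predictions_dir).iterdir():
        if folder.name not in VALID_VEHICLES:
            problems.append(f"unknown vehicle folder: {folder.name}")
            continue
        for f in folder.iterdir():
            stem = f.stem
            # file names must be integers from 0 to 499 without leading zeroes
            if not stem.isdigit() or str(int(stem)) != stem or not 0 <= int(stem) <= 499:
                problems.append(f"bad file name: {folder.name}/{f.name}")
    return problems
```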

You can perform 5 submissions per month for each task. A single submission can contain results for multiple vehicles: simply create one folder in the submission file for each vehicle you want to evaluate on. If you train an individual model on each vehicle, you can use the same layout to evaluate all your models on the test sets of all vehicles, i.e. each folder contains the test predictions of the model trained on that vehicle. In this case, specify during the submission that you trained on all cars.

If you select all as the car the model was trained on, we assume that you trained one model on each vehicle and want to evaluate the general performance of your models on the test set of each vehicle. Consequently, we calculate the mean of the means of the performances across all vehicles as the overall performance of your model.

If you select a single car as the car the model was trained on, then we assume that you trained a single model on a single car and want to evaluate how well this model performs on the test images of unseen/unknown vehicles. Consequently, for the overall performance of your model, we calculate the mean of the means of the performances across all vehicles, excluding the test performance of the vehicle it was trained on.
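The aggregation described above can be sketched as follows (a minimal illustration; overall_performance is a hypothetical helper, and per_vehicle_means maps each vehicle name to the mean performance on its test set):

```python
def overall_performance(per_vehicle_means, trained_on):
    # per_vehicle_means: dict mapping vehicle name -> mean performance on its test set
    # trained_on: "all" or the name of the single vehicle the model was trained on
    if trained_on == "all":
        scores = list(per_vehicle_means.values())
    else:
        # exclude the vehicle the model was trained on
        scores = [s for vehicle, s in per_vehicle_means.items() if vehicle != trained_on]
    # mean of the per-vehicle means
    return sum(scores) / len(scores)
```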

Classification

Since you do not necessarily need to use the rectangular images provided by us to perform the classification, we decided it would be simplest to submit the classification results on the entire image. For this, create one .json file for each full image (not rectangular) and specify for each seat position the class prediction. The predicted classes should be integers from 0 to 6 (see data overview). Here is an example of such a file:

{
    "image_id": 3,
    "predictions": {
        "left": 1,
        "middle": 5,
        "right": 1        
    }    
}

Below is a code snippet to create such a .json file.

import json

def save_classification(left, middle, right, sviro_id, save_path):

    # dict will be saved as json
    json_data = dict()

    json_data["image_id"] = int(sviro_id)

    json_data["predictions"] = {
        "left": int(left),
        "middle": int(middle),
        "right": int(right)
    }
 
    with open(save_path, 'w') as fp:
        json.dump(json_data, fp)
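To verify such a file before submitting, you can load it back and check the expected keys and value ranges. This is only a sketch; check_classification_file is a hypothetical helper based on the format described above:

```python
import json

def check_classification_file(path):
    with open(path) as fp:
        data = json.load(fp)
    assert isinstance(data["image_id"], int)
    predictions = data["predictions"]
    # one prediction per seat position
    assert set(predictions) == {"left", "middle", "right"}
    # the predicted classes are integers from 0 to 6
    assert all(isinstance(c, int) and 0 <= c <= 6 for c in predictions.values())
```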

Semantic segmentation

Create a .png file for each image. The .png file should give the class prediction for every pixel, i.e. you need to specify an integer value from 0 to 4 for each pixel (see data overview).

Below is a code snippet to create such a .png file.

import numpy as np
from PIL import Image

def save_semantic_mask(mask, save_path):

    # for each pixel, get the channel (class) with the highest probability
    classification_mask = np.argmax(mask, axis=0).astype(dtype=np.uint16)

    # make it a PIL image
    classification_mask_image = Image.fromarray(classification_mask)

    # save image
    classification_mask_image.save(save_path)
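To confirm that the class values survive the round trip through the .png file, you can reload the saved mask and check its shape and value range. This is only a sketch; check_semantic_mask is a hypothetical helper:

```python
import numpy as np
from PIL import Image

def check_semantic_mask(path, expected_shape):
    # reload the saved mask as an array of per-pixel class values
    mask = np.array(Image.open(path))
    assert mask.shape == expected_shape
    # the per-pixel classes are integers from 0 to 4
    assert mask.min() >= 0 and mask.max() <= 4
    return mask
```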

Object detection

Create a .json file for each image and fill it according to the example below. The .json file contains the bounding boxes of all predictions, together with the class prediction and the score for each. The class should be an integer value from 0 to 4 (see data overview). The bounding box corner coordinates can/should be floats.

{
    "image_id": 337,
    "predictions": [
        {
            "class": 3,
            "score": 0.9377024173736572,
            "bbox": [
                378.9494323730469,
                244.74473571777344,
                606.0543212890625,
                588.4853515625
            ]
        },
        {
            "class": 2,
            "score": 0.6482189297676086,
            "bbox": [
                351.2763671875,
                255.3629913330078,
                602.16845703125,
                640.0
            ]
        },
        {
            "class": 1,
            "score": 0.4837550222873688,
            "bbox": [
                336.33746337890625,
                234.62564086914062,
                649.1395263671875,
                640.0
            ]
        } 
    ]
}

Here is a simple code snippet which produces the above output. The first coordinate pair for the bounding box is the upper-left corner and the second one is the lower-right corner. Coordinates start at the top-left image corner.

import json

def save_bbs(bbs, labels, scores, sviro_id, save_path):

    # dict will be saved as json
    json_data = dict()
    json_data["image_id"] = int(sviro_id)
    json_data["predictions"] = []

    # for each bb in the image
    for current_boundingBox, current_label, current_score in zip(bbs, labels, scores):

        # get the bb coordinates (cast to floats so json can serialize them)
        bb_x1 = float(current_boundingBox[0])
        bb_y1 = float(current_boundingBox[1])
        bb_x2 = float(current_boundingBox[2])
        bb_y2 = float(current_boundingBox[3])

        json_data["predictions"].append(
            {
                # cast to native Python types in case they are numpy scalars
                "class": int(current_label),
                "score": float(current_score),
                "bbox" : [bb_x1, bb_y1, bb_x2, bb_y2]
            }
        )

    with open(save_path, 'w') as fp:
        json.dump(json_data, fp)

Instance segmentation

Instance segmentation is the only task for which we need two submission files per test image: a .json file which is identical to the object detection .json file, and a .npz (compressed numpy) file which contains the mask predictions. Each mask corresponds to one object detection and should be a boolean mask. In the code snippet below we chose a threshold of 0.5, but you are free to choose any other threshold value. Make sure that the masks in the .npz file are accessible via the "masks" dictionary key, i.e. use masks as a keyword during saving. In order to save space, the masks are cast to uint8, but bool is possible as well.

Below is a code snippet to create the .json and .npz file.

import json

import numpy as np

# note: save_path is expected to be a pathlib.Path to the vehicle folder
def save_bb_and_mask(bbs, labels, scores, masks, sviro_id, save_path):

    # dict will be saved as json
    json_data = dict()
    json_data["image_id"] = int(sviro_id)
    json_data["predictions"] = []

    # for each bb in the image
    for current_score, current_boundingBox, current_label in zip(scores, bbs, labels):

        # get the bb coordinates (cast to floats so json can serialize them)
        bb_x1 = float(current_boundingBox[0])
        bb_y1 = float(current_boundingBox[1])
        bb_x2 = float(current_boundingBox[2])
        bb_y2 = float(current_boundingBox[3])

        json_data["predictions"].append(
            {
                # cast to native Python types in case they are numpy scalars
                "class": int(current_label),
                "score": float(current_score),
                "bbox" : [bb_x1, bb_y1, bb_x2, bb_y2],
            }
        )

    # make the masks a boolean mask with a threshold of 0.5
    masks = (masks > 0.5).astype(np.uint8)

    # compress the masks
    np.savez_compressed(save_path / (str(sviro_id)), masks=masks)

    # define path for json
    save_path_json = save_path / (str(sviro_id) + ".json")

    # save the json
    with open(save_path_json, 'w') as fp:
        json.dump(json_data, fp)
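To confirm that the .npz file can be read back the way the evaluation expects, a check like the following sketch can help; check_masks_file is a hypothetical helper, and num_predictions should match the number of entries in the corresponding .json file:

```python
import numpy as np

def check_masks_file(path, num_predictions):
    data = np.load(path)
    # the masks must be stored under the "masks" key
    assert "masks" in data
    masks = data["masks"]
    # one mask per prediction in the corresponding .json file
    assert masks.shape[0] == num_predictions
    # boolean masks stored as uint8 (or bool): only 0 and 1 are allowed
    assert set(np.unique(masks)).issubset({0, 1})
    return masks
```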

Keypoint estimation

The file format for the keypoint estimation is similar to the one for object detection. Additionally, we need all the keypoints for each detected human. For each bone, you need to provide the predicted x and y coordinates, as well as the visibility: a visibility of 0 means that the bone is not visible, and a visibility of 1 means it is visible. For each human, all the keypoints should be given in a list of lists (one after the other). Here is a reduced example:

"keypoints": [[472.6414489746094, 415.9590759277344, 1.0], [479.02978515625, 385.6070251464844, 1.0], [478.23126220703125, 364.8398132324219, 1.0], [479.02978515625, 308.1293640136719, 1.0], ..., [449.48370361328125, 473.46826171875, 1.0]]

Keypoints should be predicted for all humans in the scene, i.e. babies, children and adults. Scenes which do not contain humans are not considered during the evaluation.

For our benchmark, we only use 17 of the provided bones, namely the ones named:

["head", "clavicle_r", "clavicle_l", "upperarm_r", "upperarm_l",  "lowerarm_r", "lowerarm_l", "hand_r", "hand_l", "thigh_r", "thigh_l", "calf_r", "calf_l", "pelvis", "neck_01", "spine_02", "spine_03"]

Below is a code snippet for the .json file.

import json

# note: save_path is expected to be a pathlib.Path to the vehicle folder
def save_bb_and_keypoints(bbs, labels, scores, keypoints, sviro_id, save_path):

    # dict will be saved as json
    json_data = dict()
    json_data["image_id"] = int(sviro_id)
    json_data["predictions"] = []

    # for each bb in the image
    for current_score, current_boundingBox, current_label, current_keypoints in zip(scores, bbs, labels, keypoints):

        # get the bb coordinates (cast to floats so json can serialize them)
        bb_x1 = float(current_boundingBox[0])
        bb_y1 = float(current_boundingBox[1])
        bb_x2 = float(current_boundingBox[2])
        bb_y2 = float(current_boundingBox[3])

        json_data["predictions"].append(
            {
                # cast to native Python types in case they are numpy scalars
                "class": int(current_label),
                "score": float(current_score),
                "bbox" : [bb_x1, bb_y1, bb_x2, bb_y2],
                "keypoints" : current_keypoints.tolist()
            }
        )

    # define path for json
    save_path_json = save_path / (str(sviro_id) + ".json")

    # save the json
    with open(save_path_json, 'w') as fp:
        json.dump(json_data, fp)
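A corresponding sanity check for a keypoint .json file might look like the following sketch, assuming you submit exactly the 17 bones listed above; check_keypoints_file is a hypothetical helper:

```python
import json

def check_keypoints_file(path):
    with open(path) as fp:
        data = json.load(fp)
    for prediction in data["predictions"]:
        keypoints = prediction["keypoints"]
        # 17 keypoints per human, each given as [x, y, visibility]
        assert len(keypoints) == 17
        # visibility must be 0 (not visible) or 1 (visible)
        assert all(len(kp) == 3 and kp[2] in (0.0, 1.0) for kp in keypoints)
```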