
Overview

EDITVAL is a standardized benchmark for evaluating text-guided image editing methods across diverse edit types, validated through a large-scale human study.



EDITVAL consists of the following distinct components:

The attribute list A for ~100 images from MS-COCO can be downloaded from here. The format of the json file is as follows:

{
  "class_name": {
    "image_id": {                 # image ids from MS-COCO
      "edit_attribute": {
        "from": ["initial state of attribute"],
        "to": ["target states of attribute", ...]
      }
    }
  }
}
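The nested schema above can be traversed to enumerate every image-edit operation in the attribute list A. The sketch below assumes the json has been loaded into a dictionary; the helper name `iter_edit_ops` and the inline example values are illustrative, not part of the released file.

```python
def iter_edit_ops(attributes):
    """Yield (class_name, image_id, edit_attribute, from_state, to_state)
    tuples from the nested attribute dictionary A."""
    for class_name, images in attributes.items():
        for image_id, edits in images.items():
            for edit_attribute, states in edits.items():
                for source in states["from"]:
                    for target in states["to"]:
                        yield (class_name, image_id, edit_attribute, source, target)

# A minimal inline example mirroring the schema above (values are illustrative).
example = {
    "apple": {
        "123456": {
            "object_replacement": {"from": ["apple"], "to": ["orange", "pear"]}
        }
    }
}

# Two edit operations: apple -> orange and apple -> pear.
ops = list(iter_edit_ops(example))
```

In practice, `attributes` would come from `json.load()` on the downloaded attribute file; each yielded tuple corresponds to one of the benchmark's unique image-edit operations.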

The complete list of edit attributes for evaluation currently is:

More Details on EditVal Dataset and Pipeline

The EditVal benchmark contains 648 unique image-edit operations across 19 classes selected from MS-COCO, spanning a variety of real-world edits. These operations range from simple attribute categories, such as adding or replacing an object, to more complex ones, such as changing an action, altering the camera viewpoint, or changing the position of an existing object.


MTurk Human Study

The template for running an MTurk study to evaluate the quality of image editing methods is provided here.

Together with the template, an input csv file must be provided for the MTurk study. Each row of the csv file represents one edit instance and contains these four inputs:

An example input csv file can be seen here. Below is an example of how the MTurk study appears to workers.
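A row of the input csv can be assembled with the standard library's `csv` module. The column names below are assumptions made for illustration; the actual four inputs and their header names are defined by the example csv file linked above.

```python
import csv
import io

# Hypothetical column names -- the real template defines the four inputs;
# these names are assumptions for illustration only.
FIELDS = ["original_image_url", "edited_image_url", "edit_prompt", "main_object"]

rows = [
    {
        "original_image_url": "https://example.com/orig.jpg",
        "edited_image_url": "https://example.com/edited.jpg",
        "edit_prompt": "Change apple to orange",
        "main_object": "apple",
    }
]

# Write the header row plus one edit instance per row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

The resulting text would be saved to a file and uploaded alongside the MTurk template, with one row per edit instance.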

Original Image
Edited Image

The right image is supposed to show the result of applying the prompt "Change apple to orange" to the left image.

How well is the edit from the given prompt applied?

How well are the other properties (other than what the edit is targeting) of the main object (apple) preserved in the right image?

How well is the rest of the image (everything other than the main object) preserved in the right image?

Leaderboards

The numbers below for the human study are computed only from the first question of the template, which does not consider changes to the rest of the image. This keeps the results comparable to our automatic evaluation framework. For each instance in the human study, a score of 1.0 is given if the edit is rated "Adequately applied" or "Perfectly applied", and a score of 0.0 otherwise.
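The binarization described above can be sketched as follows. The exact rating labels returned by MTurk are assumptions based on the description in the text, not the template's literal answer strings.

```python
# Ratings that count as a successful edit (labels are assumptions
# based on the scoring rule described in the text).
POSITIVE = {"Adequately applied", "Perfectly applied"}

def binary_score(answer):
    """Map one worker rating on the first question to 1.0 or 0.0."""
    return 1.0 if answer in POSITIVE else 0.0

def method_score(answers):
    """Average binary score over all edit instances for one method."""
    return sum(binary_score(a) for a in answers) / len(answers)

# Illustrative ratings: two successful edits out of four instances.
answers = ["Perfectly applied", "Not applied", "Adequately applied", "Not applied"]
score = method_score(answers)
```

Averaging these per-instance scores over each edit type yields the per-column numbers in the leaderboards below.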

Human Study

Method | Object Addition | Object Replacement | Position Replacement | Positional Addition | Size | Alter Parts | Average

Automatic Evaluation

Method | Object Addition | Object Replacement | Position Replacement | Positional Addition | Size | Alter Parts | Average

Contact Us

Contact us at xxx@gmail.com if you wish to add your method to the leaderboards.