[AVTCORE] MEXaction2 action detection and localization dataset available

Bernard Merialdo <bernard.merialdo@eurecom.fr> Thu, 10 September 2015 14:36 UTC

(On behalf of Michel Crucianu and Jenny Benois-Pineau)


We are happy to make public the MEXaction2 action detection and 
localization dataset. A detailed description, including video samples, 
the evaluation procedure and baseline results, is available at
http://mexculture.cnam.fr/xwiki/bin/view/Datasets/Mex+action+dataset

A brief description follows.

The aim of the MEXaction2 dataset is to support the development and 
evaluation of methods for 'spotting' instances of short actions in a 
relatively large video database. For each action class, such a method 
should detect instances of this class in the video database and output 
the temporal boundaries of these detections, with an associated 
'confidence' score. This task can also be seen as 'action retrieval': 
the 'query' is an action class and the results are instances of the 
class, ordered by decreasing 'confidence' score.

The dataset contains videos from three sources:

  1. INA videos. A large collection of 117 videos (77 hours in total) 
was extracted from the archives of the Institut National de 
l'Audiovisuel (France). It contains videos produced between 1945 and 
2011. The video content in this collection was divided into three 
parts: training, parameter validation and testing.

  2. YouTube clips. From additional videos collected from YouTube, we 
provide 588 short clips, each containing only one instance of an action 
to spot (everything within the boundaries of a clip belongs to an action 
instance). All these instances should be used only for training.

  3. UCF101 Horse Riding clips. The Horse Riding clips from the UCF101 
dataset are added as training instances for the HorseRiding class. All 
these instances should be used only for training.

There are two annotated actions (see the above-mentioned website for 
examples):

  1. BullChargeCape: in the context of a bull fight, the bull charges 
the matador who dangles a cape to distract the animal.

  2. HorseRiding: instances of one or several persons riding horses.

The numbers of annotated examples for the two actions are:
  BullChargeCape   1324
  HorseRiding       651

Besides the fact that the total amount of annotated video is relatively 
large compared to other existing datasets, this dataset is also 
interesting because it raises several difficulties:

  1. High imbalance between non-relevant video sequences and relevant 
ones (instances of an action of interest).

  2. High variability in point of view and background movement (and 
action duration for HorseRiding).

  3. Variability in image quality: old videos have lower resolution and 
are in black and white, while the newest ones are in HD.

Action detection and localization is evaluated as a retrieval problem: 
the system must produce a list of detections (temporal boundaries) with 
positive scores. Sorting these results by decreasing score yields 
precision/recall curves, from which the Average Precision (AP) is 
computed to characterize the detection performance.
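
To make the retrieval-style evaluation concrete, here is a minimal Python 
sketch of how AP could be computed for one action class. It is not the 
official evaluation procedure (that one is described on the website 
above). The matching rule is an assumption made for illustration: a 
detection counts as correct when its temporal intersection-over-union 
with a not-yet-matched annotated instance reaches a threshold (0.5 
here). The names temporal_iou, average_precision and iou_threshold are 
hypothetical, and all segments are assumed to lie on a single timeline; 
a real evaluation would also have to track which video each segment 
belongs to.

```python
from typing import List, Tuple

Segment = Tuple[float, float]            # (start, end), e.g. in seconds
Detection = Tuple[float, float, float]   # (start, end, confidence score)


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: List[Detection],
                      ground_truth: List[Segment],
                      iou_threshold: float = 0.5) -> float:
    """AP for one class: mean of the precision values measured at each
    correct detection, with missed instances contributing zero.

    Assumption (not from the announcement): a detection is correct if
    its IoU with an unmatched annotated instance is >= iou_threshold.
    """
    if not ground_truth:
        return 0.0
    # Rank detections by decreasing confidence score.
    ranked = sorted(detections, key=lambda d: d[2], reverse=True)
    matched = [False] * len(ground_truth)
    hits = 0
    precision_sum = 0.0
    for rank, (start, end, _score) in enumerate(ranked, start=1):
        # Find the best-overlapping annotated instance still unmatched.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(ground_truth):
            if matched[idx]:
                continue
            iou = temporal_iou((start, end), gt)
            if iou > best_iou:
                best_iou, best_idx = iou, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True   # each instance is matched once
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(ground_truth)
```

For example, with annotated instances [(0, 10), (20, 30)] and detections 
[(0, 10, 0.9), (50, 60, 0.8), (21, 30, 0.7)], the first and third 
detections are correct, at ranks 1 and 3, so AP = (1/1 + 2/3) / 2, 
roughly 0.83. Matching each annotated instance at most once means 
duplicate detections of the same action count as false positives, which 
is one common convention among several.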