Author: Sebastien Marcel
Contributors: Tim Cootes, James Ferryman, Andreas Lanitis,
Michael Nielsen, Thomas Moeslund
Date version: October 14, 2002
Overview
1 Specifications
1.1 List of actions/gestures
1.2 List of facial expressions
1.3 Setting
1.4 Person identities
2 Scenarios
2.1 Scenario A
2.2 Scenario B
2.3 Scenario C
2.4 Scenario D
3 Annotations
3.2.1 Example for face localization and identification
3.2.2 Example for facial expressions
3.2.3 Example for face/hand gestures
3.2.4 Example for face/head direction
3.2.5 Example for actions
3.2.6 Other example of annotation (from Michael Nielsen)
3.3 Coarse annotation
1 Specifications
1.1 List of actions/gestures:
Sitting down
Getting up
Writing
Going to the board
Talking
Raising hand
Nodding
Shaking head
Yawning
Laughing
1.2 List of facial expressions:
Smile
Laugh
Angry
Neutral
1.3 Setting:
o map:
[Cam2]
---
--- ---
[6]
[5] [4]
<-- SEATS
----------------------------------
|
|
|
|
|
[Cam3]
| <-- TABLE
|
|
|
|
----------------------------------
[3]
[2] [1]
<-- SEATS
---
--- ---
[Cam1]
[Cam1] and [Cam2] are cameras fixed on the wall.
[Cam3] is an omnidirectional camera.
Dimensions of the posters on the walls:
Poster 1: [Cam1] 131.5cm width x 91cm height
Poster 2: [Cam2] 91.5cm width x 121.5cm height
Matrix : [Cam2] 62cm width x 92.5cm height
1.4 Person Identities:
1 Darren Moore
2 Sebastien Marcel (Moderator)
3 Pierre Wellner
4 James Ferryman
5 Fabien Cardinaux
6 Daniel Gatica-Perez
2 Scenarios
2.1 Scenario A: "Performing distinct Facial Expressions"
Actions: Sitting down, getting up, smile, angry, neutral, looking at other participants
Duration/size: 500 seconds, 1.2 Gb (JPEG)
Each person (1 to 6) enters the room one after another, goes to their place, presents themselves to the frontal camera, and sits down. Then each person looks at each person in front of them with a different facial expression:
Foreach person (1 2 3)
  Foreach person_to_look_at (4 5 6)
    $person looks at $person_to_look_at with a neutral expression
  end
  Foreach person_to_look_at (4 5 6)
    $person looks at $person_to_look_at with an angry expression
  end
  Foreach person_to_look_at (4 5 6)
    $person looks at $person_to_look_at with a smiling expression
  end
end
Foreach person (4 5 6)
  Foreach person_to_look_at (1 2 3)
    $person looks at $person_to_look_at with a neutral expression
  end
  Foreach person_to_look_at (1 2 3)
    $person looks at $person_to_look_at with an angry expression
  end
  Foreach person_to_look_at (1 2 3)
    $person looks at $person_to_look_at with a smiling expression
  end
end
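The script above can be unrolled into an explicit, ordered list of steps, which may help when checking an annotation against the intended sequence. The following is only a sketch: the function name and the tuple layout (person, person looked at, expression) are illustrative choices, not part of the specification.

```python
# Expressions in the order used by the Scenario A script.
EXPRESSIONS = ("neutral", "angry", "smile")
SIDE_A = (1, 2, 3)   # seats on one side of the table
SIDE_B = (4, 5, 6)   # seats on the opposite side


def scenario_a_schedule():
    """Return the ordered list of (person, target, expression) steps."""
    steps = []
    # First persons 1-3 look at 4-6, then persons 4-6 look at 1-3.
    for lookers, targets in ((SIDE_A, SIDE_B), (SIDE_B, SIDE_A)):
        for person in lookers:
            for expression in EXPRESSIONS:
                for target in targets:
                    steps.append((person, target, expression))
    return steps


schedule = scenario_a_schedule()
print(len(schedule))   # 54 steps: 6 persons x 3 expressions x 3 targets
print(schedule[0])     # (1, 4, 'neutral')
```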
2.2 Scenario B: "Performing face & hand gestures"
Actions: Sitting down, getting up, raising hand, shaking head, nodding head, yawning, laughing
Duration/size: 333 seconds, 1.0 Gb (JPEG)
Each person (1 to 6) enters the room one after another, goes to their place, presents themselves to the frontal camera, and sits down.
Foreach person (1 2 3 4 5 6)
  $person is raising left hand
  $person is raising right hand
  $person is shaking head
  $person is nodding head
  $person is yawning
  $person is laughing
end
Forall person (1 2 3 4 5 6) at the same time
  $person is raising left hand
  $person is raising right hand
  $person is shaking head
  $person is nodding head
  $person is yawning
  $person is laughing
end
2.3 Scenario C: "Going to the white board"
Actions: Sitting down, getting up, going to the white board
Duration/size: 407 seconds, 1.2 Gb (JPEG)
Each person (1 to 6) enters the room one after another, goes to their place, presents themselves to the frontal camera, and sits down.
Foreach repetition (1 2)
  Foreach person (1 2 3 4 5 6)
    $person is getting up
    $person is going to the white board and writing something
    $person is going back to his seat
  end
end
2.4 Scenario D: "Artificial meeting"
Actions: A little bit of everything
Duration/size: 577 seconds, 2.5 Gb (JPEG)
Each person (1 to 6) enters the room one after another, goes to their place, presents themselves to the frontal camera, and sits down.
2 reads the agenda
4, 5, 6 look at him
1 and 3 write
4 asks 1 for the agenda
4 smiles at 1
1 smiles at 4
1 gives the agenda to 4
4 smiles at 1
6 asks 3 for the agenda
6 smiles at 3
3 smiles at 6
3 gives the agenda to 6
6 smiles at 3
2 is angry (nobody is listening!)
2 asks for a discussion
1 talks to 4
4 is nodding
6 talks to 2
2 is nodding
1 talks to 5
3 is yawning
5 is shaking head
6 talks to 3
3 is shaking head
2 gets up
2 goes to the white board and writes something
2 goes back to his seat
6 gets up
6 goes to the white board and writes something
6 goes back to his seat
2 asks who votes NO
3 and 5 vote NO by raising hands
2 asks who votes YES
1, 2, 4, 6 vote YES by raising hands
2 is neutral
1 is smiling
4 is laughing
6 is smiling
3 is angry
5 is angry
5 gets up and leaves the room
1, 2, 4, 6 are angry
3 gets up and leaves the room
3 comes back and sits down (on the seat of 5)
5 comes back and sits down (on the seat of 3). Note: 5 had taken off his sweater!
1, 2, 3, 4, 5, 6 are smiling
2 gets up
1, 3, 4, 5, 6 get up
4 leaves the room
1 leaves the room
3 leaves the room
2 leaves the room
6 leaves the room
5 leaves the room
3 Annotations
3.2.1 Example for face localization and identification
The face localization and identification annotation should provide the (x, y) coordinates of the eye centres for each person (1, 2, 3, 4, 5, 6) in every image.
scenarioA_cam1_facexy.txt:
image0001.jpg 3 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
image0001.jpg 4 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
image0001.jpg 5 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
image0002.jpg 3 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
image0002.jpg 4 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
image0002.jpg 5 left_eye_center_x left_eye_center_y right_eye_center_x right_eye_center_y
...
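A line of this file could be read as sketched below. This assumes that real files hold whitespace-separated numeric pixel coordinates in place of the placeholder names; the `EyeAnnotation` type, the function name and the sample coordinates are made up for illustration.

```python
from typing import NamedTuple, Tuple


class EyeAnnotation(NamedTuple):
    image: str
    person: int
    left_eye: Tuple[float, float]    # (x, y) of the left eye centre
    right_eye: Tuple[float, float]   # (x, y) of the right eye centre


def parse_facexy_line(line):
    """Parse 'image person lx ly rx ry' into an EyeAnnotation."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    image, person = fields[0], int(fields[1])
    lx, ly, rx, ry = (float(value) for value in fields[2:])
    return EyeAnnotation(image, person, (lx, ly), (rx, ry))


# The coordinates below are invented, not taken from the data.
ann = parse_facexy_line("image0001.jpg 3 120 85 150 86")
print(ann.person, ann.left_eye, ann.right_eye)
```

Rejecting lines with the wrong number of fields is a design choice of this sketch; it makes a missing coordinate visible immediately instead of silently shifting columns.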
3.2.2 Example for facial expressions
scenarioA_cam1_faceexp.txt:
image0001.jpg 3 1
image0001.jpg 4 0
image0001.jpg 5 0
image0002.jpg 3 2
image0002.jpg 4 0
image0002.jpg 5 0
...
with 0 = unknown, 1 = neutral, 2 = smile, 3 = angry
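As a minimal sketch, the codes listed above can be decoded into labels while reading the file. The function name and the returned dictionary layout are illustrative assumptions; the sample lines are the ones from the example.

```python
# Code table as given in the text.
EXPRESSION_LABELS = {0: "unknown", 1: "neutral", 2: "smile", 3: "angry"}


def parse_faceexp(lines):
    """Return {(image, person): label} from facial expression lines."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue   # skip blank lines and elisions
        image, person, code = line.split()
        result[(image, int(person))] = EXPRESSION_LABELS[int(code)]
    return result


example = [
    "image0001.jpg 3 1",
    "image0001.jpg 4 0",
    "image0001.jpg 5 0",
    "image0002.jpg 3 2",
    "image0002.jpg 4 0",
    "image0002.jpg 5 0",
]
labels = parse_faceexp(example)
print(labels[("image0001.jpg", 3)])   # neutral
print(labels[("image0002.jpg", 3)])   # smile
```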
3.2.3 Example for face/hand gestures
scenarioA_cam1_facehandgesture.txt:
image0001.jpg 3 1
image0001.jpg 4 0
image0001.jpg 5 0
image0002.jpg 3 2
image0002.jpg 4 0
image0002.jpg 5 0
...
with 0 = unknown, 1 = nodding, 2 = yawning, ...
3.2.4 Example for face/head direction
scenarioA_cam1_gaze.txt:
image0001.jpg 3 1
image0001.jpg 4 1
image0001.jpg 5 1
image0002.jpg 3 2
image0002.jpg 4 2
image0002.jpg 5 3
...
where image0001.jpg 3 1 means that 3 is looking at 1 and so on ...
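The gaze file can be turned into a per-image lookup, for instance to list who is looking at a given person in a frame. The sketch below uses the example lines as input; both function names are hypothetical helpers.

```python
from collections import defaultdict


def parse_gaze(lines):
    """Return {image: {person: person looked at}} from gaze lines."""
    gaze = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line == "...":
            continue
        image, person, target = line.split()
        gaze[image][int(person)] = int(target)
    return dict(gaze)


def lookers_by_target(frame_gaze):
    """Invert {person: target} into {target: [persons looking at it]}."""
    inverted = defaultdict(list)
    for person, target in sorted(frame_gaze.items()):
        inverted[target].append(person)
    return dict(inverted)


example = [
    "image0001.jpg 3 1",
    "image0001.jpg 4 1",
    "image0001.jpg 5 1",
    "image0002.jpg 3 2",
    "image0002.jpg 4 2",
    "image0002.jpg 5 3",
]
gaze = parse_gaze(example)
print(lookers_by_target(gaze["image0002.jpg"]))   # {2: [3, 4], 3: [5]}
```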
3.2.5 Example for actions
scenarioA_cam1_actions.txt:
image0001.jpg 3 1 1
image0001.jpg 4 0
image0001.jpg 5 0
image0002.jpg 3 1 2
image0002.jpg 4 0
image0002.jpg 5 1 3
...
image0001.jpg 3 2 1
...
image0101.jpg 3 3 1
image0101.jpg 4 0
image0101.jpg 5 0
image0102.jpg 3 3 2
image0102.jpg 4 0
image0102.jpg 5 3 3
...
where:
image0001.jpg 3 1 1 means that 3 is starting action 1
...
image0002.jpg 3 1 2 means that 3 is starting action 2
...
image0002.jpg 5 1 3 means that 5 is starting action 3
...
image0001.jpg 3 2 1 means that 3 is doing the core of action 1 (the hand is raised at maximum)
...
image0101.jpg 3 3 1 means that 3 is ending action 1
image0101.jpg 4 0 means that 4 is doing nothing that needs to be recognized
...
image0102.jpg 3 3 2 means that 3 is ending action 2
...
image0102.jpg 5 3 3 means that 5 is ending action 3
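with (after the image name):
first column is the person id,
second column is the start/core/end tag (1 = start, 2 = core, 3 = end, as in the examples above; 0 = nothing to recognize),
third column is the action type (raising hand, getting up, sitting down, going to the board)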
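The per-frame tags can be paired up into intervals, one per action. The sketch below assumes the tag values seen in the examples (1 = start, 2 = core, 3 = end, 0 = nothing to recognize) and that every end tag follows a start tag for the same person and action. The sample lines are a shortened, illustrative selection, and the action type numbers are left undecoded because the text does not assign them.

```python
PHASES = {1: "start", 2: "core", 3: "end"}   # inferred from the examples


def parse_action_line(line):
    """Return (image, person, phase, action); phase is None for tag 0."""
    fields = line.split()
    image, person, tag = fields[0], int(fields[1]), int(fields[2])
    if tag == 0:
        return image, person, None, None   # nothing to recognize
    return image, person, PHASES[tag], int(fields[3])


def action_intervals(lines):
    """Pair start and end tags into (person, action, first, last) tuples."""
    open_actions = {}   # (person, action) -> image where the action started
    intervals = []
    for line in lines:
        image, person, phase, action = parse_action_line(line)
        if phase == "start":
            open_actions[(person, action)] = image
        elif phase == "end":
            start = open_actions.pop((person, action), None)
            intervals.append((person, action, start, image))
    return intervals


sample = [
    "image0001.jpg 3 1 1",
    "image0001.jpg 4 0",
    "image0002.jpg 5 1 3",
    "image0101.jpg 3 3 1",
    "image0102.jpg 5 3 3",
]
for interval in action_intervals(sample):
    print(interval)
```

An end tag without a matching start is reported with `None` as its first image rather than dropped, so gaps in an annotation stay visible.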
3.2.6 Other example of annotation (from Michael Nielsen)
When annotating positions it makes sense to let frames be on the row
axis, thus referring to all frames. But with actions it might be smarter
(and easier) to let each row be an action, and refer to intervals.
[e.] | [p.] | [g.] | [a. start] | [s. start] | [s. end]  | [a. end]  |
1    | 3    | 1    | image0001  | image0030  | image0064 | image0104 |
2    | 1    | 3    | image0055  | image0102  | image0123 | image0154 |
etc.
e. = event number
a. = action
s. = stroke
p. = person
g. = gesture
Person 3 does gesture/action 1 during frames 1-104, the stroke/core
being during frames 30-64. Person 1 does nothing from 1-54, but then
does gesture/action 3 from 55-154, the stroke/core being 102-123.
In this way, only frames that mean something (kind of "key frames")
are mentioned. Idle time is implicitly given by not mentioning an
action for a person.
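To illustrate how the interval rows relate to individual frames, the sketch below parses the two example events and answers what a person is doing at a given frame. It assumes that the digits at the end of an image name are the frame index and that rows are delimited by `|`; the function names and the "action"/"stroke" labels are illustrative.

```python
import re


def frame_number(name):
    """'image0030' -> 30 (assumes the trailing digits are the frame index)."""
    return int(re.search(r"(\d+)$", name).group(1))


def parse_event_row(row):
    """Parse 'e | p | g | a.start | s.start | s.end | a.end |' into a dict."""
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    event, person, gesture = (int(cell) for cell in cells[:3])
    a_start, s_start, s_end, a_end = (frame_number(c) for c in cells[3:7])
    return {"event": event, "person": person, "gesture": gesture,
            "a_start": a_start, "s_start": s_start,
            "s_end": s_end, "a_end": a_end}


def state_at(events, person, frame):
    """Return (gesture, 'stroke' or 'action'), or None when the person is idle."""
    for ev in events:
        if ev["person"] == person and ev["a_start"] <= frame <= ev["a_end"]:
            in_stroke = ev["s_start"] <= frame <= ev["s_end"]
            return ev["gesture"], "stroke" if in_stroke else "action"
    return None   # no event mentions this person at this frame


rows = [
    "1 | 3 | 1 |image0001 |image0030 |image0064|image0104|",
    "2 | 1 | 3 |image0055 |image0102 |image0123|image0154|",
]
events = [parse_event_row(row) for row in rows]
print(state_at(events, 3, 50))   # (1, 'stroke')
print(state_at(events, 1, 20))   # None: person 1 is idle until frame 55
```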
3.3 Coarse annotation
All images start at index 10000.
Example:
image10000
image10001
image10002
...
The coarse annotation below is provided to help you produce the exact annotation for scenarios A, B and C.
Scenario A
10000 to 11449: [Cam1, Cam2, Cam3] all participants enter the room
11688 to 12734: [Cam2, Cam3] 1, 2, 3 look at 4, 5, 6 with a neutral expression
13270 to 13435: [Cam2, Cam3] 1 looks at 4, 5, 6 with an angry expression
14000 to 14180: [Cam2, Cam3] 2 looks at 4, 5, 6 with an angry expression
14370 to 14550: [Cam2, Cam3] 3 looks at 4, 5, 6 with an angry expression
14645 to 15050: [Cam2, Cam3] 1, 2, 3 look at 4, 5, 6 with a smiling expression
16380 to 18190: [Cam1, Cam3] 6 and 5 act
18430 to 18875: [Cam1, Cam3] 4 acts
19190 to 19282: [Cam1, Cam3] 4 acts
20390 to 20540: [Cam1, Cam3] 4 acts
20670 to 20820: [Cam1, Cam3] 4 acts
21180 to 22613: [Cam1, Cam2, Cam3] all participants leave the room
Scenario B
10000 to 11480: [Cam1, Cam2, Cam3] all participants enter the room
12060 to 13900: [Cam2, Cam3]
14060 to 15900: [Cam1, Cam3]
16600 to 18342: [Cam1, Cam2, Cam3] all participants act and leave the room
Scenario C
10000 to 11370: [Cam1, Cam2, Cam3] all participants enter the room
11850 to 13690: [Cam2, Cam3]
13630 to 15250: [Cam1, Cam3]
15370 to 17240: [Cam2, Cam3]
17070 to 18960: [Cam1, Cam3]
18970 to 20185: [Cam1, Cam2, Cam3] all participants leave the room
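A coarse annotation of this kind can be queried by image index, for example to find which cameras are listed for a given frame. The sketch below encodes the Scenario A list; the tuple layout and function names are illustrative, and the descriptions are shortened. The last function is only a rough consistency check: it assumes the coarse ranges span the whole recording and uses the 500 second duration stated for Scenario A.

```python
# (first index, last index, cameras, description), copied from Scenario A.
SCENARIO_A = [
    (10000, 11449, ("Cam1", "Cam2", "Cam3"), "all participants enter the room"),
    (11688, 12734, ("Cam2", "Cam3"), "1, 2, 3 look neutral at 4, 5, 6"),
    (13270, 13435, ("Cam2", "Cam3"), "1 looks angry at 4, 5, 6"),
    (14000, 14180, ("Cam2", "Cam3"), "2 looks angry at 4, 5, 6"),
    (14370, 14550, ("Cam2", "Cam3"), "3 looks angry at 4, 5, 6"),
    (14645, 15050, ("Cam2", "Cam3"), "1, 2, 3 smile at 4, 5, 6"),
    (16380, 18190, ("Cam1", "Cam3"), "6 and 5 act"),
    (18430, 18875, ("Cam1", "Cam3"), "4 acts"),
    (19190, 19282, ("Cam1", "Cam3"), "4 acts"),
    (20390, 20540, ("Cam1", "Cam3"), "4 acts"),
    (20670, 20820, ("Cam1", "Cam3"), "4 acts"),
    (21180, 22613, ("Cam1", "Cam2", "Cam3"), "all participants leave the room"),
]


def segments_at(segments, index):
    """Return every coarse segment whose index range contains `index`."""
    return [seg for seg in segments if seg[0] <= index <= seg[1]]


def images_per_second(segments, duration_seconds):
    """Rough image rate, assuming the segments span the whole recording."""
    first = min(seg[0] for seg in segments)
    last = max(seg[1] for seg in segments)
    return (last - first + 1) / duration_seconds


hit = segments_at(SCENARIO_A, 13300)[0]
print(hit[2], hit[3])                     # ('Cam2', 'Cam3') 1 looks angry at 4, 5, 6
print(segments_at(SCENARIO_A, 11500))     # []: the index falls between two segments
print(round(images_per_second(SCENARIO_A, 500), 1))   # 25.2
```

`segments_at` returns a list rather than a single segment because some ranges in the coarse annotation overlap (for example 11850 to 13690 and 13630 to 15250 in Scenario C).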