
Applied Cliplets-based half-dynamic videos as intervention learning materials to attract the attention of adolescents with autism spectrum disorder to improve their perceptions and judgments of the facial expressions and emotions of others



Autism spectrum disorders (ASD) are characterized by a reduced ability to understand the emotional expressions on other people’s faces. Increasing evidence indicates that children with ASD might not recognize or understand crucial nonverbal behaviors, which likely causes them to ignore nonverbal gestures and social cues, like facial expressions, that usually aid social interaction.


In this study, we used software technology to create half-static, half-dynamic video materials to teach adolescents with ASD how to become aware of six basic facial expressions observed in real situations.


This intervention system provides a half-way point, a dynamic video of a specific element within a static surrounding frame, to strengthen the ability of the six adolescents with ASD to focus their attention on relevant dynamic facial expressions and ignore irrelevant ones.


Using a multiple-baseline-across-participants design, we found that the intervention learning system provided a simple yet effective way for adolescents with ASD to focus their attention on nonverbal facial cues; the intervention helped them better understand and judge others’ facial emotions.


We conclude that presenting a limited amount of information, with structured and specific close-up visual social cues, helped the participants improve their judgments of the emotional meaning of other people’s facial expressions.


ASD is a neurodevelopmental condition defined by impairments in reciprocal social interaction and in verbal and nonverbal communication, together with repetitive and stereotyped behaviors (Boelte and Hallmayer 2013). People with ASD have a range of cognitive and affective difficulties recognizing feelings in themselves and others (Lacava et al. 2007). Although people with high-functioning autism (HFA) may perform better in recognizing basic emotions, they have difficulty understanding more complex emotions (Bauminger 2004; Capps et al. 1992; Hillier and Allinson 2002). In addition, among the most characteristic early symptoms of ASD are atypical eye contact and joint attention, which profoundly impair the development of social-communication skills (Senju and Johnson 2009). Several eye-tracking studies of young children with ASD illustrate an emerging consensus that detailed characterizations of the eye movements made while fixating and tracking visual stimuli are important (Falck-Ytter et al. 2013). Children with ASD typically have behavioral difficulties that suggest problems with visual attention; it is unclear whether this attention deficit causes the other symptoms of ASD or is a consequence of the disorder (Koldewyn et al. 2013). Research (Durham University News 2013) has indicated that children with ASD might miss crucial nonverbal behaviors, which likely prevents them from recognizing or understanding nonverbal gestures and social cues, like facial expressions, that usually aid social interaction. Missing these cues generally has a negative effect on their social interaction skills and the flow of their communication (Mundy et al. 1986) because people with ASD cannot interpret other people’s facial expressions and emotional states, or understand the intentions and internal activities of others (Krasny et al. 2003).
They also cannot respond with appropriate gestures, postures, or proximity (Ryan and Ni Charragain 2010), a deficit in what researchers call Theory of Mind: the ability to view things from other people’s perspective and to understand the mental states of others (Smith 2006), i.e., the ability to empathize (Baron-Cohen and Belmonte 2005; Baron-Cohen et al. 1985). Therefore, children with ASD, who normally pay more attention to inanimate objects than to faces, need to be taught the specific verbal and nonverbal behaviors involved in social interactions, and must learn to pay attention to the faces of the people they meet and talk to in order to understand social-emotional behavior (Martins and Harris 2006; McPartland et al. 2011).

Interventions to improve specific perception judgments of facial expressions

Blum-Dimaya et al. (2010) reported that training with facial pictures and videos taught children with ASD to develop social communication skills and to focus on specific visual representations and facial cues to judge others’ emotions. Some researchers also believe that people with ASD have an impaired ability to understand complex emotional and social information from facial stimuli (Baron-Cohen et al. 2001). Facial expressions are a key source of nonverbal cues in social development and in the ability to interact with others (Back et al. 2007; Baron-Cohen et al. 1997). Moreover, understanding social situations requires paying attention to other people and to the subtle social cues they generate. However, people with ASD lack these abilities, especially an intuitive awareness of, and ability to judge, the facially expressed emotions of others.

Golan et al. (2010) used the Transporters DVD as a learning tool to attract the attention of children with ASD. The DVD focuses on the expressions of animated human faces on toy trains, buses, and other vehicles (transporters), which are characters in a video story that illustrates, names, and describes emotions in common social situations. Golan et al. reported that their participants’ ability to judge others’ emotions significantly improved. More broadly, evidence shows that video-based interventions (VBIs) such as Video Modeling (VM) have been therapeutically effective for teaching functional, social, and behavioral skills to children with ASD (Ayres and Langone 2005; Bellini and Akullian 2007; Corbett and Abdullah 2005). Although VBIs are advantageous for motivating children with ASD to learn, the children still have difficulty dynamically adjusting the size of their attentional focus and switching the locus of their attention (Elsabbagh et al. 2009; Facoetti et al. 2008; Ibanez et al. 2008; Kikuchi et al. 2011; Landry and Bryson 2004; van der Geest et al. 2001), especially with dynamic, repetitive, or social stimuli. It has been suggested (Koldewyn et al. 2013), however, that decreased multiple object tracking (MOT) performance is due not to deficits in dynamic attention but to a diminished capacity to select and maintain attention on multiple targets. Problems such as a lack of progress could therefore stem from insufficient reinforcement of sustained attention, poor video content, or missing prerequisites (Sigafoos et al. 2007).

Cliplets-based half-dynamic video as intervention materials

In this study, we restricted motion to specific social areas, reducing the number of moving targets, to create half-static, half-dynamic videos called Cliplets-based videos (CBVs) for our intervention learning materials. We hypothesize that CBVs require less attention and generate less stress than normal dynamic video (DV) because freezing certain unimportant parts of the DV leaves fewer visual channels competing for the viewer’s attention. At the same time, the frozen parts remain visible and can still support the viewer’s situational awareness, giving a more accurate sense of the scenario. In the fundamental theory and practice of ergonomics, when people must attend to more signal channels, their stress and mental load increase, because the short-term sensory store has a limited capacity to process information, so handling large amounts of information requires selective attention (Dzubak 2008; Valkenburg 2011). Theories differ as to whether humans employ a single resource or multiple resources to manage this limited capacity (Nemeth 2004). Attention is a limited resource, and people have a fixed amount that must be allocated according to need (Scalf et al. 2013). To use a popular analogy, attention is like a bucket of water: people draw upon it as needed, but every dipperful and every teaspoonful leaves less for other purposes (Marc Green 2013). Because total attentional capacity also varies with circumstance (Swallow and Jiang 2013), we reduced the cognitive load of the CBVs to conserve the viewers’ attentional resources and generate less stress.

The intervention contains a novel visual strategy, namely CBV, to ameliorate this problem. CBV is able to attract the attention of adolescents with ASD, and seems to enable them to focus on the dynamic elements in a video clip. Accordingly, it can be used to help such adolescents increase their awareness and guide their understanding toward the actual meaning and emotional value of facial expressions in specific social situations. Many scholars have suggested that a relatively constrained viewing area limits the attentional frame, which helps people with ASD focus their attention on relevant stimuli and ignore irrelevant ones (Charlop-Christy and Daneshvar 2003; Sherer et al. 2001; Shipley-Benamou et al. 2002). This is why we believe that it is necessary to teach adolescents with ASD to pay attention to some social signals and ignore others. The rationale of using this type of video with adolescents with ASD is that it can simplify social stimuli, and help direct attention to the most relevant features and areas.

Aims of this study

Adolescents with ASD typically lack the ability to focus attention on the most important social features of a given situation or interaction, such as facial expressions, body movement, and relevant social cues; moreover, they tend to pay less attention to other people and their actions and to focus instead on non-social objects (Shic et al. 2011). We developed a CBV intervention learning system because we believe that a DV is too complex to effectively teach adolescents with ASD the skills involved in becoming aware of and recognizing the emotions conveyed by different facial expressions. Adolescents with ASD usually pay more attention to inanimate objects than to faces (e.g., a vase on a table or a clock on a wall); therefore, we provided a half-way point, a short video (less than 60 s) of a specific element within a static frame of surrounding elements (a Microsoft Research Cliplet), to guide them toward greater emotional awareness by paying attention to the facial cues of others. This novel visual strategy supports adolescents with ASD by improving their ability to focus their attention on relevant dynamic nonverbal cues and ignore irrelevant ones. We therefore used the Microsoft™ Research Cliplets app (Microsoft 2012) to capture viewing material from real-life situations and to create CBV materials that mix static and dynamic elements from a video clip, and we tested the efficacy of these materials for adolescents with ASD. We focused on facial expressions extracted from the video materials to determine whether our intervention helped adolescents with ASD understand social situations and judge the emotional meaning of other people’s facial expressions.



We recruited six adolescents with ASD (4 boys, 2 girls: Yu, Han, Deng, Gung, Yen, and Yuen, all pseudonyms to guarantee anonymity) through the Taiwan Autism Association [mean age = 13.6 years; age range: 12–15 years; mean intelligence quotient (IQ) scores: (a) full scale IQ (FIQ) = 99.66; (b) verbal IQ (VIQ) = 101.33; (c) performance IQ (PIQ) = 105.66]. The inclusion criteria required (1) a clinical diagnosis of ASD based on DSM-IV-TR criteria, (2) no other specific disabilities, (3) no medications for physician- or self-diagnosed illnesses, (4) no physician-diagnosed comorbidities, (5) no other therapies at the time of testing, and (6) an FIQ > 90 (Table 1). All participants were fluent in Mandarin Chinese and Taiwanese. The participants’ sensory abilities were within the normal range; however, their parents and special education teachers reported that they usually manifested poor social and communication skills, rarely understood other people’s emotional expressions, and usually did not respond appropriately. Data on the participants’ intelligence, sensory abilities, and social and communication skills were drawn from multiple sources: parental interviews, teachers’ reports, VIQ scores (Wechsler Intelligence Scale for Children and Adolescents), and functional language and social adaptation levels (based on clinical observations or behavior and adaptation scales). All participants held a disability identification card issued by a medical institution in Taiwan and had been counseled in special education schools and institutes in Taiwan. The study protocol was approved by our university hospital’s Institutional Review Board (IRB), and all procedures performed in this study were in accordance with institutional and comparable ethical standards. All participants signed a youth consent form, and parental consent forms were obtained before the participants were enrolled in the study.

Table 1 Summarized demographic information of the participants

Developing the Cliplets-based video intervention materials

The intervention learning system takes video materials that portray everyday life activities and focuses on special moments and social cues to create the CBV, in which the desired part of the image is a movie while the incidental part is a still image (Fig. 1). The researcher created a suitable scenario and script for each CBV after discussions with each participant’s parents and special education teacher. The video content was based primarily on situations that adolescents of each participant’s age might encounter in daily life at home, at school, and in the community (e.g., singing the happy birthday song for a younger brother and blowing out the candles together, or giving Mother a present and a big hug and wishing her a happy Mother’s Day). Each short script was designed so that the participant’s parents and special education teacher could develop the stories from the adolescent’s point of view. In this manner, the CBV content was tailored so that participants learned useful social scenarios appropriate for their age and life stage. We wondered whether improvements in their awareness of other people’s emotions would also generalize to real-life situations, as there is a big difference between watching a video of a social interaction and engaging in one. Real-life interaction involves many static and dynamic features, in both the interaction and the environment, and usually leaves little or no chance to reflect on the situation before an appropriate response is required. We therefore created CBV materials to give participants the chance to judge different facial expressions in real-life scenarios. By providing the adolescents with a wide range of scenarios that reflect their everyday life, and analyzing stories suitable for them, therapists might better understand what actually happens in the lives of their young clients and how the adolescents feel about it.
We also included other measures, such as interviews and treatment reports from parents and therapists commenting on the children’s responses to real-life situations. An effective CBV must show salient facial expressions and appropriately related body language. We focused on the facial expressions and other particulars of each situation extracted from the video materials, and each scenario was also required to contain rich interaction. Highly complicated and abstract videos were excluded. Some films contain many metaphorical scripts or symbolic gestures, such as people saying “you are as loyal as a dog” or “I love you more than life itself” (metaphor), or using their hands to form a “heart shape” to represent “I love you” (symbol). Such material was avoided because most individuals with ASD have difficulty understanding abstract language, sarcasm, and metaphors (Persicke et al. 2012). To keep the focus on visual strategy training, we excluded other potential sources of bias. In addition, we invited therapists and special education teachers to confirm that the training materials were suitable and objective for testing adolescents with ASD.

Fig. 1

Sample of the CBV (photographic material by participant’s parents from the participant’s everyday life)

Design procedure

CBV materials can be used to teach adolescents with ASD how to pay attention to certain social cues. Moreover, Cliplets give the therapist a simple tool for creating a mixed visual medium that directs the attention of adolescents with ASD to certain dynamic facial signals and away from other context (Fig. 2). Participants can thus focus on facial expressions and relate them to the surrounding situation. In this manner, their area of attention is constrained, reducing the visual load of multiple object channels. We use the Cliplets app as follows: (a) import the short video clip of the story fragment and choose which parts to keep as a movie and which parts to freeze; (b) add a new layer (a loop); (c) select a small part focused on the facial expressions, making sure the expressions are clear and easy to identify in the context of the story (the selected fragment must therefore be an important part of the video, one that the participant can easily perceive and judge in the context of the animated clip); (d) decide where the loop will start and end; (e) once the start and end have been defined, draw a mask around the area to animate (everything inside the mask moves, and everything outside it shows the still background image; more than one animation mask can be created by adding more layers, which allows two focused movement areas in one Cliplet); and (f) when the CBV is finished, save it in *.gif, *.mp4, or *.wmv format (Fig. 3). All CBV materials we created show various emotions and were played at the same resolution (1024 × 768) on a 27-inch display. Each video source was approximately 45 s long.
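The masking rule in step (e) amounts to a per-pixel choice between a moving frame and one frozen frame. The Python sketch below illustrates that idea only; it is not the Cliplets app’s implementation or API. It assumes grayscale frames stored as nested lists of integers, a single boolean mask (one layer), and a hypothetical helper named `composite_cliplet`, and it ignores the edge blending and loop smoothing that a real tool would likely apply.

```python
from typing import List, Sequence

# Hypothetical representation: a grayscale frame is a list of rows of pixel values.
Frame = List[List[int]]
Mask = List[List[bool]]


def composite_cliplet(
    frames: Sequence[Frame],
    mask: Mask,
    still_index: int,
    loop_start: int,
    loop_end: int,
) -> List[Frame]:
    """Return one loop of frames in which only the masked region moves.

    Pixels where mask is True come from frames[loop_start..loop_end];
    all other pixels are copied from the still frame frames[still_index].
    """
    if not 0 <= loop_start <= loop_end < len(frames):
        raise ValueError("loop bounds must lie within the clip")
    still = frames[still_index]
    height, width = len(still), len(still[0])
    if len(mask) != height or any(len(row) != width for row in mask):
        raise ValueError("mask must have the same shape as the frames")

    loop: List[Frame] = []
    for t in range(loop_start, loop_end + 1):
        moving = frames[t]
        # Inside the mask: take the moving pixel; outside: keep the frozen pixel.
        loop.append([
            [moving[y][x] if mask[y][x] else still[y][x] for x in range(width)]
            for y in range(height)
        ])
    return loop
```

Playing the returned frames repeatedly gives the looping effect. Supporting two focused movement areas would only require passing the union of two masks, since each pixel is still either moving or frozen.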

Fig. 2

Cliplets software used to create a mixed visual medium to focus on those dynamic facial signals and ignore other contexts (Materials retrieved from Youtube:

Fig. 3

CBV materials development process


The study took place at three special education schools in Tainan, Taiwan. All sessions were conducted in a 4 × 5-m day-treatment room at each school, equipped with chairs, a table, a personal computer with an Intel Core i7 processor, and a 27-inch LCD monitor. We selected these schools because they have special education classes and special education teachers suitable for conducting the tests. Further, the participants selected from these schools were able to learn in classes with typically developing (TD) children, and some of them received some or all of their education in regular classrooms. According to the evaluations of the occupational therapist and special education teachers, these participants’ classroom learning performance was sufficient for our training, and their parents agreed to their participation in our research. We met with the participants and an accompanying occupational therapist for 1 h each week. The participants sat in front of the LCD monitor, and the researcher and occupational therapist sat in chairs beside the table. The occupational therapist recorded all of each participant’s answers during every session.

Phases, sessions, and experimental conditions

This study used a multiple-baseline-across-participants design to analyze experimental control across three phases (described below). One certified occupational therapist with more than 3.5 years of experience working with adolescents with ASD conducted all sessions and instructed all participants on what to do during the intervention phase. The three phases were: (a) baseline, in which the researcher collected the dependent-variable data without any intervention in place; (b) intervention, in which the intervention learning system was used for 6–7 weeks to train the participants to notice key social cues in specific scenarios and to allow the therapist to obtain the performance data used in the assessment (during this phase, the CBV training system taught the participants to become aware of and recognize emotions by paying attention to the facial cues of others, using the novel visual strategy of focusing on relevant dynamic nonverbal cues and ignoring irrelevant ones); and (c) maintenance, in which the participants’ post-training performance was assessed starting 4 weeks after the intervention phase, a delay intended to reduce test–retest interference.

We chose a multiple baseline design rather than an AB design for the following reasons. In autism and related fields, individual differences are quite large, so therapy usually focuses on a single individual’s development rather than a group’s. The six members of our study group have a congenital condition that manifests differently in each person; we therefore used a single-subject, multiple-baseline-across-participants design (Plavnick and Ferreri 2013) to confirm the intervention’s effectiveness in each individual, even though they are ostensibly members of a group of similar subjects. This is regarded as a standard, evidence-based method for many intervention treatments in medical, psychological, and biological research, and it is especially applicable to autism studies, where it has provided useful information for the field of special education (Kennedy and Craig 2005; Odom and Strain 2002; Wolery and Dunlap 2001). It is a fundamental experimental method in the field and, in practice, requires neither control groups nor many subjects. Single-subject research has proven particularly relevant for defining educational practices at the level of the individual learner (Horner et al. 2005), and educators building individualized educational and support plans have benefited from the systematic experimental analysis it permits (Dunlap and Kern 1997). The multiple baseline design is a form of single-subject research involving careful measurement of multiple persons, traits, or settings both before and after a treatment. Because the manifestations of autism differ between individuals, the purposes of the research were to ascertain whether the intervention was effective and how each individual improved (Lindgren and Doobay 2011). The design has several advantages over AB designs.
Most importantly, the start of the treatment condition was staggered (begun at different times) across individuals, so changes that appear only after each participant’s treatment begins can be attributed to the treatment rather than to chance. By gathering data from many instances, inferences can be made about the likelihood that the measured trait generalizes to a larger population (Christ 2007).

Baseline phase

In the baseline phase, the therapist (a) first explained to the participants the meanings of the six basic emotions (happy, sad, angry, surprise, fear, and disgust) (Ekman 2005) that they would be asked about. (b) The therapist then had the participants watch 20 standard DVs (not CBVs) on the desktop computer and determine the emotion described in each question. (c) After watching the DVs, the participants answered each question by choosing, from six target facial-expression pictures coded according to the Facial Action Coding System (FACS) (Hamm et al. 2011), the picture they thought best reflected the feelings of the characters in the video, and by choosing one of six emotion adjectives. Subsequently, they tried to mimic the facial expressions in the scenarios and pretended to feel the represented emotions. This phase assessed the participants’ ability to determine which emotion was expressed by which facial expression picture in the video scenario, and what emotions the participants felt when they saw each facial expression picture. The correct answer for each situation corresponded to the emotional expression portrayed; correct and incorrect answers were identified and recorded, and the rate of correct answers was then calculated. The baseline thus recorded each participant’s ability to judge the emotional meanings of the characters’ facial expressions in typical DVs.
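The rate of correct answers described above can be sketched as follows. This is an illustration under stated assumptions, not the authors’ scoring procedure: it assumes each video yields two answers (the FACS picture and the emotion adjective), both scored against the single target emotion the video depicts, and pooled into one percentage per session. `Response` and `percent_correct` are hypothetical names. With six options per question, random guessing would average about 16.7% correct.

```python
from dataclasses import dataclass
from typing import List

# The six basic emotions used as answer options (Ekman 2005).
EMOTIONS = ("happy", "sad", "angry", "surprise", "fear", "disgust")


@dataclass
class Response:
    target: str     # emotion the video was designed to depict
    picture: str    # Q1: emotion of the facial-expression picture chosen
    adjective: str  # Q2: emotion adjective chosen


def percent_correct(responses: List[Response]) -> float:
    """Pooled percentage of correct answers over both questions in one session.

    Assumption: Q1 and Q2 are weighted equally and scored against the same target.
    """
    if not responses:
        raise ValueError("no responses recorded")
    correct = 0
    for r in responses:
        for answer in (r.picture, r.adjective):
            if answer not in EMOTIONS:
                raise ValueError(f"unknown emotion label: {answer!r}")
            if answer == r.target:
                correct += 1
    return 100.0 * correct / (2 * len(responses))
```

For example, a hypothetical 20-video session with 15 fully correct videos and 5 videos where only the adjective was wrong scores 35 of 40 answers, or 87.5%.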

Intervention phase

In the intervention phase, the participants watched the CBV intervention video materials to understand the contexts and answer questions about emotions. In the first session of the intervention phase, (a) the therapist taught the participants how to watch the CBVs on the computer and made sure that they felt comfortable using the intervention learning system. (b) Because the intervention involved not only watching the videos but also mimicking the facial expressions, pretending to feel the emotions, and receiving corrective feedback on the identification of the facial expressions, the CBVs were delivered as part of an intervention package. Because children with ASD tend to move around and are easily distracted, the therapist had to remind participants to stay focused on the screen and the training materials. Each session lasted 40–45 min. (c) The participants began the experimental sessions by watching the 20 CBVs and focusing their attention on the relevant social stimuli on the monitor. (d) After watching the CBVs, the participants selected the basic facial expression that they thought best reflected the feelings of the character in the video and one of the six adjectives to answer each question. A therapist then rated the participants’ learning performance based on these answers. When an answer was incorrect, the therapist asked the participant to watch the CBV again and observe the social stimuli in the surrounding situation, and then asked the participant to determine what each social cue represented and why the appropriate facial expression and emotion adjective should be selected for the scenario.

Maintenance phase

The maintenance phase began four weeks after the intervention phase ended and followed the baseline procedure. To avoid interference, the baseline and maintenance phases used different DVs of similar scenario length and difficulty (40 DVs in total: 20 in the baseline phase and 20 in the maintenance phase), and different scenarios again were used in the intervention phase. Special education experts and the occupational therapist judged all test materials to be consistent in length and level of difficulty. The therapist determined whether the participants had retained the skills they had been taught.

To prepare the materials, we asked the participants’ parents and special education teachers to create and record, using the same visual strategy, complete unedited videos resembling ordinary DVs seen on TV. The DVs’ scenarios concerned perception, family love, or other topics related to adolescents’ everyday social and emotional learning (SEL). For the baseline and maintenance phases, we selected 40 DVs (20 per phase) whose topics were similar but whose story events differed, to reduce test–retest effects. We also tried to keep the DVs’ difficulty as consistent as possible with that of the CBVs, based on a pilot study with adolescents of similar age and consultation with the therapist. For the intervention phase, we created 20 further draft DVs covering similar social and emotional interaction topics but with different stories that were not used in the other phases. In these drafts, some parts (surrounding inanimate objects) were frozen, whereas the social cues, such as facial expressions, gestures, and interactions, remained active throughout the video, producing the CBVs. The CBVs used in this phase were intended to enhance the participants’ ability to focus on the important parts of a scene and to learn to judge different facial emotions.

Intervention materials

In this study, we created 20 scenarios in 20 different CBVs. All six participants watched the same materials in each session to ensure test consistency. Each CBV test lasted 2 min, and each session lasted 40–45 min. All CBVs were created following the same rules, which allowed the researcher to keep the length and difficulty of each CBV consistent. Each CBV was then discussed with the participants’ therapist and special education teachers using the same visual strategy and questions. The scenarios concerned facial expressions of emotion that frequently occur in each participant’s daily life. The content was intended primarily to depict the six basic emotions, and the selected scenarios and scripts were approved by five special education experts and three occupational therapists. The scenario stories used in the intervention phase differed from those used in the baseline and maintenance phases. Two questions were asked per story (Q1: choose, from six target emotion pictures, the one that best reflected the feelings of the characters; Q2: choose one of six emotion adjectives), and the participants were not prompted for answers. The questions for each CBV test used the nonverbal cues in the film to assess the participants’ emotional awareness and whether the intervention training had increased their ability to understand the emotional expressions on other people’s faces (in the sense that the participants had to understand the social behavior in the scenario and match it to the emotions expressed in the CBVs). We deliberately emphasized the social cues and scenario plots so that the emotional expression cues would be easier for the participants to notice.

Reliability of the data

The procedural reliability of this study was examined by the same certified occupational therapist who conducted all of the tests. In addition, we held expert meetings and conducted pilot testing to verify the test items. Following related experimental methods from other studies (Castelli 2005; Chen et al. 2015), we trained and tested the subjects’ ability to identify the six core emotions of happy, sad, angry, surprised, fear, and disgust (Ekman 2005), which are often difficult for adolescents with ASD to grasp and understand. We used a workflow checklist so that the therapist followed standard operating procedures, keeping the processes and related controls consistent (including the video content, duration, test questions, facial expressions, case criteria, and test environment). The data show that this study achieved a procedural reliability of 99 %. We also used the same visual strategy and design for every video to keep the materials consistent and to ensure that no part was unclear or emotionally confusing. In addition, we conducted a pretest with TD adolescents of the same age without ASD to confirm the reliability and validity of the test items; their answers were checked by three therapists and five experts to establish normative answers. After the participants completed each test in each phase, we used questionnaires and interviews for expert assessment, together with parental reports on the test results, to ensure the social reliability and validity of the results with respect to real life. A five-point Likert scale (1 = deterioration, 2 = no change, 3 = a little improvement, 4 = fairly good improvement, 5 = great improvement) was used to assess the teaching effects. The average score was 86 out of a total of 100, indicating very positive attitudes among the parents and the three therapists toward the validity of this study.
The interviews also showed that both the participants’ parents and the 3 therapists believed there were significant improvements in the participants’ emotional judgment skills after the intervention, and they therefore considered its teaching effects to be very good.
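Procedural reliability in single-case research is commonly computed as the number of procedural steps carried out as planned, divided by the number of planned steps, multiplied by 100. The text does not give the exact formula behind the reported 99 %, so the sketch below assumes that convention and a checklist scored as one boolean per step per session. The six checklist steps mirror the controls listed above, while the function name and the session records are hypothetical illustrations, not the study's data.

```python
def procedural_reliability(sessions):
    """Percentage of checklist steps implemented as planned across all sessions.

    `sessions` is a list of sessions; each session is a list of booleans,
    one per checklist step (True = step carried out as specified).
    """
    total_steps = sum(len(steps) for steps in sessions)
    if total_steps == 0:
        raise ValueError("no checklist steps recorded")
    correct_steps = sum(sum(steps) for steps in sessions)  # True counts as 1
    return 100.0 * correct_steps / total_steps


# Hypothetical checklist of 6 steps (video content, duration, test questions,
# facial-expression stimuli, case criteria, test environment) over 20 sessions.
hypothetical_sessions = [[True] * 6 for _ in range(20)]
hypothetical_sessions[3][1] = False  # one hypothetical duration deviation

print(round(procedural_reliability(hypothetical_sessions), 1))  # 99.2
```

With 119 of 120 hypothetical steps carried out as planned, the result is about 99.2 %; pooling steps across sessions (rather than averaging per-session percentages) weights every checklist step equally.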

Data analysis

Intervention effects were evaluated using the percentage of non-overlapping data (PND). In addition, we applied the Kolmogorov–Smirnov test (KS-test), a nonparametric test of the equality of continuous, one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution or to compare two samples (Goodman 1954). The KS-test determines whether two datasets differ significantly and has the advantage of making no assumption about the distribution of the data (Ziegel 2000). It is widely used to analyze the distribution patterns of datasets, especially for within-individual comparisons in small samples (Lilliefors 1967). Further, many similar studies identified in our literature review adopted the KS-test for their evaluation and suggested that it can indicate whether a learning curve is significant (Chang et al. 2011; Shih et al. 2014). Moreover, because the KS-test is based on empirical cumulative distribution functions, it allowed us to view the data graphically, which helps in understanding how the data are distributed.
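The text does not state which KS variant was used or how the p-values were obtained. As a hedged illustration only, the sketch below computes the two-sample statistic D, the largest vertical gap between the two empirical cumulative distribution functions, and an exact permutation p-value obtained by enumerating every way of splitting the pooled sessions into groups of the original sizes. The function names and session scores are hypothetical, not the study's data, and the p-value is two-sided, whereas the study reports a directional result.

```python
from itertools import combinations


def ks_statistic(x, y):
    """Two-sample Kolmogorov–Smirnov statistic D = max_t |F_x(t) - F_y(t)|.

    Both empirical CDFs are step functions that only jump at observed values,
    so checking every pooled value is enough to find the maximum gap.
    """
    n, m = len(x), len(y)
    d = 0.0
    for t in sorted(set(x) | set(y)):
        fx = sum(v <= t for v in x) / n
        fy = sum(v <= t for v in y) / m
        d = max(d, abs(fx - fy))
    return d


def ks_permutation_pvalue(x, y):
    """Exact two-sided permutation p-value for D.

    Enumerates every assignment of the pooled values to a group of size len(x)
    and counts how often D is at least as large as the observed D.
    """
    pooled = list(x) + list(y)
    observed = ks_statistic(x, y)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        xs = [pooled[i] for i in idx]
        ys = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if ks_statistic(xs, ys) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return observed, extreme / total


# Hypothetical session scores (%) for one participant, not the study's data.
baseline = [35.7, 28.6, 42.9]
intervention = [64.3, 71.4, 78.6, 71.4, 85.7, 64.3]
d, p = ks_permutation_pvalue(baseline, intervention)
print(d, round(p, 4))  # 1.0 0.0119
```

Complete separation of the two phases gives D = 1, and only 1 of the 84 possible splits is that extreme, so p = 1/84. Enumeration grows combinatorially (18 sessions split 12/6 gives 18,564 splits), which is still fast for single-case data; for longer series, a random sample of permutations or a library routine such as SciPy's `scipy.stats.ks_2samp` would be the practical choice.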



Experimental data on the six participants (Yu, Han, Deng, Gung, Yen, and Yuen) in each phase were analyzed. Descriptive statistics of the baseline, intervention, and maintenance measures are shown in Table 2. The baseline phase consisted of 3 sessions for Yu, 6 for Han, 7 for Deng, 10 for Gung, 11 for Yen, and 12 for Yuen; the intervention phase consisted of 6 sessions for all participants; and the maintenance phase consisted of 12 sessions for Yu, 9 for Han, 8 for Deng, 5 for Gung, 4 for Yen, and 3 for Yuen (Fig. 4). After the participants had watched the CBVs and answered the questions during the maintenance phase, their skills in judging the emotions of others’ facial expressions were evaluated. The results showed that they had retained the skills acquired during the intervention phase.
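The phase lengths reported above can be checked with simple arithmetic. The sketch below uses only the reported session counts and shows the defining feature of a multiple-baseline design across participants: the intervention begins at a later session for each successive participant (sessions 4, 7, 8, 11, 12, and 13). It also shows that each participant completed 21 sessions in total, a figure derived here from the reported counts rather than stated in the text. The helper names are illustrative.

```python
# Sessions per phase as reported in the text: (baseline, intervention, maintenance).
schedule = {
    "Yu": (3, 6, 12),
    "Han": (6, 6, 9),
    "Deng": (7, 6, 8),
    "Gung": (10, 6, 5),
    "Yen": (11, 6, 4),
    "Yuen": (12, 6, 3),
}


def intervention_start(baseline_sessions):
    """1-indexed session number at which the intervention phase begins."""
    return baseline_sessions + 1


def is_staggered(plan):
    """True if each successive participant starts the intervention later."""
    starts = [intervention_start(b) for b, _, _ in plan.values()]
    return all(a < b for a, b in zip(starts, starts[1:]))


for name, (b, i, m) in schedule.items():
    print(f"{name}: intervention starts at session {intervention_start(b)}, "
          f"{b + i + m} sessions in total")
print("staggered:", is_staggered(schedule))  # staggered: True
```

Because the total is fixed, each later intervention start is balanced by a shorter maintenance phase, which is why the maintenance counts fall from 12 (Yu) to 3 (Yuen).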

Table 2 Summarized results of the participants
Fig. 4 Correct assessment rates of the participants’ responses to questions about the feelings of others during the three study phases

Intervention effect

The mean correct assessment rate for Yu was 35.71 % during the baseline phase; it rose to 72.92 % during the intervention phase and to 83.18 % during the maintenance phase. The mean correct assessment rate for Han was 50.00 % during the baseline phase. It increased to 96.43 % during the intervention phase and was 85.71 % during the maintenance phase. The mean correct assessment rate for Deng was 37.75 % during the baseline phase. It increased to 86.01 % during the intervention phase and was 77.68 % during the maintenance phase. The mean correct assessment rate for Gung was 50.71 % during the baseline phase. It increased to 91.97 % during the intervention phase and was 77.14 % during the maintenance phase. The mean correct assessment rate for Yen was 42.21 % during the baseline phase. It increased to 83.63 % during the intervention phase and was 76.79 % during the maintenance phase. The mean correct assessment rate for Yuen was 31.55 % during the baseline phase. It increased to 83.04 % during the intervention phase and was 75.00 % during the maintenance phase (Table 2).
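The size of each participant's change can be read directly from the reported means. The sketch below subtracts the baseline mean from the intervention and maintenance means (in percentage points); it uses only the values given in the text and is an illustration, not the authors' analysis. The helper name is hypothetical.

```python
# Mean correct assessment rates (%) as reported: (baseline, intervention, maintenance).
means = {
    "Yu": (35.71, 72.92, 83.18),
    "Han": (50.00, 96.43, 85.71),
    "Deng": (37.75, 86.01, 77.68),
    "Gung": (50.71, 91.97, 77.14),
    "Yen": (42.21, 83.63, 76.79),
    "Yuen": (31.55, 83.04, 75.00),
}


def phase_changes(baseline, intervention, maintenance):
    """Percentage-point change from baseline to intervention and to maintenance."""
    return round(intervention - baseline, 2), round(maintenance - baseline, 2)


for name, rates in means.items():
    gain_i, gain_m = phase_changes(*rates)
    print(f"{name}: +{gain_i:.2f} pp (intervention), +{gain_m:.2f} pp (maintenance)")
```

Every change is positive. Yuen shows the largest baseline-to-intervention gain (about 51.5 points), and Yu is the only participant whose maintenance mean exceeded the intervention mean.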

PND was calculated as the proportion of intervention data points that exceeded the highest baseline data point. For all six participants, every intervention data point exceeded the participant’s highest baseline point (PND = 100 %), indicating highly effective training. Additionally, the Kolmogorov–Smirnov test (Siegel and Castellan 1998) indicated that the correct assessment rates of all participants were significantly (p < .05) higher during the intervention and maintenance phases than during the baseline phase.
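PND as described above can be sketched in a few lines. The function below follows the standard definition (the share of intervention points beyond the most extreme baseline point, in the direction of expected improvement); the `increase_expected` flag, the function name, and the session scores are hypothetical illustrations, not the study's data.

```python
def pnd(baseline, intervention, increase_expected=True):
    """Percentage of non-overlapping data (PND).

    Share of intervention points that exceed the most extreme baseline point:
    the highest one when improvement means an increase, the lowest otherwise.
    """
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    if increase_expected:
        threshold = max(baseline)
        non_overlapping = sum(v > threshold for v in intervention)
    else:
        threshold = min(baseline)
        non_overlapping = sum(v < threshold for v in intervention)
    return 100.0 * non_overlapping / len(intervention)


# Hypothetical session scores (%), not the study's data.
print(pnd([35.7, 28.6, 42.9], [64.3, 71.4, 78.6, 71.4, 85.7, 64.3]))  # 100.0
print(pnd([35.7, 28.6, 50.0], [42.9, 57.1, 64.3, 71.4]))  # 75.0
```

Because PND depends on a single extreme baseline point, one unusually high baseline session can lower it sharply, which is one reason it is often reported alongside a test that uses all data points, such as the KS-test.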


We found that the intervention system was effective in helping the six participants better judge the six basic facial expressions of others. Furthermore, to obtain a description of the participants’ daily functioning that went beyond test scores, we collected data on their learning performance from multiple sources: parental interviews, therapist and teacher reports, and test scores. In the baseline phase, the therapist reported that the participants always paid attention to irrelevant parts of the video scenes, such as a cat lying on a chair. They frequently focused only on what interested them, for example, a picture hanging on a wall or decorative objects on a table. Moreover, when the therapist discussed the characters’ facial emotions with the participants, we found that they could not understand the plot events and had difficulty recognizing the characters’ emotional states. Although the participants could choose the correct words to describe the emotions, they could not identify the facial expressions that corresponded to those emotions. This may explain why all six participants started with low scores (range 31.55–50.71 %) during the baseline phase.

During the intervention phase, however, the half-static, half-dynamic video learning materials drew the participants’ attention to the dynamic elements in each video clip. Although they sometimes still fixated on irrelevant parts of the scene, the CBVs generally helped direct their attention to the nonverbal emotional meaning of facial expressions in specific social situations. They were drawn to the characters and began to ask the therapist a series of questions about why the characters’ facial expressions changed, about their gestures, and about the related social activities.

Compared with traditional DV strategies such as VM, the CBV materials support a more specific and dynamic strategy that allows the therapist to decide directly which parts of a scene need emphasis and which can be ignored. Adolescents with ASD can compare the fully dynamic and partially dynamic versions of a video, which may help them imagine the original situation and link their emotional awareness to the movement. Moreover, our strategy extends the unique benefits of DV’s television/video methodology, which include fun, reduced stress, teaching flexibility, multi-sensory teaching, an increased ability to gain and hold the student’s attention, and complete control over the observed stimuli (McCoy and Hermansen 2007). CBVs also offer more visual interaction with the participants and encourage them to discuss the video’s scenes by making clear which parts are important and which are not.

In addition, the therapist made several notable observations during testing. In the baseline phase, the therapist reported that the participants always paid attention to the beginning and end of each video but ignored the pivotal social signals in facial expressions. The participants could not easily determine the emotions that the six facial expressions represented and frequently confused them. Adolescents with ASD are often unable to judge what people are feeling from facial expressions alone; similarly, in this study, the participants repeatedly asked the therapist why some people were always sitting or standing in the corner of a room, sitting on a chair, or waving their hands without talking. They either did not see or did not attend to the facial clues. Moreover, they did not seem to grasp the key points in social situations, either their own or those of others, and were unable to assign them an appropriate emotional status. For example, although the participants could describe what happened in the scenarios, roughly recount the events they saw, tell the therapist how many people were in the video, and explain why the scene was light or dark, they were unable to focus on the characters’ facial expressions. This is consistent with the findings of a recent study (Durham University News 2013) suggesting that children with ASD might miss crucial nonverbal social cues, such as facial expressions. Missing these cues generally has a negative effect on their social interaction skills and their understanding of the emotional expressions on other people’s faces. The parents’ questionnaire feedback reports described the same pattern: when their children watched cartoons at home, they paid attention to flashing words or to patterns on the characters’ clothes and were rarely able to discern the key social signals.
They spent more time searching for patterns that interested them and ignored the important parts of the show, often because they were unable to understand the characters’ feelings.

In contrast, during the intervention phase, the intervention system trained the participants to attend to the nonverbal social cues needed to understand the characters’ feelings. The participants then began to ask the therapist a series of questions about why a character’s facial expression changed, why some characters propped their heads in their hands, and why a particular character bowed his head and looked sad. These questions showed that the participants were drawn to the dynamic parts of the CBVs, had observed the moving elements, and had focused on the relevant parts. During the intervention phase, the therapists were able to teach them interactively how to observe those clues and what they meant. Eventually, the participants were able to find the social clues in the photos and compare the details of different facial clues in each scenario. Overall, the intervention system helped these six adolescents with ASD improve their comprehension of social situations and their judgment of the emotions in others’ facial expressions. In later reports, the parents said that their children showed good improvement in recognizing facial expressions and judging emotions, especially when watching familiar scenarios, and that they could point out the key elements and discuss the characters’ facial expressions and feelings with their parents.


This study has some limitations. The sample was relatively small: because this is a fairly new intervention strategy for individuals with ASD, it was difficult to recruit participants, and the participants had limited time for testing because many had other activities and classes to attend. Future studies should therefore recruit larger samples and extend the experimental period to provide stronger evidence.

Conclusion and future work

We conclude that a limited amount of information with structured and specific close-up social cues helped the participants improve their judgments about the emotional meaning of the facial expressions of others. In general, although adolescents with ASD may encounter other barriers, the visual support and structured situational characteristics of the scenario videos were beneficial for their awareness and understanding of the feelings of others and helped them improve their social-emotional functioning. Using the intervention system enabled them to recognize and understand the emotions in others’ facial expressions, which they had previously ignored. This study thus used an innovative but simple technique to make teaching adolescents with ASD more entertaining and effective. It motivated the participants to learn and encouraged them to observe nonverbal facial expression signals, benefits that go beyond those of conventional DV materials. This study assessed the effectiveness of the CBV training material for emotion judgment using a multiple baseline design across participants.

Future studies that use objective measures, such as eye-tracking devices and the Facial Action Coding System, to assess correct identification of emotions from facial expressions and attention status are also needed. We also intend to observe how the participants’ viewing behaviors change over time and whether these changes further promote the social skills they use in daily reciprocal social interaction. Finally, future research is warranted to determine how visual media can be redesigned to improve emotion recognition in adolescents and other age groups with ASD.


  • Ayres KM, Langone J (2005) Intervention and instruction with video for students with autism: a review of the literature. Educ Train Dev Disabil 40(2):183–196
  • Back E, Ropar D, Mitchell P (2007) Do the eyes have it? Inferring mental states from animated faces in autism. Child Dev 78(2):397–411
  • Baron-Cohen S, Belmonte MK (2005) Autism: a window onto the development of the social and the analytic brain. Annu Rev Neurosci 28:109–126
  • Baron-Cohen S, Leslie AM, Frith U (1985) Does the autistic child have a “theory of mind”? Cognition 21(1):37–46
  • Baron-Cohen S, Wheelwright S, Jolliffe T (1997) Is there a “language of the eyes”? Evidence from normal adults, and adults with autism or Asperger Syndrome. Vis Cognit 4(3):311–331
  • Baron-Cohen S, Wheelwright S, Hill J, Raste Y, Plumb I (2001) The “Reading the mind in the eyes” Test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. J Child Psychol Psychiatry 42(2):241–251
  • Bauminger N (2004) The expression and understanding of jealousy in children with autism. Dev Psychopathol 16(1):157–177
  • Bellini S, Akullian J (2007) A meta-analysis of video modeling and video self-modeling interventions for children and adolescents with autism spectrum disorders. Except Child 73(3):264–287
  • Blum-Dimaya A, Reeve SA, Reeve KF, Hoch H (2010) Teaching children with autism to play a video game using activity schedules and game-embedded simultaneous video modeling. Educ Treat Child 33(3):351–370
  • Boelte S, Hallmayer J (eds) (2013) Autism spectrum conditions: FAQs on autism, Asperger syndrome, and atypical autism answered by international experts. Hogrefe Publishing, Ashland
  • Capps L, Yirmiya N, Sigman M (1992) Understanding of simple and complex emotions in nonretarded-children with autism. J Child Psychol Psychiatry 33(7):1169–1182
  • Castelli F (2005) Understanding emotions from standardized facial expressions in autism and normal development. Autism 9(4):428–449
  • Chang Y-J, Chen S-F, Huang J-D (2011) A Kinect-based system for physical rehabilitation: a pilot study for young adults with motor disabilities. Res Dev Disabil 32(6):2566–2570
  • Charlop-Christy MH, Daneshvar S (2003) Using video modeling to teach perspective taking to children with autism. J Posit Behav Interv 5(1):12–21
  • Chen C-H, Lee I-J, Lin L-Y (2015) Augmented reality-based self-facial modeling to promote the emotional expression and social skills of adolescents with autism spectrum disorders. Res Dev Disabil 36:396–403
  • Christ TJ (2007) Experimental control and threats to internal validity of concurrent and nonconcurrent multiple baseline designs. Psychol Sch 44(5):451–459
  • Corbett BA, Abdullah M (2005) Video modeling: why does it work for children with autism? J Early Intensive Behav Interv 2(1):2
  • Dunlap G, Kern L (1997) The relevance of behavior analysis to special education. Basic knowledge informing research and practice in special education, Foundations of special education, pp 279–290
  • Durham University News (2013) Children with autism could miss out on non-verbal cues to social interaction. Retrieved from: Accessed 26 Feb 2015
  • Dzubak CM (2008) Multitasking: the good, the bad, and the unknown. J Assoc Tutoring Prof 1(2):1–12
  • Ekman P (2005) Basic emotions. In: Handbook of cognition and emotion. Wiley, New York, pp 45–60
  • Elsabbagh M, Volein A, Holmboe K, Tucker L, Csibra G, Baron-Cohen S et al (2009) Visual orienting in the early broader autism phenotype: disengagement and facilitation. J Child Psychol Psychiatry 50(5):637–642
  • Facoetti A, Ruffino M, Peru A, Paganoni P, Chelazzi L (2008) Sluggish engagement and disengagement of non-spatial attention in dyslexic children. Cortex 44(9):1221–1233
  • Falck-Ytter T, Bolte S, Gredeback G (2013) Eye tracking in early autism research. J Neurodev Disord 5:28
  • Golan O, Ashwin E, Granader Y, McClintock S, Day K, Leggett V, Baron-Cohen S (2010) Enhancing emotion recognition in children with autism spectrum conditions: an intervention using animated vehicles with real emotional faces. J Autism Dev Disord 40(3):269–279
  • Goodman LA (1954) Kolmogorov–Smirnov tests for psychological research. Psychol Bull 51(2):160
  • Hamm J, Kohler CG, Gur RC, Verma R (2011) Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. J Neurosci Methods 200(2):237–256
  • Hillier A, Allinson L (2002) Understanding embarrassment among those with autism: breaking down the complex emotion of embarrassment among those with autism. J Autism Dev Disord 32(6):583–592
  • Horner RH, Carr EG, Halle J, McGee G, Odom S, Wolery M (2005) The use of single-subject research to identify evidence-based practice in special education. Except Child 71(2):165–179
  • Ibanez LV, Messinger DS, Newell L, Lambert B, Sheskin M (2008) Visual disengagement in the infant siblings of children with an autism spectrum disorder (ASD). Autism 12(5):473–485
  • Kennedy CH (2005) Single-case designs for educational research. Prentice Hall, Upper Saddle River
  • Kikuchi Y, Senju A, Akechi H, Tojo Y, Osanai H, Hasegawa T (2011) Atypical disengagement from faces and its modulation by the control of eye fixation in children with autism spectrum disorder. J Autism Dev Disord 41(5):629–645
  • Koldewyn K, Weigelt S, Kanwisher N, Jiang YH (2013) Multiple objects tracking in autism spectrum disorders. J Autism Dev Disord 43(6):1394–1405
  • Krasny L, Williams BJ, Provencal S, Ozonoff S (2003) Social skills interventions for the autism spectrum: essential ingredients and a model curriculum. Child Adolesc Psychiatr Clin N Am 12(1):107–122
  • Lacava PG, Golan O, Baron-Cohen S, Myles BS (2007) Using assistive technology to teach emotion recognition to students with Asperger syndrome—a pilot study. Remedial Spec Educ 28(3):174–181
  • Landry R, Bryson SE (2004) Impaired disengagement of attention in young children with autism. J Child Psychol Psychiatry 45(6):1115–1122
  • Lilliefors HW (1967) On the Kolmogorov–Smirnov test for normality with mean and variance unknown. J Am Stat Assoc 62(318):399–402
  • Lindgren S, Doobay A (2011) Evidence-based interventions for autism spectrum disorders. The University of Iowa, Iowa
  • Marc Green (2013) Law of limited attention—psychological observations. Retrieved from:
  • Martins MP, Harris SL (2006) Teaching children with autism to respond to joint attention initiations. Child Fam Behav Therapy 28(1):51–68
  • McCoy K, Hermansen E (2007) Video modeling for individuals with autism: a review of model types and effects. Educ Treat Child 30(4):183–213
  • McPartland JC, Webb SJ, Keehn B, Dawson G (2011) Patterns of visual attention to face and objects in autism spectrum disorder. J Autism Dev Disord 41(2):148–157
  • Microsoft, Microsoft Research Cliplets (2012) Version 1.1.1
  • Mundy P, Sigman M, Ungerer J, Sherman T (1986) Defining the social deficits of autism: the contribution of non-verbal communication measures. J Child Psychol Psychiatry 27(5):657–669
  • Nemeth CP (2004) Human factors methods for design: making systems human-centered. CRC Press, Boca Raton
  • Odom SL, Strain PS (2002) Evidence-based practice in early intervention/early childhood special education: single-subject design research. J Early Interv 25(2):151–160
  • Persicke A, Tarbox J, Ranick J, Clair MS (2012) Establishing metaphorical reasoning in children with autism. Res Autism Spectr Disord 6(2):913–920
  • Plavnick J, Ferreri S (2013) Single-case experimental designs in educational research: a methodology for causal analyses in teaching and learning. Educ Psychol Rev 25(4):549–569
  • Ryan C, Ni Charragain C (2010) Teaching emotion recognition skills to children with autism. J Autism Dev Disord 40(12):1505–1511
  • Scalf PE, Torralbo A, Tapia E, Beck DM (2013) Competition explains limited attention and perceptual resources: implications for perceptual load and dilution theories. Front Psychol 4(243):1–9
  • Senju A, Johnson MH (2009) Atypical eye contact in autism: models, mechanisms and development. Neurosci Biobehav Rev 33(8):1204–1214
  • Sherer M, Pierce KL, Paredes S, Kisacky KL, Ingersoll B, Schreibman L (2001) Enhancing conversation skills in children with autism via video technology—which is better, “self” or “other” as a model? Behav Modif 25(1):140–158
  • Shic F, Bradshaw J, Klin A, Scassellati B, Chawarska K (2011) Limited activity monitoring in toddlers with autism spectrum disorder. Brain Res 1380:246–254
  • Shih C-H, Chiang M-S, Wang S-H, Chen C-N (2014) Teaching two teenagers with autism spectrum disorders to request the continuation of video playback using a touchscreen computer with the function of automatic response to requests. Res Autism Spectr Disord 8(9):1055–1061
  • Shipley-Benamou R, Lutzker JR, Taubman M (2002) Teaching daily living skills to children with autism through instructional video modeling. J Posit Behav Interv 4(3):166–177
  • Siegel S, Castellan NJ (1998) Nonparametric statistics for the behavioral sciences, 2nd edn. McGraw-Hill Book, New York
  • Sigafoos J, O’Reilly M, de la Cruz B (2007) How to use video modeling and video prompting. PRO-ED, Austin
  • Smith A (2006) Cognitive empathy and emotional empathy in human behavior and evolution. Psychol Record 56(1):3–21
  • Swallow KM, Jiang YV (2013) Attentional load and attentional boost: a review of data and theory. Front Psychol 4:274
  • Valkenburg J (2011) Attention, reflection and distraction: The impact of technology on learning. Synergy Online J Assoc Tutoring Prof v4:4–15
  • Van der Geest JN, Kemner C, Camfferman G, Verbaten MN, van Engeland H (2001) Eye movements, visual attention, and autism: a saccadic reaction time study using the gap and overlap paradigm. Biol Psychiatry 50(8):614–619
  • Wolery M, Dunlap G (2001) Reporting on studies using single-subject experimental methods. J Early Interv 24(2):85–89
  • Ziegel ER (2000) Encyclopedia of Biostatistics. Technometrics 42(2):222–223


Authors’ contributions

Dr. I-JL conceptualized and designed the study, carried out the assessments and data analyses, drafted the initial manuscript, and approved the final manuscript as submitted. Dr. C-HC critically reviewed and revised the manuscript, and approved the final manuscript as submitted. Dr. L-YL coordinated and supervised data collection at the site, critically reviewed and revised the manuscript, and approved the final manuscript as submitted. All authors read and approved the final manuscript.


This work is supported by the Ministry of Science and Technology of Taiwan (MOST 104-2420-H-006-020-MY3). The authors thank the referees for their valuable comments and suggestions on this paper. We are also grateful to the parents and children who participated in our study, to the students who assisted in the various phases of the study, and to the autism associations in Taiwan.

Competing interests

The authors declare that they have no competing interests.

Ethical approval

All procedures performed in studies involving human participants were in accordance with institutional and comparable ethical standards.

Author information



Corresponding authors

Correspondence to I-Jui Lee or Chien-Hsu Chen.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article


Cite this article

Lee, IJ., Chen, CH. & Lin, LY. Applied Cliplets-based half-dynamic videos as intervention learning materials to attract the attention of adolescents with autism spectrum disorder to improve their perceptions and judgments of the facial expressions and emotions of others. SpringerPlus 5, 1211 (2016).




  • Half-static and half-dynamic video
  • Selective attention
  • Facial expressions
  • Nonverbal social cues