| 2025
A visuo-haptic extended reality–based training system for hands-on manual metal arc welding training
Kalpana Shankhwar, Tung-Jui Chuang, Yao-Yang Tsai, Shana Smith
Welding training is an important job training process in industry and usually demands substantial resources. In real practice, the strong magnetic force and intense heat during the welding process often frighten novice welders. To provide safe and effective welding training, this study developed a visuo-haptic extended reality (VHXR)–based hands-on welding training system for training novice welders to perform a real welding task. Novice welders could use the VHXR-based system to perform a hands-on manual arc welding task without exposure to high temperature and intense ultraviolet radiation. Real-time, realistic force and visual feedback are provided to help trainees maintain a constant arc length, travel speed, and electrode angle. Compared to traditional video training, users trained with the VHXR-based welding training system demonstrated significantly better performance in real welding tasks. Trainees were able to produce better-quality joints by performing smoother welding with fewer mistakes, inquiries, and hints.
ARoma: Augmented Reality Olfactory Menu Application
Aarav Balachandran, Kritika Gupta, Prajna Vohra, Anmol Srivastava
This study explores the integration of Augmented Reality (AR) and olfactory technology to enhance the dining experience in restaurants. We present ARoma, an innovative AR olfactory menu application for Indian cuisine that provides users with 3D visualisation of dishes, detailed ingredient and nutritional information, and historical context, together with an olfaction device that delivers the aroma of the dishes. Our research compares the traditional menu experience with the AR menu and ARoma, aiming to understand how these technologies affect customers’ perceptions of food quality, dining enjoyment, and immersion. Our user study involved 30 participants, divided into two groups. Group A compared traditional menu experiences with AR menus, while Group B experienced traditional menus followed by ARoma. Using this controlled study and a mixed-method approach, including quantitative surveys and qualitative interviews, we found that AR menus significantly enhance the dining experience by providing detailed and engaging information. Our findings suggest that AR and olfactory technology can significantly improve customer satisfaction and engagement in the food industry.
An interactive extended reality-based tutorial system for fundamental manual metal arc welding training
Kalpana Shankhwar, Shana Smith
Extended reality (XR) technology has proven to be an effective human–computer interaction tool for increasing the perception of presence. The purpose of this study is to develop an interactive XR-based welding tutorial system to enhance the learning and hands-on skills of novice welders. This study comprises two parts: (1) fundamental manual metal arc welding (MMAW) science and technology tutoring in a virtual reality (VR)-based environment, and (2) hands-on welding training in a mixed reality (MR)-based environment. Using the developed tutorial system, the complicated welding process and the effects of welding process parameters on weld bead geometry can be clearly observed and comprehended through a 3D interactive user interface. Visual aids and quantitative guidance are displayed in real time to guide novice welders through the correct welding procedure and help them maintain a proper welding position. A user study was conducted to evaluate the learnability, workload, and usability of the system. Results show that users obtained significantly better performance using the XR-based welding tutorial system, compared to those trained using the conventional classroom training method.
Long-Term Ad Memorability: Understanding & Generating Memorable Ads
Harini SI, Somesh Singh, Yaman K Singla, Aanisha Bhattacharyya, Veeky Baths, Changyou Chen, Rajiv Ratn Shah, Balaji Krishnamurthy
Despite the importance of long-term memory in marketing and brand building, until now there has been no large-scale study on the memorability of ads. All previous memorability studies have been conducted on short-term recall for specific content types such as action videos. Long-term memorability, however, is crucial for the advertising industry, and ads are almost always highly multimodal. Therefore, we release the first memorability dataset, LAMBDA, consisting of 1749 participants and 2205 ads covering 276 brands. Running statistical tests over different participant subpopulations and ad types, we find many interesting insights into what makes an ad memorable, e.g., fast-moving ads are more memorable than those with slower scenes, and people who use ad-blockers remember fewer ads than those who don't. Next, we present a model, Henry, to predict the memorability of content. Henry achieves state-of-the-art performance across all prominent memorability datasets in the literature and shows strong generalization, with better zero-shot results on unseen datasets. Finally, with the intent of memorable ad generation, we present a scalable method to build a high-quality memorable ad generation model by leveraging automatically annotated data. Our approach, SEED (Self rEwarding mEmorability Modeling), starts with a language model trained on LAMBDA as seed data and progressively trains an LLM to generate more memorable ads. We show that the generated advertisements have 44% higher memorability scores than the original ads. We also release a large-scale ad dataset, UltraLAMBDA, consisting of 5 million ads. Our code and the datasets, LAMBDA and UltraLAMBDA, are open-sourced at https://behavior-in-the-wild.github.io/memorability.
| 2024
Effecti-Net: A Multimodal Framework and Database for Educational Content Effectiveness Analysis
Jainendra Shukla, Deep Dwivedi, Ritik Garg, Shiva Baghel, Rushil Thareja, Ritvik Kulshrestha, Mukesh Mohania
Amid the evolving landscape of education, evaluating the impact of educational video content on students remains a challenge. Existing assessment methods often rely on heuristics and self-reporting, leaving room for subjectivity and limited insight. This study addresses this issue by leveraging physiological sensor data to predict student-perceived content effectiveness. Within the realm of educational content evaluation, prior studies have focused on conventional approaches, leaving a gap in understanding the nuanced responses of students to educational materials. To bridge this gap, our research introduces a novel perspective, building upon previous work in multimodal physiological data analysis. Our primary contributions encompass two key elements. First, we present the 'Effecti-Net' architecture, a deep learning model that integrates data from multiple sensor modalities, including Electroencephalogram (EEG), Eye Tracker, Galvanic Skin Response (GSR), and Photoplethysmography (PPG). Second, we introduce the 'DECEP' dataset, a repository comprising 597 minutes of multimodal sensor data. To assess the effectiveness of our approach, we benchmark it against conventional methods. Our model achieves the lowest MSE (0.1651) and MAE (0.3544) on the DECEP dataset. It offers educators and content creators a comprehensive framework that promotes the development of more engaging educational content.
InMDb: Indian Movie Database for Emotion Analysis
Jainendra Shukla, Ritik Garg, Rushil Thareja, Manak Bisht, Manavjeet Singh, Sarthak Arora
Cinematic experiences, characterized by intricate audio-visual stimuli, foster profound emotional engagement. However, the correlation between audience emotions, physiological responses, film genres, and ratings, particularly in the underexplored Bollywood context, remains largely uncharted. Understanding this intricate interplay can provide filmmakers with valuable insights for content adaptation. Addressing this research gap, we introduce "InMDB: Indian Movie DataBase," a comprehensive multimodal dataset that examines emotional responses elicited by Bollywood trailers, using both self-reported measures and physiological data. Our statistical analysis of the dataset deepens the understanding of how emotions and their subsequent physiological responses correlate with, and potentially influence, film ratings and categories, offering novel insights into emotional engagement in the cinematic context.
Inclusive Medicine Packaging for the Geriatric Population: Bridging Accessibility Gaps
Mrishika Kannan Nair, Richa Gupta
The geriatric population is the largest and most consistent consumer of medications [6]. Age-related changes affecting visual and tactile acuity pose barriers to effective medication management, and a primary reason is the neglect of inclusive and accessible design practices in medicine strips. This research uncovers the exclusionary design of medication packaging and emphasises the imperative of shifting towards more inclusive design. A mixed-method study was employed to understand the major physical and cognitive challenges faced by the elderly in medication management. Among the different design interventions explored, augmented reality QR tags emerged as a versatile solution, offering easy, magnified, text-to-speech content on mobile devices. To validate the proposed prototype and approach, an experiment was conducted. Our design reduced task completion time, minimised the chances of medication errors, and reduced reliance on assistance. Qualitative interviews after the experiment revealed enhanced user satisfaction and ease of use. This research illuminates the possibilities for enhancing healthcare accessibility and medication management through the thoughtful integration of technology into medicine strip design. By offering a more inclusive and user-friendly approach, the study bridges the accessibility gap, empowering individuals of all ages and abilities to manage their medications safely and effectively.
KaavadBits: Exploring Tangible Interactive Storytelling of Branching Narratives through a Kaavad-inspired Installation
Anmol Srivastava, Saumik Shashwat, Aditya Padmagirwar, Shivoy Arora
This work explores branching narratives through KaavadBits, a tabletop art installation embodying the kaavadiya-jajmaan, or narrator-patron, perspective of kaavad baanchana, the Indian storytelling tradition of reciting the kaavad. For diegetic worldbuilding of tales from the Panchatantra, a compilation of ancient Indian animal fables, the narrator takes the physical form of a tree, with which the audience interacts using tokens for a seamless multi-modal storytelling experience. Building on related explorations, we propose a novel design that immerses the audience through choice-, character-, and question-based interactions. We discuss insights from a pilot user study and directions for future work. Through this paper, we aim to spark consequential tangible, technological, and narrative explorations into various lesser-known traditional forms of storytelling that may inspire new interaction techniques and ultimately help preserve these intangible heritages.
Opacity, Transparency, and the Ethics of Affective Computing
Jainendra Shukla, Manohar Kumar, Aisha Aijaz, Omkar Chattar, Raghava Mutharaju
Human opacity is the intrinsic unknowability of human beings with respect to machines. The descriptive relationship between humans and machines, which captures how much information one can gather about the other, can be explicated using an opacity-transparency relationship. This relationship allows us to describe and normatively evaluate a spectrum of opacity along which humans and machines may be either opaque or transparent. In this paper, we argue that the advent of Affective Computing (AC) has begun to shift the ideal position of humans on this spectrum towards greater transparency, while much of this technology is shifting towards opacity. We explore the implications of this shift with regard to the affective information of humans, and how the threat to human opacity posed by AC systems has various adverse repercussions, such as infringement of one's autonomy, deception, manipulation, and increased anxiety. There are also distributive consequences that expose vulnerable groups to unjustified burdens and reduce them to mere profiles. We further assess current AC technology, following the descriptive relationship between humans and machines through the lens of opacity and transparency. Finally, we foresee and address three possible objections to our claims: the beneficence of AC systems, their relation to privacy, and their limited capacity to capture human affects. Through these arguments, the paper aims to draw attention to the ontological relationship between humans and machines from the perspective of opacity and transparency while emphasizing the gravity of the ethical concerns raised by their threat to human opacity.
RanLayNet: A Dataset for Document Layout Detection used for Domain Adaptation and Generalization
Rajiv R Shah, Avinash Anand, Raj Jaiswal, Mohit Gupta, Siddhesh S Bangar, Pijush Bhuyan, Naman Lal, Ritika Jha, Rajeev Singh, Shin'ichi Satoh
Large ground-truth datasets and recent advances in deep learning techniques have been useful for layout detection. However, because of the restricted layout diversity of these datasets, training on them requires a sizable number of annotated instances, which is both expensive and time-consuming. As a result, differences between the source and target domains may significantly impact how well these models perform. To solve this problem, domain adaptation approaches have been developed that use a small quantity of labeled data to adjust the model to the target domain. In this research, we introduce a synthetic document dataset called RanLayNet, enriched with automatically assigned labels denoting the spatial positions, ranges, and types of layout elements. The primary aim of this endeavor is to develop a versatile dataset capable of training models with robustness and adaptability to diverse document formats. Through empirical experimentation, we demonstrate that a deep layout identification model trained on our dataset exhibits enhanced performance compared to a model trained solely on actual documents. Moreover, we conduct a comparative analysis by fine-tuning inference models trained on the PubLayNet and IIIT-AR-13K datasets on the DocLayNet dataset. Our findings show that models enriched with our dataset perform well on such tasks, achieving mAP95 scores of 0.398 and 0.588 for the TABLE class in the scientific document domain.
| 2023
A Rapid Scoping Review and Conceptual Analysis of the Educational Metaverse in the Global South: Socio-Technical Perspectives
Anmol Srivastava
This paper presents a conceptual insight into the design of the Metaverse to facilitate educational transformation in selected developing nations within the Global South, e.g., India. These regions are often afflicted with socio-economic challenges but rich in cultural diversity. Using a socio-technical design approach, this study explores the specific needs and opportunities presented by these diverse settings. A rapid scoping review of the scant existing literature is conducted to provide foundational insights. A novel design methodology was formulated that utilized ChatGPT for ideation, brainstorming, and literature-survey query generation. This paper aims not only to shed light on the educational possibilities enabled by the Metaverse but also to highlight design considerations unique to the Global South.
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Rajiv R Shah, Aanisha Bhattacharya, Yaman K Singla, Balaji Krishnamurthy, Changyou Chen
Multimedia content, such as advertisements and story videos, exhibits a rich blend of creativity and multiple modalities. It incorporates elements like text, visuals, audio, and storytelling techniques, employing devices like emotions, symbolism, and slogans to convey meaning. The dearth of large annotated training datasets in the multimedia domain hinders the development of supervised learning models with satisfactory performance for real-world applications. On the other hand, large language models (LLMs) have shown remarkable zero-shot performance in various natural language processing (NLP) tasks, such as emotion classification, question-answering, and topic classification. To leverage such advanced techniques to bridge this performance gap in multimedia understanding, we propose verbalizing long videos to generate their descriptions in natural language, then performing video-understanding tasks on the generated story rather than on the original video. Through extensive experiments on fifteen video-understanding tasks, we demonstrate that our method, despite being zero-shot, achieves significantly better results than supervised baselines for video understanding. Furthermore, to alleviate the lack of story-understanding benchmarks, we publicly release the first dataset on persuasion strategy identification, a crucial task in computational social science.
An Analysis of Physiological and Psychological Responses in Virtual Reality and Flat Screen Gaming
Jainendra Shukla, Ritik Vatsal, Shrivatsa Mishra, Rushil Thareja, Mrinmoy Chakrabarty, Ojaswa Sharma
Recent research has focused on the effectiveness of Virtual Reality (VR) in games as a more immersive method of interaction. However, there is a lack of robust analysis of the physiological effects between VR and flatscreen (FS) gaming. This paper introduces the first systematic comparison and analysis of emotional and physiological responses to commercially available games in VR and FS environments. To elicit these responses, we first selected four games through a pilot study of 6 participants to cover all four quadrants of the valence-arousal space. Using these games, we recorded the physiological activity, including Blood Volume Pulse and Electrodermal Activity, and self-reported emotions of 33 participants in a user study. Our data analysis revealed that VR gaming elicited more pronounced emotions, higher arousal, increased cognitive load and stress, and lower dominance than FS gaming. The Virtual Reality and Flat Screen (VRFS) dataset, containing over 15 hours of multimodal data comparing FS and VR gaming across different games, is also made publicly available for research purposes. Our analysis provides valuable insights for further investigations into the physiological and emotional effects of VR and FS gaming.
An EEG-Based Computational Model for Decoding Emotional Intelligence, Personality, and Emotions
Jainendra Shukla, K. Kannadasan, Sridevi Veerasingam, B. Shameedha Begum, N. Ramasubramanian
Emotional intelligence (EI), a critical aspect of regulating emotions and behavior in daily life, holds paramount significance in both psychology research and real-world applications. Understanding and assessing EI are essential for informed decision-making, nurturing relationships, and facilitating efficient communication. As human–computer interaction (HCI) continues to evolve, there is a growing need to develop systems capable of comprehending human emotions, personality traits, and moods through recognition models. This research endeavors to explore the potential of recognizing EI in the context of effective HCI. To address this challenge, we have developed a novel computational model based on electroencephalogram (EEG) data. Our work encompasses a carefully curated EEG dataset, featuring recordings from 40 participants who were exposed to a set of 16 emotional video clips selected from distinct quadrants of the valence-arousal (VA) space. Participants’ emotional responses were meticulously annotated through self-assessment of emotional dimensions for each video stimulus. In addition, participants’ feedback on the big-five personality traits and their responses to the trait emotional intelligence questionnaire (TEIQue) served as our ground truth for further analysis. Our study includes a comprehensive correlation analysis, using Pearson correlations to establish the relationships between personality traits and EI. Furthermore, we conducted EEG-based analysis to uncover connections between EEG signals and emotional attributes. Remarkably, our analysis reveals that EEG signals excel at capturing differences in EI levels. Leveraging machine learning algorithms, we have constructed binary classification models that yield average F1 scores of 0.72, 0.71, and 0.62 for emotions, personality traits, and EI, respectively. These experimental outcomes underscore the potential of EEG signals in the recognition of EI, personality traits, and emotions. We envision our proposed model as a foundational element in the development of effective HCI systems, enabling a deeper and better understanding of human behavior.
AttentioNet: Monitoring Student Attention Type in Learning with EEG-Based Measurement System
Jainendra Shukla, Dhruv Verma, Sejal Bhalla, S. V. Sai Santosh, Saumya Yadav, Aman Parnami
Student attention is an indispensable input for uncovering their goals, intentions, and interests, which prove to be invaluable for a multitude of research areas, ranging from psychology to interactive systems. However, most existing methods to classify attention fail to model its complex nature. To bridge this gap, we propose AttentioNet, a novel Convolutional Neural Network-based approach that utilizes Electroencephalography (EEG) data to classify attention into five states: Selective, Sustained, Divided, Alternating, and Relaxed. We collected a dataset of 20 subjects through standard neuropsychological tasks designed to elicit different attentional states. The average across-student accuracy of our proposed model is 92.3% (SD=3.04), which is well-suited for end-user applications. Our transfer learning-based approach for personalizing the model to individual subjects effectively addresses the issue of individual variability in EEG signals, resulting in improved performance and adaptability of the model for real-world applications. This represents a significant advancement in the field of EEG-based classification. Experimental results demonstrate that AttentioNet outperforms a popular EEGnet baseline (p-value < 0.05) in both subject-independent and subject-dependent settings, confirming the effectiveness of our proposed approach despite the limitations of our dataset. These results highlight the promising potential of AttentioNet for attention classification using EEG data.
Emotionally Enhanced Talking Face Generation
Rajiv R Shah, Sahil Goyal, Sarthak Bhagat, Shagun Uppal, Hitkul Jangra, Yi Yu, Yifang Yin
Several works have developed end-to-end pipelines for generating lip-synced talking faces with real-world applications, such as teaching and language translation in videos. However, these prior works fail to create realistic-looking videos since they focus little on people's expressions and emotions. Moreover, these methods' effectiveness largely depends on the faces in the training dataset, which means they may not perform well on unseen faces. To mitigate this, we build a talking face generation framework conditioned on a categorical emotion to generate videos with appropriate expressions, making them more realistic and convincing. With a broad range of six emotions, i.e., happiness, sadness, fear, anger, disgust, and neutral, we show that our model can adapt to arbitrary identities, emotions, and languages. Our proposed framework has a user-friendly web interface with a real-time experience for talking face generation with emotions. We also conduct a user study for subjective evaluation of our interface's usability, design, and functionality.
EngageMe: Assessing Student Engagement in Online Learning Environment Using Neuropsychological Tests
Jainendra Shukla, Saumya Yadav, Momin Naushad Siddiqui
In this research, we investigated whether the standardized neuropsychological tests commonly used to assess attention can measure students’ engagement in online learning settings. Accordingly, we engaged 73 students in three clinically relevant neuropsychological tests assessing three types of attention. Students’ engagement, as evidenced by their facial video, was also annotated by three independent annotators. The manual annotations showed a high level of inter-annotator reliability (Krippendorff’s Alpha of 0.864). Further, by obtaining a correlation of 0.673 (Spearman’s Rank Correlation) between the manual annotations and the neuropsychological test scores, our results show construct validity, establishing neuropsychological test scores as a latent variable for measuring students’ engagement. Finally, using non-intrusive behavioral cues, including facial action units and eye-gaze data collected via webcam, we propose a machine learning method for engagement analysis in online learning settings, achieving a low mean squared error (0.022). The findings suggest a neuropsychological-test-based machine learning technique could effectively assess students’ engagement in online education.
Finite element analysis results visualization of manual metal arc welding using an interactive mixed reality-based user interface
Kalpana Shankhwar, Shana Smith
Welding is extensively used in manufacturing industries for various applications. However, residual stress is induced by the non-uniform temperature distribution on the weld plates during the welding process, which significantly affects fatigue strength. In addition, the non-uniform expansion and contraction of the weld and surrounding base metal cause structural distortion. The distortion affects final product quality and results in lower productivity. Therefore, structural analysis of the welded component is critically important. In this work, a mixed reality (MR)-based user interface was developed to overlay finite element analysis (FEA) results on the real weld plates in real time for manual metal arc welding (MMAW). Since numerical simulation using FEA requires substantial computational resources, a gradient boosted regression tree (GBRT) model was trained to predict the residual stress and deformation results. Furthermore, a lookup table and a trilinear interpolation method were used to render the results based on users' input data using Microsoft HoloLens 2 in real time. The developed interactive MR-based user interface can help welders quickly predict and control residual stress and welding distortion before the real welding process and help novices learn the relationship between the welding parameters and the induced residual stress and deformation.
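As a rough illustration of the lookup-table interpolation step described in this abstract (a minimal sketch, not code from the paper: the grid axes, table contents, and three-parameter indexing are hypothetical placeholders), trilinear interpolation over a precomputed 3D table of results might look like:

```python
from bisect import bisect_right

def trilinear(table, axes, query):
    """Interpolate a 3D lookup table at a query point.

    table : nested list, table[i][j][k] holds a precomputed result
    axes  : (xs, ys, zs) strictly increasing grid coordinates
    query : (x, y, z) point inside the grid
    """
    idx, frac = [], []
    for coords, q in zip(axes, query):
        # locate the cell containing q and the fractional offset within it
        i = min(max(bisect_right(coords, q) - 1, 0), len(coords) - 2)
        idx.append(i)
        frac.append((q - coords[i]) / (coords[i + 1] - coords[i]))
    (i, j, k), (fx, fy, fz) = idx, frac

    def lerp(a, b, t):
        return a * (1 - t) + b * t

    # interpolate along z, then y, then x
    c00 = lerp(table[i][j][k],         table[i][j][k + 1],         fz)
    c01 = lerp(table[i][j + 1][k],     table[i][j + 1][k + 1],     fz)
    c10 = lerp(table[i + 1][j][k],     table[i + 1][j][k + 1],     fz)
    c11 = lerp(table[i + 1][j + 1][k], table[i + 1][j + 1][k + 1], fz)
    return lerp(lerp(c00, c01, fy), lerp(c10, c11, fy), fx)
```

In a setting like the one the abstract describes, the three axes would correspond to welding parameters (e.g. current, voltage, travel speed) and the table to precomputed FEA outputs, so each in-headset query reduces to eight table reads and seven linear blends rather than a full simulation.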
Hindi Chatbot for Supporting Maternal and Child Health Related Queries in Rural India
Rajiv R Shah, Ritwik Mishra, Simranjeet Singh, Jasmeet Kaur, Pushpendra Singh
In developing countries like India, doctors and healthcare professionals working in public health spend significant time answering health queries that are fact-based and repetitive. Therefore, we propose an automated way to answer maternal and child health-related queries. A database of Frequently Asked Questions (FAQs) and their corresponding expert-generated answers was curated from rural health workers and young mothers. We develop a Hindi chatbot that identifies the k most relevant Question and Answer (QnA) pairs from the database in response to a healthcare query (q) written in Devanagari script or Hindi-English (Hinglish) code-mixed script. The curated database covers 80% of all the queries that a user of our study is likely to ask. We experimented with (i) rule-based methods, (ii) sentence embeddings, and (iii) a paraphrasing classifier to calculate the q-Q similarity. We observed that the paraphrasing classifier gives the best results when trained first on open-domain text and then on the healthcare domain. Our chatbot uses an ensemble of all three approaches. We observed that if a given q can be answered using the database, then our chatbot can provide at least one relevant QnA pair among its top three suggestions for up to 70% of the queries.
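The sentence-embedding leg of the q-Q retrieval described in this abstract can be sketched as cosine-similarity ranking over precomputed question vectors (an illustrative sketch only, not the paper's actual ensemble; the vectors, pair format, and top-3 cutoff are assumptions here):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_faq(query_vec, faq_vecs, faq_pairs, k=3):
    """Rank FAQ QnA pairs by q-Q similarity and return the top k.

    query_vec : embedding of the incoming query q
    faq_vecs  : precomputed embeddings of the database questions Q
    faq_pairs : the (question, answer) pairs aligned with faq_vecs
    """
    scored = sorted(
        zip(faq_pairs, (cosine(query_vec, v) for v in faq_vecs)),
        key=lambda pair_score: pair_score[1],
        reverse=True,
    )
    return scored[:k]
```

In practice the query and database questions would be embedded with a multilingual sentence encoder (so Devanagari and Hinglish forms land near each other), and this score would be combined with the rule-based and paraphrase-classifier scores in the ensemble the abstract mentions.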
Meta Perturbed Re-Id Defense
Rajiv R Shah, A V Subramanyam, Mohammad Ali Jauhar, Divij Gera, Astha Verma
Adversarial attacks have gained significant attention in object re-identification (Re-Id). However, very few works target defense, and they primarily adopt adversarial training. While adversarial training has been shown to be a major line of defense, we observe that vanilla adversarial training alone does not provide a robust defense against adversarial attacks. Towards this, we propose a novel meta perturbed defense algorithm for Re-Id task. Our contributions are, (i) we introduce anisotropic and isotropic perturbations to design a stochastic neural network, and train it with a novel meta-learning strategy with tasks as vanilla, perturbed, and perturbed-adversarial training; (ii) we show the generalizability of our model against various unseen attacks; and (iii) we derive a novel feature covariance alignment (FCA) loss which gives us a high clean performance while providing robustness against different attacks. Extensive experiments on Market-1501, MSMT17 and VeRi-776 reveal SOTA performance.
Multi-source transfer learning for facial emotion recognition using multivariate correlation analysis
Jainendra Shukla, Ashwini B, Arka Sarkar, Pruthivi Raj Behera
Deep learning techniques have proven effective for the facial emotion recognition (FER) problem. However, they demand a significant amount of supervised data, which is often unavailable due to privacy and ethical concerns. In this paper, we present a novel approach to the FER problem using multi-source transfer learning. The proposed method leverages knowledge from multiple data sources in similar domains to inform the model on a related task. The approach optimizes the aggregate multivariate correlation among the source tasks trained on the source datasets, thus controlling the transfer of information to the target task. The hypothesis is validated on benchmark datasets for facial emotion recognition and image classification, and the results demonstrate the effectiveness of the proposed method in capturing the group correlation among features, as well as its robustness to negative transfer and strong performance in few-shot multi-source adaptation. Compared with the state-of-the-art methods MCW and DECISION, our approach shows improvements of 7% and 15%, respectively.
SPASHT: Semantic and Pragmatic Speech Features for Automatic Assessment of Autism
Jainendra Shukla, Vrinda Narayan, Ashwini B
Language and communication impairments are considered one of the core features of autism spectrum disorder (ASD). Quantifying the language atypicalities in autism is a challenging task. Prior works have explored acoustic and text-based features to assess children’s language and communicative behaviours and have shown their relevance to the diagnosis of autism. In this work, we explore semantic and pragmatic language features in children with autism (CwA) to understand their significance in the diagnosis of autism. We use natural language processing (NLP) and machine learning (ML) techniques to automatically extract relevant features and detect speech behaviours such as echolalia, semantic coherence, and repetitive language. We further analyse their correlation with the clinical diagnosis of autism. We conducted validation experiments on the transcripts of 76 children (35 ASD and 41 TD) extracted from the CHILDES databank. Our analysis shows that semantic and pragmatic language features are representative candidates for autism diagnosis and complement syntactic and lexical features in the classification of CwA, with an accuracy of 94%. Further, because these features are more coherent and relatable to standard diagnostic tools, they improve the interpretability of diagnostic predictions made using speech signals.
Safar: Heuristics for Augmented Reality Integration in Cultural Heritage
Anmol Srivastava, Cyrus Monteiro, Ipsita Rajasekar, Prakhar Bhargava
This research explores the integration of Augmented Reality (AR) technology to enhance historical site exploration, with a particular focus on the Indian context. The primary objective is to establish comprehensive design guidelines that seamlessly blend AR and cultural heritage, ultimately enriching the heritage tourism experience. Through a case study approach centred around a prominent historical site, the research utilises co-creation sessions, alongside thorough primary and secondary research practices, to analyse user interactions with AR and Virtual Reality (VR) tools. While relying primarily on qualitative insights, this study uncovers the potential of AR in heightening the heritage encounter and bridging the gap between traditional narratives and contemporary technology. By distilling findings from these methodologies, the paper contributes practical and informed design guidelines for effectively integrating AR within cultural sites. The outcomes of this research provide valuable insights for scholars, practitioners, and enthusiasts seeking to navigate the evolving landscape of technology-driven heritage tourism.
| 2022
A review of heat source and resulting temperature distribution in arc welding
Kalpana Shankhwar, Ankit Das, Arvind Kumar, Nenad Gubeljak
Thermal analysis is one of the cardinal studies essential for arc welding processes. Thermal field and temperature distribution in arc welds affect the quality of welds, as they govern the microstructural and thermo-mechanical properties. Therefore, a thorough understanding of the thermal behaviour in arc welds is an absolute necessity. Significant efforts have been made in the past to determine the temperature field associated with arc welding. However, for accurate determination of the temperature field/distribution, it is necessary to understand the heat source which influences the temperature distribution in welds. Rosenthal reported the first concept of modelling the heat source, which was subsequently refined, and new models have been introduced over the years. This review article summarizes a collective study made on the heat source and the resulting temperature distribution in arc welds. Numerous methods have been developed to conduct transient temperature distribution studies on arc welds. Analytical approaches with constant material properties, numerical approaches with variable material properties, infrared imaging systems, machine vision systems with soft computing, etc. have been developed to facilitate understanding of transient temperature in arc welds. We first summarize heat source studies, followed by the literature on various techniques and methods devoted to transient temperature investigations. Finally, the latest methods used for thermal studies, such as image processing, machine learning, and intelligent systems, are summarized and discussed.
"Perceiving Sequences and Layouts through Touch" has been accepted at IEEE Eurohaptics Conference 2022
Richa Gupta
Effective design of tactile graphics demands an in-depth investigation of the perceptual foundations of exploration through touch. This work investigates primitives in tactile perception of spatial arrangements (i.e. sequences and layouts). In two experiments, tiles with different tactile shapes were arranged in a row on a tabletop or within a 5x5 grid board. The goal of the experiments was to determine whether certain positions offered perceptual salience. The results indicate that positional primitives exist (e.g. corners, field edges, and first and last positions in sequences) and that these reinforce memory of spatial relationships. These inferences can inform effective tactile graphic design as well as the design of inclusive and multi-modal interfaces/experiences.