Title: Akash Chaudhary, Manshul Belani, Naman Maheshwari, and Aman Parnami. 2021. Verbose: Designing a Context-based Educational System for Improving Communicative Expressions. In Proceedings of the 23rd International Conference on Mobile Human-Computer Interaction (MobileHCI '21). Association for Computing Machinery, New York, NY, USA, Article 41, 1–13.
DOI: https://doi.org/10.1145/3447526.3472057
Abstract: ESL (English as a second language) speakers tend to follow the tone structure of their first language, making their speech difficult to understand for native speakers, thereby limiting their opportunities for education and employment. To address this problem, we build an interactive smartphone-based educational application using the user-centered design process. This application teaches English intonations based on globally consistent pitch patterns through conversations with a trained chat assistant, which inculcates expert linguists’ teaching principles. After co-designing the application’s parameters with primary stakeholders and expert visual designers, we assess its effectiveness by measuring users’ performance before and after system usage, using quantitative measures such as intonation scores, SEQ, and SUS. Feedback from users suggests that ESL speakers find significant improvement in the perception of their vocal expressions, thereby highlighting the necessity of such a system in improving the quality of conversations that people have in general.
Title: Dhruv Verma, Sejal Bhalla, Dhruv Sahnan, Jainendra Shukla, and Aman Parnami. 2021. ExpressEar: Sensing Fine-Grained Facial Expressions with Earables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 5, 3, Article 129 (Sept 2021), 28 pages.
DOI: https://doi.org/10.1145/3478085
Abstract: Continuous and unobtrusive monitoring of facial expressions holds tremendous potential to enable compelling applications in a multitude of domains ranging from healthcare and education to interactive systems. Traditional vision-based facial expression recognition (FER) methods, however, are vulnerable to external factors like occlusion and lighting, while also raising privacy concerns coupled with the impractical requirement of positioning the camera in front of the user at all times. To bridge this gap, we propose ExpressEar, a novel FER system that repurposes commercial earables augmented with inertial sensors to capture fine-grained facial muscle movements. Following the Facial Action Coding System (FACS), which encodes every possible expression in terms of constituent facial movements called Action Units (AUs), ExpressEar identifies facial expressions at the atomic level. We conducted a user study (N=12) to evaluate the performance of our approach and found that ExpressEar can detect and distinguish between 32 Facial AUs (including 2 variants of asymmetric AUs), with an average accuracy of 89.9% for any given user. We further quantify the performance across different mobile scenarios in the presence of additional face-related activities. Our results demonstrate ExpressEar's applicability in the real world and open up research opportunities to advance its practical adoption.
Title: Arpit Bhatia, Dhruv Kundu, Suyash Agarwal, Varnika Kairon, and Aman Parnami. 2021. Soma-noti: Delivering Notifications Through Under-clothing Wearables. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 221, 1–8.
DOI: https://doi.org/10.1145/3411764.3445123
Abstract: Different form factors of wearable technology provide unique opportunities for output based on how they are connected to the human body. In this work, we investigate the idea of delivering notifications through devices worn on the underside of a user’s clothing. A wearable worn in such a manner is in direct contact with the user’s skin. We leverage this proximity to test the performance of 10 on-skin sensations (Press, Poke, Pinch, Heat, Cool, Blow, Suck, Vibrate, Moisture and Brush) as methods of notification delivery. We developed prototypes for each stimulus and conducted a user study to evaluate them across 6 locations commonly covered by upper body clothing. Results indicate significant differences in reaction time, error rates and comfort which may influence the design of future under-clothing wearables.
Title: Singh A., Eden G. (2021) Hanging Out Online: Social Life During the Pandemic. In: Ardito C. et al. (eds) Human-Computer Interaction – INTERACT 2021. INTERACT 2021. Lecture Notes in Computer Science, vol 12933. Springer, Cham
DOI: https://doi.org/10.1007/978-3-030-85616-8_2
Abstract: In March 2020, the government of India ordered a nationwide lockdown to prevent the spread of Covid-19. This led to the shutdown of educational institutes throughout the country, restricting all activities to online media. The shift has affected how students engage with each other, where rather than in-person interaction, they meet through a variety of online tools. In this paper, we discuss how the normal everyday routine of ‘hanging out’ with friends has been transformed during a prolonged lockdown of over ten months and counting. We investigate the opportunities and challenges students encounter when socializing online through various modes, including video calls, communal movie watching, and social media. We discuss how social interaction, in particular hanging out with friends, has been transformed through these technologies and its implications for facilitating spontaneous interaction, negotiating intimacy, mutual understanding, and accessibility to different social groups. Finally, we conclude with a discussion of how these factors impact the transition from in-person to online modes of casual social interaction.
Title: Sumita Sharma, Netta Iivari, Marianne Kinnula, Grace Eden, Alipta Ballav, Rocio Fatas, Ritwik Kar, Deepak Ranjan Padhi, Vahid Sadeghie, Pratiti Sarkar, Riya Sinha, Rucha Tulaskar, and Nikita Valluri. 2021. From Mild to Wild: Reimagining Friendships and Romance in the Time of Pandemic Using Design Fiction. In Designing Interactive Systems Conference 2021 (DIS '21). Association for Computing Machinery, New York, NY, USA, 64–77.
DOI: https://doi.org/10.1145/3461778.3462110
Abstract: With the forced reboot of our lives due to the COVID-19 pandemic, our interpersonal relationships are nowhere yet everywhere. However, opportunities for initiating or maintaining friendships and romance in the physical world have dwindled. Within the context of India where multiple realities exist, the question arises – what is the future of these relationships? In this paper, we present the outcomes of a workshop looking at the future of relationships using design fiction. Participants worked in small teams to create scenarios that critically consider the future of love, friendships, and romance within the Indian context. Through the lenses of criticality, empowerment, and value creation, we examine the design scenarios and the design process including criticality of the designs, empowering experiences of the participants, and the perceived value gained from participating in such a workshop. Our findings indicate the potential of design fiction to allow participants to step out of their comfort zone into a critical stance in discussing love and intimacy. Based upon our findings, we discuss implications for design research, practice, and education.
Title: Marianne Kinnula, Netta Iivari, Sumita Sharma, Grace Eden, Markku Turunen, Krishnashree Achuthan, Prema Nedungadi, Tero Avellan, Biju Thankachan, and Rucha Tulaskar. 2021. Researchers’ Toolbox for the Future: Understanding and Designing Accessible and Inclusive Artificial Intelligence (AIAI). In Academic Mindtrek 2021 (Mindtrek 2021). Association for Computing Machinery, New York, NY, USA, 1–4.
DOI: https://doi.org/10.1145/3464327.3464965
Abstract: As Artificial Intelligence (AI) is integrated into all things technical, there is a valid concern over its lack of diversity, inclusiveness, and accessibility. Further, questions such as what it means for AI to be accessible and inclusive, why is inclusive AI required, and how can it be achieved, is an emerging area of research. In this two-part workshop, we will explore the nuanced challenges towards Accessible and Inclusive AI together with participants with diverse backgrounds. First, we will collaboratively define Accessible and Inclusive AI (AIAI), building on the diverse experiences of the participants and moderators. The goal is to contribute to the formulation of a shared vision for Accessibility and AI as well as identify the challenges and opportunities towards realizing this vision. Working in small teams, participants will collaboratively conceptually design a future scenario for AIAI or critically analyse an example solution. The aim is for teams to tackle tough questions related to what it means for AI to be accessible and inclusive, while addressing algorithmic biases and limitations of AI, in addition to opportunities for overcoming them in the future. Finally, teams will present their conceptual designs and scenarios in the larger group. Overall, the workshop will ignite innovative, and even provocative, ideas and future scenarios, building towards an inclusive and accessible AI.
Title: Arpit Bhatia, Aneesha Lakra, Rakshita Anand, and Grace Eden. 2021. An Analysis of Ludo Board Game Play on Smartphones. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI EA '21). Association for Computing Machinery, New York, NY, USA, Article 396, 1–6.
DOI: https://doi.org/10.1145/3411763.3451728
Abstract: From sports to party games, almost every kind of game has been adapted into a digital video game format. While previous research has studied player motivations and experiences for certain categories of digital games, there has yet to be such a study on digital board games, especially in the modern context of smartphone apps. To address this, we conduct a case study of a popular board game, Ludo, to understand players’ opinions of its digital adaptation. For this, we study the functionality and user reviews of nine popular Ludo apps to assess player opinions of how traditional gameplay has been re-imagined. Based upon our analysis, we conclude with recommendations for improving Ludo apps and other apps based on random-chance board games.
Title: Noura Howell, Britta F. Schulte, Amy Twigger Holroyd, Rocío Fatás Arana, Sumita Sharma, and Grace Eden. 2021. Calling for a Plurality of Perspectives on Design Futuring: An Un-Manifesto. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (CHI EA '21). Association for Computing Machinery, New York, NY, USA, Article 31, 1–10.
DOI: https://doi.org/10.1145/3411763.3450364
Abstract: The Futures Cone, a prominent model in design futuring, is useful for promoting discussions about possible, plausible, probable, and preferable futures. Yet this model has limitations, such as representing diverse human experiences as a singular point of “the present” and implicitly embedding notions of linear progress. Responding to this, we argue that a plurality of perspectives is needed to engage imaginations that depict a diverse unfolding of potential futures. Through reflecting on our own cultural and professional backgrounds, we offer five perspectives for design futuring as a contribution to this plurality: Parallel Presents, “I Am Time”, Epithelial Metaphors, the Uncertainties Cone, and Meet (with) “Speculation”. These perspectives open alternative approaches to design futuring, move outside prevalent notions of technological progress, and foreground interdependent, relational agencies.
Title: K. Rana, R. Madaan and J. Shukla, "Effect of Polite Triggers in Chatbot Conversations on User Experience across Gender, Age, and Personality," 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), 2021, pp. 813-819
DOI: https://doi.org/10.1109/RO-MAN50785.2021.9515528
Abstract: Chatbots are one of the emerging intelligent systems that interact with customers to solve different queries in a wide range of domain areas. During social interaction, politeness plays a vital role in achieving effective communication. Consequently, it becomes essential to understand how a chatbot’s politeness affects user experience during the interaction. To understand this, we conducted a between-subjects user study with two chatbots, where one employs polite triggers and the other simply replies to answer the queries. To introduce politeness into normal chatbot responses, we used the state-of-the-art tag-and-generate approach. We first analyzed how different personality traits influence individuals’ responses to polite triggers. In addition, we also investigated the effects of polite triggers across genders and age groups using a cross-sectional analysis.
Title: B. Ashwini, V. Narayan, A. Bhatia and J. Shukla, "Responsiveness towards robot-assisted interactions among pre-primary children of Indian ethnicity," 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN), 2021, pp. 619-625.
DOI: https://doi.org/10.1109/RO-MAN50785.2021.9515520
Abstract: Today’s world is undeniably technology-driven, and children are the ones who adopt technology with ease. This fact could be leveraged to design assistive technologies for children using social robots, as they are observed to be efficient pedagogical agents. Robotic technology is evolving rapidly, and robots are designed to play social roles in education, health care, and home assistance. However, there has been limited research focusing on the use of robotic technologies for designing interactions with children in the global south, owing to which the response behaviour towards robot-assisted interventions is unknown. To address this gap, we conducted a study to understand the response behaviour of Indian children aged 3-6 years towards robot-assisted interventions during directive tasks. Our analysis shows that the children could follow the robot’s instructions during the tasks and complete the tasks successfully. The exploratory outcomes also highlight the acceptance and benefits of using robotic assistants as facilitators in education, cognitive therapies, and healthcare.
Title: Singhal A., Goyal M., Shukla J., Mutharaju R. (2021) Feature Fused Human Activity Recognition Network (FFHAR-Net). In: Stephanidis C., Antona M., Ntoa S. (eds) HCI International 2021 - Posters. HCII 2021. Communications in Computer and Information Science, vol 1420. Springer, Cham.
DOI: https://doi.org/10.1007/978-3-030-78642-7_72
Abstract: With the advances in smart home technology and the Internet of Things (IoT), there has been keen research interest in human activity recognition to allow service systems to understand human intentions. Recognizing human objectives by these systems without user intervention results in better service, which is crucial to improve the user experience. Existing research approaches have focused primarily on probabilistic methods like Bayesian networks (for instance, the CRAFFT algorithm). Though quite versatile, these probabilistic models may be unable to successfully capture the possibly complex relationships between the input variables. To the best of our knowledge, a statistical study of features in a human activity recognition task, their relationships, etc., has not yet been attempted. To this end, we study the domain of human activity recognition to improve the state-of-the-art and present a novel neural network architecture for the task. It employs early fusion on different types of minimalistic features such as time and location to make extremely accurate predictions with a maximum micro F1-score of 0.98 on the Aruba CASAS dataset. We also accompany the model with a comprehensive study of the features. Using feature selection techniques like Leave-One-Out, we rank the features according to the information they add to deep learning models and make further inferences using the ranking obtained. Our empirical results show that the feature Previous Activity Performed is the most useful of all, surprisingly even more than time (the basis of activity scheduling in most societies). We use three Activities of Daily Living (ADL) datasets in different settings to empirically demonstrate the utility of our architecture. We share our findings along with the models and the source code.
Title: Vidit Jain, Maitree Leekha, Rajiv Ratn Shah, and Jainendra Shukla. 2021. Exploring Semi-Supervised Learning for Predicting Listener Backchannels. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 395, 1–12.
DOI: https://doi.org/10.1145/3411764.3445449
Abstract: Developing human-like conversational agents is a prime area in HCI research and subsumes many tasks. Predicting listener backchannels is one such actively researched task. While many studies have used different approaches for backchannel prediction, they all have depended on manual annotations for a large dataset. This is a bottleneck impacting the scalability of development. To this end, we propose using semi-supervised techniques to automate the process of identifying backchannels, thereby easing the annotation process. To analyze our identification module’s feasibility, we compared the backchannel prediction models trained on (a) manually-annotated and (b) semi-supervised labels. Quantitative analysis revealed that the proposed semi-supervised approach could attain 95% of the former’s performance. Our user-study findings revealed that almost 60% of the participants found the backchannel responses predicted by the proposed model more natural. Finally, we also analyzed the impact of personality on the type of backchannel signals and validated our findings in the user study.
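The general self-training idea this abstract describes (growing a small labeled pool with a model's own confident predictions to ease annotation) can be sketched generically. This is an illustrative sketch, not code from the paper; the toy 1-D centroid classifier below is invented purely for demonstration.

```python
def self_train(labeled, unlabeled, fit, predict, threshold=0.9, rounds=3):
    """Generic self-training loop: fit on the labeled pool, then promote
    unlabeled examples whose predicted label is sufficiently confident."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = fit(labeled)
        remaining = []
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                labeled.append((x, label))  # pseudo-labelled example
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing promoted; stop early
            break
        pool = remaining
    return fit(labeled), labeled

# Toy 1-D "classifier" (hypothetical): class centroids are the model,
# and the distance margin between centroids acts as the confidence.
def fit(data):
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def predict(model, x):
    c1, c0 = model
    label = 1 if abs(x - c1) < abs(x - c0) else 0
    margin = abs(abs(x - c1) - abs(x - c0))
    return label, min(1.0, margin / 10.0)

model, final_labeled = self_train([(0.0, 0), (10.0, 1)], [1.0, 9.0],
                                  fit, predict, threshold=0.5)
```

The loop is model-agnostic: any `fit`/`predict` pair can be plugged in, and the confidence threshold trades label quality against how much manual annotation is saved.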
Title: Deepika Yadav, Prerna Malik, Kirti Dabas, and Pushpendra Singh. 2021. Illustrating the Gaps and Needs in the Training Support of Community Health Workers in India. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 231, 1–16.
DOI: https://doi.org/10.1145/3411764.3445111
Abstract: In India and other developing countries, Community Health Workers (CHWs) provide the first line of care in delivering necessary maternal and child health services. In this work, we assess the training and skill-building needs of CHWs through a mobile-based training intervention deployed for six months to 500 CHWs for conducting 144 training sessions in rural India. We qualitatively probed 1,178 questions asked by CHWs during training sessions and conducted a content analysis of the learning material provided to CHWs. Further, we interviewed 48 CHWs to understand the rationale of information seeking and perceptions of training needs. We present our understanding of the knowledge gaps of CHWs and how the current learning material and training methods are ineffective in addressing them. Our study presents design implications for HCI4D researchers for mobile learning platforms targeted towards CHWs. We also provide policy-level suggestions to improve the training of CHWs in India or a similar context.
Title: Ayushi Srivastava, Shivani Kapania, Anupriya Tuli, and Pushpendra Singh. 2021. Actionable UI Design Guidelines for Smartphone Applications Inclusive of Low-Literate Users. Proc. ACM Hum.-Comput. Interact. 5, CSCW1, Article 136 (April 2021), 30 pages.
DOI: https://doi.org/10.1145/3449210
Abstract: With easy access to affordable internet-powered smartphones, developing countries are adopting smartphone applications to provide enabling services to their citizens through eHealth, eGovernance, and digital payments. The challenge is to ensure equitable access to these services by everyone, including people with semi-literacy or low literacy, who form a large part of the population in developing countries. However, extensive HCI literature has identified literacy as one of the barriers to designing user interfaces. In this work, we propose a framework of actionable guidelines for designing smartphone UIs that would be usable by low-literate users. We reviewed the last two decades of HCI literature engaging people with low literacy to synthesize our framework, SARAL. To evaluate the framework, we conducted a preliminary study with a group of 20 practitioners and researchers working in the field of UI/UX/HCI. We also analyzed six publicly available industry reports on designing UIs for people with low literacy. The proposed guidelines are intended to support researchers, practitioners, designers, and implementers in the design and evaluation of UIs of smartphone applications for people with low literacy. We present the evolutionary nature of the proposed framework while highlighting the importance of adopting a translational approach when building such frameworks.
Title: Deepika Yadav, Prerna Malik, Kirti Dabas, and Pushpendra Singh. 2021. Illustrating the Gaps and Needs in the Training Support of Community Health Workers in India. 30 pages. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI ’21). (CORE A*, ranked the number 1 venue by Google Scholar in the HCI area with h-median 122, and one of the top 10 computer science conferences)
Link: http://repository.iiitd.edu.in/xmlui/handle/123456789/932
Abstract: Community health workers (CHWs) in low- and middle-income countries play a vital role in public healthcare. CHWs particularly assist in improving the maternal and child health conditions of the poor and vulnerable, who often remain unaware of the available services and face socio-cultural barriers in accessing health services. India, which still bears a high burden of child mortality, implements its CHW program as a flagship program with close to a million CHWs appointed across its states. However, under-training significantly limits the ability of CHWs to provide quality services. We address the training problem in India by designing and deploying low-cost mobile training tools that can complement the existing face-to-face training mechanisms. Our system adopts a hybrid architecture that uses Interactive Voice Response to facilitate online audio training sessions, allowing CHWs to access training from anywhere through their feature phones, a key need that has been well recognized by HCI4D research. We make the following contributions: (1) testing the feasibility and efficacy of our training tool through a controlled field experiment; (2) unpacking the training needs of CHWs by analyzing a record of 1,178 questions and answers and mapping it back to the existing reference material through a large-scale deployment to 500 CHWs; (3) investigating the potential for peer-to-peer learning models to address the challenge of experts’ availability through a controlled field experiment; and (4) finally, exploring the potential for automated techniques in this domain by proposing a semi-automated NLP approach for curating generated learning content and exposing CHWs and women to chatbot-based education for the first time. By using a range of mixed methods and field experiments, this dissertation expands the focus of HCI4D and mHealth research on CHWs’ competence development in low-resource settings, an area that has long been neglected.
Title: Sharma, D., Kumar, B., Chand, S. et al. A Trend Analysis of Significant Topics Over Time in Machine Learning Research. SN COMPUT. SCI. 2, 469 (2021).
DOI: https://doi.org/10.1007/s42979-021-00876-2
Abstract: A vast number of research papers on numerous topics are published every year in different conferences and journals. Thus, it is difficult for new researchers to manually identify the research problems and topics that the research community is currently focusing on. Since such research problems and topics help researchers stay updated with new topics in research, it is essential to know trends in research based on topic significance over time. Therefore, in this paper, we propose a method to automatically identify the trends in machine learning research based on significant topics over time. Specifically, we apply a topic coherence model with latent Dirichlet allocation (LDA) to evaluate the optimal number of topics and the significant topics for a dataset. The LDA model results in topic proportions over documents, where each topic has its probability (i.e., topic weight) related to each document. Subsequently, the topic weights are processed to compute average topic weights per year, trend analysis using a rolling mean, topic prevalence per year, and topic proportion per journal title. To evaluate our method, we prepare a new dataset comprising 21,906 scientific research articles from the top six journals in the area of machine learning published from 1988 to 2017. Extensive experimental results on the dataset demonstrate that our technique is efficient and can help upcoming researchers to explore the research trends and topics in different research areas, such as machine learning.
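The per-year averaging and rolling-mean trend step this abstract describes can be sketched in plain Python. This is an illustrative sketch, not code from the paper; it assumes per-document topic weights have already been produced by an LDA model, and all names and the toy data below are invented for demonstration.

```python
from collections import defaultdict

def yearly_topic_trend(doc_topic_weights, years, window=3):
    """Average per-document topic weights by year, then smooth each
    topic's yearly series with a trailing rolling mean."""
    by_year = defaultdict(list)
    for weights, year in zip(doc_topic_weights, years):
        by_year[year].append(weights)
    ordered = sorted(by_year)
    n_topics = len(doc_topic_weights[0])
    # Average topic weight per year.
    yearly = [[sum(w[t] for w in by_year[y]) / len(by_year[y])
               for t in range(n_topics)] for y in ordered]
    # Trailing rolling mean over the yearly series.
    trend = []
    for i in range(len(yearly)):
        rows = yearly[max(0, i - window + 1):i + 1]
        trend.append([sum(r[t] for r in rows) / len(rows)
                      for t in range(n_topics)])
    return ordered, trend

# Toy input: four documents, two topics, two years.
years, trend = yearly_topic_trend(
    [[0.8, 0.2], [0.6, 0.4], [0.2, 0.8], [0.4, 0.6]],
    [2000, 2000, 2001, 2001], window=2)
```

A rising smoothed series for a topic then reads directly as that topic gaining prevalence over time.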
Title: Parekh, S., Kumar, Y. S., Singh, S., Chen, C., Krishnamurthy, B., & Shah, R. R. (2021). MINIMAL: Mining Models for Data Free Universal Adversarial Triggers. arXiv preprint arXiv:2109.12406.
Link: arXiv:2109.12406
Abstract: A vast number of research papers on numerous topics publish every year in different conferences and journals. Thus, it is difficult for new researchers to identify research problems and topics manually, which research community is currently focusing on. Since such research problems and topics help researchers to be updated with new topics in research, it is essential to know trends in research based on topic significance over time. Therefore, in this paper, we propose a method to identify the trends in machine learning research based on significant topics over time automatically. Specifically, we apply a topic coherence model with latent Dirichlet allocation (LDA) to evaluate the optimal number of topics and significant topics for a dataset. The LDA model results in topic proportion over documents where each topic has its probability (i.e., topic weight) related to each document. Subsequently, the topic weights are processed to compute average topic weights per year, trend analysis using rolling mean, topic prevalence per year, and topic proportion per journal title. To evaluate our method, we prepare a new dataset comprising of 21,906 scientific research articles from top six journals in the area of machine learning published from 1988 to 2017. Extensive experimental results on the dataset demonstrate that our technique is efficient, and can help upcoming researchers to explore the research trends and topics in different research areas, say machine learning.
Title: Kabra, A., Bhatia, M., Kumar, Y., Li, J. J., Jin, D., & Shah, R. R. (2021). Calling Out Bluff: Evaluation Toolkit For Robustness Testing Of Automatic Essay Scoring Systems.
Link: arXiv:2007.06796
Abstract: Automatic scoring engines have been used for scoring approximately fifteen million test takers in just the last three years. This number is increasing further due to COVID-19 and the associated automation of education and testing. Despite such wide usage, the AI-based testing literature on these ‘intelligent’ models is highly lacking. Most of the papers proposing new models rely only on quadratic weighted kappa (QWK) based agreement with human raters for showing model efficacy. However, this effectively ignores the highly multi-feature nature of essay scoring. Essay scoring depends on features like coherence, grammar, relevance, sufficiency, vocabulary, etc., and to date, there has been no study testing Automated Essay Scoring (AES) systems holistically on all these features. With this motivation, we propose a model-agnostic adversarial evaluation scheme and associated metrics for AES systems to test their natural language understanding capabilities and overall robustness. We evaluate the current state-of-the-art AES models using the proposed scheme and report the results on five recent models. These models range from feature-engineering based approaches to the latest deep learning algorithms. We find that AES models are highly overstable, such that even heavy modifications (as much as 25%) with content unrelated to the topic of the questions do not decrease the score produced by the models. On the other hand, unrelated content, on average, increases the scores, thus showing that the models’ evaluation strategy and rubrics should be reconsidered. We also ask 200 human raters to score both an original and adversarial response to see if humans are able to detect differences between the two and whether they agree with the scores assigned by autoscorers.
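A model-agnostic overstability probe of the kind this abstract describes (perturb an essay with off-topic content, compare scores) can be sketched minimally. This is an illustrative sketch, not the paper's toolkit; the `overstability_gap` helper and the deliberately overstable toy scorer are invented for demonstration.

```python
def overstability_gap(score_fn, essay, junk, fraction=0.25):
    """Model-agnostic probe: append off-topic words amounting to a given
    fraction of the essay's length and report how much the score changes."""
    n_extra = int(len(essay.split()) * fraction)
    perturbed = essay + " " + " ".join(junk.split()[:n_extra])
    return score_fn(perturbed) - score_fn(essay)

# Toy bag-of-words scorer (hypothetical): counts topical vocabulary hits,
# so junk insertions leave the score unchanged, i.e. it is overstable.
def naive_scorer(text):
    vocab = {"model", "features", "data", "learns"}
    return sum(1 for w in text.split() if w in vocab)

essay = "the model learns rich features from data " * 2
gap = overstability_gap(naive_scorer, essay.strip(), "junk " * 50,
                        fraction=0.25)
```

A robust scorer should penalize the perturbed essay (negative gap); a gap at or above zero under 25% junk is the overstability symptom the abstract reports.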
Title: Singla, Y. K., Parekh, S., Singh, S., Li, J. J., Shah, R. R., & Chen, C. (2021). AES Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses. arXiv preprint arXiv:2109.11728.
Link: arXiv:2109.11728
Abstract: Deep-learning based Automatic Essay Scoring (AES) systems are being actively used by states and language testing agencies alike to evaluate millions of candidates for life-changing decisions ranging from college applications to visa approvals. However, little research has been devoted to understanding and interpreting the black-box nature of deep-learning based scoring algorithms. Previous studies indicate that scoring models can be easily fooled. In this paper, we explore the reason behind their surprising adversarial brittleness. We utilize recent advances in interpretability to find the extent to which features such as coherence, content, vocabulary, and relevance are important for automated scoring mechanisms. We use this to investigate the oversensitivity (i.e., large change in output score with a little change in input essay content) and overstability (i.e., little change in output scores with large changes in input essay content) of AES. Our results indicate that autoscoring models, despite getting trained as “end-to-end” models with rich contextual embeddings such as BERT, behave like bag-of-words models. A few words determine the essay score without the requirement of any context, making the model largely overstable. This is in stark contrast to recent probing studies on pre-trained representation learning models, which show that rich linguistic features such as parts-of-speech and morphology are encoded by them. Further, we also find that the models have learnt dataset biases, making them oversensitive. The presence of a few words with high co-occurrence with a certain score class makes the model associate the essay sample with that score. This causes score changes in ∼95% of samples with an addition of only a few words. To deal with these issues, we propose detection-based protection models that can detect oversensitivity- and overstability-causing samples with high accuracies. We find that our proposed models are able to detect unusual attribution patterns and flag adversarial samples successfully.
Title: Singla, Y. K., Gupta, A., Bagga, S., Chen, C., Krishnamurthy, B., & Shah, R. R. (2021). Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring. arXiv preprint arXiv:2109.00928.
DOI: 10.1145/3459637.3482395
Abstract: Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate’s speaking proficiency in a language. ASS systems face many challenges like open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from a single audio, making them suffer from the lack of speaker-specific context required to model such a complex task. We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling. In our technique, we take advantage of the fact that oral proficiency tests rate multiple responses for a candidate. We extract context vectors from these responses and feed them as additional speaker-specific context to our network to score a particular response. We compare our technique with strong baselines and find that such modeling improves the model’s average performance by 6.92% (maximum = 12.86%, minimum = 4.51%). We further show both quantitative and qualitative insights into the importance of this additional context in solving the problem of ASS.
Title: Sawhney, R., Goyal, M., Goel, P., Mathur, P., & Shah, R. (2021, August). Multimodal Multi-Speaker Merger & Acquisition Financial Modeling: A New Task, Dataset, and Neural Baselines. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 6751-6762).
DOI: 10.18653/v1/2021.acl-long.526
Abstract: Risk prediction is an essential task in financial markets. Merger and Acquisition (M&A) calls provide key insights into the claims made by company executives about the restructuring of the financial firms. Extracting vocal and textual cues from M&A calls can help model the risk associated with such financial activities. To aid the analysis of M&A calls, we curate a dataset of conference call transcripts and their corresponding audio recordings for the time period ranging from 2016 to 2020. We introduce M3ANet, a baseline architecture that takes advantage of the multimodal multi-speaker input to forecast the financial risk associated with the M&A calls. Empirical results prove that the task is challenging, with the proposed architecture performing marginally better than strong BERT-based baselines. We release the M3A dataset and benchmark models to motivate future research on this challenging problem domain.
Title: Ramit Sawhney, Shivam Agarwal, Megh Thakkar, Arnav Wadhwa, and Rajiv Ratn Shah. 2021. Hyperbolic Online Time Stream Modeling. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21). Association for Computing Machinery, New York, NY, USA, 1682–1686.
DOI: https://doi.org/10.1145/3404835.3463119
Abstract: The rapidly rising ubiquity and dissemination of online information such as social media text and news improve user accessibility to financial markets; however, modeling these vast streams of irregular, temporal data poses a challenge. Such temporal streams of information show power-law dynamics, scale-free characteristics, and time irregularities that sequential models are unable to accurately model. In this work, we propose the first Hierarchical Time-Aware Hyperbolic LSTM (HTLSTM), which leverages the Riemannian manifold for encoding the scale-free nature of a sequence of text in a time-aware fashion. Through experiments on three financial tasks: stock trading, equity price movement prediction, and financial risk prediction, we demonstrate HTLSTM's applicability for modeling temporal sequences of online information. On real-world data from four global stock markets and three stock indices spanning data in English and Chinese, we make a step towards time-aware text modeling via hyperbolic geometry.
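For readers unfamiliar with Riemannian modeling: hyperbolic approaches such as HTLSTM replace Euclidean distances with geodesic distances on a manifold like the Poincaré ball, whose closed-form distance is easy to state (an illustrative sketch, not the paper's implementation):

```python
import numpy as np

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincare ball,
    a hyperbolic model commonly used for scale-free, hierarchical data."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / denom)
```

Distances blow up near the ball's boundary, which is what lets hyperbolic space embed tree-like, power-law structures with low distortion.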
Title: Agrawal, M., Mehrotra, P., Kumar, R., & Shah, R. R. (2021). Defending Touch-based Continuous Authentication Systems from Active Adversaries Using Generative Adversarial Networks. arXiv preprint arXiv:2106.07867.
Link: https://arxiv.org/abs/2106.07867
Abstract: Previous studies have demonstrated that commonly studied (vanilla) touch-based continuous authentication systems (V-TCAS) are susceptible to population attack. This paper proposes a novel Generative Adversarial Network assisted TCAS (G-TCAS) framework, which showed more resilience to the population attack. G-TCAS framework was tested on a dataset of 117 users who interacted with a smartphone and tablet pair. On average, the increase in the false accept rates (FARs) for V-TCAS was much higher (22%) than G-TCAS (13%) for the smartphone. Likewise, the increase in the FARs for V-TCAS was 25% compared to G-TCAS (6%) for the tablet.
Title: S. Chopra, P. Mathur, R. Sawhney and R. R. Shah, "Meta-Learning for Low-Resource Speech Emotion Recognition," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 6259-6263
DOI: 10.1109/ICASSP39728.2021.9414373.
Abstract: While emotion recognition is a well-studied task, it remains largely unexplored in cross-lingual settings. Speech Emotion Recognition (SER) in low-resource languages poses difficulties as existing approaches for knowledge transfer do not generalize seamlessly. Probing the learning process of generalized representations across languages, we propose a meta-learning approach for low-resource speech emotion recognition. The proposed approach achieves fast adaptation on a number of unseen target languages simultaneously. We evaluate the Model Agnostic Meta-Learning (MAML) algorithm on three low-resource target languages: Persian, Italian, and Urdu. We empirically demonstrate that our proposed method, MetaSER, considerably outperforms multitask and transfer learning-based methods for the speech emotion recognition task, and discuss the benefits, efficiency, and challenges of MetaSER in limited-data settings.
Title: A. N. Mathur, D. Batra, Y. K. Singla, R. Ratn Shah, C. Chen and R. Zimmermann, "LIFI: Towards Linguistically Informed Frame Interpolation," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 7593-7597,
DOI: 10.1109/ICASSP39728.2021.9413998.
Abstract: Here we explore the problem of speech video interpolation. With close to 70% of web traffic, such content today forms the primary form of online communication and entertainment. Despite high performance on conventional metrics like MSE, PSNR, and SSIM, we find that the state-of-the-art frame interpolation models fail to produce faithful speech interpolation. For instance, we observe the lips stay static while the person is still speaking for most interpolated frames. With this motivation, using the information of words, sub-words, and visemes, we provide a new set of linguistically informed metrics targeted explicitly at the problem of speech video interpolation. We release several datasets to test video interpolation models for their speech understanding. We also design linguistically informed deep learning video interpolation algorithms to generate the missing frames.
Title: Sawhney, R., Wadhwa, A., Agarwal, S., & Shah, R. (2021, June). Quantitative Day Trading from Natural Language using Reinforcement Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 4018-4030).
DOI: 10.18653/v1/2021.naacl-main.316
Abstract: It is challenging to design profitable and practical trading strategies, as stock price movements are highly stochastic, and the market is heavily influenced by chaotic data across sources like news and social media. Existing NLP approaches largely treat stock prediction as a classification or regression problem and are not optimized to make profitable investment decisions. Further, they do not model the temporal dynamics of large volumes of diversely influential text to which the market responds quickly. Building on these shortcomings, we propose a deep reinforcement learning approach that makes time-aware decisions to trade stocks while optimizing profit using textual data. Our method outperforms state-of-the-art in terms of risk-adjusted returns in trading simulations on two benchmarks: Tweets (English) and financial news (Chinese) pertaining to two major indexes and four global stock markets. Through extensive experiments and studies, we build the case for our method as a tool for quantitative trading.
Title: Sawhney, R., Aggarwal, A., & Shah, R. (2021, June). An Empirical Investigation of Bias in the Multimodal Analysis of Financial Earnings Calls. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 3751-3757).
DOI: 10.18653/v1/2021.naacl-main.294
Abstract: Volatility prediction is complex due to the stock market’s stochastic nature. Existing research focuses on the textual elements of financial disclosures like earnings calls transcripts to forecast stock volatility and risk, but ignores the rich acoustic features in the company executives’ speech. Recently, new multimodal approaches that leverage the verbal and vocal cues of speakers in financial disclosures significantly outperform previous state-of-the-art approaches demonstrating the benefits of multimodality and speech. However, the financial realm is still plagued with a severe underrepresentation of various communities spanning diverse demographics, gender, and native speech. While multimodal models are better risk forecasters, it is imperative to also investigate the potential bias that these models may learn from the speech signals of company executives. In this work, we present the first study to discover the gender bias in multimodal volatility prediction due to gender sensitive audio features and fewer female executives in earnings calls of one of the world’s biggest stock indexes, the S&P 500 index. We quantitatively analyze bias as error disparity and investigate the sources of this bias. Our results suggest that multimodal neural financial models accentuate gender-based stereotypes.
Title: Sawhney, R., Joshi, H., Shah, R., & Flek, L. (2021, June). Suicide Ideation Detection via Social and Temporal User Representations using Hyperbolic Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 2176-2190).
Link: https://aclanthology.org/2021.naacl-main.176
Abstract: Recent psychological studies indicate that individuals exhibiting suicidal ideation increasingly turn to social media rather than mental health practitioners. Personally contextualizing the buildup of such ideation is critical for accurate identification of users at risk. In this work, we propose a framework jointly leveraging a user’s emotional history and social information from a user’s neighborhood in a network to contextualize the interpretation of the latest tweet of a user on Twitter. Reflecting upon the scale-free nature of social network relationships, we propose the use of Hyperbolic Graph Convolution Networks, in combination with the Hawkes process to learn the historical emotional spectrum of a user in a time-sensitive manner. Our system significantly outperforms state-of-the-art methods on this task, showing the benefits of both socially and personally contextualized representations.
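The Hawkes-process component amounts to weighting a user's historical posts with an exponentially decaying kernel, so recent posts influence the current state most; a simplified sketch (the decay rate `beta` and the function name are our assumptions):

```python
import numpy as np

def hawkes_weights(timestamps, t_now, beta=0.1):
    """Exponential-decay influence weights for historical posts:
    the older a post is relative to t_now, the less it contributes."""
    dt = t_now - np.asarray(timestamps, float)
    return np.exp(-beta * dt)
```

A model would multiply each historical post's embedding by its weight before aggregation, making the user representation time-sensitive.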
Title: Sawhney, R., Mathur, P., Jain, T., Gautam, A. K., & Shah, R. R. Multitask Learning for Emotionally Analyzing Sexual Abuse Disclosures.
DOI: 10.18653/v1/2021.naacl-main.387
Abstract: The #MeToo movement on social media platforms initiated discussions over several facets of sexual harassment in our society. Prior work by the NLP community for automated identification of the narratives related to sexual abuse disclosures barely explored this social phenomenon as an independent task. However, emotional attributes associated with textual conversations related to the #MeToo social movement are complexly intertwined with such narratives. We formulate the task of identifying narratives related to the sexual abuse disclosures in online posts as a joint modeling task that leverages their emotional attributes through multitask learning. Our results demonstrate that positive knowledge transfer via context-specific shared representations of a flexible cross-stitched parameter sharing model helps establish the inherent benefit of jointly modeling tasks related to sexual abuse disclosures with emotion classification from the text in homogeneous and heterogeneous settings. We show how for more domain-specific tasks related to sexual abuse disclosures such as sarcasm identification and dialogue act (refutation, justification, allegation) classification, homogeneous multitask learning is helpful, whereas for more general tasks such as stance and hate speech detection, heterogeneous multitask learning with emotion classification works better.
Title: Sawhney, R., Joshi, H., Nobles, A., & Shah, R. R. (2021). Towards Emotion- and Time-Aware Classification of Tweets to Assist Human Moderation for Suicide Prevention. Proceedings of the International AAAI Conference on Web and Social Media, 15(1), 609-620.
Link: https://ojs.aaai.org/index.php/ICWSM/article/view/18088
Abstract: Social media platforms are already engaged in leveraging existing online socio-technical systems to employ just-in-time interventions for suicide prevention to the public. These efforts primarily rely on self-reports of potential self-harm content that is reviewed by moderators. Most recently, platforms have employed automated models to identify self-harm content, but acknowledge that these automated models still struggle to understand the nuance of human language (e.g., sarcasm). By explicitly focusing on Twitter posts that could easily be misidentified by a model as expressing suicidal intent (i.e., they contain similar phrases such as "wanting to die"), our work examines the temporal differences in historical expressions of general and emotional language prior to a clear expression of suicidal intent. Additionally, we analyze time-aware neural models that build on these language variants and factor in the historical, emotional spectrum of a user's tweeting activity. The strongest model achieves high (statistically significant) performance (macro F1=0.804, recall=0.813) to identify social media indicative of suicidal intent. Using three use cases of tweets with phrases common to suicidal intent, we qualitatively analyze and interpret how such models decided if suicidal intent was present and discuss how these analyses may be used to alleviate the burden on human moderators within the known constraints of how moderation is performed (e.g., no access to the user's timeline). Finally, we discuss the ethical implications of such data-driven models and inferences about suicidal intent from social media. Content warning: this article discusses self-harm and suicide.
Title: Sawhney, R., Agarwal, S., Wadhwa, A., Derr, T., & Shah, R. R. (2021, May). Stock Selection via Spatiotemporal Hypergraph Attention Network: A Learning to Rank Approach. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 1, pp. 497-504).
Link: https://www.aaai.org/AAAI21Papers/AAAI-7907.SawhneyR.pdf
Abstract: Quantitative trading and investment decision making are intricate financial tasks that rely on accurate stock selection. Despite advances in deep learning that have made significant progress in the complex and highly stochastic stock prediction problem, modern solutions face two significant limitations. They do not directly optimize the target of investment in terms of profit, and treat each stock as independent from the others, ignoring the rich signals between related stocks’ temporal price movements. Building on these limitations, we reformulate stock prediction as a learning to rank problem and propose STHAN-SR, a neural hypergraph architecture for stock selection. The key novelty of our work is the proposal of modeling the complex relations between stocks through a hypergraph and a temporal Hawkes attention mechanism to tailor a new spatiotemporal attention hypergraph network architecture to rank stocks based on profit by jointly modeling stock interdependence and the temporal evolution of their prices. Through experiments on three markets spanning over six years of data, we show that STHAN-SR significantly outperforms state-of-the-art neural stock forecasting methods. We validate our design choices through ablative and exploratory analyses over STHAN-SR’s spatial and temporal components and demonstrate its practical applicability.
Title: Yin, Y., Shrivastava, H., Zhang, Y., Liu, Z., Shah, R. R., & Zimmermann, R. (2021, May). Enhanced Audio Tagging via Multi-to Single-Modal Teacher-Student Mutual Learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 12, pp. 10709-10717).
Link: https://www.aaai.org/AAAI21Papers/AAAI-920.YinY.pdf
Abstract: Recognizing ongoing events based on acoustic clues has been a critical yet challenging problem that has attracted significant research attention in recent years. Joint audio-visual analysis can improve the event detection accuracy but may not always be feasible, as under many circumstances only audio recordings are available in real-world scenarios. To address these challenges, we present a novel visual-assisted teacher-student mutual learning framework for robust sound event detection from audio recordings. Our model adopts a multi-modal teacher network based on both acoustic and visual clues, and a single-modal student network based on acoustic clues only. Conventional teacher-student learning performs unsatisfactorily for knowledge transfer from a multi-modality network to a single-modality network. We thus present a mutual learning framework by introducing a single-modal transfer loss and a cross-modal transfer loss to collaboratively learn the audio-visual correlations between the two networks. Our proposed solution takes advantage of joint audio-visual analysis in training while maximizing the feasibility of the model in use cases. Our extensive experiments on the DCASE17 and the DCASE18 sound event detection datasets show that our proposed method outperforms the state-of-the-art audio tagging approaches.
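Transfer losses in such teacher-student mutual learning setups are typically KL divergences between the two networks' output distributions, applied in both directions; a generic numpy sketch (not the paper's exact loss formulation; names are ours):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def mutual_learning_losses(teacher_probs, student_probs):
    """Mutual learning: each network is also trained to match the other's
    predictions, so knowledge flows in both directions rather than only
    from teacher to student."""
    return (kl_divergence(teacher_probs, student_probs),   # pulls the student
            kl_divergence(student_probs, teacher_probs))   # pulls the teacher
```

In training, each loss term would be added to the corresponding network's classification loss.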
Title: Vidit Jain, Maitree Leekha, Rajiv Ratn Shah, and Jainendra Shukla. 2021. Exploring Semi-Supervised Learning for Predicting Listener Backchannels. Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. Association for Computing Machinery, New York, NY, USA, Article 395, 1–12.
DOI: https://doi.org/10.1145/3411764.3445449
Abstract: Developing human-like conversational agents is a prime area in HCI research and subsumes many tasks. Predicting listener backchannels is one such actively-researched task. While many studies have used different approaches for backchannel prediction, they all have depended on manual annotations for a large dataset. This is a bottleneck impacting the scalability of development. To this end, we propose using semi-supervised techniques to automate the process of identifying backchannels, thereby easing the annotation process. To analyze our identification module’s feasibility, we compared the backchannel prediction models trained on (a) manually-annotated and (b) semi-supervised labels. Quantitative analysis revealed that the proposed semi-supervised approach could attain 95% of the former’s performance. Our user-study findings revealed that almost 60% of the participants found the backchannel responses predicted by the proposed model more natural. Finally, we also analyzed the impact of personality on the type of backchannel signals and validated our findings in the user study.
Title: Ramit Sawhney, Shivam Agarwal, Arnav Wadhwa, and Rajiv Shah. 2021. Exploring the Scale-Free Nature of Stock Markets: Hyperbolic Graph Learning for Algorithmic Trading. In Proceedings of the Web Conference 2021 (WWW '21). Association for Computing Machinery, New York, NY, USA, 11–22.
DOI: https://doi.org/10.1145/3442381.3450095
Abstract: Quantitative trading and investment decision making are intricate financial tasks in the ever-increasing sixty trillion dollars global stock market. Despite advances in stock forecasting, a limitation of most existing neural methods is that they treat stocks independent of each other, ignoring the valuable rich signals between related stocks’ movements. Motivated by financial literature showing that stock markets and inter-stock correlations exhibit scale-free network characteristics, we leverage domain knowledge on the Web to model inter-stock relations as a graph in four major global stock markets and formulate stock selection as a scale-free graph-based learning to rank problem. To capture the scale-free spatial and temporal dependencies in stock prices, we propose HyperStockGAT: Hyperbolic Stock Graph Attention Network, the first model on the Riemannian Manifolds for stock selection. Our work’s key novelty is the proposal of modeling the complex, scale-free nature of inter-stock relations through temporal hyperbolic graph learning on Riemannian manifolds that can represent the spatial correlations between stocks more accurately. Through extensive experiments on long-term real-world data spanning over six years on four of the world’s biggest markets: NASDAQ, NYSE, TSE, and China exchanges, we show that HyperStockGAT significantly outperforms state-of-the-art stock forecasting methods in terms of profitability by over 12%, and risk-adjusted Sharpe Ratio by over 4%. We analyze HyperStockGAT’s components’ contributions through a series of exploratory and ablative experiments to demonstrate its practical applicability to real-world trading. Furthermore, we propose a novel hyperbolic architecture that can be applied across various spatiotemporal problems on the Web’s commonly occurring scale-free networks.
Title: Sawhney, R., Joshi, H., Gandhi, S., & Shah, R. R. (2021, March). Towards Ordinal Suicide Ideation Detection on Social Media. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining (pp. 22-30).
Link: https://doi.org/10.1145/3437963.3441805
Abstract: The rising ubiquity of social media presents a platform for individuals to express suicide ideation, instead of traditional, formal clinical settings. While neural methods for assessing suicide risk on social media have shown promise, a crippling limitation of existing solutions is that they ignore the inherent ordinal nature across fine-grained levels of suicide risk. To this end, we reformulate suicide risk assessment as an Ordinal Regression problem over the Columbia Suicide Severity Scale. We propose SISMO, a hierarchical attention model optimized to factor in the graded nature of increasing suicide risk levels through a soft probability distribution, since not all wrong risk levels are equally wrong. We establish the face value of SISMO for preliminary suicide risk assessment on real-world Reddit data annotated by clinical experts. We conclude by discussing the empirical, practical, and ethical considerations pertaining to SISMO in a larger picture, as a human-in-the-loop framework.
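The "soft probability distribution" over graded risk levels can be illustrated by smoothing a one-hot target according to distance from the true level, so adjacent levels are penalized less than distant ones (an assumed parameterization for illustration, not necessarily SISMO's exact one):

```python
import numpy as np

def soft_ordinal_labels(true_level, n_levels, tau=1.0):
    """Soft target over ordinal classes: probability mass decays with
    distance from the true level, so near-miss predictions incur a
    smaller training penalty than far-off ones."""
    levels = np.arange(n_levels)
    logits = -np.abs(levels - true_level) / tau
    p = np.exp(logits)
    return p / p.sum()
```

Training against such soft targets (e.g., with a cross-entropy or KL loss) is a common way to encode ordinality without changing the classifier head.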
Title: Mehnaz, L., Mahata, D., Gosangi, R., Gunturi, U. S., Jain, R., Gupta, G., ... & Shah, R. R. (2021). GupShup: An Annotated Corpus for Abstractive Summarization of Open-Domain Code-Switched Conversations. arXiv preprint arXiv:2104.08578.
Link: arXiv:2104.08578
Abstract: Code-switching is the communication phenomenon where speakers switch between different languages during a conversation. With the widespread adoption of conversational agents and chat platforms, code-switching has become an integral part of written conversations in many multi-lingual communities worldwide. This makes it essential to develop techniques for summarizing and understanding these conversations. Towards this objective, we introduce abstractive summarization of Hindi-English code-switched conversations and develop the first code-switched conversation summarization dataset - GupShup, which contains over 6,831 conversations in Hindi-English and their corresponding human annotated summaries in English and Hindi-English. We present a detailed account of the entire data collection and annotation processes. We analyze the dataset using various code-switching statistics. We train state-of-the-art abstractive summarization models and report their performances using both automated metrics and human evaluation. Our results show that multi-lingual mBART and multi-view seq2seq models obtain the best performances on the new dataset.
Title: Sawhney, R., Wadhwa, A., Agarwal, S., & Shah, R. (2021, April). FAST: Financial News and Tweet Based Time Aware Network for Stock Trading. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (pp. 2164-2175).
Link: https://aclanthology.org/2021.eacl-main.185.pdf
Abstract: Designing profitable trading strategies is complex as stock movements are highly stochastic; the market is influenced by large volumes of noisy data across diverse information sources like news and social media. Prior work mostly treats stock movement prediction as a regression or classification task and is not directly optimized towards profit-making. Further, they do not model the fine-grain temporal irregularities in the release of vast volumes of text that the market responds to quickly. Building on these limitations, we propose a novel hierarchical, learning to rank approach that uses textual data to make time-aware predictions for ranking stocks based on expected profit. Our approach outperforms state-of-the-art methods by over 8% in terms of cumulative profit and risk-adjusted returns in trading simulations on two benchmarks: English tweets and Chinese financial news spanning two major stock indexes and four global markets. Through ablative and qualitative analyses, we build the case for our method as a tool for daily stock trading.
Title: Kashyap, A. R., Mehnaz, L., Malik, B., Waheed, A., Hazarika, D., Kan, M. Y., & Shah, R. (2021, April). Analyzing the Domain Robustness of Pretrained Language Models, Layer by Layer. In Proceedings of the Second Workshop on Domain Adaptation for NLP (pp. 222-244).
Link: https://aclanthology.org/2021.adaptnlp-1.23/
Abstract: The robustness of pretrained language models (PLMs) is generally measured using performance drops on two or more domains. However, we do not yet understand the inherent robustness achieved by contributions from different layers of a PLM. We systematically analyze the robustness of these representations layer by layer from two perspectives. First, we measure the robustness of representations by using domain divergence between two domains. We find that i) domain variance increases from the lower to the upper layers for vanilla PLMs; ii) models continuously pretrained on domain-specific data (DAPT) (Gururangan et al., 2020) exhibit more variance than their pretrained PLM counterparts; and iii) distilled models (e.g., DistilBERT) also show greater domain variance. Second, we investigate the robustness of representations by analyzing the encoded syntactic and semantic information using diagnostic probes. We find that similar layers have similar amounts of linguistic information for data from an unseen domain.
Title: Yifang Yin, Ying Zhang, Zhenguang Liu, Sheng Wang, Rajiv Ratn Shah, and Roger Zimmermann. "GPS2Vec: Pre-trained Semantic Embeddings for Worldwide GPS Coordinates." IEEE Transactions on Multimedia (2021).
DOI: 10.1109/TMM.2021.3060951
Abstract: GPS coordinates are fine-grained location indicators that are difficult for classifiers in geo-aware applications to utilize effectively. Previous GPS encoding methods concentrate on generating hand-crafted features for small areas of interest. However, many real-world applications require a machine learning model, analogous to the pre-trained ImageNet model for images, that can efficiently generate semantically-enriched features for planet-scale GPS coordinates. To address this issue, we propose a novel two-level grid-based framework, termed GPS2Vec, which is able to extract geo-aware features in real-time for locations worldwide. The Earth's surface is first discretized by the Universal Transverse Mercator (UTM) coordinate system. Each UTM zone is then considered as a local area of interest that is further divided into fine-grained cells to perform the initial GPS encoding. We train a neural network in each UTM zone to learn the semantic embeddings from the initial GPS encoding. The training labels can be automatically derived from large-scale geotagged documents such as tweets, check-ins, and images that are available from social sharing platforms. We conducted comprehensive experiments on three geo-aware applications, namely place semantic annotation, geotagged image classification, and next location prediction. Experimental results demonstrate the effectiveness of our approach, as prediction accuracy improves significantly based on a simple multi-feature early fusion strategy with deep neural networks, including both CNNs and RNNs.
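The first level of GPS2Vec's grid is the standard UTM longitudinal zone, which is computable from longitude alone; a simplified sketch that omits the polar and Norway special cases (the fine-cell discretization shown is our own simplification, not the paper's exact scheme):

```python
def utm_zone_number(lon):
    """Standard UTM longitudinal zone: 6-degree-wide strips numbered
    1..60 eastward from longitude -180 (special cases omitted)."""
    return int((lon + 180) // 6) + 1

def fine_cell(lat, lon, cell_deg=0.5):
    """Second-level cell: discretize lat/lon into cell_deg-wide bins,
    a stand-in for the per-zone fine-grained cells described above."""
    return int((lat + 90) // cell_deg), int((lon + 180) // cell_deg)
```

For example, New York City (lon about -74) falls in zone 18 and Singapore (lon about 103.85) in zone 48, matching standard UTM zone maps.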
Title: Ramit Sawhney, Harshit Joshi, Saumya Gandhi, Di Jin, Rajiv Ratn Shah, Robust suicide risk assessment on social media via deep adversarial learning, Journal of the American Medical Informatics Association, Volume 28, Issue 7, July 2021, Pages 1497–1506
DOI: https://doi.org/10.1093/jamia/ocab031
Abstract: The prevalence of social media for sharing personal thoughts makes it a viable platform for the assessment of suicide risk. However, deep learning models are not able to capture the diverse nature of linguistic choices and temporal patterns that can be exhibited by a suicidal user on social media and end up overfitting on specific cues that are not generally applicable. We propose Adversarial Suicide assessment Hierarchical Attention (ASHA), a hierarchical attention model that employs adversarial learning for improving the generalization ability of the model.
Title: Deepak Sharma, Bijendra Kumar, Satish Chand, and Rajiv Ratn Shah. "Uncovering research trends and topics of communities in machine learning." Multimedia Tools and Applications 80, no. 6 (2021): 9281-9314.
DOI: https://doi.org/10.1007/s11042-020-10072-8
Abstract: This paper aims to uncover the research topics of machine learning research communities in a scientific collaboration network (SCN) to enhance systems such as retrieval or recommendation in intelligence-based systems. The existing research mainly focuses on community evolution and the measurement of typical features of the network. It remains unexplored, however, how to identify the research interests of the communities and of the authors within each community. A dataset is prepared consisting of 21,906 scientific articles from six top journals in the field of machine learning published from 1988 to 2017. An integrated approach combining the author-topic (AT) model with the communities through directed affiliations (CoDA) method is explored to identify the research interests of the communities in a scientific collaboration network. The top-ranked communities are identified using the CRank network community prioritization method. Finally, the similarities and dissimilarities of research interests across communities over the decades are uncovered using cosine similarity. The experimental results demonstrate the effectiveness and efficacy of the proposed technique. This study may help upcoming researchers explore the research trends and topics of machine learning research communities.
Title: Shagun Uppal, Anish Madan, Sarthak Bhagat, Yi Yu, and Rajiv Ratn Shah. 2021. C3VQG: category consistent cyclic visual question generation. In Proceedings of the 2nd ACM International Conference on Multimedia in Asia (MMAsia '20). Association for Computing Machinery, New York, NY, USA, Article 49, 1–7.
DOI: https://doi.org/10.1145/3444685.3446302
Abstract: Visual Question Generation (VQG) is the task of generating natural questions based on an image. Popular methods in the past have explored image-to-sequence architectures trained with maximum likelihood, which have demonstrated meaningful generated questions given an image and its associated ground-truth answer. VQG becomes more challenging if the image contains rich contextual information describing its different semantic categories. In this paper, we try to exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers. Our approach addresses two major shortcomings of existing VQG systems: it (i) minimizes the level of supervision and (ii) replaces generic questions with category-relevant generations. Most importantly, by eliminating expensive answer annotations, the required supervision is weakened. Using different categories enables us to exploit different concepts, as the inference requires only the image and the category. Mutual information is maximized between the image, question, and answer category in the latent space of our VAE. A novel category-consistent cyclic loss is proposed to enable the model to generate predictions consistent with the answer category, reducing redundancies and irregularities. Additionally, we impose supplementary constraints on the latent space of our generative model to provide structure based on categories and to enhance generalization by encapsulating decorrelated features within each dimension. Through extensive experiments, the proposed model, C3VQG, outperforms state-of-the-art VQG methods with weak supervision.
Title: Fiona McGaughey, Richard Watermeyer, Kalpana Shankar, Venkata Ratnadeep Suri, Cathryn Knight, Tom Crick, Joanne Hardman, Dean Phelan & Roger Chung (2021) ‘This can’t be the new norm’: academics’ perspectives on the COVID-19 crisis for the Australian university sector, Higher Education Research & Development
DOI: 10.1080/07294360.2021.1973384
Abstract: The COVID-19 pandemic has profoundly affected the university sector globally. This article reports on the Australian findings from a large-scale survey of academic staff and their experiences and predictions of the impact of the pandemic on their wellbeing. We report the perceptions of n = 370 Australian academics and accounts of their institutions’ responses to COVID-19, analysed using self-determination theory. Respondents report work-related stress, digital fatigue, and a negative impact on work-life balance; as well as significant concerns over potential longer-term changes to academia as a result of the pandemic. Respondents also articulate their frustration with Australia’s neoliberal policy architecture and the myopia of quasi-market reform, which has spawned an excessive reliance on international students as a pillar of income generation and therefore jeopardised institutional solvency – particularly during the pandemic. Conversely, respondents identify a number of ‘silver linings’ which speak to the resilience of academics.
Title: Richard Watermeyer, Kalpana Shankar, Tom Crick, Cathryn Knight, Fiona McGaughey, Joanna Hardman, Venkata Ratnadeep Suri, Roger Chung & Dean Phelan (2021) ‘Pandemia’: a reckoning of UK universities’ corporate response to COVID-19 and its academic fallout, British Journal of Sociology of Education
DOI: 10.1080/01425692.2021.1937058
Abstract: Universities in the UK, and in other countries like Australia and the USA, have responded to the operational and financial challenges presented by the COVID-19 pandemic by prioritising institutional solvency and enforcing changes to the work practices and profiles of their staff. For academics, an adjustment to institutional life under COVID-19 has been dramatic and resulted in the overwhelming majority making a transition to prolonged remote-working. Many have endured significant work intensification; others have lost – or may soon lose – their jobs. The impact of the pandemic appears transformational and for the most part negative. This article reports the experiences of 1099 UK academics specific to the corporate response of institutional leadership to the COVID-19 crisis. We find articulated a story of universities in the grip of ‘pandemia’ and COVID-19 emboldening processes and protagonists of neoliberal governmentality and market reform that pay little heed to considerations of human health and well-being.
Title: Kalpana Shankar, Dean Phelan, Venkata Ratnadeep Suri, Richard Watermeyer, Cathryn Knight & Tom Crick (2021) ‘The COVID-19 crisis is not the core problem’: experiences, challenges, and concerns of Irish academia during the pandemic, Irish Educational Studies, 40:2, 169-175
DOI: 10.1080/03323315.2021.1932550
Abstract: This article, drawing on data from an international survey – distributed in the summer of 2020 – explores the experiences and concerns of academic staff (n = 167) working in universities in Ireland and their perceptions of their institutions’ early response to the pandemic. Concerns related to transitioning to remote online working, impact on research productivity and culture, and work intensification, as intersected by enhanced managerialism, are ubiquitous to their accounts. As some respondents wrote of potential positive changes, particularly in the delivery of teaching, we conclude by suggesting potential avenues for building on successes in coping with the pandemic with some recommendations for mitigating some of the harms.
Profile: Dr. Richa is the newest core member of the CDNM and joined on 15 July, 2021. She completed her Ph.D. at IIT Delhi in 2020. She holds a postgraduate degree in Industrial Design (M.Des.) from IDC, IIT Bombay (2013) and completed her B.Tech. in Mechanical Engineering at IIIT Jabalpur (2011). She has done collaborative research at the School of Informatics and Computing, IUPUI, Indianapolis, USA (2017-18) and at TU Darmstadt, Germany (2012). She has also worked as a Project Scientist at AssisTech Labs, IIT Delhi, where she contributed to the design and development of several award-winning translational research projects, namely the SmartCane Device, DotBook (Braille Laptop for the Blind), TacRead, OnBoard Bus Identification System, Accessible Graphics Design, and a Multi-Modal Braille Learning Device. She is the recipient of several prestigious fellowships, namely the Stanford Ignite Global Innovation Fellowship (2015), the Visvesvaraya PhD Fellowship (2015), and the JENESYS Fellowship (Indo-Japan Exchange Program, 2010). She was awarded the Chairman’s Silver Medal for Excellence in Academics at IIIT Jabalpur in 2011.
Her current research interests include: Perceptual Foundations of Design, Inclusive Design and Accessibility, Product Design & Modern Prototyping, and Multi-modal Interaction/Experience Design.