Sector: Education

Hard Questions: Education

  • Potential and perils of large language models as judges of unstructured textual data

    Rapid advancements in large language models (LLMs) have unlocked remarkable capabilities for processing and summarizing unstructured text data. This has implications for the analysis of rich, open-ended datasets, such as survey responses, where LLMs hold the promise of efficiently distilling key themes and sentiments. However, as organizations increasingly turn to these powerful AI systems to make sense of textual feedback, a critical question arises: can we trust LLMs to accurately represent the perspectives contained within these text-based datasets? While LLMs excel at generating human-like summaries, there is a risk that their outputs may inadvertently diverge from the true substance of the original responses. Discrepancies between the LLM-generated outputs and the actual themes present in the data could lead to flawed decision-making, with far-reaching consequences for organizations. This research investigates the effectiveness of LLM-as-judge models in evaluating the thematic alignment of summaries generated by other LLMs. We used an Anthropic Claude model to generate thematic summaries from open-ended survey responses, with Amazon’s Titan Express, Nova Pro, and Meta’s Llama serving as judges. This LLM-as-judge approach was compared to human evaluations using Cohen’s kappa, Spearman’s rho, and Krippendorff’s alpha, validating a scalable alternative to traditional human-centric evaluation methods. Our findings reveal that while LLM-as-judge models offer a scalable solution comparable to human raters, humans may still excel at detecting subtle, context-specific nuances. Our research contributes to the growing body of knowledge on AI-assisted text analysis. Further, we provide recommendations for future research, emphasizing the need for careful consideration when generalizing LLM-as-judge models across contexts and use cases.
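    Of the agreement statistics named above, Cohen’s kappa is the simplest to illustrate: it corrects the raw agreement rate between two raters for agreement expected by chance. A minimal pure-Python sketch (the ratings below are hypothetical, not the study’s data):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from each rater's label marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((freq_a[lbl] / n) * (freq_b[lbl] / n) for lbl in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 1-5 alignment ratings for ten summaries:
human = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
llm_judge = [5, 4, 3, 3, 5, 2, 4, 2, 5, 4]
print(round(cohens_kappa(human, llm_judge), 3))  # → 0.726
```

    A kappa near 1 indicates agreement well beyond chance; values in the 0.6–0.8 range are conventionally read as substantial agreement.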

    More Information

  • Evaluating Online AI Detection Tools: An Empirical Study Using Microsoft Copilot-Generated Content

    Examining eight freely available online AI detection tools using text samples produced by Microsoft Copilot, we assess their accuracy and consistency. We feed each tool a short sentence and a small paragraph and record its estimate. Our findings reveal significant inconsistencies and limitations in these tools, with many failing to accurately identify Copilot-authored text. Our results suggest that educators should not rely on these tools to check for AI use.

    More Information

  • Trustworthy and Responsible AI for Human-Centric Autonomous Decision-Making Systems

    Artificial Intelligence (AI) has paved the way for revolutionary decision-making processes which, if harnessed appropriately, can contribute to advancements in various sectors, from healthcare to economics. However, its black-box nature presents significant ethical challenges related to bias and transparency. AI applications are heavily impacted by biases, producing inconsistent and unreliable findings that carry significant costs and consequences, highlighting and perpetuating inequalities and unequal access to resources. Hence, developing safe, reliable, ethical, and trustworthy AI systems is essential. Our team, part of the Transdisciplinary Scholarship Initiative at the University of Calgary, conducts research on Trustworthy and Responsible AI, including fairness, bias mitigation, reproducibility, generalization, interpretability, and authenticity. In this paper, we review and discuss the intricacies of AI biases: definitions, methods of detection and mitigation, and metrics for evaluating bias. We also discuss open challenges regarding the trustworthiness and widespread application of AI across diverse domains of human-centric decision making, as well as guidelines to foster Responsible and Trustworthy AI models.
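    Among the bias-evaluation metrics the paper surveys, one widely used group-fairness measure is the statistical parity difference: the gap in positive-outcome rates between groups. A minimal sketch (the decision data and group labels below are hypothetical, and the paper’s actual metric set may differ):

```python
def statistical_parity_difference(decisions, groups, privileged):
    """P(positive | unprivileged) - P(positive | privileged).

    A value of 0 means the positive-outcome rate is equal across groups;
    negative values mean the unprivileged group is favored less often.
    """
    priv = [d for d, g in zip(decisions, groups) if g == privileged]
    unpriv = [d for d, g in zip(decisions, groups) if g != privileged]
    rate = lambda xs: sum(xs) / len(xs)
    return rate(unpriv) - rate(priv)

# Hypothetical binary decisions (1 = positive outcome) for two groups:
decisions = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
groups    = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(statistical_parity_difference(decisions, groups, privileged="A"))
```

    Here group A receives positive outcomes 60% of the time versus 40% for group B, giving a difference of about -0.2 and flagging a disparity worth investigating.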

    More Information

  • Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling

    Topic modeling is a widely used technique for uncovering thematic structures in large text corpora. However, most topic modeling approaches, e.g., Latent Dirichlet Allocation (LDA), struggle to capture the nuanced semantics and contextual understanding required to accurately model complex narratives. Recent advancements include methods like BERTopic, which have demonstrated significantly improved topic coherence and thus established a new standard for benchmarking. In this paper, we present a novel approach, the Qualitative Insights Tool (QualIT), that integrates large language models (LLMs) with existing clustering-based topic modeling approaches. Our method leverages the deep contextual understanding and powerful language generation capabilities of LLMs to enrich the clustering-based topic modeling process. We evaluate our approach on a large corpus of news articles and demonstrate substantial improvements in topic coherence and topic diversity over baseline topic modeling techniques. On the 20 ground-truth topics, our method achieves 70% topic coherence (vs. 65% and 57% for the benchmarks) and 95.5% topic diversity (vs. 85% and 72%). Our findings suggest that integrating LLMs can unlock new opportunities for topic modeling of dynamic and complex text data, as is common in talent management research contexts.
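    Topic diversity, one of the two metrics reported above, is commonly computed as the fraction of unique words among the top-k keywords of all topics. A minimal sketch of that common formulation (the topic keyword lists are hypothetical, and whether QualIT uses exactly this definition is an assumption):

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top-k keywords of every topic.

    1.0 means no topic shares a keyword with another (maximally diverse);
    values near 0 mean the topics are largely redundant.
    """
    top_words = [w for topic in topics for w in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

# Hypothetical keyword lists for three topics:
topics = [
    ["hiring", "salary", "interview", "offer"],
    ["training", "skills", "course", "interview"],
    ["remote", "office", "policy", "salary"],
]
print(topic_diversity(topics, top_k=4))
```

    In this toy example, 10 of the 12 keyword slots are unique ("interview" and "salary" repeat), so diversity is about 0.83.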

    More Information

  • Path to Personalization: A Systematic Review of GenAI in Engineering Education

    This systematic review provides a comprehensive synthesis of 162 articles on Generative Artificial Intelligence (GenAI) in engineering education (EE), making two specific contributions to advance research in the space. First, we develop a taxonomy that categorizes the current research landscape, identifying key areas such as Coding or Writing Assistance, Design Methodology, and Personalization. Second, we highlight significant gaps and opportunities, such as a lack of customer-centricity and a need for increased transparency in future research, paving the way for increased personalization in GenAI-augmented engineering education. There are indications of widening lines of enquiry, for example into human-AI collaboration and multidisciplinary learning. We conclude that there are opportunities to enrich engineering epistemology and competencies through the use of GenAI tools for educators and students, as well as a need for further research into best and novel practices. Our discussion serves as a roadmap for researchers and educators, guiding the development of GenAI applications that will continue to transform the engineering education landscape, in classrooms and the workforce.

    More Information

  • Decoding the Diversity: A Review of the Indic AI Research Landscape

    This review paper provides a comprehensive overview of large language model (LLM) research directions within Indic languages. Indic languages are those spoken in the Indian subcontinent, including India, Pakistan, Bangladesh, Sri Lanka, Nepal, and Bhutan, among others. These languages have a rich cultural and linguistic heritage and are spoken by over 1.5 billion people worldwide. With the tremendous market potential and growing demand for natural language processing (NLP) applications in diverse languages, generative applications for Indic languages pose unique challenges and opportunities for research. Our paper delves into recent advancements in Indic generative modeling, contributing a taxonomy of research directions and tabulating 84 recent publications. Research directions surveyed include LLM development, fine-tuning of existing LLMs, corpus development, benchmarking and evaluation, as well as publications on specific techniques, tools, and applications. We found that researchers across these publications emphasize the challenges associated with limited data availability, lack of standardization, and the peculiar linguistic complexities of Indic languages. This work aims to serve as a valuable resource for researchers and practitioners working in NLP, particularly those focused on Indic languages, and contributes to the development of more accurate and efficient LLM applications for these languages.

    More Information

  • Exploring and Expanding Support for International Students in Engineering: Faculty Reflections Beyond Academic Boundaries

    This is a student paper:

    Expanding upon our previous work (blinded for review), this research delves into self-reflection among engineering faculty members who regularly interact with international students. The primary objective is to investigate how these faculty members address the unique needs of the international student community. Nevitt Sanford’s Challenge and Support model serves as our guiding framework, and we employ narrative analysis for its potential to analyze differences across cases and describe the dynamics of individual narratives within their distinct contexts (Floersch et al., 2010; Simons et al., 2008).

    This paper aims to answer the following research question: How do engineering faculty members address the multifaceted and distinct needs of international students? Understanding these perspectives is important when considering how to support international engineering students, given that each student has unique and intricate experiences in both academic and non-academic spheres.

    More Information

  • Investigating Transition Phases: An Autoethnographic Study of International Women of Color Engineering Educators in the US

    This study explores the transitions experienced by international Women of Color (IWoC) engineers in the US as they navigate their academic and professional lives. Motivated by the lack of research on IWoC’s experiences, specifically around transition points in their lives, four international Women of Color participated in this qualitative autoethnographic deep dive. All four researchers attended college in the United States for their higher education degrees focused on education or engineering education and are currently involved in engineering education scholarship.

    More Information

  • Making Space for Critical Action: Re-visioning Computational Thinking

    While school makerspaces promise to inspire and excite, the challenge of meaningfully integrating them into schools remains. Guided by a philosophy of praxis that stresses the need for education to interweave theory, action, and reflection to advance positive social change in our communities (Freire, 1970), this paper reports on the co-design of a school space called the Critical Action Learning Lab (CALL) for inclusive making that supports computational thinking and critical action through curriculum-informed learning.

    More Information

  • World AI: Women in AI

    A collaborative event with Women in AI. We asked conference participants to tell us about their hard questions in AI, and the many fruitful conversations that followed opened doors for future collaborations.

    More Information