This autoethnographic study explores the need for interdisciplinary education spanning both technical an philosophical skills – as such, this study leverages whole-person education as a theoretical approach needed in AI engineering education to address the limitations of current paradigms that prioritize technical expertise over ethical and societal considerations. Drawing on a collaborative autoethnography approach of fourteen diverse stakeholders, the study identifies key motivations driving the call for change, including the need for global perspectives, bridging the gap between academia and industry, integrating ethics and societal impact, and fostering interdisciplinary collaboration. The findings challenge the myths of technological neutrality and technosaviourism, advocating for a future where AI engineers are equipped not only with technical skills but also with the ethical awareness, social responsibility, and interdisciplinary understanding necessary to navigate the complex challenges of AI development. The study provides valuable insights and recommendations for transforming AI engineering education to ensure the responsible development of AI technologies.
Category: Hard Questions: Education
Hard Questions: Education
-

Potential and perils of large language models as judges of unstructured textual data
Rapid advancements in large language models have unlocked remarkable capabilities when it comes to processing and summarizing unstructured text data. This has implications for the analysis of rich, open-ended datasets, such as survey responses, where LLMs hold the promise of efficiently distilling key themes and sentiments. However, as organizations increasingly turn to these powerful AI systems to make sense of textual feedback, a critical question arises, can we trust LLMs to accurately represent the perspectives contained within these text based datasets? While LLMs excel at generating human-like summaries, there is a risk that their outputs may inadvertently diverge from the true substance of the original responses. Discrepancies between the LLM-generated outputs and the actual themes present in the data could lead to flawed decision-making, with far-reaching consequences for organizations. This research investigates the effectiveness of LLM-as-judge models to evaluate the thematic alignment of summaries generated by other LLMs. We utilized an Anthropic Claude model to generate thematic summaries from open-ended survey responses, with Amazon’s Titan Express, Nova Pro, and Meta’s Llama serving as judges. This LLM-as-judge approach was compared to human evaluations using Cohen’s kappa, Spearman’s rho, and Krippendorff’s alpha, validating a scalable alternative to traditional human centric evaluation methods. Our findings reveal that while LLM-as-judge offer a scalable solution comparable to human raters, humans may still excel at detecting subtle, context-specific nuances. Our research contributes to the growing body of knowledge on AI assisted text analysis. Further, we provide recommendations for future research, emphasizing the need for careful consideration when generalizing LLM-as-judge models across various contexts and use cases.
-

Evaluating Online AI Detection Tools: An Empirical Study Using Microsoft Copilot-Generated Content
Our findings reveal significant inconsistencies and limitations in AI detection tools, with many failing to accurately identify Copilotauthored text. Examining eight freely available online AI detection tools using text samples produced by Microsoft Copilot, we assess their accuracy and consistency. We feed a short sentence and a small paragraph and note the estimate of these tools. Our results suggest that educators should not rely on these tools to check for AI use.
-

Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling
Topic modeling is a widely used technique for uncovering thematic structures from large text corpora. However, most topic modeling approaches e.g. Latent Dirichlet Allocation (LDA) struggle to capture nuanced semantics and contextual understanding required to accurately model complex narratives. Recent advancements in this area include methods like BERTopic, which have demonstrated significantly improved topic coherence and thus established a new standard for benchmarking. In this paper, we present a novel approach, the Qualitative Insights Tool (QualIT) that integrates large language models (LLMs) with existing clustering-based topic modeling approaches. Our method leverages the deep contextual understanding and powerful language generation capabilities of LLMs to enrich the topic modeling process using clustering. We evaluate our approach on a large corpus of news articles and demonstrate substantial improvements in topic coherence and topic diversity compared to baseline topic modeling techniques. On the 20 ground-truth topics, our method shows 70% topic coherence (vs 65% & 57% benchmarks) and 95.5% topic diversity (vs 85% & 72% benchmarks). Our findings suggest that the integration of LLMs can unlock new opportunities for topic modeling of dynamic and complex text data, as is common in talent management research contexts.
-

Investigating Transition Phases: An Autoethnographic Study of International Women of Color Engineering Educators in the US
The study aims to explore the transitions experienced by international Women of Color (IWoC) engineers in the US as they navigate their academic and professional lives. Motivated by the lack of research on IWoC’s experiences, specifically around transition points of their lives, four international Women of Color participated in this qualitative auto-ethnographic deep-dive. All four researchers have attended college in the United States for their high educational degrees focused on education/engineering education and are currently involved in engineering education scholarship work.








