Category: Sreyoshi Bhaduri, Ph.D.

  • Levers of Power in the Field of AI

    Forthcoming study, now available on arXiv:

    Levers of Power in the Field of AI
    An Ethnography of Personal Influence in Institutionalization

    Who holds power over decisions in our society? How do these people influence decisions, and how are they themselves influenced? How is this the same or different when it comes to questions about AI? These are some of the questions we set out to understand.

    Abstract: This paper examines how decision makers in academia, government, business, and civil society navigate questions of power in implementations of artificial intelligence (AI). The study explores how individuals experience and exercise “levers of power”, which are presented as social mechanisms that shape institutional responses to technological change. The study reports on the responses to personalised questionnaires designed to gather insight on a decision maker’s institutional purview, based on an institutional governance framework developed from the work of Neo Institutionalists. Findings present the anonymized, real responses and circumstances of respondents in the form of twelve fictional personas of high-level decision makers from North America and Europe. These personas illustrate how personal agency, organizational logics, and institutional infrastructures may intersect in the governance of AI. The decision makers’ responses to the questionnaires then inform a discussion of the field-level personal power of decision makers, methods of fostering institutional stability in times of change, and methods of influencing institutional change in the field of AI. The final section of the discussion presents a table of the dynamics of the levers of power in the field of AI for change makers and five testable hypotheses for institutional and social movement researchers. In summary, this study provides insight on the means for policymakers within institutions and their counterparts in civil society to personally engage with AI governance.

    Read on arXiv.

  • AWS blog: “AI judging AI”

    “Picture this: Your team just received 10,000 customer feedback responses. The traditional approach? Weeks of manual analysis. But what if AI could not only analyze this feedback but also validate its own work? Welcome to the world of large language model (LLM) jury systems deployed using Amazon Bedrock. As more organizations embrace generative AI, particularly LLMs for various applications, a new challenge has emerged: ensuring that the output from these AI models aligns with human perspectives and is accurate and relevant to the business context.”

    Read the work on their blog: https://aws.amazon.com/blogs/machine-learning/ai-judging-ai-scaling-unstructured-text-analysis-with-amazon-nova/

  • Beyond mere automation: A techno-functional framework for reimagining gen-AI in supply chain operations

    As Generative AI (Gen-AI) continues to evolve rapidly, its potential to transform supply chain operations remains largely unexplored. Focusing on the retail supply chain, this paper presents a taxonomy diagram that categorizes trends in Gen-AI adoption across various functions, thereby mapping current Gen-AI capabilities and identifying immediate opportunities and potential challenges. We identify several key patterns in Gen-AI integration, including the automation of routine cognitive tasks and the enhancement of human decision-making capabilities. We posit that while Gen-AI shows immense promise in improving supply chain efficiency and resilience, successful implementation requires careful consideration of existing workflows, user capabilities, and organizational readiness. Finally, we present a cohesive vision for scaling Gen-AI in supply chain operations. Ultimately, this position paper provides insights for both practitioners looking to implement Gen-AI solutions and researchers exploring the future of AI in and for supply chain management.

    Read the full workshop report here.

  • Whole-Person Education for AI Engineers: Presented to CEEA (Peer Reviewed)

    This autoethnographic study explores the need for interdisciplinary education spanning both technical and philosophical skills. It leverages whole-person education as a theoretical approach needed in AI engineering education to address the limitations of current paradigms that prioritize technical expertise over ethical and societal considerations. Drawing on a collaborative autoethnography approach involving fourteen diverse stakeholders, the study identifies key motivations driving the call for change, including the need for global perspectives, bridging the gap between academia and industry, integrating ethics and societal impact, and fostering interdisciplinary collaboration. The findings challenge the myths of technological neutrality and technosaviourism, advocating for a future where AI engineers are equipped not only with technical skills but also with the ethical awareness, social responsibility, and interdisciplinary understanding necessary to navigate the complex challenges of AI development. The study provides insights and recommendations for transforming AI engineering education to ensure the responsible development of AI technologies.

    More Information

  • WIP: Gen AI in Engineering Education and the Da Vinci Cube (Peer Reviewed)

    As generative AI (GenAI) tools rapidly transform the engineering landscape, a critical question emerges: Are current educational innovations adequately preparing engineers for the socio-technical challenges of the future? This work-in-progress paper presents two key contributions. First, we build on prior work presenting a systematic review of over 160 scholarly articles on GenAI implementations in engineering education, revealing a predominant focus on enhancing technical proficiency while often neglecting essential socio-technical competencies. Second, we apply an emerging framework—the da Vinci Cube (dVC)—to support engineering educators in critically evaluating GenAI-driven innovations. The dVC framework extends traditional models of innovation by incorporating three dimensions: the pursuit of knowledge, consideration of use, and contemplation of sentiment. Our analysis suggests that while GenAI tools can improve problem-solving and technical efficiency, engineering education must also address ethical, human-centered, and societal impacts. The dVC framework provides a structured lens for assessing how GenAI tools are integrated into curricula and research, encouraging a more holistic, reflective approach. Ultimately, this paper aims to provoke dialogue on the future of engineering education and to challenge the prevailing assumption that technical skill development alone is sufficient in an AI-mediated world.

    More Information

  • Work in Progress: Exclusive Rhetoric in AI Conference Mission Statements

    AI conferences are pivotal spaces for knowledge exchange, collaboration, and shaping the trajectory of research, practice, and education. This paper presents preliminary findings from an analysis of AI conference mission statements, investigating how their stated goals affect who is welcomed into AI conversations. We find that many mission statements reflect assumptions that may unintentionally narrow participation and reinforce disciplinary and institutional silos. This limits engagement from a broad range of contributors (including educators, students, working professionals, and even younger users) who are essential to a thriving AI ecosystem. We advocate for clearer framing that supports democratizing and demystifying AI. By broadening participation and intentionally fostering cross-sector and interdisciplinary connections, AI conferences can help unlock more innovation.

    More Information

  • Canary in the Mine: An LLM Augmented Survey of Disciplinary Complaints to the Ordre des ingénieurs du Québec (OIQ) (Peer Reviewed)

    This study investigates disciplinary incidents involving engineers in Quebec, shedding light on critical gaps in engineering education. Through a comprehensive review of the disciplinary register of the Ordre des ingénieurs du Québec (OIQ) for 2010 to 2024, researchers from engineering education and human resources management in technological development laboratories conducted a thematic analysis of reported incidents to identify patterns, trends, and areas for improvement. The analysis aims to uncover the most common types of disciplinary incidents, their underlying causes, and the implications for how engineering education addresses (or fails to address) these issues. Our findings identify recurring themes, analyze root causes, and offer recommendations for engineering educators and students to mitigate similar incidents. This research has implications for informing curriculum development, professional development, and performance evaluation, ultimately fostering a culture of professionalism and ethical responsibility in engineering. By providing empirical evidence of disciplinary incidents and their causes, this study contributes to evidence-based practices for engineering education and professional development, enhancing the engineering education community’s understanding of professionalism and ethics.

    More Information

  • What We Do Not Know: GPT Use in Business and Management

    This systematic review examines peer-reviewed studies on the application of GPT in business management, revealing significant knowledge gaps. Despite identifying interesting research directions such as best practices, benchmarking, performance comparisons, and social impacts, our analysis yields only 42 relevant studies for the 22 months since its release. There are so few studies looking at a particular sector or subfield that management researchers, business consultants, policymakers, and journalists do not yet have enough information to make well-founded statements on how GPT is being used in businesses. The primary contribution of this paper is a call to action for further research. We provide a description of current research and identify knowledge gaps on the use of GPT in business. We cover the management subfields of finance, marketing, human resources, strategy, operations, production, and analytics, excluding retail and sales. We discuss gaps in knowledge of GPT’s potential consequences for employment, productivity, environmental costs, oppression, and small businesses. We propose how management consultants and the media can help fill those gaps. We call for practical work on business control systems as they relate to existing and foreseeable AI-related business challenges. This work may be of interest to managers, to management researchers, and to people working on AI in society.

    More Information

  • IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding

    Spoken by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU-Pro (Massive Multitask Language Understanding) framework. Covering major languages such as Hindi, Bengali, Gujarati, Marathi, Kannada, Punjabi, Tamil, Telugu, and Urdu, our benchmark addresses the unique challenges and opportunities presented by the linguistic diversity of the Indian subcontinent. This benchmark encompasses a wide range of tasks in language comprehension, reasoning, and generation, meticulously crafted to capture the intricacies of Indian languages. IndicMMLU-Pro provides a standardized evaluation framework to push the research boundaries in Indic language AI, facilitating the development of more accurate, efficient, and culturally sensitive models. This paper outlines the benchmark’s design principles, task taxonomy, and data collection methodology, and presents baseline results from state-of-the-art multilingual models.

    More Information

  • Potential and perils of large language models as judges of unstructured textual data

    Rapid advancements in large language models have unlocked remarkable capabilities when it comes to processing and summarizing unstructured text data. This has implications for the analysis of rich, open-ended datasets, such as survey responses, where LLMs hold the promise of efficiently distilling key themes and sentiments. However, as organizations increasingly turn to these powerful AI systems to make sense of textual feedback, a critical question arises: can we trust LLMs to accurately represent the perspectives contained within these text-based datasets? While LLMs excel at generating human-like summaries, there is a risk that their outputs may inadvertently diverge from the true substance of the original responses. Discrepancies between the LLM-generated outputs and the actual themes present in the data could lead to flawed decision-making, with far-reaching consequences for organizations. This research investigates the effectiveness of LLM-as-judge models in evaluating the thematic alignment of summaries generated by other LLMs. We utilized an Anthropic Claude model to generate thematic summaries from open-ended survey responses, with Amazon’s Titan Express, Nova Pro, and Meta’s Llama serving as judges. This LLM-as-judge approach was compared to human evaluations using Cohen’s kappa, Spearman’s rho, and Krippendorff’s alpha, validating a scalable alternative to traditional human-centric evaluation methods. Our findings reveal that while LLM-as-judge models offer a scalable solution comparable to human raters, humans may still excel at detecting subtle, context-specific nuances. Our research contributes to the growing body of knowledge on AI-assisted text analysis. Further, we provide recommendations for future research, emphasizing the need for careful consideration when generalizing LLM-as-judge models across various contexts and use cases.
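
    To make the agreement comparison concrete, here is a minimal sketch of Cohen's kappa, one of the three statistics named in the abstract. The function name, the binary rating scheme, and the sample ratings are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: 1 = summary aligned with the themes, 0 = not aligned.
human_ratings = [1, 1, 0, 1, 0, 1, 1, 0]
llm_judge_ratings = [1, 1, 0, 0, 0, 1, 1, 1]
kappa = cohens_kappa(human_ratings, llm_judge_ratings)
print(f"kappa = {kappa:.3f}")
```

    Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly preferred over a plain percentage match when comparing a judge model against human raters.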

    More Information

  • Advancements in Modern Recommender Systems: Industrial Applications in Social Media, E-commerce, Entertainment, and Beyond

    In the current digital era, the proliferation of online content has overwhelmed users with vast amounts of information, necessitating effective filtering mechanisms. Recommender systems have become indispensable in addressing this challenge, tailoring content to individual preferences and significantly enhancing user experience. This paper delves into the latest advancements in recommender systems, analyzing 115 research papers and 10 articles, and dissecting their application across various domains such as e-commerce, entertainment, and social media. We categorize these systems into content-based, collaborative, and hybrid approaches, scrutinizing their methodologies and performance. Despite their transformative impact, recommender systems grapple with persistent issues like scalability, cold-start problems, and data sparsity. Our comprehensive review not only maps the current landscape of recommender system research but also identifies critical gaps and future directions. By offering a detailed analysis of datasets, simulation platforms, and evaluation metrics, we provide a robust foundation for developing next-generation recommender systems poised to deliver more accurate, efficient, and personalized user experiences, inspiring innovative solutions to drive forward the evolution of recommender technology.

    More Information

  • Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling

    Topic modeling is a widely used technique for uncovering thematic structures from large text corpora. However, most topic modeling approaches, e.g., Latent Dirichlet Allocation (LDA), struggle to capture the nuanced semantics and contextual understanding required to accurately model complex narratives. Recent advancements in this area include methods like BERTopic, which have demonstrated significantly improved topic coherence and thus established a new standard for benchmarking. In this paper, we present a novel approach, the Qualitative Insights Tool (QualIT), that integrates large language models (LLMs) with existing clustering-based topic modeling approaches. Our method leverages the deep contextual understanding and powerful language generation capabilities of LLMs to enrich the topic modeling process using clustering. We evaluate our approach on a large corpus of news articles and demonstrate substantial improvements in topic coherence and topic diversity compared to baseline topic modeling techniques. On the 20 ground-truth topics, our method shows 70% topic coherence (vs 65% and 57% benchmarks) and 95.5% topic diversity (vs 85% and 72% benchmarks). Our findings suggest that the integration of LLMs can unlock new opportunities for topic modeling of dynamic and complex text data, as is common in talent management research contexts.
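
    As a rough illustration of the topic diversity figure, the sketch below uses one common definition: the fraction of unique words among the top-k words of all topics. This definition is an assumption, since the abstract does not state how QualIT computes diversity, and the function name and sample topics are hypothetical.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    Assumed definition for illustration; 1.0 means no word is shared
    between topics, lower values mean more overlap.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, each listed as its highest-ranked words first.
topics = [
    ["election", "vote", "senate", "policy"],
    ["game", "team", "season", "vote"],
    ["market", "stock", "trade", "policy"],
]
diversity = topic_diversity(topics, top_k=4)
print(f"topic diversity = {diversity:.1%}")
```

    Here "vote" and "policy" each appear in two topics, so 10 of the 12 top words are unique. Topic coherence is a separate measure that requires word co-occurrence statistics from a reference corpus and is not sketched here.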

    More Information