University of Notre Dame and IBM Research build tools for AI governance

Expanding into virtually all aspects of modern society, AI systems are transforming everything from education to healthcare, but how trustworthy are the vast data landscapes that are fueling them?
The BenchmarkCards framework, a collection of datasets, benchmarks, and mitigations that serves as a guide for developers to build safe and transparent AI systems, was recently incorporated into IBM’s Risk Atlas Nexus, the company’s open-source AI toolkit for governance of foundation models.
Through support from the Notre Dame-IBM Technology Ethics Lab, researchers at the University of Notre Dame’s Lucy Family Institute for Data & Society and IBM Research jointly developed the framework, targeting the entire community of researchers and developers, and providing a practical guideline for improved evaluation and mitigation of potential risks when developing AI models.
The development of Large Language Models (LLMs) and the assessment of their capabilities are guided by their performance on benchmarks: combinations of datasets, evaluation metrics, and associated processing steps. Although benchmarks serve this critical role, when misused they can provide a false sense of safety or performance, with serious ethical and practical implications.
In education, for example, popular benchmarks such as Massive Multitask Language Understanding (MMLU) and Grade School Math 8K (GSM8K) are heavily emphasized when evaluating large language models like ChatGPT. That emphasis has contributed to the development of AI tutors and test proctoring systems that, while innovative, may be limited in promoting deep conceptual understanding, sometimes yield inconsistent results, and raise important questions about the use of personal biometric data and informed consent.
To address these concerns, the BenchmarkCards framework is designed with a standardized documentation system that records essential benchmark metadata, including factual accuracy and bias detection. This enables researchers and developers to make more informed decisions about which benchmarks best suit their specific needs for AI system design — a recognized need in the AI community that was identified during a user study of BenchmarkCards.
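As a rough illustration of how standardized benchmark documentation can support that kind of decision, the sketch below models a simplified card and a lookup over it. The class, field names, and example entries are hypothetical placeholders chosen for illustration; they are not the actual BenchmarkCards schema or the Risk Atlas Nexus API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkCard:
    """Hypothetical, simplified card; NOT the actual BenchmarkCards schema."""

    name: str
    intended_use: str
    targeted_risks: frozenset[str]
    metrics: tuple[str, ...] = ()
    known_limitations: tuple[str, ...] = ()


def find_cards(cards, required_risks):
    """Return names of cards whose documented risks cover every required risk."""
    required = frozenset(required_risks)
    return [card.name for card in cards if required <= card.targeted_risks]


# Illustrative entries only; names and values are placeholders, not real cards.
CARDS = [
    BenchmarkCard(
        name="example-qa-benchmark",
        intended_use="question answering",
        targeted_risks=frozenset({"factual accuracy"}),
        metrics=("accuracy",),
    ),
    BenchmarkCard(
        name="example-fairness-benchmark",
        intended_use="bias auditing of generated text",
        targeted_risks=frozenset({"bias detection", "factual accuracy"}),
        metrics=("accuracy", "disparity score"),
        known_limitations=("English-only prompts",),
    ),
]

if __name__ == "__main__":
    # A developer who needs bias detection gets only the cards that document it.
    print(find_cards(CARDS, {"bias detection"}))
```

Because every card exposes the same fields, a developer can filter on documented intent instead of reading each benchmark's paper separately.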
“There is a growing discussion about large language models and concern about how these tools behave when compared to certain benchmarks,” said Nuno Moniz, associate research professor at the Lucy Family Institute for Data & Society and former director of the Notre Dame-IBM Technology Ethics Lab. “Benchmarks are designed with specific uses in mind and, in addition to not being exempt from risks, we observe a growing practice of assessing LLM capabilities outside of such intended uses. The BenchmarkCards framework is a significant contribution to the community, allowing developers to be more intentional and better guided when assessing the capabilities of these tools,” he added.
The project is led by Moniz; Michael Hind, Distinguished Research Staff Member at IBM Research; and Elizabeth Daly, Research Scientist and Lead of the Interactive AI Group at the IBM Research Laboratory (Dublin, Ireland), with Anna Sokol, a doctoral student in the Department of Computer Science and Engineering and a Lucy Graduate Scholar.
“Current documentation of benchmarks is ad hoc and often incomplete,” said Hind. “By having more standardized documentation, researchers and developers will be able to choose the most appropriate benchmark for their use case, resulting in better evaluations and ultimately more accurate and safe AI systems.”
Collaborators included Notre Dame faculty members Nitesh Chawla, founding director of the Lucy Family Institute for Data & Society, and Xiangliang Zhang, the Leonard C. Bettex Collegiate Professor of Computer Science in the Department of Computer Science and Engineering, as well as David Piorkowski, Staff Research Scientist at IBM.
“The work we did with our partners at the University of Notre Dame is helping to set a standard that we hope the community adopts to improve transparency and documentation around these benchmarks,” said Daly. “At IBM Research, it is now an integral part of our ontology as part of Risk Atlas Nexus.”
Since 2023, the University of Notre Dame has also been a member of the AI Alliance—a global consortium led by IBM and Meta—dedicated to advancing AI research, education, and governance, with a focus on open innovation and the development of safe and trustworthy AI systems.
“The future of AI lies not just in advancing algorithms, but in aligning them with human values, ensuring innovation fosters societal benefit,” said Chawla, who is also the Frank M. Freimann Professor of Computer Science & Engineering. “To actualize this alignment, we are deepening industry and academia partnerships by working together to design tools that promote transparency and empower researchers and developers to build more responsible AI models.”
To learn more about other AI projects and activities within the Lucy Family Institute, please visit the Lucy Family Institute website.
Contact:
Christine Grashorn, Program Director, Engagement and Strategic Storytelling
Lucy Family Institute for Data & Society / University of Notre Dame
cgrashor@nd.edu / 574.631.4856
lucyinstitute.nd.edu / @lucy_institute
About the Lucy Family Institute for Data & Society
Guided by Notre Dame’s Mission, the Lucy Family Institute adventurously collaborates on advancing data-driven and artificial intelligence (AI) convergence research, translational solutions, and education to ethically address society’s vexing problems. As an innovative nexus of academia, industry, and the public, the Institute also fosters data science and AI access to strengthen diverse and inclusive capacity building within communities.
About the Notre Dame-IBM Technology Ethics Lab
The Notre Dame–IBM Technology Ethics Lab, a critical component of the Institute for Ethics and the Common Good and the Notre Dame Ethics Initiative, advances ethical, human-centered approaches to the design, development, and use of artificial intelligence and emerging technologies. Through applied, interdisciplinary research and broad stakeholder engagement, the Lab fosters dialogue, builds collaborative communities, and shapes policies and practices for responsible innovation and governance at scale.