University of Notre Dame and IBM Research build tools for AI governance

AI systems are expanding into virtually every aspect of modern society, transforming everything from education to healthcare. But how trustworthy are the vast data landscapes fueling them?
The BenchmarkCards framework is a collection of datasets, benchmarks, and mitigations that serves as a guide for developers building safe and transparent AI systems. It was recently incorporated into IBM’s Risk Atlas Nexus, the company’s open-source AI toolkit for governance of foundation models.
With support from the Notre Dame-IBM Technology Ethics Lab, researchers at the University of Notre Dame’s Lucy Family Institute for Data & Society and IBM Research jointly developed the framework. Aimed at the broader community of researchers and developers, it offers practical guidance for evaluating and mitigating potential risks when building AI models.
The development of large language models (LLMs) and the assessment of their capabilities are guided by their performance on benchmarks – combinations of datasets, evaluation metrics, and associated processing steps. Although benchmarks serve this critical role, when misused they can provide a false sense of safety or performance, with serious ethical and practical consequences.
In education, for example, the heavy reliance on popular benchmarks such as Massive Multitask Language Understanding (MMLU) and Grade School Math 8K (GSM8K) to evaluate large language models like ChatGPT has shaped AI tutors and test-proctoring systems that, while innovative, may fall short in promoting deep conceptual understanding, sometimes yield inconsistent results, and raise important questions about the use of personal biometric data and informed consent.
To address these concerns, the BenchmarkCards framework provides a standardized documentation system that records essential benchmark metadata, including factual accuracy and bias detection. This enables researchers and developers to make more informed decisions about which benchmarks best suit their specific needs for AI system design, a need in the AI community that was identified during a user study of BenchmarkCards.
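As an illustration only, the sketch below shows the kind of structured metadata a benchmark card might capture, written here as a simple Python dataclass. The field names (intended_use, known_risks, and so on) and the example values are hypothetical and chosen for readability; they do not reproduce the actual BenchmarkCards schema or the Risk Atlas Nexus ontology.

```python
# Hypothetical sketch of a benchmark "card" as structured metadata.
# Field names are illustrative only; they are not the published
# BenchmarkCards schema or the Risk Atlas Nexus API.
from dataclasses import dataclass, field


@dataclass
class BenchmarkCard:
    name: str                      # e.g., "MMLU" or "GSM8K"
    domain: str                    # subject area the benchmark covers
    intended_use: str              # what the benchmark was designed to measure
    evaluation_metrics: list[str]  # e.g., ["exact-match accuracy"]
    known_risks: list[str] = field(default_factory=list)        # e.g., data contamination, bias
    out_of_scope_uses: list[str] = field(default_factory=list)  # uses the benchmark should not justify


# Example card for a grade-school math benchmark (values are illustrative).
gsm8k_card = BenchmarkCard(
    name="GSM8K",
    domain="grade-school math word problems",
    intended_use="measure multi-step arithmetic reasoning in language models",
    evaluation_metrics=["exact-match accuracy on the final numeric answer"],
    known_risks=["training-data contamination can inflate scores"],
    out_of_scope_uses=["certifying general mathematical ability or tutoring quality"],
)

# A developer could filter a catalog of such cards by intended use and
# known risks before deciding which benchmarks to report for a given system.
print(gsm8k_card.intended_use)
```

The point of the sketch is simply that standardized, machine-readable documentation lets benchmark selection become a deliberate, queryable step rather than an ad hoc choice.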
“There is a growing discussion about large language models and concern about how these tools behave when compared to certain benchmarks,” said Nuno Moniz, associate research professor at the Lucy Family Institute for Data & Society and former director of the Notre Dame-IBM Technology Ethics Lab. “Benchmarks are designed with specific uses in mind and, in addition to not being exempt from risks, we observe a growing practice of assessing LLM capabilities outside of such intended uses. The BenchmarkCards framework is a significant contribution to the community, allowing developers to be more intentional and better guided when assessing the capabilities of these tools,” he added.
This project is led by Moniz; Michael Hind, Distinguished Research Staff Member at IBM Research; and Elizabeth Daly, Research Scientist and Lead of the Interactive AI Group at the IBM Research Laboratory in Dublin, Ireland; together with Anna Sokol, a doctoral student in the Department of Computer Science and Engineering and a Lucy Graduate Scholar, and Inge Vejsbjerg, a research software engineer in the Interactive AI Group, also at the IBM Research Laboratory in Dublin.
“Current documentation of benchmarks is ad hoc and often incomplete,” said Hind. “By having more standardized documentation, researchers and developers will be able to choose the most appropriate benchmark for their use case, resulting in better evaluations and ultimately more accurate and safe AI systems.”
Collaborators included Notre Dame faculty Nitesh Chawla, founding director of the Lucy Family Institute for Data & Society; Xiangliang Zhang, the Leonard C. Bettex Collegiate Professor of Computer Science in the Department of Computer Science and Engineering; and David Piorkowski, Staff Research Scientist at IBM.
“The work we did with our partners at the University of Notre Dame is helping to set a standard that we hope the community adopts to improve transparency and documentation around these benchmarks,” said Daly. “At IBM Research, it is now an integral part of our ontology as part of Risk Atlas Nexus.”
Since 2023, the University of Notre Dame has also been a member of the AI Alliance—a global consortium led by IBM and Meta—dedicated to advancing AI research, education, and governance, with a focus on open innovation and the development of safe and trustworthy AI systems.
“The future of AI lies not just in advancing algorithms, but in aligning them with human values—ensuring innovation fosters societal benefit,” said Chawla, who is also the Frank M. Freimann Professor of Computer Science & Engineering. “To actualize this alignment, we are deepening industry and academia partnerships by working together to design tools that promote transparency and empower researchers and developers to build more responsible AI models.”
To learn more about other AI projects and activities within the Lucy Family Institute, please visit the Lucy Family Institute website.
Contact:
Christine Grashorn, Program Director, Engagement and Strategic Storytelling
Lucy Family Institute for Data & Society / University of Notre Dame
cgrashor@nd.edu / 574.631.4856
lucyinstitute.nd.edu / @lucy_institute
About the Lucy Family Institute for Data & Society
Guided by Notre Dame’s Mission, the Lucy Family Institute adventurously collaborates on advancing data-driven and artificial intelligence (AI) convergence research, translational solutions, and education to ethically address society’s vexing problems. As an innovative nexus of academia, industry, and the public, the Institute also fosters data science and AI access to strengthen diverse and inclusive capacity building within communities.
About the Notre Dame-IBM Technology Ethics Lab
The Notre Dame–IBM Technology Ethics Lab, a critical component of the Institute for Ethics and the Common Good and the Notre Dame Ethics Initiative, advances ethical, human-centered approaches to the design, development, and use of artificial intelligence and emerging technologies. Through applied, interdisciplinary research and broad stakeholder engagement, the Lab fosters dialogue, builds collaborative communities, and shapes policies and practices for responsible innovation and governance at scale.