Montreal: Artificial intelligence pioneer Yoshua Bengio has launched LawZero, a non-profit initiative focused on developing safe and transparent AI systems that can identify and flag deceptive behavior in autonomous AI agents.
Bengio, widely regarded as one of the ‘godfathers’ of AI and a 2018 Turing Award laureate, will serve as the organisation’s president.
LawZero launches with $30 million in initial funding and a team of more than a dozen researchers. Its flagship project, Scientist AI, is designed not to replace existing generative AI agents but to monitor them for potentially harmful or deceptive actions.
According to Bengio, current AI agents behave like ‘actors’ that imitate humans and attempt to please users. In contrast, Scientist AI is being built to function more like a ‘psychologist,’ able to assess and predict undesired behaviors, including deception and self-preservation tactics, such as resisting shutdown commands.
Rather than providing definite answers, Scientist AI will deliver probabilistic assessments, reflecting a level of intellectual humility about the correctness of its conclusions. The tool will operate alongside AI agents to predict the likelihood that their actions may lead to harm. If that probability crosses a predefined threshold, the agent’s intended action will be blocked or flagged.
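In pseudocode, the threshold mechanism described above amounts to estimating a harm probability for each proposed action and blocking or flagging it when the estimate crosses a cutoff. The sketch below is purely illustrative: the function names, the estimator, and the threshold value are assumptions for clarity, not LawZero's actual design.

```python
# Illustrative sketch of a threshold-based guardrail as described in the
# article. The estimator and the threshold value are hypothetical
# placeholders, not Scientist AI's actual implementation.

HARM_THRESHOLD = 0.1  # assumed cutoff; a real system would calibrate this


def guardrail(estimate_harm_probability, proposed_action):
    """Return ('blocked', p) or ('allowed', p) for an agent's proposed action.

    estimate_harm_probability is a stand-in for the monitor model's
    probabilistic assessment of the action leading to harm.
    """
    p_harm = estimate_harm_probability(proposed_action)
    if p_harm >= HARM_THRESHOLD:
        return ("blocked", p_harm)  # block or flag the action
    return ("allowed", p_harm)


# Usage with a toy estimator that treats destructive actions as risky:
toy_estimator = lambda action: 0.9 if "delete" in action else 0.05
verdict, p = guardrail(toy_estimator, "read user calendar")
```

Note that the monitor only emits probabilities and a permit/deny decision; it has no goals of its own, matching the "pure knowledge machine" framing.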
Bengio clarified that Scientist AI would not have self-interest or goals of its own, but would instead act as a neutral, ‘pure knowledge machine.’ He emphasized that for the system to be effective, it must be at least as intelligent as the agents it monitors, underscoring the need for robust support and resources to train it to match the capabilities of cutting-edge AI systems.
LawZero’s initial supporters include:
- The Future of Life Institute, an AI safety advocacy group
- Jaan Tallinn, a founding engineer of Skype
- Schmidt Sciences, a research foundation established by former Google CEO Eric Schmidt
LawZero will initially test its methodology on open-source AI models, with the aim of persuading governments, donors, and AI companies to back larger-scale deployments.
“The point is to demonstrate the methodology so that then we can convince either donors or governments or AI labs to put the resources that are needed to train this at the same scale as the current frontier AIs. It is essential that the guardrail AI be at least as smart as the AI agent that it is trying to monitor and control,” the AI pioneer said.
A professor at the University of Montreal who co-chaired the International AI Safety Report, Bengio also warned about the risks of autonomous agents capable of executing complex tasks without human supervision.
Bengio has expressed concern over recent revelations, such as Anthropic’s acknowledgment that its latest AI system could potentially blackmail engineers trying to disable it. He also cited research indicating that some AI models are capable of hiding their capabilities and objectives, raising alarms about the future trajectory of autonomous AI.
According to Bengio, the world is heading into increasingly dangerous territory as AI agents grow more sophisticated. LawZero’s mission is to act as a safeguard, developing AI that is not only intelligent, but honest, cautious, and focused on human safety.