Anthropic releases a new constitution describing the behavior of its AI model Claude.
Anthropic has published a new constitution for the AI model Claude. The announcement reads as follows: “It is a detailed description of Anthropic’s vision of Claude’s values and behavior; a holistic document that explains the context in which Claude operates and what kind of entity we want Claude to be.”
Updated version
The previous constitution dates from 2023 and, according to the company, was a list of isolated principles that were not specific enough. “We should not only specify what we expect of them; AI models must also understand why we want them to behave in a certain way,” said Anthropic.
The new version is based on four general requirements. First, Claude must be broadly safe, refusing prohibited actions and being more transparent about its decisions. In addition, it must be “genuinely helpful” and act according to the user’s context. The other pillars are ethical behavior and compliance with specific internal guidelines, such as resistance to jailbreaking and proper handling of external applications.
Training and assessment
The document is part of Claude’s training dataset. Based on the document, Claude generates synthetic training data that helps it learn and internalize the constitution. This allows it to translate the vision into concrete behavior in its answers, and thus comply with the rules. If Claude does give an answer that does not comply with the constitution, users can send feedback to Anthropic.
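Anthropic has not published implementation details, but the pipeline the announcement describes (constitution, then synthetic examples, then training data) can be pictured with a minimal sketch. The names below (generate, synthesize_examples, TrainingExample) are illustrative assumptions, not Anthropic’s actual code.

```python
# Illustrative sketch only: turning constitutional principles into synthetic
# training examples. "generate" is a hypothetical stand-in for whatever model
# call is used internally; this is not Anthropic's published method.

from dataclasses import dataclass

@dataclass
class TrainingExample:
    prompt: str
    response: str
    principle: str  # the constitutional principle the example illustrates


def generate(prompt: str) -> str:
    """Hypothetical stand-in for a language-model call."""
    raise NotImplementedError("Replace with an actual model call.")


def synthesize_examples(principles: list[str], scenarios: list[str]) -> list[TrainingExample]:
    """For each principle, draft a response to each scenario that follows it."""
    examples = []
    for principle in principles:
        for scenario in scenarios:
            response = generate(
                f"Constitutional principle: {principle}\n"
                f"User request: {scenario}\n"
                f"Write a reply that follows the principle and explain the reasoning."
            )
            examples.append(TrainingExample(scenario, response, principle))
    return examples
```

The resulting examples would then feed back into training, which is how, per the announcement, the model comes to understand not just what the rules are but why they exist.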
