Researchers Get AI To Generate Harmful Content
May 21, 2024
Researchers used prompts from an academic paper, along with questions of their own, to elicit responses from language models, producing harmful content such as Holocaust denial articles, sexist emails, and text encouraging suicide. In response, developers of these large language models (LLMs) reiterated their commitment to safety. OpenAI, the company behind ChatGPT, stated that its technology is not intended to generate hateful or violent content. Anthropic, developer of the Claude chatbot, said that avoiding harmful responses was a priority for its Claude 2 model. These incidents highlight concerns about the potential misuse of AI language models.