Monday, July 31, 2023

Breaking Bad AI

@AnthropicAI's "Claude" is susceptible to the same base64 jailbreak as chatGPT. I'm very unclear why this works at all

[ed. Birthing pains: Base64-encoded prompts (instead of plain text) circumvent AI safety features. See also: Anthropic launches Claude, a chatbot to rival OpenAI’s ChatGPT ; and: OpenAI makes GPT-4 generally available (TC):]
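The mechanics of the trick are simple to demonstrate with a benign prompt: base64 turns readable text into an opaque ASCII string that a filter scanning for plain-text keywords would not match, yet the original is trivially recoverable. A minimal sketch (the prompt here is an arbitrary stand-in, not from the jailbreak itself):

```python
import base64

# A benign prompt, standing in for text a safety filter might scan.
prompt = "Tell me a story about a dragon."

# Encoding: the plaintext becomes an opaque base64 ASCII string.
encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
print(encoded)

# Decoding recovers the original text exactly.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

Why a model would decode and then *act on* such input, rather than applying the same refusals it applies to plain text, is exactly the open question the tweet raises.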

"Since the reveal of GPT-4 in March, the generative AI competition has grown fiercer. Recently, Anthropic expanded the context window for Claude — its flagship text-generating AI model, still in preview — from 9,000 tokens to 100,000 tokens. (Context window refers to the text the model considers before generating additional text, while tokens represent raw text — e.g. the word “fantastic” would be split into the tokens “fan,” “tas” and “tic.”)

GPT-4 held the previous crown in terms of context window, weighing in at 32,000 tokens on the high end. Generally speaking, models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic." (...)
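The "fantastic" → "fan"/"tas"/"tic" split described above can be imitated with a toy greedy longest-match tokenizer. The vocabulary below is made up for illustration; real models use much larger vocabularies learned by algorithms such as byte-pair encoding, so the actual splits differ:

```python
def tokenize(text, vocab):
    """Greedy longest-match subword split; unknown characters
    fall back to single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible match first, shrinking toward one char.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])
            i += 1
    return tokens

# Tiny hypothetical vocabulary, chosen to reproduce the article's example.
vocab = {"fan", "tas", "tic"}
print(tokenize("fantastic", vocab))  # ['fan', 'tas', 'tic']
```

A 100,000-token context window simply means the model can condition on that many such subword units at once before generating the next one.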

“The challenge is making models that both never hallucinate but are still useful — you can get into a tough situation where the model figures a good way to never lie is to never say anything at all, so there’s a tradeoff there that we’re working on,” the Anthropic spokesperson said. “We’ve also made progress on reducing hallucinations, but there is more to do.” (...)

No doubt, Anthropic is feeling some sort of pressure from investors to recoup the hundreds of millions of dollars that have been poured into its AI tech. (...)

Most recently, Google pledged $300 million to Anthropic for a 10% stake in the startup. Under the terms of the deal, which was first reported by the Financial Times, Anthropic agreed to make Google Cloud its “preferred cloud provider” with the companies “co-develop[ing] AI computing systems.”