An anonymous reader quotes a report from Ars Technica: On Wednesday, Replicate developer Charlie Holtz combined GPT-4 Vision (commonly called GPT-4V) and ElevenLabs voice cloning technology to create an unauthorized AI version of the famous naturalist David Attenborough narrating Holtz’s every move on camera. As of Thursday afternoon, the X post describing the stunt had garnered over 21,000 likes. “Here we have a remarkable specimen of Homo sapiens distinguished by his silver circular spectacles and a mane of tousled curly locks,” the fake Attenborough says in the demo as Holtz looks on with a grin. “He’s wearing what appears to be a blue fabric covering, which can only be assumed to be part of his mating display.” “Look closely at the subtle arch of his eyebrow,” it continues, as if narrating a BBC wildlife documentary. “It’s as if he’s in the midst of an intricate ritual of curiosity or skepticism. The backdrop suggests a sheltered habitat, possibly a communal feeding area or watering hole.”
How does it work? Every five seconds, a Python script called “narrator” takes a photo from Holtz’s webcam and feeds it to GPT-4V (the version of OpenAI’s language model that can process image inputs) via an API call that includes a special prompt instructing the model to write text in the style of Attenborough’s narrations. That text is then fed into an ElevenLabs AI voice profile trained on audio samples of Attenborough’s speech. Holtz has published the code that pulls it all together on GitHub; running it requires API tokens for OpenAI and ElevenLabs, which cost money. During the demo video, when Holtz holds up a cup and takes a drink, the fake Attenborough narrator says, “Ah, in its natural environment, we observe the sophisticated Homo sapiens engaging in the critical ritual of hydration. This male individual has selected a small cylindrical container, likely filled with life-sustaining H2O, and is tilting it expertly towards his intake orifice. Such grace, such poise.”
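For readers curious what such a capture-caption-speak loop looks like in practice, here is a minimal sketch. It is a hypothetical reconstruction rather than Holtz’s published “narrator” script: the prompt wording, the gpt-4-vision-preview model name, the ELEVENLABS_VOICE_ID environment variable, and the exact five-second cadence are assumptions for illustration, and running it requires paid OpenAI and ElevenLabs API keys.

```python
"""Sketch of a narrator-style loop: webcam frame -> GPT-4V text -> ElevenLabs speech.

Hypothetical reconstruction, not Holtz's actual script.
Requires: pip install opencv-python openai requests
Environment: OPENAI_API_KEY, ELEVENLABS_API_KEY, ELEVENLABS_VOICE_ID
(the voice must be one you have the rights to use).
"""
import base64
import os
import time

import cv2
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative prompt; the actual prompt used by Holtz is not reproduced here.
PROMPT = (
    "You are a nature-documentary narrator. Describe the person in this "
    "webcam frame in a dramatic wildlife-documentary style. Keep it short."
)


def capture_frame_b64(cam: cv2.VideoCapture) -> str:
    """Grab one webcam frame and return it as a base64-encoded JPEG string."""
    ok, frame = cam.read()
    if not ok:
        raise RuntimeError("Could not read from webcam")
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        raise RuntimeError("JPEG encoding failed")
    return base64.b64encode(jpeg.tobytes()).decode("utf-8")


def narrate(image_b64: str) -> str:
    """Send the frame to GPT-4V along with the prompt and return the narration text."""
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",
        max_tokens=200,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def speak(text: str, out_path: str) -> None:
    """Synthesize the narration with the ElevenLabs text-to-speech API and save it as MP3."""
    voice_id = os.environ["ELEVENLABS_VOICE_ID"]
    resp = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_monolingual_v1"},
        timeout=60,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(resp.content)  # raw MP3 bytes; play with any audio player


def main() -> None:
    cam = cv2.VideoCapture(0)
    try:
        while True:
            text = narrate(capture_frame_b64(cam))
            print(text)
            speak(text, "narration.mp3")
            time.sleep(5)  # roughly the five-second cadence described above
    finally:
        cam.release()


if __name__ == "__main__":
    main()
```

The sketch writes each clip to disk rather than streaming playback, which keeps it dependency-light; each loop iteration mirrors the description above: grab a frame, ask GPT-4V for documentary-style text, then hand that text to an ElevenLabs voice for synthesis.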