The YouTube channel ColdFusion (previously called ColdfusTion) has released a new video that explores the latest developments in artificial intelligence. The video looks at stacked generative adversarial networks and how the progress of AI could drastically alter society.
"Imagine typing a descriptive sentence of a theme, and having an artificial intelligence generate a convincing photorealistic image just from your text input," says Dagogo Altraide in the video. "This has just been created."
The system, detailed in this paper, uses an architecture called Stacked Generative Adversarial Networks (StackGAN).
Creating photorealistic images from text descriptions alone is a challenging problem in computer vision, and one with many practical applications. Until recently, samples generated by other text-to-image approaches could only roughly reflect the meaning of the given descriptions; they lacked the necessary details and did not produce vivid, recognizable images.
|Images of birds created with a Stacked Generative Adversarial Network (StackGAN)|
|StackGAN System - Xun Huang, Yixuan Li, Omid Poursaeed, John Hopcroft, and Serge Belongie. (https://arxiv.org/abs/1612.04357)|
As Altraide describes:
"If we combine two neural networks together, and make them compete against each other so that they can train and improve themselves without human intervention, that's what StackGAN is doing. It uses one neural network to generate images, and another neural network within the same system to decide if the image generated is real or fake. What ends up happening is that the generative neural network improves itself at generating images based on the feedback given by the deciding network. In the same stride, the deciding network gets better at distinguishing what's real and fake."
This architecture creates a feedback loop of continuous improvement, without human intervention, that keeps cleaning and refining the generated images. As Altraide comments, "The end results are nothing short of stunning."
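To make the generator-versus-discriminator feedback loop more concrete, here is a minimal sketch in plain Python. It is not the StackGAN system: instead of images and deep networks it uses single numbers, a two-parameter "generator" and a two-parameter "discriminator". The class names, the function `adversarial_round`, the starting parameters and the learning rate are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def mean(values):
    return sum(values) / len(values)


class Generator:
    """Toy generator (illustrative): turns noise z into a sample a*z + b."""

    def __init__(self, a=1.0, b=0.0):
        self.a, self.b = a, b

    def sample(self, z):
        return self.a * z + self.b


class Discriminator:
    """Toy discriminator (illustrative): estimated probability that x is real."""

    def __init__(self, w=0.1, c=0.0):
        self.w, self.c = w, c

    def score(self, x):
        return sigmoid(self.w * x + self.c)


def discriminator_loss(disc, real, fake):
    # Low when real samples score near 1 and fake samples score near 0.
    real_term = mean([-math.log(max(disc.score(x), 1e-12)) for x in real])
    fake_term = mean([-math.log(max(1.0 - disc.score(x), 1e-12)) for x in fake])
    return real_term + fake_term


def generator_loss(gen, disc, noise):
    # Low when the discriminator scores generated samples as real.
    scores = [disc.score(gen.sample(z)) for z in noise]
    return mean([-math.log(max(s, 1e-12)) for s in scores])


def adversarial_round(gen, disc, real, noise, lr=0.05):
    """One round of the loop: update the discriminator, then the generator."""
    fake = [gen.sample(z) for z in noise]

    # Discriminator step: gradient descent on discriminator_loss.
    grad_w = (mean([-(1.0 - disc.score(x)) * x for x in real])
              + mean([disc.score(x) * x for x in fake]))
    grad_c = (mean([-(1.0 - disc.score(x)) for x in real])
              + mean([disc.score(x) for x in fake]))
    disc.w -= lr * grad_w
    disc.c -= lr * grad_c

    # Generator step: follow the discriminator's feedback so that
    # generated samples score as more "real" (gradient of generator_loss).
    slopes = [-(1.0 - disc.score(gen.sample(z))) * disc.w for z in noise]
    gen.a -= lr * mean([s * z for s, z in zip(slopes, noise)])
    gen.b -= lr * mean(slopes)
```

Calling `adversarial_round` repeatedly on fresh batches of real samples and noise gives the back-and-forth Altraide describes: each discriminator update sharpens its judgment of real versus fake, and each generator update uses that judgment to shift its output toward the real data. Toy setups like this one can oscillate rather than settle, so it illustrates the mechanism only and says nothing about image quality.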
The video also highlights WaveNet, an AI system that generates audio, and a system from Carnegie Mellon University that has mastered Texas Hold 'Em poker. WaveNet is capable of producing natural-sounding human speech and music on its own.
Jeff Dean's recent talk about how image recognition is already exceeding human capability is also showcased in the video.
"It seems like playtime is over in regards to AI, especially with techniques like deep learning and neural networks," states Altraide. He discusses how social disruption is soon to follow these rapid technological advances.
Another strong video from ColdFusion!