Deepfake technology has received its fair share of scorn, but as remote collaboration vendors incorporate more AI features into their offerings, the “deepfake” stigma will wane.
Indeed, it’s now possible to impersonate practically anybody with astonishing verisimilitude, thanks to the artificial intelligence technology known as generative adversarial networks (GANs). Fortunately, deepfakes have not shown their ugly side in the US presidential election campaign that is now drawing to a close. No one has been able to point to any significant use of GANs to produce deceptive videos and thereby manipulate public opinion.
Instead, GANs are increasingly popping up in socially beneficial applications, such as photorealistic animation and live-action video post-production. As evidenced by several recent industry announcements, next-generation remote collaboration services are using GANs and other AI techniques to improve the quality of rendered streams while boosting the productivity of participants on these calls.
NVIDIA uses GANs to improve rendering of remote collaboration
For a glimpse into the future of AI-manipulated videoconferencing, consider NVIDIA’s recent announcement of Maxine, a new GPU-accelerated remote collaboration solution. The offering introduces three levels of GAN-driven manipulation into real-time streaming conference calls:
- AI-manipulated facial rendering: Maxine relies on GANs to optimize how a person is rendered in a video call. Specifically, the solution uses GANs to adjust participants’ gazes, relight their faces, enhance resolution, upscale and recomposite frames, reduce jitter, and cancel noise. It analyzes facial points and then algorithmically reanimates and modifies those faces in real time on the stream that’s displayed at the far end of the connection.
- AI-manipulated social rendering: Maxine also uses GANs to improve the social experience on a call beyond what participants would experience if they relied on raw video feeds. The solution does this by automatically adjusting the rendered video to make it appear as if people are always facing each other and making eye contact during a call. In addition, Maxine helps participants stay attentive to each other by using GAN-based auto framing, which enables the video feed to automatically follow speakers even if they move away from their screens.
- AI-manipulated avatar rendering: Maxine allows users to employ GAN-generated photorealistic avatars whose real-time rendering responds to vocal and emotional tone. These AI-powered virtual assistants can use natural language processing to take notes, set action items, and answer questions in human-like voices. They can also perform real-time translation, closed captioning, and transcription to boost participant comprehension of what’s happening on a call.
At the same time, Maxine compresses video streams and reduces bandwidth consumption while preserving the quality of video calls. The service processes real-time streams entirely in the cloud rather than on local devices, so users can take advantage of Maxine features without specialized hardware.
Microsoft brings AI into the remote collaboration experience
NVIDIA’s announcement comes a few months after Microsoft announced several AI-powered enhancements to its Teams remote collaboration product.
Though there’s no evidence that Microsoft is using GANs in the following new Teams features, they illustrate how AI-driven stream manipulation is becoming integral to participant experience and productivity:
- Automatic scaling and centering of meeting participants’ videos in their virtual seats, regardless of how close or far they are from their camera;
- Reduction of background distractions so that participants can concentrate on other people’s faces and body language;
- Placing of all participants in a shared background no matter where they are, while defining default backgrounds for all meeting attendees;
- Display of shared content and specific members side-by-side; and
- Transposition of a presenter’s video feed onto the foreground of the slide they’re showing.
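To make the first item in the list above concrete, here is a minimal Python sketch of how a participant’s video could be scaled and centered around a detected face. It is an illustration only, not Microsoft’s method: the `Box` type, the `center_crop_on_face` function, and the `face_fraction` and `aspect` defaults are all hypothetical, and the face bounding box is assumed to come from some upstream face detector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (top-left origin)."""
    x: float
    y: float
    w: float
    h: float


def center_crop_on_face(frame_w, frame_h, face, face_fraction=0.4, aspect=16 / 9):
    """Return a crop Box centered on `face`, sized so the face fills a fixed
    share of the crop height. The defaults are illustrative assumptions."""
    # Size the crop so the face height is `face_fraction` of the crop height.
    crop_h = face.h / face_fraction
    crop_w = crop_h * aspect

    # Shrink the crop uniformly if it would be larger than the frame.
    scale = min(1.0, frame_w / crop_w, frame_h / crop_h)
    crop_w *= scale
    crop_h *= scale

    # Center on the face, then clamp so the crop stays inside the frame.
    cx = face.x + face.w / 2
    cy = face.y + face.h / 2
    x = max(0.0, min(cx - crop_w / 2, frame_w - crop_w))
    y = max(0.0, min(cy - crop_h / 2, frame_h - crop_h))
    return Box(x, y, crop_w, crop_h)


if __name__ == "__main__":
    # A face detected slightly above the middle of a 1920x1080 frame.
    crop = center_crop_on_face(1920, 1080, Box(900, 400, 120, 160))
    print(crop)
```

Resizing the returned crop to the size of the participant’s virtual seat would make faces appear at a similar scale regardless of how close each person sits to the camera.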
Cisco uses AI to ensure consistent remote and in-room collaboration experiences
Just last week, Cisco announced plans to incorporate AI into the next generation of its Webex collaboration software. Though Cisco has not specifically built its AI-based enhancements on GAN technology, it is responding to the same marketplace demands that are driving NVIDIA and Microsoft: reducing distractions and boosting team productivity on remote conferences.
One of the weak links in many remote conferences is audio quality, especially when there are many people online with wildly divergent connections, ambient environments, and attention to muting etiquette. To smooth out audio quality on Webex conferences, Cisco plans to integrate AI-powered noise removal and speech enhancement technology that it acquired from BabbleLabs. This will help users to tune out distractions and improve comprehension during video meetings.
Cisco is also introducing several new AI-driven Webex features for room-based systems, such as sensors that can detect ambient noise levels and count the number of people in a space, which helps organizations comply with room capacity limits. Additional sensors will collect data on room temperature, humidity, air quality, and light.
AI has already proven itself in many commercial applications that involve algorithmic manipulation of video, audio, and other media content. For example, many mass-market cameras now incorporate AI to autocorrect photos and transform low-resolution original images into natural-looking, high-resolution versions.
As remote collaboration vendors incorporate more AI features into their offerings, the “deepfake” stigma attached to the underlying technologies will begin to wane. Indeed, GANs and other sophisticated AI tools are being employed, not as agents of deception, but to augment the verisimilitude of participants’ presence in virtual workspaces of all sorts.
Before long, no commercial conferencing solution will expose users to the quality issues that come from raw video and audio feeds that have not been smoothed and augmented through sophisticated AI.
James Kobielus is an independent tech industry analyst, consultant, and author. He lives in Alexandria, Virginia.