Face Video Compression with Generative Models

May 3, 2023 @ 8:00 am - 9:00 am

Video coding is a fundamental and ubiquitous technology in modern society. Generations of international video coding standards, such as the widely deployed H.264/AVC and H.265/HEVC and the latest H.266/VVC, provide the essential means for enabling video conferencing, video streaming, video sharing, e-commerce, entertainment, and many more video applications. These existing standards all rely on the fundamentals of signal processing and information theory to encode generic video efficiently with favorable rate-distortion behavior.

In recent years, rapid advances in deep learning and artificial intelligence have allowed people to manipulate images and videos using deep generative models. Of particular interest to the field of video coding is the application of deep generative models to compressing talking-face video at ultra-low bit rates. By focusing on talking faces, generative models can effectively learn the inherent structure of the composition, movement, and posture of human faces and deliver promising results using very little bandwidth. At ultra-low bit rates, where even the latest video coding standard, H.266/VVC, is apt to suffer from significant blocking artifacts and blurriness beyond the point of recognition, generative methods can maintain clear facial features and vivid expressions in the reconstructed video. Further, generative face video coding techniques are inherently capable of manipulating the reconstructed face and promise to deliver a more interactive experience.

In this talk, we start with a quick overview of traditional and deep learning-based video coding techniques. We then focus on face video coding with generative networks and present two schemes that send different deep information in the bitstream: one sends compact temporal motion features, and the other sends 3D facial semantics. We compare their compression efficiency and visual quality with those of the latest H.266/VVC standard, and showcase the power of deep generative models in preserving vivid facial images with little bandwidth. We also present visualization results that demonstrate the capability of the 3D facial semantics-based scheme to interact with the reconstructed face video and animate virtual faces.

Co-sponsored by: Fairleigh Dickinson University

Speaker(s): Dr. Yan Ye

Virtual: https://events.vtools.ieee.org/m/352355

Details

Date:
May 3, 2023
Time:
8:00 am - 9:00 am
Website:
https://events.vtools.ieee.org/m/352355

Organizer

fang_luo@stonybrook_edu
Email
fang_luo@stonybrook_edu