AI-based encoding tools determine the areas within each frame that are least consequential, so they can be compressed more heavily than the rest. One of the key elements in the work of JVET AhG11 is therefore to develop a framework capable of fully specifying, evaluating and comparing the outputs of AI-based algorithms. In essence, JVET is establishing clear rules by which AI methods can be assessed and made reproducible.

As AI-based methods continue to evolve, it is clear the technology will become deeply entrenched in video encoding and decoding solutions. That goes beyond existing audio-visual applications, as more video will be consumed by processors in vehicles and robots. This will lead to AI-driven codecs with different adjustments for human and machine interpretation.

In the longer term, ML could be used to generate video frames from reference images, video context and metadata descriptions that together build a credible approximation of the original video. New methods of creating video are expected to emerge, first for broadcast and then for embedded wireless links for UAVs and autonomous vehicles on the road.

Generative AI can create convincing audio and visual assets entirely artificially, so the future of video coding has to consider synthetic media. Codecs are already being launched that don't compress a video file but reconstruct it from context and reference images, preserving quality despite a drastic reduction in file size. This can be particularly relevant for remotely operated road vehicles: synthetic data can protect the identity of other road users by removing any identifiable features while still delivering the accurate situational awareness a remote operator needs.

Bandwidth-friendly

By sending only the base information of each frame as mathematical elements rather than pixels, and rebuilding frames on the fly, ML compression is bandwidth-friendly and can scale to any size of display. This significantly reduces the demand on the feed from the uncrewed system, as video content encoded by an AI codec can be rebuilt from scratch to fit any resolution and frame rate.

New immersive video formats for VR and AR are also being developed. In these applications, low latency is of paramount importance, since noticeable delays in video encoding or decoding not only destroy the sense of immersion but can cause operator nausea. Sensors on VR headsets are being developed to operate in tandem with video processing engines to reduce the so-called motion-to-photon latency.

Solutions in video coding for 360° capture and rendering aim to reduce the computational load. Techniques such as foveated rendering provide a shortcut by encoding at full resolution only the portions of each frame where the viewer's gaze is focused. Even so, the computational requirements are so high that many headsets have to be tethered to high-performance PCs for the visual processing.
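To make the foveated rendering shortcut concrete, here is a minimal Python sketch. Full resolution is kept inside a circle around the gaze point, while the periphery is decimated and upsampled so it costs fewer bits to encode. The radius and decimation factor are illustrative assumptions, not parameters from any real headset SDK.

```python
import numpy as np

def foveate(frame: np.ndarray, gaze_xy: tuple[int, int],
            fovea_radius: int = 64, decim: int = 4) -> np.ndarray:
    """Keep full resolution inside a circle around the gaze point;
    decimate-and-upsample everywhere else so the periphery costs
    fewer bits to encode. Parameters are illustrative only."""
    h, w = frame.shape[:2]
    # Crude periphery: nearest-neighbour downsample, then upsample back
    low = frame[::decim, ::decim][np.arange(h) // decim][:, np.arange(w) // decim]
    ys, xs = np.mgrid[0:h, 0:w]
    inside = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 <= fovea_radius ** 2
    out = low.copy()
    out[inside] = frame[inside]   # restore full detail where the eye is looking
    return out

# Usage: 720p test frame, viewer looking slightly left of centre
frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
out = foveate(frame, gaze_xy=(500, 360))
```

In practice the gaze point would come from the headset's eye-tracking sensors each frame, which is why they must operate in tandem with the video processing engine.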
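Returning to the bandwidth-friendly idea above, the sketch below shows one way a coordinate-based neural decoder could rebuild a frame at any resolution from a compact latent, so only the latent, the 'mathematical elements rather than pixels', crosses the radio link. This is a heavily simplified assumption of how such a codec might work: the MLP is tiny and its weights are random rather than trained, so it demonstrates the interface, not picture quality.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT = 64

# Hypothetical decoder weights: in a real learned codec these come from
# training; random values here only illustrate the resolution-free decode.
W1 = rng.normal(size=(LATENT + 2, 128)) * 0.1
W2 = rng.normal(size=(128, 3)) * 0.1

def decode_frame(latent: np.ndarray, height: int, width: int) -> np.ndarray:
    """Predict every RGB pixel from its normalised (y, x) coordinate plus a
    shared latent vector, so one compact latent rebuilds a frame at any size."""
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([ys / height, xs / width], axis=-1).reshape(-1, 2)
    feats = np.concatenate(
        [coords, np.broadcast_to(latent, (coords.shape[0], LATENT))], axis=1)
    rgb = np.tanh(np.maximum(feats @ W1, 0.0) @ W2)   # tiny ReLU MLP
    return rgb.reshape(height, width, 3)

latent = rng.normal(size=LATENT)        # all that crosses the radio link
small = decode_frame(latent, 96, 160)   # same latent, two display sizes
large = decode_frame(latent, 192, 320)
```

The point of the design is that resolution and frame rate become decoder-side choices: the uncrewed system transmits the same few kilobytes regardless of the operator's display.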
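Finally, the saliency-driven bit allocation mentioned at the start of this section can be sketched the same way: blocks judged least consequential receive a coarser quantiser. The linear saliency-to-QP mapping and the H.26x-style 0 to 51 QP range are assumptions for illustration, not any standard's specified behaviour.

```python
import numpy as np

def saliency_to_qp(saliency: np.ndarray, base_qp: int = 32,
                   swing: int = 8, block: int = 16) -> np.ndarray:
    """Map a per-pixel saliency map (0 = ignorable, 1 = critical) to a
    per-block quantisation parameter: low-saliency blocks get a higher
    QP, i.e. coarser quantisation and fewer bits."""
    h, w = saliency.shape
    qp = np.empty((h // block, w // block), dtype=int)
    for r in range(h // block):
        for c in range(w // block):
            s = saliency[r*block:(r+1)*block, c*block:(c+1)*block].mean()
            qp[r, c] = round(base_qp + swing * (1.0 - 2.0 * s))
    return np.clip(qp, 0, 51)           # H.26x-style QP range

# Toy saliency map: a 'pedestrian' region in the centre of a 64 x 64 frame
sal = np.zeros((64, 64)); sal[16:48, 16:48] = 1.0
print(saliency_to_qp(sal))              # centre blocks get QP 24, edges QP 40
```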
Class                    Sequence name      Spatial resolution  Frame count  Frame rate  Bit depth  Scene feature
Class A (VisDrone-SOT)   Basketball ground  960 x 528           100          24          8          Outdoors
                         Grassland          1344 x 752          100          24          8          Outdoors
                         Intersection       1360 x 752          100          24          8          Outdoors
                         Night mall         1920 x 1072         100          30          8          Outdoors
                         Soccer ground      1904 x 1056         100          30          8          Outdoors
Class B (VisDrone-MOT)   Circle             1360 x 752          100          24          8          Outdoors
                         Cross-bridge       2720 x 1520         100          30          8          Outdoors
                         Highway            1344 x 752          100          24          8          Outdoors
Class C (Corridor)       Classroom          640 x 352           100          24          8          Indoors
                         Elevator           640 x 352           100          24          8          Indoors
                         Hall               640 x 352           100          24          8          Indoors
Class D (UAVDT-S)        Campus             1024 x 528          100          24          8          Outdoors
                         Road by the sea    1024 x 528          100          24          8          Outdoors
                         Theatre            1024 x 528          100          24          8          Outdoors

Video sequence characteristics of the proposed learned UAV video coding benchmark (Courtesy of MPAI)
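To put the table's numbers in context, a quick calculation (assuming 4:2:0 chroma subsampling, i.e. 1.5 samples per pixel, which is an assumption rather than something the table states) gives the uncompressed bitrate of the largest sequence, far beyond what a typical UAV radio link can carry:

```python
def raw_bitrate_mbps(width: int, height: int, fps: int, bit_depth: int = 8) -> float:
    """Uncompressed bitrate in Mbit/s, assuming 4:2:0 chroma subsampling
    (1.5 samples per pixel per frame)."""
    return width * height * 1.5 * bit_depth * fps / 1e6

# 'Cross-bridge', the largest sequence in the table above
print(f"{raw_bitrate_mbps(2720, 1520, 30):.0f} Mbit/s")   # ~1488 Mbit/s
```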