42 Focus | Video encoding

The volume of video captured by UAVs is growing exponentially, along with the increased bitrate generated by advances in the sensors on UAVs, bringing new challenges for on-device UAV storage and air-to-ground data transmission. Most existing video compression schemes were designed for natural scenes, without considering the specific texture and view characteristics of UAV videos.

To address this, the end-to-end video (EEV) coding project from MPAI (Moving Picture, Audio and data coding by AI) is developing technology and specifications to compress video in UAV applications by exploiting AI-based data coding technologies. Since its set-up at the end of 2021, the MPAI EEV project has released three major versions of its reference model. The aim is to provide a solid baseline for compressing UAV video. The project analysed the technology landscape for encoding, and is developing a learned UAV video codec with a comprehensive and systematic benchmark for the quality of the resulting video.

To assess how efficiently conventional and learned codecs compress UAV video, video sequences were encoded using the HEVC reference software with the screen content coding (SCC) extension (HM-16.20-SCM-8.8) and the emerging learned video coding framework OpenDVC. The reference model of MPAI EEV was also used to compress the UAV videos, giving baseline coding results for three different codecs.

Another important factor for learned codecs is the consistency of training and test data. It is widely accepted in the ML community that training and test data should be independent and identically distributed. However, both OpenDVC and EEV are trained on the natural video dataset vimeo-90k, with mean squared error as the distortion metric. The project used the pre-trained weights of these learned codecs without fine-tuning them on UAV video data, so that the benchmark reflects the general case rather than a tuned version.
The codecs were compared using the peak signal-to-noise ratio (PSNR) computed for each red/green/blue (RGB) component in every frame, with the average over the three components giving the picture quality. The bitrate is calculated on a bit-per-pixel basis from the binary files produced by the codecs.

The video encoding performance of EEV and HEVC shows an obvious gap between the indoor and outdoor sequences. In general, the HEVC SCC codec outperforms the learned codec by 15.23% in bitrate terms, averaged over all videos. In Class C, EEV is inferior to HEVC by a clear margin, especially for the Classroom and Elevator sequences. This shows that learned codecs are more sensitive to variations in the video content than conventional hybrid codecs.

This agrees with the industry view that traditional coding tools outperform AI-based alternatives in most areas. One reason for this is that ML codecs can be several orders of magnitude more complex than traditional ones. Neural networks are being used within traditional codecs to replace existing tools, especially where performance is likely to be improved, but the complexity is often exceptionally high. In such cases, ML creates new algorithms that people could not have conceived or that would otherwise be challenging to program. Increased algorithmic complexity can have unseen or indirect consequences, as energy use is proportional to processing time and computing performance.

There are other challenges for ML-based video development.
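The quality and bitrate measures used in the comparison above (PSNR per RGB component, averaged over the three components, and bits per pixel derived from the size of the coded file) can be sketched as follows. This is a minimal illustration rather than the project's evaluation code: the function names, the representation of a frame as three flat lists of 8-bit samples, and the plain averaging of per-frame values over a sequence are all assumptions made for the example.

```python
import math


def channel_psnr(ref, rec, peak=255.0):
    """PSNR in dB for one colour channel, given two equal-length sample lists.

    `peak` assumes 8-bit samples (an assumption of this sketch).
    """
    if len(ref) != len(rec) or len(ref) == 0:
        raise ValueError("channels must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical channels
    return 10.0 * math.log10(peak * peak / mse)


def rgb_psnr(ref_frame, rec_frame):
    """Average of the R, G and B PSNR values for a single frame.

    Each frame is a tuple (R, G, B) of flat sample lists (hypothetical layout).
    """
    values = [channel_psnr(r, d) for r, d in zip(ref_frame, rec_frame)]
    return sum(values) / len(values)


def sequence_rgb_psnr(ref_frames, rec_frames):
    """Mean per-frame RGB-PSNR over a sequence (averaging rule is an assumption)."""
    per_frame = [rgb_psnr(r, d) for r, d in zip(ref_frames, rec_frames)]
    return sum(per_frame) / len(per_frame)


def bits_per_pixel(bitstream_bytes, width, height, num_frames):
    """Bitrate in bits per pixel, from the size in bytes of the coded binary file."""
    return 8.0 * bitstream_bytes / (width * height * num_frames)
```

As a usage example, a 1,000,000-byte bitstream for 100 frames at 1920 x 1080 would be passed as `bits_per_pixel(1_000_000, 1920, 1080, 100)`. Keeping the rate in bits per pixel makes sequences of different resolutions and lengths directly comparable.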
Existing video standards offer reference software, common test conditions and frame sequences, alongside prescribed ways to demonstrate performance and quality that allow direct comparison of various executions in hardware

Sequence name               Rate reduction     Rate reduction
                            EEV vs OpenDVC     EEV vs HEVC

Class A (VisDrone-SOT)
  Basketball ground         -23.84%
  Grassland                 -16.42%
  Intersection              -18.62%
  Night mall                -21.94%
  Soccer ground             -21.61%
Class B (VisDrone-MOT)
  Circle                    -20.17%
  Cross-bridge              -23.96%
  Highway                   -20.30%
Class C (Corridor)
  Classroom                 -8.39%
  Elevator                  -19.47%
  Hall                      -15.37%            58.66%
Class D (UAVDT S)
  Campus                    -26.94%            -25.68%
  Road by the sea           -20.98%            -24.40%
  Theatre                   -19.79%            2.98%

Class A                     -20.49%            -14.97%
Class B                     -21.48%            -3.86%
Class C                     -14.41%            115.56%
Class D                     -22.57%            -15.70%
Average                     -19.84%            15.23%

The performance of different codecs (OpenDVC, EEV and HM-16.20-SCM-8.8) on UAV video compression. The distortion metric is RGB-PSNR (Courtesy of MPAI)

October/November 2023 | Uncrewed Systems Technology