Table of Contents: Online Supplement Volume 12, Number 4s
It has been observed in the recent literature that drift error due to watermarking degrades the visual quality of the embedded video. Existing drift-error handling strategies for recent video standards such as H.264 may not be directly applicable to upcoming HD video standards (such as HEVC) because of their different compression architectures. In this paper, a compressed-domain watermarking scheme is proposed for the H.265/HEVC bitstream that handles drift error propagation in both the intra- and inter-prediction processes. Additionally, the proposed scheme shows adequate robustness against re-compression attacks as well as common image processing attacks while maintaining decent visual quality. A comprehensive set of experiments has been carried out to justify the efficacy of the proposed scheme over the existing literature.
From wearable devices to depth cameras, researchers are exploiting various multimodal data to recognize human actions for applications such as video gaming, education, and healthcare. While there have been many successful techniques in the literature, most current approaches focus on statistical or local spatio-temporal features and do not explicitly explore the temporal dynamics of the sensor streams. However, human action streams contain rich temporal structure information that can characterize the unique underlying patterns of different action types. From this perspective, we propose a novel temporal order modeling approach to human action recognition. Specifically, we explore subspace projections to extract the latent temporal patterns of the actions. The temporal order between these patterns is compared, and the index of the pattern that appears first is encoded. This process is iterated multiple times and produces a compact feature vector representing the temporal dynamics of the sensor data. Human action recognition can then be efficiently solved by nearest neighbor search based on the Hamming distance between these compact feature vectors. We further introduce a sequential optimization algorithm to learn optimized projections that preserve the pairwise label similarity of the original sensor data. Performance is evaluated on two public human action datasets. Experimental results demonstrate the superior performance of the proposed technique in both accuracy and efficiency.
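The temporal order encoding and Hamming-distance matching described in this abstract can be sketched as follows. This is a minimal illustration only: it uses random projection vectors in place of the learned, label-similarity-preserving projections, and pairwise "which pattern peaks first" bits as the encoded order; all function names are hypothetical.

```python
import numpy as np

def temporal_order_code(stream, projection_pairs):
    """Encode the temporal dynamics of a (T, d) sensor stream as bits.

    For each pair of projection vectors (p, q), project the stream onto
    both, locate the time step where each projected pattern peaks, and
    emit a bit indicating which pattern appears first. Iterating over
    many pairs yields a compact binary feature vector.
    """
    bits = []
    for p, q in projection_pairs:           # each vector has length d
        tp = int(np.argmax(stream @ p))     # time of strongest response to p
        tq = int(np.argmax(stream @ q))     # time of strongest response to q
        bits.append(1 if tp <= tq else 0)   # 1 if pattern p appears first
    return np.array(bits, dtype=np.uint8)

def hamming_nn(query_code, database_codes):
    """Index of the nearest database code under Hamming distance."""
    dists = (database_codes != query_code).sum(axis=1)
    return int(np.argmin(dists))
```

Because the codes are binary, matching reduces to cheap bit comparisons, which is what makes the nearest neighbor search efficient.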
In this paper we introduce a method to overcome one of the main challenges of person re-identification in multi-camera networks, namely cross-view appearance changes. The proposed solution addresses the extreme variability of person appearance across camera views by exploiting multiple feature representations. For each feature, Kernel Canonical Correlation Analysis (KCCA) with different kernels is employed to learn several projection spaces in which the appearance correlation between samples of the same person observed from different cameras is maximized. An iterative logistic regression is finally used to select and weight the contributions of each projection and to perform the matching between the two views. Experimental evaluation shows that the proposed solution achieves performance comparable to the state of the art on the VIPeR and PRID 450s datasets and surpasses it on the PRID and CUHK01 datasets.
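The correlation-maximizing projection step can be illustrated with plain (linear) CCA as a simplified stand-in for the kernelized KCCA used in the paper; the regularization value and function names below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def cca(X, Y, k, reg=1e-3):
    """Linear CCA: projections maximizing correlation between paired views.

    X: (n, dx) features from camera A; Y: (n, dy) features from camera B,
    row i of X and Y describing the same person. Returns projection
    matrices Wx (dx, k), Wy (dy, k) and the top-k canonical correlations.
    """
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # regularized covariances
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Fx = np.linalg.cholesky(Cxx)                     # Cxx = Fx Fx^T
    Fy = np.linalg.cholesky(Cyy)
    # M = Fx^{-1} Cxy Fy^{-T}; its singular values are the correlations
    M = np.linalg.solve(Fy, np.linalg.solve(Fx, Cxy).T).T
    A, s, Bt = np.linalg.svd(M)
    Wx = np.linalg.solve(Fx.T, A[:, :k])
    Wy = np.linalg.solve(Fy.T, Bt.T[:, :k])
    return Wx, Wy, s[:k]
```

After projection, samples of the same person from the two cameras are maximally correlated, so matching can be performed by a simple similarity (e.g. cosine) in the projected space; the paper additionally fuses several such spaces via iterative logistic regression.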
Both voice conversion and hidden Markov model (HMM)-based speech synthesis can be used to produce artificial voices of a target speaker, and both have shown great negative impacts on speaker verification (SV) systems. To enhance the security of SV systems, techniques to detect converted/synthesized speech should be taken into consideration. During voice conversion and HMM-based synthesis, speech reconstruction is applied to transform a set of acoustic parameters into reconstructed speech. Hence, the identification of reconstructed speech can be used to distinguish converted/synthesized speech from human speech. Several related works on such identification have been reported, achieving equal error rates (EERs) below 5% for detecting reconstructed speech. However, through cross-database evaluations on different speech databases, we find that the EERs of several testing cases are higher than 10%; the robustness of detection algorithms across speech databases needs to be improved. In this paper, we propose an algorithm to identify reconstructed speech. Three different speech databases and two different reconstruction methods are considered in our work, which has not been addressed in previous works. A high-dimensional data visualization approach is used to analyze the effect of speech reconstruction on the Mel-frequency cepstral coefficients (MFCC) of speech signals. Gaussian mixture model (GMM) supervectors of MFCC are used as acoustic features. Furthermore, a set of commonly used classification algorithms is applied to identify reconstructed speech; based on a comparison among the different classification methods, linear discriminant analysis (LDA)-ensemble classifiers are chosen for our algorithm. Extensive experimental results show that EERs below 1% can be achieved by the proposed algorithm in most cases, outperforming the reported state-of-the-art identification techniques.
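The GMM-supervector feature extraction mentioned above can be sketched as follows, assuming a pre-trained diagonal-covariance universal background model (UBM) whose means are MAP-adapted to each utterance's MFCC frames and then stacked; the relevance factor and function names are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def gmm_supervector(features, ubm_means, ubm_covs, ubm_weights, relevance=16.0):
    """MAP-adapt UBM means to one utterance and stack them into a supervector.

    features: (T, d) MFCC frames; ubm_means: (K, d); ubm_covs: (K, d)
    diagonal variances; ubm_weights: (K,). Returns a (K*d,) vector.
    """
    T, d = features.shape
    # log-likelihood of each frame under each diagonal Gaussian component
    diff = features[:, None, :] - ubm_means[None, :, :]          # (T, K, d)
    ll = -0.5 * ((diff ** 2 / ubm_covs).sum(-1)
                 + np.log(ubm_covs).sum(-1)
                 + d * np.log(2 * np.pi)) + np.log(ubm_weights)  # (T, K)
    post = np.exp(ll - ll.max(1, keepdims=True))
    post /= post.sum(1, keepdims=True)                           # responsibilities
    nk = post.sum(0)                                             # soft counts (K,)
    fk = post.T @ features                                       # first-order stats (K, d)
    alpha = nk / (nk + relevance)                                # adaptation factors
    adapted = (alpha[:, None] * (fk / np.maximum(nk, 1e-8)[:, None])
               + (1 - alpha[:, None]) * ubm_means)               # MAP mean update
    return adapted.ravel()
```

The resulting fixed-length supervector can then be fed to any utterance-level classifier, such as the LDA-ensemble classifiers chosen in the paper.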
High Efficiency Video Coding (HEVC/H.265) is the latest and most efficient video compression standard, a successor to H.264/AVC (Advanced Video Coding) that delivers perceptual quality equivalent to H.264/AVC with up to 50% bitrate savings. Video watermarking in the compressed domain has gained much attention in recent years as a promising solution to copyright protection, since full decoding and re-encoding of the video stream are not required for either embedding or extraction of watermark bits. We propose a robust watermarking framework with a blind extraction process for HEVC-encoded video. A readable watermark sequence is embedded invisibly in P-frames for better perceptual quality. Our watermarking framework ensures security and robustness by selecting appropriate blocks using a pseudo-random key and the spatio-temporal characteristics of the compressed video. We analyze the strengths of different compressed-domain features for implementing our watermarking framework and demonstrate its utility with experimental results. The results show that the proposed work effectively restricts both the increase in video bit rate and the degradation in perceptual quality, and that the framework is robust against different image and video processing attacks.
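The key-driven block selection described above can be sketched as below. This is only an illustrative assumption of how a secret key might reproducibly drive the choice of embedding blocks; the paper's additional filtering by spatio-temporal characteristics of the compressed video is omitted, and all names are hypothetical.

```python
import hashlib
import random

def select_blocks(num_blocks, num_bits, key):
    """Pseudo-randomly pick embedding block indices from a secret key.

    Hashing the key into a PRNG seed means the embedder and the blind
    extractor, sharing only the key, derive the identical block set.
    """
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    # sample num_bits distinct block indices, one per watermark bit
    return sorted(rng.sample(range(num_blocks), num_bits))
```

Since the extractor regenerates the same indices from the key alone, no side information about block locations needs to be transmitted, which is what makes blind extraction possible.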