It has been observed in the recent literature that the drift error caused by watermarking degrades the visual quality of the watermarked video. The existing drift error handling strategies for recent video standards such as H.264 may not be directly applicable to upcoming HD video standards (such as HEVC) because of their different compression architecture. In this paper, a compressed domain watermarking scheme is proposed for H.265/HEVC bit streams that can handle drift error propagation in both the intra and inter prediction processes. Additionally, the proposed scheme shows adequate robustness against re-compression attacks as well as common image processing attacks while maintaining decent visual quality. A comprehensive set of experiments has been carried out to justify the efficacy of the proposed scheme over the existing literature.
When running multi-player online games on IP networks with losses and delays, the order of actions may differ from the order observed on an ideal network with no delays and losses. To maintain a proper ordering of events, traditional approaches either use rollbacks to undo certain actions, or use local lags to introduce additional delays. Both may be perceived by players because the changes they introduce exceed the just-noticeable-difference (JND) threshold. In this paper we propose a novel method for ensuring a strongly consistent completion order of actions, where strong consistency refers to the same completion order as well as the same interval between any completion time and the corresponding ideal reference completion time under no network delay. We find that small adjustments within the JND to the duration of an action are not perceivable, as long as the duration is comparable to the network round-trip time (RTT). We utilize this property to control the vector of durations of actions and formulate the search for this vector as a multi-dimensional optimization problem. By using the property that players are generally more sensitive to the most prominent delay effect (the one with the highest probability of noticeability Pnotice, that is, the probability of correctly noticing a change when compared to the reference), we prove that the optimal solution occurs when the Pnotice values of the individual adjustments are equal. As this search can be done efficiently in polynomial time (<5 ms) with a small amount of space (<160 KB), it can be done at run time to determine the optimal control. Lastly, we evaluate our approach on BZFlag, a popular open-source online shooting game.
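The equal-Pnotice condition stated above can be illustrated with a small sketch. It is not the paper's algorithm: the exponential psychometric curve, the per-action `scales_ms` values, and all function names are hypothetical assumptions chosen only so that Pnotice is monotonic and invertible. Given a total timing adjustment to absorb, the sketch bisects on a common Pnotice level until the per-action adjustments add up to that total.

```python
import math

def p_notice(adjust_ms, scale_ms):
    """Hypothetical psychometric curve: chance that a change of adjust_ms is noticed."""
    return 1.0 - math.exp(-adjust_ms / scale_ms)

def adjustment_for(p, scale_ms):
    """Inverse of p_notice: the adjustment (ms) that yields noticeability p."""
    return -scale_ms * math.log(1.0 - p)

def equalize_p_notice(total_ms, scales_ms, tol=1e-9):
    """Split total_ms across actions so that every action has the same p_notice."""
    lo, hi = 0.0, 1.0 - 1e-12
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # Total adjustment absorbed if every action is pushed to noticeability mid.
        covered = sum(adjustment_for(mid, s) for s in scales_ms)
        if covered < total_ms:
            lo = mid
        else:
            hi = mid
    p = (lo + hi) / 2.0
    return p, [adjustment_for(p, s) for s in scales_ms]

# Assumed example: absorb 30 ms across three actions of differing sensitivity.
p, adjustments = equalize_p_notice(30.0, [100.0, 200.0, 300.0])
```

Under this assumed model, actions with a larger scale (lower sensitivity) absorb proportionally more of the adjustment, while the noticeability of every individual adjustment stays the same.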
Both voice conversion and hidden Markov model (HMM)-based speech synthesis can be used to produce artificial voices of a target speaker. They have been shown to have a strong negative impact on speaker verification (SV) systems. To enhance the security of SV systems, techniques for detecting converted/synthesized speech should be taken into consideration. During voice conversion and HMM-based synthesis, speech reconstruction is applied to transform a set of acoustic parameters into reconstructed speech. Hence, the identification of reconstructed speech can be used to distinguish converted/synthesized speech from human speech. Several related works on such identification have been reported, and equal error rates (EERs) lower than 5% have been achieved in detecting reconstructed speech. However, through cross-database evaluations on different speech databases, we find that the EERs of several test cases are higher than 10%. The robustness of detection algorithms to different speech databases needs to be improved. In this paper, we propose an algorithm to identify reconstructed speech. Three different speech databases and two different reconstruction methods are considered in our work, a combination that has not been addressed in the reported works. A high-dimensional data visualization approach is used to analyze the effect of speech reconstruction on the Mel-frequency cepstral coefficients (MFCC) of speech signals. The Gaussian mixture model (GMM) supervectors of MFCC are used as acoustic features. Furthermore, a set of commonly used classification algorithms is applied to identify reconstructed speech. Based on a comparison among the different classification methods, linear discriminant analysis (LDA)-ensemble classifiers are chosen for our algorithm. Extensive experimental results show that EERs lower than 1% can be achieved by the proposed algorithm in most cases, outperforming the reported state-of-the-art identification techniques.
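Since the results above are reported as equal error rates, a minimal sketch of how an EER can be approximated from detector scores may be useful. This is a generic illustration, not the evaluation code of the paper; the function name, the score convention (higher means "more likely reconstructed"), and the example scores are assumptions.

```python
def equal_error_rate(reconstructed_scores, human_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is labelled 'reconstructed' when its score is >= threshold.
    """
    thresholds = sorted(set(reconstructed_scores) | set(human_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: human speech wrongly flagged as reconstructed.
        far = sum(s >= t for s in human_scores) / len(human_scores)
        # False rejection: reconstructed speech that slips through.
        frr = sum(s < t for s in reconstructed_scores) / len(reconstructed_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Assumed toy scores with one overlapping trial on each side.
eer = equal_error_rate([0.4, 0.7, 0.8, 0.9], [0.1, 0.2, 0.3, 0.6])
```

With few trials the two error rates may never be exactly equal, so the sketch reports the average of the two rates at the threshold where they are closest.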
High Efficiency Video Coding (HEVC/H.265) is the latest and most efficient video compression standard, a successor to H.264/AVC (Advanced Video Coding) that delivers perceptual quality equivalent to H.264/AVC with up to 50% bitrate savings. Video watermarking in the compressed domain has gained much attention in recent years as a promising solution for copyright protection, since fully decoding and re-encoding the video stream is not required for either embedding or extraction of the watermark bits. We propose a robust watermarking framework with a blind extraction process for HEVC encoded video. A readable watermark sequence is embedded invisibly in P-frames for better perceptual quality. Our watermarking framework provides security and robustness by selecting appropriate blocks using a pseudo-random key and the spatio-temporal characteristics of the compressed video. We analyze the strengths of different compressed domain features for implementing our watermarking framework. We demonstrate the utility of the proposed work with experimental results. The results show that the proposed work effectively restricts the increase in video bit rate and the degradation in perceptual quality. The proposed framework is robust against different image and video processing attacks.
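The idea of selecting embedding blocks with a pseudo-random key and block characteristics can be sketched as follows. This is only an illustration under stated assumptions: `block_activity` (one number per block standing in for a spatio-temporal feature), the `min_activity` threshold, and the function name are hypothetical, and Python's `random.Random` is used for brevity even though it is not cryptographically secure.

```python
import random

def select_blocks(block_activity, key, count, min_activity=10.0):
    """Pick `count` embedding blocks, reproducibly, from sufficiently active blocks."""
    # Hypothetical suitability rule: keep blocks whose activity passes a threshold.
    candidates = [i for i, a in enumerate(block_activity) if a >= min_activity]
    # Seeding with the secret key makes the choice repeatable for the key holder.
    rng = random.Random(key)
    return sorted(rng.sample(candidates, min(count, len(candidates))))

# Assumed per-block activity values for one P-frame.
activity = [3.0, 12.5, 40.1, 9.9, 22.0, 15.3, 1.2, 30.8]
embed_side = select_blocks(activity, key=20240501, count=3)
extract_side = select_blocks(activity, key=20240501, count=3)
```

Because both sides seed the generator with the same key, the extractor can locate the same blocks without access to the original video, provided the feature used for candidate selection is unchanged by embedding.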
Solfège is a general technique used in the music learning process. It involves the vocal performance of melodies, observing the time and duration of musical sounds as specified in the music score, properly associated with the meter-mimicking performed by hand movement. This paper presents an audiovisual approach for the automatic assessment of this relevant musical study practice. The proposed system combines the meter-mimicking gesture (video information) with the melodic transcription (audio information), where the hand movement works as a metronome, controlling the time flow (tempo) of the musical piece. Thus, the meter-mimicking is used to align the music score (ground truth) with the sung melody, allowing assessment even in time-dynamic scenarios. Audio analysis is applied to obtain the melodic transcription of the sung notes, and the solfège performances are evaluated by a set of Bayesian classifiers generated from real evaluations made by expert listeners.
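Using the hand movement as a metronome amounts to mapping score positions (in beats) onto the time axis of the recording. The sketch below shows one simple way to do this, by linear interpolation between beat timestamps; it is not the paper's method, and the note tuple layout, the function names, and the example beat times are assumptions.

```python
def beat_to_time(beat_position, beat_times):
    """Map a score position in beats to seconds using detected beat timestamps."""
    # Clamp to the last interval so positions on the final beat stay in range.
    i = max(0, min(int(beat_position), len(beat_times) - 2))
    frac = beat_position - i
    return beat_times[i] + frac * (beat_times[i + 1] - beat_times[i])

def align_score(score_notes, beat_times):
    """Convert (onset_beats, duration_beats, pitch) notes to (start_s, end_s, pitch)."""
    return [
        (beat_to_time(onset, beat_times),
         beat_to_time(onset + duration, beat_times),
         pitch)
        for onset, duration, pitch in score_notes
    ]

# Assumed gesture-derived beat times (seconds) for a performance that slows down.
beats = [0.0, 0.5, 1.1, 1.8]
aligned = align_score([(0.0, 1.0, "C4"), (1.0, 0.5, "D4"), (1.5, 1.5, "E4")], beats)
```

The aligned note intervals can then be compared with the transcribed sung notes even when the tempo changes during the performance.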