ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)

Latest Articles

Statistical Early Termination and Early Skip Models for Fast Mode Decision in HEVC INTRA Coding

In this article, statistical Early Termination (ET) and Early Skip (ES) models are proposed for fast...


[December 2018]


Special issue call: "Multimodal Machine Learning for Human Behavior Analysis". Call for papers. Submission deadline April 15th, 2019

Special issue call: "Computational Intelligence for Biomedical Data and Imaging". Call for papers. Submission deadline May 30th, 2019

Special issue call: "Smart Communications and Networking for Future Video Surveillance". Call for papers. Submission deadline June 30th, 2019

Special issue call: "Trusted Multimedia for Smart Digital Environments". Call for papers. Submission deadline September 20th, 2019


News archive
Forthcoming Articles

Cross-domain brain CT image smart segmentation via shared hidden space transfer FCM clustering

Active Balancing Mechanism for Imbalanced Medical Data in Deep Learning based Classification Models

Advanced Stereo Seam Carving by Considering Occlusions on Both Sides

Stereo image retargeting plays a significant role in image processing. It aims to keep major objects as prominent as possible when the resolution of an image is changed, while maintaining disparity and depth information. Many researchers in relevant fields have proposed seam carving methods that generally preserve the geometric consistency of the images. However, they did not take into account the regions of occlusion on both sides. We propose a solution to this problem using a new seam-finding strategy that considers occluded and occluding regions in both input images and leaves the geometric consistency of both images intact. We also introduce line segment detection and superpixel segmentation to further improve the quality of the images. Imaging effects are optimized in the process, and visual comfort, which is also influenced by other factors, can be improved as well.

Eigenvector-Based Distance Metric Learning for Image Classification and Retrieval

Distance metric learning has been widely studied in multifarious research fields. The mainstream approaches learn a Mahalanobis metric or learn a linear transformation. Recent related works propose learning a linear combination of base vectors to approximate the metric. In this way, fewer variables need to be determined, which is efficient when facing high-dimensional data. Nevertheless, such works obtain base vectors using additional data from related domains or randomly generate base vectors. However, obtaining base vectors from related domains requires extra time and additional data, and random vectors introduce randomness into the learning process, which requires sufficient random vectors to ensure the stability of the algorithm. Moreover, the random vectors cannot capture the rich information of the training data, leading to a degradation in performance. Considering these drawbacks, we propose a novel distance metric learning approach by introducing base vectors explicitly learned from training data. Given a specific task, we can make a sparse approximation of its objective function using the top eigenvalues and corresponding eigenvectors of a predefined integral operator on the reproducing kernel Hilbert space. Because the process of generating eigenvectors simply refers to the training data of the considered task, our proposed method does not require additional data and can reflect the intrinsic information of the input features. Furthermore, the explicitly learned eigenvectors do not result in randomness, and we can extend our method to any kernel space without changing the objective function. We only need to learn the coefficients of these eigenvectors, and the only hyperparameter that we need to determine is the number of eigenvectors that we utilize. Additionally, an optimization algorithm is proposed to efficiently solve this problem. Extensive experiments conducted on several datasets demonstrate the effectiveness of our proposed method.
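The idea of approximating a metric by a linear combination of base vectors, as mentioned in the abstract above, can be sketched as follows. This is only an illustration under stated assumptions: the metric is assumed to take the form M = sum_i w_i * v_i v_i^T, and the function name `combined_distance`, the base vectors, and the weights are hypothetical values chosen for the example, not taken from the article.

```python
def combined_distance(x, y, base_vectors, weights):
    """Squared distance (x - y)^T M (x - y) with M = sum_i w_i * v_i v_i^T.

    Only the coefficients `weights` would be learned; the base vectors are
    fixed in advance (here they are made-up illustrative values).
    """
    diff = [a - b for a, b in zip(x, y)]
    total = 0.0
    for v, w in zip(base_vectors, weights):
        # Project the difference onto one base vector and weight its square.
        projection = sum(vi * di for vi, di in zip(v, diff))
        total += w * projection ** 2
    return total


# Hypothetical base vectors (the standard basis) and coefficients.
base_vectors = [[1.0, 0.0], [0.0, 1.0]]
weights = [2.0, 0.5]
d = combined_distance([3.0, 4.0], [1.0, 2.0], base_vectors, weights)
```

With unit weights and an orthonormal basis, the function reduces to the squared Euclidean distance, which is one way to sanity-check it.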

Spatial Structure Preserving Feature Pyramid Network for Semantic Image Segmentation

Recently, progress on semantic image segmentation has been substantial, benefiting from the rapid development of Convolutional Neural Networks (CNNs). Recently proposed semantic image segmentation approaches have mostly been based on Fully Convolutional Networks (FCNs). However, these FCN-based methods use large receptive fields and too many pooling layers to capture the discriminative semantic information of the images. These operations often cause low spatial resolution inside deep layers, which leads to spatially fragmented predictions. To address this problem, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to extract feature maps with different resolutions, and take full advantage of these feature maps by fusing them in a gradually stacked manner. Specifically, for two adjacent convolutional layers, we upsample the features from the deeper layer with a stride of 2 and then stack them on the features from the shallower layer. A convolutional layer with 1 × 1 kernels then fuses these stacked features. The fused feature retains the spatial structure information of the image while offering strong discriminative capability for pixel classification. Additionally, to further preserve the spatial structure information and regional connectivity of the predicted category label map, we propose a novel loss term for the network. In detail, two graph-model-based spatial affinity matrices are proposed, which depict the pixel-level relationships in the input image and in the predicted category label map, respectively; their cosine distance is then back-propagated through the network. The proposed architecture, called spatial structure preserving feature pyramid network (SSPFPN), significantly improves the spatial resolution of the predicted category label map for semantic image segmentation.
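The fusion step for two adjacent layers described above (upsample by 2, stack, fuse with a 1 × 1 convolution) can be sketched in plain Python. This is a minimal sketch, not the article's implementation: nearest-neighbour upsampling is an assumption, and the function names `upsample2x` and `fuse` as well as the kernel weights are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a [C][H][W] feature map (assumed mode)."""
    out = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def fuse(deep, shallow, weights, bias):
    """Stack upsampled deep features on shallow ones, then apply a 1x1 convolution.

    weights[o][c] is the 1x1 kernel from stacked channel c to output channel o.
    """
    stacked = upsample2x(deep) + shallow  # channel-wise concatenation
    height, width = len(stacked[0]), len(stacked[0][0])
    fused = []
    for o, kernel in enumerate(weights):
        fused.append([
            [bias[o] + sum(k * stacked[c][i][j] for c, k in enumerate(kernel))
             for j in range(width)]
            for i in range(height)
        ])
    return fused


deep = [[[1.0]]]                       # 1 channel, 1x1 (deeper, coarser layer)
shallow = [[[1.0, 2.0], [3.0, 4.0]]]   # 1 channel, 2x2 (shallower, finer layer)
fused = fuse(deep, shallow, weights=[[0.5, 1.0]], bias=[0.0])
```

Because the 1 × 1 kernel mixes channels at each pixel independently, the fused map keeps the resolution of the shallower layer.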

Deep Scalable Supervised Quantization by Self-Organizing Map

Approximate Nearest Neighbor (ANN) search is an important research topic in the multimedia and computer vision fields. In this paper, we propose a new deep supervised quantization method based on the Self-Organizing Map to address this problem. Our method integrates Convolutional Neural Networks (CNNs) and the Self-Organizing Map (SOM) into a unified deep architecture. The overall training objective optimizes a supervised quantization loss as well as a classification loss. With the supervised quantization objective, we minimize the differences on the maps between similar image pairs and maximize the differences on the maps between dissimilar image pairs. Through optimization, the deep architecture can simultaneously extract deep features and quantize the features into suitable nodes in the self-organizing map. To make the proposed deep supervised quantization method scalable to large datasets, instead of constructing a larger self-organizing map, we propose to divide the input space into several subspaces and construct a self-organizing map in each subspace. The self-organizing maps in all the subspaces implicitly construct a large self-organizing map, which costs less memory and training time than directly constructing a self-organizing map of equal size. Experiments on several public standard datasets demonstrate the superiority of our approach over existing ANN search methods. In addition, as a byproduct, our deep architecture can be directly applied to classification and visualization tasks with little modification, and promising performance is demonstrated on these tasks in the experiments.
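The memory argument for per-subspace maps can be illustrated with a small lookup sketch. It assumes the maps are already trained and ignores the SOM neighbourhood topology and the CNN entirely; the helper names `nearest_node` and `quantize` and all node weights are hypothetical, and the feature length is assumed to be divisible by the number of subspaces.

```python
def nearest_node(vector, nodes):
    """Index of the node whose weight vector is closest (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(nodes)), key=lambda k: sqdist(vector, nodes[k]))


def quantize(feature, subspace_maps):
    """Split `feature` into equal sub-vectors and quantize each on its own map."""
    m = len(subspace_maps)
    size = len(feature) // m
    return tuple(
        nearest_node(feature[i * size:(i + 1) * size], subspace_maps[i])
        for i in range(m)
    )


# Two subspaces, each with a tiny 3-node map (made-up weights).
maps = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    [[0.0, 1.0], [1.0, 0.0], [5.0, 5.0]],
]
code = quantize([0.9, 1.2, 4.0, 4.5], maps)

stored_nodes = sum(len(nodes) for nodes in maps)  # weight vectors actually kept
implicit_nodes = 1
for nodes in maps:
    implicit_nodes *= len(nodes)                  # distinct index combinations
```

The number of stored weight vectors grows with the sum of the map sizes, while the number of distinguishable codes grows with their product.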

HGAN: Holistic Generative Adversarial Networks for 2D Image-based 3D Object Retrieval

In this paper, we propose a novel method to handle the 2D image-based 3D object retrieval problem. First, we extract a set of virtual views to represent each 3D object, and a soft-attention model is used to find the weight of each view in order to select one characteristic view for each 3D object. Second, we propose novel Holistic Generative Adversarial Networks (HGAN) to solve the cross-domain feature representation problem and bring the feature space of the virtual characteristic view closer to the feature space of real pictures. This effectively mitigates the distribution discrepancy between the 2D image domain and the 3D object domain. Finally, we use the generative model of HGAN to obtain the "virtual real image" of each 3D object so that the characteristic view of the 3D object and the real picture share the same feature space for retrieval. To demonstrate the performance of our approach, we set up a new dataset that includes pairs of 2D images and 3D objects, where the 3D objects are based on the ModelNet40 dataset. The experimental results demonstrate the superiority of our proposed method over state-of-the-art methods.

Learning Discriminative Sentiment Representation from Strongly- and Weakly-Supervised CNNs

Visual sentiment analysis is receiving increasing attention with the rapidly growing number of images uploaded to social websites. Learning rich visual representations often requires training deep Convolutional Neural Networks (CNNs) on massive manually labeled data, which is expensive or scarce, especially for a subjective task like visual sentiment analysis. Meanwhile, a large quantity of social images, readily available yet noisy, can be obtained by querying social networks using the sentiment categories as keywords, so that various types of images related to a specific sentiment can be easily collected. In this paper, we propose a multiple kernel network (MKN) for visual sentiment recognition, which learns representations from strongly- and weakly-supervised CNNs. Specifically, the weakly-supervised deep model is trained using large-scale data from social images, while the strongly-supervised deep model is fine-tuned on affective datasets that are manually labeled. We employ the multiple kernel scheme on multiple layers of these CNNs, which can automatically select the discriminative representation by learning a linear combination of a set of predefined kernels. In addition, we introduce a large-scale dataset collected from popular comics of various countries, e.g., America, Japan, China and France, which consists of 11,821 images with various artistic styles. Experimental results show that MKCNN achieves consistent improvements over state-of-the-art methods on the public affective datasets as well as the newly established comics dataset.

AB-LSTM: Attention-Based Bidirectional LSTM Model for Scene Text Detection

Detection of scene text in arbitrary shapes is a challenging task in the field of computer vision. Most existing scene text detection methods use rectangular or quadrangular bounding boxes to denote the detected text, which fail to accurately fit text with arbitrary shapes, such as curved text. In addition, recent progress on scene text detection has benefited from Fully Convolutional Networks. Text cues contained in multi-level convolutional features are complementary for detecting scene text objects, but how to exploit these multi-level features is still an open problem. To tackle the above issues, we propose an Attention-based Bidirectional Long Short-Term Memory (AB-LSTM) model for scene text detection. First, word stroke regions (WSRs) and text center blocks (TCBs) are extracted by two AB-LSTM models, respectively. Then, the union of WSRs and TCBs is used to represent text objects. To validate the effectiveness of the proposed method, we perform experiments on four public benchmarks, CTW1500, Total-text, ICDAR2013, and MSRA-TD500, and compare it with existing state-of-the-art methods. Experimental results demonstrate that the proposed method achieves competitive results and handles scene text with arbitrary shapes (horizontal, oriented, and curved) well.

Rethinking the Combined and Individual Orders of Derivative of States for Differential Recurrent Neural Networks

Due to the special gating schemes of Long Short-Term Memory (LSTM), LSTMs have shown greater potential to process complex sequential information than the traditional Recurrent Neural Network (RNN). The conventional LSTM, however, fails to take into consideration the impact of salient spatio-temporal dynamics present in the sequential input data. This problem was first addressed by the differential Recurrent Neural Network (dRNN), which uses a differential gating scheme known as Derivative of States (DoS). DoS uses higher orders of internal state derivatives to analyze the change in information gain caused by the salient motions between the successive frames. The weighted combination of several orders of DoS is then used to modulate the gates in dRNN. While each individual order of DoS is good at modeling a certain level of salient spatio-temporal sequences, the sum of all the orders of DoS could distort the detected motion patterns. To address this problem, we propose to control the LSTM gates via individual orders of DoS. To fully utilize the different orders of DoS, we further propose to stack multiple levels of LSTM cells in an increasing order of state derivatives. The proposed model progressively builds up the ability of the LSTM gates to detect salient dynamical patterns in deeper stacked layers modeling higher orders of DoS, and thus the proposed LSTM model is termed deep differential Recurrent Neural Network ($d^2$RNN). The effectiveness of the proposed model is demonstrated on two publicly available human activity datasets: NUS-HGA and Violent-Flows. The proposed model outperforms both LSTM and non-LSTM based state-of-the-art algorithms.
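The notion of individual orders of Derivative of States can be illustrated with finite differences. The sketch below assumes that a state derivative is approximated by the difference between successive time steps and that states are scalars; it does not show how the derivatives modulate the LSTM gates. The function name `derivative_of_states` and the example sequence are hypothetical.

```python
def derivative_of_states(states, order):
    """Return the order-th backward difference of a sequence of scalar states.

    Each pass replaces the sequence by the differences of neighbouring
    entries, so the result is `order` elements shorter than the input.
    """
    current = list(states)
    for _ in range(order):
        current = [later - earlier for earlier, later in zip(current, current[1:])]
    return current


# A made-up internal state trajectory over five time steps.
states = [0.0, 1.0, 4.0, 9.0, 16.0]
first_order = derivative_of_states(states, 1)
second_order = derivative_of_states(states, 2)
```

Keeping `first_order` and `second_order` separate, rather than summing them, mirrors the abstract's point that each order is used individually.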

Multi-scale Supervised Attentive Encoder-Decoder Network for Crowd Counting

Crowd counting is a popular topic with widespread applications. Currently, the biggest challenge in crowd counting is the large variation in object scale. In this paper, we focus on overcoming this challenge by proposing a novel Attentive Encoder-Decoder Network (AEDN), which is supervised on multiple feature scales to conduct crowd counting via density estimation. This work has three main contributions. First, we augment the traditional encoder-decoder architecture with our proposed residual attention blocks, which, beyond skip-connected encoded features, further extend the decoded features with attentive features. AEDN is better at establishing long-range dependencies between the encoder and decoder, thereby promoting more effective fusion of multi-scale features for handling scale variations. Second, we design a new KL-divergence-based distribution loss to supervise the scale-aware structural differences between two density maps, which complements the pixel-isolated MSE loss and better optimizes AEDN to generate high-quality density maps. Third, we adopt a multi-scale supervision scheme in which multiple KL-divergence and MSE losses are deployed at all decoding stages, providing more thorough supervision for different feature scales. Extensive experimental results on four public datasets, ShanghaiTech Part A, ShanghaiTech Part B, UCF-CC-50 and UCF-QNRF, reveal the superiority and efficacy of the proposed method, which outperforms most state-of-the-art competitors.
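How a KL-divergence term can complement a pixel-wise MSE term is sketched below on flattened density maps. The exact loss in the article is not given in the abstract, so the normalisation of each map to a distribution, the direction KL(target || prediction), the weighting factor `kl_weight`, and all function names are assumptions made for illustration. Both maps are assumed to be non-empty with a positive sum.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between two flattened density maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def kl_distribution_loss(pred, target, eps=1e-12):
    """KL(target || pred) after normalising both maps so that each sums to 1."""
    pred_sum, target_sum = sum(pred), sum(target)
    loss = 0.0
    for p, t in zip(pred, target):
        t_prob = t / target_sum
        p_prob = p / pred_sum
        if t_prob > 0:  # terms with zero target mass contribute nothing
            loss += t_prob * math.log(t_prob / (p_prob + eps))
    return loss


def combined_loss(pred, target, kl_weight=1.0):
    """Pixel-wise MSE plus a weighted distribution-level KL term."""
    return mse_loss(pred, target) + kl_weight * kl_distribution_loss(pred, target)


target = [0.0, 1.0, 2.0, 1.0]   # made-up ground-truth density values
exact = [0.0, 1.0, 2.0, 1.0]    # perfect prediction
shifted = [1.0, 2.0, 1.0, 0.5]  # mass placed in the wrong locations
```

In a multi-scale scheme, a loss like `combined_loss` would be evaluated once per decoding stage and the results summed.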

U-Net Conditional GANs for Photo-Realistic and Identity-Preserving Facial Expression Synthesis

Facial expression synthesis is a challenging task since expression changes are highly non-linear and depend on facial appearance. Person identity should also be well preserved in the synthesized face. In this paper, we present a novel U-Net Conditional Generative Adversarial Network (UC-GAN) for facial expression generation. U-Net helps retain the properties of the input face, including the identity information and facial details. A category condition is added to the U-Net model so that one-to-many expression synthesis can be achieved simultaneously. We also design identity-preserving constraints for facial expression synthesis to further guarantee that the identity of the input face is well preserved in the generated facial image. Specifically, we pair the generated output with condition images of other identities for the discriminator, to encourage it to learn the distinctions both between synthesized and natural images and between the input identity and other identities, which helps improve its discriminating ability. Additionally, we use the triplet loss to keep the generated facial images closer to images of the same identity by imposing a margin between positive pairs and negative pairs in feature space, where the face feature vectors are extracted from the discriminator. Both qualitative and quantitative evaluations are conducted on the Oulu-CASIA, RaFD and KDEF datasets, and the results show that our method can generate faces with natural and realistic expressions while preserving the identity information.
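The margin-based triplet loss mentioned above has a standard form that can be sketched directly. The feature vectors here are made-up two-dimensional values, and the margin of 0.2 and the use of squared Euclidean distance are assumptions; in the article the features come from the discriminator.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: the anchor should be closer to the positive
    (same identity) than to the negative (other identity) by at least `margin`."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


anchor = [0.0, 0.0]
same_identity = [0.1, 0.0]
far_identity = [1.0, 0.0]    # already well separated: no penalty
near_identity = [0.3, 0.0]   # too close to the anchor: penalised

easy = triplet_loss(anchor, same_identity, far_identity)
hard = triplet_loss(anchor, same_identity, near_identity)
```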

Chunk duration-aware SDN-assisted DASH

Although Dynamic Adaptive Streaming over HTTP (DASH) is the pillar of multimedia content delivery mechanisms, its purely client-based adaptive video bitrate mechanisms have quality of experience (QoE) fairness and stability problems in the presence of multiple DASH clients and highly fluctuating background traffic on the same shared bottleneck link. Varying chunk durations among different titles of multiple video providers exacerbate this problem. With the help of the global network view provided by the Software-Defined Networking (SDN) paradigm, we propose an adaptive video bitrate mechanism assisted by a centralized joint optimization module, which takes the diversity of chunk sizes among different content into account. Our system collects the possible video bitrate levels and chunk durations from DASH clients and calculates the optimal video bitrate per client based on the available capacity and the chunk duration of each client's selected content, without invading users' privacy. By continuously following the background traffic flows, it asynchronously updates the target video bitrate levels to avoid both buffer stall events and network under-utilization, rather than relying on bandwidth slicing, which brings about scalability problems in practice. It also guarantees fair start-up delays for video sessions with various chunk durations. Our experiments show that the proposed approach, which considers the diversity of chunk durations and background traffic fluctuations, can provide significantly better and fairer QoE in terms of SSIM-based video quality and start-up delay compared to both purely client-based and state-of-the-art SDN-based adaptive bitrate mechanisms.

Internet of Things Based Trusted Hypertension Management App Using Mobile Technology

An app for hypertension management is developed. The web-roadmap technology is used to develop the app; this technology has five steps, namely planning, analysis, design, implementation, and evaluation. The hypertension management app is tested with patients with hypertension (N=56). Their medication possession ratio is calculated before and after using the hypertension management app for a period of five weeks. Of the 56 participants, 45 showed medication adherence. The medication possession ratio is calculated using the Morisky scale, and there is an improvement in the patients' health after using the hypertension management app (p=.001). The calculated usefulness score is 3.9 out of 5. User satisfaction after using the hypertension app is calculated for the different processes: 4.5 for recording blood pressure, 4.0 for recording the medication ratio, 3.4 for sending the data, 4.3 for the alerting process, and 5 for alerting about the medication process. This paper shows that a mobile app for hypertension based on clinical practice guidelines is effective in improving the patients' health.

A Decision Support System with Intelligent Recommendation for Multi-Disciplinary Medical Treatment

Recent years have witnessed an emerging trend for improving disease treatment by forming multi-disciplinary medical teams. The collaboration among specialists from multiple medical domains has been shown to be significantly helpful for designing comprehensive and reliable regimens, especially for incurable diseases. Although this kind of multi-disciplinary treatment has been increasingly adopted by healthcare providers, a new challenge has been introduced to the decision-making process: how to efficiently and effectively develop final regimens by searching for candidate treatments and considering inputs from every expert. In this paper, we present a sophisticated decision support system called MdtDSS (a decision support system (DSS) for multi-disciplinary treatment (Mdt)), which is specifically developed to guide collaborative decision-making in multi-disciplinary treatment scenarios. The system integrates a recommender system, which aims to search for personalized candidates from a large-scale high-quality regimen pool, and a voting system, which helps collect feedback from multiple specialists without potential bias. Our decision support system optimally combines machine intelligence and human experience and helps medical practitioners make informed and accountable regimen decisions. We deployed the proposed system in a large hospital in Shanghai, China, and collected real-world data on large-scale patient cases. The evaluation shows that the proposed system achieves outstanding results in terms of high-quality multi-disciplinary treatment.

Textual Entailment based Figure Summarization for Biomedical Articles

This paper proposes a novel approach (FigSum++) for automatic figure summarization in biomedical scientific articles using a multi-objective evolutionary algorithm. The problem is treated as a binary optimization problem in which relevant sentences for the summary of a given figure are selected based on various sentence scoring features: the textual entailment score between sentences in the summary and the figure's caption, the number of sentences referring to the figure, the semantic similarity between sentences and the figure's caption, the number of overlapping words between sentences and the figure's caption, etc. These features are optimized simultaneously using multi-objective binary differential evolution (MBDE). MBDE consists of a set of solutions, and each solution represents a subset of sentences to be selected for the summary. MBDE generally uses a single DE variant; here, however, an ensemble of two different DE variants, measuring diversity among solutions and convergence towards the global optimal solution, respectively, is employed for efficient search. Usually, in any summarization system, diversity among sentences in the summary (called anti-redundancy) is a critical feature, and it is calculated in terms of similarity (such as cosine similarity) among sentences. In this paper, a new way of measuring diversity in terms of textual entailment is proposed. To represent the sentences of the article as numeric vectors, the recently proposed BioBERT, a language model pre-trained for biomedical text mining, is utilized. An ablation study is also presented to determine the importance of the different objective functions. For evaluation of the proposed technique, two benchmark biomedical datasets containing 91 and 84 figures, respectively, are considered. Our proposed system obtains 5% and 11% improvements in F-measure over the two datasets, respectively, in comparison to the state of the art.

Paillier Cryptosystem based Mean Value Computation for Encrypted Domain Image Processing Operations

Due to its large storage capacity and high-end computing capability, cloud computing has received great attention, as huge amounts of personal multimedia data and computationally expensive tasks can be outsourced to the cloud. However, the cloud, being a semi-trusted third party, is prone to privacy risks. Signal processing in the encrypted domain (SPED) has arisen as a new research paradigm for privacy-preserving processing of data outsourced to a semi-trusted cloud. In this paper, we propose a solution for non-integer mean value computation in the homomorphic encrypted domain without any interactive protocol between the client and the service provider. Using the proposed solution, various image processing operations such as local smoothing filtering, unsharp masking and histogram equalization can be performed in the encrypted domain at the cloud server without any privacy concerns. Our experimental results on standard test images reveal that these image processing operations can be performed without pre-processing, without a client-server interactive protocol, and without any error between the encrypted domain and the plain domain.
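The additive homomorphism of the Paillier cryptosystem, on which encrypted-domain averaging builds, can be shown with a toy example. This sketch is not the article's solution: here the server only sums encrypted pixels, and the division that yields the mean is done by the key holder after decryption. The primes are tiny and insecure, the generator is fixed to g = n + 1, and the function names and pixel values are hypothetical.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; this shortcut is valid for g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) as g^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    """Recover m as L(c^lambda mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def add_encrypted(n, ciphertexts):
    """Multiplying ciphertexts modulo n^2 adds the underlying plaintexts."""
    total = 1
    for c in ciphertexts:
        total = total * c % (n * n)
    return total


n, private_key = keygen(293, 433)                 # insecure demonstration primes
pixels = [12, 200, 37, 255, 0, 91, 143, 8, 64]    # a made-up 3x3 window
encrypted_sum = add_encrypted(n, [encrypt(n, v) for v in pixels])
decrypted_sum = decrypt(n, private_key, encrypted_sum)
mean = decrypted_sum / len(pixels)                # division done in the clear
```

The sum must stay below n for decryption to be exact, which is why the modulus has to be large relative to the window size and pixel range.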

Adaptive Chunklets and AQM for Higher Performance Content Streaming

Commercial streaming services such as Netflix and YouTube use proprietary HTTP-based adaptive streaming (HAS) techniques to deliver content to consumers worldwide. MPEG recently developed Dynamic Adaptive Streaming over HTTP (DASH) as a unifying standard for HAS-based streaming. In DASH systems, streaming clients employ adaptive bitrate (ABR) algorithms to maximise user Quality of Experience (QoE) under variable network conditions. In a typical Internet-enabled home, video streams have to compete with diverse application flows for the last-mile Internet Service Provider (ISP) bottleneck capacity. Under such circumstances, ABRs will only act upon the fraction of the network capacity that is available, leading to possible QoE degradation. We have previously proposed chunklets as an approach orthogonal to ABR, which uses parallel connections for intra-video chunk retrieval. Chunklets effectively make more bandwidth available for ABRs in the presence of cross-traffic, especially in environments where Active Queue Management (AQM) schemes such as PIE and FQ-CoDel are deployed. However, chunklets consume valuable server/middlebox resources, which typically handle hundreds of thousands of requests/connections per second. In this paper, we propose "adaptive chunklets", a novel chunklet enhancement that dynamically tunes the number of concurrent connections. We demonstrate that the combination of adaptive chunkleting and FQ-CoDel is the most effective strategy. Our experiments show that adaptive chunklets can reduce the number of connections by almost 35% and consume almost 11% less bandwidth than fixed chunklets while providing the same QoE.
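Retrieving one video chunk over several parallel connections can be sketched as splitting the chunk into byte ranges. The abstract does not say how chunklets are requested, so the use of HTTP `Range` headers is an assumption, as are the function name `chunklet_ranges` and the example sizes; the adaptive policy that chooses the number of connections is not modelled.

```python
def chunklet_ranges(chunk_size, connections):
    """Split a chunk of `chunk_size` bytes into contiguous, inclusive byte
    ranges, one per parallel connection, with sizes differing by at most 1."""
    if chunk_size <= 0 or connections <= 0:
        raise ValueError("chunk_size and connections must be positive")
    connections = min(connections, chunk_size)  # never create empty ranges
    base, extra = divmod(chunk_size, connections)
    ranges = []
    start = 0
    for i in range(connections):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# Hypothetical 1000-byte chunk fetched over three connections.
ranges = chunklet_ranges(1000, 3)
headers = ["bytes={}-{}".format(first, last) for first, last in ranges]
```

An adaptive scheme would vary the `connections` argument from one chunk to the next instead of keeping it fixed.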

A Pseudo-likelihood Approach For Geo-localization of Events From Crowd-sourced Sensor-Metadata

Events such as a live concert, a protest march, or an exhibition are often video-recorded by many people at the same time, typically using smartphones. In this work, we address the problem of geo-localizing such events from crowd-generated data. Traditional approaches for solving such a problem using multiple video sequences of the event would require highly complex co

Soul Dancer: Emotion-based Human Action Generation

Body language is one of the most common ways of expressing human emotion. In this paper, we make the first attempt to generate an action video with a specific emotion from a single image of a person. The task of emotion-based action generation (EBAG) can be defined as follows: given a type of emotion and a full-body human image, generate an action video in which the person in the source image expresses the given emotion. We divide the task into two parts and propose a two-stage framework to generate action videos that express emotion. In the first stage, we propose an RNN-based LS-GAN for translating the emotion into a pose sequence. In the second stage, we generate the target video frames from three inputs, namely the source pose and the target pose as the motion information and the source image as the appearance reference, using a conditional GAN model with an online training strategy. Our framework produces the pose sequence and transforms the action independently, which underlines the fundamental role that the high-level pose feature plays in generating action video with a specific emotion. The proposed method has been evaluated on the "Soul Dancer" dataset, which is built for action emotion analysis and generation. The experimental results demonstrate that our framework can effectively solve the emotion-based action generation task. However, a gap in the details of the appearance between the generated action video and real-world video still exists, which indicates that the emotion-based action generation task has great research potential.

Cell Nuclei Classification In Histopathological Images using Hybrid OLConvNet

Computer-aided histopathological image analysis for cancer detection is a major research challenge in the medical domain. Automatic detection and classification of nuclei for cancer diagnosis pose many challenges in developing state-of-the-art algorithms due to the heterogeneity of cell nuclei and dataset variability. Recently, a multitude of classification algorithms have used complex deep learning models for their datasets. However, most of these methods are rigid, and their architectural arrangement suffers from inflexibility and non-interpretability. In this research article, we propose a hybrid and flexible deep learning architecture, OLConvNet, that integrates the interpretability of traditional object-level features and the generalization of deep learning features by using a shallower Convolutional Neural Network (CNN) named CNN3L. CNN3L reduces the training time by training fewer parameters, thereby eliminating space constraints imposed by deeper algorithms. We used the F1-score and multiclass Area Under the Curve (AUC) performance parameters to compare the results. To further strengthen the viability of our architectural approach, we tested our proposed methodology with the state-of-the-art deep learning architectures AlexNet, VGG16 and VGG19 as backbone networks. After a comprehensive analysis of classification results from all four architectures, we observed that our proposed model works well and performs better than contemporary complex algorithms.

Subtitle Region Selection of S3D Images in Consideration of Visual Discomfort and Viewing Habit

Emotion Recognition with Multi-hypergraph Neural Networks Combining Multimodal Physiological Signals

Emotion recognition from physiological signals is an effective way to discern the inner state of human beings and has therefore been widely adopted in user-centered work, such as driver status monitoring, telemedicine and other tasks. The majority of present studies on emotion recognition are devoted to exploring the relationship between emotion and physiological signals with the subjects seen as a whole. However, given some features of the natural process of emotional expression, it is an urgent task to characterize latent correlations among multimodal physiological signals and to pay attention to the influence of individual differences in order to exploit associations among individual subjects. To tackle this problem, we propose multi-hypergraph neural networks (MHGNN) to recognize emotions from physiological signals. The method constructs a multi-hypergraph structure, in which each hypergraph is built from one type of physiological signal to formulate correlations among different subjects. Each vertex in a hypergraph stands for one subject with a description of its related stimuli, and the hyperedges represent the connections among the vertices. With the multi-hypergraph structure of the subjects, emotion recognition is transformed into classification of vertices in the multi-hypergraph structure. Experimental results on the DEAP and ASCERTAIN datasets demonstrate that the proposed method outperforms current state-of-the-art methods. The contrast experiments show that MHGNN is capable of describing the real process of biological response with much higher precision.

Synthesizing facial photometries and corresponding geometries using generative adversarial networks

Artificial data synthesis is currently a well-studied topic with useful applications in data science, computer vision, graphics and many other fields. Generating realistic data is especially challenging since human perception is highly sensitive to non-realistic appearance. Recent advances in GAN architecture and training procedures have driven the capabilities of synthetic data generation to new heights of realism. These successful models, however, are tuned mostly for use with regularly sampled data such as images, audio and video. Despite the wide success on these types of media, applying the same tools to geometric data poses a far greater challenge, which is still a hot topic of debate within the academic community. The lack of intrinsic parametrization inherent to geometric objects prohibits the direct use of convolutional filters, a main building block of today's machine learning systems. In this paper we propose a new method for generating realistic human facial geometries coupled with overlaid textures. We circumvent the parametrization issue by imposing a global mapping from our data to the unit rectangle. This mapping enables the representation of our geometric data as regularly sampled 2D images. We further discuss how to design such a mapping in order to control the mapping distortion and conserve area within the mapped image. By representing geometric textures and geometries as images, we are able to use advanced GAN methodologies in order to generate new geometries. We address the often neglected topic of the relation between texture and geometry and propose to use this correlation in order to match generated textures with their corresponding geometries. In addition, we widen the scope of our discussion and offer a new method for training GAN models on partially corrupted data. Finally, we provide empirical evidence to support our claim that our generative model is able to produce examples of new people who do not exist within the training data while maintaining high realism and texture detail, two traits that are often at odds.

Efficient Face Alignment with Fast Normalization and Contour Fitting Loss

Face alignment is a key component of numerous face analysis tasks. In recent years, most existing methods have focused on designing high-performance face alignment systems and paid less attention to efficiency. However, more and more face alignment systems are applied on low-cost devices, such as mobile phones. In this paper, we design an efficient lightweight CNN-based regression framework with a novel contour fitting loss, achieving competitive performance with other state-of-the-art methods. We discover that the maximum error exists in the face contour, where landmarks do not have distinct semantic positions and thus are randomly labeled along the face contours in training data. To address this problem, we reshape the common L2 loss to dynamically adjust the regression targets during network training, so that the network can learn more accurate semantic meanings of the contour landmarks and achieve better localization performance. Meanwhile, we systematically analyze the effects of pose variations in the face alignment task and design an efficient framework with a Fast Normalization Module (FNM) and a Lightweight Alignment Module (LAM), which quickly normalizes the in-plane rotation and efficiently localizes the landmarks. Our method achieves competitive performance with the state of the art on the 300W benchmark, and its speed is significantly faster than that of other CNN-based approaches.

A Unified Tensor-based Active Appearance Model

Appearance variations result in many difficulties in face image analysis. To deal with this challenge, we present a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture information of 2D faces. For each type of face information, namely shape and texture, we construct a unified tensor model capturing all relevant appearance variations. This contrasts with the variation-specific models of the classical tensor AAM. To achieve the unification across pose variations, a strategy for dealing with self-occluded faces is proposed to obtain consistent shape and texture representations of pose-varied faces. In addition, our UT-AAM is capable of constructing the model from an incomplete training dataset, using tensor completion methods. Last, we use an effective cascaded-regression-based method for UT-AAM fitting. With these advancements, the utility of UT-AAM in practice is considerably enhanced. As an example, we demonstrate the improvements in training facial landmark detectors through the use of UT-AAM to synthesise a large number of virtual samples. Experimental results obtained using the Multi-PIE and 300-W face datasets demonstrate the merits of the proposed approach.

Random Forest with Self-paced Bootstrap Learning in Lung Cancer Prognosis

Training supervised learning approaches on gene expression data has the potential to decrease cancer death rates by developing prediction strategies for lung cancer treatment, but the samples of gene features still involve a lot of noise. In this study, we present a random forest with self-paced bootstrap learning to improve lung cancer prognosis and classification based on gene expression data. Specifically, we propose an ensemble learning approach with random forest that improves the model's classification performance by selecting multiple classifiers. We also investigate a sampling strategy that gradually embeds samples from high to low quality through self-paced learning. The results on five public lung cancer datasets show that our proposed method can select significant genes and improve classification performance compared to existing approaches. We believe that our proposed method has the potential to assist doctors in gene selection and lung cancer prognosis.

Sequential Cross-Modal Hashing Learning via Multi-scale Correlation Mining

Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space through hash functions, and achieves fast and flexible cross-modal retrieval. Most existing cross-modal hashing methods learn hash functions by mining the correlation among multimedia data, but ignore an important property of multimedia data: each modality has features of different scales, such as texture, object and scene features in an image, which can provide complementary information for boosting the retrieval task. The correlations among the multi-scale features are more abundant than the correlations between single features of multimedia data; they reveal a finer underlying structure of the multimedia data and can be used for effective hash function learning. Therefore, we propose the Multi-scale Correlation Sequential Cross-modal Hashing (MCSCH) approach, whose main contributions can be summarized as follows: 1) A multi-scale feature guided sequential hashing learning method is proposed to share the information from features of different scales through an RNN-based network and generate the hash codes sequentially. The features of different scales are used to guide the hash code generation, which can enhance the diversity of the hash codes and weaken the influence of errors in specific features, such as false object features caused by occlusion. 2) A multi-scale correlation mining strategy is proposed to align the features of different scales in different modalities and mine the correlations among aligned features. These correlations reveal a finer underlying structure of multimedia data and can help to boost hash function learning. 3) A correlation evaluation network evaluates the importance of the correlations to select the worthwhile ones, and increases the impact of these correlations on hash function learning. Experiments on two widely used 2-media datasets and a 5-media dataset demonstrate the effectiveness of our proposed MCSCH approach.

Smart Diagnosis: A Multiple-Source Transfer TSK Fuzzy System for EEG Seizure Identification

In order to effectively identify Electroencephalogram (EEG) signals in multiple source domains, a transductive multiple-source transfer learning method called MS-TL-TSK is proposed, which combines multiple-source transfer learning and manifold regularization (MR) learning mechanisms in a Takagi-Sugeno-Kang (TSK) fuzzy system. Specifically, the advantages of MS-TL-TSK include: (1) By evaluating the significance of each source domain, a flexible domain weighting index is presented; (2) Using the theory of sample transfer learning, a re-weighting strategy is presented to weigh the prediction of unknown samples in the target domain and the output of source prediction functions; (3) By taking into account the MR term, the manifold structure of the target domain is effectively maintained in the proposed system; (4) By inheriting the interpretability of the TSK fuzzy system (TSK-FS), MS-TL-TSK has good interpretability that is understandable by human beings (domain experts) for identifying EEG signals. The effectiveness of the proposed fuzzy system is demonstrated on several EEG multiple-source transfer learning problems.

Action Recognition using form and motion modalities

Action recognition has attracted increasing interest in computer vision due to its potential applications in many vision systems. One of the main challenges in action recognition is to extract powerful features from videos. Most existing approaches exploit either hand-crafted techniques or learning-based methods to extract features from videos. However, these methods mainly focus on extracting the dynamic motion features and ignore the static form features. Therefore, these methods cannot fully capture the underlying information in videos. In this paper, we propose a novel feature representation method for action recognition, which exploits hierarchical sparse coding to learn the underlying features from videos. The learned features characterise form and motion simultaneously and therefore provide a more accurate and complete feature representation. The learned form and motion features are considered as two modalities, which are used to represent both the static and motion features. These modalities are further encoded into a global representation via pair-wise dictionary learning and then fed to an SVM classifier for action classification. Experimental results on several challenging datasets validate that the proposed method is superior to several state-of-the-art methods.

Characterizing Subtle Facial Movements via Riemannian Manifold

Facial movements play a crucial role for human beings in communicating and expressing emotions, since they not only transmit communication contents but also contribute to ongoing processes of emotion-relevant information. Characterizing subtle facial movements from videos is one of the most intensive topics in computer vision research. It is, however, challenging because 1) the intensity of subtle facial muscle movement is usually low; 2) the duration may be transient; and 3) datasets containing spontaneous subtle movements with reliable annotations are painful to obtain and often small. This paper addresses these problems in characterizing subtle facial movements from the aspects of both motion elucidation and description. Firstly, we propose an efficient method for elucidating hidden and repressed movements to make them easier to notice. We explore the feasibility of linearizing motion magnification and temporal interpolation, which has been obscured by the implementation of existing methods. We then propose a consolidated framework, termed MOTEL, to expand temporal duration and amplify subtle facial movements simultaneously. Secondly, we make our contribution to dynamic description. One major challenge is how to capture the intrinsic temporal variations caused by movements and omit extrinsic ones caused by different individuals and various environments. To diminish the influence of such diversity, we propose to characterize the dynamics of short-term movements via the differences between points on the tangent spaces to the manifolds, rather than the points themselves. We then significantly relax the trajectory-smoothness assumption of the conventional manifold-based trajectory modeling method and model longer-term dynamics using a statistical observation model within the sequential inference approaches. Finally, we incorporate the tangent delta descriptor with the sequential inference approaches and present a hierarchical representation architecture to cover the period over which the facial movements occur. The proposed motion elucidation and description approach is validated by a series of experiments on publicly available datasets in the example tasks of micro-expression recognition and visual speech recognition.

Steganographer Detection via Multi-Scale Embedding Probability Estimation

Steganographer detection aims to identify the guilty user, who utilizes steganographic methods to hide secret information in the spread multimedia data, especially image data, from a large number of innocent users on social networks. The true embedding probability map illustrates the probability distribution of embedding secret information in the corresponding images by specific steganographic methods and settings, and has been successfully used as guidance for content-adaptive steganographic and steganalytic methods. Unfortunately, in real-world situations, the detailed steganographic settings adopted by the guilty user cannot be known in advance. It thus becomes necessary to propose an automatic embedding probability estimation method. In this paper, we propose a novel content-adaptive steganographer detection method via embedding probability estimation. The embedding probability estimation is first formulated as a learning-based saliency detection problem, and the multi-scale estimated map is then integrated into the CNN to extract steganalytic features. Finally, the guilty user is detected via an efficient Gaussian vote method with the extracted steganalytic features. The experimental results show that the proposed method is superior to the state-of-the-art methods in both spatial and frequency domains.

DenseNet-201 based deep neural network with composite learning factor and precomputation for multiple sclerosis classification

(Aim) Multiple sclerosis is a neurological condition that may cause neurologic disability. To identify multiple sclerosis more accurately, this paper proposed a new transfer-learning based approach. (Method) DenseNet-121, DenseNet-169, and DenseNet-201 neural networks were compared. Besides, we proposed to use a composite learning factor (CLF) that assigns different learning factors to three types of layers: early frozen layers, middle layers, and late newly-replaced layers. How to allocate layers into those three types remains a problem. Hence, four transfer learning settings (viz., Setting A, B, C, and D) were tested and compared. A precomputation method was utilized to reduce the storage burden and accelerate the program. (Results) We observed that DenseNet-201-D can achieve the best performance. The sensitivity, specificity, and accuracy of DenseNet-201-D were 98.27 ± 0.58, 98.35 ± 0.69, and 98.31 ± 0.53, respectively. (Conclusion) Our method gives better performance than state-of-the-art approaches. Furthermore, this composite learning factor gives superior results to the traditional simple learning factor (SLF) strategy.
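
The three-way split of layers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the learning-factor values or the layer boundaries, so the multipliers (0 for frozen layers, 1 for middle layers, 10 for new layers), the function names and the toy update step below are all hypothetical.

```python
def composite_learning_factors(num_layers, frozen_end, middle_end,
                               middle_factor=1.0, new_factor=10.0):
    """Return one learning-rate multiplier per layer index.

    Layers [0, frozen_end) are frozen (multiplier 0), layers
    [frozen_end, middle_end) use middle_factor, and the remaining
    layers use new_factor. All values are illustrative assumptions.
    """
    if not 0 <= frozen_end <= middle_end <= num_layers:
        raise ValueError("expected 0 <= frozen_end <= middle_end <= num_layers")
    factors = []
    for i in range(num_layers):
        if i < frozen_end:
            factors.append(0.0)            # early frozen layers: no update
        elif i < middle_end:
            factors.append(middle_factor)  # middle layers: fine-tuned
        else:
            factors.append(new_factor)     # newly replaced layers: learn fastest
    return factors


def sgd_step(weights, grads, factors, base_lr=0.01):
    """One toy gradient step with a per-layer multiplier (one scalar per layer)."""
    return [w - base_lr * f * g for w, g, f in zip(weights, grads, factors)]


if __name__ == "__main__":
    factors = composite_learning_factors(num_layers=6, frozen_end=2, middle_end=4)
    print(factors)
    print(sgd_step([1.0] * 6, [1.0] * 6, factors, base_lr=0.1))
```

Moving `frozen_end` and `middle_end` is one way to picture how different settings could allocate layers to the three types; the actual Settings A to D are not specified in the abstract.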

Efficient Image Hashing with Invariant Vector Distance for Copy Detection

Image hashing is an efficient technique of multimedia security for image content protection. It maps an image into a content-based compact code for denoting the image itself. While most existing algorithms focus on improving the classification between robustness and discrimination, little attention has been paid to geometric invariance under normal digital operations, which makes them quite fragile to geometric distortion when applied in image copy detection. In this paper, a novel and effective image hashing method is proposed based on invariant vector distance in both the spatial domain and the frequency domain. First, the image is preprocessed by some joint operations to extract robust features. Then, the preprocessed image is randomly divided into several overlapping blocks under a secret key, and two different feature matrices are separately obtained in the spatial domain and frequency domain through invariant moments and low-frequency discrete cosine transform coefficients. Furthermore, the invariant distances between vectors in the feature matrices are calculated and quantized to form a compact hash code. We conduct various experiments to demonstrate that the proposed hashing not only reaches good classification between robustness and discrimination, but also resists most geometric distortions in image copy detection. In addition, both receiver operating characteristic curve comparisons and mean average precision in copy detection clearly illustrate that the proposed hashing method outperforms state-of-the-art algorithms.

AMIL: Adversarial Multi-Instance Learning for Human Pose Estimation

Human pose estimation has an important impact on a wide range of applications, from human-computer interfaces to surveillance and content-based video retrieval. For human pose estimation, joint obstructions and overlapping of human bodies result in deviated pose estimates. To address these problems, by integrating priors of the structure of human bodies, we present an innovative structure-aware network to discreetly consider such priors during the training of the network. Typically, learning such constraints is a challenging task. Instead, we propose generative adversarial networks as our learning model, in which we design two residual multiple instance learning (MIL) models with identical architecture, one used as the generator and the other as the discriminator. The discriminator's task is to distinguish the actual poses from the fake ones. If the pose generator generates results that the discriminator is not able to distinguish from the real ones, the model has successfully learned the priors. In the proposed model, the discriminator differentiates the ground-truth heatmaps from the generated ones, and the adversarial loss is then back-propagated to the generator. This procedure helps the generator learn reasonable body configurations and is shown to improve the prediction accuracy. Meanwhile, we propose a novel function for MIL. It is an adjustable structure for both instance selection and modeling that appropriately passes information between instances in a single bag. In the proposed residual MIL neural network, the pooling action frequently updates each instance's contribution to its bag. The proposed pooling-based adversarial residual multi-instance neural network has been validated on two datasets for the human pose estimation task and outperforms other state-of-the-art models.

Features-Enhanced Multi-attribute Estimation with Convolutional Tensor Correlation Fusion Network

To achieve robust facial attribute estimation, a hierarchical prediction system referred to as the tensor correlation fusion network (TCFN) is proposed. The system includes feature extraction, correlation excavation among facial attribute features, score fusion, and multi-attribute prediction. Subnetworks (Age-Net, Gender-Net, Race-Net, and Smile-Net) are used to extract the corresponding features, while Main-Net extracts features not only from the input image but also from the corresponding pooling layers of the subnetworks. Dynamic tensor canonical correlation analysis (DTCCA) is proposed to explore the correlation of different targets' features in the F7 layers. Then, for the binary classifications of gender, race, and smile, robust decisions are achieved by fusing the results of the subnetworks with those of TCFN, while for age prediction, the facial image is first classified into one of the age groups, and then an ELM regressor performs the final age estimation. Experimental results on benchmarks with multiple face attributes (MORPH-II, Adience Benchmark datasets, LAP-2016, and CelebA) show that the proposed approach has superior performance compared to the state of the art.

Affective Content-aware Adaptation Scheme on QoE Optimization of Adaptive Streaming over HTTP

The paper presents a novel affective content-aware adaptation scheme (ACAA) to optimize QoE for adaptive video streaming over HTTP. Most existing HTTP-based adaptive streaming schemes conduct video bit-rate adaptation based on an estimation of available network resources, which ignores user preference for the affective content (AC) embedded in video data streamed over the network. Since personal demands for AC differ greatly among viewers, satisfying individual affective demand is critical to improving the QoE in commercial video services. However, the results of video affective analysis can't be applied directly to a current adaptive streaming scheme. Considering the AC distributions in the user's viewing history and all streaming segments, the AC relevancy can be inferred as an affective metric for the AC-related segments. Further, we propose an ACAA scheme to optimize QoE for user-desired affective content while taking into account both network status and affective relevancy. We have implemented the ACAA scheme in a realistic trace-based evaluation and compared its performance, in terms of network performance and Quality of Experience (QoE), with that of Probe and Adaptation (PANDA), buffer-based adaptation (BBA) and Model Predictive Control (MPC). Experimental results show that ACAA can preserve available buffer time for upcoming affective content preferred by the individual viewer, so as to achieve better QoE for affective content than for normal content while keeping the overall QoE satisfactory.

Machine learning techniques for the diagnosis of Alzheimer's disease: A review

Alzheimer's disease is an incurable neurodegenerative disease primarily affecting the elderly population. Efficient automated techniques are needed for early diagnosis of Alzheimer's disease. Many novel approaches have been proposed by researchers for classification of Alzheimer's disease. However, to develop more efficient learning techniques, a better understanding of the work done on Alzheimer's disease is needed. Here, we provide a review of 165 papers from 2001-2019 using various feature extraction and machine learning techniques. The machine learning techniques are surveyed under three main categories: support vector machine (SVM), artificial neural network (ANN), and deep learning (DL) and ensemble methods. We present a detailed review of these three approaches for Alzheimer's disease with possible future directions.

Video Retrieval with Similarity-Preserving Deep Temporal Hashing

This paper aims to develop an efficient Content-based Video Retrieval (CBVR) system by hashing videos into short binary codes. It is an appealing research topic with increasing demands in such an Internet era when massive videos are uploaded to the website every day. The main challenge of this task is how to discriminatively map video sequences to compact hash codes by preserving original similarity. Existing video hashing methods are usually built on two isolated steps: frame pooling-based video features extraction and hash codes generation, which have not fully explored the spatial-temporal properties in videos and also inevitably result in severe information loss. To address these issues, in this paper we present an end-to-end video retrieval framework called Similarity-Preserving Deep Temporal Hashing (SPDTH) network. Specifically, we design the hashing module as an encoder Recurrent Neural Network (RNN) which is equipped with the stacked Gated Recurrent Units (GRUs). The benefit of our network is that it explicitly extracts the spatial-temporal properties of videos and yields compact hash codes in an end-to-end manner. Besides, we also introduce a structured ranking loss for deep network training by preserving intra-class similarity and inter-class separability, and the quantization loss between the real-valued output and the binary codes is minimized. Extensive experiments on several challenging datasets have demonstrated that SPDTH can consistently outperform state-of-the-art video hashing methods.
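
As a rough illustration of retrieval with short binary codes, the sketch below quantizes real-valued outputs into bits, measures a quantization error, and ranks database items by Hamming distance. It is not the SPDTH network: the sign-threshold quantizer, the squared-error form of the quantization loss, and all function names are assumptions made for this example.

```python
def binarize(values):
    """Pack sign-thresholded real values into an integer bit code (>= 0 maps to 1)."""
    code = 0
    for v in values:
        code = (code << 1) | (1 if v >= 0 else 0)
    return code


def quantization_loss(values):
    """Squared distance between real outputs and their +1/-1 binary targets."""
    return sum((v - (1.0 if v >= 0 else -1.0)) ** 2 for v in values)


def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")


def rank_by_hamming(query_code, database):
    """Return database keys sorted from nearest to farthest in Hamming space."""
    return sorted(database, key=lambda name: hamming(query_code, database[name]))


if __name__ == "__main__":
    query = binarize([0.9, -0.2, 0.1, -0.7])
    videos = {"a": 0b1010, "b": 0b0101, "c": 0b1000}  # hypothetical 4-bit codes
    print(rank_by_hamming(query, videos))
    print(quantization_loss([0.9, -0.2, 0.1, -0.7]))
```

Because comparing two codes is a single XOR followed by a bit count, ranking a large database stays cheap even when the original video features are high-dimensional.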

Autonomous Semantic Community Detection via Adaptively Weighted Low-rank Approximation

Identification of semantic community structures is important for understanding the interactions and sentiments of different groups of people. A robust community detection method needs to autonomously determine the number of communities and community structure for a given network. Nonnegative matrix factorization (NMF), a component decomposition approach, has been extensively used for community detection. However, the existing NMF-based methods require the number of communities to be determined a priori, limiting their applicability in practice. Here, we develop a novel NMF-based method to autonomously determine the number of communities and community structure simultaneously. In our method, we use an initial number of communities, larger than the actual number, in the NMF formulation, and then suppress some of the communities by introducing an adaptively weighted group-sparse low-rank regularization to derive the target number of communities and at the same time the corresponding community structure. Our method is efficient without increasing the complexity of the original NMF method. We thoroughly examine the new method, showing its superior performance over several competing methods on synthetic and large real-world networks.

Image/Video Restoration via Multiplanar Autoregressive Model and Low-Rank Optimization

In this paper, we introduce an image/video restoration approach that utilizes the high-dimensional similarity in images/videos. After grouping similar patches from neighboring frames, we propose to build a multiplanar autoregressive (AR) model to exploit the correlation in cross-dimensional planes of the patch group, which has long been neglected by previous AR models. To further utilize the nonlocal self-similarity in images/videos, a joint multiplanar AR and low-rank based approach (MARLow) is proposed to reconstruct patch groups more effectively. Moreover, for video restoration, the temporal smoothness of the restored video is constrained by a Markov random field (MRF), where the MRF encodes a priori knowledge about the consistency of patches from neighboring frames. Specifically, we treat different restoration results (from different patch groups) of a certain patch as labels of an MRF, and temporal consistency among these restored patches is imposed. Besides image and video restoration, the proposed method is also suitable for other restoration applications such as interpolation and text removal. Extensive experimental results demonstrate that the proposed approach obtains encouraging performance compared with state-of-the-art methods.

Visual Attention Analysis and Prediction on Human Faces for Children with Autism Spectrum Disorder

The focus of this article is to analyze and predict the visual attention of children with Autism Spectrum Disorder (ASD) when looking at human faces. Social difficulties are the hallmark features of ASD and lead, to varying degrees, to atypical visual attention toward various stimuli, especially human faces. Learning the visual attention of children with ASD could contribute to related research in the fields of medical science, psychology, and education. We first construct a Visual Attention on Faces for Autism Spectrum Disorder (VAFA) database, which consists of 300 natural scene images with human faces and corresponding eye movement data collected from 13 children with ASD. Through comparison with matched typically developing (TD) controls, we quantify atypical visual attention on human faces in ASD. Statistics show that some high-level factors such as face size, facial features, face poses, and face emotions have different impacts on the visual attention of children with ASD. Combining the feature maps extracted from state-of-the-art saliency models, we obtain a visual attention model on human faces for children with ASD. The proposed model shows the best performance among all competitors. With the help of our proposed model, researchers in related fields could design specialized educational content containing human faces for children with ASD or produce a specific model for rapidly screening ASD using eye movement data.

Intelligent Classification and Analysis of Essential Genes Species Using Quantitative Methods

The significance of the word "essential" needs no further clarification. Essential genes are considered from the perspective of the evolution of different organisms; however, this is quite complicated because we have to recognize the difference between essential cellular processes, essential protein functions and essential genes. There is also a need to identify whether one set of growth conditions may be replaced by another set. It is also contended that most genes are essential in the natural selection process. In this article, we apply an intelligent method for classification of the essential genes of four different species, namely, Human, Arabidopsis Thaliana, Drosophila Melanogaster and Danio Rerio. The primary aim of the current article is to understand the distributions of purines and pyrimidines over the essential genes of these four species. Based on quantitative parameters (Shannon Entropy, Fractal Dimension, Hurst Exponent, Distribution of purines-pyrimidines), ten different clusters have been generated for the four species. Some proximity results have been observed among the clusters of all four species.
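
One of the quantitative parameters named above, Shannon entropy over the purine/pyrimidine distribution, can be illustrated with a short sketch. The mapping of A and G to purines and of C and T to pyrimidines is standard; the function names, the R/Y labels and the whole-sequence (rather than windowed) entropy are assumptions for this example and may differ from the authors' procedure.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def purine_pyrimidine_string(seq):
    """Map a DNA sequence to R (purine) / Y (pyrimidine) symbols."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(out)


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies in a sequence."""
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    ry = purine_pyrimidine_string("AGCTTGCA")  # toy sequence, not real gene data
    print(ry, shannon_entropy(ry))
```

With two symbols the entropy ranges from 0 bits (only purines or only pyrimidines) to 1 bit (an even split), which gives each gene a single comparable number for clustering.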

Embedding distortion analysis in wavelet-domain watermarking

Imperceptibility and robustness are two complementary fundamental requirements of any watermarking algorithm. Low-strength watermarking yields high imperceptibility but exhibits poor robustness. High-strength watermarking schemes achieve good robustness but often introduce distortions, resulting in poor visual quality in the host image. In this paper we analyse the embedding distortion for wavelet-based watermarking schemes. We derive the relationship between the distortion, measured in mean square error (MSE), and the watermark embedding modification, and propose a linear proportionality between the MSE and the sum of energy of the wavelet coefficients selected for watermark embedding modification. The initial proposition assumes the orthonormality of the discrete wavelet transform. It is further extended to non-orthonormal wavelet kernels using a weighting parameter that follows the energy conservation theorems in wavelet frames. The proposed analysis is verified by experimental results for non-blind as well as blind watermarking schemes. Such a model is useful for finding the optimum input parameters, including the wavelet kernel, coefficient selection and subband choices, for wavelet-domain image watermarking.
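
The orthonormal case can be checked numerically with a small sketch. It assumes a one-level orthonormal Haar transform on a 1-D signal and a multiplicative embedding rule c' = (1 + alpha) c on the detail coefficients; both choices, and all names, are illustrative and not taken from the paper. Under these assumptions energy conservation gives MSE = alpha^2 * E / N, where E is the energy of the modified coefficients and N the signal length.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length signal."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Inverse of haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


def embed(x, alpha):
    """Scale the detail coefficients by (1 + alpha) (assumed embedding rule).

    Returns the watermarked signal and the energy of the selected coefficients.
    """
    approx, detail = haar_forward(x)
    energy = sum(d * d for d in detail)
    marked = [d * (1.0 + alpha) for d in detail]
    return haar_inverse(approx, marked), energy


def mse(x, y):
    """Mean square error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    signal = [10.0, 12.0, 9.0, 7.0, 14.0, 15.0, 3.0, 8.0]  # toy host data
    alpha = 0.1
    watermarked, energy = embed(signal, alpha)
    print(mse(signal, watermarked), alpha ** 2 * energy / len(signal))
```

The two printed values agree up to floating-point error, which is the linear relation between MSE and coefficient energy for an orthonormal kernel; a non-orthonormal kernel would need the additional weighting parameter mentioned in the abstract.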

Hybrid Wolf-Bat algorithm for optimisation of connection weights in multi-layer perceptron

In any neural network, the weights act as parameters for determining the output(s) from a set of inputs. They are used for finding the activation values of the nodes of a layer from the values of the previous layer. Finding the ideal set of these weights for training a Multilayer Perceptron neural network such that it minimizes the classification error is a widely known optimization problem. This paper proposes the Hybrid Wolf-Bat algorithm, a novel optimization algorithm, as a solution to this problem. The proposed algorithm is a hybrid of two existing nature-inspired algorithms, the Grey Wolf Optimization algorithm and the Bat algorithm. This novel approach is tested on ten different datasets from the medical field, obtained from the UCI machine learning repository. The results of the proposed algorithm are compared with those of four recently developed nature-inspired algorithms: Grey Wolf Optimization algorithm (GWO), Cuckoo Search (CS), Bat Algorithm (BA) and Whale Optimization Algorithm (WOA), along with the standard back-propagation training method. As observed from the results, the proposed method is better in terms of both speed of convergence and accuracy and outperforms the other bio-inspired algorithms.

Stochastic Optimization for Green Multimedia Services in Dense 5G Networks

The manyfold capacity increase promised by dense 5G networks will make possible the provisioning of broadband multimedia services, including virtual reality, augmented reality, and mobile immersive video, to name a few. These new applications will coexist with classic ones and contribute to the exponential growth of multimedia services in mobile networks. At the same time, the different requirements of new and old services pose new challenges to the effective usage of 5G resources. In response to these challenges, a novel Stochastic Optimization framework for Green Multimedia Services (SOGMS) is proposed that targets the maximization of system throughput and the minimization of energy consumption in data delivery. In particular, Lyapunov optimization is leveraged to address this optimization objective, which is formulated and decomposed into three tractable subproblems. For each subproblem, a distinct algorithm is conceived, namely Quality of Experience (QoE) based admission control, cooperative resource allocation, and multimedia services scheduling. Finally, extensive simulations are carried out to evaluate the proposed method against state-of-the-art solutions in dense 5G networks.

A Simplistic Global Median Filtering Forensics Based on Frequency Domain Analysis of Image Residuals

Sophisticated image forgeries have made digital image forensics an active area of research. In this area, many researchers have addressed the problem of median filtering forensics. Existing median filtering detectors are adequate for classifying median filtered images in uncompressed mode and in compressed mode at high quality factors. Despite that, the field lacks a robust method to detect median filtering in low-resolution images compressed with low quality factors. In this article, a novel feature set (four feature dimensions), based on first-order statistics of the frequency contents of median filtered residuals (MFRs) of original and median filtered images, is proposed. The proposed feature set outperforms state-of-the-art detectors based on handcrafted features in terms of feature set dimensions, robustness for low-resolution images at all quality factors, and robustness against an existing anti-forensic method. The results also reveal the efficacy of the proposed method over a convolutional neural network (CNN) based median filtering detector. Comprehensive results demonstrate the ability of the proposed detector to distinguish median filtering from other similar manipulations. Additionally, a generalization ability test on cross-database images supports the cross-validation results on four different databases. Thus, our proposed detector meets the current challenges in the field to a great extent.
