In a real-world network shared by several users, telehaptic applications involving delay-sensitive multimedia communication between remote locations demand distinct Quality of Service (QoS) guarantees for different media components. These QoS constraints pose a variety of challenges, especially when the network cross-traffic is unknown and time-varying. In this work, we propose an application-layer congestion control protocol for telehaptic applications operating over shared networks, termed the dynamic packetization module (DPM). DPM is a lossless, network-aware protocol that tunes the telehaptic packetization rate based on the level of congestion in the network. To monitor network congestion, we devise a novel network feedback module, which communicates the end-to-end delays encountered by the telehaptic packets to the respective transmitters with negligible overhead. Via extensive simulations, we show that DPM meets the QoS requirements of telehaptic applications over a wide range of network cross-traffic conditions. We also report the qualitative results of a real-time telepottery experiment with several human subjects, which reveal that DPM preserves the quality of the telehaptic activity even under heavily congested network scenarios. Finally, we compare the performance of DPM with existing telehaptic communication protocols and conclude that DPM outperforms them.
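The delay-driven rate adaptation described above can be sketched as follows. This is a minimal illustrative sketch only: the delay budget, back-off factor, step size, and rate bounds are hypothetical parameters, not DPM's actual control law.

```python
# Hypothetical sketch of delay-feedback-driven packetization-rate control.
# All thresholds and step sizes here are illustrative assumptions.

def adapt_rate(current_rate, end_to_end_delay_ms,
               delay_budget_ms=30.0, min_rate=100, max_rate=1000):
    """Lower the packetization rate when the feedback module reports
    congestion; probe upward when delays are comfortably within budget."""
    if end_to_end_delay_ms > delay_budget_ms:
        # Congestion detected: back off multiplicatively.
        return max(min_rate, int(current_rate * 0.8))
    elif end_to_end_delay_ms < 0.8 * delay_budget_ms:
        # Headroom available: increase additively toward the nominal
        # 1 kHz haptic sampling rate.
        return min(max_rate, current_rate + 50)
    return current_rate

# Simulated feedback: reported end-to-end delays (ms) over five intervals.
rate = 1000
for delay in [10, 45, 50, 20, 12]:
    rate = adapt_rate(rate, delay)
```

The additive-increase/multiplicative-decrease shape is a common congestion-control pattern; the abstract itself only states that the rate is tuned by the level of congestion.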
From wearable devices to depth cameras, researchers are exploiting various kinds of multimodal data to recognize human actions for applications such as video gaming, education, and healthcare. While there have been many successful techniques in the literature, most current approaches focus on statistical or local spatio-temporal features and do not explicitly explore the temporal dynamics of the sensor streams. However, human action streams contain rich temporal structure that can characterize the unique underlying patterns of different action types. From this perspective, we propose a novel temporal order modeling approach to human action recognition. Specifically, we explore subspace projections to extract the latent temporal patterns of the actions. The temporal order of these patterns is compared, and the index of the pattern that appears first is encoded. This process is iterated multiple times and produces a compact feature vector representing the temporal dynamics of the sensor data. Human action recognition can then be solved efficiently by nearest neighbor search based on the Hamming distance between these compact feature vectors. We further introduce a sequential optimization algorithm to learn optimized projections that preserve the pairwise label similarity of the original sensor data. Performance is evaluated on two public human action datasets. Experimental results demonstrate the superior performance of the proposed technique in both accuracy and efficiency.
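The temporal order encoding can be illustrated with a small sketch. The random matrices below stand in for the learned projections, and the choice of two latent patterns per projection (with "appears first" read as the earlier activation peak) is an assumption made for illustration, not the paper's exact construction.

```python
import numpy as np

def temporal_order_code(X, projections):
    """Encode which latent temporal pattern appears first.

    X: (T, d) sensor stream; projections: list of (d, 2) matrices
    (random stand-ins here for the learned subspace projections).
    Each projection yields two latent trajectories; we record the index
    (0 or 1) of the trajectory whose activation peak comes earliest."""
    bits = []
    for W in projections:
        latent = X @ W                          # (T, 2) latent trajectories
        peak_times = np.argmax(np.abs(latent), axis=0)   # when each peaks
        bits.append(int(np.argmin(peak_times)))          # which peaks first
    return np.array(bits, dtype=np.uint8)

def hamming(a, b):
    """Hamming distance used for nearest neighbor matching of codes."""
    return int(np.sum(a != b))

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))                 # one toy action sequence
projs = [rng.standard_normal((6, 2)) for _ in range(16)]
code = temporal_order_code(X, projs)             # compact 16-bit descriptor
```

Iterating over many projections yields the compact binary feature vector; recognition then reduces to finding the training code with minimum Hamming distance.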
In this paper, we introduce a method to overcome one of the main challenges of person re-identification in multi-camera networks, namely cross-view appearance changes. The proposed solution addresses the extreme variability of person appearance across camera views by exploiting multiple feature representations. For each feature, Kernel Canonical Correlation Analysis (KCCA) with different kernels is employed to learn several projection spaces in which the appearance correlation between samples of the same person observed from different cameras is maximized. An iterative logistic regression is finally used to select and weight the contributions of each projection and to perform the matching between the two views. Experimental evaluation shows that the proposed solution obtains performance comparable to the state of the art on the VIPeR and PRID 450s datasets and improves upon it on the PRID and CUHK01 datasets.
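As a rough illustration of the cross-view correlation idea, the sketch below uses plain linear CCA in place of the paper's kernelized KCCA, on synthetic two-view data; the data generation, regularizer, and nearest-neighbour matching rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def linear_cca(A, B, k=2, reg=1e-3):
    """Toy linear CCA standing in for the paper's kernelized KCCA:
    learn projections Wa, Wb that maximize the correlation between
    the two camera views. A: (n, dA), B: (n, dB), same n persons."""
    A = A - A.mean(0)
    B = B - B.mean(0)
    n = len(A)
    Caa = A.T @ A / n + reg * np.eye(A.shape[1])   # regularized covariances
    Cbb = B.T @ B / n + reg * np.eye(B.shape[1])
    Cab = A.T @ B / n
    La_inv = np.linalg.inv(np.linalg.cholesky(Caa))   # whitening transforms
    Lb_inv = np.linalg.inv(np.linalg.cholesky(Cbb))
    U, _, Vt = np.linalg.svd(La_inv @ Cab @ Lb_inv.T)
    return La_inv.T @ U[:, :k], Lb_inv.T @ Vt.T[:, :k]

# Synthetic "persons": a shared identity signal observed in two views.
ident = rng.standard_normal((40, 2))
view_a = ident @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((40, 5))
view_b = ident @ rng.standard_normal((2, 5)) + 0.1 * rng.standard_normal((40, 5))

Wa, Wb = linear_cca(view_a, view_b)
Pa = (view_a - view_a.mean(0)) @ Wa
Pb = (view_b - view_b.mean(0)) @ Wb
# Cross-view matching: nearest neighbour in the correlated subspace.
dists = np.linalg.norm(Pa[:, None] - Pb[None], axis=-1)
rank1 = np.mean(np.argmin(dists, axis=1) == np.arange(40))
```

In the paper this matching step is instead handled by an iterative logistic regression that weights several such projection spaces, one per feature/kernel pair.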
In order to automatically predict audiovisual quality in interactive multimedia services, we have developed machine-learning-based no-reference parametric models. We have compared decision-tree-based ensemble methods, Genetic Programming, and Deep Learning models with one or more hidden layers. We have used the INRS audiovisual quality dataset, specifically designed to include the ranges of parameters and degradations typically seen in real-time communications. Decision-tree-based ensemble methods have outperformed both the Deep Learning and Genetic Programming based models in terms of RMSE and Pearson correlation. We have also trained and evaluated models on various publicly available datasets and compared our results with those reported for the original models. Our studies show that Random Forest based prediction models achieve high accuracy for both the INRS audiovisual quality dataset and other comparable publicly available datasets.
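The two evaluation criteria mentioned, RMSE and Pearson correlation between predicted and subjective quality scores, can be computed as follows; this is a generic metrics sketch with made-up example scores, not the paper's evaluation code.

```python
import math

def rmse(pred, actual):
    """Root mean squared error between predicted and actual scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical subjective scores vs. model predictions (illustrative only).
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.2, 2.1, 2.8, 4.3, 4.9]
error = rmse(predicted, mos)
corr = pearson(predicted, mos)
```

A model comparison then reduces to preferring lower RMSE and higher Pearson correlation across datasets.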
Successful computer-aided diagnosis systems typically rely on training datasets containing a sufficient number of richly annotated images. However, detailed image annotation is often time-consuming and subjective, especially for medical images; this becomes the bottleneck for collecting large datasets and, in turn, for building computer-aided diagnosis systems. In this paper, we design a novel computer-aided endoscopy diagnosis system for the multi-class classification of electronic endoscopy medical records (EEMRs) containing sets of frames, where the labels of EEMRs are mined from the corresponding text records using an automatic text-matching strategy without special human labeling. With unambiguous EEMR labels but ambiguous frame labels, we propose a simple but effective pooling scheme called Multi-class Latent Concept Pooling (McLCP), which learns a codebook from EEMRs of different classes step by step and encodes EEMRs using a soft weighting strategy. With our method, a computer-aided diagnosis system can be extended to new unseen classes with ease and applied seamlessly to the standard single-instance classification problem even when detailed annotated images are unavailable. To validate our system, we collect a total of 1889 EEMRs with more than 59K frames and successfully mine labels for 348 of them. The experimental results show that our proposed system significantly outperforms state-of-the-art methods. Moreover, we apply the learned latent concept codebook to detect abnormalities in endoscopy images and compare it with a supervised learning classifier; the evaluation shows that our codebook learning method can effectively extract the true prototypes of different classes from the ambiguous data.
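A minimal sketch of soft-weighted pooling over a latent concept codebook is given below. The Gaussian soft-assignment rule, the average pooling, and all parameters are illustrative assumptions standing in for the exact McLCP encoding, and the codebook here is random rather than learned.

```python
import numpy as np

def soft_encode(frames, codebook, sigma=1.0):
    """Pool a bag of frame features into one record-level descriptor.

    frames: (m, d) frame features of one EEMR; codebook: (K, d) latent
    concepts. Each frame votes for every concept with a Gaussian soft
    weight (an assumed stand-in for the paper's soft weighting strategy);
    the per-frame assignments are average-pooled into a K-dim vector."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (m, K)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)    # per-frame soft assignment
    return w.mean(axis=0)                # pooled EEMR descriptor

rng = np.random.default_rng(2)
frames = rng.standard_normal((30, 8))    # one toy EEMR of 30 frames
codebook = rng.standard_normal((5, 8))   # 5 latent concepts (random here)
desc = soft_encode(frames, codebook)
```

Because each EEMR becomes a fixed-length descriptor regardless of its frame count, a standard single-instance classifier can then be trained on the record-level labels mined from the text records.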