An introduction to some BOF improvement methods
Published: 2019-06-28


[1]

Zhang et al. [1] propose a framework that encodes spatial information into the inverted index by integrating the local adjacency of visual words. Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. The co-occurrence frequency in [1] is computed between two visual words lying within a short distance of each other (they need not be immediate neighbors).
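The DVP construction above boils down to counting word pairs whose keypoints co-occur within a short distance. A minimal sketch of that counting step (function name and distance threshold are illustrative assumptions, not taken from [1]):

```python
from collections import Counter
from itertools import combinations
import math

def cooccurring_pairs(features, max_dist=50.0):
    """Count unordered visual-word pairs whose keypoints lie within
    max_dist of each other. features: list of (word_id, x, y)."""
    counts = Counter()
    for (w1, x1, y1), (w2, x2, y2) in combinations(features, 2):
        if math.hypot(x1 - x2, y1 - y2) <= max_dist:
            counts[tuple(sorted((w1, w2)))] += 1
    return counts
```

Pairs whose counts across a corpus exceed a frequency threshold would then be candidates for promotion to visual phrases.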

[2]

Zheng et al. [2] propose a higher-level visual word representation called the visual synset. An intermediate descriptor, the delta visual phrase, is first constructed from frequently co-occurring visual word sets with similar spatial context; delta visual phrases are then clustered into visual synsets based on their probabilistic semantics. This strengthens both the discriminative power and the invariance of the traditional BOF representation.

[3]

Chen et al. [3] propose a new technique for combining local, discriminatively trained classifiers over groups of (super-)pixels into a joint model over labels. The method generates samples by iterating forward a weakly chaotic dynamical system instead of using a trained CRF.

[4]

Csurka et al. [4] use a system that scores low-level patches according to their class relevance, propagates these posterior probabilities to pixels, and uses low-level segmentation to guide the semantic segmentation. They first describe each patch with a high-level descriptor based on the Fisher kernel and feed it to a set of linear classifiers, and then use global image classifiers to take into account the context of the objects to be segmented.

[5]

Herve et al. [5] use visual word pairs without modeling the co-occurrence of neighboring visual words or other spatial relations. They first construct a base vocabulary of n words and then derive a pairs vocabulary from it; since no spatial information between words is captured, the pairs vocabulary contains n(n+1)/2 unordered pairs. Once the pairs vocabulary is built, an SVM classifier is trained on the training samples to perform the automatic annotation task.
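Because the pairs are unordered and a word may be paired with itself, the n(n+1)/2 size follows directly. A small sketch of the enumeration (illustrative, not the authors' code):

```python
def pairs_vocabulary(n):
    """Enumerate all unordered pairs (i, j) with i <= j over a base
    vocabulary of n words; there are n*(n+1)//2 of them."""
    return [(i, j) for i in range(n) for j in range(i, n)]
```

Each entry of the pairs vocabulary then acts as one dimension of the image histogram fed to the SVM.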

[6]

Zhang et al. [6] propose to encode more spatial information through geometry-preserving visual phrases (GVPs), i.e., by incorporating the relative spatial locations of the features forming a visual phrase into its representation (hence "geometry-preserving"). In addition to co-occurrences, the GVP method captures both the local and the long-range spatial layout of the words.
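One way to picture a GVP is as a word pair keyed by the quantized relative offset between its two features, so that matching phrases must agree on spatial layout as well as word identity. A hypothetical sketch (cell size and key layout are assumptions, not the paper's exact encoding):

```python
def gvp_key(word_a, xa, ya, word_b, xb, yb, cell=20):
    """Key a word pair by the quantized relative offset of feature b
    with respect to feature a."""
    dx = int((xb - xa) // cell)  # quantized horizontal offset
    dy = int((yb - ya) // cell)  # quantized vertical offset
    return (word_a, word_b, dx, dy)
```

Two occurrences of the same word pair with different layouts thus fall into different GVP bins.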

[7]

Li et al. [7] propose the contextual bag-of-words (CBOW) representation, which integrates semantic conceptual relations and spatial neighboring relations. Local spatial consistency among a match's spatial nearest neighbors is used to filter out false visual-word matches. However, they do not consider the co-occurrence of neighboring words.
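The spatial-consistency filtering step can be sketched as keeping a tentative match only when enough of its spatial neighbors also match consistently; all names and thresholds below are illustrative assumptions:

```python
def filter_matches(matches, neighbors_a, neighbors_b, min_shared=2):
    """matches: dict mapping features in image A to features in image B;
    neighbors_a / neighbors_b: dict mapping a feature to its spatial
    nearest neighbors within the same image. Keep a match (a, b) only
    if at least min_shared of a's neighbors match to neighbors of b."""
    kept = {}
    for a, b in matches.items():
        shared = sum(1 for na in neighbors_a[a]
                     if matches.get(na) in neighbors_b[b])
        if shared >= min_shared:
            kept[a] = b
    return kept
```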

[8]

Tirilly et al. [8] propose a new image representation called visual sentences, which makes it possible to consider simple spatial relations between visual words, and then use probabilistic Latent Semantic Analysis (pLSA) to eliminate the noisiest visual words. The spatial information is captured by finding an appropriate axis and projecting the keypoints onto it.
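The projection step amounts to sorting keypoints by their coordinate along a chosen axis and reading the visual words off in that order. A minimal sketch, assuming the axis is given as an angle (the choice of axis is an assumption here, not the paper's selection procedure):

```python
import math

def visual_sentence(features, theta=0.0):
    """Order visual words by projecting their keypoints onto the axis
    with direction theta. features: list of (word_id, x, y)."""
    ux, uy = math.cos(theta), math.sin(theta)
    return [w for w, x, y in
            sorted(features, key=lambda f: f[1] * ux + f[2] * uy)]
```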

[1] Shiliang Zhang, Qi Tian, Gang Hua, Qingming Huang, Shipeng Li; Descriptive Visual Words and Visual Phrases for Image Applications, ACM Int. Conf. on Multimedia 2009

[2] Yan-Tao Zheng, Ming Zhao, Shi-Yong Neo, Tat-Seng Chua, Qi Tian; Visual Synset: Towards a Higher-level Visual Representation, CVPR 2008

[3] Yutian Chen, Andrew Gelfand, Charless C. Fowlkes, Max Welling; Integrating Local Classifiers through Nonlinear Dynamics on Label Graphs with an Application to Image Segmentation, ICCV 2011

[4] Gabriela Csurka, Florent Perronnin; A Simple High Performance Approach to Semantic Segmentation, BMVC 2008

[5] Nicolas Herve, Nozha Boujemaa; Visual Word Pairs for Automatic Image Annotation, ICME 2009

[6] Yimeng Zhang, Zhaoyin Jia, Tsuhan Chen; Image Retrieval with Geometry-Preserving Visual Phrases

[7] Teng Li; Contextual Bag-of-Words for Visual Categorization, IEEE Trans. on Circuits and Systems for Video Technology 2011

[8] Pierre Tirilly, Vincent Claveau, Patrick Gros; Language modeling for bag-of-visual words image categorization, CIVR 2008

Reprinted from: https://www.cnblogs.com/moondark/archive/2012/09/13/2683035.html
