An introduction to some BOF improvement methods
Published: 2019-06-28


[1]

Zhang et al. [1] propose a framework that encodes spatial information into the inverted index by integrating the local adjacency of visual words. Descriptive Visual Words (DVWs) and Descriptive Visual Phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases refer to frequently co-occurring visual word pairs. The co-occurrence frequency in [1] is computed between two visual words lying within a short distance of each other (they need not be immediate neighbors).
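The DVP construction above boils down to counting word pairs whose keypoints co-occur within a short distance. A minimal sketch of that counting step (function name and distance threshold are illustrative assumptions, not taken from [1]):

```python
from collections import Counter
from itertools import combinations
import math

def cooccurring_pairs(features, max_dist=50.0):
    """Count unordered visual-word pairs whose keypoints lie within
    max_dist of each other. features: list of (word_id, x, y)."""
    counts = Counter()
    for (w1, x1, y1), (w2, x2, y2) in combinations(features, 2):
        if math.hypot(x1 - x2, y1 - y2) <= max_dist:
            counts[tuple(sorted((w1, w2)))] += 1
    return counts
```

Pairs whose counts across a corpus exceed a frequency threshold would then be candidates for promotion to visual phrases.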

[2]

Zheng et al. [2] propose a higher-level visual word representation called the visual synset. An intermediate descriptor, the delta visual phrase, is first constructed from frequently co-occurring visual word sets with similar spatial context; delta visual phrases are then clustered into visual synsets based on their probabilistic semantics. This strengthens both the discriminative power and the invariance of the traditional BOF representation.

[3]

Chen et al. [3] propose a new technique for combining local, discriminatively trained classifiers over groups of (super-)pixels into a joint model over labels. The method generates samples by iterating forward a weakly chaotic dynamical system instead of using a trained CRF.

[4]

Csurka et al. [4] use a system that scores low-level patches according to their class relevance, propagates these posterior probabilities to pixels, and uses low-level segmentation to guide the semantic segmentation. They first describe each patch with a high-level descriptor based on the Fisher kernel and feed it to a set of linear classifiers, and then use global image classifiers to take into account the context of the objects to be segmented.

[5]

Herve et al. [5] use visual word pairs without modeling the co-occurrence of neighboring visual words or other spatial relations. They first construct a base vocabulary of n words and then derive a pairs vocabulary from it; since no spatial information between words is captured, the pairs vocabulary contains n(n+1)/2 unordered pairs. Once the pairs vocabulary is built, an SVM classifier is trained on the training samples to perform the automatic annotation task.
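Because the pairs are unordered and a word may be paired with itself, the n(n+1)/2 size follows directly. A small sketch of the enumeration (illustrative, not the authors' code):

```python
def pairs_vocabulary(n):
    """Enumerate all unordered pairs (i, j) with i <= j over a base
    vocabulary of n words; there are n*(n+1)//2 of them."""
    return [(i, j) for i in range(n) for j in range(i, n)]
```

Each entry of the pairs vocabulary then acts as one dimension of the image histogram fed to the SVM.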

[6]

Zhang et al. [6] propose to encode more spatial information through geometry-preserving visual phrases (GVPs), i.e., by incorporating the relative spatial locations of the features forming a visual phrase into its representation (hence "geometry-preserving"). In addition to co-occurrences, the GVP method captures both the local and the long-range spatial layout of the words.
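One way to picture a GVP is as a word pair keyed by the quantized relative offset between its two features, so that matching phrases must agree on spatial layout as well as word identity. A hypothetical sketch (cell size and key layout are assumptions, not the paper's exact encoding):

```python
def gvp_key(word_a, xa, ya, word_b, xb, yb, cell=20):
    """Key a word pair by the quantized relative offset of feature b
    with respect to feature a."""
    dx = int((xb - xa) // cell)  # quantized horizontal offset
    dy = int((yb - ya) // cell)  # quantized vertical offset
    return (word_a, word_b, dx, dy)
```

Two occurrences of the same word pair with different layouts thus fall into different GVP bins.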

[7]

Li et al. [7] propose the contextual bag-of-words (CBOW) representation, which integrates semantic conceptual relations and spatial neighboring relations. Local spatial consistency among a match's spatial nearest neighbors is used to filter out false visual-word matches. However, they do not consider the co-occurrence of neighboring words.
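The spatial-consistency filtering step can be sketched as keeping a tentative match only when enough of its spatial neighbors also match consistently; all names and thresholds below are illustrative assumptions:

```python
def filter_matches(matches, neighbors_a, neighbors_b, min_shared=2):
    """matches: dict mapping features in image A to features in image B;
    neighbors_a / neighbors_b: dict mapping a feature to its spatial
    nearest neighbors within the same image. Keep a match (a, b) only
    if at least min_shared of a's neighbors match to neighbors of b."""
    kept = {}
    for a, b in matches.items():
        shared = sum(1 for na in neighbors_a[a]
                     if matches.get(na) in neighbors_b[b])
        if shared >= min_shared:
            kept[a] = b
    return kept
```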

[8]

Tirilly et al. [8] propose a new image representation called visual sentences, which makes it possible to consider simple spatial relations between visual words, and then use probabilistic Latent Semantic Analysis (pLSA) to eliminate the noisiest visual words. The spatial information is captured by finding an appropriate axis and projecting the keypoints onto it.
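The projection step amounts to sorting keypoints by their coordinate along a chosen axis and reading the visual words off in that order. A minimal sketch, assuming the axis is given as an angle (the choice of axis is an assumption here, not the paper's selection procedure):

```python
import math

def visual_sentence(features, theta=0.0):
    """Order visual words by projecting their keypoints onto the axis
    with direction theta. features: list of (word_id, x, y)."""
    ux, uy = math.cos(theta), math.sin(theta)
    return [w for w, x, y in
            sorted(features, key=lambda f: f[1] * ux + f[2] * uy)]
```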

[1] Shiliang Zhang, Qi Tian, Gang Hua, Qingming Huang, Shipeng Li; Descriptive Visual Words and Visual Phrases for Image Applications, ACM Int. Conf. on Multimedia 2009

[2] Yan-Tao Zheng, Ming Zhao, Shi-Yong Neo, Tat-Seng Chua, Qi Tian; Visual Synset: Towards a Higher-level Visual Representation, CVPR 2008

[3] Yutian Chen, Andrew Gelfand, Charless C. Fowlkes, Max Welling; Integrating Local Classifiers through Nonlinear Dynamics on Label Graphs with an Application to Image Segmentation, ICCV 2011

[4] Gabriela Csurka, Florent Perronnin; A Simple High Performance Approach to Semantic Segmentation, BMVC 2008

[5] Nicolas Herve, Nozha Boujemaa; Visual Word Pairs for Automatic Image Annotation, ICME 2009

[6] Yimeng Zhang, Zhaoyin Jia, Tsuhan Chen; Image Retrieval with Geometry-Preserving Visual Phrases

[7] Teng Li; Contextual Bag-of-Words for Visual Categorization, IEEE Trans. on Circuits and Systems for Video Technology 2011

[8] Pierre Tirilly, Vincent Claveau, Patrick Gros; Language modeling for bag-of-visual words image categorization, CIVR 2008

Reprinted from: https://www.cnblogs.com/moondark/archive/2012/09/13/2683035.html
