Eric's Web

Fuller

What Is a Semantic Search Engine | Streams, Pools and Reservoirs | Semantic Search Engine

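I recently read Leigh Dodds's article Streams, Pools and Reservoirs and learned a great deal from it. Leigh Dodds argues that a semantic search engine and a semantically enabled search engine are two different things. He bases this conclusion on a review of how Web content has historically been organized and retrieved: by analogy with the earlier stages the Web has gone through, he sketches what a semantic search engine built on the linked data cloud would look like. Below I summarize the main points of the article along with my own thoughts.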

A Look Back at How Web Content Has Been Organized and Retrieved

The evolution of the Web can be summarized in the following stages:

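Content is published online. As the volume grows, Web content is organized into a categorized directory; I suspect the author has in mind something like Yahoo's early directory.
Search engines index content automatically; the article uses the phrase "create a link-base" for this.
Search engines differentiate themselves and offer value-added services.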

Outlook for Organizing and Retrieving Structured Content (Data Sets)

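Leigh Dodds believes we are now at roughly the late part of the first stage above: a large amount of structured data is described in RDF, there is the LOD (Linking Open Data) project, and semantic search engines that link data sets together are about to emerge.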

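He characterizes the current stage as one in which the relationships and links between data sets are still maintained largely by hand. As the original puts it: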

Not in the sense that members of the LOD community are manually entering data to link datasets together, but rather at the level of looking for opportunities to link together datasets, encouraging data publishers to co-ordinate and inter-relate their data, and by attempting to organically grow the link data web by targeting datasets that would usefully annotate or extend the current Linked Data Cloud.

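Leigh Dodds therefore predicts that semantic search will mean linking and organizing data sets automatically, and on this basis he distinguishes semantic search engines from semantically enabled search engines.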
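
The article does not say how such automatic linking would work. Purely as an illustration, here is a naive sketch that assumes two data sets have already been parsed into (subject, predicate, object) triples and proposes owl:sameAs links wherever two resources carry the same rdfs:label after case-folding. The function names labels and discover_links and all example URIs are hypothetical; practical link discovery needs far more robust matching than an exact label comparison.

```python
# Naive, hypothetical link discovery between two RDF data sets.
# Triples are plain (subject, predicate, object) tuples of strings.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

Triple = tuple[str, str, str]


def labels(dataset: list[Triple]) -> dict[str, set[str]]:
    """Map each normalized (stripped, lowercased) rdfs:label to the subjects carrying it."""
    by_label: dict[str, set[str]] = {}
    for s, p, o in dataset:
        if p == RDFS_LABEL:
            by_label.setdefault(o.strip().lower(), set()).add(s)
    return by_label


def discover_links(a: list[Triple], b: list[Triple]) -> list[Triple]:
    """Propose owl:sameAs triples between subjects of a and b that share a label."""
    la, lb = labels(a), labels(b)
    links: set[Triple] = set()
    for label in la.keys() & lb.keys():
        for sa in la[label]:
            for sb in lb[label]:
                links.add((sa, OWL_SAME_AS, sb))
    return sorted(links)


# Hypothetical example data sets.
dataset_a = [
    ("http://a.example/city/1", RDFS_LABEL, "Beijing"),
    ("http://a.example/city/2", RDFS_LABEL, "Shanghai"),
]
dataset_b = [
    ("http://b.example/place/bj", RDFS_LABEL, "beijing"),
    ("http://b.example/place/gz", RDFS_LABEL, "Guangzhou"),
]

for link in discover_links(dataset_a, dataset_b):
    print(link)
```

Here only "Beijing" appears in both data sets, so a single owl:sameAs link is proposed; matching on one property is exactly the kind of shortcut that breaks down on real data.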

In his view, semantically enabled search engines are those that

use techniques like natural language parsing and improved understanding of document semantics in order to provide an improved search experience for humans

whereas a semantic search engine is described as follows:

A Semantic Web search engine should offer infrastructure for machines. Simple semantic web search engines like Swoogle and Sindice provide a way to for machines to construct link bases, based on some simple expressions of what data is of relevance, in order to find data that is of interest to a particular user, community, or within the context of a particular application. And crucially this can be done without having to always crawl or navigate over the entire linked data web. This process can be commoditised just as it has with the web of documents
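
To make the idea of building link bases from "simple expressions of what data is of relevance" more concrete, here is a minimal sketch under stated assumptions: the engine has already crawled RDF documents into (subject, predicate, object) triples, and a relevance expression is just one (predicate, object) pair. The engine keeps an index from each pair to the documents that contain it, so a client receives a short list of relevant documents instead of crawling the linked data web itself. The class LinkBaseIndex, its methods and the example URIs are hypothetical; this is not the API of Swoogle, Sindice or any other real engine.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

Triple = tuple[str, str, str]


class LinkBaseIndex:
    """Hypothetical engine-side index: (predicate, object) -> source document URLs."""

    def __init__(self) -> None:
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add_document(self, doc_url: str, triples: list[Triple]) -> None:
        # Record every (predicate, object) pair seen in one crawled RDF document.
        for _subject, predicate, obj in triples:
            self._postings[(predicate, obj)].add(doc_url)

    def link_base(self, pattern: tuple[str, str]) -> list[str]:
        """Return the documents matching a (predicate, object) relevance pattern.

        A client then fetches only these URLs rather than crawling everything.
        """
        return sorted(self._postings.get(pattern, set()))


# Hypothetical crawled documents.
index = LinkBaseIndex()
index.add_document("http://a.example/people.rdf", [
    ("http://a.example/#alice", RDF_TYPE, FOAF_PERSON),
])
index.add_document("http://b.example/books.rdf", [
    ("http://b.example/#book1", RDF_TYPE, "http://b.example/Book"),
])

# "Which documents describe people?" expressed as a (predicate, object) pair.
print(index.link_base((RDF_TYPE, FOAF_PERSON)))
```

The design choice is that the expensive work (crawling and indexing) happens once on the engine side, while each client only pays for a lookup; that split is what would let the service be commoditised, as the quoted passage suggests.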

Thoughts

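When I started developing the MetaSeeker toolkit two years ago, this view was not the mainstream; most people were focused on semantic recognition. I chose a different direction not because I was more far-sighted, but because, with nothing more than an old programmer's skills, exploring artificial intelligence or ontologies was out of the question for me. I preferred to build a practical tool that lets people building vertical search engines and social networking sites extract Web data at low or even zero cost. So I took the path of structuring Web content. This path is not simple either: for example, I have yet to find a truly effective way to organize and build up the data relevance the article talks about.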