Not all documentary information is directly retrievable. In 2001 two publications described the limitations of search engines: Bergman (2001), who coined the term 'deep web', and Sherman & Price (2001), who used the term 'invisible web'. According to estimates by Bergman, the 'deep web' is about 500 times as large as the 'surface web'. However, some of the assumptions in the early studies were probably flawed. Whatever its size, the deep web still exists and there is a lot of quality content to be found in it.
Estimates for some databases, as indexed by three major search engines, illustrate the existence of the deep web:
| Site | Yahoo | Live | |
| Worldcat | 433,000 | 3,500,000 | 964 |
| Pubmed | 9,260,000 | 863,000 | 98,272 |
The main causes for the existence of the deep web
Spiders or crawlers of search engines can't deal with database forms. They can't complete a form and hit the search button to gain access to the information in databases. They can index the search form itself, but not the wealth behind it. Web pages generated from a database are so-called dynamic pages. Dynamic pages can be recognized by the structure of their URL: it contains a ? or clues such as cgi, cfm or php. The following URL is an example of a dynamic page, which search engines index only with some difficulty: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=9742976
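The URL clues described above can be turned into a rough check. The following is only a minimal sketch: the function name `looks_dynamic` and the hint list are assumptions made for illustration (the text names cgi, cfm and php; fcgi, asp and the cgi-bin directory are added here), and a URL that passes or fails this test is not guaranteed to be dynamic or static.

```python
from urllib.parse import urlparse

# Assumed hint list: cgi, cfm and php come from the text above;
# fcgi and asp are added for illustration only.
DYNAMIC_HINTS = ("cgi", "fcgi", "cfm", "php", "asp")


def looks_dynamic(url: str) -> bool:
    """Guess whether a URL points to a dynamically generated page."""
    parts = urlparse(url)
    # A query string (everything after "?") is the strongest clue.
    if parts.query:
        return True
    path = parts.path.lower()
    # Otherwise look at the file extension of the last path segment.
    last_segment = path.rsplit("/", 1)[-1]
    extension = last_segment.rsplit(".", 1)[-1] if "." in last_segment else ""
    # A cgi-bin directory in the path is treated as a further clue.
    return extension in DYNAMIC_HINTS or "/cgi-bin/" in path


pubmed_url = (
    "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
    "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=9742976"
)
print(looks_dynamic(pubmed_url))
print(looks_dynamic("http://www.example.org/about.html"))
```

The check is deliberately simple: it only inspects the URL and never requests the page, so it says nothing about whether a search engine has actually indexed it.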
(Re)searchers are human after all. They don't look beyond the first 10 or perhaps 20 results. They would rather change their search query than page through the results.
Make sure that you alter the preferences of your favourite search engine. Another possibility is to use another search engine. There is probably not a single search engine in the world that is the best for all your search queries.
To find the information contained in the deep web, it is most important to find the databases that hold the information rather than the information directly. There are four possibilities for locating these databases.
Search for your research topic with additional terms that point to databases, such as: database, data, dataset, archive, bibliography, index, directory of statistics. For example ["plane crash" | "aviation accidents" database].
Search for your search term and add terms that occur in URLs generated by database queries, for example: asp, bin, cgi, cfm, search, query, (webquery) or php. E.g.
[mycology inurl:cfm] or [mycology inurl:asp]
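The two query patterns above can be generated systematically. This is a minimal sketch, assuming the `|` (OR) and `inurl:` operators behave as in the examples given here; the function names and the shortened term lists are hypothetical and chosen for illustration only.

```python
# Subsets of the terms listed in the text above (shortened for illustration).
DATABASE_TERMS = ["database", "dataset", "archive", "bibliography", "index"]
URL_HINTS = ["asp", "cgi", "cfm", "php", "search", "query"]


def topic_with_terms(phrases, terms):
    """Combine quoted topic phrases (joined by |) with each database term."""
    quoted = " | ".join(f'"{phrase}"' for phrase in phrases)
    return [f"{quoted} {term}" for term in terms]


def topic_with_inurl(topic, hints):
    """Combine a topic with an inurl: restriction for each URL hint."""
    return [f"{topic} inurl:{hint}" for hint in hints]


for query in topic_with_terms(["plane crash", "aviation accidents"], DATABASE_TERMS):
    print(query)

for query in topic_with_inurl("mycology", URL_HINTS):
    print(query)
```

Each generated string can be pasted into a search engine by hand. Operator support differs between engines, so a query that works in one engine may need adjusting in another.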
Once you have found suitable databases, it is important that you understand how to query each database to retrieve your information.
Direct Search http://www.freepint.com/gary/direct.htm
Although Direct Search is no longer updated, it is still a valuable resource for finding important databases. This site was started by Gary Price, who still reports on recent developments in Web search and Web resources on his blogs ResourceShelf and DocuTicker.
Yahoo! Webdirectories http://dir.yahoo.com/
Most subject categories include a special set of web directories, and on some occasions also databases or bibliographies.
A collection of special search engines http://www.leidenuniv.nl/ub/biv/specials.htm
A bit outdated (last additions from 2000) but still an impressive collection of special directories, bibliographies and databases in the social sciences and humanities.
Complete Planet http://www.completeplanet.com
Covers some 70,000 databases and Web directories.
IncyWincy http://www.incywincy.com/default
Turbo10 http://turbo10.com/
A meta search engine which can search in 800 databases at once. Some of these databases contain information from the hidden web.
Gosh me http://www.goshme.com/ (Still in Beta, perhaps defunct?)
Promising new search engine.
ScienceResearch http://www.scienceresearch.com/search/
This portal allows access to numerous scientific journals and public science databases. Depending on the source, full text documents may be available. In the event full text is not available, the results pull up an abstract of the article and a link to the source.
Anon. (2004). Invisible Web: What it is, why it exists, how to find it, and its inherent ambiguity. Retrieved 2005-05-23, from http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
Bergman, M. K. (2001). The deep web: Surfacing hidden value. The Journal of Electronic Publishing 7(1). http://www.press.umich.edu/jep/07-01/bergman.html
Devine, J. and F. Egger-Sider (2005). Beyond Google: The invisible Web. Retrieved 2005-05-23, from http://www.lagcc.cuny.edu/LIBRARY/invisibleweb/
Sherman, C. and G. Price (2001). The invisible web: Discovering information sources search engines can't see. Medford NJ, USA, Information Today.
WG 20071009