Deep Web




Non-textual files such as multimedia (image) files, Usenet archives, and documents in non-HTML file formats such as PDF and DOC used to form part of the deep web, but are now more easily accessible to search engines, especially Google.

The deep web should not be confused with the "dark web" or "dark internet", terms used here for machines or network segments not connected to the Internet. While deep web content is accessible to people online but not visible to conventional search engines, dark internet content is accessible online to neither people nor search engines.


SURFACE WEB


To better understand the invisible web, consider how conventional search engines construct their databases and thereby define the surface web. Programs called spiders or web crawlers start by reading pages on an initial list of websites. Each page they read is indexed and added to the search engine's database. Any hyperlinks to new pages are added to the list of pages to be indexed. Eventually, all reachable pages have been indexed, or the search engine runs out of time or disk space. These reachable pages are the surface web. Pages that have no chain of links leading to them from a page in the spider's initial list are invisible to that spider and are not part of the surface web it defines.
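
The crawling procedure described above can be sketched as a breadth-first traversal of a link graph. This is only an illustration under stated assumptions: the page names, the LINKS table, and the max_pages limit (standing in for "runs out of time or disk space") are all hypothetical, and a real crawler would fetch and parse pages over the network.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "home": ["about", "contact"],
    "about": ["home", "privacy"],
    "contact": [],
    "privacy": [],
    "orphan": ["home"],  # links out, but no crawled page links to it
}


def crawl(seeds, links, max_pages=1000):
    """Return the set of pages reachable from the seed list by following links."""
    indexed = set()
    frontier = deque(seeds)  # the list of pages still to be indexed
    while frontier and len(indexed) < max_pages:
        page = frontier.popleft()
        if page in indexed or page not in links:
            continue
        indexed.add(page)  # "index" the page
        for target in links[page]:
            if target not in indexed:
                frontier.append(target)  # queue newly discovered pages
    return indexed


surface = crawl(["home"], LINKS)
print(sorted(surface))
```

In this toy graph the page "orphan" is never indexed, because no chain of links leads to it from the seed list, so it falls outside the surface web that this particular crawl defines.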

The "deep web" stands in contrast to the "surface web". The great majority of the deep web consists of searchable databases. To understand why these databases are invisible to spiders (and their search engines), consider the following:
Imagine someone has collected a great amount of information (books, texts, articles, images, etc.) and put it together online in a website, creating a database reachable only via a search field. Like most databases, it would work like this:
1. The user types the desired keywords into a search field.
2. The search facility looks inside the database and retrieves the relevant content.
3. A page of results is presented, with links to every item relevant to the user's query.

When a conventional search engine's web crawler reaches this site, it captures the text of the main page and of the pages that can be reached from it by hyperlinks (usually "about us", "contact us", "privacy policy", etc.). But the great majority of the information (books, texts, articles, or images), which is reachable only by querying the search field, cannot be reached by the web crawler. The robot cannot predict which words it should type into the search field. Thus the data is invisible to the search engine.
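
The query-driven site and the crawler's blind spot can be illustrated with a small model. Everything here is hypothetical: the PAGES and DATABASE tables, the URLs, and the search and crawl_links functions are invented for the sketch and do not describe any real site or search engine.

```python
# Hypothetical site: three hyperlinked pages plus a database that is
# reachable only through the search field.
PAGES = {
    "/": ["/about", "/contact"],
    "/about": ["/"],
    "/contact": ["/"],
}
DATABASE = {
    "/item/1": "Moby Dick, a novel about a whale",
    "/item/2": "A treatise on whale anatomy",
    "/item/3": "Collected essays on typography",
}


def search(keywords):
    """Take keywords, look inside the database, return links to the matches."""
    words = keywords.lower().split()
    return sorted(
        url for url, text in DATABASE.items()
        if all(word in text.lower() for word in words)
    )


def crawl_links(start):
    """Follow hyperlinks only; never type anything into the search field."""
    seen, stack = set(), [start]
    while stack:
        url = stack.pop()
        if url in seen:
            continue
        seen.add(url)
        stack.extend(PAGES.get(url, []))
    return seen


crawled = crawl_links("/")
user_results = search("whale")
print(sorted(crawled))
print(user_results)
```

A user who types "whale" gets links to two database items, while the link-following crawler ends up with only the three static pages and none of the database content.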


ACCESSING THE DEEP WEB

As noted above, search engines use web crawlers that follow hyperlinks. Such crawlers typically do not submit queries to databases, because the number of queries that can be made to a single database is potentially infinite. It has been noted that this can be (partially) overcome by publishing links to query results, which crawlers can follow and which also increase the Google-style PageRank of the deep web site concerned.
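
One way a site owner might publish links to query results is to generate a static "browse" page of pre-built search URLs. The sketch below assumes a hypothetical search endpoint at https://example.org/search that takes its keywords in a q parameter; both are illustrative choices, not something the text prescribes.

```python
from urllib.parse import urlencode


def browse_page_links(base_url, queries):
    """Build one crawlable URL per pre-chosen query."""
    return [f"{base_url}?{urlencode({'q': query})}" for query in queries]


# Hypothetical endpoint and query list, chosen by the site owner.
links = browse_page_links("https://example.org/search", ["whale", "type design"])
for link in links:
    print(link)
```

The workaround is only partial, as the text says: a crawler following these links sees the results for the listed queries and nothing else.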

In 2005, Yahoo! made a small part of the deep web searchable by releasing Yahoo! Subscriptions. This search engine searches through a few subscription-only websites.

Some search tools are being designed to retrieve information from the deep web. Their crawlers are set up to identify searchable databases and interact with them, with the aim of providing access to deep web content.
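
The text does not say how such crawlers interact with a database. One plausible approach, shown here purely as an assumption, is to probe the search field with seed keywords and reuse words found in the results as further queries. RECORDS, site_search, and probe are hypothetical names, and site_search stands in for submitting a real search form.

```python
# Hypothetical records hidden behind a search form.
RECORDS = {
    "/item/1": "whale novel",
    "/item/2": "whale anatomy",
    "/item/3": "typography essays",
    "/item/4": "anatomy atlas",
}


def site_search(keyword):
    """Stand-in for submitting one keyword to the site's search field."""
    return {url: text for url, text in RECORDS.items() if keyword in text.split()}


def probe(search, seed_keywords, max_queries=50):
    """Query with seed keywords, then with words harvested from the results."""
    discovered = {}
    pending = list(seed_keywords)
    tried = set()
    while pending and len(tried) < max_queries:
        keyword = pending.pop(0)
        if keyword in tried:
            continue
        tried.add(keyword)
        for url, text in search(keyword).items():
            if url not in discovered:
                discovered[url] = text
                pending.extend(text.split())  # new words become new queries
    return discovered


found = probe(site_search, ["whale"])
print(sorted(found))
```

Starting from the single seed "whale", the probe reaches three of the four records. The typography record shares no vocabulary with the others and stays hidden, which shows why keyword probing gives only partial coverage.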

