Vol. 77, No. 9, September 2004
Searching Smarter:
Finding Legal Resources on the Invisible Web
A great deal of information available on the Internet is found only
on the "invisible Web," and is not searchable using a general search
engine such as Google. Invisible Web content is considered dynamic
because it exists as pieces of information within a database until you
pull it together. Learn what strategies you can use to efficiently
locate this content.
by Bonnie Shucha
By now, most attorneys have discovered that the Internet can be a
powerful tool for legal research. Increasingly, Web search engines like
Google have moved up in the ranks of computer-assisted legal research
tools alongside more expensive resources such as LexisNexis and Westlaw.
Even some judges are using the Web to check facts and statistics
presented by attorneys and are reporting their findings in written
opinions.1
Bonnie Shucha is the
reference and electronic services librarian at the U.W. Law Library. She
is past president of the Law Librarians Association of Wisconsin. She
may be contacted at bjshucha@wisc.edu.
As the Web makes its way into the courtroom, it is important that
legal practitioners know how to search it effectively. Unfortunately,
this is not always the case. A new study reveals that while
professionals are spending increasing amounts of time on
computer-based searching, most are dissatisfied with their search
experience.2
It is estimated that most searchers locate only 0.03 percent, or roughly 1
in 3,000, of the Web pages available to them.3 Although such results may be due, in part, to a
poorly constructed search, a large portion of the blame also falls on
the search engine itself. Even the most experienced searcher, using the
largest search engines, can access only about 16 percent of all Web
content. Why? Because 84 percent of the information available on the
Internet is found only on the "invisible Web," also known as the "deep
Web," and is not searchable using a general search engine such as
Google.4
By recognizing how the invisible Web differs from other Web content,
you will understand how to alter your search strategies to find this
information in a time-efficient manner. This article investigates the
nature of the invisible Web and offers strategies for locating invisible
Web content.
What is the Invisible Web?
To understand the concept of the "invisible Web," it may be helpful
to first explore the nature of the "visible Web." A visible Web page is
one that exists in "static" or unchanging form until its creator alters
it. In this way, it is similar to a document that you might create in a
word processor. Both types physically exist as files on a computer: the
word processed document might be saved as a .doc or .wpd file on your
hard drive whereas a visible Web page might be stored as a .htm or .html
file on a Web server.
These static Web pages are considered visible because standard search
engines are able to index them, that is, add them to the searchable catalog from which search results are drawn.
Most search engines index new documents in one of two ways: 1) by using
automated "spiders" or "crawlers" to follow links from other documents
that are already indexed; or 2) when a webmaster registers a Web page.
Because this method of indexing documents is well established and
relatively inexpensive, most search engines draw primarily upon visible
Web content.
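For readers curious about the mechanics, the link-following method can be sketched in a few lines of Python. This is an illustration only, not how any particular search engine works: the "site" is a hypothetical dictionary of page names and the links each page contains, and the optional registered list stands in for webmaster registration.

```python
from collections import deque

# Hypothetical miniature Web: each static page lists the pages it links to.
SITE = {
    "home.html": ["statutes.html", "about.html"],
    "statutes.html": ["chapter-index.html"],
    "about.html": ["home.html"],
    "chapter-index.html": [],
    "orphan.html": [],  # nothing links here
}


def crawl(start, registered=()):
    """Breadth-first crawl: index every page reachable by following links
    from the start page or from pages a webmaster has registered."""
    indexed = []
    seen = set()
    queue = deque([start, *registered])
    while queue:
        page = queue.popleft()
        if page in seen or page not in SITE:
            continue  # skip pages already indexed or missing from the site
        seen.add(page)
        indexed.append(page)
        queue.extend(SITE[page])  # follow every link on the page
    return indexed


print(crawl("home.html"))
# ['home.html', 'statutes.html', 'about.html', 'chapter-index.html']
print(crawl("home.html", registered=["orphan.html"]))
# ['home.html', 'orphan.html', 'statutes.html', 'about.html', 'chapter-index.html']
```

The orphan page enters the index only when it is registered, mirroring the two routes described above.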
As you now know, there is another type of Web content known as the
"invisible Web." Most invisible Web content is considered "dynamic"
because it consists of bits of information that are stored in a database
and pulled together on the fly into a Web page at your request.5 Invisible Web pages don't actually exist until you
submit a query to the database containing the information and the
matching information is drawn together into a Web page. Usually, an
invisible Web search is conducted via a specialized search interface, or
search box, provided by the database creator.
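The following Python sketch, which uses the standard-library sqlite3 module and an in-memory database, shows the idea in miniature. The table, its placeholder records, and the render_results function are all hypothetical; the point is only that the results page is assembled from stored records when a query arrives and is never saved as a file that a spider could find.

```python
import html
import sqlite3

# Hypothetical statute database with placeholder records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statutes (section TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO statutes VALUES (?, ?)",
    [
        ("Sec. 1", "Licensing of livestock dealers"),
        ("Sec. 2", "Registration of motor vehicles"),
        ("Sec. 3", "Livestock brand registration"),
    ],
)


def render_results(keyword):
    """Assemble a results page on the fly from the rows matching keyword.
    Nothing is written to disk; the page exists only in this return value."""
    rows = conn.execute(
        "SELECT section, text FROM statutes WHERE text LIKE ? ORDER BY section",
        (f"%{keyword}%",),
    ).fetchall()
    items = "\n".join(
        f"<li>{html.escape(section)}: {html.escape(text)}</li>" for section, text in rows
    )
    return (
        f"<html><body><h1>Results for {html.escape(keyword)}</h1>"
        f"<ul>\n{items}\n</ul></body></html>"
    )


print(render_results("livestock"))
```

A different keyword produces a different page from the same records, which is why there is no single stored file for a search engine to index.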
This concept is somewhat similar to the mail merge feature in most
word processors. In a mail merge, content is drawn from an outside data
source, such as an Excel file, and inserted into a new, customized
document. Like an invisible Web page, the mail merged document did not
previously "physically" exist as a stored file on a computer. Rather,
both types are created at the point of need.
Because it is dynamic, or "physically" nonexistent, most conventional
search engines are unable to retrieve invisible Web content. Traditional
methods of indexing that are based on following links from other
documents or webmaster registration are inadequate because they rely on
the existence of a static file. Once most search engine spiders hit a
database's search form, they are forced to stop because user input is
required. Conventional search engines are simply not capable of
automatically generating that input in the form of a search. As one
author notes, "It's not that the information is really hidden or
invisible. It's there, freely available and waiting to be found. The
problem is that general search engines are built in such a way that they
just can't go into [a] database and search the information contained
[there]."6
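The hypothetical Python sketch below models why a spider stalls at a search box. Each page may carry links, which a spider can follow, and a search form, which needs a keyword the spider has no way to supply. The page names and the database_query function are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    links: list = field(default_factory=list)
    has_search_form: bool = False


# Hypothetical site: the search page itself is static and visible,
# but it offers a form rather than links to the records behind it.
SITE = {
    "home.html": Page(links=["search.html"]),
    "search.html": Page(has_search_form=True),
}


def database_query(keyword):
    """Stands in for the database behind the form; results appear only on request."""
    records = {"livestock": ["results-for-livestock"]}
    return records.get(keyword, [])


def crawl(start):
    """Follow links only; note any form the spider cannot fill in."""
    indexed, stalled_at, stack = [], [], [start]
    while stack:
        name = stack.pop()
        if name in indexed:
            continue
        indexed.append(name)
        page = SITE[name]
        if page.has_search_form:
            # The spider has no keyword to type, so it cannot reach the results.
            stalled_at.append(name)
        stack.extend(page.links)
    return indexed, stalled_at


indexed, stalled_at = crawl("home.html")
print(indexed)     # ['home.html', 'search.html']
print(stalled_at)  # ['search.html']
print(database_query("livestock"))  # ['results-for-livestock']
```

The search page itself is indexed, but the results page appears only when a person types a query, which is exactly the gap the quoted author describes.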
Besides dynamic Web pages, other types of content also are considered
"invisible." Very recently created static Web pages are effectively
invisible because search engines' spiders have not yet had a chance to
index them. It is estimated that it takes three to four months before a
static Web page is indexed by a search engine.7 Password-protected information also may be
considered part of the invisible Web because search engines are unable
to access and index this content without the proper authorization. Such
content might include information within subscription databases such as
LexisNexis and Westlaw or confidential business databases.
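Endnote 5 offers a rule of thumb for telling static pages from dynamic ones by their URLs. A minimal Python version of that heuristic, using the standard-library urllib.parse module, might look like this. The example URLs are placeholders, and the check is only a hint, since some sites generate pages dynamically behind addresses ending in .html.

```python
from urllib.parse import urlsplit


def looks_dynamic(url):
    """Apply the URL rule of thumb: a query string (text after '?') suggests a
    database-generated page; a path ending in .htm or .html suggests a static
    file. Returns True, False, or None when the URL gives no clear hint."""
    parts = urlsplit(url)
    if parts.query:
        return True
    if parts.path.lower().endswith((".htm", ".html")):
        return False
    return None


for url in [
    "http://www.example.com/statutes/index.html",
    "http://www.example.com/search?keyword=livestock",
    "http://www.example.com/statutes/",
]:
    print(url, looks_dynamic(url))
```

Returning None for the third address reflects that many URLs carry neither signal.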
What Type of Content Is (and Is Not) Freely Available on the Invisible Web?
Fortunately, the vast majority of invisible Web content, 95 percent,
is publicly accessible, free information. Studies reveal that the
quality of documents found on the invisible Web often exceeds that of
documents that are accessible via conventional search engines.8 This free content includes a vast amount of legal and
governmental documents such as case law, statutes, bills, regulations,
patents, briefs, census data, government reports, treaties, and much
more. A great deal of business and corporate data also is available on
the invisible Web, including SEC filings, stock quotes, company
profiles, and so on. More general types of information can be found on
the invisible Web, such as address and phone number directories, flight
schedules, dictionary definitions, maps, and more.
Searchers should be aware that much of the law-related information
that is freely available on the invisible (and visible) Web is material
that is in the public domain. Despite assertions by some novice
researchers that "everything is free on the Web," there are certain
types of content that are unlikely to be found at no cost on the Web.
These include books and articles that are published for profit, public
domain documents that have editorial enhancements, or other
authoritative materials that are considered to be someone's intellectual
property.9
Tips for Finding Invisible Web Content
The first step in locating any type of information is considering
where an authoritative source of that information might be found. It may
be a print source, the Web (visible or invisible), a subscription
database, a phone call, and so on. If it is available from more than one
source, you will need to consider what will be the quickest, most
cost-effective way to obtain it.
If you determine that the information you need might be available on
the invisible Web, how do you find it? Fortunately, there is nothing
magical about finding content on the invisible Web. It's simply a matter
of knowing where to look. Consider that:
- a great deal of excellent legal and business information is freely
available on the Internet;
- unfortunately, much of it is contained within databases and is,
therefore, invisible or inaccessible to most conventional search
engines;
- the most effective way to access this information is using the
database's own search interface, or search box;
- fortunately, the search box is usually found on a static, visible
Web page that is accessible using a conventional search
engine.
The following scenario may help illustrate: You have been asked to
locate Wisconsin statutes concerning livestock. You don't have a copy of
the Wisconsin Statutes in print, but you think that they might be
available on the Internet. You go to Google and do a search containing
the keywords "livestock statutes Wisconsin." You find some interesting
information about your topic and references to the statutes, but not the
statutes themselves.
Take a moment to reconsider the search. If you were doing the
research using print sources, you would first locate a copy of the
Wisconsin Statutes, then search the index for your keyword, "livestock."
The same strategy applies when doing research on the Web. Because
providers of legal and business information often publish their
collections within "invisible" databases, it is more effective to first
limit your search to find the provider's "visible" search interface
page. Once there, you could query the database using your specific
keywords. As one author notes, "often the key to the answer is not
locating the answer itself as the first step, but locating the right
database in which to search for it."10
Back at Google, you decide to try your search again, this time using
just the keywords "Wisconsin Statutes." The very first item in your
search results is the freely available Revisor of Statutes Bureau's
(RSB) Wisconsin Statutes database. In the RSB's search box, you proceed
with your specific search for "livestock" and successfully locate the
specific statutes that you need.
Chances are that you have already used invisible Web content without
realizing it. Maybe a librarian directed you to the RSB's Wisconsin
Statutes page or perhaps you saw a link on WisBar. With a better
understanding of why some Web content is considered "invisible" and a
knowledge of the strategies used to locate it, you will be able to
search smarter and get maximum value from the time you spend
researching on the Web.
Endnotes
1. Michael Pena, Google's Domain Even Takes in Law Offices, East Bay Bus. Times, May 7, 2004; Declan McCullagh, Search Engines Take the Stand, CNET News.com, May 13, 2004.
2. Delphi Research Asks: Does Search Contribute to Productivity?, DelphiWeb.com Newsflash, May 5, 2004.
3. Michael K. Bergman, The Deep Web: Surfacing Hidden Value, 7 J. Elec. Pub. 1 (August 2001).
4. Id.
5. One way to recognize whether a Web page is static or dynamic is to look at its URL, or Web address. Static pages often end in the extension .htm or .html. Dynamic pages usually include a question mark, indicating that the resulting Web page is based on a database query.
6. Diana Botluk, Mining Deeper Into the Invisible Web, LLRX.com, Nov. 15, 2000.
7. See Bergman, supra note 3.
8. Id.
9. There are a few exceptions to this rule. Several news sources, such as the New York Times, offer full-text content on their Web sites. Usually, however, this includes current content only, and the reader is subjected to numerous advertisements. Additionally, some authors and publishers have chosen to offer their content free on the Web for several reasons: to facilitate the distribution of scholarly information, to market themselves, or to put forth their own point of view. Beware the latter.
10. See Botluk, supra note 6.
Wisconsin Lawyer