Wisconsin Lawyer: Practice Tips:

September 01, 2004

Searching Smarter:
Finding Legal Resources on the Invisible Web

A great deal of information available on the Internet is found only on the "invisible Web," and is not searchable using a general search engine such as Google. Invisible Web content is considered dynamic because it exists as pieces of information within a database until you pull it together. Learn what strategies you can use to efficiently locate this content.

by Bonnie Shucha

By now, most attorneys have discovered that the Internet can be a powerful tool for legal research. Increasingly, Web search engines like Google have moved up in the ranks of computer-assisted legal research tools alongside more expensive resources such as LexisNexis and Westlaw. Even some judges are using the Web to check facts and statistics presented by attorneys and are reporting their findings in written opinions.¹

Bonnie Shucha is the reference and electronic services librarian at the U.W. Law Library. She is past president of the Law Librarians Association of Wisconsin. She may be contacted at bjshucha@wisc.edu.

As the Web makes its way into the courtroom, it is important that legal practitioners know how to search it effectively. Unfortunately, this is not always the case. A new study reveals that while professionals are spending increasing amounts of time doing computer-based searching, most are dissatisfied with their search experience.²

It is estimated that most searchers locate only 0.03 percent, or about 1 in 3,000, of the Web pages available to them.³ Although such results may be due, in part, to a poorly constructed search, a large portion of the blame also falls on the search engine itself. Even the most experienced searcher, using the largest search engines, can access only about 16 percent of all Web content. Why? Because 84 percent of the information available on the Internet is found only on the "invisible Web," also known as the "deep Web," and is not searchable using a general search engine such as Google.⁴

By recognizing how the invisible Web differs from other Web content, you will understand how to alter your search strategies to find this information in a time-efficient manner. This article investigates the nature of the invisible Web and offers strategies for locating invisible Web content.

What is the Invisible Web?

To understand the concept of the "invisible Web," it may be helpful to first explore the nature of the "visible Web." A visible Web page is one that exists in "static" or unchanging form until its creator alters it. In this way, it is similar to a document that you might create in a word processor. Both types physically exist as files on a computer: the word processed document might be saved as a .doc or .wpd file on your hard drive whereas a visible Web page might be stored as a .htm or .html file on a Web server.

These static Web pages are considered visible because standard search engines are able to index them, that is, to catalog them so that they can be displayed as search results. Most search engines index new documents in one of two ways: 1) by using automated "spiders" or "crawlers" to follow links from other documents that are already indexed; or 2) when a webmaster registers a Web page. Because these methods of indexing documents are well established and relatively inexpensive, most search engines draw primarily upon visible Web content.

As you now know, there is another type of Web content known as the "invisible Web." Most invisible Web content is considered "dynamic" because it consists of bits of information that are stored in a database and pulled together on the fly into a Web page at your request.⁵ Invisible Web pages don't actually exist until you submit a query to the database containing the information and the matching information is drawn together into a Web page. Usually, an invisible Web search is conducted via a specialized search interface, or search box, provided by the database creator.
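For readers who want to see the mechanics, the following is a minimal Python sketch of how a dynamic page might be assembled from a database at the moment of a query. It is an illustration only, not a description of how any particular legal database works. The table, the section numbers, the headings, and the function names are all invented placeholders.

```python
import sqlite3
from html import escape


def build_database() -> sqlite3.Connection:
    """Create a tiny in-memory database standing in for a statutes collection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE statutes (section TEXT, heading TEXT)")
    # Placeholder records: these are NOT real statute sections or headings.
    conn.executemany(
        "INSERT INTO statutes VALUES (?, ?)",
        [
            ("000.01", "Livestock at large (placeholder)"),
            ("000.02", "Livestock brands (placeholder)"),
            ("000.03", "Motor vehicle registration (placeholder)"),
        ],
    )
    return conn


def render_results_page(conn: sqlite3.Connection, keyword: str) -> str:
    """Build an HTML page on the fly from the records that match the keyword."""
    rows = conn.execute(
        "SELECT section, heading FROM statutes WHERE heading LIKE ? ORDER BY section",
        (f"%{keyword}%",),
    ).fetchall()
    items = "\n".join(
        f"<li>{escape(section)}: {escape(heading)}</li>" for section, heading in rows
    )
    return (
        f"<html><body><h1>Results for {escape(keyword)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


# The page below exists only after the query is submitted.
print(render_results_page(build_database(), "livestock"))
```

No file containing the results page is stored anywhere in this sketch. The page is produced only when `render_results_page` is called, which is the sense in which such content is "dynamic."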

This concept is somewhat similar to the mail merge feature in most word processors. In a mail merge, content is drawn from an outside data source, such as an Excel file, and inserted into a new, customized document. Like an invisible Web page, the mail merged document did not previously "physically" exist as a stored file on a computer. Rather, both types are created at the point of need.

Because invisible Web content is dynamic, or "physically" nonexistent, most conventional search engines are unable to retrieve it. Traditional methods of indexing that are based on following links from other documents or webmaster registration are inadequate because they rely on the existence of a static file. Once most search engine spiders hit a database's search form, they are forced to stop because user input is required. Conventional search engines are simply not capable of automatically generating that input in the form of a search. As one author notes, "It's not that the information is really hidden or invisible. It's there, freely available and waiting to be found. The problem is that general search engines are built in such a way that they just can't go into [a] database and search the information contained [there]."⁶
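The way a spider stalls at a search form can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the design of any real search engine: the three-page site is a hypothetical stand-in held in memory, and the class and function names are invented for the example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical three-page site held in memory; no network access is needed.
SITE = {
    "/index.html": '<a href="/about.html">About</a> <a href="/search.html">Search</a>',
    "/about.html": '<p>About this site.</p> <a href="/index.html">Home</a>',
    "/search.html": '<form action="/results"><input name="q"></form>',
}


class LinkAndFormParser(HTMLParser):
    """Collect link targets and note whether the page contains a form."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.has_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "form":
            self.has_form = True


def crawl(start):
    """Follow links breadth-first; a form offers no link to follow."""
    indexed, dead_ends = [], []
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        parser = LinkAndFormParser()
        parser.feed(SITE[url])
        indexed.append(url)
        if parser.has_form:
            # The spider cannot type a query, so it goes no further here.
            dead_ends.append(url)
        for link in parser.links:
            if link in SITE and link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed, dead_ends


indexed, dead_ends = crawl("/index.html")
print("Indexed pages:", indexed)
print("Stopped at forms on:", dead_ends)
```

In this sketch the spider indexes the static search page itself but never reaches `/results`, because that page would exist only after someone submits a query through the form.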

Besides dynamic Web pages, other types of content also are considered "invisible." Very recently created static Web pages are effectively invisible because search engines' spiders have not yet had a chance to index them. It is estimated that it takes three to four months before a static Web page is indexed by a search engine.⁷ Password-protected information also may be considered part of the invisible Web because search engines are unable to access and index this content without the proper authorization. Such content might include information within subscription databases such as LexisNexis and Westlaw or confidential business databases.

What Type of Content Is (and Is Not) Freely Available on the Invisible Web?

Fortunately, the vast majority of invisible Web content, 95 percent, is publicly accessible, free information. Studies reveal that the quality of documents found on the invisible Web often exceeds that of documents that are accessible via conventional search engines.⁸ Invisible Web content includes a vast number of legal and governmental documents such as case law, statutes, bills, regulations, patents, briefs, census data, government reports, treaties, and much more. A great deal of business and corporate data also is available on the invisible Web, including SEC filings, stock quotes, company profiles, and so on. More general types of information can be found on the invisible Web, such as address and phone number directories, flight schedules, dictionary definitions, maps, and more.

Searchers should be aware that much of the law-related information that is freely available on the invisible (and visible) Web is material that is in the public domain. Despite assertions by some novice researchers that "everything is free on the Web," there are certain types of content that are unlikely to be found at no cost on the Web. These include books and articles that are published for profit, public domain documents that have editorial enhancements, or other authoritative materials that are considered to be someone's intellectual property.⁹

Tips for Finding Invisible Web Content

The first step in locating any type of information is considering where an authoritative source of that information might be found. It may be a print source, the Web (visible or invisible), a subscription database, a phone call, and so on. If it is available from more than one source, you will need to consider what will be the quickest, most cost-effective way to obtain it.

If you determine that the information you need might be available on the invisible Web, how do you find it? Fortunately, there is nothing magical about finding content on the invisible Web. It's simply a matter of knowing where to look. Consider that:

a great deal of excellent legal and business information is freely available on the Internet;
unfortunately, much of it is contained within databases and is, therefore, invisible or inaccessible to most conventional search engines;
the most effective way to access this information is using the database's own search interface, or search box;
fortunately, the search box is usually found on a static, visible Web page that is accessible using a conventional search engine.

The following scenario may help illustrate: You have been asked to locate Wisconsin statutes concerning livestock. You don't have a copy of the Wisconsin Statutes in print, but you think that they might be available on the Internet. You go to Google and do a search containing the keywords "livestock statutes Wisconsin." You find some interesting information about your topic and references to the statutes, but not the statutes themselves.

Take a moment to reconsider the search. If you were doing the research using print sources, you would first locate a copy of the Wisconsin Statutes, then search the index for your keyword, "livestock." The same strategy applies when doing research on the Web. Because providers of legal and business information often publish their collections within "invisible" databases, it is more effective to first limit your search to find the provider's "visible" search interface page. Once there, you could query the database using your specific keywords. As one author notes, "often the key to the answer is not locating the answer itself as the first step, but locating the right database in which to search for it."¹⁰

Back at Google, you decide to try your search again, this time using just the keywords "Wisconsin Statutes." The very first item in your search results is the freely available Revisor of Statutes Bureau's (RSB) Wisconsin Statutes database. In the RSB's search box, you proceed with your specific search for "livestock" and successfully locate the specific statutes that you need.
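The two-step strategy in this scenario can be modeled with a short Python sketch. Everything in it is a hypothetical stand-in: the page titles, the database records, and the `keyword_search` function are invented for illustration and do not reflect the actual contents of Google's index or the RSB's database.

```python
# Hypothetical "visible" pages that a general search engine has indexed.
VISIBLE_PAGES = {
    "Article about livestock law": "commentary on livestock statutes in wisconsin",
    "Wisconsin Statutes search page": "search the wisconsin statutes database",
}

# Hypothetical records reachable only through the provider's own search box.
STATUTES_DATABASE = {
    "Placeholder section A": "duties of owners of livestock",
    "Placeholder section B": "registration of motor vehicles",
}


def keyword_search(collection, query):
    """Return the titles whose text contains every word of the query."""
    words = query.lower().split()
    return [
        title
        for title, text in collection.items()
        if all(word in text for word in words)
    ]


# First attempt: specific keywords in the general engine find commentary only.
print(keyword_search(VISIBLE_PAGES, "livestock statutes Wisconsin"))

# Step 1: a broader search locates the provider's visible search page.
print(keyword_search(VISIBLE_PAGES, "Wisconsin Statutes"))

# Step 2: the specific keyword is run inside the database itself.
print(keyword_search(STATUTES_DATABASE, "livestock"))
```

The point of the sketch is the order of operations: the general engine is used to find the right database, and the specific question is asked only once you are inside it.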

Chances are that you have already used invisible Web content without realizing it. Maybe a librarian directed you to the RSB's Wisconsin Statutes page or perhaps you saw a link on WisBar. With a better understanding of why some Web content is considered "invisible" and a knowledge of the strategies used to locate it, you will be able to search smarter and get maximum value from the time you spend researching on the Web.

Endnotes

¹Michael Pena, Google's Domain Even Takes in Law Offices, East Bay Bus. Times, May 7, 2004; Declan McCullagh, Search Engines Take the Stand, CNET News.com, May 13, 2004.

²Delphi Research Asks: Does Search Contribute to Productivity?, DelphiWeb.com Newsflash, May 5, 2004.

³Michael K. Bergman, The Deep Web: Surfacing Hidden Value, 7 J. Elec. Pub. 1 (August 2001).

⁴Id.

⁵One way to recognize if a Web page is static or dynamic is to look at its URL, or Web address. Static pages often contain the extension .htm or .html. Dynamic pages usually include a question mark indicating that the resulting Web page is based on a database query.

⁶Diana Botluk, Mining Deeper Into the Invisible Web, LLRX.com, Nov. 15, 2000.

⁷See Bergman, supra note 3.

⁸Id.

⁹There are a few exceptions to this rule. Several news sources, such as the New York Times, offer full-text content on their Web sites. Usually, however, this includes current content only and the reader is subjected to numerous advertisements. Additionally, some authors and publishers have chosen to offer their content free on the Web for several reasons: to facilitate the distribution of scholarly information, to market themselves, or to put forth their own point of view. Beware the latter.

¹⁰See Botluk, supra note 6.
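The rule of thumb in note 5 for telling static from dynamic pages by their URLs might be expressed as the short Python sketch below. It is a rough heuristic only, since many dynamic pages have no question mark in their addresses. The function name and the example.com addresses are invented for illustration.

```python
from urllib.parse import urlparse


def guess_page_type(url: str) -> str:
    """Guess whether a URL points to a static or dynamic page (heuristic only)."""
    parts = urlparse(url)
    if parts.query:
        # Text after a question mark suggests a database query.
        return "dynamic"
    if parts.path.lower().endswith((".htm", ".html")):
        return "static"
    return "unknown"


for address in (
    "http://www.example.com/about.html",
    "http://www.example.com/search?keyword=livestock",
    "http://www.example.com/",
):
    print(address, "->", guess_page_type(address))
```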
