Important Concepts underlying searching in the DQC


The Simple Search Template

The DQC simple search template provides a single input field into which a user may enter query terms to identify documents of interest. The input field is preceded by a selection that specifies whether ANY or ALL of the words in the input field must appear in a single document. The user may enter multiple words in an input field. Phrases or Saved Lists of Index Words may also be entered into the input area, and wildcard search characters are supported.

The user can narrow a query with the restriction fields. For example, an Author restriction limits results to works by a specific author.

Examples

In the examples below, it is assumed that the user has selected ANY or ALL from the drop-down list preceding the input area and then entered query terms into the input area. The parentheses are included for clarity only.

  • ANY:(heaven hell soul) - This query will retrieve records/documents that contain at least one of heaven, hell, or soul.
  • ALL:(heaven hell soul) - This query will retrieve records/documents that contain all three of heaven, hell, and soul.
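For readers who think in code, the ANY/ALL distinction can be modeled roughly as follows. This Python sketch is an illustration only, not the DQC's own implementation: it assumes a document can be reduced to the set of lower-cased words it contains, and the sample titles and texts are invented.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lower-case the text and return the set of distinct words it contains."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(mode: str, terms: list[str], text: str) -> bool:
    """True if the text satisfies an ANY or ALL query over `terms`."""
    words = tokenize(text)
    wanted = [t.lower() for t in terms]
    if mode == "ANY":
        return any(t in words for t in wanted)   # at least one term present
    if mode == "ALL":
        return all(t in words for t in wanted)   # every term present
    raise ValueError(f"unknown mode: {mode!r}")

# Invented sample documents, keyed by title.
documents = {
    "Work A": "The soul longs for heaven.",
    "Work B": "Heaven and hell are set before the soul.",
    "Work C": "A plain account of a journey.",
}

terms = ["heaven", "hell", "soul"]
any_hits = [title for title, text in documents.items() if matches("ANY", terms, text)]
all_hits = [title for title, text in documents.items() if matches("ALL", terms, text)]
print(any_hits)  # ['Work A', 'Work B']
print(all_hits)  # ['Work B']
```

Work A is retrieved by ANY because it contains heaven and soul, but not by ALL because hell is missing.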
The Organization of the Digital Quaker Collection

The DQC consists of books, journals, and Corpus volumes, a Corpus being an edited compilation of individual works. In total, the DQC encompasses over 400 "works".

Each book, journal, or work contained in a Corpus has been "loaded" into the software on which the system is built, in a way that lets a user identify any given work by entering a search query or by browsing the DQC by Title or by Author.

The browsing functions of the DQC list the Titles and Authors of works that meet user selection criteria, such as a word in a title or the name of an author. In browse mode, selecting a title in the result list displays the TOC (Table of Contents) for the work. From there the user can navigate to pages in the work and/or to images of those pages, or she/he can return to the result list of works or to other system functions.

Each "work" (also called a "document" within these pages and the interface) is XML-encoded data. Each work is composed of many nodes, and each node may in turn contain other nodes: paragraphs within chapters within volumes, lines within poems within letters, and so on.

The searching functions of the DQC produce a search result set: a list of the "works" that have been "loaded" into the underlying software and that satisfy the user's query. The query is executed against a whole work, NOT against specific nodes in it; i.e., the result set is a list of "works" that satisfied the query, not a list of nodes within a work.

When a user selects a "work" from the list of titles in the result set, the system displays the nodes within that "work" that contain one or more of the words in the query. Note that any given node will quite likely not contain all of the query terms you entered. If you entered the query heaven AND hell, the complete work in the result list contains both terms, but any given hit node may contain only one of them. The exception is a "phrase" entered as part of your query: in that case the hit nodes will contain the phrase.
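The difference between matching a whole work and listing its hit nodes can be sketched in Python with the standard xml.etree.ElementTree module. The tiny XML work, its element names, and the choice of <p> elements as the hit-node level are all assumptions made for illustration; the DQC's actual tagging and node granularity may differ.

```python
import re
import xml.etree.ElementTree as ET

# An invented miniature "work": paragraphs (<p>) nested inside chapters.
WORK = """<work>
  <chapter n="1">
    <p>Of heaven and the things above.</p>
    <p>A letter written in haste.</p>
  </chapter>
  <chapter n="2">
    <p>Concerning hell and its terrors.</p>
  </chapter>
</work>"""

def words(element):
    """Set of lower-cased words in an element, including all nested text."""
    return set(re.findall(r"[a-z]+", "".join(element.itertext()).lower()))

root = ET.fromstring(WORK)
terms = {"heaven", "hell"}          # the query: heaven AND hell

# 1. The query is evaluated against the whole work.
work_matches = terms <= words(root)

# 2. The hit nodes are the paragraphs containing at least one query term.
hit_nodes = [p for p in root.iter("p") if words(p) & terms]

print(work_matches)                 # True
for p in hit_nodes:
    print(sorted(words(p) & terms), p.text)
# ['heaven'] Of heaven and the things above.
# ['hell'] Concerning hell and its terrors.
```

The work as a whole satisfies heaven AND hell, yet each hit node contains only one of the two terms.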

From a hit node, the user may choose to display the page on which the hit node occurs or to "expand" the hit node to the next larger node containing it. For example, if the hit node is a paragraph, the expand function may display the chapter that contains the paragraph, depending on the underlying structure of the XML tagging.
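The "expand" step amounts to moving from a node to its parent in the XML tree. The Python sketch below assumes an invented sample work; because ElementTree elements do not store a reference to their parent, it builds a child-to-parent map first. It is an illustration, not how the DQC itself implements expansion.

```python
import xml.etree.ElementTree as ET

# An invented miniature work used only for this illustration.
WORK = """<work>
  <chapter n="1">
    <p>Of heaven and the things above.</p>
    <p>A letter written in haste.</p>
  </chapter>
</work>"""

root = ET.fromstring(WORK)

# ElementTree elements do not know their parents, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}

def expand(node):
    """Return the next larger node containing `node`, or None at the top of the work."""
    return parent_of.get(node)

hit = root.find(".//p")                    # first paragraph, standing in for a hit node
container = expand(hit)
print(container.tag, container.get("n"))  # chapter 1
print(expand(container).tag)              # work
print(expand(root))                       # None
```

Repeated expansion climbs from paragraph to chapter to the whole work, mirroring how far the expand function can go in a given document's tagging.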

What is a document?

A document (or record) in the Digital Quaker Collection (DQC) is a complete work, whether that is a book, a journal, or, in the case of a Corpus, one of the works that make up the Corpus. The documents are XML-encoded entities and as such are themselves composed of smaller pieces called nodes. Nodes are things like paragraphs, chapters, lines in a poem, etc.

In terms of the DQC software, a document is the entity that can be retrieved as the result of a search (or by browsing titles or authors). When you choose to view one of the documents in a search result set, the first screens that appear show the nodes of the document that contain the query terms you entered. The query terms should be highlighted. From there you can view the page in the document that contains the node, or expand the node to the node that encompasses it (i.e., you can expand a paragraph to the chapter that contains it).


Phrases

A phrase may be entered into an input field by surrounding the phrase with quotation marks. If a phrase is entered, it should be the only thing entered into the field. Example: "divine light"

Phrase searching identifies documents that not only contain the words entered but also contain those words right next to one another, in the order specified. Wildcard search characters are not allowed within a phrase.
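A phrase match requires the words to be adjacent and in order, whereas an ALL search only requires each word to appear somewhere in the document. The Python sketch below models that rule under the simplifying assumption that text is split into lower-cased alphabetic words; the DQC's own tokenization may differ.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into an ordered list of lower-cased words (quotes are dropped)."""
    return re.findall(r"[a-z]+", text.lower())

def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of `phrase` appear consecutively, in order, in `text`."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words across the text and compare it with the phrase.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

print(contains_phrase("Walk in the divine light.", '"divine light"'))  # True
print(contains_phrase("The light that is divine.", '"divine light"'))  # False (wrong order)
print(contains_phrase("Divine and inward light.", '"divine light"'))   # False (not adjacent)
```

All three sample sentences would satisfy ALL:(divine light), but only the first satisfies the phrase.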

Saved Lists of Index Words

When performing an Index Search (searching the list of searchable words), a user can either choose words from a word list for search purposes, or choose words from that list and save them as a "named" entity that lasts for the length of the session.

If a user has saved one or more of these "named" entities during the session, the system automatically modifies the search template forms so that a drop-down list of these names is associated with each input box. The user can select a "named" entity from this list, and the system will populate the associated input field with the words saved to that entity.

Restriction Fields

After obtaining a result set for the query specified, the system will further refine or limit that result set based on user-entered or user-selected restriction criteria. Possible result set restrictions include:
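The two-step process (query first, restriction second) can be sketched in Python using the Author restriction described for the simple search template. The Work record, its fields, the sample data, and the case-insensitive exact-match comparison are hypothetical choices for illustration, not the DQC's actual behavior.

```python
from dataclasses import dataclass

@dataclass
class Work:
    # Hypothetical record fields, for illustration only.
    title: str
    author: str

def restrict_by_author(result_set: list[Work], author: str) -> list[Work]:
    """Keep only the works in an existing result set whose author matches."""
    wanted = author.casefold()
    return [w for w in result_set if w.author.casefold() == wanted]

# Step 1: the query has already produced a result set (invented data).
result_set = [
    Work("Work A", "Author X"),
    Work("Work B", "Author Y"),
    Work("Work C", "Author X"),
]

# Step 2: the restriction can only narrow that result set, never add to it.
restricted = restrict_by_author(result_set, "author x")
print([w.title for w in restricted])  # ['Work A', 'Work C']
```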

Wildcard Search Characters

Query terms that include "wildcard" characters may be entered (except when entering a phrase or when using the proximity search template). An asterisk "*" tells the system to match any number of characters in that position, while a question mark "?" tells it to match exactly one character in that position.

For example, b*g would find documents containing bag, beg, big, bog, bug, bang, brag, being, and many others, while b?g would find documents containing only bag, beg, big, bog, and bug.
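Python's standard fnmatch module uses the same two wildcard characters, so it offers a convenient way to experiment with these patterns. It is a stand-in, not the DQC's matcher: fnmatch's * also matches zero characters (so b*g would match the string bg), and it treats square brackets as character sets, behaviors the DQC may not share. The word list is invented.

```python
from fnmatch import fnmatchcase

# An invented list of index words to test patterns against.
WORDS = ["bag", "beg", "big", "bog", "bug", "bang", "brag", "being", "bud"]

def wildcard_matches(pattern: str, words: list[str]) -> list[str]:
    """Words matching `pattern`: * is any run of characters, ? is exactly one."""
    return [w for w in words if fnmatchcase(w, pattern)]

print(wildcard_matches("b*g", WORDS))  # ['bag', 'beg', 'big', 'bog', 'bug', 'bang', 'brag', 'being']
print(wildcard_matches("b?g", WORDS))  # ['bag', 'beg', 'big', 'bog', 'bug']
```

Note that a word must still end in g to match b*g; bud is rejected by both patterns.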