



  This will of course only be possible if the researcher knows something about the appearance of

the specific volume sought.

Alternatively, some sort of rational order could be used. A rational order is a subject

classification. In the Middle Ages a system of nine classes or topics was developed. The topics

ranged in descending order from God to the Abyss1. Viewed from the perspective of this

rational order, an alphabetical classification might be seen as heretical since Angels will be

placed beside the Abyss, and God near the Godless! A modern version of a rationally ordered

text (without overt theological overtones) is found in Roget's Thesaurus. This groups together

words and phrases with the same or similar meanings, and arranges groups of words in

numbered paragraphs. The system is readily manageable, as long as one understands the

rational principles upon which the order is based. These are somewhat variable; what is rational

or natural to an author may seem bizarre or arbitrary to their readers. Also, the system relies

upon alphabetical ordering, without which it would be impossible to find a starting point easily when seeking a word.


  Some early examples

  A fascinating example of a system combining rational and alphabetical indexing is to be found

in Gilbert of Poitiers' Commentary on the Psalms (circa 1150). This includes signs in the

margins of the text which indicate links between the adjacent passage and discussions of the

same subject elsewhere in the book. The signs provide an alternative, non-linear, path through

the pages of the book. The work has been called the first hypertext system2.

In the twelfth century the writers of thematic sermons, through the way they organized their writings into books, contributed to the impetus towards the development of both Order and

Access. Other interesting combinations of rational and alphabetical indexing were made in the

thirteenth century, when several people were experimenting with procedures for facilitating

access to sections of books. Such efforts were influenced and informed by the new styles of

preaching, and by the developing universities3. In the early thirteenth century Peter of Capua

made a written collection of the 'distinctions' between the different meanings of various key


1Rouse and Rouse (1982: 219).

2Rouse and Rouse (1982: 204-5).

3Rouse and Rouse (1982).










  terms in the Bible. He sorted entries alphabetically by first letter, and then 'rationally' within all

words beginning with the same letter. So, among the A's, Altissimus came before Aer, and so

on until Abissus, the last of the 81 entries for the letter A1.

These developments, however, did not reach fruition until the advent of printing. Alongside the

development of alphabetical ordering, the pursuit of accessibility led to the alphabetical index and the concordance; with the production of concordances and dictionaries:

'we have come to the type of book that can only be searched, for it cannot be read' (Rouse and

Rouse)2.


  Different ways of searching: the concordance and the index

  Concordances and indices developed continuously from the middle of the twelfth century,

although the index only became widespread with the development of printing. A concordance,

now commonly known as a key word in context ('KWIC') index, is an alphabetical list of all

the words in a text, and the context within which each word occurs. The context given is

usually a sentence, but it may be only a few surrounding words, or it can be as much as a

paragraph or more. The presentation of a word within its context enables a more accurate

assessment of the particular sense in which that word was used. This clarifies the relevance of a

particular passage for the researcher, which in turn helps reduce the number of 'false calls'.

False calls are references to words with senses other than that with which the reader is

concerned. So, for example, there may be two index entries for 'apples', one in a theological

discussion, the other horticultural. A KWIC index helps prevent the researcher turning to the

wrong page.
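To make the mechanics concrete, what follows is a minimal sketch in Python of how a KWIC index can be generated automatically. It is an illustration only, not a reconstruction of any programme mentioned in this chapter, and the sample sentence is invented.

    import re
    from collections import defaultdict

    def kwic_index(text, context=4):
        # List every word alphabetically, together with a few words of
        # surrounding context on either side.
        words = re.findall(r"[A-Za-z']+", text)
        index = defaultdict(list)
        for i, word in enumerate(words):
            left = " ".join(words[max(0, i - context):i])
            right = " ".join(words[i + 1:i + 1 + context])
            index[word.lower()].append(f"{left} [{word}] {right}")
        return dict(sorted(index.items()))

    sample = "The serpent offered the apple, and the woman took the apple from the tree."
    for word, occurrences in kwic_index(sample).items():
        for line in occurrences:
            print(f"{word:>10}: {line}")

Each headword is followed by its occurrences in context, which is essentially what the programmes described later in this chapter do on a much larger scale.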

Concordances have, because of their lengthiness, been relatively rare, generally concerning

only a few key texts, such as the Bible and the works of Shakespeare. Now, computers can

quickly and easily generate a concordance for any text. This has been seen as providing an

automated solution to the amount of human labour involved in constructing indices and tables

of contents. However, KWIC indices are not without limitations. When a word appears

frequently, the list of occurrences can be daunting to the reader, who may miss the




1Rouse and Rouse (1982: 220).

2(1982: 222).










  occurrence of greatest relevance. There is no perfect solution to this problem. In a large

document, or collection of documents, some confusion is inevitable.

Tables of contents and indices are nothing more than navigation aids and cannot in themselves

ensure the discovery of the item sought, nor can they protect against exposure to irrelevancies.

One interesting solution would be a subject classification: a large set of occurrences of a single

word would itself be classified by subject (shades of a medieval mixture of rational and

alphabetical indices). This requires human intervention, thus limiting the use of computers in a task that seemed, on the face of it, ideally suited to them; efficiency therefore decreases as

the size of the document or collection of documents increases. As we have already noted,

computer-generated links do not work very efficiently. It seems that there is at present no

alternative to the conventional method of recording the contents of a document or a collection of

documents. We must continue to rely upon the person who produces the guide to the collection

we are examining, or else we would have to read through the entire collection.

Despite the above, there are advantages to be gained from electronic versions of texts or

archives. We have already seen, in the case of the OED, that the electronic version of a text is

searchable in ways not practicable in the printed version. Electronic collections, despite the

reservations expressed above, have a further benefit: in electronic form it is easy to distribute

copies of an original text together with annotations (for instance, the critical edition or reader's

guide which I have just argued are indispensable). So the evidence upon which a judgement

has been made can be distributed far more easily (more economically) in an electronic medium

than on paper. In this sense arguments are made more accountable, and the power of the editor to suppress variants by simply omitting them is reduced. The means of criticism can be made

readily available to those who are motivated to be critical.


  Computers as indexing tools

  Although Hypertext may not enable the revolution claimed for it (see below), computers have

nonetheless revolutionized our access to texts, facilitating both easy navigation within a text

(see immediately below) and the discovery of published materials, by way of the citation index

(which is discussed in the next chapter).

As for navigation within a text, the concordance or KWIC index has already been referred to, and it is speedily generated. There are free (or shareware) programmes now available which can rapidly






  generate a KWIC index from a plain file, and versions exist for every flavour of computer

(IBM PC compatible, Macintosh, UNIX and some mainframes, inter alia). Some of these

programmes also include a hypertextual navigational system, promoting easy access to the text. To

conclude this chapter, it is interesting to consider how computers can facilitate access to a text.

The following illustration is a view of a computer screen, showing how a key word in context

index is used. The programme shows three different windows. Firstly, there is a section of the

text under examination (at the bottom of the screen). Secondly, there is a list of all the words

which appear in the text, and the number of occurrences of each. (On this screen, at the top, two

extracts from the word-list are shown). The third window, which is part of the KWIC index,

appears in the centre of the screen. Both the word-list and the KWIC index were generated

automatically by the computer programme from the text.

In this example, I have selected the word 'sacrificing' from the word-list at the top, which tells

me that the word appears twice in the text. Below this the programme shows me an extract

from the KWIC index, revealing both occurrences of the selected word, with supporting

context. Also, I am shown the first appearance of the word 'sacrificing' (highlighted) within

the text proper. I can now, using the KWIC index, make a preliminary decision as to which

occurrences within the text of the word sacrificing (if any) are relevant to my current interest,

and call up any (or none) of those occurrences in the text itself.















[Figure: A KWIC-generating application, Mark Zimmermann's Free Text, in action. The three panels show an alphabetic list of words in the text, the KWIC index with the two instances of SACRIFICING in context, and the occurrence of SACRIFICING in the main body of the text.]


Other programmes, such as TACT, as well as constructing KWIC indices, allow the text to be analysed and searched in a variety of different ways, for example by looking for collocations - words occurring near one another. This may be helpful in finding or eliminating metaphorical uses of a word (consider searching for 'apple' occurring near the word 'woman' as opposed to 'apple' near the word 'pruning').
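By way of illustration, a collocation search of the kind just described might be sketched as follows in Python; the function and its window size are invented for the example, and TACT itself may work quite differently.

    import re

    def collocations(text, word_a, word_b, window=5):
        # Report the neighbourhood of each occurrence of word_a in which
        # word_b also appears within `window` words on either side.
        words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
        hits = []
        for i, w in enumerate(words):
            if w == word_a:
                nearby = words[max(0, i - window):i + window + 1]
                if word_b in nearby:
                    hits.append(" ".join(nearby))
        return hits

    text = "She was pruning the apple tree while the woman ate an apple."
    print(collocations(text, "apple", "pruning"))   # horticultural neighbourhood
    print(collocations(text, "apple", "woman"))     # possibly metaphorical neighbourhood
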
  The use of indices

In his review of David Schneider's book 'A Critique of the Study of Kinship', Anthony Good notes the inclusion of several blank pages and concludes by commenting: 'If the publishers had used a little of this redundant space to include an index, the scholarly value of the book would have been much enhanced'1. In the ensuing correspondence Schneider expressed his gratitude to his publisher for 'yielding to my conviction that indexes are a cheap shortcut for the lazy scholar'2. Good replied with the comment that 'indexes may or may not be "a shortcut for the lazy scholar" (though what of the poor undergraduate, or the reader who wishes later on to





1Good (1985:584).

2(1986:541).










  locate a particular passage); more often, their absence is a shortcut for the lazy author.'

(1986:542).

A more neutral summary of the utility of indices may be found in a standard textbook for

indexers:

  The purposes of an index, as stated by McColvin, are:-

1) to facilitate reference to the specific item;

2) to compensate as far as possible for the fact that a book can be written in one

sequence, according to one plan;

3) to disclose relationships; and

4) to disclose omissions1.


  To these purposes Borko and Bernier2 have added:-

  5) to answer questions of discovery and foster serendipity;

6) to provide a comprehensive overview of a subject field; and

7) to give nomenclature guidance.


  Keyword searching

  Keyword searching is another means by which computers promote access to texts. It is

available in every word processor, and the usual command is 'Find "X"' where 'X' is a word

or phrase of your choosing. Generally, the programme offers a choice of whether or not to search only for the whole word. (For example, a 'whole word' search for 'sacrifice' will call

up only that word but ignore 'sacrifices'. A search which is not restricted to that whole word

would also find 'sacrificed' and 'sacrifices'). There will usually be a further choice as to

whether the search is to be case-sensitive or not. (A case-sensitive search would find 'sacrifice'

but not 'Sacrifice'). There is also the possibility of using 'wild cards' which can stand for any

letter, like the blank tile in Scrabble, or any series of letters. This is useful when one wants to

find the stem of a word without regard to grammatical inflections. In the screen view illustrated

above, a search for 'sacrific*' (where * is the wild card) would have revealed each instance of


1McColvin (1953: 33-4).

2(1978:4).






sacrifice, sacrificed and sacrificing. In this simple case, of course, the same task could be

achieved by making a search for sacrific (provided that it was not a whole word search).

The unique (if relatively restricted) value of the wild card is that it can be used to find words with

the same beginning and ending, regardless of the sequence in the middle. For example, it will

search for rhymes or for complete words that begin with sacrific. A wild card is needed to include

all the possible endings to the string of letters sacrific while excluding other words which contain

that string. In other words, by searching for whole words which satisfy the description sacrific* we

can exclude other words such as desacrifice (if such an ugly term exists). Further possibilities are

opened up where different searches may be combined (e.g. find sacr* OR ritual killing).
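The whole-word, case-sensitive and wild-card options described above can all be expressed as regular expressions, as the following sketch shows. Word processors vary in how they implement these facilities, and the sentence searched here is invented.

    import re

    text = "Sacrifice was rare; they sacrificed goats, sacrificing little else."

    # Whole-word search, ignoring case: finds 'Sacrifice' but not 'sacrificed'.
    print(re.findall(r"\bsacrifice\b", text, flags=re.IGNORECASE))

    # The same search made case-sensitive finds nothing here, since only the
    # capitalised form 'Sacrifice' occurs as a whole word.
    print(re.findall(r"\bsacrifice\b", text))

    # The wild-card search sacrific* becomes 'sacrific' plus any word ending;
    # the leading word boundary excludes words such as 'desacrifice'.
    print(re.findall(r"\bsacrific\w*", text, flags=re.IGNORECASE))

    # Combining searches: sacr* OR 'ritual killing'.
    print(re.findall(r"\bsacr\w*|ritual killing", text, flags=re.IGNORECASE))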

So even a simple word processor provides several ways of searching a text which would be

much harder to achieve by other means. With the increasing availability of such tools, it may

well be that the way in which a reader approaches a text is changing.

A further development of the use of wild cards is the so-called soundex searching available in

some powerful programmes. Here the programme will automatically look for words which

sound like the word specified by the reader, so that a search for the word 'sought' will also find 'sort'. This is particularly useful when dealing with a historical archive in which spellings are not standardized, or with transatlantic variation, where a search for standardized will also reveal

standardised.
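The programmes referred to above may differ in detail, but a minimal sketch of one common, simplified form of the soundex algorithm shows the idea: each word is reduced to a short code, and spellings that sound alike reduce to the same code.

    def soundex(word):
        # Keep the initial letter, code the remaining consonants as digits,
        # collapse adjacent identical codes, and pad to four characters.
        groups = {"bfpv": "1", "cgjkqsxz": "2", "dt": "3",
                  "l": "4", "mn": "5", "r": "6"}
        code = {c: d for letters, d in groups.items() for c in letters}
        word = word.lower()
        digits = [code.get(c, "0") for c in word]         # vowels, h, w, y -> "0"
        deduped = [d for i, d in enumerate(digits) if i == 0 or d != digits[i - 1]]
        tail = [d for d in deduped[1:] if d != "0"]       # drop first letter's code and vowel markers
        return (word[0].upper() + "".join(tail) + "000")[:4]

    # Spelling variants reduce to the same code, so a search for one form
    # will also find the other.
    print(soundex("standardized"), soundex("standardised"))   # S353 S353
    print(soundex("Smith"), soundex("Smyth"))                 # S530 S530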

All the searching methods described above duplicate the function of an index, with the

important difference that in an electronic text, each reader can choose their own indexing terms.

Advantages over conventional indices are more clearly seen when different search terms can be

combined with the logical operators and, or and not. This allows a search to be defined much

more precisely. To continue with the illustration above, one may search for occasions where

one particular person, such as Geldua, is shown to have performed the sacrifice. Or one could

search for sacrific* and (Geldua or Selbon) to find instances in the text when either of those

people is mentioned in conjunction with sacrifice.
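A sketch of how such a combined query might be evaluated, sentence by sentence, is given below. Geldua and Selbon are simply the names used in the example above, and the three sentences are invented for the illustration.

    import re

    def matches(passage, pattern):
        return re.search(pattern, passage, flags=re.IGNORECASE) is not None

    def search(text, query):
        # Split the text into sentences and keep those for which the query
        # (any function returning True or False) succeeds.
        sentences = re.split(r"(?<=[.!?])\s+", text)
        return [s for s in sentences if query(s)]

    text = ("Geldua performed the sacrifice at dawn. "
            "Selbon watched the ritual killing. "
            "Geldua later denied any involvement.")

    # sacrific* and (Geldua or Selbon)
    hits = search(text, lambda s: matches(s, r"\bsacrific\w*")
                                  and (matches(s, r"Geldua") or matches(s, r"Selbon")))
    print(hits)   # only the first sentence satisfies the combined query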

Finally, a note of caution about this sort of free searching throughout a text. As stated above, it

puts the researcher in the position of indexer. However, the compiler of any index may use












  words which describe, but do not actually occur in, the text, so that a cross-reference from that

word alone will not reach relevant parts of the text. Also, several different terms may be united

under a single indexing head. At different times this may either help or hinder the reader. The

indexer at least has the advantage of reading the work before making the index. Using text

searching to examine a large text without reading it may seem like a strange thing to do but, in

fact, it is less strange than it may seem, as shown in the example below and in the discussion

in the next chapter.

Originally it was hoped that such search facilities would eliminate the need for active human

intervention in entering indexing terms, so avoiding also the attendant problems of indexers'

inconsistency and arbitrariness. However, as Blair and Maron1 demonstrate, real material

contains a rich variety of expressions that cannot be predicted infallibly. They examined the use

of a free text searching system on a medium-sized database, and considered the success rate of computer searches repeated a number of times by people well acquainted with the material in the database. Even in these favourable circumstances, they found that the searches located, on average, only one in five of the relevant documents. That is to say, the computer searches achieved only a 20% retrieval rate. This illustrates some inherent problems with the use of free text searching

(alone) for relevant material.

Blair and Maron discussed a complex American legal action, where numerous documents were

entered into a computer system so that the defence lawyers could keep track of them. The

defence staff were asked to conduct free text searches which were representative of the sort of

tasks actually undertaken in the course of working on the case. These searches pointed to

documents, and the staff were then asked to assess the relevance of those documents to the

question posed. They then did a variety of other searches, including random sampling of parts

of the database to generate other document collections that were also given to the staff. This

allowed them to calculate measures for recall and precision2. Recall, the percentage of the

relevant documents retrieved, was 20%. In other words, the random sampling turned up many relevant


1(1985).

2Recall was defined as the number of relevant and retrieved documents out of the total number of relevant

documents.  Precision was defined as the number of relevant and retrieved documents out of the total number of

documents retrieved. (Blair and Maron 1985: 290).





documents which the search based on that request had missed. Precision, measuring how

many of those retrieved were relevant, was much better, at 79%.
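The two measures follow directly from the definitions given in the footnote; the short sketch below, with invented figures chosen only for illustration, shows the calculation.

    # Suppose 100 documents are actually relevant and a search retrieves 25.
    relevant  = set(range(100))
    retrieved = set(range(80, 105))

    found = relevant & retrieved                # relevant and retrieved
    recall    = len(found) / len(relevant)      # share of relevant documents found
    precision = len(found) / len(retrieved)     # share of retrieved documents that are relevant

    print(f"recall = {recall:.0%}, precision = {precision:.0%}")   # recall = 20%, precision = 80%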

Blair and Maron's study clearly demonstrates the dangers of keyword searching. A machine can tell you that it has found 5, or even 176, instances of a word or an expression within documentation. However, this gives an impression of comprehensiveness which may be

misleading. No machine can tell you about the relevant information which has been

overlooked. Nor can it direct you to all other relevant books, nor advise which of these can be

found in a local library. That is why academic conferences are valued not only for the formal

presentations, but also for the informal and invaluable conversations over coffee and beer in

between. Advice and references from colleagues constitute a resource which it is unwise to

overlook, even if you use the most sophisticated computer searching.

Any researcher will probably have some sort of reading list, even if only from a local library

catalogue. If the author of one of those works is derided by a respected colleague, then the

researcher may delete that work from the list. Conversely, if the respected colleague

recommends an alternative author, their name may replace that of the discredited source. This

leads to the next chapter: choosing what to read.





