Sunday, January 19, 2014

WordSmith Tools for Text Analysis

WordSmith Tools is ‘an integrated suite of programs for looking at how words behave in texts.’ It ‘controls’ the three programs it contains: Concord, which makes concordances from plain text or web text files; KeyWords, which locates and identifies key words in a given text or corpus; and WordList, which generates word lists displayed in both alphabetical and frequency order.
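To make the WordList and Concord ideas concrete, here is a minimal Python sketch of what a frequency-ordered and alphabetical word list and a simple key-word-in-context (KWIC) concordance look like. It is only an illustration of the general technique, not how WordSmith itself works: the tokenizer and the function names `tokenize`, `word_list` and `concordance` are my own assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplistic tokenizer (an assumption, not WordSmith's rules):
    # lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def word_list(tokens):
    """Return (frequency-ordered, alphabetical) lists of (word, count) pairs."""
    counts = Counter(tokens)
    # Frequency order: most frequent first, ties broken alphabetically.
    by_frequency = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    alphabetical = sorted(counts.items())
    return by_frequency, alphabetical

def concordance(tokens, node, span=3):
    """Key-word-in-context lines as (left context, node word, right context)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines

if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat. The dog sat too.")
    by_freq, alpha = word_list(tokens)
    print(by_freq[:3])  # [('the', 3), ('sat', 2), ('cat', 1)]
    # Print each concordance line with the node word centred in brackets.
    for left, node, right in concordance(tokens, "sat", span=2):
        print(f"{left:>20} [{node}] {right}")
```

The concordance returns tuples rather than formatted strings so that the left and right contexts can be sorted or filtered separately, which is roughly what sorting a concordance by left or right context amounts to.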
Johnson & Ensslin (2006) discussed methodological concerns about keyword analysis and about the reliability of the BNC as a reference corpus against which research corpora can be analyzed neutrally. They identified two problems. First, the BNC, constructed by Scott and composed of 90.7 million words taken from the late 1980s and early 1990s, fails to cover themes outside that time frame, which gives rise to the “problem of age disparity”. Second, there is the problem of “proper names” in newspaper and media discourse: proper names may surface as “key” keywords in any newspaper corpus. Scott (2000), however, ruled out proper names of any kind on the grounds that they change over time, and Sinclair (2004) argued that articles containing proper names should be excluded because they put the homogeneity of the research corpus at risk. But what about articles containing household names that are closely tied to the area one is investigating? Johnson & Ensslin (2006) suggested two ways out, each with serious drawbacks: either build one’s own comparator corpus from scratch to generate a more reliable list of the most frequent words, which is time-consuming, or carry out extensive editing of the keyword lists, which will eventually call the reliability and objectivity of the study into question.
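At its core, keyword analysis compares a word’s frequency in the research corpus with its frequency in a reference corpus such as the BNC. The Python sketch below uses the log-likelihood statistic, a common keyness measure (and, as far as I know, one of the options KeyWords offers), but it is a generic implementation and not Scott’s code. The frequency counts in the demo are invented for illustration, the 6.63 cut-off (p < 0.01 with one degree of freedom) is an assumed default, and the hypothetical `exclude` parameter stands in for the kind of manual keyword-list editing, such as dropping proper names, discussed in the paragraph above.

```python
import math

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score.

    a, b: frequency of the word in the study and reference corpus
    c, d: total number of tokens in the study and reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # By convention 0 * ln(0) is treated as 0, so zero counts are skipped.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, threshold=6.63, exclude=frozenset()):
    """Positive keywords: words relatively more frequent in the study corpus.

    study, reference: dicts mapping word -> frequency
    exclude: hypothetical stop list (e.g. proper names) removed before scoring
    """
    c = sum(study.values())
    d = sum(reference.values())
    results = []
    for word, a in study.items():
        if word in exclude:
            continue
        b = reference.get(word, 0)
        # Skip words that are not relatively more frequent in the study corpus.
        if a / c <= b / d:
            continue
        ll = log_likelihood(a, b, c, d)
        if ll >= threshold:
            results.append((word, round(ll, 2)))
    # Strongest keywords first.
    results.sort(key=lambda kv: -kv[1])
    return results

if __name__ == "__main__":
    # Toy frequency lists (invented numbers): 100 study tokens, 1000 reference tokens.
    study = {"the": 10, "election": 10, "blair": 80}
    reference = {"the": 100, "election": 10, "other": 890}
    print(keywords(study, reference))
    print(keywords(study, reference, exclude={"blair"}))
```

With the toy counts, the proper name “blair” dominates the keyword list simply because it never occurs in the reference list, while “the”, equally frequent in both, is not key at all. Passing it through `exclude` removes it, but deciding what goes into that list is exactly the kind of subjective editing that Johnson & Ensslin (2006) warn may undermine the study’s objectivity.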
Other analysts, such as Baker (2004), have sidestepped these drawbacks by promoting a carefully triangulated methodology that combines quantitative and qualitative analysis, pairing statistical findings with what Baker (2004) called “inclusive and subjective” interpretations, so as to avoid both a purely lexical approach and subjectively collected data.

Johnson, S. & Ensslin, A. (2006). Language in the News: Some Reflections on Keyword Analysis Using Wordsmith Tools and the BNC. Leeds Working Papers in Linguistics and Phonetics, 11.

