KeyWordAnalysis
By: Mike Scott & Christopher Tribble
Pg : 55
- method of identifying KeyWords (KW) in text
- identify items of unusual frequency in comparison with a reference corpus
- 2 kinds of outputs in a key words list:
- aboutness indicators
- stylistic indicators
- demonstrate how a plot of keywords can illustrate themes and progressions of text.
Keyness is a quality words may have in a given text or set of texts. Key words are important: they reflect what the text is really about, avoiding trivia and insignificant detail.
To establish term keyness there are 2 earlier methods, plus the authors' own process:
- Kintsch & van Dijk's (1978)
- propositional analysis of text identifies text structure, establishing a hierarchy which we might label a hierarchy of keyness
- Hoey (2001)
- analyze sentences
- Authors' process
Consider the sample text:
EastEnders star Steve McFadden was 'stable' in St. Thomas's Hospital, London,
last night after being stabbed in the back, arm and hand under Waterloo Bridge,
Central London, on Friday.
Propositional analysis will give us:
1. S. McF is a star
2. S. McF is in EastEnders
3. S. McF was stable
4. someone said that
5. S. McF is in hospital
6. The hospital is called St. Thomas's
7. The hospital is in London
8. was so last night
Then work out which propositions are referred to most across the entire set (macropropositions). The propositions then naturally fall into a hierarchy of prominence, reflecting their keyness. The most prominent propositions, more than the others, are what the text is really 'about'.
Hoey's method takes sentences rather than propositions. It seeks out the elements which are most linked. A link is a repetition of some kind, such as:
- Grammatical variants ( want - wanted )
- Synonyms ( desire , etc)
- Hyponyms
- Meronyms
- Antonyms
- Equivalents (like McFadden referred to as he, McFadden, star, actor)
Any of these, for the word being considered, counts as a repetition and so makes a link.
Links alone do not suffice. What makes a sentence truly prominent is being bonded with other sentences in the text by a pattern of at least 3 links.
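The bonding idea can be sketched in a few lines of Python. This is only a rough illustration, not Hoey's procedure: it assumes that a link is an exact repeated word form, so synonyms, hyponyms, meronyms, antonyms and equivalents are not detected (they would need a lexical resource). The `STOPWORDS` set and the names `content_words` and `bonded_pairs` are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical stopword list: function words are not counted as links.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "was", "is", "to"}

def content_words(sentence):
    """Return the set of lowercased non-stopword word forms in a sentence."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS}

def bonded_pairs(sentences, min_links=3):
    """Map each bonded sentence pair (i, j) to the words linking them.

    Assumption: a link is a verbatim repetition of a word form, and a
    bond needs at least `min_links` such links between two sentences.
    """
    words = [content_words(s) for s in sentences]
    bonds = {}
    for i, j in combinations(range(len(sentences)), 2):
        links = words[i] & words[j]
        if len(links) >= min_links:
            bonds[(i, j)] = links
    return bonds

sentences = [
    "McFadden was stabbed under Waterloo Bridge",
    "Police said McFadden was stabbed near the bridge",
    "The weather in London was mild",
]
print(bonded_pairs(sentences))
```

Only the first two sentences share three content words, so they form the single bond in this toy example.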
Both these methods rely on conceptual repetition to come up with keyness.
The authors' process starts from verbatim repetition of words (so wanted, want, wants count as the same)
Use a reference corpus (preferably large) to generate an expected frequency of those words
Then find the words which occur outstandingly more often than their expected values based on the reference corpus. These are the keywords (KW).
To decide whether a word is outstanding, run a chi-square test (observed vs expected frequency).
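A minimal sketch of the frequency comparison follows. It rests on several assumptions that are not in the notes: word forms are simply lowercased and counted as they appear (no grouping of variants), the statistic is a plain Pearson chi-square on a 2x2 table (this word vs all other words, text vs reference corpus), the cut-off of 3.84 is the usual critical value for 1 degree of freedom at p = 0.05, and only words that are relatively more frequent in the text than in the reference are kept. The names `tokenize`, `chi_square` and `keywords`, and the `min_freq` parameter, are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word forms (assumption: no lemmatization)."""
    return re.findall(r"[a-z']+", text.lower())

def chi_square(obs_text, text_size, obs_ref, ref_size):
    """Pearson chi-square for a 2x2 table: word vs other words, text vs reference."""
    table = [[obs_text, text_size - obs_text],
             [obs_ref, ref_size - obs_ref]]
    total = text_size + ref_size
    row_totals = [text_size, ref_size]
    col_totals = [obs_text + obs_ref, total - obs_text - obs_ref]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def keywords(text, reference, threshold=3.84, min_freq=2):
    """Return (word, score) pairs for unusually frequent words, highest score first."""
    text_counts = Counter(tokenize(text))
    ref_counts = Counter(tokenize(reference))
    n_text = sum(text_counts.values())
    n_ref = sum(ref_counts.values())
    result = []
    for word, freq in text_counts.items():
        if freq < min_freq:
            continue
        ref_freq = ref_counts[word]
        # Keep only words relatively MORE frequent in the text than in the reference.
        if freq / n_text <= ref_freq / n_ref:
            continue
        score = chi_square(freq, n_text, ref_freq, n_ref)
        if score >= threshold:
            result.append((word, score))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

text = ("the star was stabbed and the star is in hospital "
        "the hospital said the star is stable")
reference = "the cat sat on the mat and the dog is in the house " * 50
print([word for word, _ in keywords(text, reference)])
```

In this toy run, frequent function words such as "the" are filtered out because they are no more common in the text than in the reference, which is the point of comparing against expected frequencies.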
KeyWords
KeyWords Links
Consider linkage between KWs as collocational neighbours. This gives KeyWords which share not only textual keyness but local proximity as well; the collocation neighbourhood is, say, a range of x words to the left or right of the keyword.
Note that KW linkages are not like ordinary collocational linkages, since they require both node and collocate to be key. An ordinary collocate search just returns pairs like strong tea, where the collocate merely classifies the tea. Looking for keywords within a range of, say, 5 words left or right of a key node instead establishes a relation between 2 different key concepts.
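The span-based linkage described above can be sketched as follows. The function name `keyword_links` and the `span` parameter are hypothetical, the list of keywords is assumed to have been computed beforehand, and each pair is counted once per co-occurrence within the span.

```python
from collections import Counter

def keyword_links(tokens, keywords, span=5):
    """Count pairs of distinct KWs that occur within `span` tokens of each other.

    Both node and collocate must be key; non-key neighbours are ignored.
    """
    positions = [(i, t) for i, t in enumerate(tokens) if t in keywords]
    links = Counter()
    for a in range(len(positions)):
        i, node = positions[a]
        for b in range(a + 1, len(positions)):
            j, collocate = positions[b]
            if j - i > span:
                break  # positions are in order, so later ones are farther away
            if node != collocate:
                links[tuple(sorted((node, collocate)))] += 1
    return links

tokens = ("the star was stabbed and taken to hospital "
          "where the star was stable").split()
links = keyword_links(tokens, {"star", "stabbed", "hospital"})
print(dict(links))
```

Looking only to the right of each node is enough here, because every left-hand neighbour is reached when that neighbour is itself treated as the node.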
Pg : 73
- explain notion of association i.e. the contextual relationship between words that are key in the same texts.
We have established keyness and co-keyness.
- Strings occur when a series of KWs are connected one to the next within a text
- Star linkage occurs when a central KW is shared in linkage by several texts
- Clique has a set of inter-connected KWs
- Clumps are also inter-connected but have links to other parts of the text as well (represented with dashed lines)
Keywords across texts can be plotted as a graph.
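One way to prepare such a graph is to link two KWs whenever they are key in the same texts. The sketch below is an assumption about how this could be done, not the authors' procedure: `co_keyness_graph` and `min_texts` are hypothetical names, the input is assumed to be a mapping from text identifier to its set of KWs, and the result is a plain adjacency mapping that a plotting library could draw.

```python
from collections import defaultdict
from itertools import combinations

def co_keyness_graph(keywords_by_text, min_texts=2):
    """Link two KWs if they are both key in at least `min_texts` texts."""
    pair_counts = defaultdict(int)
    for kws in keywords_by_text.values():
        for a, b in combinations(sorted(kws), 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_texts:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

keywords_by_text = {
    "t1": {"star", "hospital", "stabbed"},
    "t2": {"star", "hospital", "police"},
    "t3": {"star", "police", "weather"},
}
graph = co_keyness_graph(keywords_by_text)
print(graph)
```

In this toy data "star" ends up linked to both "hospital" and "police", so it sits at the centre of a small star-shaped pattern.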
- "Strategies of Discourse Comprehension" Kintsch & van Dijk, 1978
- "Textual Interaction: An Introduction to Written Discourse Analysis" Hoey 2001