Difference between revisions of "Ngram frequency"
WikiLingua (minor edit)
[[Category: EN]]
[[Category: DICT]]
[[Category:Quantitative Linguistics]]
Revision as of 13:06, 28 November 2007
The mean, or summed, frequency of all fragments of a given length within a word. The most commonly used variant is bigram frequency, which uses fragments of length 2. The word 'dog' contains two bigrams: 'do' and 'og'. Bigram frequency is considered a measure of orthographic regularity and normally correlates negatively with response times in psycholinguistic experiments.
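As a rough illustration, mean and summed bigram frequency could be computed along the following lines. This is only a sketch: the function names and the toy corpus are hypothetical, and it assumes that a bigram's frequency is its raw count over a word list in which every word counts once. Real studies often weight the counts by word frequency instead.

```python
from collections import Counter


def bigrams(word):
    """Return the overlapping two-character fragments of a word."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_counts(corpus):
    """Count bigram occurrences over a list of words (each word counted once)."""
    counts = Counter()
    for w in corpus:
        counts.update(bigrams(w))
    return counts


def bigram_frequency(word, counts, summed=False):
    """Mean (default) or summed corpus frequency of the bigrams in a word."""
    freqs = [counts[b] for b in bigrams(word)]
    if not freqs:
        # Words shorter than two characters contain no bigrams.
        return 0.0
    total = sum(freqs)
    return float(total) if summed else total / len(freqs)


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["dog", "dot", "fog", "log", "do"]
counts = bigram_counts(toy_corpus)

print(bigrams("dog"))                                # ['do', 'og']
print(bigram_frequency("dog", counts))               # 3.0
print(bigram_frequency("dog", counts, summed=True))  # 6.0
```

In this toy corpus both 'do' and 'og' occur three times, so the mean is 3.0 and the sum is 6.0. The summed variant grows with word length, whereas the mean does not.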
It is not unusual to pad the word with a 'space' character on each side, which gives the first and last characters of the word a special status. The word 'dog' then becomes '_dog_' and contains four bigrams: '_d', 'do', 'og' and 'g_'.
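The padding step can be sketched as follows. The helper name `padded_bigrams` and the use of an underscore as the boundary marker are assumptions made for illustration; any character that does not occur inside words would serve.

```python
def padded_bigrams(word, pad="_"):
    """Return the bigrams of a word after adding one boundary marker per side."""
    padded = pad + word + pad
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


print(padded_bigrams("dog"))  # ['_d', 'do', 'og', 'g_']
```

A padded word of k characters always yields k + 1 bigrams, so even a single-letter word contributes two fragments.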
Ngram frequency with N = 1 is equal to character frequency, and with N = 3 it is commonly referred to as trigram frequency. Larger values of N are rare.
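The same fragment extraction generalises to any N. The function below is a minimal sketch under that reading; the name `ngrams` is chosen here for illustration and does not refer to any particular library.

```python
def ngrams(word, n):
    """Return all overlapping fragments of length n in a word."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A word shorter than n yields an empty list.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(ngrams("dog", 1))  # ['d', 'o', 'g']  (single characters)
print(ngrams("dog", 2))  # ['do', 'og']     (bigrams)
print(ngrams("dog", 3))  # ['dog']          (trigrams)
```

A word of k characters contains k - N + 1 fragments of length N, so short words have few or no fragments once N grows.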
In the auditory domain, the equivalent of the bigram is the diphone, a sequence of two phonemes. Mean diphone frequency could be considered a crude measure of phonological regularity.