Difference between revisions of "Tagger"

From Glottopedia
Line 1: Line 1:
 
==Definition==
 
A tagger labels each word in a [[Korpus|corpus]] with a tag, which denotes a part of speech or another [[lexikalische Kategorie|lexical category]]. The tags are drawn from a tagset.
+
A tagger is a device which assigns symbolic labels (''tags'') to linguistic units. The labels are taken from a predefined set of symbols (''tag-set'').
  
A distinction is made between rule-based and stochastic taggers. Stochastic (or statistical) taggers work with transition probabilities of word sequences, whereas rule-based taggers (or Brill taggers) set up rules about word sequences, which are incrementally refined with further rules. Both tagging algorithms learn the required information from a training corpus.
+
==Comments==
 +
In most cases, a tagger assigns tags representing morpho-syntactic information to single word-forms or tokens. However, some taggers have been designed to identify the semantic roles of noun phrases or prepositional phrases (''sense tagging''), and identifying the discourse structure of a text is sometimes also considered a kind of tagging.
 +
 
 +
Conceptually, tagging can be considered a three-step process: (i) identification of the relevant units, (ii) assignment of all possible labels to the units (e.g. by lexical look-up or by applying heuristics), and (iii) disambiguation.
 +
 
 +
It is common practice to distinguish between rule-based and stochastic taggers, though in some cases it is not easy to decide
 +
 
Depending on the text type, taggers achieve an accuracy of 90-97%.
  
==Origin==
+
==Subtypes==
From English ''tag'': to mark, to label with a hang tag
+
* [[HMM tagger]]
 +
* [[Brill tagger]]
 +
* [[Memory-based tagger]]
 +
* [[Tree tagger]]
  
 
==Other Languages==
 
  
 
* German [[Tagger (de)]]
 

Revision as of 17:42, 6 July 2007

Definition

A tagger is a device which assigns symbolic labels (tags) to linguistic units. The labels are taken from a predefined set of symbols (tag-set).

Comments

In most cases, a tagger assigns tags representing morpho-syntactic information to single word-forms or tokens. However, some taggers have been designed to identify the semantic roles of noun phrases or prepositional phrases (sense tagging), and identifying the discourse structure of a text is sometimes also considered a kind of tagging.

Conceptually, tagging can be considered a three-step process: (i) identification of the relevant units, (ii) assignment of all possible labels to the units (e.g. by lexical look-up or by applying heuristics), and (iii) disambiguation.
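
The three steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lexicon, the tag names, the fallback heuristics and the two contextual rules are hypothetical and do not describe any particular tagger.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: lower-cased word form -> set of possible tags.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "rusts": {"NOUN", "VERB"},
}


def identify_units(text: str) -> List[str]:
    """Step (i): split the text into word tokens and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def candidate_tags(token: str) -> Set[str]:
    """Step (ii): lexical look-up, with simple heuristics for unknown tokens."""
    entry = LEXICON.get(token.lower())
    if entry is not None:
        return set(entry)
    if not token[0].isalnum():
        return {"PUNCT"}   # heuristic: non-alphanumeric token
    if token[0].isupper():
        return {"PROPN"}   # heuristic: capitalised unknown word
    return {"NOUN"}        # heuristic: default for unknown words


def disambiguate(tokens: List[str],
                 candidates: List[Set[str]]) -> List[Tuple[str, str]]:
    """Step (iii): choose one tag per token using two hand-written rules."""
    result: List[Tuple[str, str]] = []
    previous = None
    for token, tags in zip(tokens, candidates):
        if len(tags) == 1:
            chosen = next(iter(tags))
        elif previous == "DET" and "NOUN" in tags:
            chosen = "NOUN"            # a determiner is followed by a noun
        elif previous == "NOUN" and "VERB" in tags:
            chosen = "VERB"            # a noun is followed by a verb
        else:
            chosen = sorted(tags)[0]   # arbitrary but deterministic fallback
        result.append((token, chosen))
        previous = chosen
    return result


def tag(text: str) -> List[Tuple[str, str]]:
    tokens = identify_units(text)
    candidates = [candidate_tags(t) for t in tokens]
    return disambiguate(tokens, candidates)


print(tag("The can rusts."))
```

In this sketch the disambiguation step is rule-based. A stochastic tagger would keep steps (i) and (ii) and replace the hand-written rules with probabilities estimated from a training corpus.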

It is common practice to distinguish between rule-based and stochastic taggers, though in some cases it is not easy to decide

Depending on the text type, taggers achieve an accuracy of 90-97%.
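
A figure of this kind is usually read as token-level accuracy, that is, the share of tokens whose assigned tag matches a manually annotated reference. That reading is an assumption here, since the text does not say how the figure was computed. The function name and the example tags below are hypothetical.

```python
from typing import Sequence


def tagging_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of tokens whose predicted tag equals the reference tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy for an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical example: one of four tokens is tagged incorrectly.
gold_tags = ["DET", "NOUN", "VERB", "PUNCT"]
predicted_tags = ["DET", "NOUN", "NOUN", "PUNCT"]
print(tagging_accuracy(predicted_tags, gold_tags))
```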

Subtypes

* HMM tagger
* Brill tagger
* Memory-based tagger
* Tree tagger

Other Languages

* German: Tagger (de)