Difference between revisions of "Tokenizer"

Latest revision as of 11:27, 20 February 2009

A tokenizer segments a stream of linguistic data (as a rule, a text) into a sequence of (textual) basic units: word forms and punctuation marks. The units identified in this way are called tokens.
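As an illustration, a very simple tokenizer can be sketched with a single regular expression. This is a minimal sketch, not a standard algorithm. It assumes that a word form is a maximal run of word characters, optionally joined by internal hyphens or apostrophes, and that every other non-whitespace character is a separate punctuation token. The names `TOKEN_PATTERN` and `tokenize` are hypothetical.

```python
import re

# Assumption (illustrative only): a word form is a run of word characters
# (Unicode-aware \w), optionally joined by internal hyphens or apostrophes;
# any other single non-whitespace character becomes a punctuation token.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Segment a text into a sequence of word-form and punctuation tokens."""
    # findall returns whole matches because the inner group is non-capturing
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(tokenize("Don't split well-known words, please!"))
    print(tokenize("z.B."))
```

The first call yields `["Don't", 'split', 'well-known', 'words', ',', 'please', '!']`. The second shows the limits of such a simple rule: the abbreviation "z.B." is split into four tokens (`z`, `.`, `B`, `.`). A practical tokenizer would typically need additional rules, for example an abbreviation list, to handle such cases.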

STUB

This article is a stub. You can help Glottopedia by expanding it.

CAT

This article needs proper categorization. You can help Glottopedia by categorizing it.
Please do not remove this block until the problem is fixed.

REF

This article has no reference(s) or source(s).
Please remove this block only when the problem is solved.
