Tokenizer

A tokenizer segments a stream of linguistic data (usually a text) into a sequence of basic textual units: word forms and punctuation marks. The units identified in this way are called tokens.
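
The segmentation described above can be illustrated with a minimal sketch in Python. It assumes that a word form is a run of letters or digits, optionally joined by internal hyphens or apostrophes, and that every other non-whitespace character is a punctuation token of its own. The names `TOKEN_PATTERN` and `tokenize` are hypothetical and chosen only for this illustration; real tokenizers usually handle abbreviations, numbers, and language-specific conventions that this sketch ignores.

```python
import re
from typing import List

# Assumed token definition (not from the article): either a word form,
# i.e. word characters with optional internal hyphens or apostrophes,
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Return the word forms and punctuation marks found in text, in order."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    for token in tokenize("A tokenizer splits text, doesn't it?"):
        print(token)
```

Whitespace serves only as a separator here and never becomes a token, which matches the definition of tokens as word forms and punctuation marks.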

