
CES (Nancy Ide, Greg Priest-Dorman, Patrice Bonhomme)

The Corpus Encoding Standard (CES) is part of the EAGLES Guidelines developed for language engineering research and applications. CES is an SGML-based, [#TEI TEI]-conformant specification of the minimal encoding level that corpora must achieve to be considered standardized, both in descriptive representation (marking of structural and typographic information) and in general architecture (so as to be maximally suited for use in a text database). It also provides encoding specifications for linguistic annotation, together with a data architecture for linguistic corpora. A section of CES on speech annotation (part 6) is under construction. Projects using CES are listed here. An XML version of CES, called XCES, is under development.