Character Classes

Eelco Visser. Character Classes. Technical Report P9708, Programming Research Group, University of Amsterdam, August 1997.


Character classes are used in syntax definition formalisms as compact representations of sets of characters. A character class is a list of characters and ranges of characters. For instance, [A-Z0-9] describes the set containing all uppercase letters and all digits. The same set of characters can be represented by many different character classes. In this paper an algebraic specification of character classes is presented. We define a normalization of character classes that results in unique, most compact normal forms, such that equality of character classes becomes syntactic equality of their normal forms.
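
As an informal illustration of the idea, and not the algebraic specification developed in the report, the following Python sketch normalizes a character class by sorting its entries and merging overlapping or adjacent ranges. The representation (a list of single characters and (low, high) pairs) and the function name normalize are assumptions made for this example only.

```python
def normalize(items):
    """Return a canonical form of a character class.

    `items` is a list whose entries are either a single character
    (e.g. "a") or a (low, high) tuple denoting an inclusive range.
    This representation is an assumption made for illustration.
    """
    # Turn every entry into an inclusive range of code points.
    ranges = []
    for item in items:
        lo, hi = (item, item) if isinstance(item, str) else item
        if lo <= hi:  # silently drop empty ranges such as ("Z", "A")
            ranges.append((ord(lo), ord(hi)))

    # Sort by lower bound, then merge ranges that overlap or touch.
    ranges.sort()
    merged = []
    for lo, hi in ranges:
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))

    # Convert back: one-element ranges become single characters.
    return [chr(lo) if lo == hi else (chr(lo), chr(hi)) for lo, hi in merged]


# Two different spellings of the set described by [A-Z0-9].
a = normalize([("A", "Z"), ("0", "9")])
b = normalize(["5", ("0", "4"), ("6", "9"), ("A", "M"), ("N", "Z"), "Q"])
print(a == b)
```

Because the result is sorted and no two ranges overlap or touch, two classes that denote the same set yield the same list in this sketch, so comparing sets reduces to comparing the normalized lists.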