Inherits Garbage. Inherited by AsciiCodec, Cp932Codec, Cp949Codec, Cp950Codec, EucJpCodec, Gb2312Codec, GbkCodec, Iso2022JpCodec, Iso88591Codec, TableCodec, Utf16BeCodec, Utf16Codec, Utf16LeCodec, Utf7Codec and Utf8Codec.
The Codec class describes a mapping between UString and anything else.
Unicode is used as the native character set and encoding in Warehouse. All other encodings are mapped to or from it: to Unicode when, for example, parsing a mail message, and from Unicode when storing data in the database (as UTF-8).
A Codec is responsible for one such mapping. The Codec class also contains a factory to create an instance of the right subclass based on a name.
The source code for the codecs includes a number of generated files, e.g. the list of MIME character set names and the map from Unicode to ISO-8859-2. We choose to regard them as source files, because we may want to sever the link between the upstream data and our version. For example, if the upstream data is updated, we may or may not want to follow along.
Constructs an empty Codec for character set cs, setting its state to Valid.
The construction of a codec sets it to its default state, whatever that is for each codec.
Returns a list of all canonical codec names. Aliases are not included in the list.
Appends c to u. If c isn't a legal codepoint or there are other errors, this codec's state is modified appropriately.
Looks up s in our list of MIME character set names and returns a Codec suitable for mapping that to/from Unicode.
If s is unknown, byName() returns 0.
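The name-based factory described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: SketchCodec, sketchByName() and the small alias table are hypothetical stand-ins, not the real Codec class or its list of MIME character set names. The sketch assumes case-insensitive lookup and returns a null pointer for unknown names, in the spirit of "byName() returns 0".

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for the real Codec class; only name() is modelled.
class SketchCodec {
public:
    explicit SketchCodec(std::string name) : n(std::move(name)) {}
    virtual ~SketchCodec() = default;
    const std::string& name() const { return n; }
private:
    std::string n;
};

// Lower-cases an ASCII name so that lookups ignore case (an assumption
// of this sketch).
static std::string lowered(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Maps a name or alias to a canonical name, then builds a codec for it.
// Unknown names yield a null pointer.
std::unique_ptr<SketchCodec> sketchByName(const std::string& s) {
    // Illustrative table only; the real list is a generated source file.
    static const std::map<std::string, std::string> canonical = {
        {"utf-8", "UTF-8"},
        {"utf8", "UTF-8"},             // alias
        {"us-ascii", "US-ASCII"},
        {"ascii", "US-ASCII"},         // alias
        {"iso-8859-1", "ISO-8859-1"},
        {"latin1", "ISO-8859-1"},      // alias
    };
    auto it = canonical.find(lowered(s));
    if (it == canonical.end())
        return nullptr;
    return std::make_unique<SketchCodec>(it->second);
}

// Usage: an alias resolves to its canonical name, an unknown name to null.
void demoSketchByName() {
    assert(sketchByName("UTF8")->name() == "UTF-8");
    assert(sketchByName("x-no-such-charset") == nullptr);
}
```

Callers of such a factory have to check for the null result before using the codec, exactly as callers of byName() must check for 0.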
Returns a codec likely to describe the encoding for s. This uses word lists and many other strategies.
If s contains a Unicode Byte Order Mark, it probably is a UTF-16BE or UTF-16LE string.
If s is a Russian string, it probably contains lots of common Russian words, and we can identify the character encoding by scanning for KOI8-R and ISO-8859-5 forms of some common words. Ditto for other languages.
If s uses typical Windows punctuation and is mostly ASCII, it's in a typical Windows encoding.
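Of the strategies above, the Byte Order Mark check is the simplest to sketch. The following is an illustration under stated assumptions, not the actual implementation: BomGuess and guessFromBom() are hypothetical names, the function returns an enum rather than a Codec, and it only inspects the first two octets of the input.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type; the real function returns a Codec pointer.
enum class BomGuess { None, Utf16Be, Utf16Le };

// Looks for a UTF-16 byte order mark at the start of s.
// U+FEFF is serialised as FE FF in big-endian order and as FF FE in
// little-endian order.
BomGuess guessFromBom(const std::string& s) {
    if (s.size() < 2)
        return BomGuess::None;
    const unsigned char b0 = static_cast<unsigned char>(s[0]);
    const unsigned char b1 = static_cast<unsigned char>(s[1]);
    if (b0 == 0xFE && b1 == 0xFF)
        return BomGuess::Utf16Be;
    if (b0 == 0xFF && b1 == 0xFE)
        return BomGuess::Utf16Le;
    return BomGuess::None;
}

// Usage: plain ASCII input carries no BOM.
void demoGuessFromBom() {
    assert(guessFromBom("plain ascii") == BomGuess::None);
    assert(guessFromBom(std::string("\xFF\xFE", 2)) == BomGuess::Utf16Le);
}
```

A BOM is only a hint, which matches the "probably" in the description: a fuller detector would go on to the word-list and punctuation heuristics when no BOM is present.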
This function is a little slower than it could be, since it creates a largish number of short EString objects.
Returns a codec suitable for encoding the Unicode string u in such a way that the largest possible number of mail readers will understand the message.
Returns an error message describing why the codec is in Invalid state. If the codec is in Valid or BadlyFormed states, error() returns an empty string.
This pure virtual function maps u from Unicode to the codec's other encoding, and returns an EString containing the result.
Each reimplementation must decide how to handle codepoints that cannot be represented in the target encoding.
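One possible decision is to substitute a replacement character and remember that the output is lossy. The sketch below shows that choice for ISO-8859-1, whose octets 0x00 to 0xFF correspond directly to U+0000 to U+00FF. It is a hypothetical example: EncodeResult and toLatin1() are invented names, std::vector and std::string stand in for UString and EString, and the real Iso88591Codec may handle unrepresentable codepoints differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical result: the encoded octets plus a flag that is false if
// any codepoint had to be replaced.
struct EncodeResult {
    std::string bytes;
    bool lossless;
};

// Encodes codepoints as ISO-8859-1. Codepoints above U+00FF cannot be
// represented; this sketch substitutes '?' for each of them.
EncodeResult toLatin1(const std::vector<std::uint32_t>& u) {
    EncodeResult r{std::string(), true};
    r.bytes.reserve(u.size());
    for (std::uint32_t cp : u) {
        if (cp <= 0xFF) {
            r.bytes.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            r.bytes.push_back('?');
            r.lossless = false;
        }
    }
    // Every codepoint produces exactly one octet in this encoding.
    assert(r.bytes.size() == u.size());
    return r;
}
```

Other reasonable choices include dropping the codepoint or flagging an error on the codec; the point is only that the policy belongs to the reimplementation.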
Checks whether the last codepoint in u is a leading surrogate, and flags an error if so.
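As a brief illustration of the check itself: leading (high) surrogates occupy the range U+D800 to U+DBFF, so a string that ends in one is missing its trailing surrogate. The helper names below are hypothetical, and std::vector stands in for UString.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// True for the UTF-16 leading (high) surrogate range U+D800..U+DBFF.
bool isLeadingSurrogate(std::uint32_t cp) {
    return cp >= 0xD800 && cp <= 0xDBFF;
}

// True if u is non-empty and its last element is a leading surrogate,
// i.e. the expected trailing surrogate never arrived.
bool endsWithLeadingSurrogate(const std::vector<std::uint32_t>& u) {
    return !u.empty() && isLeadingSurrogate(u.back());
}

// Usage: a lone high surrogate at the end is the error case.
void demoSurrogateCheck() {
    const std::vector<std::uint32_t> dangling = {0x41, 0xD83D};
    assert(endsWithLeadingSurrogate(dangling));
}
```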
Returns the name of the codec, as supplied to the constructor.
Records that the error s occurred. This is meant for errors other than invalid or undefined codepoints, and should be needed only by a stateful Codec. Also sets the state() to Invalid.
Records that at octet index pos, an error happened and no code point could be found. This also sets the state() to Invalid.
Records that at octet index pos in input, an error happened and no code point could be found. This also sets the state() to Invalid.
Records that codepoint (at octet index pos) is not valid and could not be converted to Unicode. This also sets the state() to Invalid.
Sets the codec's state to st, which is one of Valid, BadlyFormed and Invalid.
Valid is the initial setting, and means that the Codec has seen only valid input. BadlyFormed means that the Codec has seen something it did not like, but was able to determine the meaning of that input. Invalid means that the Codec has seen input whose meaning could not be determined.
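The three states can be sketched with a toy decoder. All of it is assumption for the sake of illustration: ToyAsciiDecoder is not one of the real codecs, its rule that control characters are merely BadlyFormed is invented, the state only ever moves towards Invalid, and valid() and wellformed() are assumed to mean "not Invalid" and "still Valid" respectively.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

enum class State { Valid, BadlyFormed, Invalid };

// Toy 7-bit decoder used only to illustrate the state transitions.
class ToyAsciiDecoder {
public:
    State state() const { return st; }
    // Assumed meanings, following the descriptions of the two predicates.
    bool wellformed() const { return st == State::Valid; }
    bool valid() const { return st != State::Invalid; }

    std::vector<std::uint32_t> toUnicode(const std::string& s) {
        std::vector<std::uint32_t> out;
        out.reserve(s.size());
        for (char ch : s) {
            const unsigned char c = static_cast<unsigned char>(ch);
            if (c >= 0x80) {
                // Meaning cannot be determined: Invalid, emit U+FFFD.
                worsen(State::Invalid);
                out.push_back(0xFFFD);
            } else {
                // Invented rule: control characters other than tab, CR
                // and LF are understood but disliked.
                if (c == 0x7F || (c < 0x20 && c != '\t' && c != '\r' && c != '\n'))
                    worsen(State::BadlyFormed);
                out.push_back(c);
            }
        }
        // One codepoint is produced per input octet.
        assert(out.size() == s.size());
        return out;
    }

private:
    // Moves the state towards Invalid, never back (an assumption).
    void worsen(State s) {
        if (static_cast<int>(s) > static_cast<int>(st))
            st = s;
    }
    State st = State::Valid;
};
```

With this sketch, decoding "hello" leaves the decoder Valid, an embedded 0x01 octet makes it BadlyFormed while the output is still usable, and any octet of 0x80 or above makes it Invalid.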
Returns the current state of the codec, reflecting the codec's input up to this point.
This pure virtual function maps s from the codec's encoding to Unicode, and returns a UString containing the result.
Reimplementations are expected to handle errors only by calling setState(). Each reimplementation is free to recover as seems suitable for its encoding.
Returns true if this codec's input has not yet seen any syntax errors, and false if it has.
Returns true if this codec's input has so far been well-formed, and false if not. The definition of well-formedness is left to each subclass. As general guidance, to be well-formed, the input must avoid features that are discouraged or obsoleted by the relevant standard.
Destroys the Codec.
This web page is based on source code belonging to The Archiveopteryx Developers. All rights reserved.