Class UString.

Inherits Garbage

The UString class provides a normalized Unicode string.

Unicode is the common character encoding for all strings except those limited to US-ASCII, but such strings are sparingly manipulated.

Most of the functionality of UString is concerned with conversion to and from other encodings, such as ISO-8859-15 and KOI-U. Other functionality is intentionally kept to a minimum, to lighten the testing burden.

Two functions merit particular mention: ascii() and the equality operator. ascii() returns something that is useful for logging, but which often cannot be converted back to Unicode.

There is a fast equality operator which tests against printable ASCII, returning false for every unprintable or non-ASCII character. Very useful for comparing a UString to e.g. "seen" or ".", but nothing more.
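
The comparison described above can be sketched roughly as follows. This is only an illustration, not the UString implementation: std::u32string stands in for UString, the helper name equalsPrintableAscii is hypothetical, and the printable range is assumed to be 32 to 126.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for comparing a UString against a C string.
// Assumption: "printable ASCII" means the code points 32..126.
bool equalsPrintableAscii(const std::u32string &u, const char *s) {
    if (!s)
        return false;
    std::size_t i = 0;
    while (i < u.size() && s[i] != '\0') {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 32 || c > 126)      // unprintable or non-ASCII: never equal
            return false;
        if (u[i] != static_cast<char32_t>(c))
            return false;
        ++i;
    }
    // Equal only if both strings ended at the same position.
    return i == u.size() && s[i] == '\0';
}
```

With this sketch, a string holding "seen" compares equal to "seen", while any string holding a tab or an accented letter compares unequal to everything.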

UString::UString( const UString & other )

Constructs an exact copy of other.

UString::UString()

Constructs an empty Unicode string.

Reimplements Garbage::Garbage().

void UString::append( const UString & other )

Appends other to the end of this string.

void UString::append( const char * s )

Appends the ASCII character sequence s to the end of this string.

void UString::append( const uint cp )

Appends Unicode code point cp to the end of this string.

EString UString::ascii() const

Returns a copy of this string in 7-bit ASCII. Any characters that aren't printable ASCII are changed into '?'. (Is '?' the right choice?)

This looks like AsciiCodec::fromUnicode(), but is semantically different. This function is for logging and debugging and may leave out a different set of characters than does AsciiCodec::fromUnicode().
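
A minimal sketch of the substitution described above, assuming printable ASCII means code points 32 to 126. std::u32string and std::string stand in for UString and EString, and the function name toLoggableAscii is hypothetical.

```cpp
#include <string>

// Hypothetical stand-in for UString::ascii(): every code point outside
// the assumed printable range 32..126 is replaced by '?'.
std::string toLoggableAscii(const std::u32string &u) {
    std::string out;
    out.reserve(u.size());
    for (char32_t cp : u) {
        if (cp >= 32 && cp < 127)
            out.push_back(static_cast<char>(cp));
        else
            out.push_back('?');
    }
    return out;
}
```

Because many different inputs collapse to the same '?', the output is suitable for logs but cannot in general be converted back.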

int UString::compare( const UString & other ) const

Returns -1 if this string is lexicographically before other, 0 if they are the same, and 1 if this string is lexicographically after other.

The comparison is case sensitive - just a codepoint comparison. It does not sort the way humans expect.
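
A code point comparison of this kind might look like the sketch below. It uses std::u32string in place of UString, and the function name compareCodePoints is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Compares two strings code point by code point.
// Returns -1, 0 or 1, in the manner described for UString::compare().
int compareCodePoints(const std::u32string &a, const std::u32string &b) {
    std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < b[i])
            return -1;
        if (a[i] > b[i])
            return 1;
    }
    // One string is a prefix of the other: the shorter one sorts first.
    if (a.size() < b.size())
        return -1;
    if (a.size() > b.size())
        return 1;
    return 0;
}
```

Under this ordering "Zebra" sorts before "apple" (U+005A is less than U+0061) and "z" sorts before "é", which is why the result does not match what human readers expect.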

bool UString::contains( const UString & s ) const

Returns true if this string contains at least one instance of s.

bool UString::contains( const char c ) const

Returns true if this string contains at least one instance of c.

bool UString::contains( const char * s ) const

Returns true if this string contains at least one instance of s.

bool UString::endsWith( const UString & suffix ) const

Returns true if this string ends with suffix, and false if it does not.

bool UString::endsWith( const char * suffix ) const

Returns true if this string ends with suffix, and false if it does not. suffix must be an ASCII or ISO-8859-1 string.

int UString::find( char c, int i ) const

Returns the position of the first occurrence of c on or after i in this string, or -1 if there is none.

int UString::find( const UString & s, int i ) const

Returns the position of the first occurrence of s on or after i in this string, or -1 if there is none.
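
The search semantics can be illustrated with the standard library. In this sketch std::u32string stands in for UString and the name findSketch is hypothetical; the default of 0 for i and the clamping of negative start positions to 0 are assumptions, not documented behaviour.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first occurrence of needle at or after i,
// or -1 if there is none.
int findSketch(const std::u32string &s, const std::u32string &needle,
               int i = 0) {
    if (i < 0)
        i = 0;  // assumption: negative start positions are treated as 0
    std::size_t pos = s.find(needle, static_cast<std::size_t>(i));
    return pos == std::u32string::npos ? -1 : static_cast<int>(pos);
}
```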

bool UString::isAscii() const

Returns true if this string contains only tab, CR, LF and printable ASCII characters, and false if it contains one or more other characters.

bool UString::isDigit( uint c )

Returns true if c is a digit, and false if not.

bool UString::isLetter( uint c )

Returns true if c is a letter, and false if not.

bool UString::isSpace( uint c )

Returns true if c is a Unicode space character, and false if not.

UString UString::mid( uint start, uint num ) const

Returns a string containing the data starting at position start of this string, extending for num characters. num may be left out, in which case the rest of the string is returned.

If start is too large, an empty string is returned.
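
The behaviour described for mid() resembles std::basic_string::substr(), except that an out-of-range start yields an empty string instead of an exception. The sketch below assumes positions and lengths are counted in code points; std::u32string stands in for UString and the name midSketch is hypothetical.

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Returns up to num code points of s, starting at position start.
// Leaving out num returns the rest of the string.
std::u32string midSketch(
    const std::u32string &s, std::size_t start,
    std::size_t num = std::numeric_limits<std::size_t>::max()) {
    if (start >= s.size())
        return std::u32string();  // start too large: empty result
    return s.substr(start, num);  // substr clamps num to what is available
}
```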

uint UString::number( bool * ok, uint base ) const

Returns the number encoded by this string, and sets *ok to true if that number is valid, or to false if the number is invalid. By default the number is parsed as base 10; if base is specified, that base is used instead. base must be at least 2 and at most 36.

If the number is invalid (e.g. negative), the return value is undefined.

If ok is a null pointer, it is not modified.
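
The contract above can be sketched as follows. This is an illustration only: std::u32string stands in for UString, the name parseNumber is hypothetical, and several details are assumptions, namely that an empty string and an overflowing value count as invalid, that letters are accepted in either case as digits above 9, and that 0 is returned for invalid input (the documentation leaves that value undefined).

```cpp
#include <cstddef>
#include <limits>
#include <string>

// Hypothetical stand-in for UString::number().
unsigned int parseNumber(const std::u32string &s, bool *ok = nullptr,
                         unsigned int base = 10) {
    bool valid = !s.empty() && base >= 2 && base <= 36;
    unsigned long long n = 0;
    for (std::size_t i = 0; valid && i < s.size(); ++i) {
        char32_t c = s[i];
        unsigned int digit = 36;  // sentinel: not a digit in any base
        if (c >= U'0' && c <= U'9')
            digit = static_cast<unsigned int>(c - U'0');
        else if (c >= U'a' && c <= U'z')
            digit = static_cast<unsigned int>(c - U'a') + 10;
        else if (c >= U'A' && c <= U'Z')
            digit = static_cast<unsigned int>(c - U'A') + 10;
        if (digit >= base) {
            valid = false;  // includes '-' and any other non-digit
        } else {
            n = n * base + digit;
            if (n > std::numeric_limits<unsigned int>::max())
                valid = false;  // assumption: overflow is invalid
        }
    }
    if (ok)
        *ok = valid;  // a null ok is simply left alone
    return valid ? static_cast<unsigned int>(n) : 0;
}
```

Callers that care about validity should pass a bool and check it before using the result, since the return value alone cannot distinguish "0" from invalid input in this sketch.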

UString & UString::operator+=( const UString & other )

Appends other to this string and returns a reference to this string.

UString & UString::operator=( const UString & other )

Makes this string into an exact copy of other and returns a reference to this string.

void UString::operator delete( void * p )

Deletes p. (This function exists only so that gcc -O3 doesn't decide that UString objects don't need destruction.)

void UString::reserve( uint num )

Ensures that at least num characters are available for this string. Users of UString should generally not need to call this; it is called by append() etc. as needed.

void UString::reserve2( uint num )

Equivalent to reserve(). reserve( num ) calls this function to do the heavy lifting. This function is not inline, while reserve() is, so calls to this function should be interesting with respect to memory allocation statistics.

No one except reserve() should call reserve2().
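
The split between an inline check and an out-of-line worker is a common pattern; a rough sketch is shown below. The class ReserveSketch, its std::vector storage and its counter are hypothetical and only illustrate the idea that every call reaching the non-inline function corresponds to an actual allocation.

```cpp
#include <cstddef>
#include <vector>

class ReserveSketch {
public:
    // Inline fast path: a cheap capacity check, no allocation.
    void reserve(std::size_t num) {
        if (num > data.capacity())
            reserve2(num);
    }
    std::size_t capacity() const { return data.capacity(); }
    unsigned int slowCalls() const { return calls; }

private:
    // Out-of-line slow path, intended to be called only by reserve().
    void reserve2(std::size_t num);

    std::vector<char32_t> data;
    unsigned int calls = 0;
};

// Defined outside the class body so that it is not implicitly inline.
void ReserveSketch::reserve2(std::size_t num) {
    ++calls;            // each call here means the buffer really grows
    data.reserve(num);
}
```

Keeping the slow path private in the sketch enforces the rule that only reserve() calls it.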

UString UString::simplified() const

Returns a copy of this string where each run of whitespace is compressed to a single space character, and where leading and trailing whitespace is removed altogether. Most spaces are mapped to U+0020, but the Ogham space dominates and ZWNBSP recedes.

Unicode space characters are as listed in http://en.wikipedia.org/wiki/Space_character
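
The whitespace compression can be sketched as below. std::u32string stands in for UString, the function names are hypothetical, and the set of space characters is a small illustrative subset chosen for the example. The special treatment of the Ogham space and ZWNBSP mentioned above is not modelled: every run simply becomes U+0020.

```cpp
#include <string>

// Assumption: a small, illustrative subset of Unicode space characters.
bool isSpaceSketch(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x00A0 ||
           (c >= 0x2000 && c <= 0x200A) || c == 0x3000;
}

// Collapses each run of whitespace to one U+0020 and drops leading and
// trailing whitespace.
std::u32string simplifiedSketch(const std::u32string &s) {
    std::u32string out;
    bool pendingSpace = false;
    for (char32_t c : s) {
        if (isSpaceSketch(c)) {
            // Remember the run, but only if some text precedes it.
            pendingSpace = !out.empty();
        } else {
            if (pendingSpace)
                out.push_back(U' ');
            pendingSpace = false;
            out.push_back(c);
        }
    }
    // A pending space at the end is trailing whitespace and is dropped.
    return out;
}
```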

bool UString::startsWith( const UString & prefix ) const

Returns true if this string starts with prefix, and false if it does not.

bool UString::startsWith( const char * prefix ) const

Returns true if this string starts with prefix, and false if it does not. prefix must be an ASCII or ISO-8859-1 string.

UString UString::titlecased() const

Returns a titlecased version of this string. Usable for case-insensitive comparison, not much else.

UString UString::trimmed() const

Returns a copy of this string without leading or trailing whitespace.

void UString::truncate( uint l )

Truncates this string to l characters. If the string is shorter, truncate() does nothing. If l is 0 (the default), the string will be empty after this function is called.

EString UString::utf8() const

Returns a UTF-8-encoded version of this UString. The string is null-terminated for easy debugging, but remember that it may also contain embedded nulls.
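
For reference, the UTF-8 encoding step can be sketched as follows. std::u32string and std::string stand in for UString and EString, and the name toUtf8 is hypothetical. The sketch does not validate its input: surrogates and values above U+10FFFF are encoded as they come.

```cpp
#include <string>

// Encodes each code point as 1 to 4 bytes of UTF-8.
std::string toUtf8(const std::u32string &u) {
    std::string out;
    for (char32_t cp : u) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

U+0000 encodes to a single zero byte, so the result can contain embedded nulls; code that handles it should rely on the string's length and not on the first null byte.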

UString::~UString()

Destroys the string. Doesn't free anything.

This web page based on source code belonging to The Archiveopteryx Developers. All rights reserved.