This is a small library I wrote to handle UTF-8.
Usage is meant to be as simple as possible. For example, decoding a
UTF-8 string:
    const char* str = "asdf";
    uint32_t codepoint;
    while ((codepoint = UTF8_next(&str)))
    {
        // you have a codepoint congrats
    }
Or encoding a single codepoint to add it to a string:
    std::string result;
    result.append(UTF8_encode(0x1234).bytes);
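For readers unfamiliar with the byte layout, here is a rough sketch of what decoding and encoding a single codepoint involve. This is not the library's implementation: sketch_next(), sketch_encode() and SketchEncoding are hypothetical names that only mirror the shape of the calls shown above, and the sketch assumes NUL-terminated input and skips overlong-sequence and surrogate checks on decode.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the value returned by UTF8_encode():
// up to four bytes plus a NUL, so .bytes can be passed to std::string::append.
struct SketchEncoding
{
    char bytes[5];
};

// Hypothetical encoder (an illustration, not the library's code).
SketchEncoding sketch_encode(uint32_t cp)
{
    // Assumption of this sketch: cp is a valid Unicode scalar value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    SketchEncoding out = {}; // zero-filled, so always NUL-terminated
    if (cp < 0x80)
    {
        out.bytes[0] = static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out.bytes[0] = static_cast<char>(0xC0 | (cp >> 6));
        out.bytes[1] = static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out.bytes[0] = static_cast<char>(0xE0 | (cp >> 12));
        out.bytes[1] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[2] = static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out.bytes[0] = static_cast<char>(0xF0 | (cp >> 18));
        out.bytes[1] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out.bytes[2] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out.bytes[3] = static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical decoder: reads one codepoint, advances *p past it,
// and returns 0 (without advancing) at the terminating NUL.
uint32_t sketch_next(const char** p)
{
    const unsigned char* s = reinterpret_cast<const unsigned char*>(*p);
    uint32_t cp;
    int extra; // number of continuation bytes expected

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else                            { cp = 0xFFFD;      extra = 0; } // invalid lead byte

    if (s[0] != 0)
    {
        ++s; // consume the lead byte, but never step past the NUL
    }
    for (int i = 0; i < extra; ++i)
    {
        if ((*s & 0xC0) != 0x80)
        {
            cp = 0xFFFD; // truncated sequence: report U+FFFD and stop here
            break;
        }
        cp = (cp << 6) | (*s & 0x3F);
        ++s;
    }

    *p = reinterpret_cast<const char*>(s);
    return cp;
}
```

Returning 0 at the terminator is what lets the decoding loop above use the result directly as its condition.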
There are a few other functions: UTF8_total_codepoints() returns the
total number of codepoints in a string, UTF8_backspace() returns the
length of a string after backspacing one character, and
UTF8_peek_next() is a slightly less fancy version of UTF8_next().
More functions can always be added if we need them.
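To illustrate why counting and backspacing need UTF-8 awareness, here is a minimal sketch built on one fact: continuation bytes always have the bit pattern 10xxxxxx. The names sketch_total_codepoints() and sketch_backspace() and their signatures are assumptions made for this example and may not match the library's actual functions. The sketch also assumes the input is well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
static bool sketch_is_continuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical counterpart to UTF8_total_codepoints(): in well-formed UTF-8,
// every codepoint contributes exactly one non-continuation byte.
size_t sketch_total_codepoints(const char* s)
{
    assert(s != nullptr);
    size_t count = 0;
    for (; *s != '\0'; ++s)
    {
        if (!sketch_is_continuation(*s))
        {
            ++count;
        }
    }
    return count;
}

// Hypothetical counterpart to UTF8_backspace(): given a string and its length
// in bytes, returns the byte length after removing the last codepoint.
size_t sketch_backspace(const char* s, size_t length)
{
    assert(s != nullptr);
    while (length > 0)
    {
        --length;
        if (!sketch_is_continuation(s[length]))
        {
            break; // reached the lead byte of the last codepoint
        }
    }
    return length;
}
```

With a std::string, the result could be applied as text.resize(sketch_backspace(text.c_str(), text.size())), which removes a whole multi-byte character instead of a single byte.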
This will allow us to replace utfcpp (utf8::unchecked) and also fix
some less-than-ideal code:
- Some places have to resort to ignoring UTF-8 (next_wrap) or using
UCS-4→UTF-8 functions (VFormat had to use PHYSFS ones, and one other
place has four lines of code including a std::back_inserter just for
one character)
- The iterator stuff is kinda confusing and verbose anyway