mirror of
https://github.com/TerryCavanagh/VVVVVV.git
synced 2024-12-23 01:59:43 +01:00
3ce4735d50
This is a small library I wrote to handle UTF-8. Usage is meant to be as simple as possible - see for example decoding a UTF-8 string:

    const char* str = "asdf";
    uint32_t codepoint;
    while ((codepoint = UTF8_next(&str)))
    {
        // you have a codepoint congrats
    }

Or encoding a single codepoint to add it to a string:

    std::string result;
    result.append(UTF8_encode(0x1234).bytes);

There are some other functions (UTF8_total_codepoints() to get the total number of codepoints in a string, UTF8_backspace() to get the length of a string after backspacing one character, and UTF8_peek_next() as a slightly less fancy version of UTF8_next()), but more functions could always be added if we need them.

This will allow us to replace utfcpp (utf8::unchecked) and also fix some less-than-ideal code:

- Some places have to resort to ignoring UTF-8 (next_wrap) or using UCS-4→UTF-8 functions (VFormat had to use PHYSFS ones, and one other place has four lines of code including a std::back_inserter just for one character)
- The iterator stuff is kinda confusing and verbose anyway
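To make the decoding side of the description above more concrete, here is a minimal sketch of how a UTF8_next()-style decoder, a codepoint counter and a backspace helper could be written. The `sketch_*` names are hypothetical and this is not the library's actual implementation. In particular, returning U+FFFD for malformed input is an assumption of this sketch; the commit message only shows that the real function returns 0 at the end of the string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UTF8_next()-style decoder (not the library's code).
 * Decodes one codepoint from *p_str and advances the pointer past it.
 * Returns 0 at the terminating NUL without advancing.
 * Assumption of this sketch: malformed input yields U+FFFD. */
static uint32_t sketch_next(const char** p_str)
{
    /* Smallest codepoint allowed for each sequence length (rejects overlongs) */
    static const uint32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};
    const unsigned char* s = (const unsigned char*) *p_str;
    uint32_t cp;
    int extra; /* number of continuation bytes expected */
    int i;

    if (s[0] == '\0')
    {
        return 0;
    }

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else
    {
        /* Stray continuation byte or invalid lead byte: skip one byte */
        *p_str += 1;
        return 0xFFFD;
    }

    for (i = 1; i <= extra; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
        {
            /* Truncated sequence: skip only the bytes consumed so far,
             * so a NUL terminator is never stepped over */
            *p_str += i;
            return 0xFFFD;
        }
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *p_str += extra + 1;

    if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        return 0xFFFD; /* overlong, out of range, or surrogate */
    }
    return cp;
}

/* Counts codepoints by decoding until the terminating NUL */
static size_t sketch_total_codepoints(const char* str)
{
    size_t n = 0;
    while (sketch_next(&str))
    {
        n++;
    }
    return n;
}

/* Returns the byte length of str[0..len) after removing its last codepoint:
 * step back over continuation bytes (10xxxxxx) until a lead byte is removed */
static size_t sketch_backspace(const char* str, size_t len)
{
    while (len > 0)
    {
        len--;
        if (((unsigned char) str[len] & 0xC0) != 0x80)
        {
            break;
        }
    }
    return len;
}
```

With these definitions, the loop from the description works unchanged if `UTF8_next` is swapped for `sketch_next`. Because 0 marks the end of the string, a decoder with this shape cannot report an embedded U+0000, which is usually acceptable for NUL-terminated C strings.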
35 lines
548 B
C
#ifndef UTF8_H
#define UTF8_H

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __cplusplus
extern "C"
{
#endif

typedef struct
{
    char bytes[5];
    uint8_t nbytes;
    bool error;
}
UTF8_encoding;


uint32_t UTF8_peek_next(const char* s_str, uint8_t* codepoint_nbytes);

uint32_t UTF8_next(const char** p_str);
UTF8_encoding UTF8_encode(uint32_t codepoint);

size_t UTF8_total_codepoints(const char* str);
size_t UTF8_backspace(const char* str, size_t len);


#ifdef __cplusplus
} /* extern "C" */
#endif

#endif // UTF8_H
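The header only declares UTF8_encode(), so as an illustration of how a struct shaped like UTF8_encoding could be filled in, here is a minimal sketch of an encoder. The names `sketch_encoding` and `sketch_encode` are hypothetical and this is not the library's implementation. The sketch assumes that `bytes` holds up to four UTF-8 bytes plus a terminating NUL (which would explain the size of 5 and why `.bytes` can be appended to a string directly), and that invalid codepoints set `error` and are replaced with U+FFFD; both are assumptions, not documented behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the UTF8_encoding struct declared in the header */
typedef struct
{
    char bytes[5];  /* assumed: up to 4 UTF-8 bytes plus a terminating NUL */
    uint8_t nbytes; /* number of UTF-8 bytes written, excluding the NUL */
    bool error;     /* assumed: set when the input was not encodable */
}
sketch_encoding;

/* Encodes one codepoint as UTF-8 (a sketch, not the library's code) */
static sketch_encoding sketch_encode(uint32_t cp)
{
    /* Zero-initialising bytes guarantees NUL termination */
    sketch_encoding enc = {{0}, 0, false};

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    {
        /* Not a Unicode scalar value: flag it and substitute U+FFFD */
        cp = 0xFFFD;
        enc.error = true;
    }

    if (cp < 0x80)
    {
        enc.bytes[0] = (char) cp;
        enc.nbytes = 1;
    }
    else if (cp < 0x800)
    {
        enc.bytes[0] = (char) (0xC0 | (cp >> 6));
        enc.bytes[1] = (char) (0x80 | (cp & 0x3F));
        enc.nbytes = 2;
    }
    else if (cp < 0x10000)
    {
        enc.bytes[0] = (char) (0xE0 | (cp >> 12));
        enc.bytes[1] = (char) (0x80 | ((cp >> 6) & 0x3F));
        enc.bytes[2] = (char) (0x80 | (cp & 0x3F));
        enc.nbytes = 3;
    }
    else
    {
        enc.bytes[0] = (char) (0xF0 | (cp >> 18));
        enc.bytes[1] = (char) (0x80 | ((cp >> 12) & 0x3F));
        enc.bytes[2] = (char) (0x80 | ((cp >> 6) & 0x3F));
        enc.bytes[3] = (char) (0x80 | (cp & 0x3F));
        enc.nbytes = 4;
    }

    return enc;
}
```

Returning the small struct by value keeps the call site to a single expression and avoids any caller-provided buffer, which fits the stated goal of keeping usage as simple as possible.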