next_split_s() could potentially commit out-of-bounds indexing if the
amount of source data was bigger than the destination data.
This is because the size of the source data passed in includes the null
terminator, so if 1 byte is not subtracted from it, then after it passes
through the VVV_min(), it will index 1 past the end of the passed buffer
when null-terminating it.
In contrast, the other argument of the VVV_min() does not need 1
subtracted from it, because that length does not include a null
terminator (next_split() returns the length of the substring, after all;
not the length of the substring plus 1).
(The VVV_min() here also shortens the range of values to the size of an
int, but we'll probably make size_t versions anyway; plus who really
cares about supporting massively-sized buffers bigger than 2 billion
bytes in length? That just doesn't make sense.)
Just like is_positive_num(), an empty string is not a number.
I've also decided to unroll iteration 0 of the loop here so readability
is improved; this happens to also knock out the whole "accepting empty
string" thing, too.
To account for empty strings, we simply have to special-case them.
Simple as that.
This was also a problem with the previous std::string implementation of
this function; regardless, this is fixed now.
The current way "arrays" from XML files are loaded (before this commit
is applied) goes something like this:
1. Read the buffer of the contents of the tag using TinyXML-2.
2. Allocate a buffer on the heap of the same size, and copy the
existing buffer to it. (This is what the statement `std::string
TextString = pText;` does.)
3. For each delimiter in the heap-allocated buffer...
a. Allocate another buffer on the heap, and copy the characters from
the previous delimiter to the delimiter you just hit.
b. Then allocate the buffer AGAIN, to copy it into an std::vector.
4. Then re-allocate every single buffer YET AGAIN, because you need to
make a copy of the std::vector in split() to return it to the caller.
As you can see, the existing way uses a lot of memory allocations and
data marshalling, just to split some text.
The problem here is mostly making a temporary std::vector of split text,
before doing any actual useful work (most likely, putting it into an
array or ANOTHER std::vector - if the latter, then that's yet another
memory allocation on top of the memory allocation you already did; this
memory allocation is unavoidable, unlike the ones mentioned earlier,
which should be removed).
So I noticed that since we're iterating over the entire string once
(just to shove its contents into a temporary std::vector), and then
basically iterating over it again - why can't the whole thing just be
more immediate, and just be iterated over once?
So that's what I've done here. I've axed the split() function (both of
them, actually), and made next_split() and next_split_s().
next_split() will take an existing string and a starting index, and it
will find the next occurrence of the given delimiter in the string. Once
it does so, it will return the length from the previous starting index,
and modify your starting index as well. The price for immediateness is
that you're supposed to handle keeping the index of the previous
starting index around in order to be able to use the function; updating
it after each iteration is also your responsibility.
(By the way, next_split() doesn't use SDL_strchr(), because we can't get
the length of the substring for the last substring. We could handle this
special case specifically, but it'd be uglier; it also introduces
iterating over the last substring twice, when we only need to do it
once.)
next_split_s() does the same thing as next_split(), except it will copy
the resulting substring into a buffer that you provide (along with its
size). Useful if you don't particularly care about the length of the
substring.
All callers have been updated accordingly. This new system does not make
ANY heap allocations at all; at worst, it allocates a temporary buffer
on the stack, but that's only if you use next_split_s(); plus, it'd be a
fixed-size buffer, and stack allocations are negligible anyway.
This improves performance when loading any sort of XML file, especially
loading custom levels - which, on my system at least, I can noticeably
tell (there's less of a freeze when I load in to a custom level with
lots of scripts). It also decreases memory usage, because the heap isn't
being used just to iterate over some delimiters when XML files are
loaded.
This fixes a bug where "12" gets properly evaluated as 12, but "148"
gets evaluated as 1408. It's because `place` gets multiplied by `radix`
again, so `retval` gets multipled by 100 instead of 10.
There's no reason to have a `place` variable, so I've removed it
entirely. This simplifies the function a little bit.
This avoids an unnecessary copy of the input std::vector, since we don't
need to modify it for anything. This cuts down on unnecessary memory
operations.
Apart from the std::string, this function no longer uses the STL.
ss_toi() is a simple function - it converts the input into an int,
taking as many digits as possible until it reaches a non-digit
character, at which point it stops. It's trivial to implement this
without the STL.
I could've used Int() here, but that would've required copying the
string to a temporary buffer to insert a null-terminator (we can't just
use a pointer-and-length data type either, the string functions don't
operate like that - one disadvantage of C strings!). Instead, I decided
to implement my own conversion to int here, because I don't think the
way we humans write our Arabic numerals is going to change anytime soon.
Also, the std::string input is now passed by const reference, instead of
making a copy - cutting down on unnecessary memory operations.
I personally like putting the asterisk with the type, because despite
the language parsing the asterisk as a part of the name, the pointer
part is clearly a part of the return type of the function. Also,
put constness here, to indicate that the input won't be modified inside
the function.
This comment indicates that the function is used by
UtilityClass::GCString(). Which is unnecessary, because the reader can
trivially search for usages of GCChar in the file itself (the 'static'
preceding the function should be a good enough hint) - and if there
aren't any, then the reader will know the function is unused, whereas if
they read the comment, they would have been under the assumption that it
wasn't used. (There might also a compiler warning about it being unused,
which would be more confusing if the comment was still there.)
Point is, comments can get outdated, so removing the comment here makes
the code more self-documenting.
During 2.3 development, there's been a gradual shift to using SDL stdlib
functions instead of libc functions, but there are still some libc
functions (or the same libc function but from the STL) in the code.
Well, this patch replaces all the rest of them in one fell swoop.
SDL's stdlib can replace most of these, but its SDL_min() and SDL_max()
are inadequate - they aren't really functions, they're more like macros
with a nasty penchant for double-evaluation. So I just made my own
VVV_min() and VVV_max() functions and placed them in Maths.h instead,
then replaced all the previous usages of min(), max(), std::min(),
std::max(), SDL_min(), and SDL_max() with VVV_min() and VVV_max().
Additionally, there's no SDL_isxdigit(), so I just implemented my own
VVV_isxdigit().
SDL has SDL_malloc() and SDL_free(), but they have some refcounting
built in to them, so in order to use them with LodePNG, I have to
replace the malloc() and free() that LodePNG uses. Which isn't too hard,
I did it in a new file called ThirdPartyDeps.c, and LodePNG is now
compiled with the LODEPNG_NO_COMPILE_ALLOCATORS definition.
Lastly, I also refactored the awful strcpy() and strcat() usages in
PLATFORM_migrateSaveData() to use SDL_snprintf() instead. I know save
migration is getting axed in 2.4, but it still bothers me to have
something like that in the codebase otherwise.
Without further ado, here is the full list of functions that the
codebase now uses:
- SDL_strlcpy() instead of strcpy()
- SDL_strlcat() instead of strcat()
- SDL_snprintf() instead of sprintf(), strcpy(), or strcat() (see above)
- VVV_min() instead of min(), std::min(), or SDL_min()
- VVV_max() instead of max(), std::max(), or SDL_max()
- VVV_isxdigit() instead of isxdigit()
- SDL_strcmp() instead of strcmp()
- SDL_strcasecmp() instead of strcasecmp() or Win32 strcmpi()
- SDL_strstr() instead of strstr()
- SDL_strlen() instead of strlen()
- SDL_sscanf() instead of sscanf()
- SDL_getenv() instead of getenv()
- SDL_malloc() instead of malloc() (replacing in LodePNG as well)
- SDL_free() instead of free() (replacing in LodePNG as well)
This patch cleans up unnecessary exports from header files (there were
only a few), as well as adds the static keyword to all symbols that
aren't exported and are specific to a file. This helps the linker out in
not doing any unnecessary work, speeding it up and avoiding silent
symbol conflicts (otherwise two symbols with the same name (and
type/signature in C++) would quietly resolve as okay by the linker).
By "unnecessary qualifiers to self", I mean something like using the
'game.' qualifier for a variable on the Game class when you're inside a
function on the Game class itself. This patch is to enforce consistency
as most of the code doesn't have these unnecessary qualifiers.
To prevent further unnecessary qualifiers to self, I made it so the
extern in each header file can be omitted by using a define. That way,
if someone writes something referring to 'game.' on a Game function,
there will be a compile error.
However, if you really need to have a reference to the global name, and
you're within the same .cpp file as the implementation of that object,
you can just do the extern at the function-level. A good example of this
is editorinput()/editorrender()/editorlogic() in editor.cpp. In my
opinion, they should probably be split off into their own separate file
because editor.cpp is getting way too big, but this will do for now.
It's possible that SDL_atoi() could call the libc atoi(), and if a
string is provided that's too large to fit into an integer, then that
would result in undefined behavior. To avoid this, use SDL_strtol()
instead.
If necessary, the caller can provide a fallback to be returned in case
the input given isn't a valid integer, instead of having to duplicate
the is_number() check.
This is simply a wrapper function around SDL_atoi(), because SDL_atoi()
could call libc atoi(), and whether or not invalid input passed into the
libc atoi() is undefined behavior depends on the given libc. So it's
safer to just add a wrapper function that checks that the string given
isn't bogus.
std::isdigit() can be replaced with SDL_isdigit(), but there's no
equivalent for std::isxdigit() in the SDL standard library. So we'll
just use libc isxdigit() instead, it's fine.
Okay, so basically here's the include layout that this game now
consistently uses:
[The "main" header file, if any (e.g. Graphics.h for Graphics.cpp)]
[blank line]
[All system includes, such as tinyxml2/physfs/utfcpp/SDL]
[blank line]
[All project includes, such as Game.h/Entity.h/etc.]
And if applicable, another blank line, and then some special-case
include screwy stuff (take a look at editor.cpp or FileSystemUtils.cpp,
for example, they have ifdefs and defines with their includes).
That's how it should be done, because the SDL headers aren't going to be
installed in this repository. The game was a bit inconsistent before but
now it isn't anymore.
This ensures that endsWith() can be used outside of editor.cpp.
When leo60228 originally wrote endsWith(), it was static, but I asked
him on Discord just now and he more-or-less confirmed that it's fine if
it's not static. If it was static, it would be confined to
UtilityClass.cpp now instead!
Apparently it results in Undefined Behavior if the argument given isn't
representable as an unsigned char. This means that (potentially) plain
char and signed chars could be unsafe to use as well.
It's rare that this could happen in practice, though. std::isdigit() is
only used by is_positive_num() which is only used by find_tag(), so
someone would have to deliberately put something crazy after an `&#` in
a custom level file in order for this to happen. Still, better to be
safe than sorry and all.
This fixes a compile error that could happen where the compiler doesn't
know what std::isdigit() is, but I'm puzzled as to why this wasn't
happening earlier, 'cause I've only been reported that it happens by
only one person.
Unlike, say, the scriptclass i/j/k stuff, these tempstrings are only
purely visual, and there's no known glitches to manipulate them anyway.
Thus I think it's safe to make this cleanup.
This fixes a source of undefined behavior, where the int returned by
ss_toi() would be random garbage memory if the string passed into it
would be empty. That's because if the string is empty, there are no
characters to parse, so nothing simply gets put into x.
The easiest way to pass an empty string in to ss_toi() would be to use
script commands with empty arguments.
Previously, it was a hardcoded list going up to fifty.
This time, it's less hardcoded. Most of the time, it will take a number,
find out its tens-place word, find out its ones-place word, and combine
the two together. There are special cases for the teens, and the numbers
zero through nine, and one hundred.
Also, now if it is given a negative value, it will just return "???"
instead.
If there are more than one hundred it will still say "Lots".