The ISO-8859-1 (Latin-1) character encoding, an 8-bit extension of ASCII, is known to have compatibility issues across computers from different countries (with different default languages), along with the other single-byte codepages: certain language-specific special characters get garbled. This is the reason Unicode (commonly stored as UTF-8) was created in the first place.
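To make the garbling concrete, here is a minimal C++ sketch of how the same byte decodes to a different character depending on which codepage reads it. The function names are hypothetical, and the Windows-1251 decoder is deliberately partial: it only covers ASCII and the 0xC0-0xFF range, where the Cyrillic letters А..я map to U+0410..U+044F by a fixed offset.

```cpp
// Latin-1 (ISO-8859-1): each byte value is the Unicode code point itself.
char32_t decode_latin1(unsigned char b) {
    return b;
}

// Windows-1251 (Cyrillic), partial: only ASCII and 0xC0-0xFF are handled,
// where the letters А..я map to U+0410..U+044F by a fixed offset.
char32_t decode_cp1251_partial(unsigned char b) {
    if (b < 0x80) {
        return b;                    // ASCII range is shared by both codepages
    }
    if (b >= 0xC0) {
        return 0x0410 + (b - 0xC0);  // e.g. 0xE9 -> U+0439 'й'
    }
    return 0xFFFD;                   // replacement char: range not covered here
}
```

With these, byte 233 (0xE9) decodes to é (U+00E9) under Latin-1 but to й (U+0439) under Windows-1251, which is exactly the kind of on-screen mismatch described here.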
Correct on UTF-8. It is unfortunate that the engine cannot decode it, but it is what it is. I dug up my old conversation, and it had to do with CT Historical Immersion and the é character. Differences between codepages are also a problem for any character outside the first 128, which are the only ones consistent across them. In Windows-1252, é is code 233 and is one byte. Code 233 is not the same character in other codepages, so you don't get the same result when it is displayed on-screen. In UTF-8, character 233 (U+00E9) is encoded as 2 bytes (0xC3 0xA9), not one. Since this source code does all of its string equality comparisons byte by byte, assuming each char within a string is 1 byte, the same word saved once as Windows-1252 and once as UTF-8 will not match, so we also run into problems when determining whether stringa == stringb.
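A rough sketch of why byte-wise equality breaks: encoding é (code point 233) as UTF-8 produces two bytes, so "café" saved as Windows-1252 and "café" saved as UTF-8 are different byte sequences. The utf8_encode and same_bytes helpers are hypothetical illustrations, not code from the engine, and utf8_encode skips validation of surrogates and out-of-range values.

```cpp
#include <cstring>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Sketch only: no checks for surrogates or values above U+10FFFF.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                 // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: e.g. U+00E9 'é'
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {       // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Byte-for-byte equality, the way strcmp-style engine code compares strings.
bool same_bytes(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Here utf8_encode(233) yields 0xC3 0xA9, so the Windows-1252 string "caf\xE9" (4 bytes) and the UTF-8 string "caf\xC3\xA9" (5 bytes) fail same_bytes even though both spell "café".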
Since so much of the code presumes that each char equals one byte, the only tenable solution was to convert any UTF-8 encountered into the engine's default codepage. But in the course of that attempt I broke Russian language support, because its codepage is different, and there is a Russian modding team using it right now. There is no central handler: the ReadFile calls are immensely scattered, thousands of them across many different modules, with no initial record of which file is being processed or what codepage it might be in. There can be no consistency about the country, the codepage, or whether any given file was saved as UTF-8, ISO-?????, Windows-1251, or Windows-1252, so trying to detect and convert on the fly was too much. That is when I did the quick hack and made an option for the Font.ini file that, if flagged, always assumes the text is UTF-8 and converts it to our base codepage. So for now, this flag is small, quick, and minor: only the Font display method, when passed a string value for output to the screen, converts it, just one way, if the flag is true. Everyone else, including the Russians, can omit the flag, and those mapped screen font displays will operate as they always have.
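The flag approach could look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the bool standing in for the Font.ini option (whose actual name isn't given here), and the choice of Windows-1252 as the base codepage. It decodes UTF-8 with minimal validation (no overlong or surrogate checks) and re-encodes only the characters Windows-1252 shares with Latin-1 (U+0000-U+007F and U+00A0-U+00FF); anything else, including Windows-1252's extra characters in 0x80-0x9F, becomes '?'.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: decode UTF-8, re-encode into the Latin-1 subset of Windows-1252.
std::string utf8_to_cp1252_subset(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t n;
        if (b < 0x80)                { cp = b;        n = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; n = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; n = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; n = 4; }
        else { out += '?'; ++i; continue; }            // invalid lead byte
        if (i + n > in.size()) { out += '?'; break; }  // truncated at end of text
        bool ok = true;
        for (std::size_t k = 1; k < n; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += '?'; ++i; continue; }
        i += n;
        if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
            out += static_cast<char>(cp);  // same byte value in Windows-1252
        } else {
            out += '?';                    // no single-byte equivalent in this subset
        }
    }
    return out;
}

// Hypothetical display hook: convert only when the flag is set.
std::string prepare_for_font(const std::string& text, bool assume_utf8) {
    return assume_utf8 ? utf8_to_cp1252_subset(text) : text;
}
```

The conversion happens in one direction only, at the display call, and bytes pass through untouched when the flag is off, which is what keeps setups like the Russian one working as before.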
It is also possible, as Chez has pointed out, that the engine/interpreter itself has a code processing bug:
I'm not sure what all this really means.
@ChezJfrey would need to clarify this. Why is the null terminator for the strings getting overwritten, and what "non-relevant memory/garbage" is the engine overreaching into?
Why? Because of a bug. This source relies heavily on old C string functions like strcpy and strlen. What often happens is that a string gets passed somewhere, and the receiving method wants to keep its own copy. So it gets the length, size_t len = strlen(x), then allocates enough memory, char *y = new char[len]. But we must keep in mind that the allocation must contain one extra character for the null terminator, because that is the convention strlen relies on: it iterates over each character until it reaches a null, and that is how it knows where the string ends and how many chars it has. So that allocation is actually supposed to be new char[len+1], which leaves enough space to place the null when copying. In the case of the options, there is a spot where that extra null was not provided for. What happens then is that the string, say x = "ab", sits in memory as 2 bytes with no null. It gets passed along, and somewhere else strlen(x) is called. It finds 'a' (count 1), 'b' (count 2), but then no null, so it keeps going into whatever memory is adjacent, byte by byte, until it (hopefully) encounters a zero byte, a null. Say that new count is 58, because it read through a bunch of irrelevant "garbage" memory. That memory could very well be valid, in use by other allocations, but it is "garbage" in the sense that it has no relevance to our char string pointer x; the read overreaches what we intended (in C++ terms, it is undefined behavior). Later, when x is copied somewhere else again, the string that was supposed to be the 2 chars "ab" is copied as 58 characters, "abíå íàøåë ëó÷øåãî, íî ó íàñ ÃÃ âñåãäà èìååò ýòî ÈÄ, ðàáîòàòü áóäåò...", and so on. You get the idea.
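Here is a small C++ sketch of the mechanics described above. naive_strlen shows what strlen conceptually does, and copy_with_terminator shows the len + 1 fix. Reading past a real heap allocation is undefined behavior, so the overreach is only safe to demonstrate inside a single array, where the bytes after "ab" stand in for adjacent memory. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Conceptually what strlen() does: count bytes until a '\0' is found.
// It has no other way to know where the string ends.
std::size_t naive_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// The fix for the buggy pattern. strlen() does not count the terminator,
// so the buffer needs len + 1 bytes; allocating only new char[len] leaves
// the copy without a '\0', and later strlen()/strcpy() calls overread.
char* copy_with_terminator(const char* src) {
    std::size_t len = std::strlen(src);
    char* y = new char[len + 1];   // one extra byte for '\0'
    std::memcpy(y, src, len + 1);  // copy the terminator too
    return y;                      // caller releases with delete[]
}
```

For example, with char memory[] = {'a', 'b', 'x', 'y', 'z', '\0'}, naive_strlen(memory) returns 5 rather than 2, because nothing marks the end of "ab". Holding the copy in a std::string instead would sidestep the manual + 1 entirely, since it tracks its own length.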