encoding

Posted: Thu Aug 29, 2013 5:42 am
by Acid.OMG
What type of character encoding does UT99 use for gamer names?

Re: encoding

Posted: Thu Aug 29, 2013 8:15 am
by UTPe
Windows-1252 ?

Re: encoding

Posted: Thu Aug 29, 2013 10:52 am
by Feralidragon
I am not sure, but I would say perhaps UTF-8, which seems the most logical one to go with given their Linux support.

Re: encoding

Posted: Thu Aug 29, 2013 12:58 pm
by UTPe
Well, I took a look over some params. Don't ask me why but this is what I got from server logs...

the log of a Linux UT99 server v451 shows me this:
Init: Character set: ANSI

...while the log of a Windows UT99 server v451 says this:
Init: Character set: Unicode

I hope this helps.

Re: encoding

Posted: Fri Aug 30, 2013 12:42 am
by Feralidragon
But Unicode is not an "encoding"; it's the standard that defines the characters, and encodings such as UTF-8 and UTF-16 define how to store them as bytes.

Re: encoding

Posted: Fri Aug 30, 2013 9:21 am
by Acid.OMG
Thanks for the help guys

Re: encoding

Posted: Fri Aug 30, 2013 12:13 pm
by anth
Linux servers use ASCII (not ANSI!) encoding for playernames and UTF-16LE for logfiles.
Windows uses UTF-16LE for both.
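As a rough illustration of the split described above, here is a minimal Python sketch. It assumes only what the post states (ASCII player names on Linux, UTF-16LE log files); the function names `is_ascii_name` and `decode_log_bytes` are hypothetical helpers and not part of any UT99 tool.

```python
def is_ascii_name(name: str) -> bool:
    """Return True if every character of the name fits in 7-bit ASCII."""
    try:
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def decode_log_bytes(raw: bytes) -> str:
    """Decode raw bytes as UTF-16LE and drop one leading BOM if present."""
    text = raw.decode("utf-16-le")
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

A name that fails `is_ascii_name` would presumably not survive intact on a Linux server if the post above is right, though what the server actually does with such characters is not stated in this thread.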

Re: encoding

Posted: Thu Sep 05, 2013 10:07 am
by Wormbo
I'm going to state UT2004 stuff, but perhaps it applies to UT1 as well. Text files (INI, INT) are first scanned for the BOM of UTF-16LE/BE. If found, the files are treated correspondingly; if not, Windows-1252 is assumed for reading the file. Internally the game supports Unicode characters, but doesn't take advantage of that fact in most places. Yes, UT1 stat log files are UTF-16, but the main log file (and custom log files in UT200x) is Windows-1252. Actually the game simply treats bytes as characters and vice versa. The font textures are based on Windows-1252, though.
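The BOM check described above can be sketched in a few lines of Python. This is only an approximation of the behaviour as described in the post, not the engine's actual code; `decode_text_file` is a hypothetical name, and replacing undefined bytes instead of failing is an assumption made here.

```python
import codecs


def decode_text_file(raw: bytes) -> str:
    """Decode INI/INT-style bytes: UTF-16 if a BOM is found, else Windows-1252."""
    if raw.startswith(codecs.BOM_UTF16_LE):
        return raw[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    # No BOM: fall back to Windows-1252. A few byte values are undefined
    # in Python's cp1252 codec, so they are replaced rather than raising
    # (an assumption of this sketch).
    return raw.decode("cp1252", errors="replace")
```

Note that a UTF-8 file without a BOM would take the Windows-1252 branch here, so multi-byte UTF-8 sequences would come out as two or three separate characters.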

Re: encoding

Posted: Fri May 22, 2020 7:04 pm
by tgm1024
Ok, humongous necro, sorry.

I'm trying to dig up some inconsistencies in UT99 on Linux, and part of the problem is that some of the stuff online I'm reading regards UTF-8 as an 8-bit-only encoding, which it isn't.

ASCII is 7 bits only ("Extended ASCII" isn't an official ASCII representation, though the term is still useful, as is "8 bit ASCII")
ISO-8859(-1) is 8 bits only
UTF-8 is 1-4 groups of 8 bits. Yes, despite its name, a single character in a UTF-8 file can occupy up to 32 bits; the most significant bits of the leading byte signal how many bytes follow.
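The widths listed above can be checked with a short Python sketch. The sample characters and the helper names `encoded_length` and `width_table` are chosen here purely for illustration; UTF-16LE is included for comparison with the earlier posts.

```python
ENCODINGS = ["ascii", "iso-8859-1", "utf-8", "utf-16-le"]
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, emoji


def encoded_length(ch: str, encoding: str):
    """Return the number of bytes ch occupies, or None if it cannot be encoded."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


def width_table():
    """Map each sample character to its byte length in every encoding."""
    return {ch: {enc: encoded_length(ch, enc) for enc in ENCODINGS}
            for ch in SAMPLES}


if __name__ == "__main__":
    for ch, widths in width_table().items():
        print(f"U+{ord(ch):04X}", widths)
```

The euro sign is a useful test case: it has no slot in ISO-8859-1 but does in Windows-1252, which is one place where the two encodings disagree.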

BTW, "UTF-16" is an oddball (1-2 groups of 16 bits) and is so seldom used, I can't find any information regarding it at all in .ini (or any other standard) use. No one uses it for hardly anything AFAICT. Some of the claims about UTF-16 I think might be spurious and are actually referring to UTF-8.

My question: Is there a problem regarding ISO-8859 that anyone has seen? The focus on Windows encodings (due to its origin) makes sense, but *which* of the various encodings causes the least trouble on Linux?

Thanks!