UTF-8 is necessary
a free cross platform checksum utility, supports 58+ hash algorithms
                
Brought to you by: jonelo
Please add UTF-8 support (both with and without BOM) by default for hash list files. It is critical for such platform-independent software. At the moment it uses (at least on Windows) the local Windows code page (in my case, Windows-1251).
So I created a test folder; here are its contents:
| Filename | Size | 
|---|---|
| !!!ĀāĂ㥹.jpg | 531,96 KB | 
| Em—dash.jpg | 288,64 KB | 
| Latin.jpg | 203,07 KB | 
| Ελληνικά.jpg | 269,00 KB | 
| Кириллица.jpg | 397,73 KB | 
| العربية.jpg | 299,19 KB | 
| देवनागरी.jpg | 334,38 KB | 
| かな カナ.jpg | 397,11 KB | 
| 汉字 漢字.jpg | 348,61 KB | 
| 한글 조선글.jpg | 234,47 KB | 
Now I produce a hash list with jacksum -a adler32 -E base64 -m -O hash.txt -r NamesTest
Here's the output:
Jacksum: Meta-Info: version=1.7.0;algorithm=adler32;filesep=\;flags=r;encoding=base64;
Jacksum: Comment: created with Jacksum 1.7.0, http://jacksum.sourceforge.net
Jacksum: Comment: created on Fri Jul 30 20:07:15 MSK 2021
Jacksum: Comment: os name=Windows 10;os version=10.0;os arch=amd64
Jacksum: Comment: jvm vendor=Oracle Corporation;jvm version=25.301-b09
Jacksum: Comment: user dir=C:\Users\Kaiser\Pictures
Jacksum: Comment: param dir=NamesTest
NamesTest:
7L3pgw==    544726  !!!??????.jpg
LYvdpw==    295569  Em—dash.jpg
nnjwbA==    207941  Latin.jpg
CDhu2Q==    275456  ????????.jpg
O9xXbQ==    407278  Кириллица.jpg
tSWrSQ==    306373  ???????.jpg
CEgN2w==    342402  ????????.jpg
JxwXlA==    406643  ?? ??.jpg
fwCwVQ==    356975  ?? ??.jpg
2dXtaQ==    240099  ?? ???.jpg
So this hash list is just useless. Jacksum on Windows simply can't handle some filenames. I insist that UTF-8 by default is necessary (of course, there could be a parameter to select another code page).
Thanks for your attention.
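The question marks in the listing are consistent with a lossy conversion to the local code page. Below is a minimal Python sketch of that effect. It is not Jacksum's actual (Java) code; it assumes the file names were encoded to Windows-1251 with unmappable characters replaced, and the helper name `to_codepage` is hypothetical.

```python
def to_codepage(name, codepage="cp1251"):
    """Round-trip a file name through a legacy code page.

    Characters the code page cannot represent are replaced with '?'.
    That this is what happened in the listing above is an assumption.
    """
    return name.encode(codepage, errors="replace").decode(codepage)


# A few of the test names from the table above.
names = [
    "Latin.jpg",
    "Em\u2014dash.jpg",   # the dash is U+2014, which Windows-1251 can encode
    "Кириллица.jpg",
    "Ελληνικά.jpg",
    "汉字 漢字.jpg",
]
results = {name: to_codepage(name) for name in names}
```

Here `results["汉字 漢字.jpg"]` comes out as `?? ??.jpg`, as in the listing, while the Latin, Cyrillic, and U+2014 names survive because Windows-1251 can represent them.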
Ticket moved from /p/jacksum/support-requests/15/
Workaround for Jacksum 1.7.0 in order to use UTF-8 for both input and output:
Some environments don't support UTF-8, so setting it as default could lead to unexpected behavior on those systems. The byte order has no meaning in UTF-8. The Unicode Standard permits the byte order mark (BOM) in UTF-8, but does not require or recommend its use. See also https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
Anyway, support for non-default charsets (including UTF-8) comes with the next major release.
Thanks! The workaround is very useful (it might be worth adding to the FAQ).
But the CLI output (not the file) still replaces characters with '?'. Is there any way to fix that?
I already know what a BOM is. Some editors add a BOM, which confuses the parser and prevents it from reading the hash list file correctly. Read-only BOM support would be great, wouldn't it?
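Read-only BOM support could look roughly like the following Python sketch. It is not Jacksum's parser, and the function name `read_hashlist_lines` is hypothetical; it only shows that decoding with Python's `utf-8-sig` codec accepts input with or without a leading BOM.

```python
import codecs


def read_hashlist_lines(data):
    """Decode hash list bytes as UTF-8, dropping a leading BOM if present."""
    # 'utf-8-sig' skips a leading BOM (bytes EF BB BF) and otherwise
    # behaves like plain UTF-8.
    return data.decode("utf-8-sig").splitlines()


# One line taken from the hash list above, encoded as UTF-8.
line = "CDhu2Q==    275456  Ελληνικά.jpg\n".encode("utf-8")

without_bom = read_hashlist_lines(line)
with_bom = read_hashlist_lines(codecs.BOM_UTF8 + line)
```

Both calls return the same list, so a file saved by a BOM-adding editor would parse the same as one without a BOM.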
Last edit: Kaiser 2021-07-31
You're welcome!
AFAIK that is a limitation of the command prompt program. You could use Windows Terminal from Microsoft, which comes with full UTF-8 character support. See also https://www.microsoft.com/store/productId/9N0DX20HK701
Once it is installed, you can open the shell you want. If you open the command prompt in Windows Terminal and change the code page to 65001 by typing
chcp 65001
you should see all UTF-8 characters.
Thanks for explaining the BOM use case. That makes sense to me. Could you please open a new Feature Request called "Add BOM support" so we can track progress with respect to BOM there?
Sure thing! I'll open a request on BOM.
Actually, these are two different situations. With plain CMD the issue is purely one of presentation (I suppose it has to do with the font): one can still copy and paste any Unicode output (from tools like RHash, e.g.).
But in the case of Jacksum, the CLI output actually replaces unavailable characters with U+003F (question marks). I suppose there is room for improvement.
There is also trouble when passing filenames that contain Unicode characters on the command line; I found no workaround for that.
I am pleased to announce that Jacksum 3 is on the web!
See also https://github.com/jonelo/jacksum
Download and release notes: https://github.com/jonelo/jacksum/releases/tag/v3.0.0
Jacksum 3 comes with charset support for both input and output and new options are available:
which gives maximum flexibility for files. So from now on UTF-8 is the default if no charset has been specified explicitly.
Both stdout and stderr still use the terminal's default charset, because the user's terminal default might not be UTF-8; in that case you usually have to tweak the terminal. You can then tell Jacksum to try to use UTF-8 for stdout and stderr as well, using the following options:
or simply
That was a really tricky one! The solution is to pass filenames through the pipe rather than as program arguments. Here is an example that demonstrates how this works with Jacksum 3 in Windows' cmd:
You first have to set the UTF-8 code page, which is 65001 in cmd. Then you pass a filename that contains Unicode characters through the pipe using echo. You have to tell Jacksum to read the file list from the pipe and set the file-list-format to ssv, which stands for space separated values. I am going to use that pattern for the file browser integrations as well.
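The pipe pattern can be illustrated generically. The Python sketch below is not Jacksum's implementation. It assumes the file list arrives on stdin as UTF-8 bytes and that names are separated by spaces, with names containing spaces wrapped in quotes (an assumed reading of "space separated values"). The helper `read_file_list` is hypothetical.

```python
import io
import shlex


def read_file_list(stream):
    """Read file names from a binary stream, decoding them explicitly as UTF-8."""
    text = stream.read().decode("utf-8")
    # posix=True lets quoted names keep their embedded spaces.
    return shlex.split(text, posix=True)


# Simulated stdin contents, standing in for what an `echo ... |` pipe
# might deliver when the console uses code page 65001 (an assumption).
fake_stdin = io.BytesIO('Ελληνικά.jpg "かな カナ.jpg"\n'.encode("utf-8"))
file_names = read_file_list(fake_stdin)
```

The point of the design is that bytes read from a pipe can be decoded with a charset the program chooses, instead of relying on however the platform converts command-line arguments.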
Closing this issue since all issues have been resolved.
May I ask you to file new feature requests on GitHub?
Thanks & Regards,
Johann