In UnicodeData.txt, there are some lines like
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
to specify the range 0xAC00--0xD7A3. unicode-char-prep.pl does not parse these First/Last pairs properly, so it only sees the two endpoints. As a result, almost all Hangul characters are not marked as letters in unicode-letters.tex.
In fact, we don't need to load LineBreak.txt to know whether a character is a letter or not, and the macro \ID should not change the catcode. So we should also extract the letter information from
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
20000;<CJK Ideograph Extension B, First>;Lo;0;L;;;;;N;;;;;
2A6D6;<CJK Ideograph Extension B, Last>;Lo;0;L;;;;;N;;;;;
2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
in UnicodeData.txt.
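To illustrate the range handling described above, here is a minimal sketch in Python (not the Perl of unicode-char-prep.pl, and not the patch discussed in this thread). It assumes only the UnicodeData.txt field layout shown in the quoted lines: code point, name, and general category as the first three semicolon-separated fields. The function name expand_ranges and the rule "category starts with L means letter" are illustrative assumptions.

```python
import re

# Matches range-marker names such as "<Hangul Syllable, First>"
# or "<CJK Ideograph, Last>".
RANGE_NAME = re.compile(r"^<(.+), (First|Last)>$")


def expand_ranges(lines):
    """Yield (code_point, general_category) for every entry.

    A First/Last pair is expanded into one entry per code point in the
    inclusive range; ordinary lines are passed through unchanged.
    (Hypothetical helper, for illustration only.)
    """
    pending = None  # (start_code_point, range_label, category) of an open range
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)
        name, category = fields[1], fields[2]
        match = RANGE_NAME.match(name)
        if match and match.group(2) == "First":
            pending = (code, match.group(1), category)
        elif match and match.group(2) == "Last":
            if pending is None or pending[1] != match.group(1):
                raise ValueError(f"unmatched range end: {line}")
            start, _, range_category = pending
            pending = None
            for cp in range(start, code + 1):
                yield cp, range_category
        else:
            yield code, category


sample = [
    "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;",
]
# Assumption: any general category beginning with "L" counts as a letter.
letters = [cp for cp, cat in expand_ranges(sample) if cat.startswith("L")]
print(len(letters))  # 11172
```

With only the two Hangul lines as input, the sketch yields every code point from 0xAC00 through 0xD7A3 instead of just the two endpoints. The CJK ranges listed above would be expanded the same way.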
The problem affects the use of our xeCJK package. I hope it will be fixed soon.
Anonymous
I don’t speak Perl, unfortunately (that script was written by Jonathan Kew of course), so patches are highly appreciated.
Does something like this work?
It generates 80+k new entries from the ranges.
Leo Liu, does this patch fix the issue for you?
Yes, rink's patch works. Thanks.
And I think the line
can be safely removed.
Thanks, I applied the patch and will update the unicode-letters.tex file soon.