
#84 unicode-char-prep.pl parses UnicodeData.txt incorrectly for the Hangul script

Group: v0.99992
Status: closed
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-08-29
Created: 2013-08-29
Creator: Leo Liu
Private: No

In UnicodeData.txt, there are some lines like

AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;

to specify the range U+AC00 to U+D7A3. The script unicode-char-prep.pl does not parse this First/Last notation properly, so almost all Hangul characters are not marked as letters in unicode-letters.tex.
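To make the First/Last convention concrete, here is a minimal Python sketch (not the Perl script and not the patch discussed in this ticket) that merges such pairs into a single range. The function name `parse_unicode_data` and the tuple layout are hypothetical choices made for this illustration only.

```python
import re

# Matches range-marker names such as "<Hangul Syllable, First>".
RANGE_NAME = re.compile(r"^<(.+), (First|Last)>$")

def parse_unicode_data(lines):
    """Yield (start, end, name, category), merging First/Last pairs into one range."""
    pending = None  # (start code point, label) of a "First" line awaiting its "Last"
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code, name, category = int(fields[0], 16), fields[1], fields[2]
        match = RANGE_NAME.match(name)
        if match is None:
            # Ordinary line: a single code point.
            yield code, code, name, category
        elif match.group(2) == "First":
            pending = (code, match.group(1))
        else:
            if pending is None or pending[1] != match.group(1):
                raise ValueError(f"unmatched range end: {line}")
            yield pending[0], code, match.group(1), category
            pending = None

sample = [
    "AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;",
    "D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;",
]
ranges = list(parse_unicode_data(sample))
```

With the two Hangul lines above, `ranges` holds one entry covering U+AC00 through U+D7A3 with category Lo, rather than two isolated code points.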

In fact, we don't need to load LineBreak.txt to know whether a character is a letter or not, and the macro \ID should not change the catcode. So we should also extract the information from

3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;

4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;

20000;<CJK Ideograph Extension B, First>;Lo;0;L;;;;;N;;;;;
2A6D6;<CJK Ideograph Extension B, Last>;Lo;0;L;;;;;N;;;;;
2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;

in UnicodeData.txt.
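The idea of deciding "letter or not" from UnicodeData.txt alone could be sketched as follows in Python. This is an illustration under a stated assumption: a character counts as a letter when its General_Category (the third field) starts with "L". The real script may use a different rule, and the names `letter_ranges` and `is_letter` are hypothetical.

```python
def letter_ranges(lines):
    """Return (first, last) ranges whose General_Category starts with "L" (assumed rule)."""
    result = []
    range_start = None  # code point of the most recent "..., First>" line
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        code, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = code
            continue
        is_range_end = name.endswith(", Last>") and range_start is not None
        first = range_start if is_range_end else code
        range_start = None
        if category.startswith("L"):
            result.append((first, code))
    return result

def is_letter(code_point, ranges):
    """Check whether a code point falls inside any of the collected ranges."""
    return any(first <= code_point <= last for first, last in ranges)

sample = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;",
    "4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;",
    "4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;",
    "9FCC;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;",
]
ranges = letter_ranges(sample)
```

Here `is_letter(0x4E2D, ranges)` is true because U+4E2D lies inside the CJK Ideograph range, while the digit U+0030 is excluded by its Nd category. No line-breaking data is consulted.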

The problem affects the use of our xeCJK package. I hope it will be fixed soon.

Discussion

  • Khaled Hosny

    Khaled Hosny - 2013-08-30

    I don’t speak Perl, unfortunately (that script was written by Jonathan Kew of course), so patches are highly appreciated.

     
  • rink

    rink - 2013-11-29

    Does something like this work?

    It generates more than 80,000 new entries from the ranges.
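As a rough check of that figure, the sizes of the six ranges quoted in the ticket can be summed directly. The short Python sketch below is an editorial illustration, not part of the patch; it only counts code points and assumes each range expands to one entry per code point.

```python
# Ranges quoted in the ticket, as inclusive (first, last) code points.
ranges = {
    "Hangul Syllable": (0xAC00, 0xD7A3),
    "CJK Ideograph Extension A": (0x3400, 0x4DB5),
    "CJK Ideograph": (0x4E00, 0x9FCC),
    "CJK Ideograph Extension B": (0x20000, 0x2A6D6),
    "CJK Ideograph Extension C": (0x2A700, 0x2B734),
    "CJK Ideograph Extension D": (0x2B740, 0x2B81D),
}

# Both endpoints are included, hence the +1.
counts = {name: last - first + 1 for name, (first, last) in ranges.items()}
total = sum(counts.values())

for name, count in counts.items():
    print(f"{name}: {count}")
print(f"total: {total}")
```

The total comes to 85,777 code points, of which 11,172 are Hangul syllables, which is consistent with the count reported above.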

     
  • Khaled Hosny

    Khaled Hosny - 2013-12-06

    Leo Liu, does this patch fix the issue for you?

     
  • Leo Liu

    Leo Liu - 2013-12-06

    Yes, rink's patch works. Thanks.

    And I think the line

    print " \\global\\catcode\\n=11" if m/ID/;
    

    can be safely removed.

     
  • Khaled Hosny

    Khaled Hosny - 2014-07-25
    • status: open --> closed
    • Group: Future --> v0.99992
     
  • Khaled Hosny

    Khaled Hosny - 2014-07-25

    Thanks, I applied the patch and will update the unicode-letters.tex file soon.

     
