kses._no_null() breaks Japanese.
Status: Inactive
Brought to you by:
metaur
Hi.
I'm using Geeklog with Japanese EUC encoding.
After I updated Geeklog from 1.3.8-1 to 1.3.8-1 sr2,
input strings were sometimes corrupted.
Geeklog has used kses since 1.3.8-1 sr2: Geeklog's
function COM_CheckHTML() calls kses.Parse(), which in
turn calls kses._no_null().
_no_null() removes '/\xad+/', but 0xad is a legal byte
in Japanese EUC encoding (0xa4ad is hiragana "ki" and
0xa5ad is katakana "ki").
I just commented out the line below in kses.class.php.
--
$string = preg_replace('/\xad+/', '', $string); # deals with Opera "feature"
--
I don't know why _no_null() removes 0xad, but I have had
no problems since commenting it out.
Is there any advice?
# Sorry for my poor English.
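
The corruption described in this report can be reproduced in isolation. The following is a minimal sketch, assuming only the byte values quoted above (0xa4ad and 0xa5ad) and the same byte-oriented pattern; the variable names are illustrative and not taken from kses.

```php
<?php
// EUC-JP byte sequences cited in the report:
// 0xA4AD = hiragana "ki", 0xA5AD = katakana "ki".
$hiraganaKi = "\xa4\xad";
$katakanaKi = "\xa5\xad";
$original   = $hiraganaKi . $katakanaKi;

// Same replacement as the line in kses.class.php. Without the /u
// modifier, PCRE works on raw bytes, so every 0xAD byte is removed,
// including trail bytes of two-byte EUC-JP characters.
$stripped = preg_replace('/\xad+/', '', $original);

echo bin2hex($original), "\n"; // a4ada5ad
echo bin2hex($stripped), "\n"; // a4a5
```

Only the lead bytes 0xa4 and 0xa5 remain, which no longer form the original two characters.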
Logged In: YES
user_id=573278
Hi! Thanks for the bug report.
The problem is that the Opera browser accepts chr(173) in
the middle of URL protocols, so you can write something like a
href="java[character 173 here]script:alert(31337)".
It's mostly a problem in attribute values, though. Do you think
it would work for Japanese users if I would change kses so it
only removed that character from attribute values and not
from the whole document?
Are there any other problems with using kses in the Japanese
language?
// Ulf Harnhammar
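
The idea of removing chr(173) only from attribute values could be sketched roughly as follows. This is not the kses implementation; strip_xad_in_attribute_values() is a hypothetical helper, and the tag matching is deliberately simplistic (it assumes quoted attribute values that contain no ">").

```php
<?php
// Hypothetical helper (not part of kses): strip byte 0xAD only inside
// quoted attribute values, leaving the text between tags untouched.
function strip_xad_in_attribute_values(string $html): string
{
    return preg_replace_callback(
        '/<[^>]*>/', // each tag, naively delimited by < and >
        function (array $tag): string {
            return preg_replace_callback(
                '/=\s*("[^"]*"|\'[^\']*\')/', // each quoted attribute value
                function (array $attr): string {
                    return str_replace("\xad", '', $attr[0]);
                },
                $tag[0]
            );
        },
        $html
    );
}

// The href hides "javascript:" behind a 0xAD byte; the link text is
// hiragana "ki" in EUC-JP (0xA4AD).
$input  = "<a href=\"java\xadscript:alert(31337)\">\xa4\xad</a>";
$output = strip_xad_in_attribute_values($input);

echo bin2hex($output), "\n";
```

Here the href value becomes "javascript:alert(31337)", so a later protocol check can see and reject it, while the 0xa4ad between the tags is preserved. Japanese text inside an attribute value would still lose its 0xad bytes.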
Logged In: NO
Thank you for your response.
>it would work for Japanese users if I would change kses so it
>only removed that character from attribute values and not
>from the whole document?
I think checking only the URI of the "href" attribute
would be effective, but not perfect.
There are Internationalized Domain Names (IDN: RFC 3490),
which allow Unicode domain names. They are not popular now
but may become popular in the future.
Note that some other attributes are often written in
Japanese (<img alt="... etc.).
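
One way to narrow the check further, so that neither alt text nor a non-ASCII host name is touched, is to clean only the scheme part of a URL value. This is just a sketch under my own assumptions, not how kses handles it; strip_xad_in_scheme() is a hypothetical helper that would be applied to href-like values only.

```php
<?php
// Hypothetical helper (not part of kses): remove byte 0xAD only from the
// scheme of a URL, i.e. the part before the first colon.
function strip_xad_in_scheme(string $url): string
{
    $colon = strpos($url, ':');
    if ($colon === false) {
        return $url; // no scheme at all, e.g. a relative URL
    }
    $prefix = substr($url, 0, $colon);
    if (strpbrk($prefix, '/?#') !== false) {
        return $url; // the colon belongs to a path, query, or fragment
    }
    return str_replace("\xad", '', $prefix) . substr($url, $colon);
}

// The disguised scheme is normalized so a protocol check can reject it.
$js = strip_xad_in_scheme("java\xadscript:alert(31337)");

// EUC-JP bytes after the scheme are left exactly as they were.
$jp = strip_xad_in_scheme("http://example.jp/\xa4\xad");

echo $js, "\n";          // javascript:alert(31337)
echo bin2hex($jp), "\n";
```

Values of attributes such as alt would simply never be passed to this function, so Japanese text in them stays intact.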
> Are there any other problems with using kses in the
> Japanese language?
I use kses only through Geeklog (http://www.geeklog.net), and it
works nicely.
Logged In: YES
user_id=573278
Fixed in 0.2.2. It will be fixed in a better way in the next
version.