JTidy goes into an infinite loop on a specific input document:
http://www.takeovers.govt.nz/enforcement/decisions/2004/meeting-wrightson.php
When we call tidy.parse(), the stack trace ends in many calls to Node.checkNodeIntegrity()
and the CPU is pegged at 100%.
We're using the latest version of JTidy (r938). I've attached a copy of the input document
which triggers the behaviour.
Hopefully it's not too difficult to fix :)
Many thanks,
Copy of input which causes infinite loop
And here's a more minimal document which exhibits the problem:
Sorry - there was a typo in the version I gave to Francis there... Make that...
(i.e. the extra opening html tag is not required, and the close html can be present)
From a quick investigation, the problem seems to be that the parser is producing a cycle of br tags (with A followed by B and B followed by A) below the dd tag.
e.g.
[Node type=RootNode,element=null,content=
[Node type=StartTag,element=html,content=
[Node type=StartTag,element=head,content=
[Node type=StartTag,element=title,content=null]],
[Node type=StartTag,element=body,content=
[Node type=TextNode,element=null,text="",content=null],
[Node type=StartTag,element=dl,content=
[Node type=StartTag,element=dd,content=
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
[Node type=StartTag,element=br,content=null],
...
Though not a proper fix, this patch will detect the cycle and throw a RuntimeException (and will also limit the loop in toString to help see what's happening as above).
Index: src/main/java/org/w3c/tidy/Node.java
--- src/main/java/org/w3c/tidy/Node.java (revision 1261)
+++ src/main/java/org/w3c/tidy/Node.java (working copy)
@@ -1311,7 +1311,11 @@
+
if (child.parent != this || !child.checkNodeIntegrity())
{
return false;
}
@@ -1347,8 +1351,15 @@
String s = "";
Node n = this;
int loopLimit = 1024;
while (n != null)
{
s += "[Node type=";
s += NODETYPE_STRING[n.type];
s += ",element=";
Last edit: Anonymous 2014-01-22
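For anyone wanting to reproduce the idea outside JTidy, here is a minimal sketch of detecting the kind of sibling cycle described above (A followed by B, B followed by A). It does not use JTidy's Node class: SiblingCycleDemo, SimpleNode, its next field and hasSiblingCycle are all hypothetical names made up for illustration, and the check uses Floyd's tortoise-and-hare walk rather than the loop limit used in the patch.

```java
// Illustration only: SimpleNode is a hypothetical stand-in,
// NOT JTidy's org.w3c.tidy.Node.
final class SiblingCycleDemo {

    static final class SimpleNode {
        final String element;
        SimpleNode next; // next sibling, or null at the end of the list

        SimpleNode(String element) {
            this.element = element;
        }
    }

    /**
     * Returns true if following the next links from first ever revisits a node.
     * Uses Floyd's tortoise-and-hare: one reference advances one step at a time,
     * the other two steps; they can only meet if the list loops back on itself.
     */
    static boolean hasSiblingCycle(SimpleNode first) {
        SimpleNode slow = first;
        SimpleNode fast = first;
        while (fast != null && fast.next != null) {
            slow = slow.next;
            fast = fast.next.next;
            if (slow == fast) {
                return true;
            }
        }
        return false; // reached the end of the list, so no cycle
    }

    public static void main(String[] args) {
        SimpleNode a = new SimpleNode("br");
        SimpleNode b = new SimpleNode("br");

        a.next = b; // a -> b -> null: a well-formed sibling list
        System.out.println("cycle before: " + hasSiblingCycle(a));

        b.next = a; // a -> b -> a -> ...: the shape reported in this ticket
        System.out.println("cycle after: " + hasSiblingCycle(a));
    }
}
```

A check like this needs no fixed loop limit and no extra memory, so it could in principle be run before any code that walks a sibling list, though whether that suits JTidy's own integrity check is an open question.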