Bilal Taha - 2024-12-30

I have a Java method that uses HtmlCleaner version 2.4. Because of high-severity CVE vulnerabilities, I need to update HtmlCleaner to version 2.29. However, the HTML generated by version 2.29 nests sibling elements inside one another, which is incorrect for my use case. For example, given the input:

<aggregationResponse>
  <vanguardAssets>323328.33</vanguardAssets>
  <totalBalanceAsOfDate>2024-12-20T16:00:00.000-05:00</totalBalanceAsOfDate>
  <account>
    <accountName>Stam- 1234</accountName>
    <accountType>TRADITIONAL_IRA</accountType>
  </account>
</aggregationResponse>

The output using version 2.4 is:

<html>
<head>
  <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
  <aggregationresponse>
    <vanguardassets>323328.33</vanguardassets>
    <totalbalanceasofdate>2024-12-20T16:00:00.000-05:00</totalbalanceasofdate>
    <account>
      <accountname>Stam- 1234</accountname>
      <accounttype>TRADITIONAL_IRA</accounttype>
    </account>
  </aggregationresponse>
</body>
</html>

The output using version 2.29, however, has incorrectly nested tags:

<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
    <aggregationresponse>
        <vanguardassets>323328.33
            <totalbalanceasofdate>2024-12-20T16:00:00.000-05:00
                <account>
                    <accountname>Stam- 1234
                        <accounttype>TRADITIONAL_IRA</accounttype>
                    </accountname>
                </account>
            </totalbalanceasofdate>
        </vanguardassets>
    </aggregationresponse>
</body>
</html>

This is my code:

import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

public static String setSourceTidyTag1(String htmlDoc) {
    // Parse the input with HtmlCleaner's default properties.
    TagNode tagNode = new HtmlCleaner().clean(htmlDoc);
    StringWriter writer = new StringWriter();
    try {
        // Convert the TagNode tree to a W3C DOM, with Unicode character recognition disabled.
        CleanerProperties cleanerProps = new CleanerProperties();
        cleanerProps.setRecognizeUnicodeChars(false);
        org.w3c.dom.Document doc = new DomSerializer(cleanerProps).createDOM(tagNode);
        // Serialize the DOM to a string with the default (identity) transformer.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.transform(new DOMSource(doc), new StreamResult(writer));
    } catch (Exception e) {
        // All errors are silently ignored; the method then returns whatever was written so far.
    }
    return writer.toString();
}

How can I make the same code return the same output with version 2.29 as it does with 2.4?

This behavior started with version 2.19.
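To check which versions are affected, the nesting depth of the generated body fragment can be compared. Below is a minimal JDK-only sketch of such a check. It assumes the content inside <body> has already been extracted as a well-formed fragment (the full page is not well-formed XML because of the unclosed meta tag). The class name NestingCheck and the method maxDepth are hypothetical names for illustration and are not part of HtmlCleaner.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: measures how deeply elements are nested in an XML fragment.
class NestingCheck {

    /** Returns the deepest element nesting level of a well-formed fragment (root = 1). */
    public static int maxDepth(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return depth(doc.getDocumentElement());
    }

    // Recursively finds the deepest chain of child elements below the given element.
    private static int depth(Element element) {
        int deepestChild = 0;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                deepestChild = Math.max(deepestChild, depth((Element) child));
            }
        }
        return 1 + deepestChild;
    }

    public static void main(String[] args) throws Exception {
        // Body fragment as produced by version 2.4 (siblings stay siblings).
        String flat = "<aggregationresponse>"
                + "<vanguardassets>323328.33</vanguardassets>"
                + "<totalbalanceasofdate>2024-12-20T16:00:00.000-05:00</totalbalanceasofdate>"
                + "<account><accountname>Stam- 1234</accountname>"
                + "<accounttype>TRADITIONAL_IRA</accounttype></account>"
                + "</aggregationresponse>";
        // Body fragment as produced by version 2.29 (siblings nested inside each other).
        String nested = "<aggregationresponse>"
                + "<vanguardassets>323328.33"
                + "<totalbalanceasofdate>2024-12-20T16:00:00.000-05:00"
                + "<account><accountname>Stam- 1234"
                + "<accounttype>TRADITIONAL_IRA</accounttype>"
                + "</accountname></account>"
                + "</totalbalanceasofdate></vanguardassets>"
                + "</aggregationresponse>";
        System.out.println(maxDepth(flat));   // prints 3
        System.out.println(maxDepth(nested)); // prints 6
    }
}
```

A depth of 3 matches the structure of my input; anything larger means the cleaner has nested sibling elements.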

I have also posted this question on Stack Overflow:
https://stackoverflow.com/questions/79305174/resolving-nested-tags-issue-in-html-output-when-upgrading-htmlcleaner-from-versi

 

Last edit: Bilal Taha 2024-12-30