Activity for HtmlCleaner

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    Hi Simon, I think it's a good time to talk about the future of the library in general. I'm aware that I don't have a lot of time available to maintain it, and these days I'm doing 99% Python development in my day job, so each time I have to review a patch my first task is to get a Java setup working again! Certainly moving the project over to Github is something that's been on my mind for a while as there are clear advantages there. I think in general I'd be happier if people who depend on HtmlCleaner...

  • Simon Urli Simon Urli posted a comment on discussion Open Discussion

    Hello, I recently started trying to contribute directly to the code of HtmlCleaner to fix a bug I reported, and I discovered that it wasn't that easy to contribute and get feedback on SourceForge. At least, not as easy as it can be on Github (or Gitlab). I'm wondering whether moving the project could help get more contributions and have bugs fixed more quickly. To give a bit of context, I'm one of the core committers of XWiki (https://www.xwiki.org), whose code is available on Github (https://github.com/xwiki)....

  • Michael Hamann Michael Hamann posted a comment on ticket #241

    I forgot to mention: this is with HTML 5 tag definitions; with HTML 4 the example input is kept as-is.

  • Michael Hamann Michael Hamann created ticket #241

    Wrong children of dl incorrectly wrapped in div

  • Mikhail Dvorkin Mikhail Dvorkin created ticket #240

    Behaviour on unknown tags depends on capitalization of letters

  • Simon Urli Simon Urli posted a comment on ticket #239

    I tried to work on that issue, and I think I actually made too many changes. In particular, I saw that XmlSerializer#dontEscape is used both to decide whether the content needs to be escaped and to decide whether CDATA should be added, which is a problem here as we still don't want to escape the content even without a CDATA. I think the same problem might apply to DomSerializer, in which case my code is probably wrong and I might be missing a unit test somewhere.

  • Simon Urli Simon Urli created ticket #239

    CDATA added for any kind of scripts even for application/json ones

  • Simon Urli Simon Urli created ticket #238

    Various tags incorrectly not marked as phrasing content in HTML5

  • Simon Urli Simon Urli posted a comment on ticket #228

    @scottwilson it seems you forgot to close that one: I can see a commit related to it before release 2.28, see https://sourceforge.net/p/htmlcleaner/code/595/

  • Dave Dave posted a comment on ticket #94

    Sure, let me see what I can do.

  • Scott Wilson Scott Wilson modified ticket #161

    PrettyHtmlSerialiser drops trailing space in tags

  • Scott Wilson Scott Wilson posted a comment on ticket #161

    Duplicate of 162.

  • Scott Wilson Scott Wilson modified ticket #174

    Building fails on JDK < 8 because of -Xdoclint:none flag to Maven Javadoc plugin

  • Scott Wilson Scott Wilson posted a comment on ticket #174

    As Java 7 is no longer supported I guess this is no longer so important.

  • Scott Wilson Scott Wilson posted a comment on ticket #94

    It's definitely something I keep coming back to, though usually there's something more pressing! If you wanted to submit a patch upgrading some of the more problematic classes that would be fantastic. I suspect most users will be directly interacting with TagNode the most, or possibly the serialisers, so those would probably be the first to apply updated patterns to.

  • Ralf Purnhagen Ralf Purnhagen posted a comment on discussion Open Discussion

    Thank you for your quick response!

  • Dave Dave posted a comment on ticket #94

    To modify my original suggestion, it's probably best to just return the object wrapped via an appropriate Collections method, e.g., Collections.unmodifiableList.
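
The suggestion above can be sketched as follows. This is a minimal illustration, not HtmlCleaner code: `NodeSketch`, `addChild`, and `getChildren` are hypothetical names, and only `Collections.unmodifiableList` comes from the comment itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a node class; this is NOT HtmlCleaner's TagNode.
public class NodeSketch {
    private final List<String> children = new ArrayList<>();

    public void addChild(String child) {
        children.add(child);
    }

    // Returns a read-only view of the internal list instead of the list itself,
    // so callers can read and iterate but cannot mutate internal state.
    public List<String> getChildren() {
        return Collections.unmodifiableList(children);
    }

    public static void main(String[] args) {
        NodeSketch node = new NodeSketch();
        node.addChild("p");
        List<String> view = node.getChildren();
        try {
            view.add("div"); // mutation through the view is rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("view is read-only");
        }
        node.addChild("span"); // changes made by the owner are visible in the view
        System.out.println(view.size());
    }
}
```

The wrapper is a view rather than a copy, so it is cheap to create, but it still reflects later changes made by the owning object. If callers need a stable snapshot, a defensive copy would be the alternative.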

  • Scott Wilson Scott Wilson modified ticket #237

    customizing javadocExecutable in pom.xml breaks the build

  • Scott Wilson Scott Wilson posted a comment on ticket #237

    Fixed in 2.29

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    This has a fix for CVE-2023-34624: Stack overflow with excessive nested tags. Also a fix for bug 237: customizing javadocExecutable in pom.xml breaks the build. Thanks to niol, PoppingSnack, and Ralf Purnhagen for bug reports and contributions for this release! Note the addition of a maxDepth cleaner property that defines the maximum nested tag depth. The default is 1000.
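
To illustrate the general idea of a nesting limit, here is a rough sketch. It does not reproduce HtmlCleaner's parser or its maxDepth property: `DepthGuardSketch`, `maxNesting`, and the simplified tag regex are all assumptions made for this example. It counts open tags, and refuses input once the depth exceeds a configured maximum instead of descending further.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a nesting-depth guard; not HtmlCleaner's implementation.
public class DepthGuardSketch {
    // Simplified tag matcher: group 1 is "/" for closing tags,
    // group 3 is "/" for self-closing tags. Real HTML needs a real tokenizer.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    // Returns the deepest nesting level seen, or throws once maxDepth is exceeded.
    public static int maxNesting(String html, int maxDepth) {
        int depth = 0;
        int deepest = 0;
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (selfClosing) {
                continue; // self-closing tags do not change the depth
            }
            if (closing) {
                depth = Math.max(0, depth - 1); // tolerate stray closing tags
            } else {
                depth++;
                if (depth > maxDepth) {
                    throw new IllegalStateException(
                            "nesting depth " + depth + " exceeds limit " + maxDepth);
                }
                deepest = Math.max(deepest, depth);
            }
        }
        return deepest;
    }

    public static void main(String[] args) {
        System.out.println(maxNesting("<a><b><c></c></b></a>", 1000));
    }
}
```

Because the check is a counter rather than a recursive descent, the guard itself cannot overflow the stack, and the limit bounds how deep any later recursive processing would have to go.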

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    OK, fixed. There's a new release, 2.29, that implements an arbitrary maximum nesting depth. It's configurable via cleaner properties.

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.29/htmlcleaner-2.29.zip

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.29/htmlcleaner-gui-2.29.zip

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.29/htmlcleaner-src-2.29.zip

  • Scott Wilson Scott Wilson committed [r609] on Code

    [maven-release-plugin] prepare for next development iteration

  • Scott Wilson Scott Wilson committed [r608] on Code

    [maven-release-plugin] copy for tag htmlcleaner-2.29

  • Scott Wilson Scott Wilson committed [r607] on Code

    [maven-release-plugin] prepare release htmlcleaner-2.29

  • Scott Wilson Scott Wilson committed [r606] on Code

    fake commit

  • Scott Wilson Scott Wilson committed [r605] on Code

    fix: Removed JavaDoc extension code - see bug #237. Thanks to niol for the report.

  • Scott Wilson Scott Wilson committed [r604] on Code

    fix: Updated POM version in GUI

  • Scott Wilson Scott Wilson committed [r603] on Code

    fix: Implemented a nesting limit to address CVE-2023-34624

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    Hi Ralf, Slightly puzzling that it points to the Github fork from 10 years ago while referring to v2.28! I'll take a look at it.

  • Ralf Purnhagen Ralf Purnhagen modified a comment on discussion Open Discussion

    A couple of days ago CVE-2023-34624 was published. Do you plan to release a fix for this CVE in an upcoming version of HtmlCleaner? Thanks

  • Ralf Purnhagen Ralf Purnhagen posted a comment on discussion Open Discussion

    A couple of days ago CVE-2023-34624 was published. Do you plan to release a fix for this CVE in an upcoming version of HtmlCleaner? Best regards Ralf

  • Scott Wilson Scott Wilson modified ticket #237

    customizing javadocExecutable in pom.xml breaks the build

  • Scott Wilson Scott Wilson posted a comment on ticket #237

    Thanks - I'll remove it and re-release

  • niol niol created ticket #237

    customizing javadocExecutable in pom.xml breaks the build

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    That was weird, I definitely uploaded them at the time! Oh well, I've redone the files.

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.28/htmlcleaner-gui-2.28.zip

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.28/htmlcleaner-2.28-src.zip

  • niol niol posted a comment on discussion Open Discussion

    Hi, I cannot find the release zip following the links in the download page. Am I missing something?

  • Joseph Andres Jr Joseph Andres Jr posted a comment on ticket #236

    "<b>Hello</b>" becomes " **Hello** " using PrettyHtmlSerialiser

  • Joseph Andres Jr Joseph Andres Jr created ticket #236

    added whitespace when using tags

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    🎉 I just realised it's been nearly 10 years since I took over the reins of HtmlCleaner, making my first release as the 'new' maintainer back in May 2013. The project itself was started by Vladimir Nikic back in 2006! I've not always been as responsive as I'd like to be over the years as I've had a lot of other things going on, and I've more or less given up on the idea of doing a major architecture overhaul and rewrite for 'v3.0', but sometimes slow change is good. I'm also still using HC as a command...

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    April 29, 2023: HtmlCleaner release 2.28
    228 svg incorrectly not marked as phrasing content in HTML5
    229 style-tag should not be allowed in body in HTML5
    230 Div element wrongly filtered out from dl children when using HTML 5
    231 SVG moved after <p> elements
    Thanks to jlacour31, Simon Urli and Michael Hamann for bug reports and contributions for this release!

  • Scott Wilson Scott Wilson committed [r602] on Code

    [maven-release-plugin] prepare for next development iteration

  • Scott Wilson Scott Wilson committed [r601] on Code

    [maven-release-plugin] copy for tag htmlcleaner-2.28

  • Scott Wilson Scott Wilson committed [r600] on Code

    [maven-release-plugin] prepare release htmlcleaner-2.28

  • Scott Wilson Scott Wilson committed [r599] on Code

    doc: completed missing javadoc comments

  • Scott Wilson Scott Wilson committed [r598] on Code

    fix: remove accidental import!

  • Scott Wilson Scott Wilson committed [r597] on Code

    Fix: Added newline to manifest.

  • Scott Wilson Scott Wilson committed [r596] on Code

    Fix: Implemented test for bug 230. Thanks to Simon Urli for the report.

  • Scott Wilson Scott Wilson posted a comment on ticket #230

    Fixed; will be in 2.28 release.

  • Scott Wilson Scott Wilson posted a comment on ticket #231

    Fixed and will be in v2.28 release.

  • Scott Wilson Scott Wilson posted a comment on ticket #228

    228 is now fixed and will be in release version 2.28. Odd coincidence!

  • Scott Wilson Scott Wilson committed [r595] on Code

    Fix: Implemented fix for bug 228, where SVG was not marked as phrasing content. Thanks to Michael Hamann for the report.

  • Scott Wilson Scott Wilson modified ticket #232

    Wrong parsing

  • Scott Wilson Scott Wilson modified ticket #230

    Div element wrongly filtered out from dl children when using HTML 5

  • Scott Wilson Scott Wilson modified ticket #231

    SVG moved after <p> elements

  • Scott Wilson Scott Wilson posted a comment on ticket #229

    Fixed; will be in 2.28 release

  • Scott Wilson Scott Wilson committed [r594] on Code

    Fix: Implemented fix for bug 229, removing STYLE tags from the document body in HTML5 as per WHATWG. Thanks to Michael Hamann for the report.

  • Scott Wilson Scott Wilson committed [r593] on Code

    Fix: Implemented fix for bug 234. Thanks to Michael Hamann for the report and fix.

  • Scott Wilson Scott Wilson modified ticket #21

    Patches to build on openjdk 15

  • Scott Wilson Scott Wilson modified ticket #22

    [Patch] CVE-2021-33813 vulnerability update

  • Scott Wilson Scott Wilson modified ticket #174

    Building fails on JDK < 8 because of -Xdoclint:none flag to Maven Javadoc plugin

  • Scott Wilson Scott Wilson modified ticket #229

    style-tag should not be allowed in body in HTML5

  • Scott Wilson Scott Wilson modified ticket #228

    svg incorrectly not marked as phrasing content in HTML5

  • Scott Wilson Scott Wilson modified ticket #232

    Wrong parsing

  • Scott Wilson Scott Wilson modified ticket #235

    Support for modern JDK?

  • Scott Wilson Scott Wilson posted a comment on ticket #228

    Hi both! Always happy to help. I'll take a look at these next week. If it's just a case of tweaking the tag provider and going through unit tests that shouldn't take much work. I was away from the project for a while (lots happening in the day job) and then found it hard to get back into it again as new versions of Eclipse were making things rather difficult. Since I moved over to IntelliJ I'm feeling a lot more productive, and that I can now dip back into HC and start fixing things again with a...

  • Michael Hamann Michael Hamann posted a comment on ticket #228

    Similar bugs also affect the math, embed, img, data, object, picture, video, iframe, and q tags. All of them are phrasing content in the current HTML standard but not allowed as phrasing content in HtmlCleaner (at least not as children of tags like strong). See https://github.com/xwiki/xwiki-commons/blob/ce23e117d1cd1515250855eab9bcd7226e66a72f/xwiki-commons-core/xwiki-commons-xml/src/main/java/org/xwiki/xml/internal/html/XWikiHTML5TagProvider.java for how we currently modify the HTML 5 tag definitions...

  • Vincent Massol Vincent Massol posted a comment on ticket #228

    Hi @scottwilson. Long time no speak! How are you? We have several issues reported quite a while ago, like this one (also reported at https://sourceforge.net/p/htmlcleaner/bugs/231/) or https://sourceforge.net/p/htmlcleaner/bugs/230/ and are wondering if we could expect some fixes. Anything we could do to help out? Thank you very much, you've always been very helpful to the XWiki project. -Vincent

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    The open and close tags, rather than self-closing tags, are how HTML is typically serialised. If you want to omit the XML tag, use props.setOmitXmlDeclaration(true);

  • Nazarets Danylo Nazarets Danylo posted a comment on discussion Open Discussion

    No, I got something like this: <p> Text for text, <custom><custom/>, <another><another/> 1&gt;0 5&lt;10 </p> But I fixed this using the following code:
        HtmlCleaner cleaner = new HtmlCleaner();
        CleanerProperties props = cleaner.getProperties();
        props.setTreatUnknownTagsAsContent(true);
        props.setNamespacesAware(false);
        props.setOmitHtmlEnvelope(true);
        props.setOmitXmlDeclaration(false);
        TagNode node = cleaner.clean(input);
        String result = new CompactHtmlSerializer(props).getAsString(node);
        return result.replaceAll("<\\?xml.*\\?>",...

  • Scott Wilson Scott Wilson posted a comment on discussion Open Discussion

    Unless I'm missing something (Sourceforge does odd things with formatting) then I think that's a reasonable interpretation of the snippet. I presume what you get is something like this? <p> Text for text, <custom/>, <another/> 1&gt;0 5&lt;10 </p>

  • Nazarets Danylo Nazarets Danylo modified a comment on discussion Open Discussion

    I have input something like this: <p> Text for text, <custom>, <another/> 1>0 5<10 </p> But it replaces the brackets only for "1>0 5<10" and creates <custom> with a closing slash.

  • Nazarets Danylo Nazarets Danylo posted a comment on discussion Open Discussion

    I have input something like this: <p> Text for text, <custom>, <another/> 1>0 5<10 </p> But it replaces the brackets only for "1>0 5<10"

  • Ruoyu Zhong Ruoyu Zhong posted a comment on ticket #235

    Thanks again, @scottwilson! Yes, I can confirm that the build works fine now.

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    That'll teach me to read my own release docs - I'd constructed the release hierarchy incorrectly. I've been meaning to write a proper release script for some time, maybe this will give me the incentive to actually do it. Hopefully you can get a package build from src now.

  • HtmlCleaner HtmlCleaner updated /htmlcleaner/htmlcleaner v2.27/htmlcleaner-2.27-src.zip

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    Hmm, when I do a clean package from the project root all is well, but not from the releases/src generated from it. I'll investigate further.

  • Ruoyu Zhong Ruoyu Zhong posted a comment on ticket #235

    Thanks for the quick response! Unfortunately now it's org.htmlcleaner.CommandLine that's missing. I inspected the jarfile and noticed that no classes are present. Any ideas? I noticed some warnings in the build which may be useful:
    [INFO] Scanning for projects...
    [WARNING] The project net.sourceforge.htmlcleaner:htmlcleaner:bundle:2.27 uses prerequisites which is only intended for maven-plugin projects but not for non maven-plugin projects. For such purposes you should use the maven-enforcer-plugin....

  • Scott Wilson Scott Wilson committed [r592] on Code

    Added location of sonatype repo to release docs so I don't have to keep googling it

  • Scott Wilson Scott Wilson committed [r591] on Code

    Update version of gui

  • Scott Wilson Scott Wilson committed [r590] on Code

    Update license dates

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    OK, give it another try. It's the first time I've created a release for a long time, so I was bound to do something wrong :D

  • HtmlCleaner HtmlCleaner updated /htmlcleaner/htmlcleaner v2.27/htmlcleaner-2.27-src.zip

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    I'll fix that now!

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    Argh, that's entirely my fault. I included the wrong MANIFEST.MF when building the zips

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    That is very strange!

  • Ruoyu Zhong Ruoyu Zhong posted a comment on ticket #235

    Hi @scottwilson, we've encountered a test failure in https://github.com/Homebrew/homebrew-core/pull/126548 . Below is the output:
    ==> /opt/homebrew/Cellar/htmlcleaner/2.27/bin/htmlcleaner src=/private/tmp/htmlcleaner-test-20230324-62611-2muxl1/index.html
    Picked up _JAVA_OPTIONS: -Duser.home=/Users/brew/Library/Caches/Homebrew/java_cache -Djava.io.tmpdir=/private/tmp
    Error: Could not find or load main class org.htmlcleaner.GUI
    Caused by: java.lang.ClassNotFoundException: org.htmlcleaner.GUI
    We are building...

  • Ruoyu Zhong Ruoyu Zhong modified a comment on ticket #235

    Thanks, @scottwilson! I opened https://github.com/Homebrew/homebrew-core/pull/126548 . I'll let you know how it goes.

  • Ruoyu Zhong Ruoyu Zhong posted a comment on ticket #235

    Thanks, @scottwilson! I opened https://github.com/Homebrew/homebrew-core/pull/126548. I'll let you know how it goes.

  • Scott Wilson Scott Wilson posted a comment on ticket #235

    I've updated the POM to target 1.8 and made a release (2.27). Hopefully that solves the problem!

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.27/htmlcleaner-gui-2.27.zip

  • HtmlCleaner HtmlCleaner released /htmlcleaner/htmlcleaner v2.27/htmlcleaner-2.27.zip
