I used the <programlisting> content hyphenation option in DocBook XSL/FO for my book, and noticed a strange (and serious) problem when the programlisting content is a Unix-like command line.
DocBook XML:
<programlisting>command -S -longeropt -thisisaverylongprogramoption -optionwitharg "ARGUMENT" FILENAME1 FILENAME2</programlisting>
XSL Customization (similar to the one from DocBook XSL manual):
<xsl:param name="hyphenate.verbatim" select="1"/>
<xsl:attribute-set name="monospace.verbatim.properties" use-attribute-sets="verbatim.properties monospace.properties">
<xsl:attribute name="wrap-option">wrap</xsl:attribute>
<xsl:attribute name="hyphenation-character">»</xsl:attribute>
</xsl:attribute-set>
PDF:
command --S --longeropt --thisisaverylongprogramoption --
optionwitharg -"ARGUMENT" FILENAME1 FILENAME2
In the PDF output you can see that:
The desired hyphenation character ("»", U+00BB) is not inserted before the line break.
An extra dash "-" is inserted in front of every word that already has a dash (or a quote) in front of it.
Because the problem appears in DocBook's <programlisting> element, this result is highly undesirable: a reader following the instructions in the formatted book will mistake the extra "-" for part of the command line, which prevents the command from working (or could even cause system damage).
The test DocBook XML files, the generated XSL-FO intermediate files, the output PDF file, a reference HTML rendering, and the build output are attached as longprogramlisting.zip. A counterexample with the same document structure but different <programlisting> content, rendered with the same configuration (and not affected by this problem), is also provided in the same ZIP file.
I'm not sure whether this is DocBook XSL or Apache FOP's fault; please correct me if this report should be filed against FOP instead.
DocBook XML DTD: 4.2
DocBook XSL: 1.79.1
XSLT Processor: XSLTproc 1.1.28
XSL-FO Processor: Apache FOP 2.1
Runtime: Cygwin 1.7.28 32-bit
System: Microsoft Windows XP Professional SP3
Side note: DocBook XSL doesn't seem to explicitly specify UTF-8 as the XSL-FO encoding when inserting the 0xC2 0xAD byte sequence as a soft hyphen. FOP doesn't complain about it, though.
This is an issue with FOP, not DocBook XSL. When hyphenate.verbatim is set, the processor parses the text and inserts soft hyphen characters xAD to provide hyphenation points. However, FOP does not handle soft hyphens properly: it displays them. If you examine the FO output, what looks like a double dash is in fact a dash preceded by a soft hyphen. Unfortunately, FOP converts soft hyphens to hard hyphens in the PDF, making them all visible.
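To check this in the FO output, it can help to make the invisible soft hyphens visible. The following is only a minimal sketch: the function name, the "[SHY]" marker, and the sample string are made up for illustration, and the sample is not copied from the attached .fo file.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, invisible in most editors


def reveal_soft_hyphens(text: str, marker: str = "[SHY]") -> str:
    """Return text with every soft hyphen replaced by a visible marker."""
    return text.replace(SOFT_HYPHEN, marker)


# Hypothetical fragment resembling verbatim text after hyphenate.verbatim
# processing: a no-break space plus a soft hyphen between the words.
sample = "command\u00a0\u00ad-S\u00a0\u00ad-longeropt"

# Each "-" that belongs to an option is now visibly preceded by the marker.
print(reveal_soft_hyphens(sample))
```

Running something like this over the text of the .fo file should show a marker directly in front of every dash that comes out doubled in the PDF.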
More information on hyphenate.verbatim to break long lines is here, including a mention that FOP does not support it:
http://www.sagehill.net/docbookxsl/FittingText.html#BreakLongLines
Regarding the output encoding, UTF-8 is the default encoding for XML, so it does not have to be specified unless you are using a different encoding.
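For reference, the byte sequences mentioned in this thread are simply the UTF-8 encodings of the characters involved. A small sketch to confirm that (the helper name utf8_hex is made up for illustration):

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a character as space-separated hex bytes."""
    return ch.encode("utf-8").hex(" ")


characters = (
    ("NO-BREAK SPACE", "\u00a0"),
    ("SOFT HYPHEN", "\u00ad"),
    ("RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK", "\u00bb"),
)

for name, ch in characters:
    # Prints the code point, its Unicode name, and its UTF-8 bytes.
    print(f"U+{ord(ch):04X} {name}: {utf8_hex(ch)}")
```

So a 0xC2 0xAD pair in the .fo file is a single soft hyphen character, provided the file is read as UTF-8.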
Okay, soft hyphens seem to be a reasonable explanation for the original problem, but is there any information on why the counterexample file (longprogramlisting-eng.xml) worked flawlessly despite using the same formatting option? Or could that case also be considered an inconsistency/bug on FOP's part?
Note: In longprogramlisting-eng.fo, the spaces between words in <programlisting> are also represented using the 0xC2 0xA0 0xC2 0xAD sequence (a no-break space followed by a soft hyphen), but no unwanted dash appears in the rendered PDF file.
Good point, so I did a bit more experimenting with the .fo files. It seems that FOP does not handle soft hyphens properly when they precede punctuation. When I removed the dash characters and the quote characters from the cmd.fo programlisting, the cmd example formatted like your eng example: it generated the xBB character at the end of the line and did not display the soft hyphens. I think you should report this inconsistent behavior to the FOP group. It seems they almost have it working, but not quite.
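One rough way to compare the two .fo files along these lines is to count what follows each soft hyphen. This is a sketch only: the function name, the category labels, and the two sample strings are assumptions for illustration, and it says nothing about how FOP decides what to render.

```python
import unicodedata
from collections import Counter

SOFT_HYPHEN = "\u00ad"


def classify_soft_hyphens(text: str) -> Counter:
    """Count soft hyphens by the kind of character that follows them."""
    counts = Counter()
    for i, ch in enumerate(text):
        if ch != SOFT_HYPHEN:
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        if not following:
            counts["end_of_text"] += 1
        elif unicodedata.category(following).startswith("P"):
            # Unicode punctuation categories (Pd, Po, ...) cover "-" and '"'.
            counts["before_punctuation"] += 1
        else:
            counts["before_other"] += 1
    return counts


# Hypothetical samples standing in for the text of the cmd and eng files.
cmd_like = 'command\u00a0\u00ad-S\u00a0\u00ad"ARGUMENT"'
eng_like = "some\u00a0\u00adlong\u00a0\u00adtext"

print(classify_soft_hyphens(cmd_like))
print(classify_soft_hyphens(eng_like))
```

If the observation above holds, only the file with soft hyphens in the "before_punctuation" group should show the unwanted dashes.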
A bug report has now been filed against FOP 2.1:
https://issues.apache.org/jira/browse/FOP-2655