WebDjVuTextEd lets you edit the positioned text layer of OCR'ed DjVu documents in a web browser. You can modify the paragraph, line, and word structure; create, delete, and edit text nodes; adjust their container boxes with the mouse; and run a spellchecker. The program does not read DjVu files directly; it requires exported text data and images. The server side is a very simple file save routine; most of the editor is implemented in JavaScript.
THIS WORKS IN CHROME ONLY. (More browser support may come later.)
Please try the Online Demo.
A web server with PHP or ASP.NET is required for server-side saving. Without a web server or your own installation, you can still save the document by copying the DjVu XML data from the browser and pasting it into a file, or by using "Save as..." to save it to your computer.
First, extract the XML from your book and copy it to the "data" directory:
djvutoxml mybook.djvu mybook.xml
To view book pages as a background, you also need to extract the page image files and copy them to, for example, "data/mybook".
ddjvu -format=tif mybook.djvu mybook.tif
Then split the multipage TIFF into individual PNG files (this example assumes 130 pages, indexed 0 to 129). On Linux:
mkdir mybook
for i in {0..129}; do convert "mybook.tif[$i]" mybook/mybook-$i.png; done
On Windows you can use XnView's Tools -> Multipage File -> Extract all into... The "tools" directory contains a Bash shell script that helps with extracting data from DjVu.
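The extraction steps above can also be generated for any page count instead of hardcoding the loop range. The following is only a sketch, not the script shipped in the "tools" directory: the function name build_extract_commands is hypothetical, and it merely assembles the same djvutoxml, ddjvu, mkdir, and convert command lines shown above without running them.

```python
import shlex


def build_extract_commands(book, page_count):
    """Return the command lines that export the XML and one PNG per page.

    `book` is the DjVu file name (e.g. "mybook.djvu") and `page_count`
    is the number of pages; both are supplied by the caller.
    """
    stem = book.rsplit(".", 1)[0]
    tiff = f"{stem}.tif"
    commands = [
        ["djvutoxml", book, f"{stem}.xml"],       # text layer as DjVu XML
        ["ddjvu", "-format=tif", book, tiff],     # all pages as one TIFF
        ["mkdir", "-p", stem],                    # directory for page images
    ]
    for i in range(page_count):
        # ImageMagick selects page i of the multipage TIFF with "[i]".
        commands.append(["convert", f"{tiff}[{i}]", f"{stem}/{stem}-{i}.png"])
    return commands


if __name__ == "__main__":
    for command in build_extract_commands("mybook.djvu", 130):
        print(shlex.join(command))
```

Printing the commands rather than executing them makes it possible to review the list first and paste it into a shell afterwards.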
Point your browser to the installed WebDjVuTextEd DjVu editor, e.g. http://localhost/webdjvutexted/
In the "Load book" form, enter the location of the DjVu XML file and the path to the images (relative to the XML file), then press Load book.
Example: Assuming you use the default data directory for your XML and a subdirectory for images:
data/gozgep_demo.xml
data/gozgep/gozgep-0.png
data/gozgep/gozgep-1.png
...

The image name is a pattern, but if you enter the first page's file name, the pattern should be created automatically. The (last) number in the file name must be a page index, unless a single file is used. (If images start from "0", the pattern will contain "%"; if images start from "1", the pattern will contain "#". If the numbers are padded with zeroes, there will be multiple characters, e.g. %%% or ###. For more information, see [FileOpenSave].)
You can also choose the XML and PNG files from your PC using the file browse buttons. For the images, select multiple files; the same "file pattern" system will be used.
The recommended way to use the editor is to install it on your own web server and let PHP or ASP.NET save pages in the "data" directory.
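To make the pattern rule concrete, here is a minimal sketch of how such a pattern could be derived from the first page's file name and expanded for a later page. This is not the editor's own JavaScript code: the function names derive_pattern and expand_pattern are hypothetical, and the assumption that the number of % or # characters equals the zero-padded width is read from the description above.

```python
import os
import re


def derive_pattern(first_page_name):
    """Replace the last number in the file name with % (0-based) or # (1-based)."""
    stem, ext = os.path.splitext(first_page_name)
    numbers = list(re.finditer(r"\d+", stem))
    if not numbers:
        return first_page_name            # one single file, no page index
    last = numbers[-1]
    digits = last.group()
    if int(digits) == 0:
        marker = "%"
    elif int(digits) == 1:
        marker = "#"
    else:
        return first_page_name            # not a first page; left unchanged here
    # One marker character per digit keeps the zero-padding width.
    return stem[:last.start()] + marker * len(digits) + stem[last.end():] + ext


def expand_pattern(pattern, page_index):
    """Build the image name for a 0-based page index from a pattern."""
    runs = list(re.finditer(r"%+|#+", pattern))
    if not runs:
        return pattern
    run = runs[-1]
    number = page_index if run.group()[0] == "%" else page_index + 1
    padded = str(number).zfill(len(run.group()))
    return pattern[:run.start()] + padded + pattern[run.end():]
```

For instance, derive_pattern("gozgep-0.png") gives "gozgep-%.png", and expand_pattern("page_###.png", 0) gives "page_001.png".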
Every DjVu text layer consists of a structure like this (see the DjVu spec):
PAGECOLUMN
 |-REGION
    |-PARAGRAPH
       |-LINE
          |-WORD
             |-CHARACTER
Any of these may be the "last node" that contains text and the coordinates of that text in the underlying image's coordinate system. A node that has child nodes cannot contain text with coordinates.
Content should appear in this tree in reading order.
In most cases a WORD contains one word in a box. However, some documents store every single character in a separate box (not feasible to edit manually). There may also be documents that contain LINEs only, with the text written directly into each LINE, but WORD is the most common level of separation.
For example, in Document Express, you can choose WORD and CHARACTER level separation.
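The "last node" idea can be illustrated by walking a small XML fragment and collecting only the nodes that have no children. This is a sketch under assumptions: the SAMPLE fragment is hand-written for illustration (it is not output from a real book), the coords attribute and its values are placeholders, and the helper name last_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment: the first LINE is split into WORDs,
# the second LINE holds its text directly.
SAMPLE = """
<PAGECOLUMN>
  <REGION>
    <PARAGRAPH>
      <LINE>
        <WORD coords="100,220,180,200">Happy</WORD>
        <WORD coords="190,220,250,200">DjVu</WORD>
      </LINE>
      <LINE coords="100,250,300,230">editing without words</LINE>
    </PARAGRAPH>
  </REGION>
</PAGECOLUMN>
"""


def last_nodes(root):
    """Return (tag, coords, text) for every node that has no child elements."""
    found = []
    for node in root.iter():              # document order = reading order
        text = (node.text or "").strip()
        if len(node) == 0 and text:
            found.append((node.tag, node.get("coords"), text))
    return found


if __name__ == "__main__":
    for tag, coords, text in last_nodes(ET.fromstring(SAMPLE)):
        print(tag, coords, text)
```

Here the two WORD nodes and the second LINE are reported as last nodes, while the first LINE is only a container.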
You can see and edit this tree structure on the left side of the screen. To modify the tree, use the right-click menu on the tree or on the word boxes. The following options are available.
Note: since the most common separation is WORD, the "last node" referred to below is usually the WORD node, while the container nodes are LINE, PARAGRAPH, etc.
Note that some operations may leave useless empty tree nodes behind. (The editor does not know if you plan to use them.)
There are also several [HotKeys].
On the right side you'll see the page image and the overlaid text boxes with their borders. When you select a box, five drag handles appear. Please keep in mind that, according to the DjVu standard, word boxes should not overlap, and a "line" (which spans the maximum extent of all the words it contains) should not overlap the previous or next line either. This editor does not enforce this. According to my tests, overlapping does not cause problems.
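Since the editor does not enforce the no-overlap rule, a separate check can be useful. The sketch below is not part of the editor: the function names are hypothetical, and boxes are assumed to be (left, top, right, bottom) tuples with left < right and top < bottom, which may differ from the coordinate order used in your XML.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """True if two (left, top, right, bottom) boxes share any area.

    Boxes that only touch along an edge are not counted as overlapping.
    """
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def find_overlaps(boxes):
    """Return index pairs (i, j), i < j, of boxes that overlap."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if boxes_overlap(a, b)
    ]
```

The same function can be applied to word boxes within a line or to the bounding boxes of consecutive lines.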
When you press the "Spellcheck" button, the current page is checked and errors are marked in red. Clicking an error shows suggestions. To remove the spellchecking data from the current page, press the "X" (so you can continue editing without getting word suggestions).
Note that you can change the spell engine and language any time without reloading the book.
See [SpellChecker] for more information about the setup.
When you switch pages, the program sends the whole document to the server to be saved; the server must be able to write to the same XML file that you loaded. So if you reload with F5, you will load your modified file.
When you have finished your updates, you can write the XML data back into the DjVu file:
djvuxmlparser -o mybook.djvu mybook.xml
Happy DjVu editing!
Hungarian users, please look at my http://www.djvu.hu website for more information.
WebDjVuEd
(c) 2014-2015 Ferenc Veres, GPL v3
https://sourceforge.net/projects/webdjvutexted/
jquery & jquery-ui
(c) 2014 The jQuery Foundation, MIT license
http://jquery.com/
jquery-spellchecker
(c) 2012 Richard Willis, MIT license
https://github.com/badsyntax/jquery-spellchecker
jstree.com
(c) 2014 Ivan Bozhanov, MIT license
http://jstree.com/
FileSaver.js
(c) 2013 Eli Grey, MIT license
https://github.com/eligrey/FileSaver.js
shortcut.js
(c) Binny V A, BSD license
http://www.openjs.com/scripts/events/keyboard_shortcuts/
jQuery.scrollTo
(c) 2007-2015 Ariel Flesler, MIT license
http://flesler.blogspot.com/2007/10/jqueryscrollto.html
Wiki: FileOpenSave
Wiki: HotKeys
Wiki: MergeTwoColumns
Wiki: ReleaseNotes
Wiki: SpellChecker
Wiki: SplitToWords