This page contains the Autshumato ITE FAQ.
The Autshumato project was initiated by the South African Department of Arts and Culture. Development is carried out by the Centre for Text Technology (CTexT®) at the North-West University (Potchefstroom Campus), in collaboration with the University of Pretoria.
The general aim of this project is the development of open source machine-aided translation tools and resources for South African languages. The term "open source" implies that every application developed in this project is freely available to the general public. This definition also extends to the source code of every application.
The objective of establishing this project as an open source project adheres to the South African National Government's policy and strategy for open source implementation. This policy specifies that all new software developed for government should be based on open standards. Furthermore, government also encourages and supports the use of open content and open standards within South Africa.
The main aims for this project are:
The translation tools developed in this project aim to meet not only the needs of the Department of Arts and Culture, but also those of a wide variety of South African citizens at various levels in a developing Information Society. This project therefore strongly contributes to the more rapid promotion of a culture of multilingualism in South Africa. It also contributes to language pride, and a consciousness of the importance of promoting, preserving and developing minority languages in South Africa. By involving native speakers of the indigenous languages in this project, the shortage of people who are knowledgeable about and trained in ICT is also partially addressed, as it empowers native speakers of local languages to partake in the growing local and global HLT industry.
Autshumato Integrated Translation Environment (ITE) is a free computer-aided translation (CAT) application. It provides a single translation environment that contains translation memory, machine translation and a glossary to facilitate the translation process. The Autshumato ITE is a derived work of the popular open source OmegaT CAT application.
Although Autshumato ITE is specifically developed for the eleven official South African languages, it is in essence language independent, and can be adapted for translation between any language pair. Autshumato ITE is implemented in the Java programming language, supports open file standards and is licensed under the GNU GPL version 2 or later.
OmegaT is a free translation memory application written in Java. It is a tool intended for professional translators. However, it does not translate for you! (Software that does this is called "machine translation", and you will have to look elsewhere for it.) OmegaT has the following features:
Find out more at: http://www.omegat.org
Anybody may download and use the complete software free of charge.
The Autshumato ITE was developed as a truly South African CAT tool. The focus of the software is to make freely available an environment that will make it easier for translators to work in the South African languages and to equip them with resources that will make their job easier and their efforts more effective.
The Autshumato ITE offers a translation environment through the popular OmegaT interface, and includes resources such as glossaries to see suggested translations for frequently used words, and translation memories to show previous translations for phrases. The target documents that are generated with the Autshumato ITE are also formatted according to the source documents so any text in bold or italics, inserted pictures and bulleted or numbered lists will remain as such in the target document. All of these resources are combined in a user-friendly interface, with the possibility of adding additional resources such as spelling checkers and even machine translation tools.
The Autshumato ITE was designed as a resource to aid translators in their work. It is by no means a way of replacing the valuable skills of a translator, but rather a way to save time on the repetition of work. The Autshumato ITE cannot and will never be able to deliver perfect translations on its own, and the translator should always be on the look-out for mistakes or corrections that need to be made to the suggested translations.
The Autshumato ITE is not a machine translation system that translates documents for you automatically; it can, however, connect to such services in order to aid in the translation process (please refer to the Autshumato ITE user manual or contact the developers for additional help in this regard).
The Autshumato ITE cannot automatically share translation memories and glossaries between several users over a network or the Internet. It is up to users to share and distribute these files if so desired.
As the Autshumato ITE is distributed free of charge, it does not contain spelling or grammar checkers. Please read the Autshumato ITE spelling checker download and installation procedure for complete instructions on downloading and installing spelling checkers.
Personal Computer (PC) with at least:
Operating system:
Follow the Download and Installation Procedure to guide you through the application download and installation on your computer.
When you install the Autshumato ITE on Windows, the installer automatically checks that the correct version of Oracle™ Java is installed. If a correct version is not found, it is installed automatically.
Mac and Linux users have to download and install Oracle™ Java manually by navigating to http://java.com/en/download/manual.jsp and selecting the appropriate installer to download.
Follow the Download and Installation Procedure to guide you through the application download and installation on your computer.
Read the Autshumato ITE spelling checker download and installation procedure for complete instructions.
The Autshumato ITE lets you define which languages, language codes, country codes and diacritic characters are available in the application. To add more languages than those currently available in the Autshumato ITE interface, you will have to change the omegat.language.prefs file. Read the Autshumato ITE Reset Available Languages procedure for a basic guide to resetting the available languages. The procedures below provide a more detailed explanation of inserting, removing and altering the available languages.
This file can be found in the installation directory (the location in which you installed the Autshumato ITE).
In the Autshumato ITE installation directory, you will find the "omegat.language.prefs" file. This file contains the languages that are present in the Autshumato ITE and the special characters that can easily be inserted using the Insert menu.
To customise the languages represented in the Autshumato ITE, you need to complete the following steps:
An example of the "omegat.language.prefs" file is given below.
# Here you set the Languages and Locale Codes used in the Autshumato ITE.
# The format of this file is as follows:
# Language Name [tab] Locale Code [tab] Locale Country [tab] Diacritic characters (Comma separated)
# EX: "Afrikaans afr ZA à,á,â,ã,ä,è,é,ê,ë,í,î,ï,ó,ô,ö,ù,ú,û,ü,ý"
# Remember to make a copy before editing this file and save in UTF-8 format.
#
# South African languages, ie the original A-ITE list of langs:
#
Afrikaans AFR ZA à,á,â,ã,ä,è,é,ê,ë,í,î,ï,ó,ô,ö,ù,ú,û,ü,ý
English ENG GB
IsiNdebele NBL ZA
IsiZulu ZUL ZA
IsiXhosa XHO ZA
Sesotho SOT ZA
Siswati SSW ZA
Setswana TSN ZA Š,š
Sepedi NSO ZA Š,š
Tshivenḓa VEN ZA Ḓ,Ḽ,Ṋ,Ṅ,Ṱ,ḓ,ḽ,ṋ,ṅ,ṱ
Xitsonga TSO ZA
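The file format described in the comments above can be read with a few lines of code. The following is a minimal Python sketch, not part of the Autshumato ITE itself; the names LanguageEntry and parse_language_prefs are hypothetical, and the sketch assumes that fields are separated by real tab characters and that the diacritics column may be absent.

```python
from typing import List, NamedTuple


class LanguageEntry(NamedTuple):
    name: str
    locale_code: str
    country: str
    diacritics: List[str]


def parse_language_prefs(text: str) -> List[LanguageEntry]:
    """Parse tab-separated language lines, skipping comments and blank lines."""
    entries = []
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split("\t")
        # The diacritics column is optional (English, for example, has none).
        fields += [""] * (4 - len(fields))
        name, code, country, chars = (field.strip() for field in fields[:4])
        diacritics = [char for char in chars.split(",") if char]
        entries.append(LanguageEntry(name, code, country, diacritics))
    return entries


# Two lines taken from the example file, written with explicit tabs.
sample = (
    "# South African languages\n"
    "English\tENG\tGB\n"
    "Setswana\tTSN\tZA\tŠ,š\n"
)
languages = parse_language_prefs(sample)
```

Running such a check on an edited file before restarting the application is one way to spot a line where spaces were typed instead of tabs, since the affected fields would come back empty.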
To remove a language, simply remove the line containing the language information from the "omegat.language.prefs" file; refer to section 3.1.2.
To add a language, create a new line in the "omegat.language.prefs" file; refer to section 3.1.2.
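As an illustration of adding a language safely, here is a minimal Python sketch that makes a backup copy first and then appends one tab-separated line in UTF-8, as the comments in the file recommend. The function name add_language, the ".bak" backup suffix and the French example entry are assumptions made for this sketch; they are not part of the Autshumato ITE.

```python
import shutil
import tempfile
from pathlib import Path


def add_language(prefs_path, name, locale_code, country, diacritics=()):
    """Back up the prefs file, then append one tab-separated language line."""
    prefs_path = Path(prefs_path)
    # Keep a copy before editing (the ".bak" suffix is a hypothetical choice).
    backup_path = prefs_path.with_name(prefs_path.name + ".bak")
    shutil.copy2(prefs_path, backup_path)

    fields = [name, locale_code, country]
    if diacritics:
        fields.append(",".join(diacritics))
    existing = prefs_path.read_text(encoding="utf-8")
    # Start on a new line if the file does not already end with one.
    separator = "" if existing.endswith("\n") or not existing else "\n"
    with prefs_path.open("a", encoding="utf-8") as handle:
        handle.write(separator + "\t".join(fields) + "\n")
    return backup_path


# Demonstration on a throwaway file rather than a real installation.
demo_dir = Path(tempfile.mkdtemp())
demo_file = demo_dir / "omegat.language.prefs"
demo_file.write_text("English\tENG\tGB\n", encoding="utf-8")
add_language(demo_file, "French", "FRA", "FR", ["é", "è", "ç"])
result = demo_file.read_text(encoding="utf-8")
```

Removing a language is the reverse operation: delete the corresponding line, again after making a backup.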
The newly-entered language will now be available in the application. The special characters will also be available from the Insert menu. Note that even if a language is not in the list, you can always type in its code manually (nr-ZA in the case of isiNdebele).

Go to Insert, choose the language and click on the diacritic or special character that you wish to use:

After clicking on the diacritic character you chose, you should see it inserted in your document.

Yes.
The ITE is language independent; it can translate between any two languages.
If there is an exact match for the current source segment in the translation memory, the ITE will insert the translation automatically. Disable this option in Project -> Properties if you do not want auto-propagation.
The ITE already auto-saves every three minutes and this can be adjusted by selecting Options -> Saving and Output....
To create a project in the Autshumato ITE, select Project -> New....

The "Create a New Project" dialog appears. You can now navigate to the folder in which you would like to create the project and specify a name for the project. Click on Save once you are satisfied with the project name and location.

Useful tip: We recommend creating a central folder on your computer for all your translation work. You can then create sub-folders for each project and organise the additional resources effectively in this central folder.
You should make sure that your project name is short, descriptive and to the point. Unnecessarily long project names will only cause problems.
You can either press [Enter] on the keyboard, or select Options -> Use TAB to Advance to use the [Tab] keyboard key to advance to the next segment. Similarly, [Shift] + [Tab] will activate the previous segment.
Go to Options -> Editing Behaviour.

Ensure that the "The source text" option is selected.

This option makes a copy of the source language in the active segment for easy editing. Changes will only be shown when you move to the next segment. Refer to the Autshumato ITE Altering the Editing Behaviour procedure for more information on the editing behaviour settings.
This error occurs when the Autshumato ITE does not have enough memory assigned to open large translation memories. By default, only 512 MB is assigned to the application, which may cause problems if you use large translation memories. The following error dialog will be shown when the error occurs:

The application will close once you close the Error dialog. Refer to the Autshumato ITE Resolving out of Memory Error procedure on how to resolve the issue.
Yes, you can. By copying the complete translation project from the other person and opening it with the ITE, you will be able to edit any of the translated segments and the translation memories will be updated.
Open the document in an application capable of reading such documents (Microsoft® Office 2007 or later, OpenOffice.org or LibreOffice) and then select Save As in the application. Now save the document as the relevant .docx, .xlsx or .pptx document. The saved document can now be opened in the ITE.
The Autshumato PDF Extractor is a utility application that extracts text from PDF documents with the aim of making it translatable. It is also able to extract the pages of PDF documents as PNG images. It is free to anyone and is licensed under the Apache License Version 2.0.
A copy of the original source file is created and stored in the Source folder of the project directory.
Go to Project -> Project Files and then choose the file you want to translate. The application will automatically open the next source document in the Editor if the last segment of the current document has been reached and you activate the next segment.
Refer to Chapter 15 (Source Segmentation) in the OmegaT user manual.
You can change the size or the font of the text by selecting Options -> Font.
No. Setting the font size in the ITE only applies to the ITE editing environment; it does not affect the font type or size of the translated document.
Go to Edit -> Switch Case To and choose the option that you require. Alternatively, use the [Shift] + [F3] keyboard shortcut to cycle between cases.
Cycle case switches the text between upper case, lower case and title case, moving to the next case each time the option is applied.
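The idea of cycling between cases can be sketched in a few lines of Python. This is only an illustration: the function name cycle_case is hypothetical, and the order used here (lower, then title, then upper) is an assumption, since the order the application follows is not documented on this page.

```python
def cycle_case(text: str) -> str:
    """Return the text in the next case: lower -> Title -> UPPER -> lower."""
    if text.isupper():
        return text.lower()
    if text.islower():
        return text.title()
    # Title-case or mixed-case text moves on to upper case.
    return text.upper()


step_one = cycle_case("south africa")
step_two = cycle_case(step_one)
step_three = cycle_case(step_two)
```

Applying the function three times returns the text to its starting case, which mirrors pressing the shortcut repeatedly.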
CTexT® spelling checkers are only compatible with Microsoft® Office. Download the open source spelling checkers from http://extensions.openoffice.org/ to use with the ITE. Refer to the spelling checker download and installation procedure for more detailed instructions.
At any time, if the target document gives an error about styles.xml, parsing error or error opening, then it is most likely a Tag problem. This means that somewhere in the translation a Tag was not copied correctly. This can be fixed by:
Upon fixing all of the tag problems, the target document will generate correctly.
The following type of error is an indication of a Tag problem:

The following steps will aid in resolving any such issues:
Alternatively
1. If the source text contains formatting tags (ex: <f1>some text to translate</f1>), carefully ensure that all the tags are copied to the translation. Use the Edit -> Insert Next Missing Tag option to aid in the process.
2. Use the Tag Validator (Tools -> Validate Tags) to ensure that the tags were copied correctly.
3. Generate the translated documents (Project -> Create Translated Documents).
4. You should now be able to open the translated document.
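The check described in steps 1 and 2 can be illustrated with a minimal Python sketch that compares the tags in a source segment with those in its translation. This is not the Tag Validator's actual implementation; the function name missing_tags is hypothetical, and the regular expression assumes that tags look like the <f1> and </f1> example above (letters followed by a number).

```python
import re
from collections import Counter

# Assumed tag shape, modelled on the <f1>...</f1> example.
TAG_PATTERN = re.compile(r"</?[a-zA-Z]+\d+/?>")


def missing_tags(source: str, translation: str) -> list:
    """Return tags that occur in the source more often than in the translation."""
    source_tags = Counter(TAG_PATTERN.findall(source))
    target_tags = Counter(TAG_PATTERN.findall(translation))
    # Counter subtraction keeps only tags the translation is short of.
    missing = source_tags - target_tags
    return sorted(missing.elements())


source = "Click <f1>Save</f1> to continue."
translation = "Klik <f1>Stoor om voort te gaan."
problems = missing_tags(source, translation)
```

In this example the closing tag was left out of the translation, so it is reported as missing. A complete validator would also need to check the order and nesting of tags, which this sketch does not attempt.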
You can enable or disable this option by selecting Project -> Properties and checking or clearing the Remove Tags option in the dialog.

Go to Tools -> Validate Tags to open the Tag Validator:

After you have clicked on Validate Tags, the screen below should appear:

Only sentences that contain in-line formatting, i.e. where a single word in the sentence is bold, italic or underlined, will have formatting tags. It is possible for the application to extract the formatting of most sentences and as a result they need not have formatting tags.
The formatting tags are used for in-sentence formatting, i.e. any place in the source document where the application was unable to extract the explicit formatting. To ensure that the source formatting is not lost, it is assigned specific formatting tags. The tags are then used to insert the correct formatting when creating the translated document.
When you remove the tags, you need to check the formatting yourself in the generated document.
If the TransTips are enabled (Options -> TransTips -> Enable TransTips), words in the source text that could be found in the glossaries will be underlined with a blue line. You can simply right click on the underlined word and the possible translations will be shown in the context menu (beneath the Remove translation option). Selecting a word will insert it into the translation.
Alternatively, while translating you can press [Ctrl] + [Spacebar] to open the auto-complete panel which lists the glossary entries. Select an entry using the arrow keys and press [Enter] to select and insert an entry.
Select Edit -> Create Glossary Entry on the main menu to open the Create Glossary Entry dialog. Here you enter the source text, the translation and optionally a comment. The entry will then be placed in your personal glossary file that is available in the project directory under the glossary folder.
Alternatively, you can press [Ctrl] + [Shift] + [G] to create a new glossary entry.
You have to set the option for exact word matching at Options -> TransTips -> Exact Match.
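To illustrate why exact word matching matters, the following Python sketch contrasts whole-word matching with plain substring matching when looking up glossary terms in a source segment. It is an assumption about the general idea only and does not reproduce how TransTips is implemented; the function name find_glossary_terms and the sample glossary are hypothetical.

```python
import re


def find_glossary_terms(segment, glossary, exact=True):
    """Return (term, translation) pairs whose term occurs in the segment."""
    hits = []
    for term, translation in glossary.items():
        if exact:
            # Whole-word match only: "art" does not match inside "party".
            pattern = r"\b" + re.escape(term) + r"\b"
        else:
            pattern = re.escape(term)
        if re.search(pattern, segment, flags=re.IGNORECASE):
            hits.append((term, translation))
    return hits


glossary = {"art": "kuns", "language": "taal"}
segment = "The party promotes language rights."
exact_hits = find_glossary_terms(segment, glossary, exact=True)
loose_hits = find_glossary_terms(segment, glossary, exact=False)
```

With exact matching only "language" is found, whereas substring matching also reports "art" because it occurs inside "party".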
See the OmegaT user manual (screen shot below) on how to create a glossary. If the glossary is built to OmegaT standards, save the glossary in the glossary folder of your project.

Select the appropriate fuzzy match by selecting Edit -> Select Match #1 on the main menu. To insert the selected fuzzy match, select Edit -> Insert Match on the main menu.
Select the appropriate fuzzy match by selecting Edit -> Select Match #1 on the main menu. Select Edit -> Replace with Match on the main menu to replace the current translation with the selected match.
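For readers unfamiliar with the term, a fuzzy match is a previously translated segment that is similar, but not identical, to the current source segment. The Python sketch below ranks translation memory entries by similarity using the standard library's difflib module. This is a stand-in technique chosen for illustration; the application uses its own scoring, and the function name rank_fuzzy_matches and the sample memory are hypothetical.

```python
from difflib import SequenceMatcher


def rank_fuzzy_matches(segment, memory, limit=3):
    """Return up to `limit` (score, source, target) tuples, best match first."""
    scored = [
        (SequenceMatcher(None, segment, source).ratio(), source, target)
        for source, target in memory
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


memory = [
    ("Open the project.", "Maak die projek oop."),
    ("Save the file.", "Stoor die lêer."),
]
ranked = rank_fuzzy_matches("Save the files.", memory)
best_score, best_source, best_target = ranked[0]
```

The entry ranked first corresponds to what the menu calls Match #1; its target text is what would be inserted or used as a replacement.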
A machine translated text can be inserted into the current active segment by selecting Edit -> Replace with Machine Translation on the main menu. Press [Ctrl] + [M] on the keyboard to achieve the same result.
Refer to the OmegaT User's Manual (Appendix C) on how to set up and operate a Team project in which the translation memories, glossaries and translated documents can be shared between several users.
For every new project, a new translation memory is created. It is also possible to copy another translation memory to the new project's tm folder to include work you have previously done on other projects. Refer to the OmegaT user manual Chapter 14 for more information.
Ensure that the option for machine translation is selected. Go to Options -> Machine Translate and choose a machine translation system. If these options are selected, make sure that your computer can connect to the Internet. Be sure to enter the API key required for specific machine translation services; refer to the OmegaT user manual (Chapter 20) for more information.
The Autshumato machine translation systems are currently only available to the Department of Arts and Culture, Government of South Africa. We can, however, design and build customised machine translation systems to cater for your organisation or company. Contact us at authsumato@nwu.ac.za for more information.
We recommend creating a central folder on your computer for all your translation work. You can then create sub-folders for each project and organise the additional resources effectively in this central folder. The steps to find your translated work are as follows:
Open the project you created:

Double click on the target folder:

You will see your translated document as shown below:

Read Chapter 9.2 (Other file formats) in the OmegaT user manual. The document can only be translated to the same format; you need to manually save it to the desired format.
No, it is optional.
You may not use any alphabetic (a to z) or numeric (0 to 9) characters as separators.
The following characters are also not allowed, because they are not allowed in file names:
/ \ : * ? " < > |
Press OK and enter another character or use the default character ('.').
This error informs you that no character has been entered to serve as the separator character. When you press OK, the default separator character ('.') will be restored.
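The separator rules above can be summarised in a short Python sketch. It is an illustration under stated assumptions, not the application's own validation code: the function names separator_error and effective_separator are hypothetical, and both lower-case and upper-case letters are treated as alphabetic characters.

```python
FORBIDDEN_IN_FILE_NAMES = set('/\\:*?"<>|')
DEFAULT_SEPARATOR = "."


def separator_error(candidate: str):
    """Return a reason string if the separator is rejected, otherwise None."""
    if candidate == "":
        return "no character entered"
    for char in candidate:
        if char.isascii() and char.isalnum():
            return "alphabetic and numeric characters are not allowed"
        if char in FORBIDDEN_IN_FILE_NAMES:
            return "character is not allowed in file names"
    return None


def effective_separator(candidate: str) -> str:
    """Fall back to the default separator when nothing was entered."""
    return candidate if candidate else DEFAULT_SEPARATOR


accepted = separator_error("-")
rejected_letter = separator_error("a")
rejected_symbol = separator_error("?")
fallback = effective_separator("")
```

A hyphen or underscore passes the check, while a letter, a digit or any of the listed file-name characters is rejected.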