Translation Memory
Software programs that use translation memories are sometimes known as translation memory managers ('''TMM'''). Translation memories are typically used in conjunction with a dedicated computer-assisted translation (CAT) tool, word processing program, terminology management system, multilingual dictionary, or even raw machine translation output.

A translation memory consists of text segments in a source language and their translations into one or more target languages. These segments can be blocks, paragraphs, sentences, or phrases. Individual words are handled by terminology bases and are not within the domain of TM. Research indicates that many companies producing multilingual documentation are using translation memory systems.

USING TRANSLATION MEMORIES

A translator first supplies a ''source text'' (that is, a text to be translated) to the translation memory. The program then scans the text, looking for segments in its database that it can use to generate a partly translated output text. This text is presented to the translator for review. The translator can accept a suggestion, reject it, or modify it and use the modified version; in the last case, the modified version is saved in the database.

Some translation memory systems attempt only literal matching; that is, they can retrieve only segments that match entries in the database exactly. Others employ ''fuzzy'' matching algorithms to retrieve similar segments, which are presented to the translator with the differences flagged. The flexibility and robustness of the matching algorithm largely determine the performance of the translation memory, although for some applications the recall rate of exact matches can be high enough to justify the literal approach.

Segments for which no match is found have to be translated manually. These new segments are stored in the database, where they can be used for future translations.
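The lookup cycle described above (exact match first, then a fuzzy match, then manual translation that is stored for later) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TranslationMemory` class, its method names, and the 0.75 threshold are invented for this example, and the similarity score comes from Python's standard `difflib.SequenceMatcher`, not from the scoring method of any particular translation memory product.

```python
from difflib import SequenceMatcher


class TranslationMemory:
    """Toy in-memory TM mapping source segments to stored translations."""

    def __init__(self, fuzzy_threshold=0.75):
        self.entries = {}  # source segment -> translation
        # Assumed cut-off for this sketch; real systems choose their own.
        self.fuzzy_threshold = fuzzy_threshold

    def store(self, source, target):
        """Record a translation the translator has accepted or written."""
        self.entries[source] = target

    def lookup(self, source):
        """Return (kind, score, stored_source, translation) for the best match."""
        # Literal matching: the segment was translated before, character for character.
        if source in self.entries:
            return ("exact", 1.0, source, self.entries[source])
        # Fuzzy matching: find the most similar stored segment.
        best_score, best_source = 0.0, None
        for candidate in self.entries:
            score = SequenceMatcher(None, source, candidate).ratio()
            if score > best_score:
                best_score, best_source = score, candidate
        if best_source is not None and best_score >= self.fuzzy_threshold:
            return ("fuzzy", best_score, best_source, self.entries[best_source])
        # No usable match: the translator has to translate the segment manually.
        return ("none", 0.0, None, None)


tm = TranslationMemory()
tm.store("Press the power button.", "Appuyez sur le bouton d'alimentation.")

exact = tm.lookup("Press the power button.")
fuzzy = tm.lookup("Press the reset button.")
missing = tm.lookup("Dispose of the battery safely.")
print(exact[0], fuzzy[0], missing[0])  # prints: exact fuzzy none
```

In this toy run the first query has been translated before, the second differs from a stored segment by one word and is offered as a fuzzy suggestion, and the third finds nothing; once the translator supplies a translation, a call to `store` would make it available for future lookups.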
Translation memories work best on highly repetitive texts, such as technical manuals. They are also helpful for making incremental changes to texts, corresponding, for example, to minor product changes. Traditionally, translation memories have not been considered appropriate for literary or creative texts, for the simple reason that the language used contains so little repetition. However, some translators find them valuable even for non-repetitive texts, because the resulting database can be used for concordance searches to determine the appropriate usage of terms. If a translation memory system is used consistently on appropriate texts over a period of time, it can save translators considerable work.

Main Benefits

Translation memory managers are most suitable for translating technical documentation and documents containing specialized vocabularies. Their benefits include:
Main Obstacles

The main problems hindering wider use of translation memory managers include:
FUNCTIONS OF A TRANSLATION MEMORY (TM)

The following is a summary of the main functions of a translation memory, as described in "Design and function of translation memory".

Off-line functions

Import

This function transfers a text and its translation from a text file into the TM. Import can be done from a ''raw format'', in which an external source text is available for importing into a TM along with its translation; sometimes the texts have to be reprocessed by the user. Import can also be done from the ''native format'', which is the format the TM itself uses to save translation memories in a file.

Analysis

The process of analysis involves the following steps:

; Textual parsing : It is very important to recognize punctuation correctly, for example to distinguish the end of a sentence from an abbreviation. Mark-up therefore acts as a kind of pre-editing. Materials that have been processed through translators' aid programs usually contain mark-up, because the translation stage is embedded in a multilingual document production line. Other special text elements may also be set off by mark-up. Some special elements, such as proper names and codes, do not need to be translated, while others may need to be converted to native format.
; Linguistic parsing : Base form reduction is used to prepare word lists and texts for the automatic retrieval of terms from a term bank. Syntactic parsing, on the other hand, may be used to extract multi-word terms or phraseology from a source text. Parsing is thus used to normalise word order variation in phraseology, that is, to determine which words can form a phrase.
; Segmentation : Its purpose is to choose the most useful translation units. Segmentation is a type of parsing. It is done monolingually using superficial parsing, and alignment is based on it.
If translators correct the segmentation manually, later versions of the document will not find matches against the TM based on the corrected segmentation, because the program will repeat its own errors. Translators usually proceed sentence by sentence, although the translation of one sentence may depend on the translation of the surrounding ones.
; Alignment : The task of defining translation correspondences between source and target texts. There should be feedback from alignment to segmentation, and a good alignment algorithm should be able to correct the initial segmentation.
; Term extraction : It can take a previous dictionary as input. Moreover, when extracting unknown terms, it can use parsing based on text statistics. These statistics are used to estimate the amount of work involved in a translation job, which is very useful for planning and scheduling the work. Translation statistics usually count the words and estimate the amount of repetition in the text.

Export

Export transfers the text from the TM into an external text file. Import and export should be inverses.

Online functions

When translating, one of the main purposes of the TM is to retrieve the most useful correspondences in the memory so that the translator can choose the best one. The TM must show both the source and target text, pointing out the identities and differences.

Retrieval

One or more matching correspondences can be retrieved from the TM.

; Exact match : A match is exact when the current source segment and the stored one match character by character. When translating a sentence, an exact match means the same sentence has been translated before. Exact matches are also called "100% matches".
; In Context Exact (ICE) match : An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and by attributes such as document file name, date, and permissions.
; Fuzzy match : When the match is not exact, it is a "fuzzy" match. Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%. Those figures are not comparable across systems unless the method of scoring is specified.

Updating

A TM is updated with a new translation when it has been accepted by the translator. As always when updating a database, there is the question of what to do with the previous contents of the database. A TM can be modified by changing or deleting entries in it.

Automatic Translation

Translation memories can perform retrieval and substitution automatically, without the help of the translator.

; Automatic retrieval : A TM features automatic retrieval and evaluation of translation correspondences in a translator's workbench.
; Automatic substitution : Exact matches come up when translating new versions of a document. During automatic substitution the translator does not check the translation against the original, so any mistakes in the previous translation will carry over.

Networking

Networking makes it possible for a group of translators to translate a text together efficiently, because the translations entered by one translator are available to the others. Moreover, if translation memories are shared before the final translation, mistakes made by one translator can easily be corrected by the others.

Text Memory

Text memory extends the idea of translation memory: it comprises author memory and translation memory. This concept is the basis of the proposed LISA OSCAR xml:tm standard.

Author Memory

A unique identifier is maintained for each text unit within a document during the authoring cycle. A text unit is a subdivision of the text into an individual sentence, or the text of a document element if no subdivision is possible.
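The idea of keeping a stable identifier per text unit across document revisions can be illustrated with a small sketch. It makes several assumptions: the function names are hypothetical, sentences are split with a naive regular expression, identifiers are random hexadecimal strings, and an unchanged sentence is recognised simply by its text being identical. It does not attempt to reproduce the xml:tm markup itself.

```python
import re
import uuid


def split_into_text_units(text):
    """Naive split after '.', '!' or '?' followed by whitespace (illustrative only)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def assign_ids(text, previous=None):
    """Return a list of (unit_id, sentence) pairs for one revision of a document.

    Sentences whose text is unchanged from `previous` keep their identifier;
    new or edited sentences receive a fresh one.
    """
    # Simplification: if a sentence occurs several times in the previous
    # revision, only one of its identifiers is remembered.
    known = {sentence: unit_id for unit_id, sentence in (previous or [])}
    units = []
    for sentence in split_into_text_units(text):
        # pop() ensures an old identifier is reused at most once.
        unit_id = known.pop(sentence, None) or uuid.uuid4().hex
        units.append((unit_id, sentence))
    return units


v1 = assign_ids("Open the cover. Insert the battery.")
v2 = assign_ids(
    "Open the cover. Insert the new battery. Close the cover.",
    previous=v1,
)

unchanged = [s for uid, s in v2 if uid in {old_id for old_id, _ in v1}]
print(unchanged)  # prints: ['Open the cover.']
```

Only the first sentence keeps its identifier in the second revision; the edited and the added sentence get new ones, which is the information a later translation step would need in order to tell which units still have a valid translation.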
Translation Memory

The unique identifiers are remembered during translation so that the target-language document is 'exactly' aligned at the text unit level. If the source document is subsequently modified, the text units that have not changed can be transferred directly to the new target version of the document without any translator interaction. This introduces the concept of 'exact' or 'perfect' matching to translation memory, which improves on the traditional 'leveraged' matching concept. xml:tm can also be used for much more focused translation memory matching, by providing mechanisms for in-document leveraged and fuzzy matching.

HISTORY OF TRANSLATION MEMORIES

The concept behind translation memories is not recent: university research into the concept began in the late 1970s, and the earliest commercializations became available in the late 1980s, but they became commercially viable only in the late 1990s. Originally, translation memory systems stored aligned source and target sentences in a database, from which they could be recalled during translation. The problem with this 'leveraged' approach is that there is no guarantee that the new source-language sentence comes from the same context as the original database sentence. All 'leveraged' matches therefore require a translator to review the memory match for relevance in the new document. Although cheaper than outright translation, this review still carries a cost.

Recent trends

One important recent innovation is the concept of 'text memory' rather than just translation memory (see "Translating XML Documents with xml:tm"). This is also the basis of the proposed LISA OSCAR xml:tm standard. Text memory within xml:tm comprises 'author memory' and 'translation memory'. Author memory is used to keep track of changes during the authoring cycle.
Translation memory uses the information from author memory to implement much more focused and cost-effective translation memory matching. Although primarily targeted at XML documents, xml:tm can be used on any document that can be converted to XLIFF format.

TRANSLATION MEMORY AND RELATED STANDARDS

TMX

Translation Memory Exchange format. This standard enables the interchange of translation memories between translation suppliers. TMX has been adopted by the translation community as the best way of importing and exporting translation memories. The current version is 1.4b; it allows for the recreation of the original source and target documents from the TMX data.

TBX

Termbase Exchange format. This standard allows for the interchange of terminology data, including detailed lexical information. The framework for TBX is provided by ISO 12620, ISO 12200 and ISO Committee Draft 16642 (known as TMF, or Terminological Markup Framework). ISO 12620 provides an inventory of well-defined "data categories" with standardized names that function as data element types or as predefined values. ISO 12200 (also known as MARTIF) provides the basis for the core structure of TBX. TMF includes a structural metamodel for Terminology Markup Languages in general, regardless of which XML style of representation is used.

SRX

Segmentation Rules Exchange format. SRX is intended to enhance the TMX standard so that translation memory data exchanged between applications can be used more effectively. The ability to specify the segmentation rules that were used in the previous translation increases the leveraging that can be achieved.

GMX

GILT Metrics. GILT stands for Globalization, Internationalization, Localization, and Translation. The GILT Metrics standard comprises three parts: GMX-V for volume metrics, GMX-C for complexity metrics and GMX-Q for quality metrics.
The proposed GILT Metrics standard is tasked with quantifying the workload and quality requirements for any given GILT task.

OLIF

Open Lexicon Interchange Format. OLIF is an open, XML-compliant standard for the exchange of terminological and lexical data. Although originally intended as a means for the exchange of lexical data between proprietary machine translation lexicons, it has evolved into a more general standard for terminology exchange.

XLIFF

XML Localization Interchange File Format. It is intended to provide a single interchange file format that can be understood by any localization provider. XLIFF is the preferred way of exchanging data in XML format in the translation industry.

TransWS

Translation Web Services. TransWS specifies the calls needed to use Web services for the submission and retrieval of files and messages relating to localization projects. It is intended as a detailed framework for automating much of the current localization process through the use of Web services.

xml:tm

xml:tm is a new approach to translation memory based on the concept of text memory, which comprises author memory and translation memory. It is one of the first significant advances in translation memory technology since its inception. xml:tm has been donated to LISA OSCAR by XML INTL.
Desktop translation memory -- Commercial

Desktop translation memory tools are typically what individual translators use to complete translations. They are a specialized tool for translation in the same way that a word processor is a specialized tool for writing.
Centralized translation memory -- Commercial

Centralized translation memory systems store the TM on a central server. They work together with desktop TM tools and can increase TM match rates by 30-60% over the leverage attained by desktop TM alone. They export prebuilt "translation kits", or "t-kits", to desktop TM tools. A t-kit contains the content to be translated, pre-segmented on the central server, and a subset of the TM containing all applicable TM matches. Centralized TM is usually part of a globalization management system (GMS), which may also include a centralized terminology database (or glossary), a workflow engine, cost estimation, and other tools.