Internationalization and localization
In the early 1980s, Lotus 1-2-3 had to spend two years separating its program code from its text, a delay that cost it its market leadership in Europe to Microsoft Multiplan. This episode illustrates the stakes involved in software internationalization, a field where early design decisions can dictate the success or failure of a global product. The term internationalization, often abbreviated as i18n, was coined at Digital Equipment Corporation in the 1970s or 1980s to describe the process of designing software that can be adapted to various languages without requiring engineering changes. The abbreviation is a numeronym: the 18 counts the letters between the first i and the last n in internationalization. This shorthand became an industry standard, letting developers refer to a complex discipline in a single word. The discipline grew out of costly mistakes. MicroPro found that using an Austrian translator for the West German market produced documentation that lacked the proper tone, causing executives to question the entire localization strategy. Tandy Corporation, needing French and German translations for the TRS-80 Model 4, ended up with six different versions that disagreed on the gender of computer components, owing to the involvement of a Belgian office and five translators in the US. These early failures highlighted the need for a systematic approach to software design that could accommodate the vast differences in language, culture, and technical requirements across the globe.
The Architecture of Global Adaptation
The engineering behind internationalization requires a fundamental restructuring of how software is built, separating every potentially locale-dependent part into a distinct module. This process, known as splitting code, text, and data, allows each module to rely on a standard library or be independently replaced for each specific locale. The prevailing practice today involves placing text in resource files that are loaded during program execution, creating a system where strings are stored in a format that is relatively easy to translate. These strings, often referred to as messages, are stored in a message catalog, which generally comprises a set of files in a specific localization format and a standard library to handle that format. One software library and format that aids this process is gettext, a tool that helps applications support multiple languages by selecting the relevant language resource file at runtime. However, the challenges extend far beyond simple text translation. Writing direction varies significantly, with most European languages flowing left to right, while Hebrew and Arabic flow right to left, and some Asian languages can be written vertically. Complex text layout is another hurdle, as characters in certain languages change shape depending on their context, requiring profound changes in the software that go beyond the surface level of translation. For instance, OpenOffice.org achieves this level of adaptation through compilation switches, ensuring that the software can handle the intricate nuances of different writing systems. The Common Locale Data Repository by Unicode provides a collection of such differences, used by major operating systems including Microsoft Windows, macOS, and Debian, as well as by major Internet companies like Google and the Wikimedia Foundation. 
The repository's foundation, Unicode itself, is what allows modern systems to represent many different languages with a single character encoding, bridging the gap between diverse scripts and the digital world.
Beyond the technicalities of code and text, internationalization must navigate a labyrinth of conventions that vary from country to country. Different nations have distinct economic conventions, including variations in paper sizes, broadcast television systems, and popular storage media. Telephone number formats, postal address formats, and postal codes differ so drastically that a field length designed for one country may be useless in another. Currency symbols, their positions, and what counts as a reasonable amount (shaped by each currency's inflation history) all require careful handling, which is why internationalized systems often store currency values alongside ISO 4217 codes. Systems of measurement, battery sizes, and voltage and current standards also vary; the United States and Europe differ in most of these cases, while other regions often follow one of those two models. Time zones add another layer of complexity, as products originally designed for a single time zone must be adapted for users across the globe, typically by keeping UTC internally and converting to local time zones only for display. Legal requirements further complicate the process, as regulatory compliance may require customization for a particular jurisdiction or a change to the product as a whole. Privacy law compliance, additional disclaimers on websites or packaging, and different consumer labeling requirements are just some of the legal hurdles that must be cleared. Compliance with export restrictions and regulations on encryption, as well as adherence to Internet censorship regimes or subpoena procedures, can dictate the very existence of a product in a specific market. Government-assigned numbers, such as passport numbers, Social Security numbers, and other national identification numbers, likewise come in different formats that must be supported.
Sensitivity to political issues, such as geographical naming disputes and disputed borders shown on maps, adds further risk: India, for example, proposed a bill that would have made it a crime to fail to show Kashmir and other areas as the government intended. These legal and political constraints are only the tip of the iceberg, as cultural factors such as local holidays, personal name and title conventions, and aesthetics must also be considered to ensure the software is culturally appropriate.
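The keep-UTC-internally convention can be illustrated with Python's standard datetime and zoneinfo modules; the event time below is arbitrary, chosen only to show the conversions.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Store timestamps in UTC internally...
event_utc = datetime(2024, 3, 15, 14, 30, tzinfo=timezone.utc)

# ...and convert to each viewer's local zone only at display time.
tokyo = event_utc.astimezone(ZoneInfo("Asia/Tokyo"))
new_york = event_utc.astimezone(ZoneInfo("America/New_York"))

print(tokyo.isoformat())     # 2024-03-15T23:30:00+09:00
print(new_york.isoformat())  # 2024-03-15T10:30:00-04:00 (EDT, DST in effect)
```

Keeping a single canonical representation means daylight-saving rules and zone boundary changes affect only the presentation layer, never the stored data.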
The Business of Translation
The commercial considerations of internationalization reveal a complex web of costs and benefits that extends far beyond the engineering team. In a commercial setting, the benefit of localization is access to more markets, but the costs involved go well beyond engineering. Business operations must adapt to manage the production, storage, and distribution of multiple discrete localized products, often sold in entirely different currencies, regulatory environments, and tax regimes. Sales, marketing, and technical support must also operate in the new languages to serve customers of the localized products. For relatively small language populations in particular, offering a localized product may never be economically viable. Even where large language populations could justify localization, and a product's internal structure already permits it, a given software developer or publisher may lack the size and sophistication to manage the ancillary functions of operating in multiple locales. Internationalization therefore requires a broader approach that accounts for cultural factors, from adapting business process logic to accommodating individual cultural behaviors. As early as the 1990s, companies such as Bull used machine translation, specifically Systran, on a large scale for all their translation activity, with human translators handling pre-editing to make the input machine-readable and post-editing to ensure quality. This approach highlights the tension between efficiency and accuracy: machine translation is a powerful tool, but it requires human oversight to preserve the tone and cultural appropriateness of the content.
The business process for internationalizing software is a complex project that involves looking at a variety of markets that the product will foreseeably enter, considering details such as field length for street addresses, unique formats for addresses, and the ability to make the postal code field optional to address countries that do not have postal codes. The introduction of new registration flows that adhere to local laws is just one of the many examples that make internationalization a complex project, requiring a team that understands foreign languages and cultures and has some technical background to handle the basic and central stages of the process.
The Pitfalls of Translation
The history of software localization is littered with cautionary tales of translation gone wrong, where well-intentioned efforts to make software more accessible left users confused and frustrated. One example is Microsoft's attempt to keep some keyboard shortcuts meaningful in local languages. In the Italian version of Microsoft Office, some programs mapped CTRL + S to sottolineato (underline), the local counterpart of the English CTRL + U, rather than to the nearly universal Save function. Before the advent of AutoSave, the net result could be a lot of underlined words in an unsaved revision of a document, with users losing work or becoming confused about the software's behavior. The team responsible for Microsoft Excel localization went further and translated the tokens used in formulas, as a byproduct of localizing number and date formats. The formula =SUM(A1:A10), for example, becomes =SOMME(A1:A10) in French or =SUMME(A1:A10) in German. While this makes formulas easier to understand for users with no knowledge of English, it also meant that, before machine translation of web pages, a search for help on the Internet could not count on examples from other languages. Manuals and tutorials required translations of every formula in examples and exercises, creating a barrier to knowledge sharing across languages. These examples highlight the difficulty of maintaining parallel versions of texts throughout the life of a product: if a message displayed to the user is modified, every translated version must change with it. Independent software vendors such as Microsoft may provide reference localization guidelines for developers, but the language used for software localization may differ from the written language, adding yet another layer of complexity to the process.
The challenge is to balance the need for user-friendly interfaces with the technical requirements of the software, ensuring that the localization process does not introduce new errors or confusion.
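The formula-token translation described above amounts to a per-locale lookup table applied to function names. A minimal sketch, covering only a handful of functions; the tables are illustrative, not Excel's actual localization data.

```python
import re

# Illustrative subset of localized function-name tables; real Excel
# localizations cover hundreds of functions per language.
LOCALIZED_FUNCTIONS = {
    "fr": {"SOMME": "SUM", "MOYENNE": "AVERAGE"},
    "de": {"SUMME": "SUM", "MITTELWERT": "AVERAGE"},
}

def to_canonical(formula: str, lang: str) -> str:
    """Map localized function tokens back to canonical English names."""
    table = LOCALIZED_FUNCTIONS.get(lang, {})
    # Match an uppercase run immediately followed by "(" so cell
    # references like A1:A10 are left untouched.
    return re.sub(
        r"[A-ZÄÖÜ]+(?=\()",
        lambda m: table.get(m.group(0), m.group(0)),
        formula,
    )

print(to_canonical("=SOMME(A1:A10)", "fr"))  # =SUM(A1:A10)
print(to_canonical("=SUMME(A1:A10)", "de"))  # =SUM(A1:A10)
```

The inverse table turns canonical formulas back into the user's locale, which is why the same spreadsheet opens correctly in any language version: the file stores canonical tokens, and localization happens only at display time.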
The Volunteer Revolution
The democratization of software localization has revolutionized how software is translated and maintained, with free and open source software often relying on self-localization by end users and volunteers. Once properly internationalized, software can rely on more decentralized models for localization, in which volunteer translation teams organize themselves to support multiple languages. The GNOME project, for example, has volunteer translation teams for over 100 languages, while MediaWiki supports over 500 languages, of which around 100 are mostly complete. This grassroots approach has allowed software to reach a global audience without large corporate resources, enabling communities to take ownership of their digital tools. Specialized technical writers are still needed to construct culturally appropriate phrasing for potentially complicated concepts, along with engineering resources to deploy and test the localization elements. Maintaining quality and consistency across translations remains the central challenge, since the software must serve the diverse needs of a global user base. The decentralized model has also spurred the development of shared tools and standards, such as Unicode's Common Locale Data Repository, which supplies locale data to major operating systems and Internet companies alike.
The volunteer revolution has also highlighted the importance of community engagement, as users are more likely to adopt and support software that is available in their native language and culture. The process of internationalization and localization is a continuous journey, with new challenges and opportunities arising as the digital world evolves and expands to include more languages and cultures.