Description
This is one of a series of proposals spun out of the meta issue #16.
String Versions
All localization systems have to accommodate string changes as part of the project life cycle. While adding and removing strings is fairly well understood and covered by Fluent through the l10n-id model, string updates are more complicated.
We identified three kinds of change that can happen to a message, referred to below as (1), (2), and (3):
- (trivial) source-locale only change
- (minor) subtle change to tone, punctuation or wording without affecting the meaning
- (major) any other change - meaning, message shape, location in the UI etc.
At the moment, at Mozilla we support (1) and (3). For (2), we usually lean toward (3), and if the change is really minor, we treat it as (1).
Limitations
That model works quite well, but has several limitations:
1) Any change to the message, even if the message does not lose its meaning, invalidates all translations.
That means that an en-US change of tone requires l10n-drivers to decide whether we want to invalidate the work of 100 people just to inform them about an en-US-specific update.
2) If we deem the change small enough, we have no way to inform localizers of an "optional" update.
If we decide to go with (1) for that particular change, we have no way to tell localizers that there is anything to look at.
Solution
Semantic comments create an opportunity to change that and separate out (2) as a soft-fuzzy mode. It would apply only to cases where the string change is subtle enough that existing translations remain valid for production, while still allowing localizers to learn about the update and consider updating their translation.
This fits quite well into the feature scope of #139 because it doesn't affect runtime, and in practice it mostly allows us to separate out (1) from (2).
But I believe that this feature can have a more subtle impact on the Fluent ecosystem by nurturing a culture of thinking about the social contract. Instead of perceiving every change to a string as requiring an ID update, developers would evaluate how their changes affect the social contract with localizers.
In most cases they would still change the ID, but with an understanding of why they are doing it, while at the same time being incentivized to minimize changes to copy in order to preserve the social contract and the work of the 100 localizers.
It is my hope that the latter will also increase the value of the Fluent system by making it better at salvaging useful translations.
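As a rough sketch of how the three change classes could map onto actions under this proposal: the names `ChangeKind`, `Action`, and `plan_change` below are hypothetical and exist only for illustration; they are not part of Fluent or of any existing tooling.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeKind(Enum):
    TRIVIAL = 1  # (1) source-locale only change
    MINOR = 2    # (2) subtle change, meaning preserved
    MAJOR = 3    # (3) any other change


@dataclass(frozen=True)
class Action:
    new_id: bool    # True: the ID changes and all translations are invalidated
    bump_rev: bool  # True: translations stay valid, localizers are notified


def plan_change(kind: ChangeKind) -> Action:
    """Return the (assumed) handling for a given kind of string change."""
    if kind is ChangeKind.MAJOR:
        return Action(new_id=True, bump_rev=False)
    if kind is ChangeKind.MINOR:
        return Action(new_id=False, bump_rev=True)
    # TRIVIAL: nothing for localizers to look at.
    return Action(new_id=False, bump_rev=False)
```

Only the `MINOR` branch is new; the other two correspond to what is already supported today.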
Case study
To illustrate the latter, I'm going to present an example. Two weeks ago we landed this change:
-history-remember-option =
-    .label = Remember my browsing and download history
+history-remember-browser-option =
+    .label = Remember browsing and download history
This change is useful and goes beyond fixing a spelling in the source locale, thus clearly not qualifying for (1). On the other hand, while updating the string will likely be useful for many locales, many others will probably not have to update their translation as a result of this change, since the subject in such a sentence is implicit or already absent.
For example, in Polish the exact translation would be "History of browsing and file downloads", and hence no change is required.
But today, we had to update the ID and, as a result, invalidate all localizations of the message, because otherwise we would have no way to notify those who do want to update their translation about the change. It means that either all 100 localizers update their string in time, or users will see the message in en-US, just because we wanted to flag this string as potentially worth updating.
With semantic comments and string versions, such a change would look like this:
-history-remember-option =
-    .label = Remember my browsing and download history
+# @rev 2
+history-remember-option =
+    .label = Remember browsing and download history
As a result, all 100 localizations of this message would remain valid, but localizers would be notified in their toolchain that their translation of history-remember-option is outdated (implicitly set to @rev 1). They would then have a choice: either just mark it as valid (and apply @rev 2 in their localization), or fine-tune their translation.
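The check a localization tool might run can be sketched as follows. This is an illustration under stated assumptions, not an existing Fluent API: it assumes the `# @rev N` comment sits directly above the message, that a message without one counts as revision 1, and it uses a naive line-based scan instead of a real Fluent parser. The function names and the placeholder translation text are hypothetical.

```python
import re

# Assumed syntax: a "# @rev N" comment directly above a message.
REV_RE = re.compile(r"^#\s*@rev\s+(\d+)\s*$")
# Naive match for the start of a message ("some-id ="); terms are ignored.
ID_RE = re.compile(r"^([a-zA-Z][a-zA-Z0-9_-]*)\s*=")


def message_revisions(ftl_source: str) -> dict[str, int]:
    """Map each message ID to its revision; messages without @rev get 1."""
    revisions = {}
    pending = None  # revision seen in a comment, not yet attached to a message
    for line in ftl_source.splitlines():
        rev_match = REV_RE.match(line)
        if rev_match:
            pending = int(rev_match.group(1))
            continue
        id_match = ID_RE.match(line)
        if id_match:
            revisions[id_match.group(1)] = pending if pending is not None else 1
            pending = None
        elif not line.startswith("#"):
            # A blank or other non-comment line detaches the comment.
            pending = None
    return revisions


def outdated_messages(source: str, translation: str) -> list[str]:
    """IDs whose translation revision is lower than the source revision."""
    src = message_revisions(source)
    loc = message_revisions(translation)
    return sorted(
        msg_id for msg_id, rev in loc.items()
        if msg_id in src and rev < src[msg_id]
    )


EN_US = """\
# @rev 2
history-remember-option =
    .label = Remember browsing and download history
"""

# Placeholder text stands in for a real translation.
LOCALIZED = """\
history-remember-option =
    .label = (existing translation)
"""

print(outdated_messages(EN_US, LOCALIZED))
print(outdated_messages(EN_US, "# @rev 2\n" + LOCALIZED))
```

The first call reports `history-remember-option` as outdated; once the localizer adds `# @rev 2` to their file, the second call reports nothing. In both cases the translation itself stays valid at runtime.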
The end result is that we were able to notify the localizers, preserve the translations, and minimize the friction.
P.S. I think an interesting exercise would be to see how many locales changed the string as a result of this, and for how many the invalidation was just friction in the system.