Automatically sentence-case i18next translations

April 9, 2024

We use i18next to handle our localization requirements. We have written in great detail about how we use the i18next and react-i18next libraries in our applications.

As our translations grew, we realized that instead of adding every combination of text as a separate entry in the translation file, we could reuse most of them by using the i18next interpolation feature.

Interpolation is one of the most used functionalities in i18n. It allows integrating dynamic values into our translations.

{
  "key": "{{what}} is {{how}}"
}

i18next.t("key", { what: "i18next", how: "great" });
// -> "i18next is great"

Problem

As we used interpolation more and more, we started seeing a lot of text with irregular casing. For instance, in one of our apps, we have an Add button on a few pages.

{
  "addMember": "Add a member",
  "addWebsite": "Add a website"
}

Instead of adding each text as an entry in the translation file as shown above, we took a more generic approach and started using interpolation. Our translation files started to look like this.

{
  "add": "Add a {{entity}}",
  "entities": {
    "member": "Member",
    "website": "Website"
  }
}

This is great, but it has a slight problem. The final text formed looked like this.

Add a Member

We can see that "Member" is still capitalized. We needed it to be properly sentence-cased, like this.

Add a member

We first thought we would just add .toLocaleLowerCase() to the dynamic value.

t("add", { entity: t("entities.member").toLocaleLowerCase() });

It worked fine, but developers would often forget to add .toLocaleLowerCase(). It also cluttered our code with repeated .toLocaleLowerCase() calls.

As always, we decided to solve this problem once in our neeto-commons-frontend package.

Solutions we looked at

At first, it seemed like a very simple problem. We thought we could just use the post-processor feature and sentence-case the entire text during post-processing, like this.

const sentenceCaseProcessor = {
  type: "postProcessor",
  name: "sentenceCaseProcessor",
  process: text => {
    // Sentence-case text.
    return (
      text.charAt(0).toLocaleUpperCase() + text.slice(1).toLocaleLowerCase()
    );
  },
};

i18next
  .use(LanguageDetector)
  .use(initReactI18next)
  .use(sentenceCaseProcessor)
  .init({
    resources: resources,
    fallbackLng: "en",
    interpolation: {
      escapeValue: false,
      skipOnVariables: false,
    },
    postProcess: [sentenceCaseProcessor.name],
  });

Voila! From now on, all text will be properly sentence-cased, and we no longer need to add .toLocaleLowerCase(). Great? Not really.

We soon realized that not every text should be sentence-cased. There are a lot of cases where we need to preserve the original casing. Here are some examples.

Your file is larger than 2MB.
Disconnect Google integration?
No results found with your search query "Oliver".
Your Api Key: AJg3c4TcXXXXXXXXX
No internet, NeetoForm is offline.
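
To make the damage concrete, here is a small sketch that applies the same naive sentence-casing logic as the post-processor above to a few of these strings. The helper name naiveSentenceCase and the samples array are ours, introduced only for illustration.

```javascript
// Same logic as the naive post-processor: uppercase the first character,
// lowercase everything else.
const naiveSentenceCase = text =>
  text.charAt(0).toLocaleUpperCase() + text.slice(1).toLocaleLowerCase();

const samples = [
  "Your file is larger than 2MB.",
  "Disconnect Google integration?",
  'No results found with your search query "Oliver".',
  "No internet, NeetoForm is offline.",
];

const naiveResults = samples.map(naiveSentenceCase);
naiveResults.forEach(result => console.log(result));
// -> Your file is larger than 2mb.
// -> Disconnect google integration?
// -> No results found with your search query "oliver".
// -> No internet, neetoform is offline.
```

Units, brand names, and user input all lose their casing.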

These examples clearly show why it's not a simple problem. We require a more targeted and nuanced solution. Upon revisiting the issue, we found that our initial solution of adding .toLocaleLowerCase() does work, but it's a bit verbose.

So we decided to try custom formatters. Instead of adding .toLocaleLowerCase(), we created a custom formatter called lowercase.

i18next.services.formatter.add("lowercase", (value, lng, options) => {
  return value.toLocaleLowerCase();
});

{
  "add": "Add a {{entity, lowercase}}",
  "entities": {
    "member": "Member",
    "website": "Website"
  }
}

This works perfectly, but it doesn't solve the verbosity problem. Instead of adding .toLocaleLowerCase() in JavaScript files, we're now adding it in translation JSON files - essentially just moving the problem to a different place.

We needed a better solution that required minimal effort.

The idea was to lowercase all dynamic values by default and create a formatter to handle exceptions. To achieve this, we combined our previous post-processor with a new formatter. The new formatter, which we called anyCase, flags any dynamic part of the text that needs to be excluded from lowercasing. The post-processor ignores these parts of the text while sentence-casing.

const ANY_CASE_STR = "__ANY_CASE__";
i18next.services.formatter.add("anyCase", (value, lng, options) => {
  return ANY_CASE_STR + value + ANY_CASE_STR;
});

{
  "message": "Your file is larger than {{size, anyCase}}"
}

The post-processor we wrote attempted to identify the parts of the text marked by the anyCase formatter using pattern matching and retain their original casing. However, this approach failed when the same word appeared in both the dynamic and static parts of the text. It ended up lowercasing both occurrences, which is not the output we needed.
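
We have not shown that post-processor here, so the following is only a hypothetical reconstruction of how such a collision can happen, not our actual code. It assumes the processor lowercases dynamic values by searching the rendered text for each interpolation value. The function name lowercaseDynamicValues and the sample template are invented for this sketch.

```javascript
// Escape a string so it can be used literally inside a RegExp.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical approach: lowercase every occurrence of each string
// interpolation value found in the already-rendered text.
const lowercaseDynamicValues = (text, values) =>
  Object.values(values)
    .filter(v => typeof v === "string" && v.length > 0)
    .reduce(
      (result, v) =>
        result.replace(new RegExp(escapeRegExp(v), "g"), () =>
          v.toLocaleLowerCase()
        ),
      text
    );

// Rendered from a hypothetical template:
// "Invite a {{entity}} from the Member directory"
const rendered = "Invite a Member from the Member directory";
const collided = lowercaseDynamicValues(rendered, { entity: "Member" });
console.log(collided);
// -> "Invite a member from the member directory"
```

Once the text is rendered, the static "Member" and the dynamic "Member" are indistinguishable, so any matching done after interpolation cannot tell them apart.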

Final solution

Before we discuss the final solution, note that i18next recently changed how formatters are added. The newer style, which we have been using so far, looks like this.

i18next.services.formatter.add("underscore", (value, lng, options) => {
  return value.replace(/\s+/g, "_");
});

Before this, i18next used a different syntax, which it now calls legacy formatting. It looks like this.

i18next.use(initReactI18next).init({
  resources: resources,
  fallbackLng: "en",
  interpolation: {
    format: (value, format, lng, options) => {
      // All our formatters should go here.
    },
  },
});
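
As a rough sketch of what "all our formatters should go here" can look like, the legacy format function can dispatch on the format name it receives. The formatters map and the two formatter names below are illustrative assumptions, not part of i18next. Only the (value, format, lng, options) signature comes from the legacy API.

```javascript
// Hypothetical registry of named formatters; names are examples only.
const formatters = new Map([
  ["lowercase", value => value.toLocaleLowerCase()],
  ["underscore", value => value.replace(/\s+/g, "_")],
]);

// Same signature as the legacy interpolation.format option.
const format = (value, formatName, lng, options) => {
  const formatter = formatters.get(formatName);
  // Unknown or missing format names, and non-string values, pass through.
  if (!formatter || typeof value !== "string") return value;
  return formatter(value);
};

console.log(format("Member", "lowercase"));
// -> "member"
console.log(format("hello big world", "underscore"));
// -> "hello_big_world"
console.log(format("Member", undefined));
// -> "Member"
```

A function like this would be passed as interpolation.format in the init call above.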

Now back to our original problem.

We need to make sure that formatting is applied only to the dynamic parts of the text. We found that the legacy version of formatting offers an option called alwaysFormat: true, which calls the format function for every interpolated value. One thing to remember is that if we choose to use this flag, the newer style of formatting does not work. That means we need to move all our custom formatters to the legacy format function.

i18next.use(initReactI18next).init({
  resources: resources,
  fallbackLng: "en",
  interpolation: {
    escapeValue: false,
    skipOnVariables: false,
    alwaysFormat: true,
    format: (value, format, lng, options) => {
      // All your formatters should go here.
    },
  },
});

This is not a problem for us, because we already maintain all our custom formatters in one place (the neeto-commons-frontend package). Now the formatter is applied to every dynamic value. This approach also overcame the "identical words in the text" problem that we encountered with the previous version of the formatter. Let's look at our updated formatter.

const LOWERCASED = "__LOWERCASED__";
const ANY_CASE = "anyCase"; // Format name that opts a value out of lowercasing.
const lowerCaseFormatter = (value, format) => {
  if (!value || format === ANY_CASE || typeof value !== "string") {
    return value;
  }
  return LOWERCASED + value.toLocaleLowerCase();
};

To elaborate on the code, the formatter lowercases all dynamic values and prefixes them with __LOWERCASED__. This prefix is necessary because the formatter has no information about where the value will appear in the complete text. With the prefix in place, if the lowercased value turns out to be the first part of the output, we can capitalize it again during post-processing. That is exactly what the post-processor does.

const sentenceCaseProcessor = {
  type: "postProcessor",
  name: "sentenceCaseProcessor",
  process: value => {
    const shouldSentenceCase = value.startsWith(LOWERCASED); // Check if first word is lowercased.
    value = value.replaceAll(LOWERCASED, ""); // Remove all __LOWERCASED__

    return shouldSentenceCase ? sentenceCase(value) : value;
  },
};
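
To trace how the formatter and the post-processor cooperate, here is a self-contained sketch that runs without i18next. The render function is a simplified stand-in we wrote for interpolation with alwaysFormat: true. It is an assumption for illustration and not i18next's implementation. The templates are examples too.

```javascript
const LOWERCASED = "__LOWERCASED__";
const ANY_CASE = "anyCase";

const sentenceCase = value =>
  value.charAt(0).toLocaleUpperCase() + value.slice(1);

const lowerCaseFormatter = (value, format) => {
  if (!value || format === ANY_CASE || typeof value !== "string") return value;
  return LOWERCASED + value.toLocaleLowerCase();
};

// Same logic as the process function of the post-processor.
const postProcess = value => {
  const shouldSentenceCase = value.startsWith(LOWERCASED);
  const stripped = value.replaceAll(LOWERCASED, "");
  return shouldSentenceCase ? sentenceCase(stripped) : stripped;
};

// Simplified stand-in for interpolation: handles only {{name}} and
// {{name, format}}, and calls the formatter for every placeholder.
const render = (template, values) =>
  postProcess(
    template.replace(
      /\{\{\s*(\w+)\s*(?:,\s*(\w+)\s*)?\}\}/g,
      (_, name, format) => String(lowerCaseFormatter(values[name], format))
    )
  );

console.log(render("Add a {{entity}}", { entity: "Member" }));
// -> "Add a member"
console.log(render("{{entity}} added successfully", { entity: "Member" }));
// -> "Member added successfully"
console.log(render("Your file is larger than {{size, anyCase}}", { size: "2MB" }));
// -> "Your file is larger than 2MB"
```

In the second call the intermediate text is "__LOWERCASED__member added successfully", so the prefix at the start tells the post-processor to capitalize the first character after stripping the markers.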

Below is everything put together. If you're interested in a working example, check out this gist.

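import i18next from "i18next";
import LanguageDetector from "i18next-browser-languagedetector";
import { initReactI18next } from "react-i18next";

// `resources` is assumed to be imported from your own translation files.

const LOWERCASED = "__LOWERCASED__";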
const ANY_CASE = "anyCase";

const sentenceCase = value =>
  value.charAt(0).toLocaleUpperCase() + value.slice(1);

const lowerCaseFormatter = (value, format) => {
  if (!value || format === ANY_CASE || typeof value !== "string") {
    return value;
  }
  return LOWERCASED + value.toLocaleLowerCase();
};

const sentenceCaseProcessor = {
  type: "postProcessor",
  name: "sentenceCaseProcessor",
  process: value => {
    const shouldSentenceCase = value.startsWith(LOWERCASED);
    value = value.replaceAll(LOWERCASED, "");

    return shouldSentenceCase ? sentenceCase(value) : value;
  },
};

i18next
  .use(LanguageDetector)
  .use(initReactI18next)
  .use(sentenceCaseProcessor)
  .init({
    resources: resources,
    fallbackLng: "en",
    interpolation: {
      escapeValue: false,
      skipOnVariables: false,
      alwaysFormat: true,
      format: (value, format, lng, options) => {
        // other formatters
        return lowerCaseFormatter(value, format);
      },
    },
    postProcess: [sentenceCaseProcessor.name],
    detection: {
      order: ["querystring", "cookie", "navigator", "path"],
      caches: ["cookie"],
      lookupQuerystring: "lang",
      lookupCookie: "lang",
    },
  });
