concept

UTF-16

UTF-16 (16-bit Unicode Transformation Format) is a character encoding standard that represents Unicode code points using one or two 16-bit code units. It is widely used in operating systems like Windows and programming languages such as Java and JavaScript for internal string representation. UTF-16 can encode all Unicode characters, with common characters in the Basic Multilingual Plane (BMP) using a single 16-bit unit and supplementary characters using two 16-bit units (surrogate pairs).

Also known as: UTF16, UTF 16, Unicode UTF-16, 16-bit Unicode, UCS-2 (historically related)

🧊Why learn UTF-16?

Developers should learn UTF-16 when working with systems that natively use it, such as Windows APIs, Java, or JavaScript engines, to handle text processing and internationalization correctly. It is essential for applications requiring support for a wide range of languages and emojis, as it efficiently encodes most common characters while accommodating less common ones. Understanding UTF-16 helps avoid encoding errors, especially when dealing with surrogate pairs or interfacing with UTF-8 or other encodings.