GSM Alphabet

An SMS carries a payload of 140 octets (1,120 bits). With GSM-7 encoding, each character is represented by a septet (7 bits), so a single message holds up to 160 characters (1,120 / 7 = 160). The ESC character (0x1B) gives access to additional characters in the basic character set extension table.
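The septet arithmetic can be checked with a short packing routine. This is a minimal sketch, not a full encoder: the function name pack_septets is made up for illustration, and the example relies on the fact that lowercase ASCII letters have the same code points in the GSM-7 default alphabet.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, filling each octet from its low bits up."""
    bits = 0    # bit accumulator
    nbits = 0   # number of valid bits currently held in the accumulator
    out = bytearray()
    for s in septets:
        bits |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        # Flush the final partial octet; the unused high bits stay zero.
        out.append(bits & 0xFF)
    return bytes(out)


# Lowercase ASCII letters share their code points with GSM-7,
# so ord() is good enough for this illustration only.
hello = pack_septets(ord(c) for c in "hello")
full = pack_septets([0x61] * 160)   # 160 septets fill the payload exactly
print(hello.hex(), len(full))
```

Five septets (35 bits) fit in five octets, and 160 septets fit in exactly 140 octets with no padding left over.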

Standard SMS Capacity with GSM Characters

A standard SMS can hold up to 160 characters from the GSM 7-bit default alphabet defined in GSM 03.38. This alphabet covers most printable ASCII characters plus some accented characters, such as è and ñ. A message containing any character outside this alphabet must be sent as Unicode, which reduces the limit to 70 characters because each character then takes 16 bits (1,120 / 16 = 70).

A message that exceeds these limits is split into segments and sent as a concatenated SMS. Each segment gives up part of its payload to a concatenation header, so a GSM-7 segment carries 153 characters rather than 160: a message of 161 to 306 characters takes two segments, and a message of up to 459 characters takes three. Unicode segments carry 67 characters each, so a message of 71 to 134 characters takes two segments, and so on.
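The segment arithmetic can be sketched as follows. The per-segment figures of 153 and 67 assume the common 6-octet concatenation header; the names SINGLE_LIMIT, CONCAT_LIMIT and segments_needed are illustrative, not part of any library.

```python
import math

# Capacity in characters (GSM-7 septets or 16-bit units).
SINGLE_LIMIT = {"gsm7": 160, "ucs2": 70}   # message fits in one SMS
CONCAT_LIMIT = {"gsm7": 153, "ucs2": 67}   # per segment, assuming a 6-octet header


def segments_needed(units, encoding):
    """Return how many SMS segments a message of `units` characters needs."""
    if units <= SINGLE_LIMIT[encoding]:
        return 1
    return math.ceil(units / CONCAT_LIMIT[encoding])


for n in (160, 161, 306, 307, 459):
    print(n, segments_needed(n, "gsm7"))
for n in (70, 71, 134, 135):
    print(n, segments_needed(n, "ucs2"))
```

Note that for GSM-7 the count should be taken in septets, not visible characters, since escaped characters occupy two septets each.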

GSM-7 Encoding Details

GSM-7 provides only 128 code points, so languages that need more symbols require national language shift tables or a switch to 16-bit UCS-2 encoding for full support. Certain symbols, such as the circumflex accent (^), live in the extension table and must be preceded by the escape code. Each of these symbols therefore takes up the space of two characters.
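A rough way to see the cost of escaped characters is to count septets directly. The sketch below is an assumption-laden illustration: the two character sets are transcribed from the GSM 03.38 default alphabet and its extension table (the escaped form feed is left out), and the function name gsm7_septets is hypothetical.

```python
# Basic character set (one septet each); ESC itself is not listed.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table (ESC prefix plus one septet, so two septets each).
GSM7_EXTENSION = set("^{}\\[~]|€")


def gsm7_septets(text):
    """Return the septets needed for `text`, or None if it is not GSM-7 encodable."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENSION:
            total += 2   # escape code followed by the extension code
        else:
            return None  # would have to be sent as UCS-2 instead
    return total


print(gsm7_septets("2^8"))    # three characters, four septets
print(gsm7_septets("日本"))   # not encodable in GSM-7
```

A message made up entirely of extension characters would therefore fit only 80 of them in a single SMS.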

UCS-2 and UTF-16 Encoding

UCS-2 encoding supports a much broader range of characters, including Latin and Eastern scripts, but only those within the Basic Multilingual Plane (BMP). Many modern devices, such as iPhones, use UTF-16 instead, because UCS-2 cannot represent characters outside the BMP. UTF-16 encodes those characters, such as emoji, with surrogate pairs, so each one occupies two 16-bit units and counts as two characters toward the message limit.
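The effect of surrogate pairs on message length can be checked by counting 16-bit code units rather than visible characters. This is a small sketch using Python's built-in utf-16-be codec; the helper name utf16_units is made up for illustration.

```python
def utf16_units(text):
    """Count the 16-bit code units `text` occupies when encoded as UTF-16."""
    # "utf-16-be" emits no byte order mark, so every two bytes are one unit.
    return len(text.encode("utf-16-be")) // 2


print(utf16_units("héllo"))                 # BMP only: one unit per character
print(utf16_units("😀"))                    # outside the BMP: a surrogate pair
print("😀".encode("utf-16-be").hex())       # high surrogate, then low surrogate
```

Comparing the result against 70 (or 67 per segment in a concatenated message) gives a closer estimate of how many SMS parts a Unicode message will use than counting characters does.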

Entering a single non-GSM character switches the whole message to UCS-2 and reduces the character limit from 160 to 70, because the entire message is re-encoded to accommodate that one Unicode character.

GSM vs. Unicode

GSM encoding covers basic English and a limited set of other Latin characters at 7 bits each, which allows more characters per SMS. Unicode covers a wide range of languages and special characters, including emoji, but each character takes at least 16 bits, so the message length drops accordingly.

National Language Shift Tables

National language shift tables extend the SMS character set with language-specific characters. An information element in the User Data Header identifies the language table in use: a locking shift table replaces the default alphabet for the whole message, while a single shift table replaces the extension table reached through ESC. The message keeps the efficiency of 7-bit encoding, but the header consumes part of the payload. One shift table leaves room for 155 characters, and using both locking and single shift tables leaves room for 152.
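The 155 and 152 figures follow from the size of the User Data Header. The sketch below assumes the usual header layout (one length octet for the header, then for each information element one identifier octet, one length octet and its data) and assumes each shift table element carries a single data octet naming the language. The function name gsm7_capacity is hypothetical.

```python
def gsm7_capacity(ie_data_lengths):
    """Septets left for text, given the data length in octets of each header element."""
    if not ie_data_lengths:
        return 160  # no User Data Header at all
    # Header length octet, plus identifier + length + data for each element.
    udh_octets = 1 + sum(2 + n for n in ie_data_lengths)
    # The header is rounded up to a whole number of septets.
    udh_septets = (udh_octets * 8 + 6) // 7
    return 160 - udh_septets


print(gsm7_capacity([1]))      # one shift table
print(gsm7_capacity([1, 1]))   # locking and single shift tables together
print(gsm7_capacity([3]))      # concatenation element with three data octets
```

The same calculation reproduces the 153-character segment size of concatenated messages, which suggests that combining shift tables with concatenation shrinks each segment further.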