ASCII, or American Standard Code for Information Interchange, is a character encoding standard used to represent text in computers, communications equipment, and other devices that use text. ASCII is one of the most widely used character encoding systems and serves as the foundation for modern text encoding schemes.
Key Features of ASCII
- Character Set: ASCII defines 128 characters (codes 0–127), including:
  - Control Characters (0–31 and 127): Non-printable characters used for text control (e.g., carriage return, line feed, tab, delete).
  - Printable Characters (32–126): The space, letters, digits, punctuation marks, and a few miscellaneous symbols (e.g., A-Z, a-z, 0-9, @, #).
- 7-bit Encoding: ASCII uses a 7-bit binary number to represent each character, allowing for 128 possible characters (2^7 = 128).
- Standardization: ASCII was first published as a standard in 1963 by the American Standards Association (ASA), the predecessor of the American National Standards Institute (ANSI), and it is supported by virtually all computing and communication systems.
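The code ranges and the 7-bit limit described above can be checked with a few lines of Python. This is a minimal sketch, not a reference implementation: the helper name `classify` is hypothetical, and it treats code 127 (DEL) as a control character in addition to codes 0–31.

```python
def classify(code: int) -> str:
    """Return 'control' or 'printable' for a 7-bit ASCII code (hypothetical helper)."""
    if not 0 <= code <= 127:
        raise ValueError("not a 7-bit ASCII code")
    # Codes 32-126 are printable; 0-31 and 127 (DEL) are control characters.
    return "printable" if 32 <= code <= 126 else "control"

# ord() maps a character to its code; chr() maps a code back to a character.
print(ord("A"), chr(97))          # 65 a

# Every ASCII code fits in 7 bits.
print(format(ord("A"), "07b"))    # 1000001

# Count how many of the 2**7 = 128 codes fall into each group.
counts = {"control": 0, "printable": 0}
for code in range(2 ** 7):
    counts[classify(code)] += 1
print(counts)                     # {'control': 33, 'printable': 95}
```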
ASCII Character Table
Because many extensions of the ASCII character table exist, a text can only be interpreted reliably if the character set it was written in is known. However, because the common characters appear in all of these sets, including proprietary ones, misidentifying the character set typically causes no problems when the text is in English. Additionally, many Internet standards use ISO 8859-1, and Microsoft Windows, the most prevalent operating system for personal computers, uses code page 1252, a superset of ISO 8859-1. As a result, unannounced use of ISO 8859-1 is very common, and it is generally assumed unless there is evidence to the contrary.
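The point that plain English text survives a wrong guess about the character set can be illustrated as follows. This is a sketch under assumptions: the helper `decodes_identically` is hypothetical, and the codec names are the aliases Python's standard library uses for ASCII, ISO 8859-1, code page 1252, and UTF-8.

```python
def decodes_identically(raw: bytes, codecs=("ascii", "latin-1", "cp1252", "utf-8")) -> bool:
    """Return True if every codec decodes the bytes to the same string (hypothetical helper)."""
    decoded = {raw.decode(name) for name in codecs}
    return len(decoded) == 1

# English text made only of ASCII characters.
raw = "Hello, World! 123".encode("ascii")

print(raw.isascii())             # True: every byte is in the range 0-127
print(decodes_identically(raw))  # True: all four codecs agree on the result
```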
Here is a simplified version of the ASCII character table showing some common characters and their corresponding codes:
Decimal | Hexadecimal | Binary | Character | Description |
---|---|---|---|---|
0 | 00 | 0000000 | NUL | Null |
9 | 09 | 0001001 | TAB | Horizontal Tab |
10 | 0A | 0001010 | LF | Line Feed |
13 | 0D | 0001101 | CR | Carriage Return |
32 | 20 | 0100000 | (space) | Space |
48 | 30 | 0110000 | 0 | Digit Zero |
65 | 41 | 1000001 | A | Uppercase A |
97 | 61 | 1100001 | a | Lowercase a |
126 | 7E | 1111110 | ~ | Tilde |
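Rows like the ones in this table can be generated rather than typed by hand. The sketch below is illustrative only: `ascii_row` is a hypothetical helper, and it shows non-printable characters and the space through Python's `repr()` instead of the abbreviations (NUL, TAB, LF) used in the table.

```python
def ascii_row(code: int) -> str:
    """Format one table row: decimal | hexadecimal | 7-bit binary | character."""
    ch = chr(code)
    # Show control characters and the space in escaped form so they stay visible.
    shown = ch if ch.isprintable() and ch != " " else repr(ch)
    return f"{code} | {code:02X} | {code:07b} | {shown}"

for code in (0, 9, 10, 13, 32, 48, 65, 97, 126):
    print(ascii_row(code))
```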
Extended ASCII
Extended ASCII refers to extensions of the original ASCII character table. While the basic ASCII table uses 7 bits per character, allowing for 128 unique symbols, an extended ASCII table uses 8 bits, adding 128 more codes (128–255). These additional codes typically hold letters from non-English languages and special characters for drawing graphics. Extensions were developed because the number of symbols required for human languages, mathematics, most programming languages, and software applications greatly exceeds the 95 printable ASCII characters. Since the original ASCII is a 7-bit code and modern PCs handle data in 8-bit bytes, many extensions use the additional 128 codes made available by the eighth bit. However, even these 8-bit extensions are insufficient to cover all languages used in the countries where computers are sold, which led to the creation of local variants.
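The effect of these competing extensions can be seen by decoding the same byte under several code pages. This is a minimal sketch: `decode_with` is a hypothetical helper, and the three codecs (ISO 8859-1, Windows code page 1252, and the original IBM PC code page 437) are chosen only as examples of 8-bit extensions.

```python
def decode_with(raw: bytes, codecs=("latin-1", "cp1252", "cp437")) -> dict:
    """Decode the same bytes under several 8-bit code pages (hypothetical helper)."""
    return {name: raw.decode(name) for name in codecs}

# A byte in the ASCII range (0-127) means the same thing in every code page.
low = decode_with(b"A")
print(low)

# A byte in the extended range (128-255) depends on the code page in use.
high = decode_with(bytes([0x80]))
print(high)
```

The first dictionary holds the same character for every codec, while the second does not, which is why text that uses the upper 128 codes has to travel with information about its character set.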
Modern Alternatives
While ASCII has been fundamental in the history of computing, its limited support for characters and languages beyond English led to the development of more comprehensive encoding standards. The most prominent modern alternative to ASCII is Unicode, which includes several encoding forms such as UTF-8, UTF-16, and UTF-32. These encoding forms address the need for a more inclusive and versatile character set, accommodating virtually all the world’s writing systems.
Unicode
Unicode is a universal character encoding standard designed to support the representation of text for computers in all writing systems. It was developed to overcome the limitations of ASCII and extended ASCII, providing a consistent way to encode multilingual text.
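Unicode includes characters from virtually every writing system in the world. It defines over 143,000 characters (143,859 as of version 13.0, with more added in later versions) and provides a consistent encoding scheme for text data. UTF-8, UTF-16, and UTF-32 are the common Unicode encodings, and they trade off compatibility and size differently: UTF-8 uses one to four bytes per character and encodes the ASCII characters as the same single bytes that ASCII uses, UTF-16 uses two or four bytes per character, and UTF-32 always uses four.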
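The difference between the three encodings is easiest to see by counting bytes per character. The sketch below is illustrative: `encoded_sizes` is a hypothetical helper, and it uses the little-endian codec variants so that Python does not prepend a byte order mark to the output.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in each encoding (hypothetical helper)."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# An ASCII letter, an accented letter, the euro sign, and an emoji.
for ch in ("A", "é", "€", "😀"):
    print(repr(ch), encoded_sizes(ch))

# For ASCII characters, UTF-8 produces exactly the bytes ASCII does.
print("A".encode("utf-8") == "A".encode("ascii"))  # True
```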