The char Data Type

The data types introduced in the second chapter were broadly classified as integer and real. The integer types presented there differ only in the number of bytes their variables typically occupy: short and unsigned short take 2 bytes, int and unsigned int take 4 bytes, and long long and unsigned long long take 8 bytes.

There is also a one-byte integer type: char. The natural question is why char was not presented together with the other integer types. The reason is that char was intended for more than storing integer values in one byte. It also allows us to work with symbols and, further on, it is the basis for working with words.

On this page we go through the rules that must be kept in mind in order to handle char data correctly.

Since char is an integer type stored in one byte, we have the following types:

char	Signed integer type, with values between -2⁷ and 2⁷ - 1, that is, between -128 and 127
unsigned char	Unsigned integer type, with values between 0 and 2⁸ - 1, that is, between 0 and 255

Everything that follows applies to both char and unsigned char (except, obviously, for the range of values a variable can hold, as shown above).

Here are some examples. We consider them in the context of the declarations:

char a, b;
int x;

a = 10;	Correct: we assign an integer constant to an integer variable. Moreover, the value of the constant is within the range of possible values for the variable.
a = 10; b = a * 2;	Correct: these are also integer operations.
x = a + b;	Also correct: the values on the right are added and the result is copied into the variable on the left. The fact that different types are involved (int and char) is not a problem, because we know that such mixed expressions are allowed.

What's so special about the char data type, though?

As mentioned above, char is what lets us work with symbols and words in C/C++. By convention, each symbol has an associated numeric code, called its ASCII code. This code is the numeric value actually stored when the symbol is kept in RAM or on external media. Here are the codes associated with a few symbols:

97 for a
65 for A
48 for 0
32 for space

The other letters of the alphabet have consecutive codes: b has 98, c has 99, B has 66, C has 67, and so on. It is worth noticing and remembering that lowercase letters have larger codes than uppercase letters.

Here are the three special things about char data:

1. Char constants can also be written by putting a symbol between apostrophes. Here are some examples:

We consider the declarations:

char a, b;
int x;

'a' 'C' '&' '2'	Char constants corresponding to the numeric values 97, 67, 38 and 50.
a = 'a';	The variable a gets the value 97.
b = '2';	The variable b is assigned the value 50.
x = 'a' + 2 * 'b';	The variable x gets the value 97 + 2 * 98, which is 293.
a = 99;	Correct: the variable a receives the code of the symbol c.
a = '2' * 2;	Here the variable a receives the value 100. Note the difference between the 2 enclosed in apostrophes, which is written with the syntax specific to char constants and so represents the value 50 (the code of the symbol 2), and the plain 2, which is the number 2.
'ab'	Incorrect. A single symbol is written between the apostrophes. Compilers that accept several symbols typically issue a warning, and the resulting value is implementation-defined.

In conclusion, writing a symbol between apostrophes is another way to specify an integer constant (note that all integer operations are allowed on it).

2. When a char value, explicitly specified as char, is displayed with cout, the symbol whose code is the stored numeric value appears on the screen.

Let's explain more precisely what we mean by saying that the value to be displayed must be indicated exactly as type char: it can be a char variable passed to cout, a constant written as a symbol between apostrophes, an expression preceded by the char conversion operator, or a call to a function that returns a char.

We must keep the following in mind: if a char value appears in an expression alongside other values and operators, the result is automatically converted to int, even though it could fit in one byte. This is how the language works when applying operators: it performs the calculation in the most comprehensive type among the operands, and if that type is int or smaller than int, the calculation is done in int. In these cases the result is of type int, so the numeric value is printed instead of the symbol. Here are some examples.

We consider the declarations:

char a, b;
int x;

a = 'c'; cout << a;	c is printed on the screen, since a char variable was passed to cout.
a = 99; cout << a;	c is printed again (exactly as in the previous example). It does not matter how the constant assigned to the char variable was written. In both cases the variable stores the same thing and cout is given a char variable.
cout << 'c';	c is printed. A constant of type char was passed to cout.
cout << 99;	99 is printed. Even though the value 99 fits in one byte, it is not written according to the special rule for char constants, so the numeric value is printed.
cout << (char)99;	c is printed. Through the conversion operator we specified that the final value of the expression is of type char.
cout << 'a' + 1;	98 is printed. The operator + is applied between the char constant 'a' and 1 and, as explained, in this case the calculation is done in int, so the final result is also of type int.
cout << 'a' + 'b';	The statement is correct, but 195 is displayed. The explanation from the previous example applies.
cout << (int)'0';	48 is displayed. Do not confuse 0 written between apostrophes with the number 0. The code stored for the symbol 0 is 48. Note that we convert the value to int before displaying it.
a = ' '; x = a; cout << x;	32 (the code of the space character) is displayed. Between the two apostrophes in the first assignment there is a single space.

3. When reading with cin, if the parameter is a char variable, what is entered from the keyboard is interpreted as a symbol.

We consider the declaration:

char a, b;

cin >> a; cout << a;	We enter d, then Enter.	d is displayed, since a symbol was read into the char variable.
cin >> a >> b; cout << a << " " << b << " " << a + b;	We enter aA, then Enter.	97 (the code of the symbol a) is stored in variable a and 65 in variable b. It prints a A 162
cin >> a >> b; cout << a << " " << b << " " << a + b;	We enter 1a, then Enter.	49 (the code of the symbol 1) is stored in variable a and 97 in variable b. It prints 1 a 146
cin >> a;	We enter a space, then Enter.	The read does not finish. This is a good moment to point out that reading with cin >> always ignores whitespace characters (space, Enter, tab). That is, it does not treat them as useful characters for the values of its parameter variables. So it keeps waiting until at least one non-whitespace character is entered.

The main conclusion is that char data is stored as numbers and allows all the operators generally applicable to integer data, but when interacting with input/output devices, that is, on reading and writing, it is interpreted as symbols. In addition, there is a special way to write constants of type char (a symbol placed between apostrophes).

Writing char constants for special characters

For the codes from 0 to 31 there are no associated symbols specified in the standard. Characters with these codes generally have other roles than printing a particular symbol on the screen. That is why we cannot use the usual notation for these constants: we simply have nothing to put between the apostrophes as a symbol. There is, however, a convention that allows us to refer to them as constants of type char as well: escape sequences.

An escape sequence is written as several symbols between apostrophes, the first one being \ (backslash). Even though several symbols are written, we should not think of them as several characters. They denote a single character, indicated in a particular way.

Here are the escape sequences for some special characters:

'\n'	ASCII code 10, the Enter (newline) character. Displaying it moves the cursor on the screen to the beginning of the next line. We have already used this character as part of messages. The effect is the same whether we print it as an isolated character or write it inside a string of characters (that is, between quotation marks).
'\t'	ASCII code 9, the tab character. Printing it shows a "longer space" on the screen.
'\b'	ASCII code 8, the backspace character. Printing it moves the cursor one position to the left, that is, under the last symbol printed on the current line. That symbol is not deleted from the screen, but it disappears, being overwritten, on the next display.
'\a'	ASCII code 7, the alarm character. Printing it produces a sound that can be heard from the speakers.

Among these characters, Enter is the one used often. For the others, the effect was presented mostly for fun.

If we run the following code, we get the results shown in the console output below it.

cout << "*" << '\n' << (int)'\n' << "*\n";
cout << "*" << '\t' << (int)'\t' << "*\n";
cout << "*" << '\b' << (int)'\b' << "*\n";
cout << "*" << '\a' << (int)'\a' << "*\n";

*
10*
*         9*
8*
*7*

On each line of code we print a character as a "symbol", followed by its ASCII code, both framed by two asterisks.

Let's explain the output. The Enter character moves the cursor to the next line, so only one asterisk remains on the first line. The second line starts with the code of Enter (10), followed by the closing asterisk from the first line of code.

Each statement ends by printing a line break, so the output of the next statement starts on a new line. There we notice a large gap between the first asterisk and 9: the tab "symbol" was displayed, followed by its code.

The next line of the screen is produced by the third line of code. The opening asterisk was displayed, but printing the backspace "symbol" moved the cursor back under it. Then the backspace code (8) overwrote it. On the last line only 7 appears between the asterisks, because the effect of printing the alarm symbol is the sound produced.

Three other particular escape sequences are used for the apostrophe, quotation mark, and backslash characters. They are written preceded by a backslash.

'\''	The apostrophe character.
'\"'	The quotation mark character.
'\\'	The backslash character.

How do we write, as characters, the others that have codes between 0 and 31? So far we have presented only 4 of these non-displayable characters. The answer comes from the general form of escape sequences: between apostrophes, with a backslash in front, we can write the ASCII code in base 8. In this way we can indicate any character, including those with an associated symbol, as an escape sequence.

Example:

cout << '\141';	Converting the value 141 from base 8 to base 10 gives 97. So the symbol a is printed on the screen, since we have in fact written a constant of type char.
cout << '*' << '\12' << '*';	Two asterisks are printed, each on its own line. It is as if we had written '\n' instead of '\12', because 12 is the base 8 representation of 10 (the ASCII code of Enter).
cout << '\n' + 'a' + '\11';	Obviously, there are only didactic reasons to write something like this. Here it only helps to consolidate what we have learned. We have char constants, specified in various ways, combined with arithmetic operators. So the calculation is done in int, adding the numbers that represent their ASCII codes. The displayed result is 116 (10 + 97 + 9).

Exercises and solved problems:

1. The char variable x stores the code of a lowercase letter. Change x so that it stores the code of the corresponding uppercase letter.

Solution:

x = x - ('a' - 'A');

Instead of 'a' - 'A' we could write 'b' - 'B' or 'p' - 'P', or the difference between the codes of any lowercase and uppercase pair of the same letter. This works because the difference is constant; we also need to know that lowercase letters have larger codes than uppercase ones. The difference is 32, but we do not have to remember this value.

2. The char variable x stores the code of a lowercase letter. Display the next letter in the alphabet (or a message if x already stores the last letter).

Solution:

if (x == 'z')
    cout << "last";
else
    cout << (char)(x + 1);

3. A char value is read from the keyboard. Display the message YES if a lowercase letter of the alphabet was entered, or the message NO if something else was entered.

Solution:

The lowercase letters have ASCII codes that form a contiguous range, so the test reduces to comparing with the two ends of the range. It is not necessary to know the ASCII codes of the first and last letter. It is enough to write the symbols between apostrophes because, as established, the operators apply to the numeric codes anyway.

#include <iostream>
using namespace std;
char x;
int main () {
    cin >> x;
    // x is a lowercase letter if its code lies between those of 'a' and 'z'
    if (x >= 'a' && x <= 'z')
        cout << "YES";
    else
        cout << "NO";
    return 0;
}

4. A char value is read from the keyboard. Display the message "letter" if a letter of the alphabet (uppercase or lowercase) was entered, the message "digit" if a digit symbol was entered, or the message "else" in any other case.

Solution:

#include <iostream>
using namespace std;
char x;
int main () {
    cin >> x;
    // letters occupy two separate code ranges, so both are tested
    if ((x >= 'A' && x <= 'Z') || (x >= 'a' && x <= 'z'))
        cout << "letter";
    else if (x >= '0' && x <= '9')
        cout << "digit";
    else
        cout << "else";
    return 0;
}

Uppercase and lowercase letters occupy two separate code ranges. So the letter test comes down to checking whether the code falls in one of these ranges. Knowing that uppercase letters have the smaller codes, we may be tempted to write (x >= 'A' && x <= 'z'). This is not correct, because the two ranges are not contiguous. There are 26 letters and A has the code 65, so Z has the code 90. The codes from 91 to 96 belong to other characters (97 is the code for a).

We have to be careful with the digit test, because the apostrophes are often forgotten. Without them there is no compile error, but the comparison is not made against what we want (the ASCII codes of the digit symbols).

5. A char value is read from the keyboard. Check whether it represents a lowercase vowel.

Solution:

#include <iostream>
using namespace std;
char x;
int main () {
    cin >> x;
    // the vowel codes are not consecutive, so each one is compared separately
    if (x == 'a' || x == 'e' || x == 'i' || x == 'o' || x == 'u')
        cout << "is a vowel";
    else
        cout << "is not a vowel";
    return 0;
}

The ASCII codes of the vowels are not consecutive, so we can no longer test whether the character belongs to a range. Since there are only 5 vowels, we can write a condition composed of 5 simpler ones connected by ||. If we had to test whether the character read is a consonant, we should not rush to conclude that it is enough to negate the condition above. That would only mean that it is not a vowel. We need to combine, with &&, the negation of the condition above and the condition that the character is a lowercase letter.