Gurmukhi Unicode . . .

This page shows you the Gurmukhi Unicode mappings. If you are thinking of creating web pages where you want to use Gurmukhi characters but you cannot save the pages in UTF-8 (ie, you can only save them in ASCII), you can still display the Unicode characters by writing them as numeric codes that use only ASCII characters, as described below.

All of the fonts on this site have the Unicode characters and you can use them all on modern systems.

It is important that you have a font that is capable of displaying the characters that you need, and of displaying them in the right way.

We've seen how to input the Unicode values directly into a page, but there are alternative means that can give you more control if you are prepared to make the effort.

If you are writing your own HTML code, you can write the codes into the page directly. You start off with an ampersand (&) and then a hash (#), followed by the character's Unicode value in decimal and then a semicolon. So, the code for a 'ਪ' would look like '&#2602;'.
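The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `to_decimal_ncr` is made up for this example, and it assumes you want ASCII characters left alone and everything else written as a decimal code.

```python
def to_decimal_ncr(text):
    """Write every non-ASCII character as ampersand, hash, decimal value, semicolon."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)                # plain ASCII is kept as it is
        else:
            parts.append(f"&#{ord(ch)};")   # eg chr(2602) becomes &#2602;
    return "".join(parts)


print(to_decimal_ncr(chr(2602)))  # &#2602;
```

The output is pure ASCII, so it can be pasted into a page that cannot be saved as UTF-8.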

The Gurmukhi block runs from 2560 to 2687 (0A00 to 0A7F in hexadecimal), but there are two other important characters that sit in the Devanagari range. They are for:

  • end-of-sentence (danda &#2404; ।); and,
  • end-of-paragraph (double-danda &#2405; ॥).
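If you ever need a script to decide whether a character belongs to Punjabi text, the two dandas have to be allowed for separately. The sketch below assumes the full Gurmukhi block as Unicode defines it (0A00 to 0A7F); the function name is hypothetical.

```python
GURMUKHI_FIRST = 0x0A00   # 2560
GURMUKHI_LAST = 0x0A7F    # 2687
DANDAS = {2404, 2405}     # danda and double-danda, from the Devanagari range


def is_punjabi_text_char(ch):
    """True for a character in the Gurmukhi block or one of the two dandas."""
    cp = ord(ch)
    return GURMUKHI_FIRST <= cp <= GURMUKHI_LAST or cp in DANDAS


print(is_punjabi_text_char(chr(2602)), is_punjabi_text_char(chr(2404)))
```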

Below is a table of the Unicode values in the Gurmukhi range with the values in decimal . . .

Unicode codes in decimal
      xxx0  xxx1  xxx2  xxx3  xxx4  xxx5  xxx6  xxx7  xxx8  xxx9
262x ਿ
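A decimal table of this kind can be regenerated with a short script. The sketch below is an assumption about the layout (rows of ten, labelled like '260x') rather than the code behind this page, and it relies on Python's built-in `unicodedata` module to leave unassigned positions blank.

```python
import unicodedata


def decimal_rows(first=2560, last=2687):
    """Yield (row label, ten cells) covering the code points first..last."""
    for base in range(first - first % 10, last + 1, 10):
        cells = []
        for cp in range(base, base + 10):
            in_block = first <= cp <= last
            # Category "Cn" means the code point is not assigned to a character.
            assigned = in_block and unicodedata.category(chr(cp)) != "Cn"
            cells.append(chr(cp) if assigned else "")
        yield f"{base // 10}x", cells


print("\t" + "\t".join(f"xxx{d}" for d in range(10)))
for label, cells in decimal_rows():
    print(label + "\t" + "\t".join(cells))
```

Combining marks printed on their own may be drawn on a dotted circle, depending on the font.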

To use them in HTML, you need to have an ampersand followed by a hash mark, then the number and then a semicolon like so: &#2565; which gives ਅ.

Codes are often written in a numbering system called hexadecimal so, instead of denary (base 10), which is used in the table above, hexadecimal uses base 16. In order to represent single values from 10 to 15, the letters 'A' to 'F' are used so, instead of denary's 0 1 2 3 4 5 6 7 8 9, hexadecimal's sequence is 0 1 2 3 4 5 6 7 8 9 A B C D E F. Clearly, the browser needs to know whether it is looking at a denary or a hexadecimal number, so hexadecimal values are written with an 'x' after the hash mark, like so: &#x0A05; which gives ਅ.
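To check that the denary and hexadecimal forms really do name the same character, you can let Python's standard `html` module decode them. The helper name `to_hex_ncr` is invented for this sketch.

```python
import html


def to_hex_ncr(ch):
    """Write one character as ampersand, hash, x, four hex digits, semicolon."""
    return f"&#x{ord(ch):04X};"


print(to_hex_ncr(chr(2565)))  # &#x0A05;

# Both spellings decode to the same character (code point 2565).
same = html.unescape("&#2565;") == html.unescape("&#x0A05;")
print(same)  # True
```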

Unicode codes in hexadecimal
      xxx0  xxx1  xxx2  xxx3  xxx4  xxx5  xxx6  xxx7  xxx8  xxx9  xxxA  xxxB  xxxC  xxxD  xxxE  xxxF
0A3x ਿ
0A7x ੿

Above, you can see the codes for each of the alphabetical characters and the vowel sounds (the vowels are shown with their respective carriers), along with other diacritical marks. To get a 'ਪ', find it in the decimal table and then look at the row ('260x') and the column ('xxx2'). Substitute the 'x' in the row label with the last digit of the column label and you get the numerical value you need: 2602.
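The row-and-column lookup is simple enough to express as code. In this sketch the function name `cell_to_code` and the label formats ('260x', 'xxx2', '0A2x', 'xxxA') are taken from the tables on this page; the `base` parameter is an assumption added so that the same function serves both tables.

```python
def cell_to_code(row, column, base=10):
    """Replace the trailing 'x' of the row label with the column's last digit."""
    digits = row[:-1] + column[-1]
    return int(digits, base)


print(cell_to_code("260x", "xxx2"))           # 2602
print(cell_to_code("0A2x", "xxxA", base=16))  # 2602
```

Both calls land on the same character because 0A2A in hexadecimal is 2602 in denary.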

The table is organised so that it starts off with a few diacritical marks at the beginning of the 256x line and then it gets into the vowel carriers with their respective vowels.

Then, at 2581, it starts line 2 of the alphabet with ਕ and proceeds through the alphabet until it gets to 2600, skips a code and then continues with the sixth line, all the way to 2606 (ਮ). The following line starts with 2607 ਯ and 2608 ਰ, skips one, has 2610 ਲ followed by its paer-bindi form, skips one and then has 2613 ਵ. After that we get the forms of sassaa and haha, and then we go into some other marks.

When we get down to 2649 ਖ਼, we start with some paer-bindi forms and then we get ੜ at 2652. Finally, we get the numbers and some other characters: ੴ (2676) is 'ek onkar', a Sikh symbol meaning 'one god'.

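The reason for these gaps is that the Indian writing systems are laid out in parallel in Unicode, so each letter sits at the same position within its own block. If you want to transliterate Gurmukhi to Devanagari or one of the other Indian writing systems, or vice versa, it is a simple addition or subtraction. Devanagari, for example, starts at 2304, which is 256 below the start of Gurmukhi.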
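As a rough illustration of that addition and subtraction, here is a naive sketch that shifts Gurmukhi characters down into the Devanagari block and back. The function names are made up, and it deliberately ignores the characters that have no counterpart at the same offset (tippee and adhak, for example), which a real transliterator would have to handle separately.

```python
OFFSET = 0x0A00 - 0x0900   # 256: Gurmukhi block start minus Devanagari block start


def gurmukhi_to_devanagari(text):
    """Shift every character in the Gurmukhi block down by 256; leave the rest."""
    return "".join(
        chr(ord(ch) - OFFSET) if 0x0A00 <= ord(ch) <= 0x0A7F else ch
        for ch in text
    )


def devanagari_to_gurmukhi(text):
    """Shift every character in the Devanagari block up by 256; leave the rest."""
    return "".join(
        chr(ord(ch) + OFFSET) if 0x0900 <= ord(ch) <= 0x097F else ch
        for ch in text
    )


shifted = gurmukhi_to_devanagari(chr(2602))
print(shifted, ord(shifted))  # the Devanagari 'pa', 2346
```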

The other marks between 2620 and 2637 need to have a letter to attach themselves to. You can see that they are in their normal pairs and that they are in the same order as the vowels with the carriers at the top of the table. So, here they are...

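Name            Character   Code       In situ   Code
Mukta           ਅ           &#2565;    ਕ         &#2581;
Kannaa          ਆ           &#2566;    ਕਾ        &#2581;&#2622;
Sihari          ਇ           &#2567;    ਕਿ        &#2581;&#2623;
Bihari          ਈ           &#2568;    ਕੀ        &#2581;&#2624;
Aunkard         ਉ           &#2569;    ਕੁ        &#2581;&#2625;
Dulaenkarday    ਊ           &#2570;    ਕੂ        &#2581;&#2626;
Laanv           ਏ           &#2575;    ਕੇ        &#2581;&#2631;
Dulaavaan       ਐ           &#2576;    ਕੈ        &#2581;&#2632;
Hordaa          ਓ           &#2579;    ਕੋ        &#2581;&#2635;
Kanaurdaa       ਔ           &#2580;    ਕੌ        &#2581;&#2636;

Name            Character   Code       In situ   Code
Bindi           ਂ           &#2562;    ਕਾਂ       &#2581;&#2622;&#2562;
Tippee          ੰ           &#2672;    ਕੰ        &#2581;&#2672;
Adhak           ੱ           &#2673;    ਕੱਡ       &#2581;&#2673;&#2593;
Adhak-Bindi     ਁ           &#2561;    ਕਁਡ       &#2581;&#2561;&#2593;
Paer-bindi      ਼           &#2620;    ਕ਼        &#2581;&#2620;
Virama          ੍           &#2637;    ਕ੍        &#2581;&#2637;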
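The 'in situ' codes are simply the letter's code followed by the mark's code. The sketch below assumes that order and uses Python's `unicodedata` module to confirm that the second character really is a combining mark (a category starting with 'M'); the names `SIGNS`, `attach` and `as_codes` are invented for the example.

```python
import unicodedata

KAKKAA = chr(2581)
SIGNS = {"kannaa": 2622, "sihari": 2623, "bihari": 2624,
         "tippee": 2672, "adhak": 2673, "virama": 2637}


def attach(letter, sign_name):
    """Return the letter followed by the named mark."""
    sign = chr(SIGNS[sign_name])
    if not unicodedata.category(sign).startswith("M"):
        raise ValueError(f"{sign_name} is not a combining mark")
    return letter + sign


def as_codes(text):
    """Spell out a string as decimal codes for use in HTML."""
    return "".join(f"&#{ord(ch)};" for ch in text)


print(as_codes(attach(KAKKAA, "kannaa")))  # &#2581;&#2622;
print(as_codes(attach(KAKKAA, "sihari")))  # &#2581;&#2623;
```

Note that the sihari is typed after the letter even though it is drawn to the left of it.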

The bottom two lines of the table above are interesting because:

  • the paer-bindi allows us to generate characters that aren't in the set above. You might see a paer-bindi form of ਵ - ਵ਼ - which is sometimes used to differentiate between 'w' and 'v' in some books, and there are others: in dictionaries, you might see ਕ਼ and some others in words of Arabic origin. Essentially, the paer-bindi is used to accommodate sounds from other languages; and,
  • the virama is borrowed from another script (Devanagari - ्). In Punjabi, there is an implicit 'a' sound after each letter so ਪਰ sounds like 'para'. The virama cancels out the implicit 'a' of the letter it is joined to. However, the only time this happens explicitly in Punjabi is where you have a paer letter such as a paer-rarra. So the clever people who designed the Unicode standard decided that you can form a paer-rarra by sticking a virama between the two letters so, where ਪਰ sounds like 'para', ਪ੍ਰ sounds like 'pra'.
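Both points can be seen at the code level. In this sketch (constant names are made up for the illustration), the first part builds 'para' and 'pra' from their code points, and the second part uses `unicodedata.normalize` to show that the ready-made paer-bindi letter at 2649 is treated by Unicode as the plain letter followed by the paer-bindi.

```python
import unicodedata

PA, RA, VIRAMA = chr(2602), chr(2608), chr(2637)

para = PA + RA            # two letters, each keeping its implicit 'a'
pra = PA + VIRAMA + RA    # the virama between them asks for the paer-rarra

print([ord(ch) for ch in para])  # [2602, 2608]
print([ord(ch) for ch in pra])   # [2602, 2637, 2608]

# The precomposed letter at 2649 decomposes to letter (2582) + paer-bindi (2620).
decomposed = unicodedata.normalize("NFD", chr(2649))
print([ord(ch) for ch in decomposed])  # [2582, 2620]
```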

Below are the character sequences for the paer characters. These characters do not have codes of their own in the Unicode standard, so their glyphs have to be added to each font that is designed. They are displayed because the font shaping engine recognises the sequence, finds the matching glyph in the font and displays it, like below...

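Name     Character   Code              In situ   Code
Rarra    ੍ਰ          &#2637;&#2608;    ਕ੍ਰ       &#2581;&#2637;&#2608;
Hahha    ੍ਹ          &#2637;&#2617;    ਕ੍ਹ       &#2581;&#2637;&#2617;
Wawwa    ੍ਵ          &#2637;&#2613;    ਕ੍ਵ       &#2581;&#2637;&#2613;
Yaeya    ੍ਯ          &#2637;&#2607;    ਕ੍ਯ       &#2581;&#2637;&#2607;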
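Since every paer character is a virama followed by the ordinary letter, the sequences can be produced from a small lookup. The dictionary and function names below are hypothetical; whether the result is drawn as a proper paer glyph still depends on the font and the shaping engine.

```python
VIRAMA = chr(2637)
PAER_LETTERS = {"rarra": 2608, "hahha": 2617, "wawwa": 2613, "yaeya": 2607}


def with_paer(base, name):
    """Return base + virama + the named letter, ie the paer sequence."""
    return base + VIRAMA + chr(PAER_LETTERS[name])


for name in PAER_LETTERS:
    text = with_paer(chr(2581), name)
    print(name, text, "".join(f"&#{ord(ch)};" for ch in text))
```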

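So, as an example/test, let's form the word 'Drink'.

Step                                       Result    Code
Start with the 'English' 'd'               ਡ         &#2593;
Add the paer-rarra so now we have 'dr'     ਡ੍ਰ       &#2593;&#2637;&#2608;
Add a sihari like so                       ਡ੍ਰਿ      &#2593;&#2637;&#2608;&#2623;
Now a tippee                               ਡ੍ਰਿੰ     &#2593;&#2637;&#2608;&#2623;&#2672;
Finally a kakkaa                           ਡ੍ਰਿੰਕ    &#2593;&#2637;&#2608;&#2623;&#2672;&#2581;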

The text-shaping engine takes the characters, which you have entered in the correct order for speech, re-orders them where necessary and builds the shape above. The sihari, for example, is entered after the 'dr' but is drawn to the left of it.
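The 'Drink' example can be replayed in code to see what is actually stored. This sketch (the names `DRINK_STEPS` and `build` are made up) appends one code point at a time in spoken order and prints each stage with its codes; the re-ordering for display is left entirely to the shaping engine.

```python
DRINK_STEPS = [
    ("d", 2593),
    ("virama", 2637),
    ("rarra", 2608),
    ("sihari", 2623),
    ("tippee", 2672),
    ("kakkaa", 2581),
]


def build(steps):
    """Return the growing string after each code point is appended."""
    text, stages = "", []
    for _name, code_point in steps:
        text += chr(code_point)
        stages.append(text)
    return stages


for stage in build(DRINK_STEPS):
    print(stage, "".join(f"&#{ord(ch)};" for ch in stage))
```

The stored order never changes: the sihari stays fourth in the string even though it is displayed first.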


Copyright ©2007-2022 Paul Alan Grosse.