
Unit 1.6

Data and Program Representation in a Computer

IT 231: IT and Application

Learning Objectives 🎯

By the end of this chapter, you will be able to:

  • ✅ Explain why computers use the binary system.
  • ✅ Define bit and byte.
  • ✅ Understand the purpose of character encoding standards.
  • ✅ Differentiate between ASCII and Unicode.

Why Binary? 💡

Computers don't understand words. They understand electricity.

At the most basic level, a computer circuit can be in one of two states:

  • ON (electricity is flowing)
  • OFF (no electricity)

We represent these two states using numbers:

  • ON = 1
  • OFF = 0

This two-digit system (0 and 1) is called the Binary System, and it's the fundamental language of all digital computers.
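To make the idea concrete, here is a minimal Python sketch of how a string of 0s and 1s maps to an ordinary decimal number. The function name `binary_to_decimal` is made up for this illustration; it is not part of the course material.

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a string of 0s and 1s into its decimal value."""
    total = 0
    for bit in bits:
        if bit not in "01":
            raise ValueError("binary strings may only contain 0 and 1")
        # Each new digit doubles the running total, then adds the digit.
        total = total * 2 + int(bit)
    return total


print(binary_to_decimal("101"))       # 5
print(binary_to_decimal("01000001"))  # 65
```

Python's built-in `int("101", 2)` does the same conversion; the loop is written out only to show the place-value reasoning.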

Bits and Bytes: The Building Blocks

All data (text, images, sound) is broken down into these simple units.

Bit (Binary Digit)

The smallest possible unit of data.

0 or 1

Byte

A group of 8 bits.

01000001

A single byte can represent 256 (2^8) different values.
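A short Python sketch of the arithmetic behind that number. The helper `to_bits` is a hypothetical name used only for this example.

```python
BITS_PER_BYTE = 8

# n bits can hold 2 ** n different patterns.
values_per_byte = 2 ** BITS_PER_BYTE


def to_bits(value: int) -> str:
    """Show a byte-sized value as an 8-character string of 0s and 1s."""
    if not 0 <= value <= 255:
        raise ValueError("a single byte holds values 0 through 255")
    return format(value, "08b")


print(values_per_byte)  # 256
print(to_bits(0))       # 00000000
print(to_bits(65))      # 01000001
print(to_bits(255))     # 11111111
```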

Measuring Data Storage 📊

The byte is the standard unit for measuring digital storage.

  • Kilobyte (KB) ≈ 1,000 bytes
    (A plain text email)
  • Megabyte (MB) ≈ 1 million bytes
    (A high-quality photo or MP3 song)
  • Gigabyte (GB) ≈ 1 billion bytes
    (A standard-definition movie)
  • Terabyte (TB) ≈ 1 trillion bytes
    (The storage of a high-capacity laptop or an external hard drive)
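The unit steps above can be sketched in Python as repeated division by 1,000. This is an illustration only: the function name `human_readable` is an assumption, and some tools divide by 1,024 instead, so their numbers will differ slightly.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the number small."""
    size = float(num_bytes)
    for unit in UNITS:
        # Stop when the number is below 1,000 or we have run out of units.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000  # move up one unit (decimal convention)


print(human_readable(512))                # 512.0 bytes
print(human_readable(5_000_000))          # 5.0 MB
print(human_readable(2_000_000_000_000))  # 2.0 TB
```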

Representing Text: The Challenge

How does a computer understand the letter 'A' or the symbol '?'?

It uses Character Encoding Standards: a dictionary that maps each character to a unique binary number.

'A' ➡️ [Encoding Standard] ➡️ 01000001

The First Standard: ASCII

ASCII (American Standard Code for Information Interchange)

  • An early, widely adopted standard.
  • Uses 7 bits per character (usually stored in one 8-bit byte).
  • Can represent 128 characters:
    • Uppercase English (A-Z)
    • Lowercase English (a-z)
    • Numbers (0-9)
    • Common punctuation (!, @, #, etc.)

Example: In ASCII, the character 'A' is represented by the number 65, which is 01000001 in binary.
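The same lookup can be tried in Python, whose built-in `ord` and `chr` convert between a character and its number. The wrapper `char_to_ascii_bits` is a hypothetical helper written for this sketch.

```python
def char_to_ascii_bits(ch: str) -> str:
    """Return the 8-bit pattern for a single ASCII character."""
    code = ord(ch)  # character -> number
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 128-character ASCII range")
    return format(code, "08b")


print(ord("A"))                 # 65
print(char_to_ascii_bits("A"))  # 01000001
print(char_to_ascii_bits("?"))  # 00111111
print(chr(0b01000001))          # A  (number -> character)
```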

The Problem with ASCII 🌍

ASCII was great for English, but what about other languages?

Nepali (नेपाली)

नमस्ते

Japanese (日本語)

こんにちは

Arabic (العربية)

مرحبا

Limitation: With only 128 possible values (256 in extended 8-bit variants), ASCII cannot represent characters from most of the world's languages.
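A quick way to see this limitation is to ask Python to encode each greeting as ASCII. This is a minimal sketch; `can_encode_as_ascii` is a name invented for the example.

```python
def can_encode_as_ascii(text: str) -> bool:
    """Return True only if every character in text fits in ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # At least one character has no ASCII number.
        return False


greetings = {
    "English": "Hello",
    "Nepali": "नमस्ते",
    "Japanese": "こんにちは",
    "Arabic": "مرحبا",
}

for language, word in greetings.items():
    print(language, can_encode_as_ascii(word))
```

Only the English greeting passes; the other three contain characters that ASCII has no number for.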

The Universal Solution: Unicode

Unicode is a modern standard designed to fix ASCII's limitations.

  • Its goal is to represent every character from every language.
  • It has room for over 1 million unique code points (including emojis! 👍🎉).
  • It is the dominant standard for the web, modern operating systems, and software development.

The most common Unicode encoding is UTF-8, which is backward compatible with ASCII: every ASCII character keeps the same single-byte value in UTF-8.
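The sketch below checks both properties in Python: ASCII text produces identical bytes under UTF-8, and other characters take more bytes. The four sample characters are chosen only for illustration.

```python
# Backward compatibility: ASCII text is byte-for-byte the same in UTF-8.
text = "Hello"
print(text.encode("ascii") == text.encode("utf-8"))  # True

# Variable width: different characters need different numbers of bytes.
samples = [
    "A",           # Latin letter
    "\u00e9",      # e with acute accent
    "\u0928",      # Devanagari letter NA
    "\U0001F44D",  # thumbs-up emoji
]

byte_counts = {}
for ch in samples:
    byte_counts[ch] = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} -> {byte_counts[ch]} byte(s)")
```

The loop prints 1, 2, 3, and 4 bytes respectively.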

ASCII vs. Unicode at a Glance 🔍

ASCII

  • Scope: English-centric
  • Size: 7 bits (stored in 1 byte)
  • Characters: 128 (256 in extended 8-bit variants)
  • Use: Legacy systems

Unicode (UTF-8)

  • Scope: Universal (All languages)
  • Size: Variable (1-4 bytes)
  • Characters: Over 149,000
  • Use: Modern standard

What About Programs?

Just like data, program instructions must also be in binary for the CPU to execute them.

High-Level Code (Human-readable)

print("Hello, World!")

⬇️

A Compiler or Interpreter translates the code.

⬇️

Machine Code (Binary)

01101000 01100101 01101100 ...
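Python itself offers a rough way to peek at this translation step. One important caveat: what Python produces is bytecode for its own virtual machine, not machine code for the CPU, and the exact bytes vary between Python versions. Treat this as an analogy for "source text becomes numbers", not as real machine code.

```python
import dis

source = 'print("Hello, World!")'

# Translate the source text into a code object (Python bytecode).
code_obj = compile(source, "<example>", "exec")

# The instructions are stored as raw bytes, i.e. groups of 8 bits.
raw = code_obj.co_code
bits = " ".join(format(b, "08b") for b in raw[:4])
print(bits, "...")

# A human-readable listing of the same instructions.
dis.dis(code_obj)
```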

Practical Application in Nepal 🇳🇵

  • Language and Script: The Nepali language uses the Devanagari script (e.g., क, ख, ग). ASCII cannot represent these characters.
  • Digital Nepal: Unicode (UTF-8) is essential for all digital services in Nepal, from government websites and the Nagarik App to news portals and social media. It allows us to type and read in Nepali online.
  • Tech Market: When buying a phone or laptop in Nepal, storage capacity (measured in GB or TB) is a key factor, determining how many photos, videos, and apps you can store.
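For the Devanagari letters mentioned above, a short Python sketch shows the Unicode number of each letter and the bytes UTF-8 uses to store it. The variable names are illustrative only.

```python
letters = ["क", "ख", "ग"]

utf8_hex = {}
for letter in letters:
    code_point = ord(letter)            # the Unicode number
    encoded = letter.encode("utf-8")    # the bytes actually stored
    utf8_hex[code_point] = encoded.hex()
    print(f"U+{code_point:04X} -> {len(encoded)} bytes: {encoded.hex()}")
```

Each of these letters has a code point far above ASCII's limit of 127 and takes 3 bytes in UTF-8.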

Key Takeaways ⚡

  • Computers represent everything using the binary system (0s and 1s) because of their electronic nature (on/off).
  • A bit is a single 0 or 1. A byte is a group of 8 bits and is the standard unit for measuring storage.
  • ASCII is an older, limited encoding for English text.
  • Unicode is the modern, universal standard that supports all world languages, including Nepali.
  • Program code must be translated into binary machine code (by a compiler or interpreter) before a computer can run it.

Thank You!

Any questions?

Next Topic: Unit 2.1 - The Central Processing Unit (CPU)
