Data

Chapter 2 of the sources, titled Data, defines what data is and provides a framework for how it is organised, classified, and measured. Data is defined as the facts and figures collected, analysed, and summarised for presentation and interpretation.


1. Unstructured vs. Structured Data

The sources first distinguish between how data is organised:

💡

Unstructured Data

Information that is not organised in a predefined manner and is typically text-heavy. It takes significant work to process and is of little use while it remains scattered.

Examples: YouTube comments, image files, social media posts, and song lyrics.

💡

Structured Data

Data held in a standardised, clearly defined format (usually tabular) that is easy to analyse. To be useful, the context of the text and numbers (what each value represents) must be known.

Examples: A student dataset showing Name, Gender, and Marks, or a fertiliser dataset showing types and amounts used.
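
As a rough illustration of why the tabular form is easier to analyse, the Python sketch below stores the same facts twice: once as a free-text sentence and once as a small table. The sentence, the marks, and the names `note` and `table` are invented for this example and do not come from the sources.

```python
import re

# Unstructured: a free-text sentence (invented for illustration).
note = "Anjali scored 82 in the exam, while Pradeep got 76."

# Getting the numbers out needs pattern matching, and this pattern only
# works for sentences phrased like the one above.
pairs = re.findall(r"([A-Z][a-z]+) (?:scored|got) (\d+)", note)
marks_from_text = {name: int(value) for name, value in pairs}

# Structured: the same facts in a predefined tabular layout.
# The meaning of each column is known in advance, so no parsing rules are needed.
table = [
    {"Name": "Anjali", "Marks": 82},
    {"Name": "Pradeep", "Marks": 76},
]
marks_from_table = {row["Name"]: row["Marks"] for row in table}

print(marks_from_text == marks_from_table)  # True
```

Both routes end with the same values, but the regular expression would fail as soon as the sentence was phrased differently, whereas the table lookup would not.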


2. Variables and Cases

Within structured data tables, two components are essential:

  • Case (Observation): The individual unit for which data is collected. Each case corresponds to one row of the table.

  • Variable: A characteristic or attribute that varies across different units, typically represented by the columns.


    Example: In a student table, each student (Anjali, Pradeep) is a case, while their “Gender” or “Marks” are variables.
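
A minimal Python sketch of the row and column idea, assuming a table stored as a list of dictionaries. The gender and marks values are made up for illustration; only the student names and column names come from the example above.

```python
# Hypothetical student table: each dict is one case (row),
# each key is one variable (column). The values are made up.
students = [
    {"Name": "Anjali", "Gender": "Female", "Marks": 82},
    {"Name": "Pradeep", "Gender": "Male", "Marks": 76},
]

# A case is everything recorded for one unit.
first_case = students[0]

# A variable is one characteristic read across all cases.
variables = list(students[0].keys())
marks = [student["Marks"] for student in students]

print(len(students), "cases;", len(variables), "variables:", variables)
print("Marks across cases:", marks)
```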

3. Classification of Data

Data is broadly classified into two categories:

💡

Categorical Data (Qualitative)

Identifies group membership. You cannot perform meaningful mathematical operations on this data.

Example: Gender (Male/Female) or School Board (CBSE, ICSE, State Board).

💡

Numerical Data (Quantitative)

Describes numerical properties and allows for mathematical operations.

Example: Marks obtained in an exam or height in centimetres.
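
The difference can be sketched in Python: counting is the natural summary for categorical values, while arithmetic such as averaging only makes sense for numerical ones. The records below are hypothetical; the board names come from the example above and the marks are invented.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for four students.
boards = ["CBSE", "ICSE", "CBSE", "State Board"]  # categorical
marks = [82, 76, 91, 67]                          # numerical

# Categorical data: counting group membership is meaningful.
board_counts = Counter(boards)

# Numerical data: arithmetic such as averaging is meaningful.
average_marks = mean(marks)

print(board_counts.most_common(1))  # [('CBSE', 2)]
print(average_marks)                # 79
```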


4. Time-Series vs. Cross-Sectional Data

This distinction depends on when and where the data is recorded:

💡

Time-Series Data

Data recorded for one subject over a period of time in chronological order.

Example: Observing the temperature in Delhi every day for a week.

💡

Cross-Sectional Data

Data observed for several subjects at the same time.

Example: Observing the temperature of Delhi, Chennai, and Mumbai on the same specific day.
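
One way to see the two views side by side is to take a single set of readings and slice it in both directions. This is only a sketch: the temperatures and the three-day window are invented, and `readings` is a hypothetical name.

```python
# Hypothetical daily temperatures in °C, keyed by (city, day). Values are made up.
readings = {
    ("Delhi", "Mon"): 31, ("Delhi", "Tue"): 33, ("Delhi", "Wed"): 32,
    ("Chennai", "Mon"): 34, ("Chennai", "Tue"): 35, ("Chennai", "Wed"): 34,
    ("Mumbai", "Mon"): 30, ("Mumbai", "Tue"): 30, ("Mumbai", "Wed"): 31,
}
days = ["Mon", "Tue", "Wed"]  # chronological order

# Time series: one subject (Delhi) followed over time.
delhi_series = [readings[("Delhi", day)] for day in days]

# Cross-section: several subjects observed at the same time (Tuesday).
tuesday_cross_section = {
    city: temp for (city, day), temp in readings.items() if day == "Tue"
}

print(delhi_series)           # [31, 33, 32]
print(tuesday_cross_section)  # {'Delhi': 33, 'Chennai': 35, 'Mumbai': 30}
```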


5. Scales of Measurement

There are four levels of measurement used to collect data:

  1. Nominal: Labels or names used to identify characteristics with no inherent order.
    • Example: Blood group, Hair colour, or Brand names.
  2. Ordinal: Labels where the order or rank is meaningful, but the exact distance between values is not fixed.
    • Example: Service ratings such as “Excellent,” “Good,” and “Poor”.
  3. Interval: Numeric data where the interval between values is fixed, but there is no absolute zero. Ratios are meaningless here because zero is arbitrary.
    • Example: Temperature in Celsius or Fahrenheit (0°C does not mean “no temperature”).
  4. Ratio: Numeric data with all the properties of interval data plus an absolute zero. Ratios between values are meaningful.
    • Example: Height, Weight, and Marks.
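
The claim that ratios are meaningless on an interval scale but meaningful on a ratio scale can be checked numerically. In the sketch below, the specific temperatures and heights are arbitrary illustrative values; the unit conversions are the standard ones. If a ratio changes when only the unit changes, it carries no real meaning.

```python
def celsius_to_fahrenheit(c):
    """Convert a temperature from Celsius to Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm):
    """Convert a length from centimetres to inches."""
    return cm / 2.54


# Interval scale: the zero point is arbitrary, so the ratio depends on the unit.
celsius_ratio = 20 / 10
fahrenheit_ratio = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero means "none", so the ratio survives a change of unit.
cm_ratio = 180 / 90
inch_ratio = cm_to_inches(180) / cm_to_inches(90)

print(celsius_ratio, round(fahrenheit_ratio, 2))  # 2.0 1.36
print(cm_ratio, round(inch_ratio, 2))             # 2.0 2.0
```

So 20°C is not "twice as hot" as 10°C, while a height of 180 cm really is twice a height of 90 cm.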

Practice Questions

Q1

Data Classification

What kind of data are “Social media posts”?

Unstructured data, because they are not organised in a predefined, standardised tabular format.

Q2

Data Type

Values of temperature and humidity in a room are measured for 24 hours at regular intervals of 30 minutes. What type of data is this?

Time-series data, because the observations are recorded over a period of time for a single location.

Q3

Scale Identification

Which scale of measurement is used for “Brand name of a mobile phone”?

Nominal scale, as these are simply labels used for identification with no meaningful order.

Q4

Mathematical Operations

If a variable allows for both addition and subtraction, which scales of measurement could it belong to?

Interval and Ratio scales, as both are numeric and allow for calculating differences between values.


💡

The Library Analogy

Think of Unstructured Data as a giant pile of books on the floor; you know there is information there, but it’s hard to find anything.

  • Structured Data is like those same books organised on shelves by Case (the specific book) and Variable (the genre, author, or page count).
  • The Scales of Measurement are like the different ways you might categorise the books:
    • Nominal for the colour of the cover.
    • Ordinal for a “Top 10” list.
    • Ratio for the actual weight of the book in grams.