Data
Chapter 2 of the sources, titled Data, focuses on defining what data is and providing a framework for how it is organised, classified, and measured. Data are defined as the facts and figures collected, analysed, and summarised for the purpose of presentation and interpretation.
1. Unstructured vs. Structured Data
The sources first distinguish between how data is organised:
Unstructured Data
Information that is not organised in a predefined manner. It is often text-heavy, though it also includes media such as images. It requires significant work to process, and scattered pieces are of little use on their own.
Examples: YouTube comments, image files, social media posts, and song lyrics.
Structured Data
Data held in a standardised, clearly defined format (usually tabular) that is easy to analyse. The text and numbers are useful only when their context is known.
Examples: A student dataset showing Name, Gender, and Marks, or a fertiliser dataset showing types and amounts used.
2. Variables and Cases
Within structured data tables, two components are essential:
Case (Observation): The individual unit for which data is collected. Each case corresponds to one row of the table.
Variable: A characteristic or attribute that varies across cases, typically represented by a column.
Example: In a student table, each student (Anjali, Pradeep) is a case, while their “Gender” or “Marks” are variables.
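The case/variable split can be sketched in Python. This is a minimal illustration, not part of the source: the `students` table, the marks, and the gender values are invented, and only the names Anjali and Pradeep come from the example above.

```python
# Hypothetical student table: each dict is one case (row),
# and each key is one variable (column). All values are invented.
students = [
    {"Name": "Anjali", "Gender": "Female", "Marks": 82},
    {"Name": "Pradeep", "Gender": "Male", "Marks": 74},
]

# Cases: one entry per row, identified here by the student's name.
cases = [row["Name"] for row in students]

# Variables: the column names shared by every row.
variables = list(students[0].keys())

print(cases)      # ['Anjali', 'Pradeep']
print(variables)  # ['Name', 'Gender', 'Marks']
```

Adding a third student adds a case; adding a "School Board" key to every row adds a variable.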
3. Classification of Data
Data is broadly classified into two categories:
Categorical Data (Qualitative)
Identifies group membership. You cannot perform meaningful mathematical operations on this data.
Example: Gender (Male/Female) or School Board (CBSE, ICSE, State Board).
Numerical Data (Quantitative)
Describes numerical properties and allows for mathematical operations.
Example: Marks obtained in an exam or height in centimetres.
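As a rough sketch of the difference, the snippet below summarises a categorical variable by counting group membership and a numerical variable by averaging. The `boards` and `marks` values are made up for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical values for four students (not from the source).
boards = ["CBSE", "ICSE", "State Board", "CBSE"]  # categorical
marks = [82, 74, 91, 65]                          # numerical

# Categorical data: counting members of each group is meaningful,
# but an "average board" is not.
board_counts = Counter(boards)
most_common_board = board_counts.most_common(1)[0][0]

# Numerical data: arithmetic such as the mean is meaningful.
average_marks = mean(marks)

print(board_counts["CBSE"])  # 2
print(most_common_board)     # CBSE
print(average_marks)         # 78
```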
4. Time-Series vs. Cross-Sectional Data
This distinction depends on when and where the data is recorded:
Time-Series Data
Data recorded for one subject over a period of time in chronological order.
Example: Observing the temperature in Delhi every day for a week.
Cross-Sectional Data
Data observed for several subjects at the same time.
Example: Observing the temperature of Delhi, Chennai, and Mumbai on the same specific day.
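One way to picture the two views is as different slices of the same set of readings. The sketch below assumes a hypothetical `readings` dictionary keyed by (city, day); the temperatures are invented, and the helper names `time_series` and `cross_section` are illustrative only.

```python
# Hypothetical temperature readings in degrees Celsius, keyed by (city, day).
readings = {
    ("Delhi", 1): 31.0,
    ("Delhi", 2): 32.5,
    ("Delhi", 3): 30.0,
    ("Chennai", 1): 34.0,
    ("Mumbai", 1): 29.5,
}

def time_series(data, city):
    """One subject followed over time, in chronological order."""
    return [(day, temp) for (c, day), temp in sorted(data.items()) if c == city]

def cross_section(data, day):
    """Several subjects observed at the same point in time."""
    return {c: temp for (c, d), temp in data.items() if d == day}

print(time_series(readings, "Delhi"))
# [(1, 31.0), (2, 32.5), (3, 30.0)]
print(cross_section(readings, 1))
# {'Delhi': 31.0, 'Chennai': 34.0, 'Mumbai': 29.5}
```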
5. Scales of Measurement
There are four levels of measurement used to collect data:
- Nominal: Labels or names used to identify characteristics with no inherent order.
  - Example: Blood group, Hair colour, or Brand names.
- Ordinal: Labels where the order or rank is meaningful, but the exact distance between values is not fixed.
  - Example: Service ratings such as “Excellent,” “Good,” and “Poor”.
- Interval: Numeric data where the interval between values is fixed, but there is no absolute zero. Ratios are meaningless here because zero is arbitrary.
  - Example: Temperature in Celsius or Fahrenheit (0°C does not mean “no temperature”).
- Ratio: Numeric data with all the properties of interval data plus an absolute zero. Ratios between values are meaningful.
  - Example: Height, Weight, and Marks.
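The difference between the interval and ratio scales can be checked with a change of units. The sketch below uses invented values (10°C and 20°C, 90 cm and 180 cm) and two small helper functions written for this illustration. A ratio that changes when the unit changes carries no real meaning.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def cm_to_inch(cm):
    """Convert centimetres to inches."""
    return cm / 2.54

# Interval scale: the ratio depends on the unit, so "twice as hot" is not meaningful.
temp_ratio_c = 20 / 10                  # 2.0 in Celsius
temp_ratio_f = c_to_f(20) / c_to_f(10)  # 68 / 50 in Fahrenheit

# Ratio scale: the ratio is the same in any unit, so "twice as tall" is meaningful.
height_ratio_cm = 180 / 90
height_ratio_in = cm_to_inch(180) / cm_to_inch(90)

print(round(temp_ratio_c, 2), round(temp_ratio_f, 2))        # 2.0 1.36
print(round(height_ratio_cm, 2), round(height_ratio_in, 2))  # 2.0 2.0
```

Differences, by contrast, are meaningful on both scales: a 10-degree gap in Celsius is always an 18-degree gap in Fahrenheit, wherever it sits on the thermometer.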
Practice Questions
Data Classification
What kind of data are “Social media posts”?
Unstructured data, because they are not organised in a predefined, standardised tabular format.
Data Type
Values of temperature and humidity in a room are measured for 24 hours at regular intervals of 30 minutes. What type of data is this?
Time-series data, because the observations are recorded over a period of time for a single location.
Scale Identification
Which scale of measurement is used for “Brand name of a mobile phone”?
Nominal scale, as these are simply labels used for identification with no meaningful order.
Mathematical Operations
If a variable allows for both addition and subtraction, which scales of measurement could it belong to?
Interval and Ratio scales, as both are numeric and allow for calculating differences between values.
The Library Analogy
Think of Unstructured Data as a giant pile of books on the floor; you know there is information there, but it’s hard to find anything.
- Structured Data is like those same books organised on shelves by Case (the specific book) and Variable (the genre, author, or page count).
- The Scales of Measurement are like the different ways you might categorise the books:
  - Nominal for the colour of the cover.
  - Ordinal for a “Top 10” list.
  - Ratio for the actual weight of the book in grams.
All Chapters in this Book
- Statistics: Introduces the subject as the 'art of learning from data,' covering its collection, description, and analysis.
- Data: Focuses on the nature of information itself and how it is categorised.
- Describing Categorical Data: Visualising and identifying the 'centre' of qualitative data.
- Describing Numerical Data: Tools for organising and measuring the typical values and spread of quantitative variables.
- Association Between Two Variables: Explores how information about one variable can provide insight into another.
- Basic Principle of Counting: Foundations of probability by teaching how to count possible outcomes.
- Factorial: Defines the product of positive integers.
- Permutation: Covers the various ways to calculate ordered arrangements of objects.
- Combination: Focuses on the mathematical methods for selecting objects when the order of selection does not matter.