Identify Features and Labels in a Dataset for Machine Learning (AI-900 Exam Prep)

This section of the AI-900: Microsoft Azure AI Fundamentals exam covers a foundational machine learning concept: features and labels. You are not expected to build models or write code, but you must be able to recognize features and labels in a dataset and understand their role in different machine learning scenarios.

This topic appears under: Describe fundamental principles of machine learning on Azure (15–20%) → Describe core machine learning concepts


What Is a Dataset in Machine Learning?

A dataset is a collection of data used to train, validate, and test machine learning models. In supervised learning scenarios (which are emphasized in AI-900), a dataset typically contains:

  • Features: The input values used to make predictions
  • Labels: The output or target values the model learns to predict

Each row in a dataset usually represents a single observation or record, and each column represents either a feature or a label.
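The row-and-column layout described above can be sketched in a few lines of Python. This is only an illustration (the exam does not require code), and the column names and values are made-up assumptions rather than data from a real Azure dataset.

```python
# Hypothetical house-price records: each dict is one row (one observation).
rows = [
    {"bedrooms": 3, "square_feet": 1500, "location": "suburb", "sale_price": 310000},
    {"bedrooms": 2, "square_feet": 900, "location": "city", "sale_price": 275000},
    {"bedrooms": 4, "square_feet": 2200, "location": "rural", "sale_price": 340000},
]

LABEL_COLUMN = "sale_price"  # the column the model should learn to predict


def split_features_and_label(rows, label_column):
    """Return (features, labels); each feature row is a dict without the label."""
    features = [
        {name: value for name, value in row.items() if name != label_column}
        for row in rows
    ]
    labels = [row[label_column] for row in rows]
    return features, labels


features, labels = split_features_and_label(rows, LABEL_COLUMN)
print(features[0])  # the inputs for the first house
print(labels)       # the known outcomes, one per row
```

Every column except the one chosen as the label is treated as a feature here, which mirrors how exam scenarios are usually framed.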


What Are Features?

Features are the individual measurable properties or characteristics of the data that are used as inputs to a machine learning model.

Key Characteristics of Features

  • Features describe what you know about each data point
  • They are used by the model to identify patterns
  • Features can be numerical, categorical, or derived
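To make the three kinds of features concrete, here is a minimal sketch using a hypothetical customer record. The field names (`account_age_months`, `support_tickets`, `plan`, `tickets_per_month`) are assumptions chosen for illustration.

```python
# Hypothetical customer record with numerical and categorical features.
customer = {
    "account_age_months": 24,  # numerical feature
    "support_tickets": 6,      # numerical feature
    "plan": "premium",         # categorical feature
}


def add_derived_features(record):
    """Return a copy of the record with one derived feature added."""
    enriched = dict(record)
    # Derived feature: computed from two existing columns.
    enriched["tickets_per_month"] = (
        record["support_tickets"] / record["account_age_months"]
    )
    return enriched


enriched = add_derived_features(customer)
print(enriched["tickets_per_month"])  # 0.25
```

A derived feature adds no new raw data; it restates existing columns in a form that may make a pattern easier for a model to pick up.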

Examples of Features

| Scenario | Example Features |
| --- | --- |
| House price prediction | Number of bedrooms, square footage, location |
| Customer churn | Account age, number of support tickets, monthly spend |
| Email classification | Word frequency, sender domain, message length |

In Azure Machine Learning, features are often referred to as input variables.


What Are Labels?

A label is the value that a machine learning model is trained to predict. Labels are only present in supervised learning datasets.

Key Characteristics of Labels

  • Labels represent the outcome or answer
  • A dataset usually has one label column
  • Labels are known during training but unknown during prediction

Examples of Labels

| Scenario | Label |
| --- | --- |
| House price prediction | Sale price |
| Customer churn | Churned (Yes/No) |
| Image classification | Object category |

In Azure Machine Learning, labels are often called target variables.


Features vs Labels: Key Differences

| Aspect | Features | Labels |
| --- | --- | --- |
| Purpose | Input to the model | Output to predict |
| Quantity | Usually many | Typically one |
| Known during training | Yes | Yes |
| Known during prediction | Yes | No |

Understanding this distinction is critical for AI-900 exam questions.
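The "known during training, unknown during prediction" difference can be shown with two hypothetical rows. The column names and the `has_label` helper are assumptions for illustration only.

```python
LABEL = "canceled"  # hypothetical label column

# A training row: features plus the known outcome.
training_row = {"logins_per_month": 12, "subscription_months": 18, "canceled": "No"}

# A row seen at prediction time: same features, but the outcome is not known yet.
new_customer = {"logins_per_month": 2, "subscription_months": 3}


def has_label(row, label=LABEL):
    """True when the row carries a known value for the label column."""
    return row.get(label) is not None


print(has_label(training_row))  # True
print(has_label(new_customer))  # False
```

At prediction time the model receives only the feature values and produces the missing label itself.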


Features and Labels in Supervised Learning

Supervised learning relies on labeled datasets. The model learns by comparing its predictions to the known labels and adjusting accordingly.

Common Supervised Learning Types

  • Regression
    • Features: numeric or categorical inputs
    • Label: numeric value (e.g., price, temperature)
  • Classification
    • Features: descriptive inputs
    • Label: category or class (e.g., spam/not spam)
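As a rough sketch of how the label decides the task type, the hypothetical helper below looks only at the label values. It is a simplification: classes stored as numbers (for example 0 and 1) would be misread as regression, so treat it as a study aid rather than a rule.

```python
def suggest_task(labels):
    """Heuristic: numeric labels suggest regression, anything else classification."""
    if not labels:
        return "no labels (not supervised)"
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in labels
    )
    return "regression" if numeric else "classification"


print(suggest_task([310000, 275000, 340000]))  # regression
print(suggest_task(["spam", "not spam"]))      # classification
```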

Features and Labels in Unsupervised Learning

Unsupervised learning datasets do not contain labels.

  • The model identifies patterns or groupings on its own
  • Common example: clustering

In AI-900, this distinction is important:

If a dataset has no labels, it is not supervised learning.
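To illustrate clustering without labels, here is a deliberately tiny two-group k-means on a single feature. The spending values are invented, and the fixed starting centers (minimum and maximum) are a simplifying assumption; real clustering tools handle many features and choose starting points differently.

```python
def kmeans_1d(values, iterations=10):
    """Tiny 2-cluster k-means on a list of numbers (illustrative only)."""
    centers = [min(values), max(values)]  # simple deterministic start
    for _ in range(iterations):
        groups = [[], []]
        for value in values:
            # Assign each value to the nearest center.
            nearest = 0 if abs(value - centers[0]) <= abs(value - centers[1]) else 1
            groups[nearest].append(value)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return groups


# Hypothetical monthly spend per customer: features only, no label column.
monthly_spend = [20, 22, 25, 180, 200, 210]
low_spenders, high_spenders = kmeans_1d(monthly_spend)
print(low_spenders)   # [20, 22, 25]
print(high_spenders)  # [180, 200, 210]
```

No row was ever marked "low" or "high"; the groups come from the feature values alone.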


Real-World Azure Example

Consider a dataset used in Azure Machine Learning to predict whether a customer will cancel a subscription.

  • Features:
    • Number of logins per month
    • Subscription length
    • Customer support interactions
  • Label:
    • Subscription canceled (Yes or No)

The model learns the relationship between the features and the label to make future predictions.
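The scenario above can be sketched with a toy model. This is not how Azure Machine Learning trains models; a 1-nearest-neighbor rule is swapped in here because it fits in a few lines, and all values and names are hypothetical.

```python
# Hypothetical training data:
# (logins_per_month, subscription_months, support_interactions) -> canceled
training = [
    ((20, 24, 1), "No"),
    ((18, 36, 0), "No"),
    ((2, 3, 5), "Yes"),
    ((1, 2, 4), "Yes"),
]


def predict_canceled(features, training_data=training):
    """1-nearest-neighbor: return the label of the closest training row."""

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(
        training_data, key=lambda pair: squared_distance(pair[0], features)
    )
    return label


# New customers: features are known, the label is what we want predicted.
print(predict_canceled((3, 4, 6)))    # Yes
print(predict_canceled((19, 30, 1)))  # No
```

The tuples play the role of features and the Yes/No strings play the role of the label, matching the lists in the scenario.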


Exam Tips for AI-900

  • If the question asks “what the model uses to make predictions”, look for features
  • If the question asks “what the model predicts”, look for labels
  • If labels are present and the model is trained to predict them, it is supervised learning
  • AI-900 focuses on conceptual understanding, not data science implementation

Key Takeaways

  • Features are input variables used to make predictions
  • Labels are the known outcomes the model learns to predict
  • Supervised learning requires labeled data
  • Being able to identify features and labels in a scenario is essential for AI-900

This knowledge forms the foundation for understanding regression, classification, and many Azure AI workloads covered later in the exam.


Practice Questions: Identify Features and Labels in a Dataset for Machine Learning (AI-900 Exam Prep)


Question 1

You are training a model to predict house prices. The dataset includes columns for square footage, number of bedrooms, location, and sale price.
Which column is the label?

A. Square footage
B. Number of bedrooms
C. Location
D. Sale price

Correct Answer: D

Explanation:
The label is the value the model is trained to predict. In this scenario, the goal is to predict the sale price.


Question 2

Which statement best describes a feature in a machine learning dataset?

A. The final prediction made by the model
B. An input value used to make predictions
C. A rule written by a developer
D. The accuracy of the model

Correct Answer: B

Explanation:
Features are the input variables that provide information the model uses to make predictions.


Question 3

A dataset contains customer age, subscription length, monthly charges, and whether the customer canceled the service.
What is the label?

A. Customer age
B. Subscription length
C. Monthly charges
D. Whether the customer canceled

Correct Answer: D

Explanation:
The label represents the outcome being predicted. In this case, that is whether the customer canceled the service.


Question 4

Which type of machine learning requires both features and labels?

A. Unsupervised learning
B. Reinforcement learning
C. Supervised learning
D. Clustering

Correct Answer: C

Explanation:
Supervised learning uses labeled data so the model can learn the relationship between features and known outcomes.


Question 5

A dataset is used to group customers based on purchasing behavior, but it does not contain any target outcome.
What does this dataset contain?

A. Labels only
B. Features only
C. Training results
D. Predictions

Correct Answer: B

Explanation:
Unsupervised learning datasets contain features but do not include labels.


Question 6

In an email spam detection dataset, which item would most likely be a feature?

A. Spam or not spam
B. Model accuracy score
C. Number of words in the email
D. Final prediction

Correct Answer: C

Explanation:
The number of words is an input characteristic used by the model to make predictions, making it a feature.


Question 7

Which statement about labels is TRUE?

A. Labels are optional in supervised learning
B. Labels are the inputs used by the model
C. Labels represent the value the model predicts
D. Labels are created after predictions are made

Correct Answer: C

Explanation:
Labels are the known outcomes the model is trained to predict in supervised learning scenarios.


Question 8

You are preparing data in Azure Machine Learning to predict product demand.
Which columns should be selected as features?

A. Only the column you want to predict
B. All columns except the target outcome
C. Only numerical columns
D. Only categorical columns

Correct Answer: B

Explanation:
Features are the input columns used to predict the target outcome, which is the label.


Question 9

A dataset includes the following columns: temperature, humidity, wind speed, and weather condition.
If the goal is to predict the weather condition, what are temperature, humidity, and wind speed?

A. Labels
B. Predictions
C. Features
D. Outputs

Correct Answer: C

Explanation:
These values are inputs used to predict the weather condition, making them features.


Question 10

Which scenario best represents a labeled dataset?

A. Customer data grouped by similarity
B. Sensor readings without outcomes
C. Product reviews with sentiment categories
D. Website logs without classifications

Correct Answer: C

Explanation:
Product reviews with sentiment categories include known outcomes, which are labels, making the dataset labeled.


Exam Pattern Tip

On AI-900:

  • Features = inputs
  • Labels = outputs
  • If labels exist → supervised learning
  • If no labels → unsupervised learning

If you can identify those quickly, you’ll eliminate most wrong answers immediately.


Go to the AI-900 Exam Prep Hub main page.