Practice Exam Questions
Question 1
What is the primary purpose of a training dataset in machine learning?
A. To evaluate the model’s accuracy on new data
B. To teach the model patterns using known outcomes
C. To store prediction results
D. To deploy the model to production
Correct Answer: B
Explanation:
The model learns the relationships between features and labels from the training dataset, where the outcomes are already known.
Question 2
Which dataset is used to assess how well a machine learning model performs on unseen data?
A. Training dataset
B. Feature dataset
C. Validation dataset
D. Prediction dataset
Correct Answer: C
Explanation:
The validation dataset is separate from training data and is used to evaluate the model’s ability to generalize.
Question 3
Why should the same dataset not be used for both training and validation?
A. It increases storage costs
B. It slows down training
C. It can lead to misleading performance results
D. It prevents model deployment
Correct Answer: C
Explanation:
Using the same data for training and validation can hide overfitting and give an inaccurate measure of model performance.
Question 4
A model performs very well on training data but poorly on validation data. What is this most likely an example of?
A. Underfitting
B. Overfitting
C. Data labeling
D. Feature engineering
Correct Answer: B
Explanation:
Overfitting occurs when a model memorizes training data but fails to generalize to new, unseen data.
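The gap between training and validation scores can be made concrete with a small sketch. The `Memorizer` class, the `accuracy` helper, and the toy data below are hypothetical illustrations, not part of AI-900 or any Azure service. The model only stores the rows it was trained on, so it is an extreme case of memorizing without generalizing.

```python
class Memorizer:
    """Hypothetical toy model: stores every training row, never generalizes."""

    def __init__(self, default_label=0):
        self.table = {}
        self.default_label = default_label

    def fit(self, rows):
        # "Training" here is just remembering each (feature, label) pair.
        for feature, label in rows:
            self.table[feature] = label

    def predict(self, feature):
        # Unseen features fall back to a fixed guess.
        return self.table.get(feature, self.default_label)


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the known label."""
    correct = sum(model.predict(feature) == label for feature, label in rows)
    return correct / len(rows)


# Toy data: the label is 1 when the feature is 50 or more, otherwise 0.
rows = [(x, int(x >= 50)) for x in range(100)]
training_rows = [row for row in rows if row[0] % 2 == 0]    # even features
validation_rows = [row for row in rows if row[0] % 2 == 1]  # odd features, unseen

model = Memorizer()
model.fit(training_rows)
train_accuracy = accuracy(model, training_rows)
validation_accuracy = accuracy(model, validation_rows)
print(train_accuracy, validation_accuracy)
```

On this toy data the training accuracy is 1.0 while the validation accuracy is 0.5, which is the pattern Question 4 describes.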
Question 5
Which statement about a validation dataset is TRUE?
A. It is used to adjust model parameters
B. It replaces the need for training data
C. It helps evaluate model performance
D. It contains only unlabeled data
Correct Answer: C
Explanation:
Validation data is used to assess how well the model performs. The model's parameters are learned from the training data, not from the validation data.
Question 6
In supervised learning, which datasets typically contain both features and labels?
A. Validation only
B. Training only
C. Both training and validation
D. Neither training nor validation
Correct Answer: C
Explanation:
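Both datasets contain features and labels, but they are used for different purposes: the model learns from the training labels, while the validation labels are compared against the model's predictions.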
Question 7
What is a key benefit of using a validation dataset during model development?
A. Faster training times
B. Automatic feature creation
C. Detection of overfitting
D. Reduced data storage
Correct Answer: C
Explanation:
Validation data helps identify whether the model is overfitting the training data.
Question 8
A dataset is split into 80% training data and 20% validation data.
What is the purpose of the 20% portion?
A. To retrain the model after deployment
B. To evaluate the model’s predictions
C. To generate new features
D. To label the data
Correct Answer: B
Explanation:
The validation portion is used to evaluate how well the model performs on unseen data.
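As a minimal sketch of an 80/20 split, assuming the data fits in a Python list: the `split_dataset` function, its parameters, and the toy rows are hypothetical names chosen for illustration, and real projects would typically use a library routine instead.

```python
import random


def split_dataset(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (training, validation) lists."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation


# Toy labeled data: (feature, label) pairs.
rows = [(x, 2 * x + 1) for x in range(100)]
training, validation = split_dataset(rows)
print(len(training), len(validation))
```

The two lists never share a row, so the validation portion stays unseen while the model learns from the training portion.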
Question 9
Which phrase best describes how a validation dataset is used?
A. Teaching the model
B. Fine-tuning the labels
C. Testing model generalization
D. Storing predictions
Correct Answer: C
Explanation:
Validation data is used to test how well the model generalizes beyond its training data.
Question 10
Which scenario correctly describes the use of training and validation datasets?
A. Training data is used only after deployment
B. Validation data is used to adjust model weights
C. Training data teaches the model; validation data evaluates it
D. Both datasets are identical
Correct Answer: C
Explanation:
Training data is used for learning, while validation data is used for evaluation.
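The division of roles can be sketched with a deliberately simple classifier. The `fit_threshold` and `evaluate` functions and the data values are hypothetical, chosen only to show that learning touches the training rows and evaluation touches the validation rows.

```python
def fit_threshold(training_rows):
    """Learn a cut-off halfway between the mean feature value of each class."""
    zeros = [x for x, label in training_rows if label == 0]
    ones = [x for x, label in training_rows if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def evaluate(threshold, rows):
    """Fraction of rows where the thresholded prediction matches the label."""
    correct = sum(int(x >= threshold) == label for x, label in rows)
    return correct / len(rows)


# Both datasets hold (feature, label) pairs.
training_rows = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
validation_rows = [(2.5, 0), (4.0, 0), (6.0, 1), (8.5, 1)]

threshold = fit_threshold(training_rows)                    # learning: training data only
validation_accuracy = evaluate(threshold, validation_rows)  # evaluation: held-out data
print(threshold, validation_accuracy)
```

The threshold is computed without ever looking at `validation_rows`, which is what makes the validation score a fair estimate of generalization.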
Exam Strategy Tip
On AI-900:
- Training dataset → learning and pattern recognition
- Validation dataset → evaluation and generalization
- Watch for keywords like overfitting, unseen data, and model performance
If you can map those keywords quickly, these questions become easy points.
