Tag: Data Science

AI, AI-900, Artificial Intelligence (AI), Data Science, Machine Learning (ML) January 31, 2026

Describe Data and Compute Services for Data Science and Machine Learning (AI-900 Exam Prep)

This topic focuses on understanding which Azure services are used to store data and provide compute power for data science and machine learning workloads — not on how to configure them in depth. For the AI-900 exam, you should recognize what each service is used for and when you would choose one over another.

Why Data and Compute Matter in Machine Learning

Machine learning solutions require two essential components:

Data services → where training and inference data is stored and accessed
Compute services → where models are trained and executed

Azure provides scalable, cloud-based services for both, allowing organizations to build, train, and deploy machine learning solutions efficiently.

Data Services for Machine Learning on Azure

Azure offers several data storage services commonly used in machine learning scenarios.

Azure Blob Storage

Azure Blob Storage is one of the most commonly used data stores for machine learning on Azure.

Key characteristics:

Stores unstructured data (files, images, videos, CSVs)
Highly scalable and cost-effective
Frequently used as the data source for Azure Machine Learning experiments

Typical use cases:

Training datasets
Model artifacts
Logs and output files

👉 On AI-900: If the question mentions large datasets, files, or unstructured data, Blob Storage is usually the answer.

Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 is optimized for big data analytics and machine learning.

Key characteristics:

Built on Azure Blob Storage
Supports hierarchical namespaces
Designed for analytics workloads

Typical use cases:

Large-scale machine learning projects
Advanced analytics and data science pipelines

👉 On AI-900: Think of Data Lake Storage when big data and analytics are mentioned.

Azure SQL Database

Azure SQL Database stores structured, relational data.

Key characteristics:

Table-based storage
Uses SQL for querying
Suitable for well-defined schemas

Typical use cases:

Business and transactional data
Structured datasets used in ML training

👉 On AI-900: If the data is relational and structured, Azure SQL Database is a common choice.
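
As a rough illustration of what "structured, relational data" means for ML training, the sketch below queries a table with a fixed schema and splits the result into features and labels. It uses Python's built-in sqlite3 module purely as a local stand-in for a relational store such as Azure SQL Database; the table name, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in for a relational
# store such as Azure SQL Database. Table, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, tenure_months INTEGER, churned INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (id, tenure_months, churned) VALUES (?, ?, ?)",
    [(1, 3, 1), (2, 24, 0), (3, 12, 0), (4, 1, 1)],
)

# A well-defined schema lets SQL select exactly the feature and label columns.
rows = conn.execute(
    "SELECT tenure_months, churned FROM customers ORDER BY id"
).fetchall()
conn.close()

features = [[tenure] for tenure, _ in rows]   # model inputs
labels = [churned for _, churned in rows]     # known outcomes
```

The point is only the shape of the data: rows and typed columns that can be queried with SQL before training.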

Compute Services for Machine Learning on Azure

Compute services provide the processing power needed to train and run machine learning models.

Azure Machine Learning Compute

Azure Machine Learning provides managed compute resources specifically designed for ML workloads.

Key characteristics:

Scalable CPU and GPU compute
Used for training and inference
Managed through Azure Machine Learning workspace

Typical use cases:

Model training
Experimentation
Batch inference

👉 On AI-900: This is the primary compute service for machine learning.

Azure Virtual Machines

Azure Virtual Machines (VMs) offer full control over the compute environment.

Key characteristics:

Customizable CPU or GPU configurations
Supports specialized ML workloads
More management responsibility

Typical use cases:

Custom machine learning environments
Legacy or specialized ML tools

👉 On AI-900: VMs appear when flexibility or custom configuration is required.

Azure Kubernetes Service (AKS)

AKS is used primarily for deploying machine learning models at scale.

Key characteristics:

Container orchestration
High availability and scalability
Often used for real-time inference

Typical use cases:

Production ML model deployment
Scalable inference endpoints

👉 On AI-900: AKS is associated with deployment, not training.

How These Services Work Together

In a typical Azure machine learning workflow:

Data is stored in Blob Storage, Data Lake, or SQL Database
Models are trained using Azure Machine Learning compute or VMs
Models are deployed using Azure Machine Learning or AKS
Predictions are generated and consumed by applications

Azure provides built-in scalability, security controls, and integration across these services.

Key Exam Takeaways

For AI-900, remember:

Blob Storage → unstructured ML data
Data Lake Storage → big data analytics
Azure SQL Database → structured data
Azure Machine Learning compute → training and experimentation
Virtual Machines → custom compute environments
AKS → scalable model deployment

You are not expected to configure these services — only recognize their purpose.
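
For revision, the takeaways above can be written as a small lookup table. This is only a study-aid sketch: the scenario labels used as keys and the function name recommend_service are hypothetical, and real exam questions will phrase scenarios differently.

```python
# Exam takeaways from the list above, encoded as a lookup table.
# The keys are hypothetical scenario labels chosen for this sketch.
SERVICE_FOR_SCENARIO = {
    "unstructured ml data": "Azure Blob Storage",
    "big data analytics": "Azure Data Lake Storage Gen2",
    "structured data": "Azure SQL Database",
    "training and experimentation": "Azure Machine Learning compute",
    "custom compute environment": "Azure Virtual Machines",
    "scalable model deployment": "Azure Kubernetes Service (AKS)",
}


def recommend_service(scenario: str) -> str:
    """Return the service the takeaways associate with a scenario label."""
    key = scenario.strip().lower()
    if key not in SERVICE_FOR_SCENARIO:
        raise KeyError(f"No takeaway recorded for scenario: {scenario!r}")
    return SERVICE_FOR_SCENARIO[key]


print(recommend_service("Structured data"))  # Azure SQL Database
```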

Exam Tip 💡

If a question asks:

“Where is ML data stored?” → Blob Storage or Data Lake
“Where is the model trained?” → Azure Machine Learning compute
“How is a model deployed at scale?” → AKS


Data Education & Training, Data Science January 20, 2026

Glossary – 100 “Data Science” Terms

Below is a glossary that includes 100 “Data Science” terms and phrases, along with their definitions and examples, in alphabetical order. Enjoy!

Term	Definition & Example
A/B Testing	Comparing two variants. Example: Website layout test.
Accuracy	Proportion of all predictions that are correct. Example: 90% accuracy.
Actionable Insight	Insight leading to action. Example: Improve onboarding.
Algorithm	Procedure used to train models. Example: Decision trees.
Alternative Hypothesis	Assumption opposing the null hypothesis. Example: Group A performs better than B.
AUC	Area under ROC curve. Example: Model ranking metric.
Bayesian Inference	Updating probabilities with new evidence. Example: Prior and posterior beliefs.
Bias-Variance Tradeoff	Balance between simplicity and flexibility. Example: Model tuning.
Bootstrapping	Resampling technique for estimation. Example: Estimating confidence intervals.
Business Problem	Decision-focused question. Example: Why churn increased.
Causation	One variable directly affects another. Example: Price drop causes sales increase.
Classification	Predicting categories. Example: Spam detection.
Clustering	Grouping similar observations. Example: Market segmentation.
Computer Vision	Interpreting images and video. Example: Image classification.
Confidence Interval	Range likely containing the true value. Example: 95% CI for average revenue.
Confusion Matrix	Table evaluating classification results. Example: True positives vs false positives.
Correlation	Strength of relationship between variables. Example: Ad spend vs revenue.
Cross-Validation	Repeated training/testing splits. Example: k-fold CV.
Data Drift	Change in input data distribution. Example: New demographics.
Data Imputation	Replacing missing values. Example: Median imputation.
Data Leakage	Training a model with information that would not be available at prediction time. Example: Using post-event data.
Data Science	Interdisciplinary field combining statistics, programming, and domain knowledge to extract insights from data. Example: Predicting customer churn.
Data Storytelling	Communicating insights effectively. Example: Executive dashboards.
Dataset	A structured collection of data for analysis. Example: Customer transactions table.
Deep Learning	Multi-layer neural networks. Example: Speech recognition.
Descriptive Statistics	Summary statistics of data. Example: Mean, median.
Dimensionality Reduction	Reducing number of features. Example: PCA.
Effect Size	Magnitude of difference or relationship. Example: Lift in conversion rate.
Ensemble Learning	Combining multiple models. Example: Boosting techniques.
Ethics in Data Science	Responsible use of data and models. Example: Avoiding biased predictions.
Experimentation	Testing hypotheses with data. Example: A/B testing.
Explainable AI (XAI)	Techniques to explain predictions. Example: SHAP values.
Exploratory Data Analysis (EDA)	Initial data investigation using statistics and visuals. Example: Distribution plots.
F1 Score	Harmonic mean of precision and recall. Example: Evaluating models on imbalanced datasets.
Feature	An input variable used in modeling. Example: Customer age.
Feature Engineering	Creating new features from raw data. Example: Tenure calculated from signup date.
Forecasting	Predicting future values. Example: Demand forecasting.
Generalization	Model performance on unseen data. Example: Stable test accuracy.
Hazard Function	Instantaneous event rate. Example: Churn risk over time.
Holdout Set	Data reserved for final evaluation. Example: Final test dataset.
Hyperparameter	Pre-set model configuration. Example: Learning rate.
Hypothesis	A testable assumption about data. Example: Discounts increase conversion rates.
Hypothesis Testing	Statistical method to evaluate assumptions. Example: t-test for average sales.
Insight	Meaningful analytical finding. Example: High churn among new users.
Label	Known output used in supervised learning. Example: Fraud or not fraud.
Likelihood	Probability of data given parameters. Example: Used in Bayesian models.
Loss Function	Measures prediction error. Example: Mean squared error.
Mean	Arithmetic average. Example: Average sales value.
Median	Middle value of ordered data. Example: Median income.
Missing Values	Absent data points. Example: Null customer age.
Mode	Most frequent value. Example: Most common category.
Model	Mathematical representation learned from data. Example: Logistic regression.
Model Drift	Performance degradation over time. Example: Changing customer behavior.
Model Interpretability	Understanding model decisions. Example: Feature importance.
Monte Carlo Simulation	Random sampling to model uncertainty. Example: Risk modeling.
Natural Language Processing (NLP)	Analyzing human language. Example: Sentiment analysis.
Neural Network	Model inspired by the human brain. Example: Image recognition.
Null Hypothesis	Default assumption of no effect. Example: No difference between two groups.
Optimization	Process of minimizing loss. Example: Gradient descent.
Outlier	Value significantly different from others. Example: Unusually large purchase.
Overfitting	Model fits noise in the training data and generalizes poorly. Example: High training accuracy, poor test performance.
Pipeline	End-to-end data science workflow. Example: Ingest → train → deploy.
Population	Entire group of interest. Example: All customers.
Posterior Probability	Updated belief after observing data. Example: Updated churn likelihood.
Precision	Share of predicted positives that are actually positive. Example: Fraud detection precision.
Principal Component Analysis (PCA)	Linear dimensionality reduction technique. Example: Visualizing high-dimensional data.
Prior Probability	Initial belief before observing data. Example: Baseline churn rate.
p-value	Probability of observing results at least as extreme as those seen, assuming the null hypothesis is true. Example: p < 0.05 is a common significance threshold.
Recall	Share of actual positives that the model identifies. Example: Medical diagnosis.
Regression	Predicting numeric values. Example: Sales forecasting.
Reinforcement Learning	Learning via rewards and penalties. Example: Game-playing AI.
Reproducibility	Ability to recreate results. Example: Fixed random seeds.
ROC Curve	Classifier performance visualization. Example: Threshold comparison.
Sampling	Selecting subset of data. Example: Survey sample.
Sampling Bias	Non-representative sampling. Example: Surveying only active users.
Seasonality	Repeating time-based patterns. Example: Holiday sales.
Semi-Structured Data	Data with flexible structure. Example: JSON files.
Stacking	Ensemble method using meta-models. Example: Combining classifiers.
Standard Deviation	Square root of the variance; typical distance of values from the mean. Example: Price volatility.
Stationarity	Stable statistical properties over time. Example: Mean doesn’t change.
Statistical Power	Probability of detecting a true effect. Example: Larger sample sizes increase power.
Statistical Significance	Evidence results are unlikely due to chance. Example: Rejecting the null hypothesis.
Structured Data	Data with a fixed schema. Example: SQL tables.
Supervised Learning	Learning with labeled data. Example: Credit risk prediction.
Survival Analysis	Modeling time-to-event data. Example: Customer churn timing.
Target Variable	The outcome a model predicts. Example: Loan default indicator.
Test Data	Data used to evaluate model performance. Example: Held-out test set.
Text Mining	Extracting insights from text. Example: Topic modeling.
Time Series	Data indexed by time. Example: Daily stock prices.
Tokenization	Splitting text into units. Example: Words or subwords.
Training Data	Data used to train a model. Example: Historical transactions.
Transfer Learning	Reusing pretrained models. Example: Image models for medical scans.
Trend	Long-term direction in data. Example: Growing user base.
Underfitting	Model too simple to capture patterns. Example: High bias.
Unstructured Data	Data without predefined structure. Example: Text, images.
Unsupervised Learning	Learning without labels. Example: Customer clustering.
Uplift Modeling	Measuring treatment impact. Example: Marketing campaign effectiveness.
Validation Set	Data used for tuning models. Example: Hyperparameter selection.
Variance	Measure of data spread. Example: Sales variability.
Word Embeddings	Numerical text representations. Example: Word2Vec.
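
To make the Accuracy, Precision, Recall, F1 Score, and Confusion Matrix entries concrete, here is a minimal sketch that derives the four metrics from confusion-matrix counts. The function name and the example counts are made up for illustration.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute common metrics from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

With these counts, accuracy is 0.70, precision is 0.80, recall is about 0.67, and F1 is about 0.73, which shows how a model can look acceptable on accuracy while still missing a third of the actual positives.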

Data Careers, Data Science January 20, 2026

What Exactly Does a Data Scientist Do?

A Data Scientist focuses on using statistical analysis, experimentation, and machine learning to understand complex problems and make predictions about what is likely to happen next. While Data Analysts often explain what has already happened, and Data Engineers build the systems that deliver data, Data Scientists explore patterns, probabilities, and future outcomes.

At their best, Data Scientists help organizations move from descriptive insights to predictive and prescriptive decision-making.

The Core Purpose of a Data Scientist

At its core, the role of a Data Scientist is to:

Explore complex and ambiguous problems using data
Build models that explain or predict outcomes
Quantify uncertainty and risk
Inform decisions with probabilistic insights

Data Scientists are not just model builders—they are problem solvers who apply scientific thinking to business questions.

Typical Responsibilities of a Data Scientist

While responsibilities vary by organization and maturity, most Data Scientists work across the following areas.

Framing the Problem and Defining Success

Data Scientists work with stakeholders to:

Clarify the business objective
Determine whether a data science approach is appropriate
Define measurable success criteria
Identify constraints and assumptions

A key skill is knowing when not to use machine learning.

Exploring and Understanding Data

Before modeling begins, Data Scientists:

Perform exploratory data analysis (EDA)
Investigate distributions, correlations, and outliers
Identify data gaps and biases
Assess data quality and suitability for modeling

This phase often determines whether a project succeeds or fails.
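
A small sketch of what a first EDA pass might compute, using only Python's standard library. The 1.5 × IQR rule used to flag outliers is one common convention, not the only one, and the purchase amounts are invented for the example.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics and IQR-based outliers."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    iqr = q3 - q1
    # Flag points more than 1.5 * IQR outside the middle 50% of the data.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "stdev": statistics.stdev(ordered),
        "outliers": [v for v in ordered if v < low or v > high],
    }


# Hypothetical purchase amounts with one unusually large value.
purchases = [20, 22, 19, 21, 23, 20, 250]
summary = summarize(purchases)
print(summary)
```

Here the single large purchase pulls the mean well above the median, which is exactly the kind of signal this phase is meant to surface before modeling starts.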

Feature Engineering and Data Preparation

Transforming raw data into meaningful inputs is a major part of the job:

Creating features that capture real-world behavior
Encoding categorical variables
Handling missing or noisy data
Scaling and normalizing data where needed

Good features often matter more than complex models.
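
The preparation steps listed above can be sketched in a few lines of plain Python. In practice these are usually done with a library; the helper names and sample values below are hypothetical and exist only to show the idea behind median imputation, min-max scaling, and one-hot encoding.

```python
import statistics


def impute_median(values):
    """Replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector, one column per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical raw columns: customer age (with a gap) and subscription plan.
ages = [30, None, 50, 40]
plans = ["basic", "pro", "basic", "free"]

ages_filled = impute_median(ages)          # [30, 40, 50, 40]
ages_scaled = min_max_scale(ages_filled)   # [0.0, 0.5, 1.0, 0.5]
plans_encoded = one_hot(plans)             # columns: basic, free, pro
```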

Building and Evaluating Models

Data Scientists develop and test models such as:

Regression and classification models
Time-series forecasting models
Clustering and segmentation techniques
Anomaly detection systems

They evaluate models using appropriate metrics and validation techniques, balancing accuracy with interpretability and robustness.
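
One widely used validation technique is k-fold cross-validation. The sketch below only generates the train/test index splits, leaving the model itself out; the function name is hypothetical, and the folds are contiguous and unshuffled for simplicity, whereas real projects often shuffle or stratify first.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so every observation is used for evaluation once and for training in the remaining rounds.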

Communicating Results and Recommendations

A critical responsibility is explaining:

What the model does and does not do
How confident the predictions are
What trade-offs exist
How results should be used in decision-making

A model that cannot be understood or trusted will rarely be adopted.
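
One way to put a number on "how confident" an estimate is, offered here as an illustration rather than a prescribed method, is a bootstrap percentile interval. The sketch resamples a small set of hypothetical observations with replacement and reports a range for the mean; the function name, sample values, and fixed seed are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_mean_interval(values, n_resamples=2000, confidence=0.95, seed=0):
    """Estimate a percentile bootstrap interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    # Drop an equal share of resampled means from each tail.
    cut = round((1 - confidence) / 2 * n_resamples)
    return means[cut], means[n_resamples - cut - 1]


# Hypothetical observations, e.g. daily errors of a forecasting model.
observations = [12, 15, 9, 14, 11, 13, 10, 16]
lower, upper = bootstrap_mean_interval(observations)
print(f"mean = {statistics.mean(observations):.2f}, "
      f"95% interval = ({lower:.2f}, {upper:.2f})")
```

Reporting the interval alongside the point estimate gives stakeholders a sense of how much the number could move with different data.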

Common Tools Used by Data Scientists

While toolsets vary, Data Scientists commonly use:

Programming Languages such as Python or R
Statistical & ML Libraries (e.g., scikit-learn, TensorFlow, PyTorch)
SQL for data access and exploration
Notebooks for experimentation and analysis
Visualization Libraries for data exploration
Version Control for reproducibility

The emphasis is on experimentation, iteration, and learning.

What a Data Scientist Is Not

Clarifying misconceptions is important.

A Data Scientist is typically not:

A report or dashboard developer
A data engineer focused on pipelines and infrastructure
An AI product that automatically solves business problems
A decision-maker replacing human judgment

In practice, Data Scientists collaborate closely with analysts, engineers, and business leaders.

What the Role Looks Like Day-to-Day

A typical day for a Data Scientist may include:

Exploring a new dataset or feature
Testing model assumptions
Running experiments and comparing results
Reviewing model performance
Discussing findings with stakeholders
Iterating based on feedback or new data

Much of the work is exploratory and non-linear.

How the Role Evolves Over Time

As organizations mature, the Data Scientist role often evolves:

From ad-hoc modeling → repeatable experimentation
From isolated analysis → productionized models
From accuracy-focused → impact-focused outcomes
From individual contributor → technical or domain expert

Senior Data Scientists often guide model strategy, ethics, and best practices.

Why Data Scientists Are So Important

Data Scientists add value by:

Quantifying uncertainty and risk
Anticipating future outcomes
Enabling proactive decision-making
Supporting innovation through experimentation

They help organizations move beyond hindsight and into foresight.

Final Thoughts

A Data Scientist’s job is not simply to build complex models—it is to apply scientific thinking to messy, real-world problems using data.

When Data Scientists succeed, their work informs smarter decisions, better products, and more resilient strategies—always in partnership with engineering, analytics, and the business.

Good luck on your data journey!

The Data Community

Information and resources for the data professionals' community