Below is a glossary of 100 common AI (artificial intelligence) terms and phrases, in alphabetical order. Enjoy!
| Term | Definition & Example |
| --- | --- |
| Accuracy | Percentage of correct predictions. Example: 92% accuracy. |
| Agent | AI entity performing tasks autonomously. Example: Task-planning agent. |
| AI Alignment | Ensuring AI goals match human values. Example: Safe AI systems. |
| AI Bias | Systematic unfairness in AI outcomes. Example: Biased hiring models. |
| Algorithm | A set of rules used to train models. Example: Decision tree algorithm. |
| Artificial General Intelligence (AGI) | Hypothetical AI with human-level intelligence. Example: Broad reasoning across tasks. |
| Artificial Intelligence (AI) | Systems that perform tasks requiring human-like intelligence. Example: Chatbots answering questions. |
| Artificial Neural Network (ANN) | A network of interconnected artificial neurons. Example: Credit scoring models. |
| Attention Mechanism | Focuses model on relevant input parts. Example: Language translation. |
| AUC (Area Under the Curve) | Area under the ROC curve, summarizing classifier performance across all thresholds. Example: Comparing models with a single number. |
| AutoML | Automated model selection and tuning. Example: Auto-generated models. |
| Autonomous System | AI operating with minimal human input. Example: Self-driving cars. |
| Backpropagation | Method to update neural network weights. Example: Deep learning training. |
| Batch | Subset of data processed at once. Example: Batch size of 32. |
| Batch Inference | Predictions made in bulk. Example: Nightly scoring jobs. |
| Bias (Model Bias) | Error from oversimplified assumptions. Example: Linear model on non-linear data. |
| Bias–Variance Tradeoff | Balance between bias and variance. Example: Choosing model complexity. |
| Black Box Model | Model with opaque internal logic. Example: Deep neural networks. |
| Classification | Predicting categorical outcomes. Example: Email spam classification. |
| Clustering | Grouping similar data points. Example: Customer segmentation. |
| Computer Vision | AI for interpreting images and video. Example: Facial recognition. |
| Concept Drift | Changes in underlying relationships. Example: Fraud patterns evolving. |
| Confusion Matrix | Table counting a classifier's true/false positives and negatives. Example: Evaluating a spam filter (see the metrics sketch below the table). |
| Data Augmentation | Expanding data via transformations. Example: Image rotation. |
| Data Drift | Changes in input data distribution. Example: New user demographics. |
| Data Leakage | Information unavailable at prediction time slipping into training. Example: Including test labels as features. |
| Decision Tree | Tree-based decision model. Example: Loan approval logic. |
| Deep Learning | ML using multi-layer neural networks. Example: Image recognition. |
| Dimensionality Reduction | Reducing number of features. Example: PCA for visualization. |
| Edge AI | AI running on local devices. Example: Smart cameras. |
| Embedding | Numerical representation of data. Example: Word embeddings. |
| Ensemble Model | Combining multiple models. Example: Random forest. |
| Epoch | One full pass through training data. Example: 50 training epochs. |
| Ethics in AI | Moral considerations in AI use. Example: Avoiding bias. |
| Explainable AI (XAI) | Making AI decisions understandable. Example: Feature importance charts. |
| F1 Score | Harmonic mean of precision and recall. Example: Evaluating models on imbalanced datasets. |
| Fairness | Equitable AI outcomes across groups. Example: Equal approval rates. |
| Feature | An input variable for a model. Example: Customer age. |
| Feature Engineering | Creating or transforming features to improve models. Example: Calculating customer tenure. |
| Federated Learning | Training models across decentralized data. Example: Mobile keyboard predictions. |
| Few-Shot Learning | Learning from few examples. Example: Custom classification with few samples. |
| Fine-Tuning | Further training a pre-trained model. Example: Custom chatbot training. |
| Generalization | Model’s ability to perform on new data. Example: Accurate predictions on unseen data. |
| Generative AI | AI that creates new content. Example: Text or image generation. |
| Gradient Boosting | Sequentially improving weak models. Example: XGBoost. |
| Gradient Descent | Optimization technique that iteratively adjusts weights to reduce loss. Example: Training neural networks (sketch below the table). |
| Hallucination | A model confidently generating false or unsupported information. Example: Confident but incorrect factual claims. |
| Hyperparameter | Configuration set before training. Example: Learning rate. |
| Inference | Using a trained model to predict. Example: Real-time recommendations. |
| K-Means | Algorithm that partitions data into k clusters. Example: Market segmentation (sketch below the table). |
| Knowledge Graph | Graph-based representation of knowledge. Example: Search engines. |
| Label | The correct output for supervised learning. Example: “Fraud” or “Not Fraud”. |
| Large Language Model (LLM) | AI trained on massive text corpora. Example: ChatGPT. |
| Loss Function | Measures model error during training. Example: Mean squared error. |
| Machine Learning (ML) | AI that learns patterns from data without explicit programming. Example: Spam email detection. |
| MLOps | Practices for managing ML lifecycle. Example: CI/CD for models. |
| Model | A trained mathematical representation of patterns. Example: Logistic regression model. |
| Model Deployment | Making a model available for use. Example: API-based predictions. |
| Model Drift | Model performance degradation over time. Example: Changing customer behavior. |
| Model Interpretability | Ability to understand model behavior. Example: Decision tree visualization. |
| Model Versioning | Tracking model changes. Example: v1 vs v2 models. |
| Monitoring | Tracking model performance in production. Example: Accuracy alerts. |
| Multimodal AI | AI handling multiple data types. Example: Text + image models. |
| Naive Bayes | Probabilistic classification algorithm. Example: Spam filtering. |
| Natural Language Processing (NLP) | AI for understanding human language. Example: Sentiment analysis. |
| Neural Network | Model inspired by the human brain’s structure. Example: Handwritten digit recognition. |
| Optimization | Process of minimizing loss. Example: Gradient descent. |
| Overfitting | Model learns noise instead of patterns. Example: Perfect training accuracy, poor test accuracy. |
| Pipeline | Automated ML workflow. Example: Training-to-deployment flow. |
| Precision | Share of positive predictions that are correct. Example: Fraud detection precision. |
| Pretrained Model | Model trained on general data. Example: GPT models. |
| Principal Component Analysis (PCA) | Technique for dimensionality reduction. Example: Compressing high-dimensional data for visualization (sketch below the table). |
| Privacy | Protecting personal data. Example: Anonymizing training data. |
| Prompt | Input instruction for generative models. Example: “Summarize this text.” |
| Prompt Engineering | Crafting effective prompts. Example: Improving LLM responses. |
| Random Forest | Ensemble of decision trees. Example: Classification tasks (sketch below the table). |
| Real-Time Inference | Immediate predictions on live data. Example: Fraud detection. |
| Recall | Share of actual positives the model finds. Example: Cancer detection. |
| Regression | Predicting numeric values. Example: Sales forecasting. |
| Reinforcement Learning | Learning through rewards and penalties. Example: Game-playing AI. |
| Reproducibility | Ability to recreate results. Example: Fixed random seeds. |
| Robotics | AI applied to physical machines. Example: Warehouse robots. |
| ROC Curve | Performance visualization for classifiers. Example: Threshold analysis. |
| Semi-Supervised Learning | Mix of labeled and unlabeled data. Example: Image classification with limited labels. |
| Speech Recognition | Converting speech to text. Example: Voice assistants. |
| Supervised Learning | Learning using labeled data. Example: Predicting house prices from known values. |
| Support Vector Machine (SVM) | Algorithm separating data with margins. Example: Text classification. |
| Synthetic Data | Artificially generated data. Example: Privacy-safe training. |
| Test Data | Data held out to evaluate the final model. Example: Scoring a model once after training and tuning. |
| Threshold | Cutoff for classification decisions. Example: Flagging cases with probability > 0.7 (sketch below the table). |
| Token | Smallest unit of text processed by models. Example: Words or subwords. |
| Training Data | Data used to teach a model. Example: Historical sales records. |
| Transfer Learning | Reusing knowledge from another task. Example: Image model reused for medical scans. |
| Transformer | Neural architecture for sequence data. Example: Language translation models. |
| Underfitting | Model too simple to capture patterns. Example: High error on both training and test data. |
| Unsupervised Learning | Learning from unlabeled data. Example: Customer clustering. |
| Validation Data | Data used to tune hyperparameters and compare candidate models. Example: Choosing a learning rate. |
| Variance | Error from sensitivity to fluctuations in the training data. Example: A complex model whose predictions swing with small changes in the data. |
| XGBoost | Optimized gradient boosting algorithm. Example: Kaggle competitions. |
| Zero-Shot Learning | Performing tasks without examples. Example: Classifying unseen labels. |
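
For readers who like to see these ideas in code, the short Python sketches below illustrate a handful of the terms above. Each is a minimal sketch, not a production recipe: the data are synthetic, the libraries (NumPy and scikit-learn) are just one common choice, and every parameter value is illustrative.

First, the core classification metrics: accuracy, precision, recall, F1 score, and the confusion matrix, computed on made-up labels:

```python
# Classification metrics on made-up labels (illustrative only).
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```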
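Next, a train/test split plus a custom decision threshold, sketched with logistic regression on synthetic data (the 0.7 cutoff echoes the Threshold entry above):

```python
# Train/test split, training, inference, and a 0.7 decision threshold.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)                   # held-out test set

model = LogisticRegression().fit(X_train, y_train)  # training
proba = model.predict_proba(X_test)[:, 1]           # inference: P(class 1)
y_pred = (proba > 0.7).astype(int)                  # apply the threshold
print("Test accuracy at 0.7 threshold:", (y_pred == y_test).mean())
```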
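Gradient descent in miniature: fitting y ≈ w·x by repeatedly stepping the weight w against the gradient of the mean squared error loss. The learning rate is a hyperparameter, each pass over the data is an epoch, and the numbers are made up:

```python
# Minimal gradient descent on a one-parameter linear model.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])       # roughly y = 2x

w, lr = 0.0, 0.01                        # initial weight, learning rate
for epoch in range(200):                 # one epoch = one pass over the data
    grad = 2 * np.mean((w * x - y) * x)  # gradient of MSE w.r.t. w
    w -= lr * grad                       # step downhill
print(round(w, 3))                       # converges near 2.0
```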
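K-Means clustering on synthetic 2-D points; the number of clusters is a hyperparameter chosen here purely for illustration:

```python
# K-Means: group 300 synthetic points into 3 clusters.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])       # cluster assignment for the first points
print(kmeans.cluster_centers_)   # learned cluster centers
```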
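PCA as dimensionality reduction: compressing random 10-dimensional data to 2 components, the typical first step before plotting:

```python
# PCA: project 10-dimensional data down to 2 components.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(100, 10))  # random data
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print(X_2d.shape)                      # (100, 2)
print(pca.explained_variance_ratio_)   # variance retained per component
```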
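Finally, a random forest, an ensemble of decision trees, on a synthetic classification task; 100 trees is scikit-learn's default and an arbitrary choice here:

```python
# Random forest: an ensemble of decision trees for classification.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)
print("Test accuracy:", forest.score(X_test, y_test))
```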
Please share your suggestions for any terms that should be added.
