Category: Glossary of Data Terms

Microsoft Fabric & Power BI Glossary (100 Terms)

Below is a table of 100 “Microsoft Fabric and Power BI” terms and definitions, sorted alphabetically (except for the first two entries). You can use it as a reference to get a quick idea of what something is or means, and also as a way to identify topics to research.

# | Term | Definition & Example
1 | Microsoft Fabric | Unified analytics platform combining data engineering, warehousing, data science, and BI. Example: Building pipelines and dashboards in one workspace.
2 | Power BI | Microsoft’s business intelligence visualization tool. Example: Sales dashboards.
3 | Aggregations | Pre-summarized tables. Example: Faster queries.
4 | Anomaly Detection | Highlights unusual values. Example: Sales spike.
5 | App | Packaged Power BI content. Example: Sales app.
6 | Bookmarks | Saved report states. Example: Guided navigation. (Article on configuring bookmarks)
7 | Bronze Layer | Raw data. Example: Ingested CSVs.
8 | Calculated Column | Static DAX column. Example: Profit category.
9 | Calculation Group | Reusable DAX logic. Example: Time intelligence.
10 | Capacity Metrics App | Performance insights. Example: CU spikes.
11 | Capacity Unit (CU) | Measure of Fabric compute. Example: Performance scaling.
12 | Certified Dataset | Official data source. Example: Finance semantic model.
13 | Composite Model | Mix of Import + DirectQuery. Example: Hybrid datasets. (Article about designing and building composite models)
14 | Composite Table | Mixed storage table. Example: Hybrid dimensions.
15 | Custom Visual | Marketplace visuals. Example: Sankey diagram.
16 | Dashboard | KPI overview page. Example: Executive metrics.
17 | Data Catalog | Discover datasets. Example: Search semantic models.
18 | Data Lineage | Shows data flow. Example: Source → report.
19 | Data Mart | Self-service warehouse. Example: Analyst-owned SQL.
20 | Data Model Size | Memory footprint. Example: Import limits.
21 | Data Pipeline | Orchestrates data movement. Example: Copy from S3.
22 | Data Source Credentials | Authentication info. Example: SQL login.
23 | Data Warehouse | Structured analytical database. Example: T-SQL querying sales facts.
24 | Dataflow Gen2 | Fabric ETL artifact. Example: Cloud ingestion pipelines.
25 | DAX | Formula language for measures. Example: Total Sales calculation.
26 | Delta Lake | Transactional file format. Example: ACID parquet.
27 | Deployment Pipeline | Dev/Test/Prod promotion. Example: CI/CD. (Article on creating and configuring deployment pipelines)
28 | Dimension Table | Descriptive attributes. Example: Products. (Article that describes fact and dimension tables and how to create them)
29 | Direct Lake | Queries OneLake directly without import. Example: Near real-time reporting.
30 | DirectQuery | Queries source system live. Example: SQL Server reporting. (Article on choosing between DirectQuery and Import in Power BI)
31 | Drill Down | Navigate deeper. Example: Year → Month. (Article on Power BI drilldown vs drill-through)
32 | Drill Through | Jump to detail page. Example: Customer profile. (Article on Power BI drilldown vs drill-through)
33 | Embedded Analytics | Power BI in apps. Example: Web portals.
34 | Endorsement | Certified or promoted datasets. Example: Trusted models.
35 | End-to-End Analytics | Full Fabric workflow. Example: Ingest → model → report.
36 | Fabric Capacity | Compute resources. Example: F64 SKU.
37 | Fabric CI/CD | Automated deployments. Example: Pipeline promotion.
38 | Fabric Data Activator | Event-based alerts. Example: Trigger email on anomaly.
39 | Fabric Item | Any asset. Example: Notebook, warehouse.
40 | Fabric Monitoring Hub | Central view of item activity and run status. Example: Failed pipeline runs.
41 | Fabric Workspace | Container for Fabric assets. Example: Lakehouse + reports together.
42 | Fact Table | Stores measurable events. Example: Orders. (Article that describes fact and dimension tables and how to create them)
43 | Gateway | Connects on-prem data. Example: Local SQL Server.
44 | Git Integration | Source control. Example: Azure DevOps.
45 | Goals | Performance targets. Example: Revenue quota.
46 | Gold Layer | Business-ready data. Example: KPI models.
47 | Import Mode | Data loaded into Power BI memory. Example: Daily refresh model.
48 | Incremental Refresh | Only refresh recent data. Example: Last 30 days.
49 | Lakehouse | Combines data lake + warehouse features. Example: Spark + SQL analytics.
50 | Lineage View | Dependency visualization. Example: Pipeline flow.
51 | M Language | Language behind Power Query. Example: Transform steps.
52 | Measure | Dynamic DAX calculation. Example: YTD Revenue.
53 | Medallion Architecture | Bronze/Silver/Gold layers. Example: Curated analytics.
54 | Metrics App | Goal tracking. Example: OKRs.
55 | Microsoft Purview | Governance integration. Example: Catalog assets.
56 | Mirroring | Replicates operational DBs. Example: Azure SQL sync.
57 | Model Refresh Failure | Update error. Example: Credential expired.
58 | Model View | Relationship design canvas. Example: Schema building.
59 | Notebook | Spark coding environment. Example: PySpark transformations.
60 | Object-Level Security | Hides tables/columns. Example: HR salary masking.
61 | OneLake | Fabric’s centralized data lake. Example: Shared parquet storage.
62 | Page-Level Filter | Applies to page. Example: Region filter.
63 | Paginated Report | Pixel-perfect reporting. Example: Invoice PDFs.
64 | Performance Analyzer | Measures visual speed. Example: DAX tuning.
65 | Perspective | User-specific model view. Example: Finance vs Sales.
66 | Power BI Desktop | Authoring tool. Example: Local report creation.
67 | Power BI Service | Cloud hosting platform. Example: app.powerbi.com.
68 | Power Query | ETL engine in Power BI/Fabric. Example: Cleaning CSV files.
69 | Preview Feature | Early-access capability. Example: New visuals.
70 | PySpark | Python Spark API. Example: Transform big data.
71 | Query Folding | Pushes logic to source. Example: SQL filtering.
72 | Refresh | Updating model data. Example: Nightly refresh.
73 | Relationship | Link between tables. Example: CustomerID join.
74 | Report | Collection of visuals. Example: Finance report.
75 | Report-Level Filter | Applies everywhere. Example: Fiscal year.
76 | REST API | Automates Power BI. Example: Dataset refresh trigger.
77 | Row-Level Security (RLS) | Restricts data by user. Example: Region access. (Articles on implementing RLS roles and configuring RLS group membership)
78 | Semantic Model | Logical layer for reporting (formerly dataset). Example: Measures and relationships.
79 | Sensitivity Label | Data classification. Example: Confidential. (Article on applying sensitivity labels)
80 | Share | Grant report access. Example: Email link.
81 | Shortcut | Virtual data reference. Example: External ADLS folder.
82 | Silver Layer | Cleaned data. Example: Standardized tables.
83 | Slicer | Interactive filter. Example: Year selector.
84 | Spark | Distributed compute engine. Example: Large joins.
85 | SQL Analytics Endpoint | T-SQL interface to Lakehouse. Example: BI queries.
86 | Star Schema | Fact table with dimensions. Example: Sales model.
87 | Subscribe | Scheduled email snapshots. Example: Weekly KPIs.
88 | Tabular Editor | External modeling tool. Example: Bulk measures.
89 | Tenant Setting | Admin control. Example: Export permissions.
90 | Themes | Styling reports. Example: Brand colors.
91 | Tooltip | Hover info. Example: Exact sales value. (Article on creating tooltips in Power BI)
92 | T-SQL | SQL dialect in Fabric. Example: SELECT statements.
93 | Usage Metrics | Report adoption stats. Example: View counts.
94 | Visual | Chart or table. Example: Bar chart.
95 | Visual Interaction | Cross-filtering visuals. Example: Click bar filters table.
96 | Visual-Level Filter | Applies to one visual. Example: Top 10 only.
97 | Warehouse Endpoint | SQL access to Lakehouse. Example: SSMS connection.
98 | Workspace App Audience | Targeted content. Example: Exec vs Sales.
99 | Workspace Role | Access level. Example: Viewer, Member.
100 | XMLA Endpoint | Advanced model management. Example: Tabular Editor.
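
The Bronze, Silver, and Gold Layer entries (the Medallion Architecture) can be pictured with a minimal, engine-agnostic sketch in plain Python. The sample rows, column names, and the function names `to_silver` and `to_gold` are hypothetical; in Fabric this kind of work would more likely run in a notebook or a Dataflow Gen2 against Lakehouse tables than on in-memory lists.

```python
from collections import defaultdict

# Bronze: raw rows exactly as ingested (hypothetical sample data, e.g. from a CSV).
bronze = [
    {"order_id": "1", "region": " west ", "amount": "100.50"},
    {"order_id": "2", "region": "East", "amount": "20"},
    {"order_id": "2", "region": "East", "amount": "20"},   # duplicate row
    {"order_id": "3", "region": "WEST", "amount": "n/a"},  # unparseable amount
]


def to_silver(rows):
    """Clean and standardize: cast types, tidy text, drop bad rows and duplicates."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # keep only the first occurrence of each order
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "region": row["region"].strip().title(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate to a business-ready shape: total sales per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer is derived only from the one before it, so the raw Bronze rows stay untouched and can be reprocessed if the cleaning rules change.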

Thanks for reading!

Glossary – 100 “Data Quality & Data Validation” terms

Below is a glossary that includes 100 common “Data Quality & Data Validation” terms and phrases in alphabetical order. Enjoy!

Term | Definition & Example
Business Rule | Business-defined constraint on data. Example: Credit limit approval rules.
Check Constraint | SQL rule enforcing condition. Example: Age > 0.
Constraint | Rule enforced at database level. Example: NOT NULL constraint.
Continuous Validation | Ongoing automated validation. Example: Streaming pipelines.
Corrective Control | Fixes identified errors. Example: Data reload.
Data Accuracy | Degree to which data correctly represents reality. Example: Correct customer addresses.
Data Accuracy Rate | Percentage of correct values. Example: 99.5% accurate.
Data Anomaly | Unexpected or suspicious data value. Example: Sudden traffic spike.
Data Bias | Systematic data distortion. Example: Sampling bias.
Data Certification | Marking trusted datasets. Example: Certified gold tables.
Data Cleansing | Correcting or removing invalid data. Example: Fixing malformed phone numbers.
Data Completeness | Presence of all required data elements. Example: No missing customer IDs.
Data Completeness Rate | Percentage of populated fields. Example: 97% filled.
Data Confidence | Trust users have in data. Example: Executive reporting trust.
Data Conformance | Adherence to standards or schemas. Example: ISO country codes.
Data Consistency | Uniformity of data across systems. Example: Same currency code everywhere.
Data Deduplication | Removing duplicate records. Example: Merge customer profiles.
Data Defect | Specific instance of poor quality. Example: Invalid customer record.
Data Drift | Gradual change in data patterns. Example: Customer behavior shifts.
Data Enrichment | Enhancing data with additional attributes. Example: Adding demographic data.
Data Error | Incorrect or invalid data value. Example: Misspelled city name.
Data Exception | Approved rule deviation. Example: Legacy records.
Data Exception Handling | Process for managing violations. Example: Manual review.
Data Freshness | How current the data is. Example: Last updated timestamp.
Data Governance | Framework overseeing data quality. Example: Stewardship model.
Data Imputation | Filling missing values. Example: Replacing null with average.
Data Integrity | Accuracy and consistency over the lifecycle. Example: Foreign key relationships enforced.
Data Issue | Identified quality problem. Example: Missing values.
Data Latency | Delay between event and availability. Example: 2-hour ingestion lag.
Data Lineage | Tracking data flow and transformations. Example: Source to dashboard.
Data Matching | Identifying records referring to same entity. Example: Customer record linkage.
Data Noise | Irrelevant or misleading data. Example: Test records in prod.
Data Observability | Visibility into data health and behavior. Example: Pipeline monitoring.
Data Ownership | Accountability for data quality. Example: Business owner.
Data Precision | Level of detail in data. Example: Decimal places.
Data Profiling | Analyzing data to understand structure and quality. Example: Null percentage analysis.
Data Quality | Measure of how fit data is for its intended use. Example: Accurate sales totals in reports.
Data Quality Alert | Notification of quality issue. Example: Slack alert.
Data Quality Audit | Formal assessment of data quality. Example: Quarterly review.
Data Quality Automation | Automated quality processes. Example: CI/CD checks.
Data Quality Backlog | Tracked list of quality issues. Example: Jira tickets.
Data Quality Benchmark | Comparison standard. Example: Industry averages.
Data Quality Dashboard | Visual view of quality metrics. Example: Completeness trends.
Data Quality Dimension | Category used to measure quality. Example: Accuracy, completeness.
Data Quality Framework | Structured quality approach. Example: DAMA dimensions.
Data Quality Incident | Major quality failure. Example: Incorrect financial report.
Data Quality KPI | Metric tracking quality performance. Example: Duplicate rate.
Data Quality Maturity | Level of quality capability. Example: Reactive vs proactive.
Data Quality Monitoring | Ongoing quality measurement. Example: Daily freshness checks.
Data Quality Ownership Matrix | Mapping quality responsibility. Example: RACI chart.
Data Quality Program | Organization-wide quality initiative. Example: Enterprise DQ strategy.
Data Quality Regression | Reintroduced quality issue. Example: After schema change.
Data Quality Rule Engine | System executing validation rules. Example: Automated checks.
Data Quality Rule Violation | Failure to meet a rule. Example: Negative balance.
Data Quality Score | Numeric representation of data quality. Example: 98% completeness.
Data Quality SLA | Quality expectations agreement. Example: 99% accuracy target.
Data Quality SLA Breach | Failure to meet quality targets. Example: Accuracy below SLA.
Data Quality Trend | Quality performance over time. Example: Monthly improvement.
Data Reconciliation | Comparing datasets for consistency. Example: Finance system vs warehouse.
Data Reliability | Consistent data performance over time. Example: Stable metrics.
Data Remediation | Fixing data quality issues. Example: Reprocessing failed loads.
Data Sampling | Checking subset of data. Example: Random record review.
Data Standardization | Transforming data into a common format. Example: Converting dates to ISO format.
Data Steward | Role responsible for data quality. Example: Customer data steward.
Data Threshold | Acceptable quality limit. Example: ≤ 1% nulls.
Data Timeliness | Data availability within required timeframes. Example: Daily data refresh by 6 AM.
Data Trust Score | Composite measure of reliability. Example: Internal trust index.
Data Uniqueness | No unintended duplicates exist. Example: One row per customer.
Data Validation | Process of checking data against rules. Example: Rejecting invalid dates.
Data Validation Pipeline | Automated validation process. Example: Ingestion checks.
Data Validity | Data conforms to defined formats and rules. Example: Email follows standard pattern.
Data Verification | Confirming data accuracy. Example: Source system comparison.
Detective Control | Finds errors after entry. Example: Quality audits.
Domain Validation | Restricting values to a set. Example: Status = Active/Inactive.
Downstream Validation | Validating analytical outputs. Example: Dashboard totals.
Duplicate Detection | Identifying duplicate records. Example: Same email address twice.
Error Rate | Proportion of invalid records. Example: 2% failures.
Foreign Key | Reference to another table. Example: Order → Customer.
Format Validation | Ensuring correct data format. Example: YYYY-MM-DD dates.
Golden Dataset | Highest-quality dataset version. Example: Curated finance data.
Hard Validation | Blocking invalid data. Example: Reject invalid IDs.
Null Check | Ensuring required fields are populated. Example: Order ID not null.
Outlier Detection | Identifying abnormal values. Example: Negative revenue amounts.
Pattern Matching | Validating via regex patterns. Example: Postal code validation.
Post-Load Validation | Checks after data load. Example: Row count comparisons.
Pre-Load Validation | Checks before data ingestion. Example: File schema validation.
Preventive Control | Stops errors before entry. Example: Input validation.
Primary Key | Unique record identifier. Example: CustomerID.
Quality Gate | Mandatory validation checkpoint. Example: Before publishing data.
Range Validation | Checking values fall within limits. Example: Age between 0 and 120.
Referential Integrity | Valid relationships between tables. Example: Orders reference valid customers.
Root Cause Analysis | Identifying source of data issues. Example: ETL failure investigation.
Schema Validation | Checking data structure against schema. Example: Column data types.
Soft Validation | Warning without rejecting data. Example: Flag unusual values.
Source System Validation | Checking upstream data. Example: CRM record checks.
Statistical Validation | Using statistics to validate data. Example: Distribution checks.
Trusted Dataset | Data approved for consumption. Example: Executive KPIs.
Validation Coverage | Proportion of data checked. Example: 100% of critical fields.
Validation Rule | Condition data must satisfy. Example: Quantity must be ≥ 0.
Validation Threshold | Limit triggering failure. Example: >5% nulls.
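
Several of the entries above (Null Check, Range Validation, Format Validation, Hard vs. Soft Validation, Error Rate, Data Completeness Rate) fit together in a small sketch. The records, field names, and the deliberately simple email pattern are hypothetical; a real pipeline would normally use a validation framework or database constraints instead of hand-written checks.

```python
import re
from datetime import datetime

# Hypothetical records; field names and rules are illustrative assumptions.
records = [
    {"order_id": "A1", "age": 34, "order_date": "2024-03-01", "email": "a@example.com"},
    {"order_id": None, "age": 29, "order_date": "2024-03-02", "email": "b@example.com"},
    {"order_id": "A3", "age": 150, "order_date": "2024-03-03", "email": None},
    {"order_id": "A4", "age": 41, "order_date": "03/04/2024", "email": "d@example.com"},
]

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")  # deliberately simple


def is_iso_date(value):
    """Format validation: YYYY-MM-DD shape and a real calendar date."""
    if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def validate(record):
    """Return (hard_errors, warnings) for one record."""
    errors, warnings = [], []
    if record["order_id"] is None:                 # null check (hard)
        errors.append("order_id is null")
    if not 0 <= record["age"] <= 120:              # range validation (hard)
        errors.append("age out of range")
    if not is_iso_date(record["order_date"]):      # format validation (hard)
        errors.append("order_date not YYYY-MM-DD")
    email = record["email"]
    if email is None or not EMAIL_PATTERN.fullmatch(email):
        warnings.append("email missing or malformed")  # soft validation: flag only
    return errors, warnings


accepted, rejected = [], []
for rec in records:
    errors, _warnings = validate(rec)
    (rejected if errors else accepted).append(rec)

error_rate = len(rejected) / len(records)
email_completeness = sum(r["email"] is not None for r in records) / len(records)
print(f"error rate: {error_rate:.0%}, email completeness: {email_completeness:.0%}")
```

Hard rules decide whether a record is rejected, while the soft rule only produces a warning, which mirrors the Hard Validation and Soft Validation entries.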

Glossary – 100 “AI” Terms

Below is a glossary that includes 100 common “AI (Artificial Intelligence)” terms and phrases in alphabetical order. Enjoy!

Term | Definition & Example
Accuracy | Percentage of correct predictions. Example: 92% accuracy.
Agent | AI entity performing tasks autonomously. Example: Task-planning agent.
AI Alignment | Ensuring AI goals match human values. Example: Safe AI systems.
AI Bias | Systematic unfairness in AI outcomes. Example: Biased hiring models.
Algorithm | A set of rules used to train models. Example: Decision tree algorithm.
Artificial General Intelligence (AGI) | Hypothetical AI with human-level intelligence. Example: Broad reasoning across tasks.
Artificial Intelligence (AI) | Systems that perform tasks requiring human-like intelligence. Example: Chatbots answering questions.
Artificial Neural Network (ANN) | A network of interconnected artificial neurons. Example: Credit scoring models.
Attention Mechanism | Focuses model on relevant input parts. Example: Language translation.
AUC | Area under ROC curve. Example: Model comparison.
AutoML | Automated model selection and tuning. Example: Auto-generated models.
Autonomous System | AI operating with minimal human input. Example: Self-driving cars.
Backpropagation | Method to update neural network weights. Example: Deep learning training.
Batch | Subset of data processed at once. Example: Batch size of 32.
Batch Inference | Predictions made in bulk. Example: Nightly scoring jobs.
Bias (Model Bias) | Error from oversimplified assumptions. Example: Linear model on non-linear data.
Bias–Variance Tradeoff | Balance between bias and variance. Example: Choosing model complexity.
Black Box Model | Model with opaque internal logic. Example: Deep neural networks.
Classification | Predicting categorical outcomes. Example: Email spam classification.
Clustering | Grouping similar data points. Example: Customer segmentation.
Computer Vision | AI for interpreting images and video. Example: Facial recognition.
Concept Drift | Changes in underlying relationships. Example: Fraud patterns evolving.
Confusion Matrix | Table evaluating classification results. Example: True positives vs false positives.
Data Augmentation | Expanding data via transformations. Example: Image rotation.
Data Drift | Changes in input data distribution. Example: New user demographics.
Data Leakage | Using future information in training. Example: Including test labels.
Decision Tree | Tree-based decision model. Example: Loan approval logic.
Deep Learning | ML using multi-layer neural networks. Example: Image recognition.
Dimensionality Reduction | Reducing number of features. Example: PCA for visualization.
Edge AI | AI running on local devices. Example: Smart cameras.
Embedding | Numerical representation of data. Example: Word embeddings.
Ensemble Model | Combining multiple models. Example: Random forest.
Epoch | One full pass through training data. Example: 50 training epochs.
Ethics in AI | Moral considerations in AI use. Example: Avoiding bias.
Explainable AI (XAI) | Making AI decisions understandable. Example: Feature importance charts.
F1 Score | Harmonic mean of precision and recall. Example: Imbalanced datasets.
Fairness | Equitable AI outcomes across groups. Example: Equal approval rates.
Feature | An input variable for a model. Example: Customer age.
Feature Engineering | Creating or transforming features to improve models. Example: Calculating customer tenure.
Federated Learning | Training models across decentralized data. Example: Mobile keyboard predictions.
Few-Shot Learning | Learning from few examples. Example: Custom classification with few samples.
Fine-Tuning | Further training a pre-trained model. Example: Custom chatbot training.
Generalization | Model’s ability to perform on new data. Example: Accurate predictions on unseen data.
Generative AI | AI that creates new content. Example: Text or image generation.
Gradient Boosting | Sequentially improving weak models. Example: XGBoost.
Gradient Descent | Optimization technique adjusting weights iteratively. Example: Training neural networks.
Hallucination | Model generates incorrect information. Example: False factual claims.
Hyperparameter | Configuration set before training. Example: Learning rate.
Inference | Using a trained model to predict. Example: Real-time recommendations.
K-Means | Clustering algorithm. Example: Market segmentation.
Knowledge Graph | Graph-based representation of knowledge. Example: Search engines.
Label | The correct output for supervised learning. Example: “Fraud” or “Not Fraud”.
Large Language Model (LLM) | AI trained on massive text corpora. Example: ChatGPT.
Loss Function | Measures model error during training. Example: Mean squared error.
Machine Learning (ML) | AI that learns patterns from data without explicit programming. Example: Spam email detection.
MLOps | Practices for managing ML lifecycle. Example: CI/CD for models.
Model | A trained mathematical representation of patterns. Example: Logistic regression model.
Model Deployment | Making a model available for use. Example: API-based predictions.
Model Drift | Model performance degradation over time. Example: Changing customer behavior.
Model Interpretability | Ability to understand model behavior. Example: Decision tree visualization.
Model Versioning | Tracking model changes. Example: v1 vs v2 models.
Monitoring | Tracking model performance in production. Example: Accuracy alerts.
Multimodal AI | AI handling multiple data types. Example: Text + image models.
Naive Bayes | Probabilistic classification algorithm. Example: Spam filtering.
Natural Language Processing (NLP) | AI for understanding human language. Example: Sentiment analysis.
Neural Network | Model inspired by the human brain’s structure. Example: Handwritten digit recognition.
Optimization | Process of minimizing loss. Example: Gradient descent.
Overfitting | Model learns noise instead of patterns. Example: Perfect training accuracy, poor test accuracy.
Pipeline | Automated ML workflow. Example: Training-to-deployment flow.
Precision | Share of predicted positives that are actually positive. Example: Fraud detection precision.
Pretrained Model | Model trained on general data. Example: GPT models.
Principal Component Analysis (PCA) | Technique for dimensionality reduction. Example: Compressing high-dimensional data.
Privacy | Protecting personal data. Example: Anonymizing training data.
Prompt | Input instruction for generative models. Example: “Summarize this text.”
Prompt Engineering | Crafting effective prompts. Example: Improving LLM responses.
Random Forest | Ensemble of decision trees. Example: Classification tasks.
Real-Time Inference | Immediate predictions on live data. Example: Fraud detection.
Recall | Share of actual positives that the model finds. Example: Cancer detection.
Regression | Predicting numeric values. Example: Sales forecasting.
Reinforcement Learning | Learning through rewards and penalties. Example: Game-playing AI.
Reproducibility | Ability to recreate results. Example: Fixed random seeds.
Robotics | AI applied to physical machines. Example: Warehouse robots.
ROC Curve | Performance visualization for classifiers. Example: Threshold analysis.
Semi-Supervised Learning | Mix of labeled and unlabeled data. Example: Image classification with limited labels.
Speech Recognition | Converting speech to text. Example: Voice assistants.
Supervised Learning | Learning using labeled data. Example: Predicting house prices from known values.
Support Vector Machine (SVM) | Algorithm separating data with margins. Example: Text classification.
Synthetic Data | Artificially generated data. Example: Privacy-safe training.
Test Data | Data used to evaluate final model performance. Example: Held-out test set.
Threshold | Cutoff for classification decisions. Example: Probability > 0.7.
Token | Smallest unit of text processed by models. Example: Words or subwords.
Training Data | Data used to teach a model. Example: Historical sales records.
Transfer Learning | Reusing knowledge from another task. Example: Image model reused for medical scans.
Transformer | Neural architecture for sequence data. Example: Language translation models.
Underfitting | Model too simple to capture patterns. Example: High error on all datasets.
Unsupervised Learning | Learning from unlabeled data. Example: Customer clustering.
Validation Data | Data used to tune hyperparameters and compare models. Example: Hyperparameter selection.
Variance | Error from sensitivity to data fluctuations. Example: Highly complex model.
XGBoost | Optimized gradient boosting algorithm. Example: Kaggle competitions.
Zero-Shot Learning | Performing tasks without examples. Example: Classifying unseen labels.
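
The Confusion Matrix, Accuracy, Precision, Recall, and F1 Score entries are linked by simple arithmetic, sketched below for a binary classifier. The label lists and function names are hypothetical; in practice a library such as scikit-learn would usually compute these metrics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # of actual positives, how many were found
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)            # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = "Fraud", 0 = "Not Fraud".
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With three true positives, one false positive, one false negative, and five true negatives, accuracy comes out at 0.8 while precision, recall, and F1 are each 0.75.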

Please share your suggestions for any terms that should be added.

Glossary – 100 “Data Engineering” Terms

Below is a glossary that includes 100 common “Data Engineering” terms and phrases in alphabetical order. Enjoy!

TermDefinition & Example
Access ControlManaging who can access data. Example: Role-based permissions.
At-Least-Once ProcessingData may be processed more than once. Example: Duplicate-safe pipelines.
At-Most-Once ProcessingData processed zero or one time. Example: No retries on failure.
BackfillProcessing historical data. Example: Reloading last year’s data.
Batch ProcessingProcessing data in scheduled chunks. Example: Daily sales aggregation.
Blue-Green DeploymentDeployment strategy minimizing downtime. Example: Switching pipeline versions.
Canary ReleaseGradual rollout to detect issues. Example: New pipeline tested on 5% of data.
Change Data Capture (CDC)Capturing database changes. Example: Streaming updates from OLTP DB.
CheckpointingSaving progress during processing. Example: Spark streaming checkpoints.
Cloud StorageScalable remote data storage. Example: Azure Data Lake Storage.
Cold StorageLow-cost storage for infrequent access. Example: Archived logs.
Columnar StorageData stored by column instead of row. Example: Parquet files.
CompressionReducing data size. Example: Gzip-compressed files.
Compute EngineSystem performing data processing. Example: Spark cluster.
Consumption LayerData prepared for analytics. Example: Gold layer.
Cost OptimizationReducing infrastructure costs. Example: Query optimization.
Curated LayerCleaned and transformed data. Example: Silver layer.
DAG (Directed Acyclic Graph)Workflow structure with dependencies. Example: Airflow pipeline.
Data CatalogSearchable inventory of data assets. Example: Azure Purview.
Data ContractAgreement defining data structure and expectations. Example: Producer guarantees column names and types.
Data EngineeringThe practice of designing, building, and maintaining data systems. Example: Creating pipelines that feed analytics dashboards.
Data GovernancePolicies for data management and usage. Example: Access control rules.
Data IngestionCollecting data from source systems. Example: Ingesting API data hourly.
Data LakeCentralized storage for raw data. Example: S3-based data lake.
Data LatencyTime delay in data availability. Example: 5-minute pipeline delay.
Data LineageTracking data flow from source to output. Example: Source-to-dashboard trace.
Data MartSubset of warehouse for specific use. Example: Finance data mart.
Data MaskingObscuring sensitive data. Example: Masked credit card numbers.
Data MeshDomain-oriented decentralized data ownership. Example: Teams own their data products.
Data ModelingDesigning data structures for usage. Example: Star schema design.
Data ObservabilityMonitoring data health and pipelines. Example: Freshness alerts.
Data Partition PruningSkipping irrelevant partitions. Example: Querying one date only.
Data PipelineAn automated process that moves and transforms data. Example: Nightly ETL job from CRM to warehouse.
Data PlatformIntegrated set of data tools. Example: End-to-end analytics stack.
Data ProductA dataset treated as a product. Example: Curated customer table.
Data ProfilingAnalyzing data characteristics. Example: Value distributions.
Data QualityAccuracy, completeness, and reliability of data. Example: No duplicate records.
Data ReplayReprocessing historical events. Example: Rebuilding aggregates from logs.
Data RetentionRules for data lifespan. Example: Delete logs after 1 year.
Data SecurityProtecting data from unauthorized access. Example: Encryption at rest.
Data SerializationConverting data for storage or transport. Example: Avro encoding.
Data SinkThe destination where data is stored. Example: Data warehouse.
Data SourceThe origin of data. Example: ERP system, SaaS application.
Data ValidationEnsuring data meets expectations. Example: Null checks.
Data VersioningTracking dataset changes. Example: Snapshot tables.
Data WarehouseOptimized storage for analytics queries. Example: Azure Synapse Analytics.
Dead Letter Queue (DLQ)Storage for failed records. Example: Invalid messages routed for review.
Dimension TableTable storing descriptive attributes. Example: Customer details.
ELTExtract, Load, Transform approach. Example: Transforming data inside Snowflake.
ETLExtract, Transform, Load process. Example: Cleaning data before loading into a database.
Event TimeTimestamp when event occurred. Example: User click time.
Event-Driven ArchitectureSystems reacting to events in real time. Example: Trigger pipeline on file arrival.
Exactly-Once ProcessingEnsuring data is processed only once. Example: Preventing duplicate events.
Fact TableTable storing quantitative measures. Example: Order transactions.
Fault ToleranceSystem resilience to failures. Example: Node failure recovery.
File FormatHow data is stored on disk. Example: Parquet, CSV.
Foreign KeyField linking tables together. Example: CustomerID in orders table.
Full LoadReloading all data. Example: Initial table population.
High AvailabilitySystem uptime and reliability. Example: Multi-zone deployment.
Hot StorageHigh-performance storage for frequent access. Example: Real-time tables.
IdempotencyAbility to rerun pipelines safely. Example: Reprocessing without duplicates.
Incremental LoadLoading only new or changed data. Example: CDC-based ingestion.
IndexingCreating structures to speed queries. Example: Index on order date.
Infrastructure as Code (IaC)Managing infrastructure via code. Example: Terraform scripts.
LakehouseHybrid of data lake and warehouse. Example: Databricks Lakehouse.
Late-Arriving Data | Data that arrives after its expected time. Example: Delayed event logs.
Logging | Recording system events. Example: Job execution logs.
Message Queue | Buffer for asynchronous data transfer. Example: Kafka topic for events.
Metadata | Data about data. Example: Table definitions and lineage.
Metrics | Quantitative indicators of performance. Example: Rows processed per run.
Orchestration | Coordinating pipeline execution. Example: DAG scheduling.
Partitioning | Dividing data into smaller segments for performance. Example: Partitioning by date.
Personally Identifiable Information (PII) | Data identifying individuals. Example: Email addresses.
Pipeline Monitoring | Tracking pipeline execution status. Example: Failure notifications.
Primary Key | Unique identifier for a record. Example: CustomerID.
Processing Time | Timestamp when data is processed. Example: Ingestion time.
Query Optimization | Improving query efficiency. Example: Predicate pushdown.
Raw Layer | Storage of unprocessed data. Example: Bronze layer.
Real-Time Data | Data available with minimal latency. Example: Live dashboard updates.
Retry Logic | Automatic reruns on failure. Example: Retry failed ingestion job.
Scalability | Ability to handle growing workloads. Example: Auto-scaling clusters.
Scheduler | Tool managing execution timing. Example: Cron, Airflow.
Schema | The structure of a dataset. Example: Table columns and data types.
Schema Evolution | Handling schema changes over time. Example: Adding new columns safely.
Secrets Management | Secure handling of credentials. Example: Key Vault for passwords.
Semi-Structured Data | Data with a flexible, self-describing schema. Example: JSON, XML.
Serverless | Compute provisioned and scaled on demand by the provider, with no servers for the user to manage. Example: Serverless SQL pools.
Serving Layer | Layer optimized for consumption. Example: BI-ready tables.
Sharding | Distributing data across nodes. Example: User data split across servers.
Snowflake Schema | Normalized version of star schema. Example: Product broken into sub-dimensions.
Star Schema | Fact table surrounded by dimensions. Example: Sales fact with date dimension.
Stream Processing | Processing data continuously as it arrives. Example: Clickstream event processing.
Structured Data | Data with a fixed schema. Example: SQL tables.
Technical Debt | Long-term cost of quick fixes. Example: Hardcoded transformations.
Throughput | Amount of data processed per unit time. Example: Records per second.
Transformation Layer | Layer where business logic is applied. Example: dbt models.
Unstructured Data | Data without a predefined structure. Example: Images, PDFs.
Watermark | Marker recording how far data has been processed. Example: Last processed timestamp.
Windowing | Grouping stream data by time windows. Example: 5-minute aggregations.
Workload Isolation | Separating workloads to avoid contention. Example: Dedicated compute pools.
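
Three of the streaming terms above (Watermark, Windowing, and Late-Arriving Data) may be easier to picture in code. The Python sketch below is illustrative only: the function names `window_start` and `process_new_events`, the sample events, and the choice to treat the watermark as the last processed event timestamp are all assumptions made for this example, not the behavior of any particular platform.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)   # assumed window size, matching the 5-minute example
EPOCH = datetime(1970, 1, 1)


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 5-minute tumbling window."""
    return ts - ((ts - EPOCH) % WINDOW)


def process_new_events(events, watermark):
    """Count events per window, skipping anything at or before the watermark.

    Returns the per-window counts and the advanced watermark.
    """
    counts = defaultdict(int)
    new_watermark = watermark
    for ts, _payload in events:
        if ts <= watermark:
            continue  # already covered by an earlier run
        counts[window_start(ts)] += 1
        new_watermark = max(new_watermark, ts)
    return dict(counts), new_watermark


# Hypothetical sample data: (event timestamp, payload)
events = [
    (datetime(2025, 1, 1, 9, 1), "a"),
    (datetime(2025, 1, 1, 9, 4), "b"),
    (datetime(2025, 1, 1, 9, 7), "c"),
]
counts, watermark = process_new_events(events, datetime(2025, 1, 1, 9, 1))
# counts has one event in the 09:00 window and one in the 09:05 window;
# watermark is now 09:07
```

In this simplified form, an event that shows up late with a timestamp at or before the watermark is silently skipped, which is one reason real pipelines usually define an explicit policy for late-arriving data.
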

Please share your suggestions for any terms that should be added.

Glossary – 100 “Data Analysis” Terms

Below is a glossary that includes 100 common “Data Analysis” terms and phrases in alphabetical order. Enjoy!

Term | Definition & Example
A/B Test | Comparing two variations to measure impact. Example: Two webpage layouts.
Actionable Insight | An insight that leads to a clear decision. Example: Improve onboarding experience.
Ad Hoc Analysis | One-off analysis for a specific question. Example: Investigating a sudden sales dip.
Aggregation | Summarizing data using functions like sum or average. Example: Total revenue by region.
Analytical Maturity | Organization’s capability to use data effectively. Example: Moving from descriptive to predictive analytics.
Bar Chart | A chart comparing categories. Example: Sales by region.
Baseline | A reference point for comparison. Example: Last year’s sales used as baseline.
Benchmark | A standard used to compare performance. Example: Industry average churn rate.
Bias | Systematic error in data or analysis. Example: Surveying only active users.
Business Question | A decision-focused question data aims to answer. Example: Which products drive profit?
Causation | A relationship where one variable causes another. Example: Price cuts causing sales growth.
Confidence Interval | Range likely to contain the true value of a parameter at a stated confidence level. Example: 95% CI for average sales.
Correlation | A statistical relationship between variables. Example: Sales and marketing spend.
Cumulative Total | A running total over time. Example: Year-to-date revenue.
Dashboard | A visual collection of key metrics. Example: Executive sales dashboard.
Data | Raw facts or measurements collected for analysis. Example: Sales transactions, sensor readings, survey responses.
Data Anomaly | Unexpected or unusual data pattern. Example: Sudden spike in user signups.
Data Cleaning | Correcting or removing inaccurate data. Example: Fixing misspelled country names.
Data Consistency | Uniform representation across datasets. Example: Same currency used everywhere.
Data Governance | Policies ensuring data quality, security, and usage. Example: Defined data ownership roles.
Data Imputation | Replacing missing values with estimated ones. Example: Filling null ages with the median.
Data Lineage | Tracking data origin and transformations. Example: Tracing metrics back to source systems.
Data Literacy | Ability to read, understand, and use data. Example: Interpreting charts correctly.
Data Model | The structure defining how data tables relate. Example: Star schema.
Data Pipeline | Automated flow of data from source to destination. Example: Daily ingestion job.
Data Profiling | Analyzing data characteristics. Example: Checking null percentages.
Data Quality | The accuracy, completeness, and reliability of data. Example: Valid dates and consistent formats.
Data Refresh | Updating data with the latest values. Example: Nightly refresh.
Data Refresh Frequency | How often data is updated. Example: Hourly vs. daily refresh.
Data Skewness | Degree of asymmetry in data distribution. Example: Income data skewed to the right.
Data Source | The origin of data. Example: SQL database, API.
Data Storytelling | Communicating insights using narrative and visuals. Example: Executive-ready presentation.
Data Transformation | Modifying data to improve usability or consistency. Example: Converting text dates to date data types.
Data Validation | Ensuring data meets rules and expectations. Example: No negative quantities.
Data Wrangling | Transforming raw data into a usable format. Example: Reshaping columns for analysis.
Dataset | A structured collection of related data. Example: A table of customer orders with dates, amounts, and regions.
Derived Metric | A metric calculated from other metrics. Example: Profit margin = Profit / Revenue.
Descriptive Analytics | Analysis that explains what happened. Example: Last quarter’s sales summary.
Diagnostic Analytics | Analysis that explains why something happened. Example: Revenue drop due to fewer customers.
Dice | Filtering data by multiple dimensions. Example: Sales for 2025 in the West region.
Dimension | A descriptive attribute used to slice data. Example: Date, region, product.
Dimension Table | A table containing descriptive attributes. Example: Product details.
Dimensionality | Number of features or variables in data. Example: High-dimensional customer data.
Distribution | How values are spread across a range. Example: Income distribution.
Drill Down | Navigating from summary to detail. Example: Yearly sales → monthly sales.
Drill Through | Jumping to a detailed view for a specific value. Example: Clicking a region to see store data.
ELT | Extract, Load, Transform approach. Example: Transforming data inside a warehouse.
ETL | Extract, Transform, Load process. Example: Loading CRM data into a warehouse.
Exploratory Data Analysis (EDA) | Initial investigation to understand data. Example: Visualizing distributions.
Fact Table | A table containing quantitative data. Example: Sales transactions.
Feature | An individual measurable property used in analysis. Example: Customer age used in churn analysis.
Feature Engineering | Creating new features from existing data. Example: Calculating customer tenure from signup date.
Filtering | Limiting data to a subset of interest. Example: Only orders from 2025.
Granularity | The level of detail in the data. Example: Daily sales vs. monthly sales.
Grouping | Organizing data into categories before aggregation. Example: Sales grouped by product category.
Histogram | A chart showing the frequency distribution of a numeric variable. Example: Frequency of order sizes.
Hypothesis | A testable assumption. Example: Discounts increase sales.
Incremental Load | Loading only new or changed data. Example: Yesterday’s transactions.
Insight | A meaningful finding that informs action. Example: High churn among new users.
KPI (Key Performance Indicator) | A critical metric tied to business objectives. Example: Monthly churn rate.
Kurtosis | Measure of how heavy the tails of a distribution are. Example: Detecting extreme outliers.
Latency | Delay between data generation and availability. Example: Real-time vs. daily data.
Line Chart | A chart showing trends over time. Example: Monthly revenue trend.
Mean | The arithmetic average. Example: Average order value.
Measure | A calculated numeric value, often aggregated. Example: SUM(Sales).
Median | The middle value in ordered data. Example: Median household income.
Metric | A quantifiable measure used to track performance. Example: Total sales, average order value.
Missing Values | Data points that are absent or null. Example: Blank customer age values.
Mode | The most frequent value. Example: Most common product category.
Multivariate Analysis | Analyzing multiple variables simultaneously. Example: Studying price, demand, and seasonality.
Normalization | Scaling data to a common range. Example: Normalizing values between 0 and 1.
Observation | A single record or row in a dataset. Example: One customer’s purchase history.
Outlier | A data point significantly different from others. Example: An unusually large transaction amount.
Percentile | Value below which a given percentage of data falls. Example: 90th percentile response time.
Population | The full set of items of interest. Example: All customers.
Predictive Analytics | Analysis that forecasts future outcomes. Example: Predicting next month’s demand.
Prescriptive Analytics | Analysis that suggests actions. Example: Recommending price changes.
Quartile | Values dividing ordered data into four equal parts. Example: Q1, Q2, Q3.
Report | A structured presentation of analysis results. Example: Monthly performance report.
Reproducibility | Ability to recreate analysis results consistently. Example: Using versioned datasets.
Rolling Average | An average calculated over a moving window. Example: 7-day rolling average of sales.
Root Cause Analysis | Identifying the underlying cause of an issue. Example: Revenue loss due to inventory shortages.
Sample | A subset of a population. Example: Survey respondents.
Sampling Bias | Bias introduced by non-random samples. Example: Feedback collected only from power users.
Scatter Plot | A chart showing relationships between two variables. Example: Ad spend vs. revenue.
Seasonality | Repeating patterns tied to time cycles. Example: Holiday sales spikes.
Semi-Structured Data | Data with flexible structure. Example: JSON files.
Sensitivity Analysis | Evaluating how outcomes change with inputs. Example: Impact of price changes on profit.
Slice | Filtering data by a single dimension. Example: Sales for 2025 only.
Snapshot | Data captured at a specific point in time. Example: End-of-month balances.
Snowflake Schema | A normalized version of a star schema. Example: Product broken into sub-tables.
Standard Deviation | Typical spread of values around the mean, calculated as the square root of the variance. Example: Consistency of sales performance.
Standardization | Rescaling data to have mean 0 and standard deviation 1. Example: Preparing data for regression analysis.
Star Schema | A data model with facts surrounded by dimensions. Example: Sales fact with product and date dimensions.
Structured Data | Data with a fixed schema. Example: Relational tables.
Time Series | Data indexed by time. Example: Daily stock prices.
Trend | A general direction in data over time. Example: Increasing monthly revenue.
Unstructured Data | Data without a predefined schema. Example: Emails, images.
Variable | A characteristic or attribute that can take different values. Example: Age, revenue, product category.
Variance | Average squared deviation from the mean. Example: Variance in delivery times.
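
A few of the statistical terms above (Mean, Median, Variance, Standard Deviation, Normalization, Standardization, and Rolling Average) can be illustrated with Python's standard `statistics` module. This is a minimal sketch rather than a recommended implementation: the `sales` values and the helper names are made up for the example, and it uses the population formulas (`pvariance`, `pstdev`), whereas the sample versions divide by n - 1 instead of n.

```python
import statistics

# Hypothetical daily sales figures; the last value is an outlier
sales = [10.0, 12.0, 14.0, 16.0, 48.0]

mean = statistics.mean(sales)           # arithmetic average: 20.0
median = statistics.median(sales)       # middle value: 14.0
variance = statistics.pvariance(sales)  # average squared deviation: 200.0
std_dev = statistics.pstdev(sales)      # square root of the variance


def min_max_normalize(values):
    """Normalization: rescale values to the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: rescale to mean 0 and standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def rolling_average(values, window):
    """Rolling average: mean of each consecutive run of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


normalized = min_max_normalize(sales)   # first value 0.0, last value 1.0
z_scores = standardize(sales)
rolling = rolling_average(sales, 3)     # [12.0, 14.0, 26.0]
```

The outlier pulls the mean (20.0) well above the median (14.0), which is the kind of right skew the Data Skewness entry describes.
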

Please share your suggestions for any terms that should be added.