Below is a glossary that includes 100 common “Data Analysis” terms and phrases in alphabetical order. Enjoy!
| Term | Definition & Example |
| A/B Test | Comparing two variations to measure impact. Example: Two webpage layouts. |
| Actionable Insight | An insight that leads to a clear decision. Example: Improve onboarding experience. |
| Ad Hoc Analysis | One-off analysis for a specific question. Example: Investigating a sudden sales dip. |
| Aggregation | Summarizing data using functions like sum or average. Example: Total revenue by region. |
| Analytical Maturity | Organization’s capability to use data effectively. Example: Moving from descriptive to predictive analytics. |
| Bar Chart | A chart comparing categories. Example: Sales by region. |
| Baseline | A reference point for comparison. Example: Last year’s sales used as baseline. |
| Benchmark | A standard used to compare performance. Example: Industry average churn rate. |
| Bias | Systematic error in data or analysis. Example: Surveying only active users. |
| Business Question | A decision-focused question data aims to answer. Example: Which products drive profit? |
| Causation | A relationship where one variable causes another. Example: Price cuts causing sales growth. |
| Confidence Interval | Range likely containing a true value. Example: 95% CI for average sales. |
| Correlation | A statistical relationship between variables. Example: Sales and marketing spend. |
| Cumulative Total | A running total over time. Example: Year-to-date revenue. |
| Dashboard | A visual collection of key metrics. Example: Executive sales dashboard. |
| Data | Raw facts or measurements collected for analysis. Example: Sales transactions, sensor readings, survey responses. |
| Data Anomaly | Unexpected or unusual data pattern. Example: Sudden spike in user signups. |
| Data Cleaning | Correcting or removing inaccurate data. Example: Fixing misspelled country names. |
| Data Consistency | Uniform representation across datasets. Example: Same currency used everywhere. |
| Data Governance | Policies ensuring data quality, security, and usage. Example: Defined data ownership roles. |
| Data Imputation | Replacing missing values with estimated ones. Example: Filling null ages with the median. |
| Data Lineage | Tracking data origin and transformations. Example: Tracing metrics back to source systems. |
| Data Literacy | Ability to read, understand, and use data. Example: Interpreting charts correctly. |
| Data Model | The structure defining how data tables relate. Example: Star schema. |
| Data Pipeline | Automated flow of data from source to destination. Example: Daily ingestion job. |
| Data Profiling | Analyzing data characteristics. Example: Checking null percentages. |
| Data Quality | The accuracy, completeness, and reliability of data. Example: Valid dates and consistent formats. |
| Data Refresh | Updating data with the latest values. Example: Nightly refresh. |
| Data Refresh Frequency | How often data is updated. Example: Hourly vs. daily refresh. |
| Data Skewness | Degree of asymmetry in data distribution. Example: Income data skewed to the right. |
| Data Source | The origin of data. Example: SQL database, API. |
| Data Storytelling | Communicating insights using narrative and visuals. Example: Executive-ready presentation. |
| Data Transformation | Modifying data to improve usability or consistency. Example: Converting text dates to date data types. |
| Data Validation | Ensuring data meets rules and expectations. Example: No negative quantities. |
| Data Wrangling | Transforming raw data into a usable format. Example: Reshaping columns for analysis. |
| Dataset | A structured collection of related data. Example: A table of customer orders with dates, amounts, and regions. |
| Derived Metric | A metric calculated from other metrics. Example: Profit margin = Profit / Revenue. |
| Descriptive Analytics | Analysis that explains what happened. Example: Last quarter’s sales summary. |
| Diagnostic Analytics | Analysis that explains why something happened. Example: Revenue drop due to fewer customers. |
| Dice | Filtering data by multiple dimensions. Example: Sales for 2025 in the West region. |
| Dimension | A descriptive attribute used to slice data. Example: Date, region, product. |
| Dimension Table | A table containing descriptive attributes. Example: Product details. |
| Dimensionality | Number of features or variables in data. Example: High-dimensional customer data. |
| Distribution | How values are spread across a range. Example: Income distribution. |
| Drill Down | Navigating from summary to detail. Example: Yearly sales → monthly sales. |
| Drill Through | Jumping to a detailed view for a specific value. Example: Clicking a region to see store data. |
| ELT | Extract, Load, Transform approach. Example: Transforming data inside a warehouse. |
| ETL | Extract, Transform, Load process. Example: Loading CRM data into a warehouse. |
| Exploratory Data Analysis (EDA) | Initial investigation to understand data. Example: Visualizing distributions. |
| Fact Table | A table containing quantitative data. Example: Sales transactions. |
| Feature | An individual measurable property used in analysis. Example: Customer age used in churn analysis. |
| Feature Engineering | Creating new features from existing data. Example: Calculating customer tenure from signup date. |
| Filtering | Limiting data to a subset of interest. Example: Only orders from 2025. |
| Granularity | The level of detail in the data. Example: Daily sales vs. monthly sales. |
| Grouping | Organizing data into categories before aggregation. Example: Sales grouped by product category. |
| Histogram | A chart showing data distribution. Example: Frequency of order sizes. |
| Hypothesis | A testable assumption. Example: Discounts increase sales. |
| Incremental Load | Loading only new or changed data. Example: Yesterday’s transactions. |
| Insight | A meaningful finding that informs action. Example: High churn among new users. |
| KPI (Key Performance Indicator) | A critical metric tied to business objectives. Example: Monthly churn rate. |
| Kurtosis | Measure of how heavy the tails of a distribution are. Example: Detecting extreme outliers. |
| Latency | Delay between data generation and availability. Example: Real-time vs. daily data. |
| Line Chart | A chart showing trends over time. Example: Monthly revenue trend. |
| Mean | The arithmetic average. Example: Average order value. |
| Measure | A calculated numeric value, often aggregated. Example: SUM(Sales). |
| Median | The middle value in ordered data. Example: Median household income. |
| Metric | A quantifiable measure used to track performance. Example: Total sales, average order value. |
| Missing Values | Data points that are absent or null. Example: Blank customer age values. |
| Mode | The most frequent value. Example: Most common product category. |
| Multivariate Analysis | Analyzing multiple variables simultaneously. Example: Studying price, demand, and seasonality. |
| Normalization | Scaling data to a common range. Example: Normalizing values between 0 and 1. |
| Observation | A single record or row in a dataset. Example: One customer’s purchase history. |
| Outlier | A data point significantly different from others. Example: An unusually large transaction amount. |
| Percentile | Value below which a percentage of data falls. Example: 90th percentile response time. |
| Population | The full set of interest. Example: All customers. |
| Predictive Analytics | Analysis that forecasts future outcomes. Example: Predicting next month’s demand. |
| Prescriptive Analytics | Analysis that suggests actions. Example: Recommending price changes. |
| Quartile | Values dividing data into four parts. Example: Q1, Q2, Q3. |
| Report | A structured presentation of analysis results. Example: Monthly performance report. |
| Reproducibility | Ability to recreate analysis results consistently. Example: Using versioned datasets. |
| Rolling Average | An average calculated over a moving window. Example: 7-day rolling average of sales. |
| Root Cause Analysis | Identifying the underlying cause of an issue. Example: Revenue loss due to inventory shortages. |
| Sample | A subset of a population. Example: Survey respondents. |
| Sampling Bias | Bias introduced by non-random samples. Example: Feedback collected only from power users. |
| Scatter Plot | A chart showing relationships between two variables. Example: Ad spend vs. revenue. |
| Seasonality | Repeating patterns tied to time cycles. Example: Holiday sales spikes. |
| Semi-Structured Data | Data with flexible structure. Example: JSON files. |
| Sensitivity Analysis | Evaluating how outcomes change with inputs. Example: Impact of price changes on profit. |
| Slice | Filtering data by a single dimension. Example: Sales for 2025 only. |
| Snapshot | Data captured at a specific point in time. Example: End-of-month balances. |
| Snowflake Schema | A normalized version of a star schema. Example: Product broken into sub-tables. |
| Standard Deviation | Average distance from the mean. Example: Consistency of sales performance. |
| Standardization | Rescaling data to have mean 0 and standard deviation 1. Example: Preparing data for regression analysis. |
| Star Schema | A data model with facts surrounded by dimensions. Example: Sales fact with product and date dimensions. |
| Structured Data | Data with a fixed schema. Example: Relational tables. |
| Time Series | Data indexed by time. Example: Daily stock prices. |
| Trend | A general direction in data over time. Example: Increasing monthly revenue. |
| Unstructured Data | Data without a predefined schema. Example: Emails, images. |
| Variable | A characteristic or attribute that can take different values. Example: Age, revenue, product category. |
| Variance | Measure of data spread. Example: Variance in delivery times. |
Please share your suggestions for any terms that should be added.
