
Implement a Star Schema for a Lakehouse or Warehouse

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Prepare data
--> Transform data
--> Implement a star schema for a lakehouse or warehouse

Designing and implementing an effective schema is foundational to efficient analytics. In Microsoft Fabric, structuring your data into a star schema dramatically improves query performance, simplifies reporting, and aligns with best practices for BI workloads.

This article explains what a star schema is, why it matters in Fabric, and how to implement it in a lakehouse or data warehouse.

What Is a Star Schema?

A star schema is a relational modeling technique that organizes data into two primary types of tables:

  • Fact tables: Contain measurable, quantitative data (metrics, transactions, events).
  • Dimension tables: Contain descriptive attributes (e.g., customer info, product details, dates).

Star schemas get their name because the design resembles a star—a central fact table linked to multiple dimension tables.

Why Use a Star Schema?

A star schema offers multiple advantages for analytical workloads:

  • Improved query performance: Queries are simplified and optimized due to straightforward joins.
  • Simpler reporting: BI tools like Power BI map naturally to star schemas.
  • Aggregations and drill-downs: Dimension tables support filtering and hierarchy reporting.
  • Better scalability: Optimized for large datasets and parallel processing.

In Fabric, both lakehouses and warehouses support star schema implementations, depending on workload and user needs.

Core Components of a Star Schema

1. Fact Tables

Fact tables store the numeric measurements of business processes.
Common characteristics:

  • Contain keys linking to dimensions
  • Often very large (many rows) but relatively narrow
  • Used for aggregations (SUM, COUNT, AVG, etc.)

Examples:
Sales transactions, inventory movement, website events

2. Dimension Tables

Dimension tables describe contextual attributes.
Common characteristics:

  • Contain descriptive fields
  • Usually smaller than fact tables
  • Often used for filtering/grouping

Examples:
Customer, product, date, geography
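
The payoff of this layout shows up in a typical analytical query: the fact table is joined to its dimensions on keys and the measures are aggregated. Below is a minimal sketch in a Fabric notebook, where the table and column names (fact_sales, dim_date, dim_customer) are illustrative assumptions and `spark` is the session that Fabric notebooks provide.

```python
# Illustrative star-schema query: join the fact to two dimensions and aggregate.
# Table and column names are hypothetical.
result = spark.sql("""
    SELECT d.calendar_year,
           c.customer_segment,
           SUM(f.sales_amount) AS total_sales,
           COUNT(*)            AS transaction_count
    FROM   fact_sales   f
    JOIN   dim_date     d ON f.date_key     = d.date_key
    JOIN   dim_customer c ON f.customer_key = c.customer_key
    GROUP BY d.calendar_year, c.customer_segment
    ORDER BY d.calendar_year, total_sales DESC
""")
result.show()
```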

Implementing a Star Schema in a Lakehouse

Lakehouses in Fabric store tables in Delta format and can be queried both through the Spark engine (notebooks, Spark SQL) and through the T-SQL analytics endpoint.

Steps to Implement:

  1. Ingest raw data into your lakehouse (as files or staging tables).
  2. Transform data:
    • Cleanse and conform fields
    • Derive business keys
  3. Create dimension tables:
    • Deduplicate
    • Add descriptive attributes
  4. Create fact tables:
    • Join transactional data to dimension keys
    • Store numeric measures
  5. Optimize:
    • Partition and Z-ORDER for performance
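
The steps above might look like the following in a Fabric notebook, sketched with PySpark. The staging table (staging_sales) and its columns are assumptions for illustration; adapt them to your own data.

```python
from pyspark.sql import functions as F

# `spark` is the session provided by Fabric notebooks; `staging_sales` is a
# hypothetical staging table already ingested into the lakehouse (step 1).
staging = spark.table("staging_sales")

# Step 3: build a dimension - deduplicate customers and add a surrogate key.
dim_customer = (
    staging
    .select("customer_id", "customer_name", "customer_segment")
    .dropDuplicates(["customer_id"])
    .withColumn("customer_key", F.monotonically_increasing_id())
)
dim_customer.write.format("delta").mode("overwrite").saveAsTable("dim_customer")

# Step 4: build the fact - resolve the dimension key and keep the measures.
fact_sales = (
    staging
    .join(dim_customer.select("customer_id", "customer_key"), "customer_id")
    .select("customer_key", "order_date", "quantity", "sales_amount")
)
fact_sales.write.format("delta").mode("append").saveAsTable("fact_sales")

# Step 5: optimize the fact table for scan performance.
spark.sql("OPTIMIZE fact_sales ZORDER BY (customer_key)")
```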

Tools You Might Use:

  • Notebooks (PySpark)
  • Lakehouse SQL
  • Data pipelines

Exam Tip:
Lakehouses are ideal when you need flexibility, schema evolution, or combined batch + exploratory analytics.

Implementing a Star Schema in a Warehouse

Data warehouses in Fabric provide a SQL-optimized store designed for BI workloads.

Steps to Implement:

  1. Stage raw data in warehouse tables
  2. Build conforming dimension tables
  3. Build fact tables with proper keys
  4. Add constraints and indexes (as appropriate)
  5. Optimize with materialized views or aggregations
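
In a warehouse these steps are usually expressed in T-SQL. The sketch below submits illustrative T-SQL from Python over a generic ODBC connection; the connection string, schema, table, and column names are placeholders, and in practice you might run the same statements directly in the warehouse SQL editor or a pipeline script activity.

```python
import pyodbc

# Placeholder connection to the warehouse SQL endpoint; supply your own server,
# database, and authentication details.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-warehouse-sql-endpoint>;Database=<your-warehouse>;"
)
cursor = conn.cursor()

# Steps 2-3: a conformed dimension and a fact table with keys and measures (illustrative).
cursor.execute("""
    CREATE TABLE dbo.DimCustomer (
        CustomerKey  INT          NOT NULL,
        CustomerId   VARCHAR(20)  NOT NULL,
        CustomerName VARCHAR(100) NOT NULL
    );
""")
cursor.execute("""
    CREATE TABLE dbo.FactSales (
        CustomerKey INT           NOT NULL,
        DateKey     INT           NOT NULL,
        Quantity    INT           NOT NULL,
        SalesAmount DECIMAL(18,2) NOT NULL
    );
""")
conn.commit()
```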

Warehouse advantages:

  • Strong query performance for BI
  • Native SQL analytics
  • Excellent integration with Power BI and semantic models

Exam Tip:
Choose a warehouse when your priority is high-performance BI analytics with well-defined dimensional models.

Common Star Schema Patterns

Conformed Dimensions

  • Dimensions shared across multiple fact tables
  • Ensures consistent filtering and reporting across business processes

Slowly Changing Dimensions (SCD)

  • Maintain historical attribute changes
  • Types include Type 1 (overwrite) and Type 2 (versioning)
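
A simplified Type 2 pattern on a Delta dimension can be sketched in two passes with the Delta Lake merge API: first expire current rows whose tracked attributes changed, then append the new versions. The table and column names (dim_customer, customer_segment, is_current, valid_from, valid_to) and the incoming `updates` DataFrame are assumptions for illustration.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# `updates` is a hypothetical DataFrame of incoming customer rows shaped like
# dim_customer (business key customer_id plus the dimension's attribute columns).
dim = DeltaTable.forName(spark, "dim_customer")

# Pass 1: close out current rows whose tracked attribute changed.
(dim.alias("d")
    .merge(updates.alias("u"), "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(
        condition="d.customer_segment <> u.customer_segment",
        set={"is_current": F.lit(False), "valid_to": F.current_date()},
    )
    .execute())

# Pass 2: append the incoming rows as the new current versions.
# (A full implementation would append only the rows that actually changed.)
new_versions = (
    updates
    .withColumn("is_current", F.lit(True))
    .withColumn("valid_from", F.current_date())
    .withColumn("valid_to", F.lit(None).cast("date"))
)
new_versions.write.format("delta").mode("append").saveAsTable("dim_customer")
```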

Fact Table Grain

  • Define the “grain” (level of detail) clearly—for example, “one row per sales transaction.”

Star Schema and Power BI Semantic Models

Semantic models often sit on top of star schemas:

  • Fact tables become measure containers
  • Dimensions become filtering hierarchies
  • Reduces DAX complexity
  • Improves performance

Best Practice: Structure your lakehouse or warehouse into a star schema before building the semantic model.

Star Schema in Lakehouse vs Warehouse

Feature | Lakehouse | Warehouse
Query engines | Spark & SQL | SQL only
Best for | Mixed workloads (big data + SQL) | BI & reporting
Optimization | Partition/Z-ORDER | Indexing and statistics
Tooling | Notebooks, pipelines | SQL scripts, BI artifacts
Schema complexity | Flexible | Rigid

Governance and Performance Considerations

  • Use consistent keys across facts and dimensions
  • Validate referential integrity where possible (see the check sketched after this list)
  • Avoid wide, unindexed tables for BI queries
  • Apply sensitivity labels on schemas for governance
  • Document schema and business logic
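
One practical way to validate referential integrity, as mentioned in the list above, is an anti-join that finds fact rows whose keys have no match in a dimension. Table and key names are illustrative.

```python
# Orphan-key check: fact rows whose customer_key is missing from the dimension.
# Table and column names are hypothetical.
fact = spark.table("fact_sales")
dim = spark.table("dim_customer")

orphans = fact.join(dim, "customer_key", "left_anti")
print(f"Fact rows with no matching dimension key: {orphans.count()}")
```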

What to Know for the DP-600 Exam

Be prepared to:

  • Explain the purpose of star schema components
  • Identify when to implement star schema in lakehouses vs warehouses
  • Recognize patterns like conformed dimensions and SCDs
  • Understand performance implications of schema design
  • Relate star schema design to Power BI and semantic models

Final Exam Tip
If the question emphasizes high-performance reporting, simple joins, and predictable filtering, think star schema.
If it mentions big data exploration or flexible schema evolution, star schema in a lakehouse may be part of the answer.

Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

1. What is the defining characteristic of a star schema?

A. Multiple fact tables connected through bridge tables
B. A central fact table connected directly to dimension tables
C. Fully normalized transactional tables
D. A schema optimized for OLTP workloads

Correct Answer: B

Explanation:
A star schema consists of a central fact table directly linked to surrounding dimension tables, forming a star-like structure optimized for analytics.

2. Which type of data is stored in a fact table?

A. Descriptive attributes such as names and categories
B. Hierarchical metadata for navigation
C. Quantitative, measurable values
D. User access permissions

Correct Answer: C

Explanation:
Fact tables store numeric measures (e.g., sales amount, quantity) that are aggregated during analytical queries.

3. Which table type is typically smaller and used for filtering and grouping?

A. Fact table
B. Dimension table
C. Bridge table
D. Staging table

Correct Answer: B

Explanation:
Dimension tables store descriptive attributes and are commonly used for filtering, grouping, and slicing fact data in reports.

4. Why are star schemas preferred for Power BI semantic models?

A. They eliminate the need for relationships
B. They align naturally with BI tools and optimize query performance
C. They reduce OneLake storage usage
D. They replace DAX calculations

Correct Answer: B

Explanation:
Power BI and other BI tools are optimized for star schemas, which simplify joins, reduce model complexity, and improve performance.

5. When implementing a star schema in a Fabric lakehouse, which storage format is typically used?

A. CSV
B. JSON
C. Parquet
D. Delta

Correct Answer: D

Explanation:
Fabric lakehouses store tables in Delta format, which supports ACID transactions and efficient analytical querying.

6. Which scenario most strongly suggests using a warehouse instead of a lakehouse for a star schema?

A. Schema evolution and exploratory data science
B. High-performance, SQL-based BI reporting
C. Streaming ingestion of real-time events
D. Semi-structured data exploration

Correct Answer: B

Explanation:
Fabric warehouses are optimized for SQL-based analytics and BI workloads, making them ideal for star schemas supporting reporting scenarios.

7. What does the “grain” of a fact table describe?

A. The number of dimensions in the table
B. The level of detail represented by each row
C. The size of the table in storage
D. The indexing strategy

Correct Answer: B

Explanation:
The grain defines the level of detail for each row in the fact table (e.g., one row per transaction or per day).

8. What is a conformed dimension?

A. A dimension used by only one fact table
B. A dimension that contains only numeric values
C. A shared dimension used consistently across multiple fact tables
D. A dimension generated dynamically at query time

Correct Answer: C

Explanation:
Conformed dimensions are shared across multiple fact tables, enabling consistent filtering and reporting across different business processes.

9. Which design choice improves performance when querying star schemas?

A. Highly normalized dimension tables
B. Complex many-to-many relationships
C. Simple joins between fact and dimension tables
D. Storing dimensions inside the fact table

Correct Answer: C

Explanation:
Star schemas minimize join complexity by using simple, direct relationships between facts and dimensions, improving query performance.

10. Which statement best describes how star schemas fit into the Fabric analytics lifecycle?

A. They replace semantic models entirely
B. They are used only for real-time analytics
C. They provide an analytics-ready structure for reporting and modeling
D. They are required only for data ingestion

Correct Answer: C

Explanation:
Star schemas organize data into an analytics-ready structure that supports semantic models, reporting, and scalable BI workloads.

Choose Between Direct Lake on OneLake and Direct Lake on SQL Endpoints

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Implement and manage semantic models (25-30%)
--> Optimize enterprise-scale semantic models
--> Choose between Direct Lake on OneLake and Direct Lake on SQL endpoints

In Microsoft Fabric, Direct Lake is a high-performance semantic model storage mode that allows Power BI and Fabric semantic models to query data directly from OneLake without importing it into VertiPaq. When implementing Direct Lake, you must choose where the semantic model reads from, either:

  • Direct Lake on OneLake
  • Direct Lake on SQL endpoints

Understanding the differences, trade-offs, and use cases for each option is critical for optimizing enterprise-scale semantic models, and this topic appears explicitly in the DP-600 exam blueprint.


Direct Lake on OneLake

What It Is

Direct Lake on OneLake connects the semantic model directly to Delta tables stored in OneLake, bypassing SQL engines entirely. Queries operate directly on Parquet/Delta files using the Fabric Direct Lake engine.

Key Characteristics

  • Reads Delta tables directly from OneLake
  • No dependency on a SQL query engine
  • Near-Import performance with zero data duplication
  • Minimal latency between data ingestion and reporting
  • Requires supported Delta table structures and data types

Advantages

  • Best performance for large-scale analytics
  • Always reflects the latest data written to OneLake
  • Eliminates Import refresh overhead
  • Ideal for lakehouse-centric architectures

Limitations

  • Some complex DAX patterns may cause fallback
  • Requires schema compatibility with Direct Lake (see the sketch after this list)
  • Less flexibility for SQL-based transformations
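
Schema compatibility is largely a matter of writing Delta tables with column types that Direct Lake supports. The sketch below casts a couple of columns to simple, widely supported types before writing; the specific columns and casts are illustrative assumptions, so check Microsoft's documentation for the current list of supported data types.

```python
from pyspark.sql import functions as F

# Hypothetical curated DataFrame; cast columns before writing the Delta table
# that the Direct Lake semantic model will read.
curated = spark.table("staging_sales")

direct_lake_ready = (
    curated
    .withColumn("sales_amount", F.col("sales_amount").cast("decimal(18,2)"))
    .withColumn("order_timestamp", F.col("order_timestamp").cast("timestamp"))
)
direct_lake_ready.write.format("delta").mode("overwrite").saveAsTable("fact_sales")
```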

Typical Use Cases

  • Enterprise lakehouse analytics
  • High-volume fact tables
  • Near-real-time reporting
  • Fabric-native data pipelines

Direct Lake on SQL Endpoints

What It Is

Direct Lake on SQL endpoints connects the semantic model to the SQL analytics endpoint of a Lakehouse or Warehouse, while still using Direct Lake storage mode behind the scenes.

Instead of reading files directly, the semantic model relies on the SQL endpoint to expose the data.

Key Characteristics

  • Queries go through the SQL endpoint
  • Still benefits from Direct Lake storage
  • Enables SQL views and transformations
  • Slightly higher latency than pure OneLake access

Advantages

  • Supports SQL-based modeling (views, joins, calculated columns)
  • Easier integration with existing SQL logic
  • Familiar experience for SQL-first teams
  • Useful when business logic is already defined in SQL

Limitations

  • Additional query layer may impact performance
  • Less efficient than direct file access
  • SQL endpoint availability becomes a dependency

Typical Use Cases

  • Organizations with strong SQL development practices
  • Reuse of existing SQL views and transformations
  • Gradual migration from Warehouse or SQL models
  • Mixed BI and ad-hoc SQL workloads

Key Comparison Summary

Aspect | Direct Lake on OneLake | Direct Lake on SQL Endpoint
Data access | Direct file access | Via SQL analytics endpoint
Performance | Highest | Slightly lower
SQL dependency | None | Required
Schema flexibility | Lower | Higher
Transformation style | Lakehouse / Spark | SQL-based
Ideal for | Scale & performance | SQL reuse & flexibility

Choosing Between the Two (Exam-Focused Guidance)

On the DP-600 exam, questions typically focus on architectural intent and performance optimization:

Choose Direct Lake on OneLake when:

  • Performance is the top priority
  • Data is already modeled in Delta tables
  • You want the simplest, most scalable architecture
  • Near-real-time analytics are required

Choose Direct Lake on SQL endpoints when:

  • You need SQL views or transformations
  • Existing logic already exists in SQL
  • Teams are more comfortable with SQL than Spark
  • Some flexibility is preferred over maximum performance

Exam Tip 💡

If a question emphasizes:

  • Maximum performance, minimal latency, or scalability/large-scale analytics → Direct Lake on OneLake
  • SQL views, SQL transformations, or SQL reuse → Direct Lake on SQL endpoints

Expect scenario-based questions where both options are technically valid, but only one best aligns with the business and performance requirements.


Practice Questions:

Here are 10 questions to test and help solidify your learning and knowledge. As you review these and other questions in your preparation, make sure to …

  • Identify and understand why an option is correct (or incorrect), not just which one
  • Look for and understand the usage scenario of keywords in exam questions to guide you
  • Expect scenario-based questions rather than direct definitions

Question 1

A company has Delta tables stored in OneLake and wants the lowest possible query latency for Power BI reports without using SQL views. Which option should they choose?

A. Import mode
B. DirectQuery on SQL endpoint
C. Direct Lake on SQL endpoint
D. Direct Lake on OneLake

Correct Answer: D

Explanation:
Direct Lake on OneLake reads Delta tables directly from OneLake without a SQL layer, delivering the best performance and lowest latency.


Question 2

Which requirement would most strongly favor Direct Lake on SQL endpoints over Direct Lake on OneLake?

A. Maximum performance
B. Real-time data visibility
C. Use of SQL views for business logic
D. Minimal infrastructure dependencies

Correct Answer: C

Explanation:
Direct Lake on SQL endpoints allows semantic models to consume SQL views and transformations, making it ideal when business logic is defined in SQL.


Question 3

What is a key architectural difference between Direct Lake on OneLake and Direct Lake on SQL endpoints?

A. Only OneLake supports Delta tables
B. SQL endpoints require data import
C. OneLake access bypasses the SQL engine
D. SQL endpoints cannot be used with semantic models

Correct Answer: C

Explanation:
Direct Lake on OneLake reads Delta files directly from storage, while SQL endpoints introduce an additional SQL query layer.


Question 4

A Fabric semantic model uses Direct Lake on OneLake. Under which condition might it fall back to DirectQuery?

A. The model contains calculated columns
B. The dataset exceeds 1 TB
C. The Delta table schema is unsupported
D. The SQL endpoint is unavailable

Correct Answer: C

Explanation:
If the Delta table schema or data types are not supported by Direct Lake, Fabric automatically falls back to DirectQuery.


Question 5

Which scenario is best suited for Direct Lake on SQL endpoints?

A. High-volume streaming telemetry
B. SQL-first team reusing existing warehouse views
C. Near-real-time dashboards on raw lake data
D. Large fact tables optimized for scan performance

Correct Answer: B

Explanation:
Direct Lake on SQL endpoints is ideal when teams rely on SQL views and want to reuse existing SQL logic.


Question 6

Which statement about performance is most accurate?

A. SQL endpoints always outperform OneLake
B. OneLake always requires Import mode
C. Direct Lake on OneLake typically offers better performance
D. Direct Lake on SQL endpoints does not use Direct Lake

Correct Answer: C

Explanation:
Direct Lake on OneLake avoids the SQL layer, resulting in faster query execution in most scenarios.


Question 7

A Power BI model must reflect new data immediately after ingestion into OneLake. Which option best supports this requirement?

A. Import mode
B. DirectQuery
C. Direct Lake on SQL endpoint
D. Direct Lake on OneLake

Correct Answer: D

Explanation:
Direct Lake on OneLake reads data directly from Delta tables and reflects changes immediately without refresh.


Question 8

Which dependency exists when using Direct Lake on SQL endpoints that does not exist with Direct Lake on OneLake?

A. Delta Lake support
B. VertiPaq compression
C. SQL analytics endpoint availability
D. Semantic model compatibility

Correct Answer: C

Explanation:
Direct Lake on SQL endpoints depends on the SQL analytics endpoint being available, while OneLake access does not.


Question 9

From a DP-600 exam perspective, which factor most often determines the correct choice between these two options?

A. Dataset size alone
B. Whether SQL transformations are required
C. Number of report users
D. Power BI license type

Correct Answer: B

Explanation:
Exam questions typically focus on whether SQL logic (views, joins, transformations) is needed, which drives the choice.


Question 10

You are designing an enterprise semantic model focused on scalability and minimal complexity. The data is already curated as Delta tables. What is the best choice?

A. Import mode
B. DirectQuery on SQL endpoint
C. Direct Lake on SQL endpoint
D. Direct Lake on OneLake

Correct Answer: D

Explanation:
Direct Lake on OneLake offers the simplest architecture with the highest scalability and performance when Delta tables are already prepared.


Create and configure deployment pipelines

This post is a part of the DP-600: Implementing Analytics Solutions Using Microsoft Fabric Exam Prep Hub; and this topic falls under these sections: 
Maintain a data analytics solution
--> Maintain the analytics development lifecycle
--> Create and configure deployment pipelines

Deployment pipelines in Microsoft Fabric provide a structured, governed way to promote analytics content across environments—typically Development, Test, and Production. They are a core lifecycle management feature that helps teams deploy changes safely, consistently, and with minimal risk. For the DP-600 exam, you should understand what deployment pipelines are, how they are configured, what they support, and how they differ from Git-based version control.

What Are Deployment Pipelines?

A deployment pipeline is a Fabric feature that:

  • Connects multiple workspaces into an ordered promotion flow
  • Enables controlled deployment of items between environments
  • Supports validation and testing before production release

Pipelines are especially important for enterprise-scale analytics solutions.

Typical Pipeline Structure

A standard Fabric pipeline consists of three stages:

  1. Development
    • Active development
    • Frequent changes
    • Used by engineers and analysts
  2. Test
    • Validation and user acceptance testing
    • Data and logic verification
    • Limited access
  3. Production
    • Certified, trusted content
    • Broad consumer access
    • Minimal direct changes

Each stage is linked to a separate Fabric workspace.

Creating a Deployment Pipeline

At a high level, the process is:

  1. Create a deployment pipeline in Microsoft Fabric
  2. Assign a workspace to each stage:
    • Dev workspace
    • Test workspace
    • Prod workspace
  3. Configure pipeline settings
  4. Control who can deploy between stages

Once created, the pipeline provides a visual interface showing item differences across stages.
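
Deployments are normally triggered from the pipeline's user interface, but they can also be automated. The sketch below calls the Power BI REST API's deployAll operation for a pipeline; the pipeline ID, the token acquisition, and the exact request-body fields shown are assumptions to verify against the current REST API reference before use.

```python
import requests

# Assumptions: an Azure AD access token with the required scope has already been
# acquired, and the deployment pipeline ID is known. Both values are placeholders.
access_token = "<AAD access token>"
pipeline_id = "<deployment-pipeline-id>"

url = f"https://api.powerbi.com/v1.0/myorg/pipelines/{pipeline_id}/deployAll"
body = {
    # Deploy from the first stage (Development) to the next stage (Test).
    # Field names are believed to match the Deploy All request, but verify them
    # against the official documentation.
    "sourceStageOrder": 0,
    "options": {
        "allowCreateArtifact": True,
        "allowOverwriteArtifact": True,
    },
}

response = requests.post(url, json=body,
                         headers={"Authorization": f"Bearer {access_token}"})
response.raise_for_status()
print("Deployment request accepted:", response.status_code)
```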

What Items Can Be Deployed Through Pipelines?

Deployment pipelines support deployment of many Fabric items, including:

  • Semantic models
  • Reports and dashboards
  • Dataflows Gen2
  • Lakehouses and Warehouses (supported scenarios)
  • Other supported analytics artifacts

Exam note:
Not every Fabric item supports pipeline deployment equally—expect questions to focus on Power BI and core analytics items.

How Deployment Works

Comparing Changes

  • Pipelines show differences between stages
  • You can review what will change before deploying

Deploying Content

  • Deploy from Dev → Test
  • Validate
  • Deploy from Test → Prod

Deployments:

  • Copy item definitions
  • Can update existing items or create new ones
  • Do not automatically move workspace permissions

Deployment Rules and Parameters

Pipelines support deployment rules, such as:

  • Changing data source connections per environment
  • Switching parameters between Dev, Test, and Prod
  • Avoiding hard-coded environment values

This is critical for:

  • Separating development and production data
  • Supporting safe testing

Pipelines vs Git Integration (Exam Comparison)

This distinction is frequently tested.

Feature | Deployment Pipelines | Git Integration
Purpose | Environment promotion | Source control
Focus | Deployment | Versioning
Tracks history | No | Yes
Supports branching | No | Yes
Typical use | Dev → Test → Prod | Code collaboration

Key insight:
They are complementary, not competing features.

Permissions and Governance

To use pipelines:

  • Users need appropriate pipeline permissions
  • Workspace access is still required
  • Production deployments are often restricted to a small group

Pipelines support governance by:

  • Reducing direct changes in production
  • Enforcing controlled release processes
  • Improving auditability

Common Exam Scenarios

You may be asked to:

  • Choose pipelines for controlled promotion of reports
  • Identify when pipelines are preferable to manual publishing
  • Combine pipelines with Git and PBIP
  • Configure different data sources per environment
  • Prevent accidental production changes

Example:

A report must be tested before being released to executives.
Correct concept: Use a deployment pipeline with Dev, Test, and Prod stages.

Best Practices to Remember

  • Use separate workspaces per environment
  • Restrict production deployment permissions
  • Combine pipelines with:
    • PBIP projects
    • Git integration
    • Endorsements and certification
  • Avoid direct editing in production

Key Exam Takeaways

  • Deployment pipelines manage content promotion across environments
  • They connect multiple Fabric workspaces
  • Pipelines support comparison, validation, and controlled deployment
  • They do not replace Git-based version control
  • A core feature of the Fabric analytics lifecycle

Exam Tips

  • If a question focuses on moving content safely from development to production, the correct answer is deployment pipelines.
  • If it focuses on tracking changes or collaboration, the answer is Git or PBIP.
  • Know how pipelines support:
    • Dev/Test/Prod lifecycle
    • Governance & change control
    • Environment-specific configuration
    • Enterprise-scale BI practices
  • Common exam traps:
    • Confusing workspace roles with deploy permissions
    • Assuming pipelines manage security or performance
    • Forgetting deployment rules

Practice Questions

Question 1 (Single choice)

What is the PRIMARY purpose of a deployment pipeline in Microsoft Fabric?

A. Schedule dataset refreshes
B. Promote content across lifecycle environments
C. Enable row-level security
D. Optimize DAX performance

Correct Answer: B

Explanation:
Deployment pipelines are designed to promote content across environments (for example, Development → Test → Production) in a controlled and governed manner.

  • ❌ A: Refresh scheduling is handled separately
  • ❌ C: Security is not the primary purpose
  • ❌ D: Performance tuning is unrelated

Question 2 (Multi-select)

Which stages are available by default in a Fabric deployment pipeline? (Select all that apply.)

A. Development
B. Test
C. Production
D. Sandbox

Correct Answers: A, B, C

Explanation:
Fabric deployment pipelines use a three-stage lifecycle:

  • Development
  • Test
  • Production

There is no default Sandbox stage.


Question 3 (Scenario-based)

A team wants analysts to freely modify reports, while only approved changes reach production. Which pipeline stage should analysts primarily work in?

A. Production
B. Test
C. Development
D. Any stage

Correct Answer: C

Explanation:
The Development stage is intended for:

  • Frequent changes
  • Experimentation
  • Initial validation

Higher stages are more controlled.


Question 4 (Single choice)

Which permission is required to deploy content from one stage to the next in a deployment pipeline?

A. Viewer
B. Contributor
C. Admin
D. Pipeline deploy permission

Correct Answer: D

Explanation:
Deploying content requires explicit pipeline deployment permissions, not just workspace roles.

  • ❌ Admin alone is not sufficient
  • ❌ Contributor may edit but not deploy

Question 5 (Scenario-based)

You deploy a semantic model from Test to Production. What happens to data source connections by default?

A. They are deleted
B. They remain unchanged
C. They can be overridden per stage
D. They must be manually reconfigured

Correct Answer: C

Explanation:
Deployment pipelines support parameter and data source rules, allowing environment-specific connections.


Question 6 (Multi-select)

Which items can be deployed using deployment pipelines? (Select all that apply.)

A. Reports
B. Semantic models
C. Dashboards
D. Notebooks

Correct Answers: A, B, C

Explanation:
Deployment pipelines support Power BI artifacts, including:

  • Reports
  • Semantic models
  • Dashboards

❌ Notebooks are Fabric artifacts but are not deployed via Power BI deployment pipelines.


Question 7 (Scenario-based)

A deployment shows warnings that some items are skipped. What is the MOST likely cause?

A. The workspace is full
B. Unsupported artifacts exist
C. The dataset is too large
D. Git integration is disabled

Correct Answer: B

Explanation:
Unsupported or incompatible artifacts (for example, unsupported report types) may be skipped during deployment.


Question 8 (Single choice)

Which feature allows different environments to use different data sources during deployment?

A. Row-level security
B. Dynamic format strings
C. Deployment rules
D. Incremental refresh

Correct Answer: C

Explanation:
Deployment rules allow:

  • Data source switching
  • Parameter overrides
  • Environment-specific configuration

Question 9 (Scenario-based)

You want production users to access only certified content. How do deployment pipelines help?

A. By enforcing sensitivity labels
B. By promoting tested content only
C. By encrypting production reports
D. By disabling edit access

Correct Answer: B

Explanation:
Deployment pipelines ensure:

  • Content is validated in Test
  • Only approved changes reach Production

They support trust and governance, not encryption or labeling.


Question 10 (Multi-select)

Which best practices apply when configuring deployment pipelines? (Select all that apply.)

A. Restrict deploy permissions
B. Use separate data sources per stage
C. Allow all users to deploy to Production
D. Validate content in Test before Production

Correct Answers: A, B, D

Explanation:
Best practices include:

  • Limited deploy access
  • Environment-specific configurations
  • Mandatory testing before production

❌ Allowing everyone to deploy defeats governance.


Power BI Storage modes

Power BI allows us to connect to many different data sources – from relational databases, NoSQL databases, files, and more – to source data for consumption in Power BI. From the data sourced, you can create additional data (new calculated columns, metrics, transformed data, etc.), build data models, and create reports and dashboards.

There are a few storage modes related to how the data is retrieved, stored, and processed in Power BI. The storage modes are Import, DirectQuery, Live Connection, and Dual. The storage mode is set at the table level for each table in the Power BI data model. I will now describe these modes.

Import

With the Import storage mode, Power BI imports and caches the data from the sources. Once the data import is complete, the data in Power BI will remain the same until it is refreshed by the Power BI refresh process for that dataset.

This storage mode allows for the usage of the most Power BI features for data modeling and analysis. For example, Import mode is required for using two of the popular Power BI features, Quick Insights and Q&A. Also, this mode is almost always the best for performance. However, it’s not necessarily the best option in all scenarios. Since the data is imported, the file size can get large and can sometimes take a considerable amount of time to load. But generally, for relatively static, low volume data, it is the preferred choice.

Queries submitted to an imported dataset will return data from the cached data only.

DirectQuery

With the DirectQuery storage mode, no data is cached in Power BI, but the metadata of the source tables, columns, data types, and relationships is cached. Instead, the data is queried directly on the source database when needed by a Power BI query, such as when a user runs a Power BI report that uses the data.

Since the data is not imported, if all the tables in the data model use DirectQuery, the Power BI file size will be very small compared to a model with imported data.

Live Connection

The Live Connection storage mode is a special case of DirectQuery. It is only available when sourcing from Power BI service datasets or Analysis Services data models. There are limitations when using this mode: data modeling is limited to creating measures, so you cannot apply transformations to the data or define relationships, and you can only have one data source in your data model.

Dual

With the Dual storage mode, a table may use Import mode or DirectQuery mode, depending on the mode of the other tables included in the query. For example, you may have a Date table that is connected to one transaction table that needs to reflect the data in the source, and is therefore set to DirectQuery mode, and also connected to another transaction table that has fewer than 100,000 rows and is set to Import storage mode. By setting the Date table to Dual storage mode, Power BI uses DirectQuery when a query involves the Date table and the first transaction table, and Import mode when a query involves the Date table and the second transaction table.

The below table summarizes the Power BI data storage modes:

Import
  • Data is imported and cached in Power BI
  • Preferred for static, relatively small datasets
  • All Power BI functionality is available, including DAX, calculated tables, Q&A, and Quick Insights
  • Can connect to Analysis Services, but Live Connection is preferred
  • Can have unlimited data sources
  • Typically provides the best performance

DirectQuery
  • Data is queried on the source when needed
  • Use for large datasets and when data changes in the source need to be reflected immediately
  • Features such as Q&A, Quick Insights, calculated tables, and many DAX queries are not supported
  • Limited data transformation functionality
  • Parent-child functionality not supported
  • For relational databases
  • Not supported for Analysis Services
  • Performance greatly dependent on the source

Live Connection
  • A special case of DirectQuery
  • Used for connecting to multidimensional data sources, such as Analysis Services
  • Can be used only with Power BI datasets and Analysis Services
  • Can have only one data source
  • No data transformation available
  • Q&A and Quick Insights not available
  • Can create measures

Dual
  • A combination of Import and DirectQuery
  • Power BI chooses the appropriate mode based on the storage mode of the tables involved in the query
  • Can improve performance

Summary of Power BI storage modes

 Note: the content in this post is relevant for the PL-300 Analyzing Data with Microsoft Power BI certification exam.

Thanks for reading! I hope you found this information useful.

Good luck on your analytics journey!

BI Application getting ORA-02391 error

Last week we rolled out a new dashboard that uses a new data source.
In one of our BI environments, the application was throwing an error:
“ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit at OCI call OCISessionBegin”

This is an Oracle Database error, and not an error directly from the BI Application.

For the “ORA-02391: exceeded simultaneous SESSIONS_PER_USER limit” error …
The Cause is:   An attempt was made to exceed the maximum number of concurrent sessions allowed by the SESSIONS_PER_USER clause of the user profile.
And the Action for resolution is:   End one or more concurrent sessions or ask the database administrator to increase the SESSIONS_PER_USER limit of the user profile.

It turns out the SESSIONS_PER_USER parameter was set too low; it was set to 3 for the user account the BI application uses to access the database. The same error could also be observed from an ETL tool accessing the database with an ID subject to the same profile setting.

One of the DBAs bumped this parameter up to 30 for the user, and that resolved the issue.
We requested that this change be made on the BI application databases in all the environments – Development, Test, QA, and Production.

Although all seems to be well, we will now monitor to see how many sessions the application is using and if there is any negative impact on the source application. This will allow us to determine if we need to make any other adjustments.
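
To monitor how many sessions the application user is actually holding, a simple query against v$session is enough. The sketch below runs it from Python with the python-oracledb driver; the connection details and user names are placeholders, and the profile change itself (ALTER PROFILE ... LIMIT SESSIONS_PER_USER 30) remains a DBA task.

```python
import oracledb

# Placeholder connection details; the querying account needs privileges to read v$session.
conn = oracledb.connect(user="<monitor_user>", password="<password>",
                        dsn="<host>:<port>/<service>")

with conn.cursor() as cur:
    cur.execute(
        "SELECT COUNT(*) FROM v$session WHERE username = :u",
        u="<BI_APP_USER>",  # the application account being monitored
    )
    session_count, = cur.fetchone()
    print(f"Current sessions for the BI application user: {session_count}")
```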

Thanks for reading. I hope you found this information useful.

Quality Assurance (QA) for Data Projects or Data Applications

This post discusses Quality Assurance (QA) activities for data projects.

What is Quality Assurance (QA)?  Simply put, Quality Assurance, also called QA, Testing or Validation, is about testing an application or solution to ensure that all the stated/promised/expected requirements are met. It is a critically important activity for all software application development or implementations. Data applications are no different. They need to be tested to ensure they work as intended.

QA stands between development and deployment. And QA makes the difference between a delivered product and a high quality delivered product.

There are a number of things to keep in mind when you plan your Quality Assurance activities for data solutions. I present some of them in this post as suggestions, considerations, or prompting questions. The things mentioned here will not apply to all data applications but can be used as a guide or a check.

People / Teams

The number of people and teams involved in a project will vary depending on the size, scope and complexity of the project.

The technical team building the application needs to perform an initial level of validation of the solution.

If there is a Quality Assurance team that performs the validation tasks, then that team will need to perform the “official” validation.

The business analysts and end-users of the application also need to validate. Where possible, work with as many end users as you efficiently can. The more real users you have testing the application, the better the chances of finding issues early.

Where it makes sense, Test IDs that simulate various types of users or groups should be used to help test various usage and security scenarios. This is particularly useful in automated testing.

On large projects where there is a lot to be tested, it is best to break up the testing across multiple people or teams. This helps prevent tester fatigue and sloppy testing, and results in higher quality validation.

Plan ahead to ensure that access for all the relevant users is set up in the testing environments.

Communication

With all the teams and people involved, it is important to have a plan for how they will communicate. Things to consider and have a plan for include:

  • How will teams communicate internally? Email, Microsoft Teams, SharePoint, Shared Files, are some options.
  • How will the various teams involved communicate with each other? In other words, how will cross-team communication be handled? As above, Email, Microsoft Teams, SharePoint, Shared Files, are some options.
  • How will issues and status be communicated? Weekly meetings, Status emails or documents, Shared files available on shared spaces are options.
  • How will changes and resolutions be tracked? Files, SDLC applications, Change Management applications are options.
  • How will teams and individuals be notified when they need to perform a task? Manual communication or automated notifications from tools are options.

Data

The most important thing to ensure in data projects is that the data is of high quality, particularly the “base” data set. If the base data is incorrect, everything built on top of it will be wrong. Of course, the correctness of intermediate and user-facing data is just as important, but validation of the base data is critical to achieving correct data throughout.

  • Ensure that table counts, field counts and row counts of key data are correct.
  • Does the data warehouse data match the source data?
  • Test detailed, low level records with small samples of data
  • Test to ensure that the data and the values conform to what is expected. For example, ensuring that there is no data older than 3 years old, or ensuring that there are no account values outside a certain range. The Data Governance Team may become involved in these activities across all projects.
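
A lightweight example of the count checks above: compare row counts for key tables between the source system and the data warehouse. The dictionaries below hold hypothetical example values; you would populate them from count queries (for example, SELECT COUNT(*) FROM each table) against each system.

```python
# Hypothetical reconciliation check with example values; populate these from
# count queries against the source system and the data warehouse.
source_counts = {"orders": 1_254_003, "customers": 48_112}
warehouse_counts = {"orders": 1_254_003, "customers": 48_110}

for table, src in source_counts.items():
    wh = warehouse_counts.get(table)
    status = "OK" if wh == src else "MISMATCH"
    print(f"{table:<12} source={src:>10} warehouse={wh:>10} {status}")
```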

Next in line is the “intermediate” data such as derived metrics, aggregates, specialized subsets, and more. These will also need to be verified.

  • Are the calculated values correct?
  • Are the aggregates correct? Test aggregate data with small, medium and large sets of data
  • Verify metric calculations

Then the user-facing data or data prepared for self-service usage needs to be validated.

  • Does the data on the dashboard match the data in the database?
  • Are the KPIs correctly reflecting the status?

Test the full flow of the data. The validity of the data should be verified at each stage of the data flow – from the source, to the staging, to the final tables in the data warehouse, to aggregates or subsets, to the dashboard.

Take snapshots of key datasets or reports so you can compare results post data migration.

Some additional data prep might be needed in some cases.

  • These include making sure that you have sourced adequate data for testing. For example, if you need to test an annual trend, then it might be best to have at least a year’s worth of data, preferably two.
  • You may need to scramble or redact some data for testing. Often Test data is taken from the Production environment and then scrambled and/or redacted in order to not expose sensitive information.
  • You may need to temporarily load in data for testing. For various reasons, you may need to load some Production data into the QA environment just to test the solution or a particular feature and then remove the data after the testing is complete. While this can be time consuming, sometimes it’s necessary, and it’s good to be aware of the need early and make plans accordingly.

Aesthetics & Representation of Data

Presentation matters. Although the most critical thing is data correctness, how the data is presented is also very important. Good presentation helps with understanding, usability, and adoption. A few things to consider include:

  • Does the application, such as a dashboard, look good? Does it look right?
  • Are the components laid out properly so that there is no overcrowding?
  • Are the logos, colors and fonts in line with company expectations?
  • Are proper chart options used to display the various types of data and metrics?
  • Is the information provided in a way that users can digest?

Usage

The data application or solution should be user friendly, preferably intuitive, or at least have good documentation. The data must be useful to the intended audience, in that it should help them understand the information and make good decisions or take sensible actions based on it.

The application should present data in a manner that is effective – easy to access, and easy to understand.

The presentation should satisfy the analytic workflows of the various users. Users should be able to logically step through the application to find information at the appropriate level of detail that they need based on their role.

A few things that affect usability include:

  • Prompts – ensure that all the proper prompts or selections are available to users to slice and filter the data as necessary. And of course, verify that they work.
  • Drill downs and drill throughs – validate that users can drill-down and across data to find the information they need in a simple, logical manner.
  • Easy interrogation of the data – if the application is ad-hoc in nature, validate that users can navigate it or at least verify that the documentation is comprehensive enough for users to follow.

Security

Securing the application and its data so that only authorized users have access to it is critical.

Application security comprises “authentication” – access to the application – and “authorization” – what a user is authorized to do when he or she accesses the application.

Authorization (what a user is authorized to do within the application) can be broken into “object security” – what objects or features a user has access to, and “data security” – what data elements a user has access to within the various objects or features.

For example, a user has access to an application (authenticated / can log in), and within the application the user has access to (authorized to see and use) 3 of 10 reports (object-level security). The user is not authorized to see the other 7 reports (object-level security) and, therefore, will not have access to them. Now, within the 3 reports that the user has access to, he or she can only see data related to 1 of 5 departments (data-level security).

All object-level and data-level security needs to be validated. This includes negative testing: test not only that users have the access they need, but also that they do not have access they should not have.

  • Data for testing should be scrambled or redacted as appropriate to protect it.
  • Some extremely sensitive data may need to be filtered out entirely.
  • Can all the appropriate users access the application?
  • Are non-authorized users blocked from accessing the application?
  • Can users see the data they need to perform their jobs?

Performance

Performance of the data solution is important to user efficiency and user adoption. If users cannot get the results they need in a timely manner, they will look elsewhere to get what they need. Even if they have no choice, a poorly performing application will result in wasted time and dollars.

A few things to consider for ensuring quality around performance:

  • Application usage – is the performance acceptable? Do the results get returned in an acceptable time?
  • Data Integration – is the load performance acceptable?
  • Data processing – can the application perform all the processing it needs to do in a reasonable amount of time?
  • Stress Testing – how is performance with many users? How is it with a lot of data?
  • How is performance with various selections or with no selections at all?
  • Is ad-hoc usage set up to be flexible while avoiding rogue analyses that may cripple the system?
  • Is real-time analysis needed and is the application quick enough?

These items need to be validated and any issues need to be reported to the appropriate teams for performance tuning before the application is released for general usage.

Methodology

Each organization, and even each team within an organization, will have a preferred methodology for application development and change management, including how they perform QA activities.

Some things to consider include:

  • Get QA resources involved in projects early so that they gain an early understanding of the requirements and the solutions to assess and plan how best to test.
  • When appropriate, do not wait until all testing is complete before notifying development teams of issues discovered. Notifying them early could make the difference between your project being on time or late.
  • Create a test plan and test scripts – even if they are high-level.
  • Where possible, execute tasks in an agile, iterative manner.
  • Each environment will have unique rules and guidelines that need to be validated. For example, your application may have a special naming convention, color & font guidelines, special metadata items, and more. You need to validate that these rules and guidelines are followed.
  • Use a checklist to ensure that you validate with consistency from deliverable to deliverable
  • When the solution being developed is replacing an existing system or dataset, use the new and old solutions in parallel to validate the new against the old.
  • Document test results. All testing participants should document what has been tested and the results. This may be as simple as a checkmark or a “Done” status, but may also include things like data entered, screenshots, results, errors, and more.
  • Update the appropriate tracking tools (such as your SDLC or Change Management tools) to document changes and validation. These tools will vary from company to company, but it is best to have a trail of the development, testing, and release to production.
  • For each company and application, there will be a specific, unique set of things that will need to be done. It is best if you have a standard test plan or test checklist to help you confirm that you have tested all important aspects and scenarios of the application.

This is not an all-encompassing coverage of Quality Assurance for data solutions, but I hope the article gives you enough information to get started or tips for improving what you currently have in place. You can share your questions, thoughts and input via comments to this post. Thanks for reading!

Oracle Business Intelligence (OBIEE) Interview Questions and Answers – Set 1

This is a set of questions and answers to help you prepare for interviews for roles involving Oracle Business Intelligence (OBIEE). I recommend that you do not simply memorize these questions and answers, but use them as a guide to help you determine what you need to work on to improve your knowledge and skills.
——————–
IMPORTANT DISCLAIMER: I cannot guarantee the correctness of any of these answers, and anyone using them should verify their correctness using other sources.
——————–

Q1. The business users mention that a particular report is not returning the correct results. How would you go about identifying if there is an issue and what the issue is?

A1. The answers to this question could vary widely because there are a few options of what you may do first, second, etc.

I would first determine why the users think the results are wrong. Compare their expected results with the report results to determine what data values are being dropped or added. This may require a detailed, record-level comparison.

Next, I would determine when the “wrong results” started. Based on that, I would check if anything changed within that timeframe that could have affected this. If anything changed, you or another team member can investigate the details of the change.

Next, I would try to determine if the data is correct by comparing the source system data with the data from the report source, such as the data warehouse.  If the data in the data warehouse is correct, then it would indicate that something might be wrong with the report.  If the data in the data warehouse is not correct, then that indicates there might be a problem with the ETL process or logic.  Check filters, aggregation logic, selection steps, and more in the area that needs further examination (whether the analysis or the ETL).

If necessary, I would get the SQL generated by OBI for the analysis via the session logs, and run that SQL directly on the database, removing or adding to the SQL as necessary to investigate various scenarios with the data.

Clearing the cache could be one of the first things to try, but if I had not found the issue yet because everything looks good so far, I would clear the cache and see if that resolves the issue.

 

Q2. Can you create an analysis from multiple subject areas? And if yes, how would you go about doing it?

A2. Yes, you can create an analysis/report using multiple OBIEE Subject Areas.  First create an analysis as normal, by selecting the first subject area and then selecting the desired columns, and performing any desired calculations, formatting, or other manipulations on those columns.  Then, from the Subject Areas pane, click the Add Subject Area icon (cube with a plus sign) and select the second subject area, from which you will then select the desired columns.  You will need to union or join the data from these subject areas.

 

Q3. What is the purpose of the OBIEE RPD?

A3. The OBIEE RPD (Repository) is a metadata layer between the data sources (such as relational databases or files) and the OBIEE front-end that is accessed through a web browser, which includes the Dashboard & Analysis Editor used by report developers or analysts, along with the published dashboards & analyses (reports) that the users see.  The RPD allows developers to create a business-friendly representation of the data, including renaming columns to business-friendly vocabulary, creating new data elements (such as metrics) from calculations and manipulations, defining hierarchies useful to business processes, and more.  This allows report developers, power users, and analysts to drag and drop columns to create analyses (reports).

 

Q4. Name and describe the various layers of the OBIEE Repository (RPD).

A4. There are 3 layers in the OBIEE Repository (RPD).  The Physical layer, the Business Model and Mapping layer, and the Presentation layer.

The Physical layer is where you define the data sources, including connection details, that you will use to source data for your OBIEE environment.  In this layer, you will import or define your table metadata, create aliases (a recommended practice), define the joins between tables (typically using the alias tables), create opaque views (“select” tables), and set caching options.

The Business Model and Mapping layer, referred to as BMM or logical layer, is where you will define the business model of the data from the physical model.  The business model is geared toward providing specific information needed for your specific business scenario.  The business model typically simplifies the representation from the physical model to form a more business friendly view of the data.

The BMM layer is where you will rename objects to more business friendly names, create business metrics from the data, create hierarchies useful for various business processes, and define logical tables, columns, and joins.

The Presentation layer is where you define the view seen by users in the front end reporting and analysis tools, such as, OBIEE Answers.  This layer allows you to structure/organize/label all data elements from the BMM layer into an easily understood, business friendly model – further simplifying the BMM model and making it more business friendly – that facilitates drag and drop usage for end users.

 

Q5. What are some of the types of analysis views that are available in OBIEE?

A5. Some of the types of analysis views available in OBIEE are:  table (straight table), pivot table, graph, funnel, gauge, trellis, filters, column selector, view selector, narrative, ticker, and static text.

 

Q6. What are some of the graph types available in OBIEE?

A6. Some of the types of graphs available in OBIEE are: bar (vertical, horizontal, and stacked); line; line-bar; area; pie; pareto; scatter; bubble; and radar.

 

Q7. Describe the steps for creating an analysis?

A7. Understand the requirement. Confirm that the data elements are available.  From the menu, New -> Analysis.  Select the appropriate subject area.  Find the columns that you need.  Bring them into the report.  Perform calculations and other data manipulations as necessary on one or more columns.  Rename and format columns as necessary.  Create the data views that provide the best representation of the data and/or that meet users’ requirements.  Verify the results by testing various scenarios – such as different time frames, different data elements, testing with prompt selections, and all the elements that need to be validated to confirm you are meeting the users’ requirements.

 

Q8. What are the different types of variables in OBIEE?

A8.  There are two types of variables available in OBIEE and they are: (1) repository variables and (2) session variables.

Repository variables can have only a single value at any point in time, and are system-wide (repository-wide), hence the name Repository variable.

Repository variables can be used in ways similar to how you would use a constant or literal value in expressions in the RPD or in an analysis.

Repository variables have two sub-types: (i) static and (ii) dynamic

A static repository variable has a fixed value that is defined in the variable definition in the RPD (OBIEE repository), and stays that way until changed by a developer/administrator.

A dynamic repository variable (as the name implies) changes (is refreshed) based on the results returned from Initialization Block SQL queries that run on a defined schedule.

Session variables can contain more than one value and are created and assigned a value “for each session” when each user logs on, hence the name session variable.  Each user’s session variable may be different depending on the logic used to generate the value for the variable.

Session variables have two sub-types: (i) system and (ii) non-system

System session variables are special variables used by OBIEE for specific “system” purposes and the same variable names cannot be used for other variables. An often used system session variable is “USER” that gets set to the value of the current logged in user’s ID.

Non-system session variables are custom-defined variables, typically set by an initialization block.  An often-used non-system session variable scenario is one in which the variable value for each user is used in data filters to implement dynamic data-level security that changes for each user.

 

Q9. What is an Initialization Block?

A9. An Initialization Block (Init Block) is an object defined with a “block” of SQL that is executed to “initialize” a variable specified in the Initialization Block’s definition. Init Blocks are used to initialize dynamic repository variables, system session variables, and non-system session variables.

 

Q10. How do you refresh the cache in OBIEE?

A10. One of the quickest ways is to run the “call SAPurgeAllCache();” statement in the Administration -> Issue SQL window.

You can get more details here … https://thedatacommunity.org/2013/10/11/how-to-clear-the-bi-server-cache-using-command-line-script-or-via-the-issue-sql-page/

 

Q11. How do you create navigation from one report to another based on the user clicking on a data value in the first report?

A11. You would create an Action Link on the navigate-from column (in the Interaction tab of the column properties) in your first report. In the Action Link, set the appropriate action, such as “Navigate to BI Content”, to specify the second report that you need to navigate to.

 

Q12. Describe the steps involved in building an OBIEE repository (RPD).

A12. The steps involved in building an OBIEE RPD can be separated into 3 sets of steps: (1) Build the Physical Layer, (2) Build the Business Model and Mapping (BMM) Layer, (3) Build the Presentation Layer

(1) Build the Physical Layer

  • Create the repository
  • Import metadata
  • Create aliases
  • Create physical keys and joins between the appropriate tables

(2) Build the BMM Layer with objects from the Physical Layer

  • Review and adjust (if necessary) the Logical Joins
  • Rename logical columns
  • Add logical table sources (as necessary)
  • Create derived columns
  • Create metrics
  • Remove unneeded logical objects
  • Create hierarchies

(3) Build the Presentation Layer

  • Create a Subject Area
  • Create or drag over Presentation tables
  • Create Presentation columns
  • Rename Presentation columns
  • Rearrange/organize Presentation columns into a user friendly view

Then, upload and test the RPD using analyses created in Answers.

 

Q13. Why is it recommended that you use Alias Tables in OBIEE?

A13. Alias tables are defined in the Physical Layer of the RPD. They create a version of a physical table under a different, user-determined name, which allows the same table to be re-used for multiple joins/data sets within the Physical Layer. Another benefit of aliases is that changes to the underlying physical table can sometimes be isolated at the alias level (for example, by mapping new physical columns to existing columns in the alias), preventing the need for further changes to the data model and the other layers of the RPD.

 

Q14. How would you go about resolving performance issues with a specific report in OBIEE?

A14. Run the report through the dashboard. Capture the SQL associated with the report. Run that SQL directly on the database (using a tool such as SQL Developer or Toad) to see if it is also performing poorly there. If it is, then we can deduce that the issue is on the database side, or that the report needs to be changed enough to make it generate different SQL. If it runs fine directly on the database, then the issue is somewhere else along the stack.

Taking the first scenario – runs poorly directly on database – review the SQL or run an explain plan on the SQL and determine what changes can be made to improve it.  This may involve adding indexes to tables on columns used in joins and in filtering criteria; reducing records in tables as appropriate before joining; removing unnecessary joins; changing the data model of the tables used, such as creating star schemas or creating aggregate tables. If necessary, work with a DBA to get help.
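
For example, assuming the captured physical SQL joins a fact to a dimension and filters on a date, the database-side investigation might look like this (all object and column names are hypothetical):

  -- Review the optimizer's plan for the captured query
  EXPLAIN PLAN FOR
  SELECT d.department_name, SUM(f.amount)
  FROM   w_ap_xact_f f
  JOIN   w_department_d d ON d.row_wid = f.department_wid
  WHERE  f.posted_on_dt >= DATE '2014-01-01'
  GROUP BY d.department_name;

  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

  -- If the plan shows an expensive full scan driven by the join/filter columns,
  -- an index may help (coordinate with the DBA before changing PROD)
  CREATE INDEX w_ap_xact_f_n1 ON w_ap_xact_f (department_wid, posted_on_dt);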

Taking the second scenario – runs fine directly on the database – review the analysis to determine what types of views are being used and, by elimination, whether any of them are causing an issue; remove columns and re-run to determine if any specific columns or calculations are the cause; and check the logs for any messages relevant to your scenario, adjust configuration parameters accordingly, and re-run to see if there is any effect.

 

Q15. What would you do if you are unable to figure out an OBIEE issue?

A15. There could be several reasonable answers to this question. A few good responses include: ask a co-worker; use a search engine (Google/Bing/etc.) to try to find a solution; clear the cache; restart all processes at an appropriate time; search Oracle’s support site; create a Service Request (SR) with Oracle Support; post a description of your issue to relevant online groups/communities and ask for help; or (when appropriate) meet with others in your environment to try to determine whether something changed that you are unaware of that may have caused the issue. There could be many other valid responses.

 

Q16. What are some recent OBI dashboards that you have created?  -OR- Please describe some recent OBI projects that you have worked on.

A16. There are many ways to answer these open ended questions, but a few things I would suggest are:

  • describe the project
  • describe your role in the project
  • (where applicable) briefly describe your development process/methodology
  • (where applicable) describe how you worked with the business users to determine or review the requirements, perform training, perform validation, resolve issues, etc.
  • describe how you sourced the data (source systems)
  • describe how you designed and/or developed the solution (include some details without being too long, such as explaining what areas you designed/developed – data model, and/or RPD and reports, or just RPD, or just reports, etc.)
  • describe any challenges you ran into, and how you/the team resolved them
  • describe how you may have assisted others or worked with others or trained others
  • as you describe all the above, make sure it demonstrates what you brought to the project
  • And then finally, share the end result – for example, share if the users loved the solution and the kind of feedback that made you know that, what it helped them to do, if it saved them a lot of time, if this led to increased application usage, etc.

 

Q17. How do you move/migrate an OBIEE solution from one environment to another, such as, from your DEV to TST environment?

A17. The answer to this question could vary a bit, but may include things such as:

  • Use the same scripts from DEV to create any new database objects in TST.
  • Use Archive/Unarchive to move OBIEE catalog objects by Archiving the objects from DEV and unarchiving them into TST  -OR-  Use the Catalog Manager tool to move the catalog objects from DEV to TST.
  • Take the RPD from DEV and upload and activate it in TST  -OR-  merge the approved RPD changes from DEV into the TST RPD
  • Apply the appropriate security permissions to the objects in TST.
  • If there was a new ETL process involved in the solution, ensure that the ETL objects are also migrated to the ETL TST environment.
  • Restart the TST environment servers
  • Validate that everything is good, and if not, resolve by migrating anything that’s missing

 

Q18. How do you implement data-level security in OBIEE?

A18. First, determine how each user’s data-level access will be identified, that is, determine which table will house the data that specifies the access each user has. For example, if the data is to be secured by department, the table would contain a record for each user and the department(s) they have access to.

Then, create an Initialization Block that selects the departments for each user and assigns them to a session variable (DEPT_VAR).

Next, identify the appropriate roles to which the data-level security rules need to be applied, and set data filters on the appropriate data sets using the above variable (for example, "table"."department" = VALUEOF(NQ_SESSION."DEPT_VAR")).

Test the solution.
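
Putting the pieces together, a minimal sketch might look like the following (the security table, columns, and DEPT_VAR are illustrative; row-wise initialization is assumed so a user with multiple departments gets multiple values):

  -- Init Block SQL (row-wise initialization) that populates DEPT_VAR per user;
  -- ':USER' refers to the logged-in user's ID
  SELECT 'DEPT_VAR', department_code
  FROM   user_department_access
  WHERE  user_id = ':USER'

  -- Data filter applied to the relevant role, using the session variable
  "Finance"."Department"."Department Code" = VALUEOF(NQ_SESSION."DEPT_VAR")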

You can get more details here … https://thedatacommunity.org/2017/08/10/implementing-data-level-security-in-oracle-bi-obiee/

 

Q19. What is an Agent?  And when would you use it?

A19. An Agent (formerly called an iBot in OBIEE 10g) is a scheduled or conditionally triggered process that runs a specified report (analysis) when the schedule or condition is hit. Once the Agent runs, the analysis results can be sent to a user via email (attached or embedded), or to the dashboards in the form of an alert that the user will see when he/she accesses the dashboards. So, Agents can be used to deliver analysis results to specified users on a schedule or condition without any manual intervention. Another use of Agents is to seed the OBI cache overnight after the nightly ETL has completed, to make the reports faster for the first set of users in the morning.

 

Q20. What are some functions that you have used in OBIEE Answers to manipulate column data?

A20. There could be a wide range of answers here, but some of the commonly used functions include:

  • Aggregate functions, such as, MIN, MAX, SUM, AVG, COUNT, TopN
  • String functions, such as, CONCAT, LEFT, RIGHT, REPLACE, SUBSTRING, TRIMBOTH, UPPER
  • Mathematical functions, such as, ROUND, FLOOR, TRUNCATE, ABS
  • Datetime functions, such as, CURRENT_DATE, TimeStampAdd, TimeStampDiff, Year, Month, Now
  • Conversion functions, such as, CAST, IfNULL, CASE

However, your response should include the functions you have actually used, and you should be able to explain how you used them.
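
For instance, here are a few column-formula sketches in the style of OBIEE Logical SQL (the presentation column names are made up):

  -- Days between order and ship dates
  TIMESTAMPDIFF(SQL_TSI_DAY, "Sales"."Order Date", "Sales"."Ship Date")

  -- Bucket revenue into bands
  CASE WHEN "Sales"."Revenue" >= 100000 THEN 'Large' ELSE 'Small' END

  -- Trim stray spaces and standardize case on a code column
  UPPER(TRIMBOTH(' ' FROM "Product"."Product Code"))

  -- Convert a numeric key to character and concatenate with the name
  CAST("Customer"."Customer Id" AS CHAR(40)) || ' - ' || "Customer"."Customer Name"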

—————

Thanks for reading.  More sets will be available in the future. Good luck!

 

OBIA Financial Analytics – SIL_APTransactionFact_DiffManLoad performance issues

We are on Oracle Business Intelligence Applications (OBIA) 7.9.6.3 and had been experiencing performance issues with the SIL_APTransactionFact_DiffManLoad workflow/mapping. We tried a number of things but saw only minimal improvement. Eventually, I found a solution for the poor performance on Oracle Support. The change resulted in a drastic improvement in this workflow.

The solution can be found on Oracle Support (http://support.oracle.com – Oracle Doc ID: 1446397.1), but for your convenience I have included the content below.  There are other mappings that have a similar problem.

————————————————————————

OBIA 7963: SIL_APTransactionFact_Diffmanload Mapping And Performance Issue (Doc ID 1446397.1)

In this Document
Symptoms
Cause
Solution

Applies to:
Informatica OEM PowerCenter ETL Server – Version 7.9.6.3 [AN 1900] and later
Information in this document applies to any platform.

Symptoms
The OBIEE application (7.9.6.3) ETL task “SIL_APTransactionFact_DiffManLoad” has run over 68 hours during full load execution.

Cause

  1. The size of these columns (DOC_HEADER_TEXT and LINE_ITEM_TEXT) in DAC is 255 (except AP, where it’s 1020 in DAC and Infa). But in Informatica the size for these two columns is 1020. Ideally it should be 255. This is a known performance issue.
  2. The cause of the problem has been identified in unpublished Bug 12412793 – PSR: B16 INCREMENTAL: SIL_GLREVENUEFACT.

Solution
Below are the steps you will follow to modify the size of the fields in the lookup.

  1. Take a backup of the existing Lookups (LKP_W_AP_XACT_F and LKP_W_AR_XACT_F).
  2. Log in to Informatica Designer > Transformations.
  3. Open the lookup and modify the size of the fields. The port lengths for DOC_HEADER_TEXT and LINE_ITEM_TEXT were changed to 255.
  4. Save the changes.
  5. Rerun the test, confirm the performance issue is resolved, and migrate the changes to PROD.

————————————————————————

OBIEE Performance Tuning

This post describes a few tips and things to keep in mind for OBIEE Performance Tuning.

Be Proactive when possible
The need to tune performance can be proactive (tune before a major issue arises) or reactive (tune after a problem is reported by users, for example). It is best to be proactive – so performance tuning should be built into your OBIEE maintenance schedule. For example, OBIEE’s Usage Tracking functionality should be used regularly to identify reports whose performance can be improved, and tuning steps should then be carried out on the worst performers.

Iterative Process – change one thing or set of things at a time
One of the first things to keep in mind is that performance tuning is an iterative process, and there is typically no one silver bullet that will resolve all your performance problems. You may need to analyze and make changes to multiple parts of the system, but you want to make the changes methodically. It is best to change one parameter or setting at a time (or one related set of parameters). Adjust and test that one parameter/setting (or set of parameters) before moving on to another. If you change too much at one time, you may have difficulty telling what is helping from what is hurting your efforts.

Fix user complaints first, worst performers next, and then the next bad performers down the list
Another thing to keep in mind: tune what users are reporting first, then the worst performers, and then work your way down the list.

Team Effort – problem could be anywhere along the technology stack
Performance problems could be anywhere along the technology stack:
• OBIEE
• Database
• Server
• Network
Due to that span of technology, performance tuning is a team effort.  OBIEE Admins and Developers, DBAs, and ETL Developers can all be key to solving performance issues.
Logs from all components may need to be reviewed depending on the scenario.

Try to isolate or narrow-down the source of the problem
For example, run the report SQL directly on the database and see if you have the same problem. If there is no issue when it is run directly on the database, then you have eliminated the database as the problem.
Determine whether other applications have also been experiencing slowness, which could indicate the possibility of a network problem.

If your users have reported an issue, then you need to get as much detail as possible about the performance problems they are experiencing. When did this start happening? Is it just one report or many? Is it localized to one business area or multiple? Is it all the time or only sometimes? Knowing this will help you know where to focus.

Other questions to ask as you try to identify the source of the problem include, but are not limited to:
• Has anything changed? If reports were running fine but are now slow, this is the first thing to ask.
• When did the issue start? Determining exactly when it started can be helpful when correlating with other system or company activity.
• What has changed recently? Has there been any system change, data change, database update, network change, etc. (even if it seems unrelated)? For example, rolling into a new calendar year will cause new “Year” value(s) to be included in the data and can impact performance if statistics are not gathered.
• Is there a possibility that an index was dropped and not recreated as expected?

Use OBIEE’s Usage Tracking information to analyze specific reports, long-running reports, or frequently run reports. You will want to capture and analyze the SQL from these reports to determine what can be done to improve their performance.

Database
DBAs can monitor the system in real-time, use various tools, or review logs for information that can be helpful in the tuning effort.  Tools such as Oracle Enterprise Manager (EM) or SQL Tuning Advisor can be used to identify, analyze and tune high-load SQL.
OBIEE Usage Tracking can also be used to identify high-load SQL.
Without getting into much detail, these are some database features that could be used to help improve performance (a statistics-gathering sample follows the list):
• Gather Statistics
• Results Cache database feature
• Partitioning
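
For example, on an Oracle database the DBA might refresh optimizer statistics on a heavily used fact table with something like the following (schema and table names are placeholders):

  -- Gather fresh optimizer statistics so better execution plans are chosen
  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname => 'OBIA_DW',        -- placeholder schema
      tabname => 'W_AP_XACT_F',    -- placeholder table
      cascade => TRUE);            -- also gather index statistics
  END;
  /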

Servers
The System Admins can monitor the server resources to determine if there is an issue there.
• Use fast disk for the OBIEE cache and/or temporary files.

 

OBIEE-specific performance tuning tips

• OBIEE Caching
Are the tables being used set to cacheable?
Is caching turned on at the application level?
You may consider seeding the cache daily.
CACHE settings in NQSConfig.INI (a sample section is shown below):
o MAX_ROWS_PER_CACHE_ENTRY
o MAX_CACHE_ENTRY_SIZE
o MAX_CACHE_ENTRIES
o USE_ADVANCED_HIT_DETECTION
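
For reference, these parameters live in the CACHE section of NQSConfig.INI; the sample below only illustrates the layout, not recommended values (paths and sizes will differ per environment, and on 11g some cache settings are managed centrally through Fusion Middleware Control):

  [CACHE]
  ENABLE = YES;
  DATA_STORAGE_PATHS = "D:\OBIEE\cache" 500 MB;
  MAX_ROWS_PER_CACHE_ENTRY = 100000;
  MAX_CACHE_ENTRY_SIZE = 20 MB;
  MAX_CACHE_ENTRIES = 1000;
  USE_ADVANCED_HIT_DETECTION = NO;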

• Use Aggregation: Aggregate data when applicable
o You can use Aggregate tables or materialized views to realize this benefit (see the sketch after this list).
o Aggregate Fact tables and corresponding Aggregate Dimensions.
o Make sure aggregation rules are applied to Fact table measures.
o Don’t necessarily merge all measures into a single fact.
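
As a sketch, an aggregate could be built as an ETL-loaded aggregate table or as a database materialized view like the one below (names are hypothetical); it would then be mapped into the RPD as an additional logical table source with the appropriate content/aggregation levels:

  -- Month/product level aggregate of a detailed sales fact
  CREATE MATERIALIZED VIEW w_sales_a_month_prod
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
  AS
  SELECT product_wid,
         month_wid,
         SUM(sales_amt) AS sales_amt,
         SUM(sales_qty) AS sales_qty
  FROM   w_sales_f
  GROUP BY product_wid, month_wid;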

• Joins and Indexes
o Do not create unnecessary joins.
o Verify that the joins on the tables being investigated are appropriate.
o Adding indexes to support join and filter columns could be helpful. Again, this is an iterative process.

• Prompts and Filters
o Use LOV tables to drive prompt values when possible, instead of building prompts from large transactional data tables.
o Force filter selection/entry by making prompt values required. Do not allow open-ended runs of reports.

• Filter out unneeded data. If a significant amount of data in one or more tables (especially frequently used ones) is never used, it should be filtered out by the ETL, rather than being joined in SQL and then having to be filtered out in the RPD or at the report level.

• Enter the “Number of Elements at this level” value in the logical level in hierarchies.
• Also ensure that all logical level keys are unique.

• Avoid functions in the WHERE clause when possible (see the example below).
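
For example, wrapping an indexed date column in a function typically prevents the index from being used; rewriting the predicate as a range on the raw column (hypothetical column name) is index-friendly and returns the same rows:

  -- Likely to bypass an index on order_dt
  WHERE TRUNC(order_dt) = DATE '2014-01-01'

  -- Index-friendly equivalent
  WHERE order_dt >= DATE '2014-01-01'
    AND order_dt <  DATE '2014-01-02'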

• Be careful of sub-queries.

• Check out the features of the OBIEE Performance Monitor
http://server:port/analytics/saw.dll?Perfmon  (enter your OBI server and port)

• When possible, do comparison analysis to determine, for example, why one report runs fine while another seemingly similar report does not.


Sometimes a complete overhaul might be required
Review the users’ workflow and determine if new and improved queries can be written or if the number of queries can be reduced.
Present information from a summary level first, and then provide increasing levels of details as requested by users through drill down or navigation.  Basically, present detailed information only when necessary, and minimize the amount of detail provided at a time by filtering on user selections.

Oracle’s OBIEE Performance Tuning Guide
Apply recommendations from the “Best Practices Guide for Infrastructure Tuning Oracle® Business Intelligence Enterprise Edition 11g Release”. I would recommend applying 1–3 changes or sets of changes at a time; don’t apply everything at the same time, because if there is a problem it will be more difficult to determine which change caused it.
https://blogs.oracle.com/proactivesupportEPM/entry/wp_obiee_tuning_guide

New OBIEE Tuning Guide (Version 4) – released Jan 2014

A new version of the OBIEE Tuning Guide (Version 4) was released in January, 2014.  It is suitable for OBIEE 11.1.1.6 and 11.1.1.7 versions.

It can be found on Oracle Support under Document ID 1333049.1.   Or here is a direct link to the document (PDF).

—————————————-

New highlights from the previous document include:

  •     Optimized JVM switches for JRockit / Sun JVM / IBM JVM
  •     New tuning parameters settings / values for JavaHost / OPIS / OBIS components.
  •     Improved performance monitoring techniques.
  •     IBM WebSphere tuning parameters.
  •     More WebLogic Server tuning parameters.
  •     Windows Server 2012 tuning parameter.
  •     New optimized Linux / AIX tuning parameters.
  •     Additional Essbase ASO tuning parameters.
  •     libOVD authenticator search tuning

—————————————-

As always, this document provides us with recommended baselines or starting points, but the appropriate settings for each environment will vary.