Category: Data Warehousing

Understanding Microsoft Fabric Shortcuts

Microsoft Fabric is a central platform for data and analytics, and one of the powerful features that supports its all-in-one approach is Shortcuts. Shortcuts provide a simple way to unify data across multiple locations without duplicating or moving it. This is a big deal because it saves a great deal of the time and effort usually involved in moving data around.

What Are Shortcuts?

Shortcuts are references (or “pointers”) to data that resides in another storage location. Instead of copying the data into Fabric, a shortcut lets you access and query it as if it were stored locally.

This is especially valuable in today’s data landscape, where data often spans OneLake, Azure Data Lake Storage (ADLS), Amazon S3, or other environments.

Types of Shortcuts

There are two types of shortcuts: table shortcuts and file shortcuts.

  1. Table Shortcuts
    • Point to existing tables in other Fabric workspaces or external sources.
    • Allow you to query and analyze the table without physically moving it.
  2. File Shortcuts
    • Point to files (e.g., Parquet, CSV, Delta Lake) stored in OneLake or other supported storage systems.
    • Useful for scenarios where files are your system of record, but you want to use them in Fabric experiences like Power BI, Data Engineering, or Data Science.

Benefits of Shortcuts

Shortcuts are a very useful feature, and here are some of the benefits:

  • No Data Duplication: Saves storage costs and avoids data sprawl.
  • Single Source of Truth: Data stays in its original location while being usable across Fabric.
  • Speed and Efficiency: Query and analyze external data in place, without lengthy ETL processes.
  • Flexibility: Works across different storage platforms and Fabric workspaces.

How and Where Shortcuts Can Be Created

  • In OneLake: You can create shortcuts directly in OneLake to link to data from ADLS Gen2, Amazon S3, or other OneLake workspaces.
  • In Fabric Experiences: Whether you are working in Data Engineering, Data Science, Real-Time Analytics, or Power BI, shortcuts can be created in lakehouses or KQL (Kusto Query Language) databases and surface directly as data in OneLake, so any Fabric experience can use them without copying data from the source.
  • In Workspaces: Shortcuts make it possible to connect lakehouses stored in different workspaces, breaking down silos within an organization. The shortcuts can be created from a lakehouse or KQL database.
  • Note that warehouses do not support the creation of shortcuts; however, a warehouse can query data stored within other warehouses and lakehouses.

How Shortcuts Can Be Used

  • Cross-Workspace Data Access: Analysts can query data in another team’s workspace without requesting a copy.
  • Data Virtualization: Data scientists can work with files stored in ADLS without having to move them into Fabric.
  • BI and Reporting: Power BI models can use shortcuts to reference external files or tables, enabling consistent reporting without duplication.
  • ETL Simplification: Instead of moving raw files into Fabric, engineers can create shortcuts and build transformations directly on the source.
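
To make this concrete, here is a minimal sketch of what querying a shortcut looks like from a lakehouse SQL endpoint or notebook. The table name is a made-up placeholder; the key point is that a shortcut is queried exactly like a local table, even though the data physically lives elsewhere.

-- Hypothetical example: "ops_orders" is a shortcut in this lakehouse that points to a
-- Delta table owned by another workspace (or to files in ADLS Gen2 / Amazon S3).
-- The query reads the data in place; nothing is copied into the lakehouse.
SELECT region,
       SUM(order_amount) AS total_sales
FROM   ops_orders
GROUP  BY region
ORDER  BY total_sales DESC;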

Common Scenarios

  • A finance team wants to build Power BI reports on data stored by the operations team without moving the data.
  • A data scientist needs access to parquet files in Amazon S3 but prefers to analyze them within Fabric.
  • A company with multiple Fabric workspaces wants to centralize access to shared reference data (like customer or product master data) without replication.

In summary: Microsoft Fabric Shortcuts simplify data access across locations and workspaces. Whether table-based or file-based, they allow organizations to unify data without duplication, streamline analytics, and improve collaboration.

For more detail, see the Microsoft Learn OneLake documentation about Shortcuts, where you can explore all of the related Shortcut topics.

Thanks for reading! I hope you found this information useful.

SQL Tips: How to generate INSERT statements from the output of a SELECT statement using Toad

If you need to build INSERT statements for many rows of data, for example to load that data into another table or into the same table in a different environment, Toad provides a convenient way to do this. This is often needed when moving data around.

In Toad, execute the appropriate select statement on the source table …

Select * from [my_schema].[my_simple_table];

I used a simple “select *” above, but your SQL statement can be any valid SQL statement that returns the data you want to insert into the other table. You may select specific columns, add filters, joins, and any other valid SQL operations.

Let’s say you want to insert the output into another table in a different schema.

Right-click on the output result data, and click “Export Dataset…”

From the “Export format” drop down menu, choose “Insert Statements”

In the Output section of the Export Dataset dialog box, enter the location and name of the Insert Script file that will be generated.

There are several other parameters that you could choose but we won’t cover them all here.

If you only wanted to generate inserts for some selected rows, select “Export only selected rows”.

If you need to specify the schema of the target table, select “Include schema name”.

In the Table section, enter the name of the target schema and table.

Note, there are data masking options available that can be very useful if, for example, you are moving some data from a Production environment to a Non-Production environment, and you do not want to expose the data there.

After you have set the parameters relevant to your scenario, Click “OK”.

The Insert Script file containing all the data will be generated with the filename and at the location you specified, and the INSERT statements will include the schema and table names you specified, along the lines of the sketch below.
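
The generated file is simply a series of INSERT statements, one per row returned by your SELECT. The table, columns, and values below are made up, and the exact formatting depends on your Toad version and export options:

INSERT INTO target_schema.my_simple_table (id, customer_name, created_date)
  VALUES (1, 'Acme Corp', TO_DATE('01/15/2024', 'MM/DD/YYYY'));
INSERT INTO target_schema.my_simple_table (id, customer_name, created_date)
  VALUES (2, 'Globex Inc', TO_DATE('02/03/2024', 'MM/DD/YYYY'));
COMMIT;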

Thanks for reading!

Data Cleaning methods

Data cleaning is an essential step in the data preprocessing pipeline when preparing data for analytics or data science. It involves identifying and correcting or removing errors, inconsistencies, and inaccuracies in the dataset to improve its quality and reliability. It is essential that data is cleaned before being used in analyses, reporting, development or integration. Here are some common data cleaning methods:

Handling missing values:

  • Delete rows or columns with a high percentage of missing values if they don’t contribute significantly to the analysis.
  • Impute missing values by replacing them with a statistical measure such as mean, median, mode, or using more advanced techniques like regression imputation or k-nearest neighbors imputation.
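
As a simple illustration, here is a sketch of median imputation in SQL; the table and column names are placeholders, and the right imputation strategy always depends on the data and the analysis goals:

-- Replace missing order_amount values with the median of the non-missing values
UPDATE orders
SET    order_amount = (SELECT MEDIAN(order_amount)
                       FROM   orders
                       WHERE  order_amount IS NOT NULL)
WHERE  order_amount IS NULL;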

Handling categorical variables:

  • Encode categorical variables into numerical representations using techniques like one-hot encoding, label encoding, or target encoding.
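
For example, a low-cardinality categorical column can be one-hot encoded with simple CASE expressions; the names below are hypothetical, and dedicated libraries handle this automatically for larger category sets:

SELECT order_id,
       CASE WHEN channel = 'WEB'   THEN 1 ELSE 0 END AS channel_web,
       CASE WHEN channel = 'STORE' THEN 1 ELSE 0 END AS channel_store,
       CASE WHEN channel = 'PHONE' THEN 1 ELSE 0 END AS channel_phone
FROM   orders;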

Removing duplicates:

  • Identify and remove duplicate records based on one or more key variables.
  • Be cautious when removing duplicates, as sometimes duplicated entries may be valid and intentional.
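
A common approach is to rank rows within each business key and review everything after the first occurrence before deleting anything. A sketch with placeholder names:

-- Keep the most recent row per customer_id; rows with rn > 1 are candidate duplicates to inspect
SELECT *
FROM  (SELECT c.*,
              ROW_NUMBER() OVER (PARTITION BY customer_id
                                 ORDER BY last_updated DESC) AS rn
       FROM   customers c)
WHERE  rn > 1;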

Handling outliers:

  • Identify outliers using statistical methods like z-scores, box plots, or domain knowledge.
  • Decide whether to remove outliers or transform them based on the nature of the data and the analysis goals.
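
For example, a z-score check can flag values that are unusually far from the mean; the threshold of 3 below is a common rule of thumb rather than a universal rule, and the names are placeholders:

SELECT *
FROM  (SELECT order_id,
              order_amount,
              (order_amount - AVG(order_amount) OVER ())
                / NULLIF(STDDEV(order_amount) OVER (), 0) AS z_score
       FROM   orders)
WHERE  ABS(z_score) > 3;   -- review these before removing or transforming them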

Correcting inconsistent data:

  • Standardize data formats: Convert data into a consistent format (e.g., converting dates to a specific format).
  • Resolve inconsistencies: Identify and correct inconsistent values (e.g., correcting misspelled words, merging similar categories).

Dealing with irrelevant or redundant features:

  • Remove irrelevant features that do not contribute to the analysis or prediction task.
  • Identify and handle redundant features that provide similar information to avoid multicollinearity issues.

Data normalization or scaling:

  • Normalize numerical features to a common scale (e.g., min-max scaling or z-score normalization) to prevent certain features from dominating the analysis due to their larger magnitudes.
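
A minimal sketch of min-max scaling using analytic functions (placeholder names); z-score normalization would follow the same pattern with AVG and STDDEV:

SELECT order_id,
       order_amount,
       (order_amount - MIN(order_amount) OVER ())
         / NULLIF(MAX(order_amount) OVER () - MIN(order_amount) OVER (), 0) AS order_amount_scaled
FROM   orders;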

Data integrity issues:

Finally, address any remaining data integrity issues.

  • Check for data integrity problems such as inconsistent data types, incorrect data ranges, or violations of business rules.
  • Resolve integrity issues by correcting or removing problematic data.

It’s important to note that the specific data cleaning methods that need to be applied to a dataset will vary depending on the nature of the dataset, the analysis goals, and domain knowledge. It’s recommended to thoroughly understand the data and consult with domain experts when preparing to perform data cleaning tasks.

Quality Assurance (QA) for Data Projects or Data Applications

This post discusses Quality Assurance (QA) activities for data projects.

What is Quality Assurance (QA)?  Simply put, Quality Assurance, also called QA, Testing or Validation, is about testing an application or solution to ensure that all the stated/promised/expected requirements are met. It is a critically important activity for all software application development or implementations. Data applications are no different. They need to be tested to ensure they work as intended.

QA stands between development and deployment, and it makes the difference between a delivered product and a high-quality delivered product.

There are a number of things to keep in mind when you plan your Quality Assurance activities for data solutions. I present some of them in this post as suggestions, considerations, or prompting questions. The things mentioned here will not apply to all data applications but can be used as a guide or a check.

People / Teams

The number of people and teams involved in a project will vary depending on the size, scope and complexity of the project.

The technical team building the application needs to perform an initial level of validation of the solution.

If there is a Quality Assurance team that performs the validation tasks, then that team will need to perform the “official” validation.

The business analysts and end-users of the application also need to validate. Where possible, involve as many end users as can be efficiently accommodated. The more real users you have testing the application, the better the chances of finding issues early.

Where it makes sense, Test IDs that simulate various types of users or groups should be used to help test various usage and security scenarios. This is particularly useful in automated testing.

On large projects where there is a lot to be tested, it is best to break up the testing across multiple people or teams. This will help to prevent testing fatigue and sloppy testing and result in higher quality testing.

Plan ahead to ensure that access for all the relevant users is set up in the testing environments.

Communication

With all the teams and people involved, it is important to have a plan for how they will communicate. Things to consider and have a plan for include:

  • How will teams communicate internally? Email, Microsoft Teams, SharePoint, and shared files are some options.
  • How will the various teams involved communicate with each other? In other words, how will cross-team communication be handled? Again, email, Microsoft Teams, SharePoint, and shared files are some options.
  • How will issues and status be communicated? Weekly meetings, Status emails or documents, Shared files available on shared spaces are options.
  • How will changes and resolutions be tracked? Files, SDLC applications, Change Management applications are options.
  • How will teams and individuals be notified when they need to perform a task? Manual communication or automated notifications from tools are options.

Data

The most important thing to ensure in data projects is that the data is high quality, particularly the “base” data set. If the base data is incorrect, everything built on top of it will be bad. The correctness of intermediate and user-facing data is just as important, of course, but validating the base data is critical to getting the data correct everywhere else.

  • Ensure that table counts, field counts and row counts of key data are correct.
  • Does the data warehouse data match the source data? (A reconciliation query like the sketch after this list is a simple starting point.)
  • Test detailed, low-level records with small samples of data.
  • Test to ensure that the data and the values conform to what is expected. For example, ensure that there is no data older than 3 years old, or that there are no account values outside a certain range. The Data Governance Team may become involved in these activities across all projects.
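
The sketch below assumes both systems can be reached from one connection (for example via a database link); the table names are placeholders. Row counts and key sums should match, or any difference should be explainable.

SELECT 'source'    AS system, COUNT(*) AS row_count, SUM(order_amount) AS total_amount
FROM   source_system.orders
UNION ALL
SELECT 'warehouse' AS system, COUNT(*) AS row_count, SUM(order_amount) AS total_amount
FROM   dw.fact_orders;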

Next in line is the “intermediate” data such as derived metrics, aggregates, specialized subsets, and more. These will also need to be verified.

  • Are the calculated values correct?
  • Are the aggregates correct? Test aggregate data with small, medium and large sets of data
  • Verify metric calculations

Then the user-facing data or data prepared for self-service usage needs to be validated.

  • Does the data on the dashboard match the data in the database?
  • Are the KPIs correctly reflecting the status?

Test the full flow of the data. The validity of the data should be verified at each stage of the data flow – from the source, to the staging, to the final tables in the data warehouse, to aggregates or subsets, to the dashboard.

Take snapshots of key datasets or reports so you can compare results post data migration.

Some additional data prep might be needed in some cases.

  • These include making sure that you have sourced adequate data for testing. For example, if you need to test an annual trend, then it might be best to have at least a year’s worth of data, preferably two.
  • You may need to scramble or redact some data for testing. Often Test data is taken from the Production environment and then scrambled and/or redacted in order to not expose sensitive information.
  • You may need to temporarily load in data for testing. For various reasons, you may need to load some Production data into the QA environment just to test the solution or a particular feature and then remove the data after the testing is complete. While this can be time consuming, sometimes it’s necessary, and it’s good to be aware of the need early and make plans accordingly.

Aesthetics & Representation of Data

Presentation matters. Although the most critical thing is data correctness, how the data is presented is also very important. Good presentation helps with understanding, usability, and adoption. A few things to consider include:

  • Does the application, such as a dashboard, look good? Does it look right?
  • Are the components laid out properly so that there is no overcrowding?
  • Are the logos, colors and fonts in line with company expectations?
  • Are proper chart options used to display the various types of data and metrics?
  • Is the information provided in a way that users can digest?

Usage

The data application or solution should be user friendly, preferably intuitive, or at least have good documentation. The data must be useful to the intended audience; it should help them to understand the information and make good decisions or take sensible actions based on it.

The application should present data in a manner that is effective – easy to access, and easy to understand.

The presentation should satisfy the analytic workflows of the various users. Users should be able to logically step through the application to find information at the appropriate level of detail that they need based on their role.

A few things that affect usability include:

  • Prompts – ensure that all the proper prompts or selections are available to users to slice and filter the data as necessary. And of course, verify that they work.
  • Drill downs and drill throughs – validate that users can drill-down and across data to find the information they need in a simple, logical manner.
  • Easy interrogation of the data – if the application is ad-hoc in nature, validate that users can navigate it or at least verify that the documentation is comprehensive enough for users to follow.

Security

Securing the application and its data so that only authorized users have access to it is critical.

Application security comprises “authentication” – access to the application – and “authorization” – what a user is authorized to do when he or she accesses the application.

Authorization (what a user is authorized to do within the application) can be broken into “object security” – what objects or features a user has access to, and “data security” – what data elements a user has access to within the various objects or features.

For example, a user has access to an application (authenticated / can log in), and within the application the user has access to (authorized to see and use) 3 of 10 reports (object-level security). The user is not authorized to see the other 7 reports (object-level security) and, therefore, will not have access to them. Now, within the 3 reports that the user has access to, he or she can only see data related to 1 of 5 departments (data-level security).

All object-level and data-level security needs to be validated. This includes negative testing: test not only that users have the access they need, but also that users do not have access they should not have.

  • Data for testing should be scrambled or redacted as appropriate to protect it.
  • Some extremely sensitive data may need to be filtered out entirely.
  • Can all the appropriate users access the application?
  • Are non-authorized users blocked from accessing the application?
  • Can users see the data they need to see to perform their jobs?

Performance

Performance of the data solution is important to user efficiency and user adoption. If users cannot get the results they need in a timely manner, they will look elsewhere to get what they need. Even if they have no choice, a poorly performing application will result in wasted time and dollars.

A few things to consider for ensuring quality around performance:

  • Application usage – is the performance acceptable? Do the results get returned in an acceptable time?
  • Data Integration – is the load performance acceptable?
  • Data processing – can the application perform all the processing it needs to do in a reasonable amount of time?
  • Stress Testing – how is performance with many users? How is it with a lot of data?
  • How is performance with various selections or with no selections at all?
  • Is ad-hoc usage set up to be flexible while avoiding rogue analyses that may cripple the system?
  • Is real-time analysis needed and is the application quick enough?

These items need to be validated and any issues need to be reported to the appropriate teams for performance tuning before the application is released for general usage.

Methodology

Each organization, and even each team within an organization, will have a preferred methodology for application development and change management, including how they perform QA activities.

Some things to consider include:

  • Get QA resources involved in projects early so that they gain an early understanding of the requirements and the solutions, and can assess and plan how best to test.
  • When appropriate, do not wait until all testing is complete before notifying development teams of issues discovered. Notifying them early could make the difference between your project being on time or late.
  • Create a test plan and test scripts – even if they are high-level.
  • Where possible, execute tasks in an agile, iterative manner.
  • Each environment will have unique rules and guidelines that need to be validated. For example, your application may have a special naming convention, color & font guidelines, special metadata items, and more. You need to validate that these rules and guidelines are followed.
  • Use a checklist to ensure that you validate with consistency from deliverable to deliverable
  • When the solution being developed is replacing an existing system or dataset, use the new and old solutions in parallel to validate the new against the old.
  • Document test results. All testing participants should document what has been tested and the results. This may be as simple as a checkmark or a “Done” status, but may also include things like data entered, screenshots, results, errors, and more.
  • Update the appropriate tracking tools (such as your SDLC or Change Management tools) to document changes and validation. These tools will vary from company to company, but it is best to have a trail of the development, testing, and release to production.
  • For each company and application, there will be a specific, unique set of things that will need to be done. It is best if you have a standard test plan or test checklist to help you confirm that you have tested all important aspects and scenarios of the application.

This is not an all-encompassing coverage of Quality Assurance for data solutions, but I hope the article gives you enough information to get started or tips for improving what you currently have in place. You can share your questions, thoughts and input via comments to this post. Thanks for reading!

Creating a Business Intelligence (BI) & Analytics Strategy and Roadmap

This post provides some of my thoughts on how to go about creating a Business Intelligence (BI) & Analytics Strategy and Roadmap for your client or company.  Please comment with your suggestions from your experience for improving this information.

 

When creating or updating the BI & Analytics Strategy and Roadmap for a company, one of the first things to understand is:

Who are all the critical stakeholders that need to be involved?

Understanding who needs and uses the BI & Analytics systems is critical for starting the process of understanding and documenting the “who needs what, why, and when”.

These are some of the roles that are typically important stakeholders:

  • High-level business executives that are paying for the projects
  • Business directors involved in the usage of the systems
  • IT directors involved in the development and support of the systems
  • Business Subject Matter Experts (SMEs) & Business Analysts
  • BI/Analytics/Data/System Architects
  • BI/Analytics/Data/System Developers and Administrators

 

Then, you need to ask all these stakeholders, especially those from the business:

What are the drivers for BI & Analytics? And what is the level of importance for each of these drivers?

This will help you to understand and document what business needs are creating the need for new or modified BI & Analytics solutions. You should then go deeper to understand the business objectives and goals that are driving these business needs.  This will help you to understand and document the bigger picture so that a more comprehensive strategy and roadmap can be created.

The questions and discussions surrounding the above will require deep and broad business involvement. Getting the perspective of a wide range of users from all business areas that are using the BI & Analytics Systems is critical.  The business should be involved throughout the process of creating the strategy and roadmap, and all decisions should tie back to support for business objectives and goals. And the trail leading to all these decisions must be documented.

Some examples of business drivers include:

  • Gain more insight into who our best customers are and how best to acquire them.
  • Understand how weather affects our sales/revenue.
  • Determine how we can sell more to our existing customers.
  • Understand what causes employee turnover.
  • Gain insight into how we can improve staffing schedules.

 

And examples of business objectives and goals may include things like:

  • Increase corporate revenues by 10%
  • Grow our base of recurring customers
  • Stabilize corporate revenues over all seasons
  • Create an environment where employees love to work
  • Reduce payroll costs without a reduction in staff, for example, by reducing turnover.

 

Then, turn to understanding and documenting the current scenario (if not already known). Identify what systems (including data sources) are in place, who are using them (and why and how), what capabilities do they offer, what are the must-haves, and what are the pain points and positive highlights.

Also, you will need to determine the current workload (and future workload if it can be determined) of the primary team members involved in developing, testing, and implementing BI & Analytics solutions.

This will help you understand a few things:

  • Some of the highest priority needs of the users
  • Gaps in capabilities and data between what is needed and what is currently in place (including an understanding of what is liked and disliked about the current systems)
  • Current user base knowledge and engagement
  • IT knowledge and skills
  • Resource availability – when are people available to work on new initiatives

 

What are the options and limitations?

  • Can existing systems be customized to meet the requirements?
  • Can they be upgraded to a new version that has the needed functionality?
  • Do we need to consider adding a new platform or replacing one or more of the existing systems with a new platform?
  • Can we migrate from/integrate one system to/with another system that we already have up and running?
  • Are any of our current systems losing vendor support or require an upgrade for other reasons? Has the pricing changed for any of our software applications?
  • What options does our budget permit us to explore?
  • What options do our knowledge and skills permit us to explore?

 

Once you have identified these items …

  • Identify and engage stakeholders, and document these roles and the people
  • Identify and document business drivers, objectives and goals
  • Understand and document the current landscape – needs (including must-haves), technology, gaps, users, IT staff, resource availability, and more
  • Identify and document options – based on current landscape, technology, budget, staff resources, etc.

… you can develop a “living” Strategy and Roadmap for BI & Analytics. And when I say “living”, I mean it will not be a static document, but will be fine-tuned over time as new information emerges and as changes arise in business needs, technology, and staff resources.

 

Your Strategy and Roadmap for BI & Analytics should include, but is not limited to:

  • BI & Analytics that will be used to satisfy business drivers, objectives and goals
  • Data acquisition and storage plan for meeting the analytics needs
  • Technology platforms that will be used to process and store data, and deliver the analytics
  • Information about any new technologies that need to be acquired or implemented, and schedules
  • Roles and Responsibilities for all stakeholders involved in BI & Analytics projects
  • Planned staffing allocations and schedules
  • Planned staffing changes and schedules
  • User training (business users) and Delivery team training (technical implementers & developers for example)
  • List dependencies for each item or set of items

Terminology Primer – Goals, Objectives, Measures, Metrics, Key Performance Indicators (KPIs), Meters

The essential purpose of business intelligence is to provide information that will allow people throughout an organization to make informed decisions relevant to their business processes and responsibilities.  This post is a simple terminology primer that describes the meaning of a few terms commonly used in business intelligence, explains how they relate to each other, and shows their relevance to supporting the overall goals of an organization.

Goals are a business’ desired outcomes. They are typically around growth, cost savings, innovation, improvement in efficiency, the company’s workforce, and the competition, but may include a lot of other things.
Goals help a company to stay focused by providing team members within a company with an aim to work towards.

Objectives are specific strategies and steps that a business needs to take to achieve the goals that have been set. These objectives are usually specific and measurable.  Success toward achieving objectives usually indicates progress toward achieving goals.
Objectives are sometimes referred to as Critical Success Factors because they are critical to the success of achieving the goals.

Measures are numeric representations of various transactions that occur through various business processes. For example, when the company makes a sale (during the sales process), some measures that are generated in that transaction are:  sales amount, discount amount, number of items sold, and number of items discounted.  Then from this, other measures can be determined, such as total sales for all customers, total number of items sold, total number of sales for each customer, and so on.  Measures are numeric and therefore can have mathematical calculations performed on them – such as sum, avg., min, max, etc. – to generate metrics.

Metrics are calculations derived from one or more measures. For example, as mentioned in an example above, you may have the measures “Discount Amount” and “Number of Items Discounted”, and you may use these measures to calculate a metric of “Average Discount Amount per Discounted Item” or “Average Discount Amount per Sale”.  As another example, you may simply add up all the Discount Amounts over a specific time period, such as month, to get a “Total Discount Amount by Month” metric.
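
A simple sketch of this in SQL, using hypothetical measure columns from a sales table, shows how metrics are just calculations layered on top of measures:

SELECT TRUNC(sale_date, 'MM')                                AS sale_month,
       SUM(discount_amount)                                  AS total_discount_amount,
       SUM(discount_amount) / NULLIF(COUNT(*), 0)            AS avg_discount_per_sale,
       SUM(discount_amount) / NULLIF(SUM(items_discounted), 0)
                                                             AS avg_discount_per_discounted_item
FROM   sales
GROUP  BY TRUNC(sale_date, 'MM');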

Key Performance Indicators (KPIs) are metrics that measure how well a company is doing toward their objectives. Companies will have hundreds or thousands of metrics, but there will be a few key ones that the executive team wants to keep a close eye on for the overall company or divisions, and other managers will want to keep an eye on KPIs relevant to their respective departments.  Those key metrics are the KPIs. All KPIs are metrics, but not all metrics are KPIs.

Meters are a group of metrics that collectively provide a broader, overall view of a subject area.
For example, you may have individual metrics for Sales to Date, Sales in Pipeline, Number/Value of Expiring Contracts, Avg. Time to Close Sales, etc. Putting these all together in a Meter presents the user with a lot of related information that provides a broad, overall picture of sales (and loss of sales) which would allow for analysis such as determining the chances of meeting sales targets. The Meter in this example could be called “Sales Forecast” for example.

Thanks for reading!

Oracle Advanced Security Summary

With the expansion of Self-Service BI, BI Teams need to be more vigilant about protecting sensitive data.
This is a summary of options available for protecting data in Oracle databases.
The information in this post was found here and summarized for a quick read: https://docs.oracle.com/database/121/ASOAG/toc.htm

The 3 features available are (1) Transparent Data Encryption, (2) Data Redaction, and (3) Data Masking and Subsetting Pack.
Here is a quick summary.

(1) Transparent Data Encryption (TDE)

  • Encrypt data so only authorized people can see it
  • Use it to protect sensitive data that may be in an unprotected environment, such as backup data sent to a storage facility
  • You can encrypt an individual column or an entire tablespace
  • Applications using encrypted data can function just the same

(2) Data Redaction

  • Enable the redaction (masking) of column data in tables
  • Redaction can be full, partial, based on regular expressions, or random
    • Full redaction: replaces strings with a single blank space ‘ ‘; numbers with zero (0); dates with 01-JAN-01
    • Partial redaction: replaces a portion of the column data; for example SSN: ***-**-1234
    • Regular expressions: can be used to perform partial or full redactions
    • Random: generates random values for display when accessed
  • The redaction takes place at runtime; not in the permanent data stored

(3) Oracle Enterprise Manager Data Masking and Subsetting Pack

  • enables you to create a “safe” development or test copy of the production database

 

Let’s look into some more details …

(1) Transparent Data Encryption (TDE)

  • TDE uses a two-tiered key-based architecture
  • TDE column encryption uses the two-tiered key-based architecture to transparently encrypt and decrypt sensitive table columns. The TDE master encryption key is stored in an external security module, which can be an Oracle software keystore or hardware keystore. This TDE master encryption key encrypts and decrypts the TDE table key, which in turn encrypts and decrypts data in the table column.
  • A Key Management Framework is used for TDE to store and manage keys and credentials.
    • Includes the keystore to store the TDE master encryption keys and the management framework to manage keystore and key operations
    • The Oracle keystore stores a history of retired TDE master encryption keys, which enables you to change them and still be able to decrypt data that was encrypted under an earlier TDE master encryption key.
  • Types of Keystores
    • Software keystores
    • Hardware, or HSM-based, keystores
  • Types of Software Keystores:
    • Password-based software keystores
      • protected by using a password that you create. You must open this type of keystore before the keys can be retrieved or used.
    • Auto-login software keystores
      • protected by a system-generated password, and do not need to be explicitly opened; automatically opened when accessed.
      • can be used across different systems; ideal for unattended scenarios.
    • Local auto-login software keystores
      • auto-login software keystores that are local to the computer on which they are created.
      • cannot be opened on any computer other than the one on which they are created.
      • typically used for scenarios where additional security is required while supporting an unattended operation.
  • Steps for configuring a Software Keystore
    • Step 1: Set the Software Keystore Location in the sqlnet.ora File
    • Step 2: Create the Software Keystore
    • Step 3: Open the Software Keystore
    • Step 4: Set the Software TDE Master Encryption Key
    • Step 5: Encrypt Your Data
  • Oracle Database checks the sqlnet.ora file for the directory location of the keystore, whether it is a software keystore or a hardware module security (HSM) keystore.
  • You cannot change an existing tablespace to make it encrypted
  • You can create or modify columns to be encrypted
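
A minimal sketch of the keystore steps above for Oracle 12c follows; the wallet directory, password, and table are placeholders, and you should verify the exact syntax against the Oracle documentation for your release before running anything.

-- Step 1 (sqlnet.ora):
--   ENCRYPTION_WALLET_LOCATION =
--     (SOURCE = (METHOD = FILE)(METHOD_DATA = (DIRECTORY = /u01/app/oracle/wallet)))

ADMINISTER KEY MANAGEMENT CREATE KEYSTORE '/u01/app/oracle/wallet'
  IDENTIFIED BY "MyKeystorePwd1";                             -- Step 2: create the keystore
ADMINISTER KEY MANAGEMENT SET KEYSTORE OPEN
  IDENTIFIED BY "MyKeystorePwd1";                             -- Step 3: open the keystore
ADMINISTER KEY MANAGEMENT SET KEY
  IDENTIFIED BY "MyKeystorePwd1" WITH BACKUP;                 -- Step 4: set the TDE master encryption key

-- Step 5: encrypt your data, for example an encrypted column on a new table
CREATE TABLE hr.employee_pay (
  employee_id NUMBER,
  salary      NUMBER ENCRYPT USING 'AES256'
);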

 

(2) Data Redaction

  • Define data redaction policies to specify what data needs to be redacted
  • Use policy expressions to set whether a user sees the redacted data or the full data
  • Policy Procedures
    • DBMS_REDACT.ADD_POLICY
    • DBMS_REDACT.ALTER_POLICY
    • DBMS_REDACT.ENABLE_POLICY
    • DBMS_REDACT.DISABLE_POLICY
    • DBMS_REDACT.DROP_POLICY
  • Sample script
    • BEGIN
        DBMS_REDACT.ADD_POLICY(
          object_schema       => 'hr',
          object_name         => 'employees',
          column_name         => 'ssn',
          policy_name         => 'redact_emp_ssn',
          function_type       => DBMS_REDACT.PARTIAL,           -- partial; use DBMS_REDACT.FULL for full redaction
          function_parameters => DBMS_REDACT.REDACT_US_SSN_F5,  -- a standard parameter; custom formats are also possible
          expression          => 'SYS_CONTEXT(''SYS_SESSION_ROLES'',''MGR'') = ''FALSE''',  -- redact for users without the MGR role, so MGR sees the full data
          policy_description  => 'Partially redacts the first 5 digits in SS numbers',
          column_description  => 'ssn contains Social Security numbers');
      END;
      /
  • Use DBMS_REDACT.ALTER_POLICY and action => DBMS_REDACT.ADD_COLUMN to redact multiple columns
  • Redaction takes place on select lists and not on where clauses
  • Be aware of the scenarios when using redacted tables to build other tables or views

 

(3) Oracle Enterprise Manager Data Masking and Subsetting Pack (DMSP)

  • DMSP enables you to create a development or test copy of the production database by taking the production data, masking it in bulk and/or creating a subset of it, and then loading the resulting masked and/or subset data into the development or test copy.
  • You can still apply Data Redaction policies to the non-production database, in order to redact columns
  • Used to mask data sets when you want to move the data to development and test environments.
  • Data Redaction is mainly designed for redacting at runtime for production applications

——–

I hope you found this helpful to get you started on taking the steps to protect your data internally and externally.
You can visit the link I provided above to find more details.

OBIEE Performance Tuning

This post describes a few tips and things to keep in mind for OBIEE Performance Tuning.

Be Proactive when possible
The need to performance tune can be proactive (tune before a major issue arises) or reactive (tune after a problem is reported by users, for example).  It is best to be proactive – so performance tuning should be built into your OBIEE maintenance schedule. For example, OBIEE’s Usage Tracking functionality should be used regularly to identify reports whose performance can be improved, and tuning steps should then be carried out on the worst performers.

Iterative Process – change one thing or set of things at a time
One of the first things to keep in mind is that performance tuning is an iterative process.  And there is typically no one silver bullet that will resolve all your performance problems.  You may need to analyze and make changes to multiple parts of the system, but you want to make the changes methodically.  It is best to change one parameter or setting at a time (or one related set of parameters).  Adjust and test the settings for that one parameter/setting (or set of parameters) before moving on to another.  If you change too much at one time, you may have difficulty separating what is helping from what is hurting your efforts.

Fix user complaints first, worst performers next, and then the next bad performers down the list
Another thing to keep in mind: tune what users are reporting first, then tune the worst problems, then move on down the list.

Team Effort – problem could be anywhere along the technology stack
Performance problems could be anywhere along the technology stack:
• OBIEE
• Database
• Server
• Network
Due to that span of technology, performance tuning is a team effort.  OBIEE Admins and Developers, DBAs, and ETL Developers can all be key to solving performance issues.
Logs from all components may need to be reviewed depending on the scenario.

Try to isolate or narrow down the source of the problem
For example, run the report SQL directly on the database and see if you have the same problem. If there is no issue when run directly on the database, then you have eliminated the database as the problem.
Determine if other applications have also been experiencing slowness, which could indicate the possibility of a network problem.

If your users have reported an issue, then you need to get as much detail as possible about the performance problems they are experiencing.  When did this start happening?  Is it just one report or many?  Is it localized to one business area or multiple?  Is it all the time or only sometimes?  Knowing this will help you to know where to focus.

Other questions to ask as you try to identify the source of the problem include, but are not limited to:
Has anything changed?  If reports were running fine but are now slow, the first thing to ask is …
When did the issue start?  (Determining exactly when it started can be helpful when correlating with other system or company activity.)
What has changed recently?  Have there been any system changes, data changes, database updates, network changes, etc. (even if they seem unrelated)?  For example, rolling into a new calendar year will cause new “Year” value(s) to be included in the data and can impact performance if statistics are not gathered.
Is there a possibility that an index was dropped and not recreated as expected?

Use OBIEE’s Usage Tracking information to analyze specific reports, long-running reports, or frequently run reports.  You will want to capture and analyze the SQL from these reports to determine what can be done to improve their performance.
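
A hedged sketch of such a query against the usage tracking table (commonly S_NQ_ACCT) is below; column names can vary by OBIEE version, so verify them in your environment before relying on this.

SELECT saw_src_path                   AS report_path,
       COUNT(*)                       AS executions,
       ROUND(AVG(total_time_sec), 1)  AS avg_seconds,
       MAX(total_time_sec)            AS worst_seconds
FROM   s_nq_acct
WHERE  start_ts >= ADD_MONTHS(SYSDATE, -1)
GROUP  BY saw_src_path
ORDER  BY avg_seconds DESC;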

Database
DBAs can monitor the system in real-time, use various tools, or review logs for information that can be helpful in the tuning effort.  Tools such as Oracle Enterprise Manager (EM) or SQL Tuning Advisor can be used to identify, analyze and tune high-load SQL.
OBIEE Usage Tracking can also be used to identify high-load SQL.
Without getting into much detail, these are some database features that could be used to help improve performance:
• Gather Statistics
• Results Cache database feature
• Partitioning
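
As one example, statistics gathering on a heavily queried fact table is often scheduled right after the nightly load; a minimal sketch, with placeholder schema and table names:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'DW',
    tabname => 'W_GL_BALANCE_F',
    cascade => TRUE);   -- also gathers statistics on the table's indexes
END;
/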

Servers
The System Admins can monitor the server resources to determine if there is an issue there.
• Use fast disk for the OBIEE cache and/or temporary files.

 

OBIEE-specific performance tuning tips

• OBIEE Caching
Are the tables being used set to cacheable?
Is caching turned on at the application level?
You may consider seeding the cache daily.
CACHE Settings:
o MAX_ROWS_PER_CACHE_ENTRY
o MAX_CACHE_ENTRY_SIZE
o MAX_CACHE_ENTRIES
o USE_ADVANCED_HIT_DETECTION

• Use Aggregation: Aggregate data when applicable
o You can use Aggregate tables or materialized views to realize this benefit.
o Aggregate Fact tables and corresponding Aggregate Dimensions.
o Make sure aggregation rules are applied to Fact table measures.
o Don’t necessarily merge all measures into a single fact.

• Joins and Indexes
o Do not create unnecessary joins.
o Verify that the joins on the tables being investigated are appropriate.
o Performance Indexing could be helpful.  Again, this is an iterative process.

• Prompts and Filters
o Use LOV tables to drive prompt values when possible, instead of building prompts from large transactional data tables.
o Force filter selection / entry by making prompt values required.  Do not allow open ended run of reports.

• Filter out unneeded data.  If there is a significant amount of data that is not being used in one or more tables (especially frequently used tables), then that data should be filtered out by the ETL, rather than being joined in SQL and then having to be filtered out in the RPD or at the report level.

• Enter the “Number of Elements at this level” value in the logical level in hierarchies.
• Also ensure that all logical level keys are unique.

• Avoid function in the where clause when possible.

• Be careful of sub-queries.

• Check out the features of the OBIEE Performance Monitor
http://server:port/analytics/saw.dll?Perfmon  (enter your OBI server and port)

• When possible, do comparison analysis to determine, for example, why one report runs fine while another seemingly similar report does not.


Sometimes a complete overhaul might be required
Review the users’ workflow and determine if new and improved queries can be written or if the number of queries can be reduced.
Present information from a summary level first, and then provide increasing levels of details as requested by users through drill down or navigation.  Basically, present detailed information only when necessary, and minimize the amount of detail provided at a time by filtering on user selections.

Oracle’s OBIEE Performance Tuning Guide
Apply recommendations from the “Best Practices Guide for Infrastructure Tuning Oracle® Business Intelligence Enterprise Edition 11g Release”.  I would recommend applying 1 – 3 changes or sets of changes at a time; don’t apply everything at the same time, because if there is a problem, it will be more difficult to determine which change caused it.
https://blogs.oracle.com/proactivesupportEPM/entry/wp_obiee_tuning_guide

Oracle Business Intelligence Applications (OBIA) Fact Tables

Dimensionally modeled (star-schema designed) data warehouses are primarily made up of two types of tables – Fact and Dimension.  Fact tables store the measurements generated by business events (# of orders, amount of dollars, etc.); and Dimension tables store the descriptive attributes that provide context to the measurements (product [product name], customer [customer type], date, etc.).

This post describes the types of Fact tables found in Oracle Business Intelligence Applications (OBIA) data warehouse – Oracle Business Analytics Warehouse (OBAW).  There will be future posts that describe in detail the other table types in OBIA (Dimension, Internal, etc.).

The 5 types of Fact tables used in the OBAW are:

  1. Transactional
  2. Aggregate
  3. Cycle Lines
  4. Snapshot
  5. State Transition.

The Transactional Fact Table is the main type of fact table. It stores the lowest-level of information from transactional sources. An example of a Fact table in OBIA (Financial Analytics) is: W_GL_BALANCE_F
Note: Fact tables in OBIA end with “_F”.
This table stores the current balance for GL accounts by GL_ACCOUNT and other dimensions.

The Aggregate Fact Table is typically used for performance improvements.  It is a summarized or rolled-up version of the Transactional fact table.  Instead of querying the data at the transactional level – which is the most detailed level and the level with the most records, the Aggregate table allows you to query the data at a more rolled up level when appropriate.  One of the most frequent roll-ups is time – for example, a transactional table at a day level is rolled up to the month level.
Aggregate tables can be tens (or even hundreds) of times smaller than their transactional versions.  These types of tables are also very common in OBIA and in data warehousing in general.

An example of an Aggregate Fact Table in OBIA (Financial Analytics) is: W_GL_BALANCE_A
Note: Aggregate Fact tables in OBIA end with “_A”.
This table stores the GL account balances aggregated by GL Account Segment and other dimensions. Instead of having data at the GL_ACCOUNT level as in the Transactional fact table, the data is at the GL Account Segment level in the Aggregate table.  Aggregate Fact tables are derived from Transactional Fact Tables or other Aggregate Fact tables. This table is derived from the transactional fact table mentioned above, W_GL_BALANCE_F, conceptually along the lines of the simplified sketch below.
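
The column names in this sketch are illustrative only and do not match the actual OBAW column definitions, which are populated by the packaged ETL; the point is simply that an aggregate fact is a grouped roll-up of its transactional fact:

INSERT INTO w_gl_balance_a (gl_segment_wid, period_wid, balance_amt)
SELECT d.gl_segment_wid,
       f.period_wid,
       SUM(f.balance_amt)
FROM   w_gl_balance_f  f
JOIN   w_gl_account_d  d  ON d.row_wid = f.gl_account_wid
GROUP  BY d.gl_segment_wid, f.period_wid;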

The Snapshot Fact Table stores “snapshots” of measurements taken at well-defined, predetermined time intervals – such as daily, monthly, annually, etc.  Examples include Inventory and Account Balance snapshots, and AR/AP aging snapshots.  Common items such as financial reports or bank statements are examples of reports from Snapshot Fact tables.

An example of a Snapshot table in OBIA(Supply Chain Analytics) is: W_INVENTORY_DAILY_BAL_F
Oracle’s description of this table will help to clarify its makeup and purpose.
The W_INVENTORY_DAILY_BAL_F fact table is used to represent at a point in time information of all inventory balances and inventory values related to products whose inventory is maintained by the business organization, these would typically include all inbound (purchased from external entities) products as well as outbound (sold to external entities) products. The inventory balance information is trended by copying historical snapshot information from this table at periodic points in time into history table W_INVENTORY_MONTHLY_BAL_F.
The W_INVENTORY_MONTHLY_BAL_F table stores a snapshot of inventory balance.
There is one row for each product and product storage location whose point in time inventory quantity and value information is maintained. The storage location could represent a warehouse or further divisions within a warehouse. This aspect is configurable within the product. All the dimension key links to the other Oracle Business Analytics Warehouse dimension tables, such as W_DAY_D, W_BUSN_LOC_D, W_PRODUCT_D, W_INVENTORY_PRODUCT_D, and so on, represent information associations at that point in time for that product inventory information. The DATE_WID column represents the date on which the inventory balance information is valid.

These tables can also have Aggregate versions:
As mentioned in the description for the W_INVENTORY_DAILY_BAL_F table above, there is an aggregate version.  However, snapshot tables are not necessarily aggregated like transactional tables, because many times the measures are non-additive or semi-additive. For example, you would not take your account monthly balance in January and add it to your account monthly balance in February to determine how much money you have – that would be wrong.

The W_INVENTORY_MONTHLY_BAL_F fact table is used to represent the monthly information of all the inventory balances and the inventory values related to products whose inventory is maintained by the business organization. This information includes all inbound (purchased from external entities) products and outbound (sold to external entities) products. The aggregation period is configurable, and has a preconfigured value of Monthly.
There is one row for each product and product storage location whose point in time (as of a month) inventory quantity and value information is maintained. All the dimension key links to the other Oracle Business Analytics Warehouse dimension tables such as W_DAY_D, W_BUSN_LOC_D, W_PRODUCT_D, W_INVENTORY_PRODUCT_D, and so on, and represents information and associations at that point in time for that product inventory information. The PERIOD_START_DT_WID and PERIOD_END_DT_WID column represents the aggregation bucket start and end dates. The column INV_BALANCE_DT_WID represents the date within this aggregation period on which the inventory balance information is valid.

The Cycle Lines Fact Table stores measurements for multiple related business events and is therefore typically derived from multiple fact tables. These tables typically store process cycle times or provide the ability to easily determine process cycle times.  They are also called Accumulating Snapshot Fact tables because they are snapshots of different events accumulated on top of each other.  An example of a Cycle Lines Fact table is W_PURCH_CYCLE_LINE_F.

Here is Oracle’s description of the table which should help clarify its purpose: W_PURCH_CYCLE_LINE_F table tracks the time duration of all events pertaining to the purchase process commencing with a requisition. Information in this table enables analysis of the direct spend process within an organization beginning with a purchase requisition, its approval, the creation of an approved purchase order, its submission to a supplier, the creation of a purchase schedule and ending with its receipt of the products. It can be used to calculate the time taken to receive products that have been ordered, the time between the first receipt and last receipt of products that have scheduled for delivery. The W_PURCH_CYCLE_LINE_F table contains all the various dates associated with the processes such as submission, approval, ordering and receiving as well as quantities and amounts. While Other spend related fact tables capture individual process such as requesting, ordering, scheduling this table combines all the in one place for ease of analysis and reporting.

These Cycle Lines tables can also have aggregate versions. For example, W_PURCH_CYCLE_LINE_A is an aggregate table of W_PURCH_CYCLE_LINE_F at a higher level of dimensionality. The Product dimension is replaced by a Product Type dimension to give a high-level analysis of the sourcing data. It stores Purchase Cycle Line records aggregated over a preconfigured Monthly time period and product types.

State Transition Fact Tables store state-transition metrics based on business events, such as customer state – new, top, dormant, lost, etc. – based on the customer order activity.  These tables store, or allow you to easily derive, counts of the various states.  State Transition Fact tables are derived from Transactional or Snapshot fact tables.

Below are two examples of State Transition Fact tables in OBIA (Marketing Analytics):

The Customer Status History Fact: W_CUSTOMER_STATUS_HIST_F
This is a fact table that tracks the status of customers based on the frequency of orders they place with the organization. Possible statuses are NEW, RECENT, DORMANT and LOST. The time duration for each status bucket is configurable, out of the box being a calendar year.
The grain of this table is at a Customer, Customer Status and the Status Start Date level. Other important columns in this table include the Sold to and the Ship to location for the customer. These are derived based on the status bucket start date against the Customer Locations dimension table.

The Loyalty Member Status History Fact: W_LOY_MEMBER_STATUS_HIST_F
W_LOY_MEMBER_STATUS_HIST_F Fact table stores status changes of Loyalty members. Grain: One record for each member status changed.

That’s it for OBIA fact tables.  Understanding the types of fact tables and their purpose helps us to make better design choices when we set out to build new fact tables to represent business events, and it also helps us to more quickly recognize and better analyze the data in these tables.
I hope you found this information useful. If you have information about other fact table types, please share.

OBIEE Tuning Whitepaper from Oracle (has been updated)

Oracle has released an updated version of their OBIEE Tuning Whitepaper.

You can find the document here …

https://blogs.oracle.com/pa/entry/test

… or here …

https://support.oracle.com/epmos/faces/ui/km/DocumentDisplay.jspx?_afrLoop=212370301476321&id=1333049.1&_afrWindowMode=0&_adf.ctrl-state=w65avp7pa_30

You will need to have an Oracle ID to access it (which is a free sign up).

In addition to all the great information that was in the original document, the updates to the document include:

  • New improved HTTP Server Caching algorithm
  • Oracle iPlanet Web Server tuning parameters
  • New tuning parameters settings / values for OPIS/OBIS components

The topics included in the document are:

1.0 Performance Overview

1.1 Introduction to Oracle Business Intelligence EE Performance
1.2 Performance Terminology
1.3 Understanding Key Performance Drivers

2.0 Top Tuning Recommendations for OBIEE

2.1 Tune Operating Systems parameters.
2.2 Tune Oracle WebLogic Server (WLS) Parameters
2.3 Tune 64bit Java Virtual Machines (JVM)
2.4 Tune 32bit Java Virtual Machines (JVM)
2.5 Tune HTTP Server Parameters
2.6 Tune HTTP Server Compression / Caching
2.7 Tune Oracle Database Parameters

3.0 Performance Monitoring OBIEE

3.1  Built-in BI Metrics for Performance Monitoring
3.2  Performance Monitoring In Windows Environment
3.3. Performance Monitoring In Unix Environment

4.0 Tuning OBIEE Components

4.1 Oracle BI Presentation Services Component
4.2 Oracle BI Server Component

5.0 Tuning Essbase

5.1 Essbase ASO Tuning