Configure Spark workspace settings (DP-700 Exam Prep)

This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub. 
This topic falls under these sections:
Implement and manage an analytics solution (30–35%)
--> Configure Microsoft Fabric workspace settings
--> Configure Spark workspace settings


Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.

Introduction

One of the key responsibilities of a Fabric Data Engineer is configuring Spark settings at the workspace level. Proper Spark configuration helps ensure that notebooks, Spark job definitions, and Data Engineering workloads run efficiently, reliably, and cost-effectively.

For the DP-700 exam, you should understand the Spark settings available at the workspace level, when to modify them, and how they affect performance, scalability, concurrency, and resource consumption. Microsoft Fabric provides centralized Spark workspace settings that apply across Data Engineering and Data Science workloads within a workspace. (Microsoft Learn)


What Are Spark Workspace Settings?

Spark Workspace Settings are administrative configurations that control the default Spark behavior for a Fabric workspace.

These settings allow administrators to configure:

  • Default Spark pools
  • Starter pool behavior
  • Default environments
  • Spark job management
  • High concurrency settings
  • Automatic logging
  • Session timeout settings
  • Compute customization options

These settings are found under:

Workspace Settings → Data Engineering/Science → Spark Settings. (Microsoft Learn)


Why Spark Workspace Settings Matter

Without centralized Spark settings:

  • Every notebook would require individual configuration.
  • Resource consumption would be inconsistent.
  • Performance could vary significantly.
  • Capacity utilization would be difficult to control.

Workspace-level settings establish consistent defaults across all Spark workloads.

Benefits include:

  • Standardized compute resources
  • Faster notebook startup
  • Better workload governance
  • Improved capacity management
  • Simplified administration

Spark Pools in Microsoft Fabric

Spark workloads run on Spark pools.

Fabric supports two primary options:

Starter Pools

Starter pools are pre-warmed Spark clusters maintained by Fabric.

Advantages:

  • Fast session startup, typically within seconds, because the clusters are already running
  • Minimal administrative effort
  • Automatically managed by Microsoft
  • Ideal for development and general workloads

Starter pools use medium-sized nodes and can automatically scale based on workload demand. Workspace administrators can configure maximum node counts and executor limits based on capacity size. (Microsoft Learn)

When to Use Starter Pools

Use Starter Pools when:

  • Fast startup is important
  • Workloads are relatively standard
  • Custom Spark configurations are unnecessary
  • Development and testing workloads dominate

For many organizations, Starter Pools are sufficient for most notebook workloads.


Custom Spark Pools

Custom Spark Pools allow administrators to define:

  • Node size
  • Autoscaling settings
  • Executor allocation
  • Compute characteristics

Advantages:

  • Greater control
  • Better support for specialized workloads
  • Ability to optimize for large-scale processing

Tradeoff:

  • Session startup is typically slower than Starter Pools because compute must be provisioned. (Microsoft Learn)

Configuring the Default Pool

A workspace can specify a default Spark pool.

Options include:

  • Starter Pool
  • Workspace-level Custom Pool
  • Capacity-level Custom Pool

When users launch notebooks or Spark jobs without explicitly selecting a pool, the workspace default is used. (Microsoft Learn)
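The fallback rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not Fabric code: the function name and the pool name "etl-large-pool" are hypothetical.

```python
from typing import Optional

# Hypothetical workspace default, used only for illustration.
WORKSPACE_DEFAULT_POOL = "Starter Pool"


def resolve_pool(requested_pool: Optional[str],
                 workspace_default: str = WORKSPACE_DEFAULT_POOL) -> str:
    """Return the pool a session would use under the fallback rule."""
    # An explicit selection wins; otherwise the workspace default applies.
    return requested_pool if requested_pool else workspace_default


print(resolve_pool(None))              # Starter Pool
print(resolve_pool("etl-large-pool"))  # etl-large-pool
```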

DP-700 Exam Tip

Know the distinction:

  • Starter Pool = fastest startup
  • Custom Pool = greatest control

Expect scenario questions that ask you to balance startup speed against customization requirements.


Configuring Starter Pool Settings

Administrators can customize Starter Pool behavior.

Common settings include:

Autoscale

Autoscaling allows Spark resources to expand and contract automatically based on workload demand.

Benefits:

  • Better resource utilization
  • Reduced waste
  • Improved scalability

Autoscaling is enabled by default. (Microsoft Learn)


Dynamic Executor Allocation

Dynamic allocation automatically adjusts the number of executors used by Spark jobs.

Benefits:

  • Better performance
  • Reduced idle resources
  • More efficient capacity usage

This setting is also enabled by default. (Microsoft Learn)
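In Fabric this is a toggle in the workspace settings, but open-source Apache Spark exposes the same idea through the spark.dynamicAllocation.* properties. The sketch below uses those standard property names with illustrative values and checks that the bounds are consistent. Whether Fabric maps its toggle to exactly these properties is an assumption here, and the helper function is hypothetical.

```python
# Standard Apache Spark property names; the values are illustrative only.
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "9",
}


def validate_dynamic_allocation(conf: dict) -> tuple:
    """Return (min, max) executors, or raise if the settings are inconsistent."""
    if conf.get("spark.dynamicAllocation.enabled", "false").lower() != "true":
        raise ValueError("dynamic allocation is not enabled")
    low = int(conf["spark.dynamicAllocation.minExecutors"])
    high = int(conf["spark.dynamicAllocation.maxExecutors"])
    if low < 0 or high < low:
        raise ValueError("expected 0 <= minExecutors <= maxExecutors")
    return low, high


print(validate_dynamic_allocation(dynamic_allocation_conf))  # (1, 9)
```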


Maximum Nodes

Administrators can define the maximum number of nodes available to Starter Pools.

Higher limits:

  • Support larger workloads
  • Consume more capacity resources

Lower limits:

  • Reduce resource consumption
  • May slow large jobs

The available maximum depends on the Fabric capacity SKU. (Microsoft Learn)
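To see how autoscale and the maximum node limit interact, here is a minimal sketch. The sizing heuristic (tasks per node) is hypothetical and is not how Fabric decides to scale. The point is only that demand is clamped to the configured range.

```python
import math


def target_node_count(pending_tasks: int, tasks_per_node: int,
                      min_nodes: int, max_nodes: int) -> int:
    """Nodes a simple autoscaler would aim for, clamped to the configured range."""
    if tasks_per_node <= 0 or min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("invalid autoscale configuration")
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Never drop below the minimum or exceed the administrator's maximum.
    return max(min_nodes, min(needed, max_nodes))


print(target_node_count(100, 8, 1, 10))  # 10 (demand of 13 nodes is capped)
print(target_node_count(20, 8, 1, 10))   # 3
print(target_node_count(0, 8, 1, 10))    # 1 (idle pool stays at the minimum)
```

A lower maximum keeps capacity consumption predictable, at the cost of the first case above: the job wants 13 nodes and has to make do with 10.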


Default Environment Configuration

Fabric allows administrators to configure a workspace-level default environment.

An environment can define:

  • Spark runtime version
  • Libraries
  • Compute settings
  • Spark configurations

Benefits:

  • Consistency across notebooks
  • Simplified deployment
  • Easier governance

When a default environment is configured, new notebooks automatically inherit those settings. (Microsoft Learn)


Spark Runtime Version

The workspace default environment can specify the Spark runtime version.

Examples include:

  • Runtime 1.2
  • Runtime 1.3
  • Future Fabric runtime releases

Benefits:

  • Consistent execution behavior
  • Predictable package compatibility
  • Easier testing and validation

A common exam scenario involves selecting a runtime version to ensure compatibility with libraries or workloads.
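A compatibility check of this kind can be sketched as a version comparison. The helper names are hypothetical. In a notebook, the version string would typically come from spark.version, but the sketch takes plain strings so it runs anywhere.

```python
def version_tuple(version: str, width: int = 3) -> tuple:
    """Convert '3.5.0' to (3, 5, 0), padding short versions with zeros."""
    numbers = []
    for piece in version.split(".")[:width]:
        if not piece.isdigit():
            break  # stop at suffixes such as '0-SNAPSHOT'
        numbers.append(int(piece))
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def meets_minimum(spark_version: str, minimum: str) -> bool:
    """True if the session's Spark version satisfies a library's minimum."""
    return version_tuple(spark_version) >= version_tuple(minimum)


print(meets_minimum("3.5.0", "3.4"))  # True
print(meets_minimum("3.4.1", "3.5"))  # False
```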


High Concurrency Mode

High Concurrency mode allows multiple notebooks to share a single Spark session instead of each notebook starting its own.

Benefits include:

  • Improved resource utilization
  • Reduced capacity consumption
  • Increased throughput

Workspace administrators can enable high concurrency for:

  • Interactive notebook runs
  • Pipeline notebook runs

High Concurrency settings are configured at the workspace level. (Microsoft Learn)

When High Concurrency Is Useful

Consider enabling it when:

  • Many notebooks run simultaneously
  • Workloads are lightweight
  • Capacity utilization is a concern
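The effect on session count can be illustrated with a small sketch. The sharing_key below is a hypothetical stand-in for whatever Fabric requires notebooks to have in common before they may share a session; the actual criteria are not reproduced here.

```python
from collections import defaultdict


def plan_sessions(notebooks, high_concurrency: bool):
    """Group notebook runs into Spark sessions.

    Each notebook is a (name, sharing_key) pair; sharing_key is hypothetical.
    """
    if not high_concurrency:
        # Without sharing, every notebook run gets its own session.
        return [[name] for name, _ in notebooks]
    groups = defaultdict(list)
    for name, key in notebooks:
        groups[key].append(name)
    return list(groups.values())


runs = [("ingest", "alice/env-a"), ("clean", "alice/env-a"), ("report", "bob/env-b")]
print(len(plan_sessions(runs, high_concurrency=False)))  # 3
print(plan_sessions(runs, high_concurrency=True))
# [['ingest', 'clean'], ['report']]
```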

Job Management Settings

Workspace Spark settings also include Spark job management controls.

Session Timeout

Administrators can configure how long inactive Spark sessions remain active.

Benefits of shorter timeouts:

  • Reduced resource consumption
  • Lower capacity usage

Benefits of longer timeouts:

  • Better user experience
  • Less frequent cluster startup

The timeout can be configured up to 14 days. (Microsoft Learn)
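As a rough sketch of how such a limit behaves, the code below validates a timeout against the 14-day maximum stated above and checks whether an idle session has expired. The function names and the 20-minute example value are illustrative assumptions.

```python
from datetime import timedelta

MAX_SESSION_TIMEOUT = timedelta(days=14)  # upper bound stated above


def validate_timeout(timeout: timedelta) -> timedelta:
    """Reject timeouts that are non-positive or above the maximum."""
    if timeout <= timedelta(0):
        raise ValueError("timeout must be positive")
    if timeout > MAX_SESSION_TIMEOUT:
        raise ValueError("timeout exceeds the 14-day maximum")
    return timeout


def is_expired(idle_for: timedelta, timeout: timedelta) -> bool:
    """True once a session has been idle for at least the timeout."""
    return idle_for >= validate_timeout(timeout)


print(is_expired(timedelta(minutes=25), timedelta(minutes=20)))  # True
print(is_expired(timedelta(minutes=5), timedelta(minutes=20)))   # False
```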


Conservative Job Admission

Conservative Job Admission determines how many cores Fabric reserves when it admits a Spark job: the maximum the job could scale up to, or only the minimum it needs to start.

Enabled

Fabric reserves the maximum cores potentially required by active jobs.

Benefits:

  • Improved reliability
  • Reduced risk of resource contention

Tradeoff:

  • Fewer jobs may run simultaneously

Disabled

Fabric admits each job with only the minimum cores it needs to start, and the job requests more if it scales up later.

Benefits:

  • More concurrent jobs

Tradeoff:

  • Potential resource competition if jobs scale up later

This setting is particularly important for capacity planning and workload management. (Microsoft Learn)
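The tradeoff is easier to see with numbers. The sketch below is a simplified model, not Fabric's scheduler: the core counts are made up, and jobs are admitted in order until one no longer fits.

```python
def admit_jobs(capacity_cores: int, jobs, reserve_maximum: bool):
    """Admit jobs in order until the next one no longer fits.

    jobs is a list of (name, min_cores, max_cores) tuples (illustrative values).
    """
    admitted, free = [], capacity_cores
    for name, min_cores, max_cores in jobs:
        # Enabled: reserve the job's maximum. Disabled: reserve only the minimum.
        needed = max_cores if reserve_maximum else min_cores
        if needed > free:
            break  # simplification: later jobs wait behind this one
        admitted.append(name)
        free -= needed
    return admitted


jobs = [("job-a", 8, 32), ("job-b", 8, 32), ("job-c", 8, 32)]
print(admit_jobs(64, jobs, reserve_maximum=True))   # ['job-a', 'job-b']
print(admit_jobs(64, jobs, reserve_maximum=False))  # ['job-a', 'job-b', 'job-c']
```

With the setting enabled, two jobs run with guaranteed room to scale. With it disabled, all three start, but together they could ask for 96 cores on a 64-core capacity if they all scale up.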


Automatic Logging

Automatic Logging can be enabled at the workspace level.

Purpose:

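  • Automatically capture the parameters, metrics, and output items of machine learning training runs
  • Track machine learning experiments and models without explicit logging calls
  • Support troubleshooting and comparison of runs
  • Improve monitoring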

Administrators can enable or disable automatic logging through Spark Workspace Settings. (Microsoft Learn)


Customize Compute Settings

Workspace administrators can determine whether users may override workspace compute defaults.

This governance feature helps organizations:

  • Standardize Spark usage
  • Prevent excessive resource consumption
  • Improve compliance

Fabric environments can also provide workload-specific compute settings while maintaining centralized governance. (Microsoft Learn)
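The governance rule can be sketched as a merge that only happens when customization is allowed. The setting keys (node_size, max_nodes) and the function are hypothetical names chosen for illustration.

```python
def effective_compute(workspace_defaults: dict, requested_overrides: dict,
                      allow_customization: bool) -> dict:
    """Return the compute settings a session would actually run with."""
    if not allow_customization:
        # Overrides are ignored; everyone gets the workspace defaults.
        return dict(workspace_defaults)
    # Overrides win key by key; unspecified keys keep the default.
    return {**workspace_defaults, **requested_overrides}


defaults = {"node_size": "Medium", "max_nodes": 10}
overrides = {"max_nodes": 20}
print(effective_compute(defaults, overrides, allow_customization=False))
# {'node_size': 'Medium', 'max_nodes': 10}
print(effective_compute(defaults, overrides, allow_customization=True))
# {'node_size': 'Medium', 'max_nodes': 20}
```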


DP-700 Exam Focus Areas

You should be comfortable answering questions about:

✓ Starter Pools

✓ Custom Spark Pools

✓ Autoscaling

✓ Dynamic Executor Allocation

✓ Default Pool Selection

✓ Default Environment Configuration

✓ Spark Runtime Versions

✓ High Concurrency

✓ Session Timeout Settings

✓ Conservative Job Admission

✓ Automatic Logging

✓ Compute Governance


10 DP-700 Practice Questions

Question 1

You need Spark sessions to start as quickly as possible for notebook developers.

Which pool type should you configure as the workspace default?

A. Starter Pool

B. Custom Pool

C. Dedicated SQL Pool

D. KQL Pool

Answer: A


Question 2

Which Starter Pool feature automatically increases or decreases resources based on workload demand?

A. Dynamic Partitioning

B. Autoscale

C. High Concurrency

D. Session Timeout

Answer: B


Question 3

A workspace administrator wants Spark executors to be allocated and released automatically as workload demands change.

Which setting should be enabled?

A. Conservative Job Admission

B. Automatic Logging

C. Dynamic Executor Allocation

D. High Concurrency

Answer: C


Question 4

You need multiple notebooks to share Spark resources and improve capacity utilization.

Which Spark setting should you enable?

A. Autoscale

B. Automatic Logging

C. Dynamic Allocation

D. High Concurrency

Answer: D


Question 5

What is the primary purpose of a workspace default environment?

A. Configure Power BI semantic models

B. Define Spark runtime and related settings for workloads

C. Configure capacity metrics

D. Manage OneLake shortcuts

Answer: B


Question 6

Which setting controls how long an inactive Spark session remains active before termination?

A. Dynamic Allocation

B. High Concurrency

C. Session Timeout

D. Autoscale

Answer: C


Question 7

An administrator wants to maximize Spark job reliability by reserving sufficient cores for jobs that may scale up.

Which setting should be enabled?

A. Conservative Job Admission

B. Dynamic Allocation

C. Automatic Logging

D. Session Timeout

Answer: A


Question 8

Which Spark workspace setting automatically captures run information, such as machine learning parameters and metrics, to support monitoring and troubleshooting?

A. High Concurrency

B. Autoscale

C. Dynamic Allocation

D. Automatic Logging

Answer: D


Question 9

What is a key advantage of a Custom Spark Pool compared to a Starter Pool?

A. Faster startup times

B. Greater control over compute configuration

C. No capacity consumption

D. Automatic logging support

Answer: B


Question 10

A Fabric administrator wants notebook authors to use standardized compute configurations across the workspace.

Which approach should be used?

A. Disable Autoscale

B. Reduce Session Timeout

C. Configure a default environment

D. Disable Dynamic Allocation

Answer: C


This topic is tested frequently because Spark settings directly influence performance, scalability, governance, and cost management across Microsoft Fabric Data Engineering workloads. Understanding the interaction between pools, environments, concurrency, and job management settings is essential for success on the DP-700 exam.


