This post is a part of the DP-700: Implementing Data Engineering Solutions Using Microsoft Fabric Exam Prep Hub.
This topic falls under these sections:
Implement and manage an analytics solution (30–35%)
--> Configure Microsoft Fabric workspace settings
--> Configure Spark workspace settings
Note that there are 10 practice questions (with answers) at the end of each section to help you solidify your knowledge of the material. Also, there are 2 practice tests with 60 questions each available from the hub's main page below the exam topics section.
Introduction
One of the key responsibilities of a Fabric Data Engineer is configuring Spark settings at the workspace level. Proper Spark configuration helps ensure that notebooks, Spark job definitions, and Data Engineering workloads run efficiently, reliably, and cost-effectively.
For the DP-700 exam, you should understand the Spark settings available at the workspace level, when to modify them, and how they affect performance, scalability, concurrency, and resource consumption. Microsoft Fabric provides centralized Spark workspace settings that apply across Data Engineering and Data Science workloads within a workspace. (Microsoft Learn)
What Are Spark Workspace Settings?
Spark Workspace Settings are administrative configurations that control the default Spark behavior for a Fabric workspace.
These settings allow administrators to configure:
- Default Spark pools
- Starter pool behavior
- Default environments
- Spark job management
- High concurrency settings
- Automatic logging
- Session timeout settings
- Compute customization options
These settings are found under:
Workspace Settings → Data Engineering/Science → Spark Settings. (Microsoft Learn)
Why Spark Workspace Settings Matter
Without centralized Spark settings:
- Every notebook would require individual configuration.
- Resource consumption would be inconsistent.
- Performance could vary significantly.
- Capacity utilization would be difficult to control.
Workspace-level settings establish consistent defaults across all Spark workloads.
Benefits include:
- Standardized compute resources
- Faster notebook startup
- Better workload governance
- Improved capacity management
- Simplified administration
Spark Pools in Microsoft Fabric
Spark workloads run on Spark pools.
Fabric supports two primary options:
Starter Pools
Starter pools are pre-warmed Spark clusters maintained by Fabric.
Advantages:
- Extremely fast startup times
- Minimal administrative effort
- Automatically managed by Microsoft
- Ideal for development and general workloads
Starter pools use medium-sized nodes and can automatically scale based on workload demand. Workspace administrators can configure maximum node counts and executor limits based on capacity size. (Microsoft Learn)
When to Use Starter Pools
Use Starter Pools when:
- Fast startup is important
- Workloads are relatively standard
- Custom Spark configurations are unnecessary
- Development and testing workloads dominate
For many organizations, Starter Pools are sufficient for most notebook workloads.
Custom Spark Pools
Custom Spark Pools allow administrators to define:
- Node size
- Autoscaling settings
- Executor allocation
- Compute characteristics
Advantages:
- Greater control
- Better support for specialized workloads
- Ability to optimize for large-scale processing
Tradeoff:
- Session startup is typically slower than Starter Pools because compute must be provisioned. (Microsoft Learn)
Configuring the Default Pool
A workspace can specify a default Spark pool.
Options include:
- Starter Pool
- Workspace-level Custom Pool
- Capacity-level Custom Pool
When users launch notebooks or Spark jobs without explicitly selecting a pool, the workspace default is used. (Microsoft Learn)
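The fallback rule above can be pictured with a small sketch. This is an illustrative model only, not a Fabric API: the function name `resolve_pool` and the pool names are hypothetical.

```python
from typing import Optional


def resolve_pool(explicit_pool: Optional[str], workspace_default: str) -> str:
    """Return the pool a session would run on.

    Hypothetical helper: an explicit selection wins, otherwise the
    workspace default applies.
    """
    return explicit_pool if explicit_pool else workspace_default


# No pool selected on the notebook: the workspace default is used.
print(resolve_pool(None, "Starter Pool"))             # Starter Pool
# A pool selected explicitly: the selection takes precedence.
print(resolve_pool("LargeBatchPool", "Starter Pool")) # LargeBatchPool
```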
DP-700 Exam Tip
Know the distinction:
- Starter Pool = fastest startup
- Custom Pool = greatest control
Expect scenario questions that ask you to balance startup speed against customization requirements.
Configuring Starter Pool Settings
Administrators can customize Starter Pool behavior.
Common settings include:
Autoscale
Autoscaling allows Spark resources to expand and contract automatically based on workload demand.
Benefits:
- Better resource utilization
- Reduced waste
- Improved scalability
Autoscaling is enabled by default. (Microsoft Learn)
Dynamic Executor Allocation
Dynamic allocation automatically adjusts the number of executors a Spark job uses: executors are added when tasks are waiting and released when they sit idle.
Benefits:
- Better performance
- Reduced idle resources
- More efficient capacity usage
This setting is also enabled by default. (Microsoft Learn)
Maximum Nodes
Administrators can define the maximum number of nodes available to Starter Pools.
Higher limits:
- Support larger workloads
- Consume more capacity resources
Lower limits:
- Reduce resource consumption
- May slow large jobs
The available maximum depends on the Fabric capacity SKU. (Microsoft Learn)
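One way to reason about how autoscale and the maximum-node limit interact is as a simple clamp: the pool grows with demand, but never past the configured ceiling. The sketch below is a simplified model with hypothetical names and numbers; Fabric's real scaling logic is not documented here and is certainly more involved.

```python
def target_nodes(requested_nodes: int, max_nodes: int, min_nodes: int = 1) -> int:
    """Clamp the node count a workload asks for to the pool's autoscale range."""
    if min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("expected 1 <= min_nodes <= max_nodes")
    return max(min_nodes, min(requested_nodes, max_nodes))


# A small job stays small; a large job is capped at the configured maximum.
print(target_nodes(requested_nodes=3, max_nodes=10))   # 3
print(target_nodes(requested_nodes=25, max_nodes=10))  # 10
```

Lowering `max_nodes` in this model reduces the resources a single job can claim, at the cost of making large jobs run on fewer nodes than they would otherwise use.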
Default Environment Configuration
Fabric allows administrators to configure a workspace-level default environment.
An environment can define:
- Spark runtime version
- Libraries
- Compute settings
- Spark configurations
Benefits:
- Consistency across notebooks
- Simplified deployment
- Easier governance
When a default environment is configured, new notebooks automatically inherit those settings. (Microsoft Learn)
Spark Runtime Version
The workspace default environment can specify the Spark runtime version.
Examples include:
- Runtime 1.2
- Runtime 1.3
- Future Fabric runtime releases
Benefits:
- Consistent execution behavior
- Predictable package compatibility
- Easier testing and validation
A common exam scenario involves selecting a runtime version to ensure compatibility with libraries or workloads.
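As a rough illustration of a compatibility check, the sketch below compares a Spark version string against the minimum a library needs. In a notebook the string would typically come from `spark.version` (a standard PySpark attribute); here it is hard-coded so the example runs anywhere. The helper names and the version numbers are assumptions for illustration.

```python
def parse_version(version: str) -> tuple:
    """Turn '3.5.0' into (3, 5, 0); only the first three numeric parts are used."""
    parts = [int(p) for p in version.split(".")[:3] if p.isdigit()]
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(session_version: str, required: str) -> bool:
    """True when the session's Spark version is at least the required one."""
    return parse_version(session_version) >= parse_version(required)


print(meets_minimum("3.5.0", "3.4"))   # True
print(meets_minimum("3.4.1", "3.5"))   # False
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating "3.10" as older than "3.9".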
High Concurrency Mode
High Concurrency mode allows multiple notebooks to share a single Spark session instead of each notebook starting its own.
Benefits include:
- Improved resource utilization
- Reduced capacity consumption
- Increased throughput
Workspace administrators can enable high concurrency for:
- Interactive notebook runs
- Pipeline notebook runs
High Concurrency settings are configured at the workspace level. (Microsoft Learn)
When High Concurrency Is Useful
Consider enabling it when:
- Many notebooks run simultaneously
- Workloads are lightweight
- Capacity utilization is a concern
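The effect on session count can be sketched with simple arithmetic. The per-session sharing limit below is a hypothetical parameter, and the model ignores any eligibility rules Fabric applies before notebooks can share a session.

```python
import math


def sessions_needed(notebook_count: int, notebooks_per_session: int) -> int:
    """Number of Spark sessions required if notebooks are packed into shared sessions."""
    if notebook_count < 0 or notebooks_per_session < 1:
        raise ValueError("notebook_count must be >= 0 and notebooks_per_session >= 1")
    return math.ceil(notebook_count / notebooks_per_session)


# Without sharing, 12 notebooks need 12 sessions.
print(sessions_needed(12, notebooks_per_session=1))  # 12
# With an assumed limit of 5 notebooks per shared session, 3 sessions suffice.
print(sessions_needed(12, notebooks_per_session=5))  # 3
```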
Job Management Settings
Workspace Spark settings also include Spark job management controls.
Session Timeout
Administrators can configure how long an idle Spark session stays alive before it is terminated.
Benefits of shorter timeouts:
- Reduced resource consumption
- Lower capacity usage
Benefits of longer timeouts:
- Better user experience
- Less frequent cluster startup
The default timeout for interactive sessions is 20 minutes, and it can be configured up to a maximum of 14 days. (Microsoft Learn)
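The tradeoff can be made concrete with a small idle-timeout check. This is a sketch of the general idea, not how Fabric implements it; the function name and the sample timestamps are hypothetical, and only the 14-day upper bound comes from the text above.

```python
from datetime import datetime, timedelta

MAX_TIMEOUT = timedelta(days=14)  # upper bound stated in the text


def is_expired(last_activity: datetime, now: datetime, timeout: timedelta) -> bool:
    """True when a session has been idle for at least the configured timeout."""
    if not timedelta(0) < timeout <= MAX_TIMEOUT:
        raise ValueError("timeout must be positive and at most 14 days")
    return now - last_activity >= timeout


last_activity = datetime(2025, 1, 1, 9, 0)
now = datetime(2025, 1, 1, 9, 30)  # 30 minutes of inactivity

print(is_expired(last_activity, now, timedelta(minutes=20)))  # True
print(is_expired(last_activity, now, timedelta(hours=1)))     # False
```

With the shorter timeout the session is released after 30 idle minutes, freeing capacity; with the longer one it stays available, so the user avoids a fresh session start.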
Conservative Job Admission
Conservative Job Admission determines how many cores Fabric reserves for a Spark job when the job is admitted.
Enabled
Fabric reserves the maximum cores potentially required by active jobs.
Benefits:
- Improved reliability
- Reduced risk of resource contention
Tradeoff:
- Fewer jobs may run simultaneously
Disabled
Fabric allocates only the minimum required cores initially.
Benefits:
- More concurrent jobs
Tradeoff:
- Potential resource competition if jobs scale up later
This setting is particularly important for capacity planning and workload management. (Microsoft Learn)
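A worked example helps show the tradeoff. The sketch below assumes identical jobs and uses made-up core counts; the function and parameter names are hypothetical, and real admission decisions depend on factors this model ignores.

```python
def admitted_jobs(capacity_cores: int, min_cores_per_job: int,
                  max_cores_per_job: int, reserve_max: bool) -> int:
    """How many identical jobs fit, given the cores reserved per job at admission."""
    if not 0 < min_cores_per_job <= max_cores_per_job:
        raise ValueError("expected 0 < min_cores_per_job <= max_cores_per_job")
    reserved_per_job = max_cores_per_job if reserve_max else min_cores_per_job
    return capacity_cores // reserved_per_job


# Hypothetical capacity of 64 cores; each job needs 8 cores to start
# and may scale up to 32.
print(admitted_jobs(64, 8, 32, reserve_max=True))   # 2
print(admitted_jobs(64, 8, 32, reserve_max=False))  # 8
```

With the setting enabled, only 2 jobs are admitted, but each is guaranteed room to scale to 32 cores. With it disabled, 8 jobs start, but they would need 256 cores in total if all scaled up at once, so they compete for the 64 available.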
Automatic Logging
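Automatic Logging can be enabled at the workspace level. In Fabric, this setting primarily governs the automatic tracking of machine learning experiments and models.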
Purpose:
- Automatically capture Spark execution information
- Support troubleshooting
- Improve monitoring
- Assist machine learning experiment tracking
Administrators can enable or disable automatic logging through Spark Workspace Settings. (Microsoft Learn)
Customize Compute Settings
Workspace administrators can determine whether users may override workspace compute defaults.
This governance feature helps organizations:
- Standardize Spark usage
- Prevent excessive resource consumption
- Improve compliance
Fabric environments can also provide workload-specific compute settings while maintaining centralized governance. (Microsoft Learn)
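The governance idea can be modeled as a merge that is only permitted when overrides are allowed. The setting keys and the `allow_override` flag below are hypothetical names chosen for illustration, not Fabric configuration fields.

```python
def effective_compute(workspace_default: dict, requested: dict,
                      allow_override: bool) -> dict:
    """Return the compute settings a session would get under the governance rule."""
    if not allow_override:
        # Overrides disabled: every session gets the workspace defaults.
        return dict(workspace_default)
    # Overrides enabled: requested values replace the matching defaults.
    return {**workspace_default, **requested}


defaults = {"node_size": "Medium", "max_nodes": 10}
requested = {"max_nodes": 20}

print(effective_compute(defaults, requested, allow_override=False))
# {'node_size': 'Medium', 'max_nodes': 10}
print(effective_compute(defaults, requested, allow_override=True))
# {'node_size': 'Medium', 'max_nodes': 20}
```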
DP-700 Exam Focus Areas
You should be comfortable answering questions about:
✓ Starter Pools
✓ Custom Spark Pools
✓ Autoscaling
✓ Dynamic Executor Allocation
✓ Default Pool Selection
✓ Default Environment Configuration
✓ Spark Runtime Versions
✓ High Concurrency
✓ Session Timeout Settings
✓ Conservative Job Admission
✓ Automatic Logging
✓ Compute Governance
10 DP-700 Practice Questions
Question 1
You need Spark sessions to start as quickly as possible for notebook developers.
Which pool type should you configure as the workspace default?
A. Starter Pool
B. Custom Pool
C. Dedicated SQL Pool
D. KQL Pool
Answer: A
Question 2
Which Starter Pool feature automatically increases or decreases resources based on workload demand?
A. Dynamic Partitioning
B. Autoscale
C. High Concurrency
D. Session Timeout
Answer: B
Question 3
A workspace administrator wants Spark executors to be allocated and released automatically as workload demands change.
Which setting should be enabled?
A. Conservative Job Admission
B. Automatic Logging
C. Dynamic Executor Allocation
D. High Concurrency
Answer: C
Question 4
You need multiple notebooks to share Spark resources and improve capacity utilization.
Which Spark setting should you enable?
A. Autoscale
B. Automatic Logging
C. Dynamic Allocation
D. High Concurrency
Answer: D
Question 5
What is the primary purpose of a workspace default environment?
A. Configure Power BI semantic models
B. Define Spark runtime and related settings for workloads
C. Configure capacity metrics
D. Manage OneLake shortcuts
Answer: B
Question 6
Which setting controls how long an inactive Spark session remains active before termination?
A. Dynamic Allocation
B. High Concurrency
C. Session Timeout
D. Autoscale
Answer: C
Question 7
An administrator wants to maximize Spark job reliability by reserving sufficient cores for jobs that may scale up.
Which setting should be enabled?
A. Conservative Job Admission
B. Dynamic Allocation
C. Automatic Logging
D. Session Timeout
Answer: A
Question 8
Which Spark workspace feature automatically records Spark execution information for monitoring and troubleshooting?
A. High Concurrency
B. Autoscale
C. Dynamic Allocation
D. Automatic Logging
Answer: D
Question 9
What is a key advantage of a Custom Spark Pool compared to a Starter Pool?
A. Faster startup times
B. Greater control over compute configuration
C. No capacity consumption
D. Automatic logging support
Answer: B
Question 10
A Fabric administrator wants notebook authors to use standardized compute configurations across the workspace.
Which approach should be used?
A. Disable Autoscale
B. Reduce Session Timeout
C. Configure a default environment
D. Disable Dynamic Allocation
Answer: C
Spark settings directly influence performance, scalability, governance, and cost management across Microsoft Fabric Data Engineering workloads, so expect them to appear on the exam. Understanding how pools, environments, concurrency, and job management settings interact is essential for success on DP-700.
