Azure Synapse - Optimize SQL Pools - Query Optimization

How to optimize performance with dedicated sql pool?

Optimize Queries
- Transactions
- Execution Plan
- Statistics
- Data Load
- Data Movement
- Execution Plan

OPTIMIZE QUERIES

Transactions

Transactions in Dedicated SQL Pool
- Isolation levels
  - Default: READ UNCOMMITTED.
  - Can be changed to READ COMMITTED SNAPSHOT ISOLATION at DB level, but then cant use READ UNCOMMITTED at session level
- Transaction size
  - varies with DWU capacity
  - ceilings are presumed when data is evenly distributed with HASH or ROUND_ROBIN. If data is skewed, limits will be reached earlier
- Trasaction state : XACT_STATE() returned with value -2
- Limitations
  - No distributed transactions
  - No nested transactions permitted
  - No save points allowed
  - No named transactions
  - No marked transactions
  - No support for DDL such as CREATE TABLE inside a user-defined transaction
- THROW, RAISERROR
  - THROW is the more modern implementation for raising exceptions in dedicated SQL pool but RAISERROR is also supported
  - THROW: 100K-150K. RAISERROR: 50K
  - Use of sys.messages not supported
Transaction Optimization Best Practices
- Transactions and logging
  - Each distribution has its own transaction log, log writes are automatic but can be made efficient with good code
    - use minimal logging where possible
    - process data as scoped batch instead of individual transactions
    - adopt partition switching patter for large updates
  - Min vs Full logging
    - minimal logging involves logging only the informati on that is required to roll back the transaction after a failure, or for an explicit request (ROLLBACK TRAN). Since minimally logged operations can participate in explicit transactions, they can be rolled back as well.
      - CREATE INDEX, ALTER INDEX REBUILD, DROP INDEX, TRUNCATE TABLE, DROP TABLE, ALTER TABLE SWITCH PARTITION
      - CREATE TABLE AS SELECT (CTAS), INSERT..SELECT (they both are bulk load options)
        
        Any writes to update CI or NCI will always be fully logged operation
        
        When loading to non-empty, CI table, operation is mix of full logged if writing to non empty page, and min logged if writing to empty page.
      - Internal data movement operations (such as BROADCAST and SHUFFLE) are not affected by the transaction safety limit
    - Optimize for DELETE and UPDATE using mil-logged options, or use Partition Switching. Another option is to use smaller batch.
  - Pause, Scaling
    - When you pause or scale your dedicated SQL pool, it is important to understand that any in-flight transactions are terminated immediately; causing any open transactions to be rolled back.
Key Call out’s
- Dynamic SQL
  - Synapse doesnt support blob data type, meaning very large strings cant stored in memory. You need to break the code into chunks and use the EXEC statement instead.
- GROUP BY
  - Limited support by dedicated SQL Pool, and not supported by serverless sql pool

Execution Plan

MPP Plan (D-SQL)
SMP Plan

Statistics

Stats in dedicated SQL Pool

REF
Auto creation of stats
Update stats
Last stats update
Managing Stats
- USP to create stats on all columns in a Database
- Stats Metadata
  - Catalog Views
  - System Functions
  - DBCC SHOW_STATISTICS()
    Stats in serverless SQL Pool
Auto creation of stats
Manual creation of stats
Update stats
Managing stats
- Stats Metadata
  - Catalog Views
  - System Functions

Data Loading

Best Practices
- Prepare data in Azure Storage
- Run loads with enough compute
- Allow multiple users to load (DENY CONTROL across different schemas)
- Load to a staging table
- Load to a columnstore index
- Increase batch size when using SQLBulkCopy API or BCP
- Manage loading failures
- Insert data into a production table
- Create statistics after the load
- Rotate storage keys
Delta Lake integration is a key ask which can be referenced here

Data Movement

Shuffle_Move, and this, and another example, and [this]
Broadcast_Move, and this
ReturnOperation: ReturnOperation is the process of getting results at each Compute Node based on redistribution data.
Distribute_Replicated_Table_Move
Master_Table_Move
Partition_Move
Trim_Move

Azure Synapse - Optimize SQL Pools - Query Optimization

OPTIMIZE QUERIES

Transactions

Execution Plan

Statistics

Stats in dedicated SQL Pool

Stats in serverless SQL Pool

Data Loading

Data Movement