How to optimize performance with a dedicated SQL pool?
- Optimize Queries
- Transactions
- Execution Plan
- Statistics
- Data Load
- Data Movement
OPTIMIZE QUERIES
Transactions
- Transactions in Dedicated SQL Pool
- Isolation levels
- Default: READ UNCOMMITTED.
- Can be changed to READ COMMITTED SNAPSHOT ISOLATION at the database level, but then READ UNCOMMITTED can't be used at the session level
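A minimal sketch of switching the database-level isolation setting; the database name is hypothetical:

```sql
-- MyDedicatedPool is a hypothetical database name
ALTER DATABASE MyDedicatedPool
SET READ_COMMITTED_SNAPSHOT ON;

-- Revert to the default (READ UNCOMMITTED) behaviour if needed
ALTER DATABASE MyDedicatedPool
SET READ_COMMITTED_SNAPSHOT OFF;
```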
- Transaction size
- varies with DWU capacity
- the documented ceilings assume data is evenly distributed (HASH or ROUND_ROBIN); if data is skewed, the per-distribution limit is reached earlier
- Transaction state: XACT_STATE() reports a failed transaction with the value -2 (the transaction is doomed and can only be rolled back)
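A minimal sketch of checking XACT_STATE() in a CATCH block; the table and column names are hypothetical:

```sql
BEGIN TRAN;
BEGIN TRY
    UPDATE dbo.FactSales SET Quantity = Quantity + 1 WHERE SaleKey = 1;
    COMMIT TRAN;
END TRY
BEGIN CATCH
    -- In dedicated SQL pool, XACT_STATE() = -2 means the transaction failed and must be rolled back
    IF XACT_STATE() = -2
        ROLLBACK TRAN;
    THROW;  -- re-raise the original error to the caller
END CATCH;
```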
- Limitations
- No distributed transactions
- No nested transactions permitted
- No save points allowed
- No named transactions
- No marked transactions
- No support for DDL such as CREATE TABLE inside a user-defined transaction
- THROW, RAISERROR
- THROW is the more modern implementation for raising exceptions in dedicated SQL pool, but RAISERROR is also supported
- User-defined error message numbers can't be in the 100,000-150,000 range for THROW; RAISERROR error message numbers are fixed at 50,000
- Use of sys.messages not supported
- Transaction Optimization Best Practices
- Transactions and logging
- Each distribution has its own transaction log; log writes happen automatically but can be made more efficient with well-written code
- use minimal logging where possible
- process data in scoped batches instead of individual transactions
- adopt a partition-switching pattern for large updates
- Min vs Full logging
- minimal logging involves logging only the information that is required to roll back the transaction after a failure, or for an explicit request (ROLLBACK TRAN). Since minimally logged operations can participate in explicit transactions, they can be rolled back as well.
- CREATE INDEX, ALTER INDEX REBUILD, DROP INDEX, TRUNCATE TABLE, DROP TABLE, ALTER TABLE SWITCH PARTITION
- CREATE TABLE AS SELECT (CTAS), INSERT..SELECT (they both are bulk load options)
- Any writes that update a CI or NCI are always fully logged operations
- When loading into a non-empty CI table, the operation is a mix: rows written to non-empty pages are fully logged, while rows written to empty pages are minimally logged
- Internal data movement operations (such as BROADCAST and SHUFFLE) are not affected by the transaction safety limit
- Optimize DELETE and UPDATE by using minimally logged options or partition switching; another option is to use smaller batches (see the CTAS swap sketch after this list)
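A minimal sketch of the CTAS-and-swap pattern for a large delete; the table, distribution column, and filter are hypothetical. The CTAS is minimally logged, unlike a large fully logged DELETE:

```sql
-- Keep only the rows you want via a minimally logged CTAS instead of a fully logged DELETE
CREATE TABLE dbo.FactSales_New
WITH (DISTRIBUTION = HASH(CustomerKey), CLUSTERED COLUMNSTORE INDEX)
AS
SELECT *
FROM dbo.FactSales
WHERE SaleDate >= '2022-01-01';

-- Swap the new table in and drop the old one
RENAME OBJECT dbo.FactSales TO FactSales_Old;
RENAME OBJECT dbo.FactSales_New TO FactSales;
DROP TABLE dbo.FactSales_Old;
```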
- Pause, Scaling
- When you pause or scale your dedicated SQL pool, any in-flight transactions are terminated immediately, causing any open transactions to be rolled back
- Key Call-outs
- Dynamic SQL
- Synapse dedicated SQL pool doesn't support blob data types, so very large dynamic SQL strings can't be held in memory as a single variable. Break the code into chunks and use the EXEC statement instead (see the sketch below).
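A minimal sketch of the chunk-and-EXEC pattern; the fragment contents are hypothetical:

```sql
-- Build the statement in fixed-size fragments, then concatenate them inside EXEC
DECLARE @sql_fragment1 NVARCHAR(4000) = N'SELECT name ',
        @sql_fragment2 NVARCHAR(4000) = N'FROM sys.tables ',
        @sql_fragment3 NVARCHAR(4000) = N'WHERE name LIKE ''fact%''';

EXEC (@sql_fragment1 + @sql_fragment2 + @sql_fragment3);
```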
- GROUP BY
- Limited support in dedicated SQL pool; not supported in serverless SQL pool
Execution Plan
- MPP Plan (D-SQL)
- SMP Plan
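A minimal sketch of viewing the distributed (D-SQL) plan with EXPLAIN; the tables and join columns are hypothetical:

```sql
-- EXPLAIN returns the D-SQL (MPP) plan as XML without executing the query
EXPLAIN
SELECT c.Region, SUM(s.Amount) AS TotalAmount
FROM dbo.FactSales AS s
JOIN dbo.DimCustomer AS c
    ON s.CustomerKey = c.CustomerKey
GROUP BY c.Region;
```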
Statistics
Stats in dedicated SQL Pool
- REF
- Auto creation of stats
- Update stats
- Last stats update
- Managing Stats
- USP to create stats on all columns in a Database
- Stats Metadata
- Catalog Views
- System Functions
- DBCC SHOW_STATISTICS()
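A minimal sketch of creating, updating, and inspecting statistics in a dedicated SQL pool; the database, table, column, and statistics names are hypothetical:

```sql
-- Turn on automatic creation of single-column statistics for the database
ALTER DATABASE MyDedicatedPool SET AUTO_CREATE_STATISTICS ON;

-- Create and update statistics manually
CREATE STATISTICS stat_FactSales_CustomerKey ON dbo.FactSales (CustomerKey);
UPDATE STATISTICS dbo.FactSales;

-- Find when each statistics object on the table was last updated
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.FactSales');

-- Inspect the histogram and density of one statistics object
DBCC SHOW_STATISTICS ("dbo.FactSales", stat_FactSales_CustomerKey);
```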
Stats in serverless SQL Pool
- Auto creation of stats
- Manual creation of stats
- Update stats
- Managing stats
- Stats Metadata
- Catalog Views
- System Functions
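A minimal sketch, assuming the sys.sp_create_openrowset_statistics procedure available in serverless SQL pool; the storage path and column are hypothetical:

```sql
-- Create single-column statistics for an OPENROWSET query in serverless SQL pool
EXEC sys.sp_create_openrowset_statistics N'
    SELECT payment_type
    FROM OPENROWSET(
        BULK ''https://mystorageaccount.dfs.core.windows.net/taxi/year=2019/*.parquet'',
        FORMAT = ''PARQUET''
    ) AS rows
';
```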
- Stats Metadata
Data Loading
- Best Practices
- Prepare data in Azure Storage
- Run loads with enough compute
- Allow multiple users to load (DENY CONTROL across different schemas)
- Load to a staging table
- Load to a columnstore index
- Increase batch size when using SQLBulkCopy API or BCP
- Manage loading failures
- Insert data into a production table
- Create statistics after the load
- Rotate storage keys
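A minimal sketch combining several of these practices: COPY into a staging table, insert into the production table, then create statistics. All object names and the storage path are hypothetical, and stg.FactSales is assumed to already exist as a heap staging table:

```sql
-- Load files from storage into a round-robin heap staging table
COPY INTO stg.FactSales
FROM 'https://mystorageaccount.blob.core.windows.net/loads/factsales/*.parquet'
WITH (FILE_TYPE = 'PARQUET');

-- Move the staged rows into the production table (CTAS also works here)
INSERT INTO dbo.FactSales
SELECT * FROM stg.FactSales;

-- Create statistics on commonly joined/filtered columns after the load
CREATE STATISTICS stat_FactSales_SaleDate ON dbo.FactSales (SaleDate);
```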
- Delta Lake integration is a key ask which can be referenced here
Data Movement
- Shuffle_Move: redistributes rows across distributions on a new hash column so they can be joined or aggregated
- Broadcast_Move: copies a table's rows to every Compute node
- ReturnOperation: returns the query results from the Compute nodes back to the user
- Distribute_Replicated_Table_Move
- Master_Table_Move
- Partition_Move
- Trim_Move
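A minimal sketch of checking which data-movement steps a request executed, using sys.dm_pdw_request_steps; the request_id value is hypothetical:

```sql
-- List the steps for a request; operation_type shows ShuffleMoveOperation, BroadcastMoveOperation, etc.
SELECT step_index,
       operation_type,
       distribution_type,
       location_type,
       status,
       total_elapsed_time,
       row_count
FROM sys.dm_pdw_request_steps
WHERE request_id = 'QID12345'   -- find the request_id in sys.dm_pdw_exec_requests
ORDER BY step_index;
```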