UK SQL Server User Group – https://sqlserverfaq.com – Community of Microsoft Data Platform Professionals

Compare The Market – Data Science vs Analytical Skills
http://sqlserver-qa.net/2016/10/19/compare-the-market-data-science-vs-analytical-skills/ – Wed, 19 Oct 2016



“You can have data without information, but you cannot have information without data.” – Daniel Keys Moran

From my experience, it is essential to build your argument or position with supporting data at hand; you can't win an argument with '…I think…', you win it with '…here is the proof…'. And if you cannot explain it simply, you don't understand it well enough!

Data management is a key differentiator in any business; it is what sets today's thriving organisations apart.

Data in all forms & sizes is being generated faster than ever before

At many organisations there is a constant struggle between IT and the business: IT has to run a huge portfolio of applications, the business always wants more apps, and IT is struggling just to keep running what it already has. Understanding these challenges is part of establishing yourself as an expert. So the ideal approach is not to treat data management as yet another project, but rather to design the solution as an evolving process.

  • What makes data science so special, and why is it becoming so popular?
  • How can you elevate data mining and machine learning skills into data science?
  • Where can statistics and operational research help you build a stepping-stone career in data science?

New job titles in the industry are currently buzzwords, even though the core roles and responsibilities are familiar. Broadly, anyone who deals with data, whether collecting it or analysing it, tends to be called a data analyst, although you still need to draw a line (or a virtual wall) between a data analyst and a business intelligence developer. Based on the Big Data University reference, a data analyst needs a baseline understanding of the following skills and tools:

  • Skills: statistics, data munging, data visualization, exploratory data analysis
  • Tools: Excel, SPSS, SPSS Modeler, SAS, SAS Miner, SQL, Access, Tableau, SSAS.

What differentiates data engineers is that they prepare the data infrastructure that the data scientists will analyse. Software engineers are essential for designing, building and integrating data from different sources; writing the complex queries that make it possible to access, analyse and process the data to optimise business performance is what the Big Data ecosystem is about.

As the RDBMS platform matured, data warehousing and business intelligence evolved into a key route to organisational success and business growth. Using this as the baseline, IT skills must extend into several analytical disciplines that can help the organisation grow within the data platform.

Mathematical skills are essential in this discipline; by design, they create the differentiators. There are multiple categories describing how the data science and data scientist domain is growing – see here.

 

A key skill to develop in analytics is knowledge across the whole spectrum of business acumen and domain expertise. There is no doubt that mathematical skills built during your academic years will help you step into the data science world in a better position. Data science sprawls across multiple disciplines and domains. Based on my research, the following are highlights of where one can begin a data science journey:

  • Computer Science – has branched into multiple sectors across software, hardware, application and business arenas. The new concepts are data plumbing (in-memory analytics), machine learning programming, modelling (Python, R, etc.) and RFID/streaming data analytics.
  • Statistician – a baseline for performing a series of experiments using testing, cross-validation, sampling and programming methods.
  • Data Mining – data mining and machine learning clearly overlap; either will land you in the core of data science.
  • Research – operational research and optimisation techniques will lead you into data analytics and data science.
  • Business Intelligence/Data Warehousing – from the mature RDBMS world, both disciplines have strong benchmarks in designing and creating KPIs, database schemas, dashboards and visualisations, based on data-driven strategies that build and optimise better decisions and ROI.
  • Machine Learning – closely related to data mining, this discipline keeps pace with the newest changes in the IT field. The trade is very specific: building algorithms and designing automated prototypes based on data sets. Diving further into core algorithms, including clustering and supervised classification, rule systems and scoring techniques, is a hot trade right now, and a flavour of AI (artificial intelligence) is a bonus. This is where Python and R balance each other.

A few more references from the world wide web (mainly from the Analytical Bridge website):

  •  Data mining: This discipline is about designing algorithms to extract insights from rather large and potentially unstructured data (text mining), sometimes called nugget discovery. Techniques include pattern recognition, feature selection, clustering and supervised classification, and it encompasses a few statistical techniques (though without the p-values or confidence intervals attached to most statistical methods). Data mining is applied computer engineering rather than a mathematical science. Data miners use open source and commercial software such as RapidMiner.
  •  Predictive modeling: Not a discipline per se; predictive modeling projects occur in all industries and across all disciplines. Predictive modeling applications aim to predict the future based on past data, usually but not always using statistical modeling. Predictions often come with confidence intervals. The roots of predictive modeling are in statistical science.
  •  Statistics. Currently, statistics is mostly about surveys (typically performed with SPSS software), theoretical academic research, bank and insurance analytics (marketing mix optimization, cross-selling, fraud detection, usually with SAS and R), statistical programming, social sciences, global warming research (and space weather modeling), economic research, clinical trials (pharmaceutical industry), medical statistics, epidemiology, bio-statistics and government statistics.

Jobs requiring a security clearance are well paid and relatively secure, but the well-paid jobs in the pharmaceutical industry (the golden goose for statisticians) are threatened by a number of factors – outsourcing, company mergers, and pressure to make healthcare affordable.

  • Mathematical optimization. Solves business optimization problems with techniques such as the simplex algorithm, Fourier transforms (signal processing), differential equations, and software such as Matlab. These applied mathematicians are found in big companies such as IBM, in research labs, at the NSA (cryptography) and in the finance industry (which sometimes recruits physics or engineering graduates). Mathematical optimization is, however, closer to operations research than to statistics; the choice of hiring a mathematician rather than another practitioner (data scientist) is often dictated by historical reasons, especially in organizations such as the NSA or IBM.
  •  Actuarial sciences. Essentially a subset of statistics focusing on insurance (car, health, etc.) using survival models: predicting when you will die and what your health expenditures will be based on your health status (smoker, gender, previous diseases), in order to determine your insurance premiums. Actuaries have seen their average salary increase nicely over time: access to the profession is restricted and regulated, just as for lawyers, for no other reason than protectionism to boost salaries and reduce the number of qualified applicants for job openings. Actuarial science is indeed data science (a sub-domain).
  •  HPC. High performance computing is not a discipline per se, but it should be of concern to data scientists, big data practitioners, computer scientists and mathematicians, as it can redefine the computing paradigms in these fields. HPC should not be confused with Hadoop and MapReduce: HPC is hardware-related, while Hadoop is software-related (though heavily reliant on Internet bandwidth and on server configuration and proximity).
  •  Six sigma. It's more a way of thinking (a business philosophy, if not a cult) than a discipline, used for quality control and for optimizing engineering processes. Applied, simple statistics are used (simple stuff works most of the time, I agree), and the idea is to eliminate sources of variance in business processes, to make them more predictable and improve quality.
  • Artificial intelligence. It’s coming back. The intersection with data science is pattern recognition (image analysis) and the design of automated (some would say intelligent) systems to perform various tasks, in machine-to-machine communication mode, such as identifying the right keywords (and right bid) on Google AdWords (pay-per-click campaigns involving millions of keywords per day).
  • Data engineering. The new kid on the block, performed by software engineers (developers) or architects (designers) in large organizations (and sometimes by data scientists in tiny companies), this is the applied part of computer science that allows all sorts of data to be easily processed in-memory or near-memory, and to flow nicely to (and between) end-users, including heavy data consumers such as data scientists.
  • Business intelligence. Abbreviated as BI. Focuses on dashboard creation, metric selection, producing and scheduling data reports (statistical summaries) sent by email or delivered/presented to executives, competitive intelligence (analyzing third party data), as well as involvement in database schema design (working with data architects) to collect useful, actionable business data efficiently.
  • Data analysis. This is the new term for business statistics since at least 1995, and it covers a large spectrum of applications including fraud detection, advertising mix modeling, attribution modeling, sales forecasts, cross-selling optimization (retail), user segmentation, churn analysis, computing the long-term value of a customer and the cost of acquisition, and so on.
  • Business analytics. Same as data analysis, but restricted to business problems only. Tends to have a bit more of a financial, marketing or ROI flavor.

The first step is to discover whether you are an analyst 'by nature' or a developer by inclination within the IT world. Sometimes the job title will mislead, so it is better to read the definition of the role and list where you will excel. The four pillars of excellence are: a university degree, technical skills, business skills (a newer requirement) and professional certification.

Finally, networking is essential for keeping up with the latest happenings and for seeing how even a simple business is attempting to make a big change in day-to-day life. If you are a 'geek', participate in hackathon-type events; as a developer, you could also contribute to the technical community through open source projects (search GitHub).

Microsoft has started a new professional program called the Data Science Degree program, and the course schedule is now available.

Leeds SQL Server User Group
https://sqlserverfaq.com/blog/2016/10/09/leeds-sql-server-user-group/ – Sat, 08 Oct 2016

SQL Server User Group based in Leeds with presentations on SQL Server related topics covering all areas such as Azure, Big Data, BI, Programming, Performance Tuning, Management, Architecture and Design.

Coming Events

Leeds SQL Server User Group – 31st May 2017, 17:30 (Europe/London)

Who did what to my database and when…
http://dataidol.com/davebally/2016/08/22/who-did-what-to-my-database-and-when/ – Mon, 22 Aug 2016

One of the most popular questions on forums, Stack Overflow and the like is, "How can I find out who dropped a table, truncated a table, dropped a procedure…?" I'm sure we have all been there: something changes (maybe schema or data) and we have no way of telling who did it or when.

SQL Server Audit can get you there, and I'm not suggesting you shouldn't use it, but what if it is not set up to monitor the objects you are interested in?
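
For completeness, here is a minimal sketch of what that audit setup might look like; the audit name, file path and database name are hypothetical, and in practice you would scope the specification to the action groups you actually care about.

-- Minimal sketch (hypothetical names and paths): capture schema changes such as DROP TABLE
-- so the "who and when" question can be answered from the audit log.
USE master;
GO
CREATE SERVER AUDIT WhoDroppedIt
    TO FILE (FILEPATH = N'D:\Audit\');           -- assumed folder, must already exist
GO
ALTER SERVER AUDIT WhoDroppedIt WITH (STATE = ON);
GO
USE MyDatabase;                                   -- hypothetical target database
GO
CREATE DATABASE AUDIT SPECIFICATION SchemaChanges
    FOR SERVER AUDIT WhoDroppedIt
    ADD (SCHEMA_OBJECT_CHANGE_GROUP)              -- CREATE/ALTER/DROP of schema objects
    WITH (STATE = ON);
GO
-- Read the captured events later:
-- SELECT event_time, server_principal_name, statement
-- FROM sys.fn_get_audit_file(N'D:\Audit\*.sqlaudit', DEFAULT, DEFAULT);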

If you look back through my posts, I have been quite interested in ScriptDom and TSQL parsing,  so if you have a trace of all the actions that have taken place over a time period, you can parse the statements to find the activity.

Drawing inspiration ( cough, stealing ) from Ed Elliot’s dacpac explorer,  I have created a git repo for a simple parser that is created using T4 templates. T4 templates are an ideal use here as the full language domain can be exploded out automatically and then you as a developer can add in your code to cherry pick the parts you are interested in.

At this time the project has hooks to report on SchemaObjectName objects, so any time an object is referenced – be that in a CREATE, SELECT, DROP, MERGE… – it will fall into the OnSchemaObjectName function, and there is an OnDropTableStatement hook that will be hit when DROP TABLE is used.

This is not intended to be a full end-to-end solution, not least because the SQL to be parsed could be coming from any number of sources, but if you have the skills to run it as-is, you probably have enough to tailor it to your requirements.

As ever,  let me know what you think and any comments gratefully received.

The github repo is   :   https://github.com/davebally/TSQLParse

 

Solving The Issue Of SQL Server Physical File Fragmentation
http://sqlserver-qa.net/2016/08/01/solving-physical-file-fragmentation-issue/ – Mon, 01 Aug 2016



SQL Server physical file fragmentation causes major performance issues in a database. It happens when data is deleted from a drive, leaving small gaps to be filled by new data files; with file fragmentation, logically sequential pages do not exist in physical sequence on disk. When there is physical file fragmentation, auto-growing files cannot obtain sufficient contiguous space, so the files become scattered across the hard drive.

Physical file fragmentation causes slow access and seek times: the time taken to access the data increases, and the system needs to find all fragments of a file before it can open the file.

In addition, out-of-order data file pages also increase seek time. To reduce it, you can defragment the fragmented files. This article discusses the problem of SQL Server physical file fragmentation and how to defragment the files.

Problem

DBAs do not usually consider SQL Server physical file fragmentation a big issue. However, it takes far longer to access a fragmented file than one stored in contiguous storage space.

If the auto-grow option is enabled and the file is heavily fragmented, the file cannot grow beyond a certain limit, which may cause error 665 on the system.

Causes of File Fragmentation

  • Performing backup operations repeatedly can lead to physical file fragmentation on the SQL Server.
  • Sharing the database server's storage with other applications such as a web server, SharePoint, etc. causes disk file fragmentation, as the space allocated to these applications is not contiguous.

Solution

SQL Server physical file fragmentation can be fixed with the help of Windows utilities. Sysinternals Contig (contig.exe) is a free utility from Microsoft that creates new files that are contiguous on disk and can defragment existing ones.

It is a great tool that shows the fragmentation of files and allows them to be defragmented.

The tool is easy to deploy; to analyse the fragmentation of a specific file, use the contig -a option.

[Image: SQL-Server-Physical-File-Fragmentation]

To defragment a file, run the Contig command against it.

Note: to defragment any database file, the database must be offline.

Follow these steps to defragment the database files (a consolidated sketch follows the list):

  • Bring the database offline:
  • ALTER DATABASE [Database name] SET OFFLINE

  • Use the Contig [Filename] command to defragment each file
  • Bring the database back online:
  • ALTER DATABASE [Database name] SET ONLINE
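
Putting those steps together, here is a minimal sketch; the database name and file path are hypothetical, and the contig commands are run from an elevated command prompt on the server rather than from T-SQL.

-- Minimal sketch (hypothetical database name and path).
USE master;
GO
ALTER DATABASE SalesDB SET OFFLINE WITH ROLLBACK IMMEDIATE;  -- disconnect users, take the database offline
GO
-- From an elevated command prompt on the server (not T-SQL):
--   contig -a "D:\Data\SalesDB.mdf"     analyse fragmentation of the data file
--   contig "D:\Data\SalesDB.mdf"        defragment the file in place
ALTER DATABASE SalesDB SET ONLINE;
GO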

Other Practices That Resist Fragmentation

  • Keep data files and log files on different physical disk arrays.
  • Fix out-of-order pages (index fragmentation) by reorganising or rebuilding the index with ALTER INDEX statements or with the help of a SQL Server maintenance plan; this problem arises when data file pages are out of order (a sketch follows this list).
  • Size the database files well and set autogrowth to a suitable value.
  • Monitor fragmentation with the help of Microsoft tools.
  • Set up SQL Server maintenance plans.
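
As a minimal sketch of the index maintenance mentioned above (the table and index names are hypothetical), check fragmentation first and then reorganise or rebuild accordingly:

-- Minimal sketch (hypothetical table dbo.Orders and index IX_Orders_OrderDate).
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED');

-- Lightweight option for moderate fragmentation:
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REORGANIZE;

-- Heavier option for severe fragmentation:
-- ALTER INDEX IX_Orders_OrderDate ON dbo.Orders REBUILD;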

Conclusion

SQL Server physical file fragmentation is a curable problem; a DBA can easily fix it with the help of Microsoft tools.

Guidelines For SQL Database Performance Monitoring
http://sqlserver-qa.net/2016/07/19/database-performance-monitoring/ – Tue, 19 Jul 2016



Overview

Monitoring SQL Server databases and instances provides the information necessary to diagnose and troubleshoot performance issues. Once performance has been tuned, it still has to be monitored constantly, because everyday data, schema and configuration changes may lead to situations where additional manual tuning is required. The following sections discuss SQL database performance monitoring.

What are the Metrics to Monitor on SQL Server?

Which metrics to monitor depends on your performance goals. However, there is a set of commonly monitored metrics that provides enough information for basic troubleshooting; based on their values, additional and more specific metrics (memory and processor usage, disk activity, network traffic) can be monitored to find the root cause. SQL Server offers two built-in monitoring features: Activity Monitor and Data Collectors.

Activity Monitor

Activity Monitor tracks the most useful SQL Server performance metrics. To get them, it executes queries against its host SQL Server instance every 10 seconds. Performance is monitored only while Activity Monitor is open, which makes it a lightweight solution with little overhead. The metrics are shown in five collapsible panes:

  • Overview: shows processor time percentage, the number of waiting tasks, the number of batch requests, and database I/O operations.
  • Processes: shows the currently running SQL Server processes for each database on the instance, with information about the application, login, task state, wait time, command, host, and so on. The list can be filtered on specific column values, and for deeper troubleshooting a selected process can be traced in SQL Server Profiler.
  • Resource Waits: shows waits for various resources (memory, network, compilation, etc.), displaying the wait time, cumulative wait time, recent wait time, and average waiter count.
  • Data File I/O: shows a list of all database files (MDF, NDF and LDF) with their paths and names, recent read and write activity, and response time.
  • Recent Expensive Queries: shows the queries executed in the last 30 seconds that used the most hardware resources (memory, disk, network and processor). From here you can open a query in a Management Studio query tab or open its execution plan.

How to Use Activity Monitor

Activity Monitor can be opened from the SQL Server Management Studio toolbar icon, from the context menu of a SQL Server instance in Object Explorer, or with Ctrl+Alt+A. It tracks only a pre-defined set of important metrics: additional metrics cannot be added and the monitored ones cannot be removed, so only real-time monitoring is possible, and there is no option to store history for later use. Activity Monitor is therefore useful for in-the-moment monitoring and basic troubleshooting; when thresholds, a selectable set of metrics, or historical data storage are needed, a different tool is required.
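
If you do need history, a minimal do-it-yourself sketch is shown below; it assumes a utility database named DBAdmin exists and snapshots the same kind of wait statistics that Activity Monitor samples, so they can be analysed later.

-- Minimal sketch (assumes a utility database called DBAdmin): keep a history of wait statistics.
USE DBAdmin;
GO
IF OBJECT_ID('dbo.WaitStatsHistory') IS NULL
    CREATE TABLE dbo.WaitStatsHistory (
        capture_time        datetime2    NOT NULL DEFAULT SYSDATETIME(),
        wait_type           nvarchar(60) NOT NULL,
        waiting_tasks_count bigint       NOT NULL,
        wait_time_ms        bigint       NOT NULL,
        signal_wait_time_ms bigint       NOT NULL
    );
GO
-- Schedule this insert (e.g. via a SQL Agent job) to build up history over time.
INSERT dbo.WaitStatsHistory (wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms)
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE waiting_tasks_count > 0;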

Data Collectors

Data collection is another built-in performance monitoring and tuning feature, configured through Management Studio. It collects performance metrics from SQL Server instances and stores them in a local repository (the Management Data Warehouse) so that they can be used for later analysis. It relies on SQL Server Agent, Integration Services and the Management Data Warehouse, and it allows the user to specify which metrics to monitor. Three built-in system data collection sets cover the important and commonly monitored performance metrics; if additional metrics are needed, custom data collectors can be created using T-SQL code or the API.

Method to Use Data Collectors

Follow the steps below, but first check that data collection, the Management Data Warehouse and SQL Server Agent are enabled, and that SQL Server Integration Services is installed (a T-SQL alternative is sketched after the step list).

  • In Management Studio Object Explorer, expand the Management node.
  • Under Data Collection, choose the Configure Management Data Warehouse option.
  • Choose the option to set up data collection.
  • Choose the server instance name and the database that will host the management data warehouse, and the local folder in which the collected data can be cached.
  • Select Next » Review Settings » Finish.
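
The same system collection sets can also be inspected and started from T-SQL; a minimal sketch follows, assuming the Management Data Warehouse has already been configured through the wizard.

-- Minimal sketch, assuming the Management Data Warehouse is already configured.
USE msdb;
GO
-- List the system collection sets and whether they are running
SELECT collection_set_id, name, is_running
FROM dbo.syscollector_collection_sets;
GO
-- Start one of the built-in sets by name, e.g. Disk Usage
DECLARE @set_id int;
SELECT @set_id = collection_set_id
FROM dbo.syscollector_collection_sets
WHERE name = N'Disk Usage';

EXEC dbo.sp_syscollector_start_collection_set @collection_set_id = @set_id;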

Data collection offers three pre-defined sets, available in Object Explorer under Management » Data Collection » System Data Collection Sets: Query Statistics, Disk Usage and Server Activity. Each has its own built-in report, as discussed below.

  • The Disk Usage collection set gathers data about database data files, transaction log files, and I/O statistics.
  • The Disk Usage reports are available from the data collection context menu; they show space usage by database file, growth trends, and average daily growth.
  • The Query Statistics collection set gathers query code, query activity, and execution plans for the ten most expensive queries.
  • The Server Activity collection set gathers data about disk I/O, processor, memory, and network usage. Its report displays CPU, disk I/O, network usage, memory, SQL Server instance activity, server waits, and operating system activity.

Note: data collection has to be configured and started before it captures anything and, unlike Activity Monitor, it offers no real-time graphs. However, the captured data can be stored for a specified number of days.

Conclusion

The discussion above describes SQL Server performance monitoring and its two main built-in features, Activity Monitor and Data Collectors. Each offers different functionality for monitoring a SQL Server database.

SQL Server – Making Backup Compression work with Transparent Data Encryption by using Page Compression
https://sqlserverfaq.com/tonyrogerson/2016/07/15/sql-server-making-backup-compression-work-with-transparent-data-encryption-by-using-page-compression/ – Fri, 15 Jul 2016

Encrypted data does not compress well, if at all, so using the BACKUP WITH COMPRESSION feature will be ineffective on a database encrypted using Transparent Data Encryption (TDE). This post deals with a method of combining Page Compression with TDE and getting the best of both worlds.

The Transparent Data Encryption (TDE) feature encrypts data at rest, i.e. the SQL Server Storage Engine encrypts on write to storage and decrypts on read – data resides in the buffer pool decrypted.

Page compression is a table-level feature that provides page-level dictionary compression and row/column-level data type compression. Pages read from storage reside in the buffer pool in their compressed state until a query reads them, and only at that point are they expanded, which gives better memory utilisation and reduces IO. Be aware that both encryption and Page Compression add to the CPU load on the box; the additional load will depend on your access patterns, so you need to test and base your decision on that evidence. Without turning this into a discussion on storage tuning, you tend to find that using Page Compression moves a query bottleneck away from storage and into CPU, simply because less data is being transferred to and from storage so latency is dramatically reduced – don't be put off if your box regularly consumes large amounts of CPU; it's better to have a query return in 10 seconds at 100% CPU than in 10 minutes at 10% CPU with storage as the bottleneck!

Why is data not held encrypted in the Buffer Pool? TDE encrypts the entire page of data at rest on storage and decrypts it on load into memory (https://msdn.microsoft.com/en-us/library/bb934049.aspx); if pages were to reside encrypted in the Buffer Pool, they would still require page header information in a decrypted state so that things like Checkpoint and inter-page linkage work, and as such there would be a security risk. Page Compression does not compress page header information, which is why pages can reside in the Buffer Pool in their compressed state (see https://msdn.microsoft.com/en-us/library/cc280464.aspx).

Coupling Page Compression with TDE gives you the benefit of encryption (because that phase is done on write/read to/from storage) and compression (the page is compressed when it is either read from or written to storage via the TDE component in the Storage Engine) – basically your data is always decrypted by the time it comes to compressing/decompressing.

The table below shows a space comparison and timings between normal (no TDE or compression), Page Compression on its own, and TDE coupled with Page Compression. You will see that only data in the database is compressed – log writes are not compressed, so there is only a marginal improvement from Page Compression on the transaction log, but you can also see that TDE has no effect on Page Compression for the transaction log either.

[Table image: PageCompressionWithTDE – space and timing comparison]

If you use this approach then compress all the database tables with Page Compression, and stop using the COMPRESSION option of BACKUP – that will save resources, because SQL Server won't be trying to compress something that is already compressed!
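
As a minimal sketch of turning this on for an existing table (the table name is hypothetical, and the estimate step is optional), you might do something like:

-- Minimal sketch (hypothetical table dbo.Orders): estimate the benefit, then rebuild with page compression.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'Orders',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';

ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);
-- Repeat for nonclustered indexes if required:
-- ALTER INDEX ALL ON dbo.Orders REBUILD WITH (DATA_COMPRESSION = PAGE);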

Example

Prepare the test database; we use FULL recovery to show the performance and space impact on the transaction log:

 

CREATE DATABASE [TEST_PAGECOMP]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'TEST_PAGECOMP', 
  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.S2016\MSSQL\DATA\TEST_PAGECOMP.mdf' , 
  SIZE = 11534336KB , 
  MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
 LOG ON 
( NAME = N'TEST_PAGECOMP_log', 
  FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL13.S2016\MSSQL\DATA\TEST_PAGECOMP_log.ldf' , 
  SIZE = 10GB , FILEGROWTH = 65536KB )
GO

alter database TEST_PAGECOMP set recovery FULL
go

backup database TEST_PAGECOMP to disk = 'd:\temp\INITIAL.bak' with init;
go

 

Test with SQL Server default of no-compression and no-TDE.
 

use TEST_PAGECOMP
go


--
--	No compression
--
create table test (
	id int not null identity primary key clustered,

	spacer varchar(1024) not null
) ;

set nocount on;

declare @i int = 0;

begin tran;

while @i <= 5 * 1000000
begin
	insert test ( spacer ) values( replicate( ' a', 512 ) )

	set @i = @i + 1;

	if @i % 1000 = 0
	begin
		commit tran;
		begin tran;

	end
	
end

if @@TRANCOUNT > 0
	commit tran;
go


dbcc sqlperf(logspace) 
--	Size		Percent Full
-- 6407.992		99.49185

backup log TEST_PAGECOMP to disk = 'd:\temp\TEST_PAGECOMP_NOComp.trn'
--Processed 811215 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP_log' on file 3.
--BACKUP LOG successfully processed 811215 pages in 15.170 seconds (417.772 MB/sec).

checkpoint
go

backup database TEST_PAGECOMP to disk = 'd:\temp\TEST_PAGECOMP_NOComp.bak' with init;
go

--Processed 718128 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP' on file 1.
--Processed 4 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP_log' on file 1.
--BACKUP DATABASE successfully processed 718132 pages in 12.157 seconds (461.495 MB/sec).

Test with Page Compression only.

--
--	Page will use dictionary so really strong compression
--
drop table test
go

create table test (
	id int not null identity primary key clustered,

	spacer varchar(1024) not null
) with ( data_compression = page );

set nocount on;

declare @i int = 0;

begin tran;

while @i <= 5 * 1000000
begin
	insert test ( spacer ) values( replicate( ' a', 512 ) )

	set @i = @i + 1;

	if @i % 1000 = 0
	begin
		commit tran;
		begin tran;

	end
	
end

if @@TRANCOUNT > 0
	commit tran;
go

backup log TEST_PAGECOMP to disk = 'd:\temp\TEST_PAGECOMP_Comp.trn' with init;
--Processed 706477 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP_log' on file 1.
--BACKUP LOG successfully processed 706477 pages in 12.035 seconds (458.608 MB/sec).

checkpoint
go

backup database TEST_PAGECOMP to disk = 'd:\temp\TEST_PAGECOMP_Comp.bak' with init;
--Processed 9624 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP' on file 1.
--Processed 2 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP_log' on file 1.
--BACKUP DATABASE successfully processed 9626 pages in 0.207 seconds (363.274 MB/sec).

go

Test with both Page Compression and TDE:

drop table test
go

checkpoint
go


USE master;  
GO  
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '**************';  
go  
CREATE CERTIFICATE MyServerCert WITH SUBJECT = 'My DEK Certificate';  
go  

USE TEST_PAGECOMP;  
GO  
CREATE DATABASE ENCRYPTION KEY  
WITH ALGORITHM = AES_128  
ENCRYPTION BY SERVER CERTIFICATE MyServerCert;  
GO  

ALTER DATABASE TEST_PAGECOMP  
SET ENCRYPTION ON;  
GO  

/* The value 3 represents an encrypted state   
   on the database and transaction logs. */  
SELECT *  
FROM sys.dm_database_encryption_keys  
WHERE encryption_state = 3;  
GO  



--
--	Now repeat the page compression version
--

create table test (
	id int not null identity primary key clustered,

	spacer varchar(1024) not null
) with ( data_compression = page );

set nocount on;

declare @i int = 0;

begin tran;

while @i <= 5 * 1000000
begin
	insert test ( spacer ) values( replicate( ' a', 512 ) )

	set @i = @i + 1;

	if @i % 1000 = 0
	begin
		commit tran;
		begin tran;

	end
	
end

if @@TRANCOUNT > 0
	commit tran;
go

backup log TEST_PAGECOMP to disk = 'd:\temp\TEST_PAGECOMP_CompTDE.trn' with init;
--Processed 722500 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP_log' on file 1.
--BACKUP LOG successfully processed 722500 pages in 12.284 seconds (459.502 MB/sec).

checkpoint
go

backup database TEST_PAGECOMP to disk = 'd:\temp\TEST_PAGECOMP_CompTDE.bak' with init;
--Processed 55296 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP' on file 1.
--Processed 7 pages for database 'TEST_PAGECOMP', file 'TEST_PAGECOMP_log' on file 1.
--BACKUP DATABASE successfully processed 55303 pages in 1.118 seconds (386.450 MB/sec).

go

 

SSMS July 2016 – get on with it and SQL Server 2014 SP2 – available now
http://sqlserver-qa.net/2016/07/13/ssms-july-2016-get-on-with-it-and-sql-server-2014-sp2-on-the-way/ – Tue, 12 Jul 2016



Even though it is the summer holiday season in most of the western world, it is still worth keeping an eye on what's new in the data platform world. When it comes to SQL Server, you may have seen a flurry of announcements since SQL Server 2016 was RTM'd.

This month alone there are a few announcements worth keeping an eye on, such as the SQL Server Management Studio July 2016 release, with enhancements and additions including:

  1.  Improved support for SQL Server 2016 (1200 compatibility level) tabular databases in the Analysis Services Process dialog and the Analysis Services Deployment wizard.
  2. Support for Azure SQL Data Warehouse in SSMS.
  3. Significant updates to the SQL Server PowerShell module. This includes a new SQL PowerShell module and new CMDLETs for Always Encrypted, SQL Agent, and SQL Error Logs. You can find out more in the SQL PowerShell update blogpost.
  4. Support for PowerShell script generation in the Always Encrypted wizard.
  5. Significantly improved connection times to Azure SQL databases.
  6. New ‘Backup to URL’ dialog to support the creation of Azure storage credentials for SQL Server 2016 database backups. This provides a more streamlined experience for storing database backups in an Azure storage account.
  7. New Restore dialog to streamline restoring a SQL Server 2016 database backup from the Microsoft Azure storage service. The dialog eliminates the need to memorize or save the Shared Access signature for an Azure storage account in order to restore a backup.
  8. …and a few more bug fixes; see the SSMS download page for additional details and the full changelog.

Another piece of news to note (be prepared when you are back from holidays) is the announcement of Service Pack 2 for SQL Server 2014. The engineering team at Microsoft has been working to bring you SQL Server 2014 Service Pack 2 (SP2); it is equipped with a rollup of released hotfixes and contains 20+ improvements centred around performance, scalability and diagnostics, based on feedback from customers and the SQL community. SQL Server 2014 SP2 will include:

  • All fixes and CUs for SQL 2014 released to date.
  • Key performance, scale and supportability improvements.
  • New improvements based on connect feedback items filed by the SQL Community.
  • Improvements originally introduced in SQL 2012 SP3, after SQL 2014 SP1 was released

Here is the SQL Server 2014 Service Pack 2 download page, and don't forget to read the Microsoft SQL Server 2014 SP2 Release Notes page.

To give feedback or raise a question, you can contribute to or search the Microsoft Connect page; if you are a little more enthusiastic, you can also tweet the Engineering Manager at @sqltoolsguy on Twitter.

 

Understanding SQL Server Transaction Log Architecture
http://sqlserver-qa.net/2016/06/22/transaction-log-architecture/ – Wed, 22 Jun 2016



The transaction log maintains a record of every transaction that has occurred: data modifications, rollbacks and database modifications. It is a major part of the database architecture. Every record is written in order and identified by an LSN (Log Sequence Number); each new record is appended to the physical log file with an LSN higher than the previous one. The SQL engine uses the log mainly to guarantee data integrity. The following sections discuss the architecture of the transaction log.

Logical Architecture Of Transaction log

Logically, the transaction log operates as if it were a string of log records, each identified by its Log Sequence Number. Each log record reserves enough space to guarantee a successful rollback, whether triggered by an error within the database or by an explicit rollback request. The amount of space varies, but it generally mirrors the amount of space used to store the logged operation itself. The records are stored in virtual log files, whose number and size are not directly controllable (see the physical architecture below).

The steps used to recover an operation depend on the type of log record, i.e. whether the data modification was logged as a logical operation or as before-and-after images of the data:

  • Logical operation logged
    1. To roll the logical operation forward, the operation is performed again.
    2. To roll the logical operation back, the reverse logical operation is performed.
  • Before and after images logged
    1. To roll forward, the after image is applied.
    2. To roll back, the before image is applied.

Various types of operations are recorded in the transaction log, including:

  • The beginning and end of every transaction
  • Every data modification (insert, update or delete), including DDL (data definition language) changes to any table, system tables included
  • Every extent and page allocation or de-allocation
  • The creation or dropping of a table or index.

The section of the log file from the first log record that must exist for a successful database-wide rollback to the last-written log record is known as the active part of the log (the active log). This is the section required for a full recovery of the database.
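
A quick way to see what is currently keeping the active portion of each log from being truncated is sketched below (a minimal example using the sys.databases catalog view):

-- Minimal sketch: show what is preventing each database's log from being truncated.
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
ORDER BY name;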

Physical Architecture Of Transaction Log

The transaction log in a database can map to one or more physical files. Physically, the sequence of log records is stored efficiently in the physical files that implement the transaction log. The SQL Server Database Engine internally divides each physical log file into a number of virtual log files (VLFs). Virtual log files have no fixed size, and there is no fixed number of them per physical log file; the Database Engine chooses the size of the virtual log files dynamically when creating or extending log files, and their number and size cannot be configured by administrators. Virtual log files only affect the system when the log's size and growth increment are poorly chosen: if the log files grow to a large size through many small increments, they will contain many virtual log files, which slows down database startup as well as backup and restore operations.
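
As a minimal sketch, you can check how many virtual log files the current database has; note that sys.dm_db_log_info is only available on newer builds (SQL Server 2016 SP2 / 2017 onwards), while the undocumented DBCC LOGINFO works on older versions.

-- Minimal sketch: count the virtual log files in the current database.
SELECT COUNT(*) AS vlf_count
FROM sys.dm_db_log_info(DB_ID());

-- Older alternative (undocumented, one row per VLF):
-- DBCC LOGINFO;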

The Effect of Checkpoints on Transaction Log Architecture

A checkpoint flushes dirty data pages for the current database from the buffer cache to disk. This reduces the active portion of the log that has to be processed during a full recovery of the database. During a full recovery, the following actions are performed:

  • Log records of modifications that were not flushed to disk before the system stopped are rolled forward.
  • Modifications associated with incomplete transactions are rolled back.

Checkpoint Operations

A checkpoint performs the following actions in a database:

  • Writes a log record marking the start of the checkpoint.
  • Stores the information recorded for the checkpoint in a chain of checkpoint log records.
  • Under the simple recovery model, marks the space before the MinLSN for reuse.
  • Writes a log record marking the end of the checkpoint.
  • Records the LSN of the start of this chain.

NOTE: the checkpoint records include a list of the active transactions that have modified the database.
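
As a small illustration (a minimal sketch; run it in the database you want to inspect), you can observe the effect of a manual checkpoint on log space usage:

-- Minimal sketch: observe log space before and after a manual checkpoint.
-- Most useful under the SIMPLE recovery model, where a checkpoint allows
-- inactive portions of the log to be reused.
DBCC SQLPERF(LOGSPACE);   -- log size and percent used, per database

CHECKPOINT;               -- flush dirty pages for the current database

DBCC SQLPERF(LOGSPACE);   -- compare the percent used for this database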

Reasons for Occurrence of Checkpoints

Checkpoints occur in the following situations:

  • When a CHECKPOINT statement is executed; the checkpoint occurs in the current database for the connection.
  • When a minimally logged operation is performed in the database, for example a bulk operation on a database that is using the bulk-logged recovery model.
  • When SQL Server is stopped with a SHUTDOWN statement or by stopping the MSSQLSERVER service; either action causes a checkpoint in every database on the instance.

Write-Ahead Transaction Log

SQL Server uses a write-ahead log (WAL), which guarantees that no data modification is written to disk before the associated log record has been written to disk. This helps maintain the ACID properties of transactions. SQL Server maintains a buffer cache into which it reads data pages whenever data must be retrieved; data modifications are not made directly to disk but to the copy of the page in the buffer cache. A page that has been modified but not yet written to disk is known as a dirty page.
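
To make the 'dirty page' idea concrete, here is a minimal sketch that counts modified-but-not-yet-written pages currently held in the buffer cache, per database:

-- Minimal sketch: count dirty pages per database in the buffer cache.
SELECT DB_NAME(database_id) AS database_name,
       COUNT(*)             AS dirty_page_count
FROM sys.dm_os_buffer_descriptors
WHERE is_modified = 1
GROUP BY database_id
ORDER BY dirty_page_count DESC;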

Conclusion

As discussed above, the transaction log plays an essential role in recovering and maintaining the database. If the log is configured and maintained properly, it provides the additional backup support the user needs without affecting system performance, and it also plays an integral role in recovering a database to a point in time.

Why is creating Excel sheets from SSIS so hard?
http://dataidol.com/davebally/2016/06/17/why-is-creating-excel-sheets-from-ssis-so-hard/ – Fri, 17 Jun 2016

If there is one process that should be simpler than it is out of the box, it is creating Excel spreadsheets from SSIS. Over the years I've tried doing it a number of ways – the built-in component, Interop, OLE DB – and they all suck to one degree or another: unreliable, too slow, or simply unusable.

A Twitter conversation (a) proved I wasn't alone and (b) pointed me in the direction of EPPlus.

Over on SSC there is already a post on using EPplus with SSIS, some of which, such as putting EPPlus into the GAC, is still relevant for this post.

However, right now I have a big love of BIML; simply put, I think this is what SSIS should have been in the first place, and I personally find all the pointing and clicking a real time sink. Additionally, in BIML, once you have written a package to do something, e.g. a simple dataflow, it's a snip to repeat that over 10, 20, 50, 100 or 1000s of tables. But the real time saver for me is when you need to re-architect, e.g. turn sequential dataflows into a parallel dataflow; it's only really a case of changing where you iterate in your BIML code.

Anyway, I've combined these two pain points to create a BIML routine that uses EPPlus to output multi-sheeted Excel spreadsheets reliably and fast.

At the moment it's very basic – take SQL statements and output the data to an Excel file – but in time I'm hoping to create some metadata to start 'getting the crayons out' and making the output a bit prettier.

The code is on GitHub at https://github.com/davebally/TSQLParse – I mean https://github.com/davebally/BIML-SSIS-Excel-Output – and I hope it is of use to someone.

 

 

SQL Server 2016 Data Discovery Day – sessions and photos
http://sqlserver-qa.net/2016/06/10/sql-server-2016-data-discovery-day-sessions-and-photos/ – Fri, 10 Jun 2016



We must say that the SQL Server 2016 Data Discovery Day was a successful event, with valuable feedback received from the attendees.

Here is the link to download sessions and photos from that day.

Later I will also blog about the data discovery problem dataset and the winning solution development on that day.

More later….

 

 

 

SQL Server 2016 general availability and discovery day
