SQL Archives - SQL Authority with Pinal Dave

Sometimes reproducing one situation results in a new error, and that gives me an idea for a new blog. In this blog, we will talk about the error “Login-based server access validation failed with an infrastructure error. Login lacks connect endpoint permission.”

My earlier blog was about connecting to SQL Server using the Dedicated Admin Connection (DAC). When I tried connecting to the DAC using SQLCMD, I received the error below.

Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : Login failed for user ‘daclogin’..

Here is the screenshot.

The above is one of the most common errors, and it doesn’t tell you the cause of the issue. Whenever you get such an error, you should always look at the SQL Server ERRORLOG. Refer: SQL SERVER – Where is ERRORLOG? Various Ways to Find ERRORLOG Location

In ERRORLOG, I saw the following message:

Error: 18456, Severity: 14, State: 149.
Login failed for user ‘daclogin’. Reason: Login-based server access validation failed with an infrastructure error. Login lacks connect endpoint permission. [CLIENT: 127.0.0.1]
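When you are searching through a large ERRORLOG, it helps to pull the error number, severity, state and reason out of lines like the two above. Here is a minimal Python sketch. It assumes the “Error: N, Severity: S, State: T.” layout shown here and straight quotes in the raw log. The helper names are hypothetical and are not part of any SQL Server tool.

```python
import re
from typing import NamedTuple, Optional


class ErrorEntry(NamedTuple):
    number: int
    severity: int
    state: int


# Matches lines such as "Error: 18456, Severity: 14, State: 149."
ERROR_RE = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")
# Captures the text after "Reason:", dropping a trailing "[CLIENT: ...]" tag if present.
REASON_RE = re.compile(r"Reason:\s*(.*?)\s*(?:\[CLIENT:[^\]]*\])?$")


def parse_error_line(line: str) -> Optional[ErrorEntry]:
    """Return (number, severity, state) from an ERRORLOG error header line, else None."""
    m = ERROR_RE.search(line)
    return ErrorEntry(*(int(g) for g in m.groups())) if m else None


def extract_reason(line: str) -> Optional[str]:
    """Return the 'Reason:' text of a login-failure line, else None."""
    m = REASON_RE.search(line)
    return m.group(1) if m else None
```

Grouping the parsed entries by (number, state) is a quick way to see whether State 149 keeps recurring for the same login.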

WORKAROUND/SOLUTION

The key part of the error message was “Login lacks connect endpoint permission.” I checked further and realized that the account “daclogin”, which I had created, was only a member of the “public” server role. Only members of the sysadmin fixed server role can connect using the DAC.

As soon as I added it to the sysadmin role, it was able to log in.

Have you encountered the same error with a non-DAC connection as well? What was the solution?

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Error: 18456, State 149 – Login-based Server Access Validation Failed With an Infrastructure Error. Login Lacks Connect Endpoint Permission

This was my first experiment with a gMSA (group Managed Service Account) in my lab, and I ran into an interesting issue. My lab environment has a complete domain setup: a domain controller and member servers. After I configured a gMSA for the SQL Server service and restarted the machine, the SQL Server service didn’t start automatically, even though its startup type was set to Automatic.


There was no new ERRORLOG because SQL Server never started. The event log showed several errors, which I have listed below:

Event ID	Source	Details
7038	Service Control Manager	The MSSQL$SQL_XFBIZ service was unable to log on as SQLAuthority\gmsaQUICK$ with the currently configured password due to the following error: The specified domain either does not exist or could not be contacted. To ensure that the service is configured properly, use the Services snap-in in Microsoft Management Console (MMC).
7034	Service Control Manager	The SQL Server (MSSQLSERVER) service terminated unexpectedly. It has done this 1 time(s).
7000	Service Control Manager	The MSSQLSERVER service failed to start due to the following error: The service did not start due to a logon failure.

The most interesting messages are:

The specified domain either does not exist or could not be contacted.
The service did not start due to a logon failure.

WORKAROUND/SOLUTION

From these messages, it is clear that the server could not contact the domain controller while the SQL Server service was starting along with the server. There are a few things I am aware of that would help:

Set the SQL Server service startup type to “Automatic (Delayed Start)”.
Using Registry Editor, make the SQL Server service depend on the Netlogon and W32Time services. Here are the steps:
1. Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\MSSQLSERVER
2. Look for “DependOnService” on the right pane.
3. Edit the value and add W32Time and Netlogon, one per line. Note: KEYISO was already there.
4. Close Registry Editor and check the Dependencies tab of the service in the Services console to make sure the dependencies are set correctly.
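Both settings can also be applied with sc.exe from an elevated command prompt. One detail matters here: “sc config <service> depend= ...” replaces the whole dependency list, so existing entries such as KEYISO must be kept. The minimal Python sketch below merges the lists and builds the commands. It assumes the default instance service name MSSQLSERVER, and the helper names are hypothetical.

```python
from typing import List


def merge_dependencies(existing: List[str], required: List[str]) -> List[str]:
    """Append required services not already present (case-insensitive), keeping order."""
    merged = list(existing)
    seen = {name.lower() for name in existing}
    for name in required:
        if name.lower() not in seen:
            merged.append(name)
            seen.add(name.lower())
    return merged


def sc_commands(service: str, dependencies: List[str]) -> List[str]:
    """Build sc.exe commands for delayed auto-start and the full dependency list."""
    return [
        f"sc config {service} start= delayed-auto",
        # depend= takes the complete list separated by "/" and overwrites the old value
        f"sc config {service} depend= {'/'.join(dependencies)}",
    ]


deps = merge_dependencies(["KEYISO"], ["W32Time", "Netlogon"])
for cmd in sc_commands("MSSQLSERVER", deps):
    print(cmd)
```

Note that sc.exe expects a space after “start=” and “depend=”. For a named instance, the service name would be MSSQL$<InstanceName>.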

After doing the above, I never faced the same issue on this server again.

Have you ever faced the same issue? Have you found any other solution? Please share via comments, and I will write a blog with due credit.


First appeared on SQL SERVER – SQL Service Not Getting Started Automatically After Server Reboot While Using gMSA Account

Sometimes, if the sequence is not followed correctly, we might see some weird errors. In this blog, we will learn how to fix an Always On error seen while adding a new replica: “The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster(WSFC) node.”

This was one of my existing clients, for whom I had configured an Always On Availability Group. They had some issues with the nodes and ended up rebuilding the cluster.

Later, they were able to redeploy and bring the cluster back. However, when they tried to add replica Node2 from Node1 via SQL Server Management Studio, an error was displayed.

Here is the text of the error message:

The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster(WSFC) node. (Microsoft.SqlServer.Management.HadrTasks)
——————————
Program Location:
at Microsoft.SqlServer.Management.Hadr.SelectReplicasController.VerifyConnection(UIConnectionInfo ci, IServerType serverType)
at Microsoft.SqlServer.Management.UI.ConnectionDlg.Connector.ConnectionThreadUser()

From the message, it is clear that something is wrong in the communication between SQL Server and the cluster. I asked them to run the queries below on all nodes that are part of the Windows cluster.

SELECT *
FROM sys.dm_hadr_cluster_members
GO
SELECT *
FROM sys.dm_hadr_cluster

When we ran the queries on the node that was giving the error, they returned no rows.
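With many nodes, comparing this output by eye gets tedious. Here is a minimal Python sketch. It assumes you have already collected the rows returned by sys.dm_hadr_cluster_members on each node into a dictionary keyed by node name. The node names and data below are hypothetical.

```python
from typing import Dict, List


def nodes_without_cluster_view(results: Dict[str, List[dict]]) -> List[str]:
    """Return nodes whose sys.dm_hadr_cluster_members query returned no rows."""
    return sorted(node for node, rows in results.items() if not rows)


def members_seen(results: Dict[str, List[dict]]) -> Dict[str, List[str]]:
    """Member names each node reports, to spot nodes with a partial view of the cluster."""
    return {node: sorted(r["member_name"] for r in rows) for node, rows in results.items()}


results = {
    "Node1": [{"member_name": "Node1"}, {"member_name": "Node2"}],
    "Node2": [],  # the "bad" node: the DMV returned nothing
}
print(nodes_without_cluster_view(results))
```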

WORKAROUND/SOLUTION

We disabled the Always On Availability Groups feature on the “bad” node using SQL Server Configuration Manager and restarted the SQL Server service. Then we enabled it again, followed by another service restart. After this, the queries mentioned earlier returned information about the cluster and its nodes.

Have you seen such an issue earlier? What steps did you take to fix it?


First appeared on SQL SERVER – FIX: The specified instance of SQL Server is hosted by a system that is not a Windows Server Failover Cluster(WSFC) Node.

Today’s blog post is directly inspired by a conversation I had during my Comprehensive Database Performance Health Check. During the consulting engagement, one of the developers reported the error below while trying to open a database diagram.

Cannot execute as the database principal because the principal ‘dbo’ does not exist, this type of principal cannot be impersonated, or you do not have permission. (Microsoft SQL Server, Error: 15517)

As per the message, something is not right with the database principal dbo. I asked about the history of the database and learned that it had been restored from another server. A few queries helped me find the issue.

To verify whether you are running into the same issue, run the query below to find which login is mapped to “dbo” in the database. My sample database is AdventureWorks, so please change the name accordingly.

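USE AdventureWorks
GO
-- Login mapped to dbo; NULL means the owner SID has no matching login on this server
SELECT SUSER_SNAME(sid) AS login_name, *
FROM sys.database_principals
WHERE name = N'dbo'
GO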

If the first column shows NULL, the fix below should work.

WORKAROUND/SOLUTION

This is just a variation of the error I explained in an earlier blog: SQL SERVER – ERROR: FIX – Database diagram support objects cannot be installed

We went ahead and changed the owner of the database in the UI (Properties > Files page).

Here is the equivalent command.

USE [AdventureWorks]
GO
ALTER AUTHORIZATION ON DATABASE::[AdventureWorks] TO [sa]
GO
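If you need to check many restored databases, you can script both the check and the fix. Here is a minimal Python sketch. It assumes the query output has already been fetched as (login name, principal name) pairs. The helper names are hypothetical, and quote_name only mimics the bracket escaping done by T-SQL QUOTENAME.

```python
from typing import Iterable, Optional, Tuple


def quote_name(identifier: str) -> str:
    """Bracket-quote an identifier the way T-SQL QUOTENAME does (a closing ']' is doubled)."""
    return "[" + identifier.replace("]", "]]") + "]"


def dbo_is_orphaned(rows: Iterable[Tuple[Optional[str], str]]) -> bool:
    """rows are (login_name, principal_name) pairs; a NULL (None) login for dbo is the symptom."""
    return any(principal == "dbo" and login is None for login, principal in rows)


def change_owner_statement(database: str, new_owner: str = "sa") -> str:
    """Build the ALTER AUTHORIZATION statement shown above for any database and owner."""
    return f"ALTER AUTHORIZATION ON DATABASE::{quote_name(database)} TO {quote_name(new_owner)};"


rows = [(None, "dbo"), (None, "guest")]  # hypothetical query output
if dbo_is_orphaned(rows):
    print(change_owner_statement("AdventureWorks"))
```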

After doing this, the issue was resolved, and they were able to use the feature.

How many of you use this old feature? Honestly, I have not seen many people using it in production.


First appeared on SQL SERVER – FIX: Database Diagram Error 15517 – Cannot Execute as the Database Principal Because the Principal ‘dbo’ Does Not Exist

After I finished my lab testing, I went back to my client, and they showed me an error while setting up a cloud witness. It was different from the error I had seen earlier. In this blog, we will learn how to fix the error “Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host” while configuring a cloud witness.

I have written an earlier blog with a different error:

SQL SERVER – Unable to Set Cloud Witness. Error: The Client and Server Cannot Communicate, Because They do Not Possess a Common Algorithm

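Here is the exact error message that was seen: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.

The error message clearly indicates a connectivity issue between the node and the cloud witness.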

WORKAROUND/SOLUTION

I always use Telnet or Test-NetConnection to test connectivity, and it was helpful here too. I found that this server did not allow outbound internet connections. As soon as I opened outbound port 443 to Azure Storage, the issue was resolved.
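If PowerShell is not handy, you can run the same TCP check from Python. Here is a minimal sketch that uses only the standard library. It only shows that a TCP handshake on port 443 succeeds. It does not show that the storage account key or TLS settings are correct.

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Roughly what Test-NetConnection <host> -Port <port> reports as TcpTestSucceeded.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, blocked by a firewall, ...
        return False
```

For a cloud witness, the host would be the blob endpoint of the storage account, for example can_connect("mywitness.blob.core.windows.net"). Here, mywitness is a hypothetical account name.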

For more details about opening a port on an Azure VM, refer to my earlier blog:

SQL SERVER – What is the Meaning of PREEMPTIVE_HTTP_EVENT_WAIT? How to Fix it?

Even though the behavior is different, the solution remains the same as mentioned above.


First appeared on SQL SERVER – Unable to Set Cloud Witness. Error: An Existing Connection Was Forcibly Closed by The Remote Host

Let me tell you about a real-world scenario that I often see at organizations I work with. In a perfect world, as organizations grow, their teams would grow too. Often that isn’t the reality. In most cases, when organizations are small, the team puts rock-solid plans in place to monitor their servers. However, as the servers, data and business grow, the development and management teams get busy with multiple tasks and eventually do not spend enough time on their monitoring solution. This leads to poor response times to business-critical issues and, in the end, to poorly optimized servers.

Whenever I see such a scenario, I am always keen to introduce Redgate’s SQL Monitor. It is one of my favorite monitoring tools and it alerts you to problems before they occur, and automatically provides the data you need to monitor your server estate proactively. Recently, in version 9, I noticed it has a new addition in the form of the Estate pages.

The new Estate pages provide a simple, estate-wide view of important SQL Server metrics and tasks, such as disk space usage, backups and other jobs, and information on recent updates and patches. They provide a central dashboard from which a team can review the overall health of their servers, spot issues quickly before they become real problems, and proactively assign priorities.

The Estate pages contain essential diagnostic data and provide information on installed versions, backups, disk space and agent jobs across every server and database in the monitored estate. The goal of the Estate pages is to give a clear picture of the health and security of the database estate and to provide a way to predict future behavior. They also help to spot impending issues quickly and to anticipate when resource constraints will escalate into the sort of problems that cause downtime and unplanned maintenance work.

The Estate pages in SQL Monitor 9 include the following views: Installed Version, Disk Usage, Backup, and Agent Jobs.

Alongside on-premises servers, it also monitors Azure SQL Database. I personally feel that the Estate pages are going to change how I do SQL Server monitoring.

What Next –

Well, to learn more about SQL Monitor, head here and read this article.

You can also download the SQL Monitor 9 Free Trial.


First appeared on SQL Monitor 9 – Proactively Monitor Large SQL Server Estates

This was my first experiment with the feature called Stretch Database. I started the wizard and encountered an error at the very end.


Here is the text of the error message:

Operation to create resource group stretchgroup-australiacentral failed. Details : {“error”:{“code”:”LocationNotAvailableForResourceGroup”,”message”:”The provided location ‘australiacentral’ is not available for resource group. List of available regions is ‘centralus,eastasia,southeastasia,eastus,eastus2,westus,westus2,northcentralus,southcentralus,westcentralus,northeurope,westeurope,japaneast,japanwest,brazilsouth,australiasoutheast,australiaeast,westindia,southindia,centralindia,canadacentral,canadaeast,uksouth,ukwest,koreacentral,koreasouth,francecentral,southafricanorth’.”}}

The wizard also offered an option to read the logs, and I found the same error there.

[Informational] TaskUpdates: Message:Task : ‘Provision Azure Sql Server stretchserver-stretchdbdemo-20190317-031938571’ — Status : ‘Started’ — Details : ‘Task ‘Provision Azure Sql Server stretchserver-stretchdbdemo-20190317-031938571′ started ….’.
[Informational] TaskUpdates: Message:Task : ‘Provision Azure Sql Server stretchserver-stretchdbdemo-20190317-031938571’ — Status : ‘Running’ — Details : ‘Task failed due to following error: Microsoft.SqlServer.Management.StretchDatabase.Model.Tasks.CreateResourceGroupFailedException: Operation to create resource group stretchgroup-australiacentral failed. Details : {“error”:{“code”:”LocationNotAvailableForResourceGroup”,”message”:”The provided location ‘australiacentral’ is not available for resource group. List of available regions is ‘centralus,eastasia,southeastasia,eastus,eastus2,westus,westus2,northcentralus,southcentralus,westcentralus,northeurope,westeurope,japaneast,japanwest,brazilsouth,australiasoutheast,australiaeast,westindia,southindia,centralindia,canadacentral,canadaeast,uksouth,ukwest,koreacentral,koreasouth,francecentral,southafricanorth’.”}}

at Microsoft.SqlServer.Management.StretchDatabase.Model.Tasks.ProvisionSqlAzureServerTask.CreateNewResourceGroup(ResourceManagement resourceManagementChannel, ServiceOperationStatus& status)
at Microsoft.SqlServer.Management.StretchDatabase.Model.Tasks.ProvisionSqlAzureServerTask.Perform(IExecutionPolicy taskExecutionPolicy)
at Microsoft.SqlServer.Management.StretchDatabase.Model.Common.Task.Perform(IExecutionPolicy policy, CancellationToken token, ScenarioTaskHandler taskDelegate), retrying …’.

WORKAROUND/SOLUTION

I must say the error is intuitive: it tells us that the location is not available. But I wondered where I had chosen the location. So, I launched the wizard again and found the setting.


The problem here is that the location came as the default (alphabetically first) and I didn’t pay attention to it.

As soon as I selected “South India”, I was able to proceed and stretch the table to Azure.
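When the wizard fails this way, the error payload itself lists the valid regions. Here is a minimal Python sketch that extracts that list. It assumes the payload is plain JSON with straight quotes (the blog rendering above shows typographic quotes) and that the message keeps the “List of available regions is '...'” wording. The helper names are hypothetical, and the sample payload in the test is shortened.

```python
import json
from typing import List


def available_regions(error_payload: str) -> List[str]:
    """Pull the region list out of a LocationNotAvailableForResourceGroup error payload."""
    message = json.loads(error_payload)["error"]["message"]
    marker = "List of available regions is '"
    start = message.index(marker) + len(marker)
    end = message.index("'", start)
    return message[start:end].split(",")


def region_is_available(region: str, error_payload: str) -> bool:
    """True if the region name appears in the payload's list of available regions."""
    return region.lower() in available_regions(error_payload)
```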


Have you tested this feature? Do you have any interesting learning?


First appeared on SQL SERVER – Stretch Database – ERROR: The Provided Location is Not Available for Resource Group

One of my clients contacted me for advice about an error. In this blog, we will discuss the error message “A read of the file at offset succeeded after failing 1 time(s)”.

Here is the complete error message they were seeing. At the same time, all of the scheduled SQL Server Agent jobs were failing due to IO errors.

Message. A read of the file ‘K:\MSSQL14.MSSQLSERVER\MSSQL\Data\HR_NCIDX.ndf’ at offset 0x000002a84dd000 succeeded after failing 1 time(s) with error: 21(failed to retrieve text for this error. Reason: 15105). Additional messages in the SQL Server error log and system event log may provide more detail. This error condition threatens database integrity and must be corrected. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

Whenever I see such a message, I always ask the client to make sure that the hardware is healthy. The message also contains “error: 21”. What does that mean?

In most cases, you can use the NET HELPMSG command to convert the number to text (for example, NET HELPMSG 21). I have written about this trick in an earlier blog.

How to Convert Hex Windows Error Codes to the Meaningful Error Message – 0x80040002 and 0x80040005 and others? – Interview Question of the Week #182

So, error 21 means “The device is not ready”.

WORKAROUND/SOLUTION

Based on the error message, there is definitely an issue with the hardware hosting the K drive. The error message contains the path of the file, which tells us the drive. Microsoft documentation says the same thing: “This message indicates that the read operation had to be reissued at least one time and indicates a major problem with the disk hardware.”
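The file path and the OS error number are both embedded in the message, so they are easy to extract when you scan many ERRORLOG entries. Here is a minimal Python sketch. It assumes the message text matches the wording above, with straight quotes around the path. ReadRetry and parse_825 are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ReadRetry(NamedTuple):
    path: str
    offset: int
    failures: int
    os_error: int

    @property
    def drive(self) -> str:
        """Drive letter of the affected file, e.g. 'K:'."""
        return self.path[:2].upper()


# Wording of the error 825 message as shown above.
MSG_825 = re.compile(
    r"A read of the file '(?P<path>[^']+)' at offset (?P<offset>0x[0-9a-fA-F]+) "
    r"succeeded after failing (?P<failures>\d+) time\(s\) with error: (?P<err>\d+)"
)


def parse_825(message: str) -> Optional[ReadRetry]:
    """Return the file, offset, retry count and OS error from an error 825 message, else None."""
    m = MSG_825.search(message)
    if m is None:
        return None
    return ReadRetry(
        path=m["path"],
        offset=int(m["offset"], 16),  # int() accepts the 0x prefix with base 16
        failures=int(m["failures"]),
        os_error=int(m["err"]),
    )
```

The os_error value is the number you would pass to NET HELPMSG.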

I recommended that my client move ALL the files to a new physical drive (on a new hardware LUN). The quick way was as follows:

Add a new drive (let’s call it X).
Create the same folder structure as on the K drive.
Stop the SQL Server service.
Move the files from the K drive to the X drive.
Swap the drive letters (K > Y, X > K, Y > X).
Start the SQL Server service. (You might get permission errors, so set permissions on the new folders accordingly.)
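The letter swap needs a temporary letter because two volumes cannot hold the same letter at the same time. The small Python model below only simulates the mapping and does not touch real disks. The letters and volume labels are hypothetical.

```python
from typing import Dict


def swap_drive_letters(mapping: Dict[str, str], old: str, new: str, temp: str) -> Dict[str, str]:
    """Simulate swapping two drive letters through an unused temporary letter.

    mapping is letter -> volume label.
    """
    if temp in mapping:
        raise ValueError(f"temporary letter {temp} is already in use")
    result = dict(mapping)
    result[temp] = result.pop(old)  # K > Y
    result[old] = result.pop(new)   # X > K
    result[new] = result.pop(temp)  # Y > X
    return result


before = {"K": "old LUN", "X": "new LUN"}
print(swap_drive_letters(before, old="K", new="X", temp="Y"))
```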

I also asked them to engage their hardware vendor to check the health of the hardware. Here is what they replied:

Hi Pinal,

It was indeed a storage issue. After we have moved all our data new LUNs, all the jobs are running fine now. Thanks for quick turnaround time and your immediate help!

Such emails are so lovely to read.

Have you seen such an error in your ERRORLOGs? If yes, please take action right now, before it’s too late.


First appeared on SQL SERVER – Error: 825 – A Read of the File at Offset Succeeded After Failing 1 Time(s)

When you are planning data platform consolidation, it is very important to track the changes in overall system performance after the actual migration is done. You should understand the differences in the configuration and performance of the instances and databases. This also tells you how well the planning work went and what the possible challenges in the new data platform are.

Let’s assume you had 100 servers with a total of 150 instances in your old data platform, and then you carefully planned a new data platform. The new platform has only 30 target servers, each containing 5 instances on average. The databases themselves remained intact. A significantly smaller set of servers is easier to maintain, but you still need to make sure that service levels and performance in the new platform are at least as good as before. You do this by understanding the correlation between the configurations, and between the performance counters, of the old and new systems. When the number of servers and CPU cores changes, this may also affect instance-level configuration settings such as Max Degree Of Parallelism and Cost Threshold For Parallelism. Also, the relative CPU workload may change if smaller servers are replaced with bigger ones, so it is a good idea to compare the processor queues as well.

What does this mean in practice? Well, you need a couple of things. First, you need the instance-level and database-level diagnostics results stored from the old data platform. Then you need to compare those results with the new ones. At minimum, you should include wait statistics, instance-level configurations and index fragmentation. Wait statistics are a good way to point out where the hardware and DBMS bottlenecks are. Instance-level configurations are important for fine-tuning and balancing overall instance performance. Checking the indexes is important, as they may become fragmented during the migration and cause a severe system slowdown.


Image: annual comparison on performance counters between the old and the new data platform in SQL Governor software.

In addition, you need continuous monitoring history covering at least a couple of months at the server, instance and database levels for all the important performance counters. These include average and maximum CPU utilization %, RAM usage, buffer cache hit ratio, page life expectancy, CPU time, IOPS, throughput, latencies and database growth. You also need to be able to compare these figures between the old and new data platform. If, for example, instance-level CPU continuously hits the ceiling in an OLTP system, it will slow down workload processing dramatically. Also, if there is insufficient RAM to handle all the data on a particular instance, the buffer cache hit ratio and page life expectancy start to fall. As for the database-level counters, it is good to understand the actual IOPS, throughput and latencies so you can compare the performance of the old and new storage systems.
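Comparing the old and new platforms often comes down to the relative change in each counter over the same timespan. Here is a minimal Python sketch. It assumes you have already aggregated each counter (for example, the average over a month) for both platforms. The counter names and values below are made up for illustration.

```python
from typing import Dict


def percent_change(old: Dict[str, float], new: Dict[str, float]) -> Dict[str, float]:
    """Percent change per counter present in both baselines (skips zero old values)."""
    return {
        name: round((new[name] - old[name]) / old[name] * 100.0, 1)
        for name in sorted(old.keys() & new.keys())
        if old[name] != 0
    }


old_platform = {"avg_cpu_pct": 40.0, "page_life_expectancy_s": 1200.0}
new_platform = {"avg_cpu_pct": 50.0, "page_life_expectancy_s": 900.0}
print(percent_change(old_platform, new_platform))
```

Whether a change is good depends on the counter. A rising CPU percentage and a falling page life expectancy both point to more pressure on the new platform.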

There is software available that can do all of the above, as well as consolidation planning and right-sizing of the system, making your job easier, faster and more accurate. It is called SQL Governor. With SQL Governor, you can run diagnostics jobs, monitor your data platform, create consolidation plans from the old to the new data platform, and compare the monitoring data between the old and new environments. This makes it easy to understand how capacity is being used from different performance perspectives, so you can focus on the most important areas of performance optimization and capacity management. For example, it is easy to follow up on how changes to max degree of parallelism and cost threshold for parallelism affect the actual wait stats and monitored performance counters. You can also compare the respective SLAs between the old and new data platform. Or you can simply compare the performance counter metrics between the old and new systems and see the difference between them over the desired timespan.

Check out my previous articles related to SQL Governor: Why is SQL Server Consolidation Better Than Having a Scattered Environment, and Automated SQL Server Consolidation and Right Sizing – Save 50% Data Platform Costs.


First appeared on How to Track Data Platform Service Level and Performance Before and After Consolidation?

I wrote a blog earlier about my first experience (and the error I encountered) with the “Stretch Database” feature. In this blog, I will talk about the error “Free Trial subscriptions can provision Basic, Standard S0 through S3 databases, up to 100 eDTU Basic or Standard elastic pools and DW100 through DW400 data warehouses”.

One of my blog readers read my earlier blog SQL SERVER – Stretch Database – ERROR: The Provided Location is Not Available for Resource Group.

He followed the wizard, but it failed with an error. He emailed me, and I asked for the complete log that was generated. Here is what we saw:

[Informational] TaskUpdates: Message:Task : ‘Configure Stretch on the Database StretchTestDB’ — Status : ‘Running’ — Details : ‘Task failed due to following error: Microsoft.SqlServer.Management.Smo.FailedOperationException: Alter failed for Database ‘StretchTestDB’. —> Microsoft.SqlServer.Management.Common.ExecutionFailureException: An exception occurred while executing a Transact-SQL statement or batch. —> System.Data.SqlClient.SqlException: ‘Free Trial subscriptions can provision Basic, Standard S0 through S3 databases, up to 100 eDTU Basic or Standard elastic pools and DW100 through DW400 data warehouses’
ALTER DATABASE statement failed.

WORKAROUND/SOLUTION

The error message hints that the reader was using a “Free Trial” subscription, which cannot consume Database Stretch Units (DSU). Here is what I see in my Management Studio.


Below is the screenshot from the portal.

As per the error message, the following service tiers are possible:

Basic, Standard S0 through S3 databases.
Up to 100 eDTU Basic or Standard elastic pools
DW100 through DW400 data warehouses

None of them mentions DSU, hence the error. So, I replied to my reader that he needs a paid subscription to use this feature. I am surprised that Microsoft has not documented it, or at least I have not been able to find it. If you find it, please share it via comments.
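The limits in the error message can be expressed as a simple check. This rough Python sketch reflects my reading of that message only and is not an official list. It covers single databases and DW levels and ignores elastic pools. The DS100 value in the test is an assumed name for a Stretch (DSU) performance level.

```python
import re

# Single-database service objectives named in the error message.
FREE_TRIAL_DB_TIERS = {"Basic", "S0", "S1", "S2", "S3"}


def allowed_on_free_trial(service_objective: str) -> bool:
    """Rough check of the Free Trial limits quoted in the error (elastic pools not covered)."""
    if service_objective in FREE_TRIAL_DB_TIERS:
        return True
    m = re.fullmatch(r"DW(\d+)", service_objective)
    return m is not None and 100 <= int(m.group(1)) <= 400
```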


First appeared on SQL SERVER – Stretch Database – Free Trial Subscriptions Can Provision Basic, Standard S0 through S3 Databases, up to 100 eDTU Basic or Standard Elastic Pools and DW100 Through DW400 Data Warehouses

I have helped many clients deploy Always On Availability Groups. Based on their requirements, they keep adopting additional features provided by availability groups. This time, they wanted to use the read-only routing feature of Always On availability groups. Their goal was to offload the read-only workload to the secondary replica. This was failing with the error: Client unable to establish a connection because an error was encountered during handshakes before login.

I informed them that, for read-only routing to work when connecting from the application, they need to make sure of five things:

The routing URL is set up correctly in SQL Server.
The routing list is set up correctly in SQL Server.
Connect to the listener in the connection string.
Provide the default database name in the connection string.
Provide the application intent parameter in the connection string.
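For reference, here is what the last three items look like in an ODBC connection string. This is a minimal Python sketch that assumes Microsoft ODBC Driver 17 for SQL Server and Windows authentication. The listener and database names are hypothetical.

```python
def read_only_connection_string(listener: str, database: str, port: int = 1433) -> str:
    """ODBC connection string that connects to the AG listener with read-only intent."""
    parts = {
        "Driver": "{ODBC Driver 17 for SQL Server}",
        "Server": f"tcp:{listener},{port}",  # connect to the listener, not a replica
        "Database": database,                 # a database in the availability group
        "ApplicationIntent": "ReadOnly",      # required for read-only routing
        "Trusted_Connection": "yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


print(read_only_connection_string("AGListener", "SalesDB"))
```

The sqlcmd equivalent would be sqlcmd -S tcp:AGListener,1433 -d SalesDB -K ReadOnly -E, where -K sets the application intent.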

According to them, all of the above had already been checked, so they wanted me to look into it and fix it.

Here is the error message which they were getting while connecting to the listener using read-only intent.

Client unable to establish connection because an error was encountered during handshakes before login. Common causes include client attempting to connect to an unsupported version of SQL Server, server too busy to accept new connections or a resource limitation (memory or maximum allowed connections) on the server..
Sqlcmd: Error: Microsoft ODBC Driver 13 for SQL Server: TCP Provider: An existing connection was forcibly closed by the remote host.
Sqlcmd: Error: Microsoft ODBC Driver 13 for SQL Server: Client unable to establish connection.
Sqlcmd: Error: Microsoft ODBC Driver 13 for SQL Server: Client unable to establish connection due to prelogin failure.

Here is the image


WORKAROUND/SOLUTION

Without wasting much time, I asked them to show me the routing configuration in SSMS.


This is the Availability Group Properties window. If you are using an older version of SSMS, you can query the catalog views instead. The latest SSMS can be downloaded for free from the link below:

Download SQL Server Management Studio (SSMS).

As we can see, the routing URL is set up incorrectly. The port in the routing URL should be the port on which the instance accepts SQL connections. Typically, for a default instance, it is 1433.

SQL SERVER – Find Port SQL Server is Listening – Port SQL Server is Running
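A quick way to catch this mistake across many replicas is to compare the port in each routing URL with the instance’s SQL port. Here is a minimal Python sketch. It assumes URLs in the TCP://host:port form used by READ_ONLY_ROUTING_URL. The host names are hypothetical.

```python
from urllib.parse import urlsplit


def routing_url_port(url: str) -> int:
    """Return the port from a routing URL such as 'TCP://node2.contoso.com:1433'."""
    parts = urlsplit(url)  # urlsplit lower-cases the scheme
    if parts.scheme != "tcp" or parts.port is None:
        raise ValueError(f"expected TCP://<host>:<port>, got {url!r}")
    return parts.port


def routing_url_matches(url: str, instance_port: int) -> bool:
    """True if the routing URL points at the port the instance accepts SQL connections on."""
    return routing_url_port(url) == instance_port
```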

As soon as the URL was changed, the read-only routing started working like a charm.


First appeared on SQL SERVER – Read Only Routing Error: Client Unable to Establish Connection Because an Error was Encountered During Handshakes Before Login

While trying to upgrade my SQL instance to the latest patch of SQL Server 2017, I encountered an issue. This has happened before with a few of my clients, but this time the cause was different. In this blog, we will learn about the cause and the fix of the upgrade error:

The query has been canceled because the estimated cost of this query (37) exceeds the configured threshold of 30. Contact the system administrator.

Here are a few earlier blogs about troubleshooting the same issue.

This time it was a new error, so I am blogging about it. As usual, I looked at the ERRORLOG for the cause of the failure.

Error: 8649, Severity: 17, State: 1.
The query has been canceled because the estimated cost of this query (37) exceeds the configured threshold of 30. Contact the system administrator.
Error: 912, Severity: 21, State: 2.
Script level upgrade for database ‘master’ failed because upgrade step ‘msdb110_upgrade.sql’ encountered error 8649, state 1, severity 17. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
Error: 3417, Severity: 21, State: 3.
Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.
SQL Server shutdown has been initiated
SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required.

WORKAROUND/SOLUTION

Whenever such an upgrade script fails and SQL Server doesn’t start, we need to start SQL Server with trace flag 902, which bypasses script upgrade mode. This allows us to find the cause and fix it. Here are the steps I followed.

First, I started SQL Server with trace flag 902 from a command prompt, as below.

NET START MSSQLSERVER /T902

For a named instance, use the following (replace the instance name based on your environment):

NET START MSSQL$INSTANCENAME /T902

Refer: SQL SERVER – 2005 – Start Stop Restart SQL Server from Command Prompt
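The only difference between the two commands is the service name: MSSQLSERVER for a default instance and MSSQL$<name> for a named instance. Here is a minimal Python sketch that builds the command. The function name is hypothetical.

```python
def net_start_command(instance: str = "MSSQLSERVER", trace_flag: int = 902) -> str:
    """Build the NET START command that starts SQL Server with a startup trace flag."""
    if instance.upper() == "MSSQLSERVER":
        service = "MSSQLSERVER"        # default instance
    else:
        service = f"MSSQL${instance}"  # named instance service name
    return f"NET START {service} /T{trace_flag}"
```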

As soon as SQL Server started, I was able to connect, because the upgrade script didn’t run. Then I disabled the query governor cost limit:

In Object Explorer, right-click a server and select Properties.
Click the Connections page.
Clear the Use query governor to prevent long-running queries check box.

Here is the equivalent T-SQL:

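EXEC sys.sp_configure N'show advanced options', N'1'
RECONFIGURE WITH OVERRIDE
GO
-- 0 disables the query governor cost limit
EXEC sys.sp_configure N'query governor cost limit', N'0'
RECONFIGURE WITH OVERRIDE
GO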
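The rule behind error 8649 is simple. When the limit is non-zero, SQL Server refuses to run any query whose estimated cost exceeds the configured value, and a value of 0 turns the governor off. Here is a minimal Python model of that rule. It is illustrative only, because the real check happens inside SQL Server.

```python
def governor_cancels(estimated_cost: float, cost_limit: int) -> bool:
    """True if the query governor would refuse to run the query (a limit of 0 disables it)."""
    return cost_limit > 0 and estimated_cost > cost_limit


print(governor_cancels(37, 30))  # the upgrade script query from the ERRORLOG above
print(governor_cancels(37, 0))   # after setting 'query governor cost limit' to 0
```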

After modifying the configuration, I stopped the SQL Server service and started it normally (without the trace flag) using SQL Server Configuration Manager.

And the issue was resolved. Have you faced any such interesting issue during SQL Server upgrades?


First appeared on SQL SERVER – Script Level Upgrade for Database ‘master’ Failed Because Upgrade Step ‘msdb110_upgrade.sql’ Encountered Error 8649, State 1, Severity 17

There are many deployments where I have assisted my clients in creating Always On Availability Groups. Sometimes they come to me with special requirements based on their infrastructure, and I always learn from them. In this situation, my client was trying to create a listener in a workgroup, and it was failing with the error: Unable to determine if the computer ‘ListenerName’ exists in the domain ‘WORKGROUP’

THE SITUATION

My client had created a two-node Always On availability group in a cluster. The whole setup was in Azure. Since this was a test environment, they deployed only a two-node availability group to save cost, and because they did not want to pay for a third machine, they planned to deploy a domain-less cluster. They followed blogs on the internet, and the cluster was deployed fine. They were also able to deploy the availability group, and database synchronization was healthy.

The last step was to create a listener for this availability group. They were following this article:

https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sql/virtual-machines-windows-portal-sql-availability-group-tutorial#configure-listener

One of the steps was to create Client Access Point for the listener and it was failing for them. Here is the screenshot of the error message.


WORKAROUND/SOLUTION

Instead of using Failover Cluster Manager to create the listener, we need to use SSMS or T-SQL. In the SSMS UI, make sure to choose “Static IP” and provide the IP address.


Here is the T-SQL to achieve the same.

USE [master]
GO
ALTER AVAILABILITY GROUP [FIN-USW-AG]
ADD LISTENER N'FIN-USW-LIST' (
WITH IP
((N'10.0.1.22', N'255.255.255.0')
), PORT=1433);
GO
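To double-check the listener after creating it, the catalog views below list its DNS name, port, and IP configuration. A sketch; run it on the primary replica, where the IP address should show as ONLINE:

```sql
-- One row per listener IP address.
SELECT agl.dns_name,
       agl.port,
       ip.ip_address,
       ip.ip_subnet_mask,
       ip.state_desc
FROM sys.availability_group_listeners AS agl
JOIN sys.availability_group_listener_ip_addresses AS ip
    ON agl.listener_id = ip.listener_id;
```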

After creating the listener, they were able to follow the rest of the article and deploy an Always On availability group, along with its listener, in a workgroup on Azure virtual machines.

Have you seen such error earlier?

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Error in Validation: Listener in Workgroup – Unable to determine if the computer exists in the domain ‘WORKGROUP’

Recently I was helping a client whose SQL Server upgrade was failing with an error. If you follow my blog, you have seen many posts where I explain the steps to fix upgrade-related issues. In this blog, we will learn how to fix another cause of “Script level upgrade for database ‘master’ failed”.

As in the earlier blogs about upgrade failures, here are the messages seen in the ERRORLOG. If you are new to SQL Server and don’t know about the ERRORLOG, please refer to this blog: SQL SERVER – Where is ERRORLOG? Various Ways to Find ERRORLOG Location

Here are the messages:

Error: 15151, Severity: 16, State: 1.
Cannot find the user ‘##MS_SSISServerCleanupJobUser##’, because it does not exist or you do not have permission.
Error: 912, Severity: 21, State: 2.
Script level upgrade for database ‘master’ failed because upgrade step ‘SSIS_hotfix_install.sql’ encountered error 15151, state 1, severity 16. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.
Error: 3417, Severity: 21, State: 3.
Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.

From the first message, it is clear that the user “##MS_SSISServerCleanupJobUser##” is missing from the SSISDB database. Because of this, a step in the SQL Server upgrade script fails, and hence the error. As you can guess, the solution is to create the user in the database. But keep in mind that SQL Server is not starting, so we first need a trick to start SQL Server without hitting this error. Read on to learn the trick.

WORKAROUND/SOLUTION

The trick is trace flag 902, which bypasses script upgrade mode. I started SQL Server with trace flag 902 from the command prompt:

NET START MSSQLSERVER /T902

For a named instance, use the following (replace INSTANCENAME with the name of your instance):

NET START MSSQL$INSTANCENAME /T902

Refer: SQL SERVER – 2005 – Start Stop Restart SQL Server from Command Prompt

As soon as SQL was started, I was able to connect because the upgrade script didn’t run. Here is the T-SQL which I ran to create the user which was missing.

USE [SSISDB]
GO
CREATE USER [##MS_SSISServerCleanupJobUser##] FOR LOGIN [##MS_SSISServerCleanupJobLogin##] WITH DEFAULT_SCHEMA=[dbo]
GO
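If you want to verify both sides of the mapping, the following sketch checks that the server login exists and that the database user is present in SSISDB. The CREATE USER statement above assumes the login ##MS_SSISServerCleanupJobLogin## already exists; if the first query returns nothing, that needs to be sorted out first.

```sql
-- The server-level login that the user maps to.
SELECT name, type_desc, is_disabled
FROM sys.server_principals
WHERE name = N'##MS_SSISServerCleanupJobLogin##';

-- The database user inside SSISDB (should return one row after the fix).
SELECT name, type_desc, default_schema_name
FROM SSISDB.sys.database_principals
WHERE name = N'##MS_SSISServerCleanupJobUser##';
```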

After creating the user, I stopped the SQL Server service using SQL Server Configuration Manager. You can also do this from the command prompt; for the default instance, use:

NET STOP MSSQLSERVER

If you are dealing with a named instance, use the following (replace INSTANCENAME based on your environment):

NET STOP MSSQL$INSTANCENAME

Then start SQL normally (without trace flag) using SQL Server Configuration Manager.

And the issue was resolved. Have you faced any such interesting issue during SQL upgrade?

Reference : Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Script Level Upgrade for Database ‘master’ Failed Because Upgrade Step ‘SSIS_hotfix_install.sql’ Encountered Error 15151

One of the most common connectivity errors I hear about is “A network-related or instance-specific error occurred while establishing a connection to SQL Server.” In this blog, we will learn how to fix this error when it occurs during the installation of SQL Server.

THE SITUATION

If you search for the error mentioned in the title, one of my blogs that comes up is: SQL SERVER – FIX : ERROR : (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server) (Microsoft SQL Server, Error: )

There are many other posts on the internet with suggestions to fix the error.

The situation here is a little different than all the others. My client was getting this error during the installation of SQL Server. So, this was not in any client application, website or SQL Server Management Studio, but seen in installation logs. The following information is picked from “Detail.txt” file, which is one of the most important files, used to fix issues related to installation.

SQLEngine: –SqlDatabaseServiceConfig: Connection String: Data Source=\\.\pipe\SQLLocal\MSSQLSERVER;Initial Catalog=master;Integrated Security=True;Pooling=False;Connect Timeout=300;Network Library=dbnmpntw;Application Name=SqlSetup
SQLEngine: : Checking Engine checkpoint ‘ServiceConfigConnect’
SQLEngine: –SqlDatabaseServiceConfig: Connecting to SQL….
Sco: Attempting to connect script
Connection string: Data Source=\\.\pipe\SQLLocal\MSSQLSERVER;Initial Catalog=master;Integrated Security=True;Pooling=False;Connect Timeout=300;Network Library=dbnmpntw;Application Name=SqlSetup
Sco: Connection error code from SqlException is : 2
Prompting user if they want to retry this action due to the following failure:
—————————————-
The following is an exception stack listing the exceptions in outermost to innermost order
Inner exceptions are being indented
Exception type: Microsoft.SqlServer.Configuration.Sco.ScoException
Message:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)
HResult : 0x84bb0001
FacilityCode : 1211 (4bb)
ErrorCode : 1 (0001)
Data:
DisableRetry = true
Inner exception type: System.Data.SqlClient.SqlException
Message:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)
HResult : 0x80131904
Data:
HelpLink.ProdName = Microsoft SQL Server
HelpLink.EvtSrc = MSSQLServer
HelpLink.EvtID = 2
HelpLink.BaseHelpUrl = http://go.microsoft.com/fwlink
HelpLink.LinkId = 20476
Stack:
at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
at System.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, SqlConnection owningObject, Boolean withFailover)
at System.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, Boolean ignoreSniOpenTimeout, Int64 timerExpire, SqlConnection owningObject, Boolean withFailover)
at System.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(String host, String newPassword, Boolean redirectedUserInstance, SqlConnection owningObject, SqlConnectionString connectionOptions, Int64 timerStart)
at System.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(SqlConnection owningObject, SqlConnectionString connectionOptions, String newPassword, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, Object providerInfo, String newPassword, SqlConnection owningObject, Boolean redirectedUserInstance)
at System.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionFactory.CreateNonPooledConnection(DbConnection owningConnection, DbConnectionPoolGroup poolGroup)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at Microsoft.SqlServer.Configuration.Sco.SqlScriptExecution.GetConnection()
at Microsoft.SqlServer.Configuration.Sco.SqlScriptExecution.Connect()

WORKAROUND/SOLUTION

After a lot of investigation, I found that this was due to TLS settings. My client was installing the SQL Server 2014 RTM version, which doesn’t support TLS 1.2.

We uninstalled everything, disabled TLS 1.2, and enabled TLS 1.0.

Here are the steps:

Enable TLS 1.0 on the server by following the KB article “TLS 1.2 support for Microsoft SQL Server”.
Restart the server.
Install SQL Server 2014 and patch to latest version.
Disable TLS 1.0.
Restart the server.

After this, the SQL was installed, and the application was able to connect to SQL Server.

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – FIX: Install Error: A Network-Related or Instance-Specific Error Occurred While Establishing a Connection to SQL Server

During a consulting engagement, my client’s DBA team checked the ERRORLOG and asked me about the possible cause of the following Service Broker error.

DateTime spid74s Error: 9642, Severity: 16, State: 3.
DateTime spid74s An error occurred in a Service Broker/Database Mirroring transport connection endpoint, Error: 8474, State: 11. (Near endpoint role: Target, far endpoint address: ”)

From the above message, we can figure out that there is some system session in SQL which is trying to connect to some endpoint. I checked and they confirmed that they were not using service broker. But they were using Always On Availability Group.

I checked data synchronization and it was fine. Rows modified on the primary were reaching the secondary as well, so there was no issue with data movement either.

While enquiring further, I learned that they were using the read-only routing feature, and it was not working properly.

WORKAROUND/SOLUTION

As soon as I checked the availability group properties and the read-only routing configuration, everything fell into place. Here is what I saw in the properties.

Do you see a problem here?

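When a read-intent request hits the primary, the primary looks up the routing configuration and redirects the request to port 5022 on the secondary. Port 5022 is the database mirroring endpoint that Always On uses for data movement; it is not meant for client connections, hence the error.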

Whenever we connected to the listener with read-only routing, we saw the mirroring endpoint error at the very same time.

To fix the issue, we changed the port numbers in the routing URLs to the ports on which SQL Server was listening for client connections. After that, routing started working and the error disappeared.
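A sketch of how this kind of misconfiguration can be spotted and corrected with T-SQL. The availability group name [AG], the replica name SECONDARY01, and the URL are hypothetical placeholders; run the queries on the primary and compare the routing URL ports with the mirroring endpoint port.

```sql
-- Routing URLs per replica; their port should be the client (TDS) port, not the endpoint port.
SELECT replica_server_name,
       endpoint_url,
       read_only_routing_url
FROM sys.availability_replicas;

-- Port used by the database mirroring (Always On) endpoint on this instance.
SELECT name, port
FROM sys.tcp_endpoints
WHERE type_desc = N'DATABASE_MIRRORING';

-- Point the routing URL at the port SQL Server listens on for clients (hypothetical values).
ALTER AVAILABILITY GROUP [AG]
MODIFY REPLICA ON N'SECONDARY01'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://SECONDARY01.contoso.com:1433'));
```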

The same error can be reproduced if you try to connect to SQL Server on a port on which SQL Server is listening, but which is not meant for client connections (such as the database mirroring endpoint port).

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Error: 9642 – An error occurred in a Service Broker/Database Mirroring transport connection endpoint

One of my clients was trying to automate SQL deployment. While testing they were stuck with an interesting error and then they contacted me. In this blog, I would share my learning about an installation error: System.ArgumentNullException – Value cannot be null.

THE SITUATION

They had an internal website that took inputs from the user about the server, instance name, standalone or clustered installation, features, etc. Based on those inputs, it generated a setup configuration file and installed SQL Server using that file. They noticed that the installation was failing randomly in some cases.

When I was engaged, I asked for the installation log to find the error message from a failed installation. I found the following information in the Detail.txt file.

The following is an exception stack listing the exceptions in outermost to innermost order
Inner exceptions are being indented
Exception type: System.ArgumentNullException
Message:
Value cannot be null.
Parameter name: instanceName
HResult : 0x80004003
Stack:
at Microsoft.SqlServer.Configuration.Fulltext.Util.IsDefaultInstanceName(String instanceName)

From Summary.txt, I was able to find the configuration file parameter which points to the setup configuration file of SQL Server.

WORKAROUND/SOLUTION

As you can imagine, the issue was due to an empty parameter value. After carefully examining the files, we found that the InstanceName parameter was missing from the file under certain conditions. They assumed that for a default instance they did not need to pass anything, but the value MSSQLSERVER needs to be passed.

Once they changed their code to handle this empty value, the issue was resolved.

Reference : Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Installation Error: System.ArgumentNullException – Value Cannot be Null

While working with one of the clients to recover from a disaster, I encountered an error. In this blog we would talk about error Drop failed for Availability Group. If you want any such assistance, you can hire me for quick consultation using On Demand offering.

THE SITUATION

When my client called me, they were in a situation where the availability group was not coming online. After investigating, we concluded that their Windows cluster had issues and was unable to start. The solution in such a disaster is to start the Windows cluster in force quorum mode.

Once we started the cluster in ForceQuorum mode (also called fix quorum), we tried deleting the AG and received the following error.

Failed to destroy the Windows Server Failover Clustering group corresponding to availability group ‘AG’. The operation encountered SQL Server error 41000 and has been terminated. Refer to the SQL Server error log for details about this SQL Server error and corrective actions.
An error occurred while removing availability group ‘AG’. The DROP AVAILABILITY GROUP command removed the availability group configuration from the local metadata. However, the attempt to remove this configuration from the Windows Server Failover Clustering (WSFC) cluster failed because the Always On Availability Groups manager is not online (SQL Server error: 41081). To remove the availability group configuration from the WSFC cluster, re-enter the command. (Microsoft SQL Server, Error: 41081)

In the error message above, we see another error number, 41000, whose text is as follows: “Failed to obtain the local Windows Server Failover Clustering (WSFC) handle (Error code %d). If this is a WSFC availability group, the WSFC service may not be running or may not be accessible in its current state. Otherwise, contact your primary support provider. For information about this error code, see “System Error Codes” in the Windows Development documentation.”

WHAT SHOULD YOU DO?

Well, no action is needed for the error itself; it is not a dangerous message. It clearly says that the availability group information has been removed from SQL Server, but it might still exist in the failover cluster. So, you can click OK on the message and check SSMS: the availability group should be gone from “Always On High Availability” > “Availability Groups”. At this point, the databases would be in the restoring state, and you need to bring them online using the following:

ALTER DATABASE <DBName> SET ONLINE

RESTORE DATABASE <DBName> WITH RECOVERY
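If several databases were in the availability group, a small helper query like the one below (a sketch, not part of the original fix) lists the ones left in the RESTORING state and builds the matching RESTORE ... WITH RECOVERY command for each, so they can be reviewed and run one by one:

```sql
-- Generate a recovery command for every database still in RESTORING state.
SELECT name,
       state_desc,
       N'RESTORE DATABASE ' + QUOTENAME(name) + N' WITH RECOVERY;' AS recovery_command
FROM sys.databases
WHERE state_desc = N'RESTORING';
```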

Once things are back to normal, you need to reconfigure the availability group.

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Drop failed for Availability Group – Failed to Destroy the Windows Server Failover Clustering Group Corresponding to Availability Group

While deploying an Always On availability group with a client, we found that when they used automatic seeding, the database did not show up on the secondary. After digging more, we found that it had been there in the “Restoring” state while seeding was in progress, but then it automatically disappeared from the list of databases on the secondary. Here is the option in the wizard.

Here are a few earlier blogs which I wrote about seeding failures.

SQL SERVER – AlwaysOn Automatic Seeding – Database Stuck in Restoring State

SQL SERVER – AlwaysOn Automatic Seeding Failure – Failure_code 15 and Failure_message: VDI Client Failed

I checked SQL Server ERRORLOG to figure out what might have gone wrong with auto-seeding.

SQL SERVER – Where is ERRORLOG? Various Ways to Find ERRORLOG Location

The important messages about the database, which we tried to seed, are as follows.

Error: 1412, Severity: 16, State: 211.
The remote copy of database “DB” has not been rolled forward to a point in time that is encompassed in the local copy of the database log.
Starting up database ‘DB’.
The database ‘DB’ is marked RESTORING and is in a state that does not allow recovery to be run.
Automatic seeding of availability database ‘DB’ in availability group ‘AG’ failed with an unrecoverable error. Correct the problem, then issue an ALTER AVAILABILITY GROUP command to set SEEDING_MODE = AUTOMATIC on the replica to restart seeding.

The last message is interesting: it tells us how to restart seeding. The very first message (Error: 1412, Severity: 16, State: 211.) tells us the cause of the problem.

Seeding can be monitored using two dynamic management views (DMVs): sys.dm_hadr_automatic_seeding and sys.dm_hadr_physical_seeding_stats. The “stats” view shows seeding operations that are currently in progress, while sys.dm_hadr_automatic_seeding keeps the history of seeding attempts, including failures. In this case, seeding failed with a SQL error.
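A minimal sketch of querying both DMVs. The column selection is a judgment call; run the first query on the primary to see past attempts and why they failed, and the second while seeding is running to watch progress.

```sql
-- History of automatic seeding attempts, including failures.
SELECT ag.name AS ag_name,
       s.start_time,
       s.completion_time,
       s.current_state,
       s.failure_state_desc,
       s.error_code,
       s.number_of_attempts
FROM sys.dm_hadr_automatic_seeding AS s
JOIN sys.availability_groups AS ag
    ON s.ag_id = ag.group_id
ORDER BY s.start_time DESC;

-- Progress of seeding operations currently in flight.
SELECT local_database_name,
       remote_machine_name,
       role_desc,
       internal_state_desc,
       transferred_size_bytes,
       database_size_bytes,
       failure_message
FROM sys.dm_hadr_physical_seeding_stats;
```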

WORKAROUND/SOLUTION

The cause of the problem was an additional log backup taken while seeding was in progress. I used my own earlier blog to find the backup history: SQL SERVER – Get Database Backup History for a Single Database
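The script from that blog is not reproduced here. As a rough equivalent, the sketch below lists recent log backups of the seeded database from msdb; 'DB' is the database name from the messages above. Because msdb is local to each instance, this has to be run on every replica.

```sql
-- Log backups (type = 'L') of database DB recorded in this instance's msdb.
SELECT bs.server_name,
       bs.database_name,
       bs.backup_start_date,
       bs.backup_finish_date,
       bmf.physical_device_name
FROM msdb.dbo.backupset AS bs
JOIN msdb.dbo.backupmediafamily AS bmf
    ON bs.media_set_id = bmf.media_set_id
WHERE bs.database_name = N'DB'
  AND bs.type = 'L'
ORDER BY bs.backup_start_date DESC;
```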

After running the script from my blog on ALL replicas, we found a backup job that was taking a log backup on “another” replica every 15 minutes. We disabled the job and started seeding again. This time it worked without any issue.
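The automatic seeding failure message in the ERRORLOG says to issue an ALTER AVAILABILITY GROUP command to set SEEDING_MODE = AUTOMATIC to restart seeding. A sketch of that command; [AG] matches the message, while SECONDARY01 is a hypothetical replica name. The second statement is run on the secondary so that the availability group is allowed to create the seeded database there.

```sql
-- On the primary: (re)enable automatic seeding for the secondary replica.
ALTER AVAILABILITY GROUP [AG]
MODIFY REPLICA ON N'SECONDARY01'
WITH (SEEDING_MODE = AUTOMATIC);

-- On the secondary: allow the availability group to create the seeded database.
ALTER AVAILABILITY GROUP [AG] GRANT CREATE ANY DATABASE;
```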

Have you encountered some error during Always On Availability Group? Please share via comments and help others.

Reference: Pinal Dave (https://blog.SQLAuthority.com)

First appeared on SQL SERVER – Automatic Seeding of Availability Database ‘DB’ in Availability Group ‘AG’ Failed With an Unrecoverable Error

I recently received a very simple but interesting error while working with Docker and persistent storage for my upcoming SQL Server Performance Tuning Practical Workshop. The error was related to script upgrade mode, and the fix was extremely simple. Let us see the entire story.

Earlier on this blog, I wrote a post about SQL SERVER – Docker Volume and Persistent Storage. In that post, I explained how, with the help of Docker, we can upgrade SQL Server instantly. It is a fantastic feature: updating a regular SQL Server installation with the latest update takes a lot of time, but with a volume, we can upgrade SQL Server within a minute.

Fix Error for Upgrade Mode

Recently, I upgraded my SQL Server container to the latest CU, and instantly I got the following error:

Login failed for user ‘sa’. Reason: Server is in script upgrade mode. Only administrator can connect at this time. (Microsoft SQL Server, Error: 18401)

Here is the screenshot of the error.

Fix/Workaround/Solution:

Well, the solution to this issue is very simple. I waited for about 1 minute, tried to log in again, and was able to log in immediately. When we use Docker with a volume, our data remains on persistent storage, but the core engine is updated. When SQL Server comes up with the new update, it runs a few upgrade scripts against the databases. While the scripts are running, the server looks like it is up, but it does not allow you to connect.

In most cases, the script upgrade takes a minute or two. If you wait that long and try to log in again, the error will go away. If you are here searching for the solution to this problem, I am very confident that by the time you finish reading this blog post, the error is gone. Just go ahead and try to log in again, and it should work.
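Once the login works again, a quick way to confirm that the container is really running the new cumulative update is to check the server properties. A small sketch; ProductUpdateLevel reports the CU level on builds that expose it and returns NULL otherwise.

```sql
-- Build number, service pack level, and cumulative update level of the running engine.
SELECT SERVERPROPERTY('ProductVersion')     AS product_version,
       SERVERPROPERTY('ProductLevel')       AS product_level,
       SERVERPROPERTY('ProductUpdateLevel') AS cumulative_update;
```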

Reference: Pinal Dave (https://blog.sqlauthority.com)

First appeared on SQL SERVER – Fix Error – Login failed for user. Reason: Server is in script upgrade mode. Only administrator can connect at this time. (Microsoft SQL Server, Error: 18401)