KB0077 - Startup Failure of PowerShell Universal Server in Multi-Node SQL Environment

Scope

This article applies to PowerShell Universal environments earlier than 4.3.0 that are configured with multiple PSU servers using a SQL database. 

Problem

The PowerShell Universal server may fail to start because of SQL timeout errors. During startup, background jobs such as the heartbeat and app token sync jobs may run in rapid succession and concurrently. The system logs will show SQL-related errors and multiple heartbeats running at the same time.

Root Cause

The root cause is how the Hangfire scheduler behaves when no job queue is available for a scheduled job. This can happen if a PowerShell Universal service is stopped before it can remove its scheduled background jobs. The schedules remain in the Hangfire scheduler and continue to create new jobs in the queue for the computer that is no longer running. This only occurs while at least one other PowerShell Universal server is still active. If the stopped server is down for some time, its queue grows large, and the server attempts to run all of the queued jobs at once when it starts. If it cannot do so, it may crash, which prevents the server from ever starting successfully.
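As a rough illustration of how quickly such a backlog can grow, the following Python sketch estimates the number of queued jobs for an offline node. The 30-second interval, the three-day outage, and the function name `estimate_backlog` are assumptions chosen for the example; the actual intervals of the PowerShell Universal background jobs are not stated in this article.

```python
from datetime import timedelta


def estimate_backlog(downtime: timedelta, interval: timedelta, schedules: int) -> int:
    """Estimate jobs queued for an offline node.

    Assumes every schedule fires once per interval for the whole outage.
    """
    fires_per_schedule = downtime // interval  # whole firings during the outage
    return fires_per_schedule * schedules


# Hypothetical numbers: 6 recurring jobs, each firing every 30 seconds,
# with the node offline for 3 days.
backlog = estimate_backlog(timedelta(days=3), timedelta(seconds=30), schedules=6)
print(backlog)  # 51840
```

Even with these modest assumed values, the stopped server would face tens of thousands of jobs on its next start.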

Solution

PowerShell Universal 4.3.0 introduced logic that skips these stale jobs during startup and processes only the most recent copy of each job. To clean up the jobs table, you can clear the Hangfire.Job table in the SQL database and remove the recurring jobs for the service that is no longer running. These can be safely removed because restarting a PowerShell Universal service causes the schedules to be recreated. Always back up your database before performing any SQL operation.
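The idea of processing only the most recent copy of each job can be sketched in Python as follows. This is not the PowerShell Universal implementation; the `QueuedJob` type, the `latest_per_schedule` function, and the sample node name `NodeA` are hypothetical and exist only to illustrate the approach.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class QueuedJob:
    recurring_id: str      # hypothetical identifier, e.g. "Heartbeat.NodeA"
    created_at: datetime   # when the scheduler queued this copy


def latest_per_schedule(jobs):
    """Keep one job per recurring_id: the most recently created copy."""
    latest = {}
    for job in jobs:
        current = latest.get(job.recurring_id)
        if current is None or job.created_at > current.created_at:
            latest[job.recurring_id] = job
    # Sort by identifier so the result is deterministic.
    return sorted(latest.values(), key=lambda j: j.recurring_id)


# Made-up backlog: two stale-and-fresh heartbeat copies plus one Git sync job.
backlog = [
    QueuedJob("Heartbeat.NodeA", datetime(2024, 1, 1, 0, 0, 0)),
    QueuedJob("Heartbeat.NodeA", datetime(2024, 1, 1, 0, 0, 30)),
    QueuedJob("GitSync.NodeA", datetime(2024, 1, 1, 0, 0, 10)),
]
to_run = latest_per_schedule(backlog)
print([(j.recurring_id, j.created_at.second) for j in to_run])
# [('GitSync.NodeA', 10), ('Heartbeat.NodeA', 30)]
```

Under this approach the older heartbeat copy is skipped, so the number of jobs run at startup is bounded by the number of schedules rather than by the length of the outage.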

1. Clear Hangfire.Job

To remove all queued Hangfire jobs, run the following SQL command against the PowerShell Universal database.
  DELETE FROM Hangfire.Job

2. Remove Unused Schedules

You can remove unused schedules from the Hangfire dashboard. Visit http://<servername>:<port>/hangfire to view the dashboard. On the Recurring Jobs page, select the six recurring jobs for the server that is no longer active and delete them. The schedules to select are listed below.
  1. <NodeName>.ProcessMonitor
  2. AppTokenRefresh.<NodeName>
  3. GitSync.<NodeName>
  4. Heartbeat.<NodeName>
  5. ModuleRefresh.<NodeName>
  6. HealthCheck.<NodeName>


If you remove a recurring job in error, you can restart the affected service to force it to recreate the schedule. 