My Workflow Runs are not running in the order I expect

While Workflow Runs are triggered on a first come, first served basis, they are not guaranteed to execute in that order.  This is not a Platform shortcoming: when potentially hundreds of Workflows are queued or currently running, they cannot be processed in a strictly enforced sequence.  In the vast majority of cases this has no impact on the correct functioning of the application, but it can become an issue if the Workflows have been designed in such a manner that timing down to the millisecond level becomes important.  Execution order should therefore be a foremost design concern in the application architecture.

Consider a Parent Workflow A, which itself contains Child Workflows X, Y and Z, with numerous Activities that fire off Triggers which in turn start their associated Workflows, each of which may have its own Child and Triggered Workflows.  When designing a series of coordinated Workflows, and walking through all the possible paths and triggering events, there may be a scenario where the value of a particular Field in a particular Record is used for a Calculation in another Workflow, and that value may not be the expected value at the time the latter Workflow executes.

Sample Case Study

The Tenant Administrator had noticed that on a small number of Records, an important piece of data that was meant to be populated by a Workflow was missing.  These Records were for IT Recovery Scenarios, which include the fields Actual Start Time and Actual End Time.  These fields are in turn used to calculate the real elapsed time taken to execute that IT Recovery Scenario.

However, out of thousands of IT Recovery Scenarios, there were 6 that were missing Actual Start Time.  As such, the elapsed time could not be calculated.

The Tenant Administrator knows which Workflow is responsible for setting the Actual Start Time and opens it in the Workflow Builder to commence troubleshooting.  This Workflow is called Act Start and End Time, and the Input is an IT Recovery Scenario Record.

The Act Start and End Time Workflow

The Tenant Administrator inspects the Gateway Activity and opens the Calculation that determines when the Gateway takes the Start Time path.

Gateway Calculation for the Start Time path

This Calculation is saying "if the IT Recovery Scenario has no Actual Start Time value and the value for the field Status on this IT Recovery Scenario is equal to Running, then take this path".

The Tenant Administrator inspects the Update Start Activity - this appears to be straightforward.

The 'Update Start' Activity

It simply updates the value of Actual Start Time on the Input Record (IT Recovery Scenario) with the current date and time.

There is nothing that looks complicated about this Activity, and it has worked for the vast majority of Records that have been passed through it.  Using the Examine Workflow Run, the Tenant Administrator views the Steps Taken tab for the Act Start and End Time Workflow for the record that was missing the Actual Start Time and notices a key piece of information - the Gateway did not take the Start Time path, and instead went straight to the End Activity.

The Gateway was the only Activity executed

For some reason, based on the available evidence, the Calculation: 

[Input].[Actual Start Time] is null and [Input].[Status] = 'Running' 

... is returning FALSE (for the sake of accuracy, both paths evaluate as FALSE since neither path was taken, but we are focusing on Start since the expectation is that this returns TRUE).  For that to occur, either the Actual Start Time is not empty, or the value for the Status field on the Input IT Recovery Scenario is not Running.  Given that the problem the Tenant Administrator is troubleshooting is that the Actual Start Time is empty, that leaves only the other alternative - that the value for the field Status was not Running when this Gateway was evaluated.
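The elimination above can be checked with a minimal Python sketch of the Gateway condition.  The function name takes_start_path is hypothetical and the code only mirrors the logic of the Calculation; it is not the Platform's Calculation engine.

```python
from datetime import datetime
from typing import Optional


def takes_start_path(actual_start_time: Optional[datetime], status: str) -> bool:
    """Mirror of: [Input].[Actual Start Time] is null and [Input].[Status] = 'Running'."""
    return actual_start_time is None and status == "Running"


# Actual Start Time is empty and Status is Running: the Start Time path is taken.
print(takes_start_path(None, "Running"))
# Actual Start Time is empty but Status is still Not Started: the path is skipped.
print(takes_start_path(None, "Not Started"))
# Actual Start Time is already set: the path is skipped regardless of Status.
print(takes_start_path(datetime(2024, 1, 1), "Running"))
```

With an empty Actual Start Time, the only input that makes the condition FALSE is a Status other than Running, which matches the conclusion reached by the Tenant Administrator.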

The Tenant Administrator determines that the Act Start and End Time Workflow has a Trigger for it.

The Trigger responsible for Act Start and End Time

Inspecting the Trigger, it is seen that the Act Start and End Time Workflow will be run whenever the Relationship Status on a Record of type IT Recovery Scenario has been updated.

Trigger definition for when to run 'Act Start and End time'

The Tenant Administrator opens the Form for the IT Recovery Scenario to check how the Status Field may be changed.  When editing a Record, it can be seen that the Status Field is not directly editable from the Form.  However, there is an Action Button labelled Start.

An IT Recovery Scenario Record

The Tenant Administrator then opens the Start Workflow in the Workflow Builder.

The 'Start' Workflow

There are several Gateways which provide validation checks to ensure that this IT Recovery Scenario may be started - whether the initiator is part of the group that owns this IT Recovery Scenario, whether it has been marked as ready, whether it has been allowed to be started, and whether it was already marked as Running.

Inspecting the sole Update Activity in the Workflow reveals that this is how the Trigger to run Act Start and End Time is fired - an update to the Status Relationship.

The 'Start' Workflow's Update Activity

The scenario is as follows:

  1. A user who is a member of the group that owns an IT Recovery Scenario is viewing that IT Recovery Scenario
  2. The IT Recovery Scenario is marked as both ready and allowed to start, and it is not marked as Running
  3. The user presses the Start Action button, and the IT Recovery Scenario Record is passed into the Start Workflow
  4. It passes all of the Gateway checks contained within the Start workflow and reaches the Update Activity
  5. The Running Update Activity changes the value of the Status Field from Not Started to Running
  6. This change triggers the Act Start and End Time Workflow to execute, which takes the same IT Recovery Scenario Record as input
  7. The Gateway in Act Start and End Time evaluates to FALSE, and by elimination this is because the value of the Status Field is not yet Running
  8. The Act Start and End Time Workflow proceeds directly to the End Activity, and the value for Actual Start Time is not set

Due to the way that Workflows are queued, and depending on the precise timing, it is possible that the Act Start and End Time Workflow runs to completion before the change to the value of the Status Field has been propagated to the database.  In the vast majority of cases this does not happen - when the Act Start and End Time Workflow's Gateway evaluates, the Status Field has already been changed to Running.
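The timing problem can be illustrated with a small deterministic model.  This is only a sketch under stated assumptions: RecordStore, its stage and commit methods, and the triggered_run_wins_race flag are all hypothetical names invented for illustration, and the Platform's actual queueing and persistence internals are not described in this article.

```python
from datetime import datetime, timezone


class RecordStore:
    """Hypothetical store: writes are staged, then become visible on commit()."""

    def __init__(self):
        self.committed = {"Status": "Not Started", "Actual Start Time": None}
        self.staged = {}

    def stage(self, field, value):
        self.staged[field] = value

    def commit(self):
        self.committed.update(self.staged)
        self.staged.clear()


def act_start_and_end_time(store):
    """Triggered Workflow: its Gateway reads only the committed (visible) values."""
    rec = store.committed
    if rec["Actual Start Time"] is None and rec["Status"] == "Running":
        store.stage("Actual Start Time", datetime.now(timezone.utc))
        store.commit()
        return "Start Time path"
    return "End"


def start_workflow(store, triggered_run_wins_race):
    """Start Workflow: updates Status, which fires the Trigger."""
    store.stage("Status", "Running")
    if triggered_run_wins_race:
        path = act_start_and_end_time(store)  # runs before the update is visible
        store.commit()
    else:
        store.commit()
        path = act_start_and_end_time(store)  # runs after the update is visible
    return path


for wins in (False, True):
    store = RecordStore()
    path = start_workflow(store, wins)
    missing = store.committed["Actual Start Time"] is None
    print(wins, path, store.committed["Status"], missing)
```

In the usual ordering the Gateway sees Running and takes the Start Time path.  When the triggered run wins the race, the Gateway sees Not Started, goes straight to End, and the Record finishes with Status set to Running but no Actual Start Time, which is the symptom observed on the 6 Records.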

However, in a very small number of cases this condition does occur, and as such the way these Workflows are implemented needs to be changed.

Consider that there are two Fields which are being acted on - the Status Field and the Actual Start Time Field.  Both of these values are on the same IT Recovery Scenario Record, and yet the updates are in two separate Workflows.  The second Workflow, Act Start and End Time, serves no other function than to set the Actual Start Time (and Actual End Time).

There is no reason to split the updates into two Workflows - there is an existing Update Activity in the Start Workflow that is already making a change to one Field.  The solution is to set the Actual Start Time in the same Update Activity that changes the Status Field.
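As a rough sketch of the idea, the combined update can be pictured as a single step that writes both Fields together.  The function running_update and the dictionary representation of a Record are assumptions made for illustration only; in the Platform this is configured on the Update Activity rather than written as code.

```python
from datetime import datetime, timezone


def running_update(record: dict) -> dict:
    """Sketch of the Start Workflow's Update Activity after the change.

    Both Fields are written in one update, so no second Workflow has to
    observe the new Status before Actual Start Time can be set.
    """
    updated = dict(record)  # leave the caller's copy untouched
    updated["Status"] = "Running"
    updated["Actual Start Time"] = datetime.now(timezone.utc)
    return updated


record = {"Status": "Not Started", "Actual Start Time": None}
result = running_update(record)
print(result["Status"], result["Actual Start Time"] is not None)
```

Because the two values are set in the same step, there is no window in which Status is Running while Actual Start Time depends on a separately queued Workflow Run.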

Mitigating the race condition

Before the Tenant Administrator moves the update to the Actual Start Time out of the Act Start and End Time Workflow, the impact needs to be considered.  The Act Start and End Time Workflow had Gateway conditions to dictate when to set the Actual Start Time:

[Input].[Actual Start Time] is null and [Input].[Status] = 'Running' 

The second condition is checking whether the Status Field is set to Running, and given that the Running Update Activity is setting the Status Field to Running, this does not represent a problem - there is no need to check, as that condition is being set in that same Update Activity.

The first condition is checking that there is no Actual Start Time value.  This could be a problem if someone is able to run the Start Workflow on an IT Recovery Scenario that is already Running or Completed and has an existing value for Actual Start Time - the original value would be overwritten with an incorrect value.
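The overwrite risk can be made concrete with a short sketch.  The function unguarded_update and the timestamps are made up purely for illustration; they model an update that always writes Actual Start Time, without the "is null" check that the Gateway used to provide.

```python
from datetime import datetime


def unguarded_update(record: dict, now: datetime) -> dict:
    """Hypothetical update that sets both Fields with no 'is null' check."""
    updated = dict(record)
    updated["Status"] = "Running"
    updated["Actual Start Time"] = now  # always written, even if already set
    return updated


# Illustrative Record that was already started at 09:00.
already_running = {
    "Status": "Running",
    "Actual Start Time": datetime(2024, 1, 1, 9, 0),
}
rerun = unguarded_update(already_running, datetime(2024, 1, 1, 11, 30))
print(rerun["Actual Start Time"])  # 2024-01-01 11:30:00
```

If the Start Workflow could be run a second time on such a Record, the original 09:00 value would be lost, which is why it matters how the Start Workflow can be initiated.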

The Tenant Administrator revisits the IT Recovery Scenario Form and checks the Start Action Button.  It is only visible when the Status Field is set to Not Started.

Start Action button visibility calculation

As long as the following criteria are met, it is safe to make the change:

  1. There is no other valid path to populate Actual Start Time apart from the Start Workflow
  2. The Start Workflow can only be initiated on an IT Recovery Scenario Record via the Start Action button on the Form
  3. The default value of the Status Field for all IT Recovery Scenario Records is Not Started

Once this is done, the Tenant Administrator knows there is another avenue to check, even though there is not yet an issue arising from it - the Act Start and End Time Workflow, as the name states, does both the Start and End times.  It is possible that the Gateway that evaluates the condition for the End path may be vulnerable to the same problem.

Act Start and End Time's Gateway Calculation for End Path

The Tenant Administrator now follows the same analytical approach used for the Actual Start Time to determine whether there is a safer way to implement this.