All alarms exist for a reason, but in many offshore assets too many alarms can be worse than too few. Amor Group’s Brendon Glass explains why.
In today’s automated plants and platforms, thousands of alarms are configured into the control system. Each alarm is intended to represent an abnormal situation requiring attention. An unintended consequence, however, is a large number of nuisance alarms, which cause frustration and anxiety for the operator.
Alarm management is therefore becoming an increasing priority. Indeed, in the UK the Health & Safety Executive (HSE) has identified alarms as one of its top ten concerns. Incident investigations reveal that floods of alarms from automatic systems distract the operator from dealing with the problem, increase stress and bury important new information in a deluge of low-value, repeat or consequential warnings.
Poor human-machine interface (HMI) design and poor alarm prioritization significantly hinder an operator’s ability to respond effectively to process upsets. It is estimated that even operators of small plants can lose up to $100 million per year to upsets and shutdowns. Restoring the effectiveness of alarm systems must therefore be a priority so that plants and assets can be operated safely and cost-efficiently.
Asset managers need to understand the alarms issue and how significant improvements can be achieved within reasonable time and cost constraints.
Modern highly automated systems are efficient, reliable and generally safe. Most of the time, the operator has little to do but monitor the overall situation and respond to the occasional minor alarm. However, when something more significant does happen, boredom can turn to panic rapidly. Often this transition is accompanied by a barrage of bells, klaxons, bleepers and flashing lights. Suddenly an operator is presented with more than anyone can process and act upon in the time available. It has become widely accepted that a sustained rate of about two alarms per minute is as much as process operators can handle, with four or five per minute perhaps being acceptable but only for a short period.
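The rate guidance above can be turned into a simple check against logged alarm times. The sketch below is a minimal illustration, assuming a sorted list of alarm timestamps in seconds. The 10-minute window used to judge a "sustained" rate and the 1-minute window used for a "short period" are assumptions, not values given in the text.

```python
from bisect import bisect_left

SUSTAINED_LIMIT_PER_MIN = 2.0  # sustained rate quoted in the text
BURST_LIMIT_PER_MIN = 5.0      # upper end of the short-period rate quoted in the text
SUSTAINED_WINDOW_S = 600       # assumption: "sustained" judged over 10 minutes
BURST_WINDOW_S = 60            # assumption: "short period" judged over 1 minute


def max_rate_per_min(timestamps, window_s):
    """Highest alarm rate (alarms per minute) in any window of window_s seconds.

    timestamps must be sorted ascending, in seconds. The busiest window
    always starts at an alarm, so only those start points are tried.
    """
    best = 0
    for i, t in enumerate(timestamps):
        # Index of the first alarm at or after the end of this window.
        j = bisect_left(timestamps, t + window_s)
        best = max(best, j - i)
    return best * 60.0 / window_s


def assess_alarm_rate(timestamps):
    sustained = max_rate_per_min(timestamps, SUSTAINED_WINDOW_S)
    burst = max_rate_per_min(timestamps, BURST_WINDOW_S)
    return {
        "peak_sustained_per_min": sustained,
        "peak_burst_per_min": burst,
        "sustained_ok": sustained <= SUSTAINED_LIMIT_PER_MIN,
        "burst_ok": burst <= BURST_LIMIT_PER_MIN,
    }
```

An alarm every 30 seconds sits right at the sustained limit, while one alarm per second (the flood peak described later) exceeds both limits many times over.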
Excessive alarms have two consequences. First, the higher the alarm rate, the lower the probability that the operator will notice any individual alarm and respond to it effectively. The second consequence is more subtle and potentially more dangerous. Humans are not particularly good at vigilance tasks, hence the need for alarms in the first place. As the first few alarms sound, the operator moves quickly from vigilance through analysis to action. Evidence suggests that once in action mode, people rarely re-evaluate their initial analysis, even when subsequent information clearly shows it to be wrong.
Once in action, there is a powerful human tendency to seek confirmation of the initial hypothesis rather than to test it, and this bias may be more pronounced under stress.
Alarm systems commonly suffer from three major problems.
Standing alarms can obscure other, more important information. These are alarms that remain active even though the asset is operating normally, so each is in effect a disabled alarm: because it is already active, it cannot alarm again if asset conditions change. Standing alarms are usually caused by instrument faults, inappropriate alarm limit settings and out-of-service equipment.
These alarms are relatively easy to deal with: instrument faults can be fixed, limits can be adjusted and logic can be devised to suppress alarms on out-of-service equipment. However, this should be part of a process of continuous improvement, not a one-off exercise.
Nuisance and repeating alarms activate and clear repeatedly, and are seriously distracting to the operator. The Engineering Equipment & Materials Users’ Association (EEMUA) guidelines and research by the UK’s HSE cite evidence that 50% of alarms are caused by a small number of alarm points. Nuisance alarms are typically caused by faulty instruments, alarm limits set too close to normal operating conditions, and ineffective use, or no use, of mechanisms designed to minimise repeating alarms.
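Because half of all alarm activity typically comes from a few points, a "bad actor" list is a natural first target. A minimal sketch, assuming the alarm/event log has been reduced to a list with one tag name per activation (the tag names below are hypothetical):

```python
from collections import Counter


def bad_actors(alarm_tags, share=0.5):
    """Return the smallest set of tags that together account for at least
    `share` of all alarm activations, worst offender first, plus the
    fraction of activations they actually cover."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    selected, covered = [], 0
    # Taking the most frequent tags first gives the smallest qualifying set.
    for tag, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append((tag, n))
        covered += n
    return selected, (covered / total if total else 0.0)


# Hypothetical log: two noisy points dominate 100 activations.
log = ["PT-101"] * 40 + ["LT-205"] * 15 + ["FT-300"] * 5 + [f"T-{i}" for i in range(40)]
worst, fraction = bad_actors(log)
```

Here two tags out of 43 account for 55% of the activations, which is the kind of concentration the EEMUA and HSE evidence describes.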
Reducing the number of nuisance and repeating alarms is not technically difficult, but it does require a sustained and determined effort from operations personnel.
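Two common mechanisms for suppressing repeating alarms are a deadband (the value must fall some margin below the limit before the alarm clears) and an on-delay (the value must stay beyond the limit for a set time before the alarm is raised). The class below is a hypothetical sketch of a high alarm using both; the limit, deadband and delay values in the usage are illustrative only.

```python
class HighAlarm:
    """Hypothetical high alarm with a deadband and an on-delay."""

    def __init__(self, limit, deadband, on_delay_s):
        self.limit = limit
        self.deadband = deadband      # value must fall this far below limit to clear
        self.on_delay_s = on_delay_s  # value must stay above limit this long to raise
        self.active = False
        self._above_since = None

    def update(self, t, value):
        """Feed one sample (time in seconds, process value); return alarm state."""
        if not self.active:
            if value > self.limit:
                if self._above_since is None:
                    self._above_since = t
                if t - self._above_since >= self.on_delay_s:
                    self.active = True
            else:
                self._above_since = None  # excursion ended before the delay expired
        elif value < self.limit - self.deadband:
            self.active = False
            self._above_since = None
        return self.active


def count_activations(alarm, samples):
    """Count how many times the alarm goes from clear to active."""
    raised, was_active = 0, False
    for t, v in samples:
        now = alarm.update(t, v)
        if now and not was_active:
            raised += 1
        was_active = now
    return raised


# A signal chattering around a limit of 100, sampled once per second.
noisy = [(t, 101.0 if t % 2 == 0 else 99.5) for t in range(20)]
raw = count_activations(HighAlarm(100.0, 0.0, 0.0), noisy)
filtered = count_activations(HighAlarm(100.0, 2.0, 5.0), noisy)
```

Without filtering the chattering signal raises ten alarms in 20 seconds; with the deadband and delay it raises none, while a genuine sustained excursion still alarms once.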
Alarm floods are by far the most serious issue. It is quite common for several hundred alarms to occur in the first ten minutes following an upset on an asset and peak rates of one alarm per second are not unusual. The result is that the operator effectively abandons the alarm system, acknowledging alarms without looking at them. This means important information may be missed or misinterpreted.
Ten in ten
The primary cause of alarm floods is not hard to find, but the solution is harder. Most modern computer-based systems simply have too many alarms. A typical distributed control system on an offshore platform was found to have over 5000 points with one or more alarms and more than 10,000 individual alarms.
The HSE has set a challenging target: no more than ten alarms to be displayed in the first ten minutes following a major plant upset.
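Checking a recorded upset against this target is straightforward once the upset time is known. A minimal sketch, assuming the upset time (for example, the first trip event) and the alarm times are taken from the event log in seconds, and treating every logged alarm as displayed:

```python
TEN_MINUTES_S = 600
HSE_TARGET = 10  # no more than ten alarms in the first ten minutes


def ten_in_ten(upset_time_s, alarm_times_s):
    """Count alarms in the ten minutes after an upset and report whether
    the 'ten in ten' target is met."""
    n = sum(1 for t in alarm_times_s
            if upset_time_s <= t < upset_time_s + TEN_MINUTES_S)
    return n, n <= HSE_TARGET


# At the flood peak of one alarm per second described above:
flood = ten_in_ten(0, range(600))
```

A sustained rate of one alarm per second produces 600 alarms in the window, sixty times the target.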
Given the safety and cost implications of failing to respond appropriately to alarms, improving the management of alarm systems is vital.
The plant design process provides an opportunity to deal with many alarm problems at source – by prevention rather than cure. If an alarm isn’t configured, it can’t become a standing or nuisance alarm later, or contribute to any alarm flood.
It is important that design decisions regarding alarms are not made in an ad hoc manner. Instead, decisions should be made within the framework of an overall design strategy and philosophy which should include a formal set of principles and policies for alarms. In particular, each alarm should require a formal justification and a formally defined operator response.
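One way to enforce the rule that every alarm has a formal justification and a defined operator response is to make both mandatory fields of the alarm record itself. The data structure below is a hypothetical sketch, not a format prescribed by any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlarmDefinition:
    """Hypothetical alarm master record; names and fields are illustrative."""
    tag: str
    description: str
    priority: str           # e.g. "high", "medium" or "low"
    justification: str      # why this alarm exists
    operator_response: str  # what the operator is expected to do

    def __post_init__(self):
        # Refuse to create an alarm that lacks either formal element.
        if not self.justification.strip():
            raise ValueError(f"{self.tag}: no formal justification")
        if not self.operator_response.strip():
            raise ValueError(f"{self.tag}: no defined operator response")


ok = AlarmDefinition(
    tag="LAH-210",
    description="Separator level high",
    priority="medium",
    justification="Warns of approach to high-high level trip",
    operator_response="Check level control valve and reduce inflow",
)
```

An alarm that cannot be given a response is, by this rule, a candidate for deletion rather than configuration.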
The HAZOP technique should be revised to treat alarms in the context of the overall operation, or perhaps not to consider alarms at all (on the basis that alarms cannot be relied upon to maintain safety). Alarm reduction techniques should be considered only where there is both a need and a realistic chance of success.
Once a plant is operational, alarm system performance should be subject to a process of continuous review and improvement.
An alarm system improvement exercise on an existing asset should be anchored to a formal set of principles and policies. However, it may be better first to conduct a review of the design, configuration and performance of the existing system. This requires pulling together the numbers and priorities of the alarms configured, the number of standing alarms, evidence of nuisance and repeating alarms and alarm rates following upsets and trips.
Most modern systems provide some form of auto-documentation facility, and this is usually the best way of determining how many alarms are configured, and what their priorities are. Most systems also provide a ‘current alarm’ report which can be run at regular intervals when the plant is operating normally, in order to identify standing alarms. If the system has an alarm/event historian facility, then this can be used to investigate nuisance and repeating alarms, and to obtain some measure of average and peak alarm rates.
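Identifying standing alarms from periodic "current alarm" reports amounts to finding the tags that appear in every report taken during normal operation. A minimal sketch, assuming each report has been reduced to a collection of alarm tag names (the tags below are hypothetical):

```python
def standing_alarms(snapshots):
    """Tags present in every 'current alarm' report taken while the plant
    was operating normally. Each snapshot is an iterable of alarm tags."""
    sets = [set(s) for s in snapshots]
    if not sets:
        return set()
    return set.intersection(*sets)


reports = [
    {"PAL-101", "XA-330", "TAH-412"},
    {"PAL-101", "TAH-412"},
    {"PAL-101", "TAH-412", "LAL-207"},
]
persistent = standing_alarms(reports)
```

The more reports taken, and the more varied the operating conditions they cover, the more confidence the result deserves.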
Alarm rationalization
If a design review or alarm performance survey shows there is a problem with alarms, then an alarm rationalization exercise may be necessary to establish the purpose of the alarms.
This leads to some basic principles: eliminate duplicate alarms; do not alarm normal or expected events, or the operator’s own actions; and allow no more than one pre-alarm for the cause of each trip.
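The "one pre-alarm per trip cause" principle can be checked mechanically if the alarm database records which trip each pre-alarm warns of. A minimal sketch, assuming alarms are available as (tag, kind, trip cause) tuples; the tags and cause names are hypothetical:

```python
from collections import defaultdict


def excess_pre_alarms(alarms):
    """alarms: iterable of (tag, kind, trip_cause) tuples.
    Returns {trip_cause: [tags]} for causes with more than one pre-alarm."""
    by_cause = defaultdict(list)
    for tag, kind, cause in alarms:
        if kind == "pre-alarm":
            by_cause[cause].append(tag)
    return {cause: tags for cause, tags in by_cause.items() if len(tags) > 1}


alarms = [
    ("PAH-101", "pre-alarm", "high discharge pressure trip"),
    ("PAH-102", "pre-alarm", "high discharge pressure trip"),
    ("LAL-201", "pre-alarm", "low level trip"),
    ("XA-301", "failure-to-trip", "low level trip"),
]
violations = excess_pre_alarms(alarms)
```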
In addition, the number of alarms at the higher priorities should be limited. The EEMUA guidelines suggest no more than 10% at the highest priority and 20% at medium priority, with the remainder being low priority.
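A configuration export can be checked against these ceilings directly. A minimal sketch, assuming a three-level scheme labelled "high", "medium" and "low" and a non-empty list with one priority per configured alarm:

```python
from collections import Counter

MAX_SHARE = {"high": 0.10, "medium": 0.20}  # EEMUA ceilings quoted in the text


def priority_profile(priorities):
    """Return each priority's share of all alarms and the priorities
    whose share exceeds its ceiling."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no alarms supplied")
    shares = {p: counts[p] / total for p in ("high", "medium", "low")}
    breaches = [p for p, cap in MAX_SHARE.items() if shares[p] > cap]
    return shares, breaches


shares, breaches = priority_profile(["high"] * 3 + ["medium"] * 2 + ["low"] * 15)
```

Here 15% of alarms are high priority, so "high" is flagged while "medium" (10%) is within its ceiling.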
A risk-based methodology, similar to that used in the IEC 61508 standard for assessing instrumented protective systems, can be an effective tool in determining alarm priorities if the relevant questions are asked:
Can the operator do anything?
Is there time for the operator to act?
What happens if the operator fails to act in time, or at all?
Risks must be assessed in relation to personnel, the environment, equipment and production. With the possible exception of fire and gas systems, alarms should not be relied upon to prevent human injury or serious environmental damage.
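The three questions can be expressed as a simple decision rule. The function below is a hypothetical sketch: the three-level severity scale, the mapping from severity to priority, and the choice to return no alarm when the operator cannot act in time are all assumptions made for illustration, not part of IEC 61508 or the EEMUA guidelines.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MODERATE = 2
    SERIOUS = 3


PRIORITY_FOR = {Severity.MINOR: "low", Severity.MODERATE: "medium", Severity.SERIOUS: "high"}


def alarm_priority(operator_can_act, available_s, needed_s, consequences):
    """Hypothetical priority rule built on the three review questions.

    consequences maps risk areas ('personnel', 'environment', 'equipment',
    'production') to the Severity of failing to act. Returns None when the
    condition should not be an operator alarm.
    """
    # Q1: if the operator can do nothing, it should not be an alarm.
    if not operator_can_act:
        return None
    # Q2: if there is no time to act, an alarm cannot help; an automatic
    # protective function is needed instead.
    if available_s < needed_s:
        return None
    # Q3: rank by the worst consequence across all risk areas. Per the text,
    # a SERIOUS rating does not make the alarm the safeguard against injury
    # or serious environmental damage.
    worst = max(consequences.values(), default=Severity.MINOR)
    return PRIORITY_FOR[worst]


p = alarm_priority(True, 600, 120, {"production": Severity.MODERATE,
                                    "equipment": Severity.MINOR})
```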
An effective alarm review should take other aspects of plant operability into account. In particular, the ergonomics of the operating displays and the robustness of the automatic controls can have a large impact on the operator’s ability to maintain ‘situational awareness’ during upsets, and to return the plant rapidly to normal thereafter.
A number of techniques can assist with achieving good alarm system performance in practice.
One of the biggest problems with alarms is that there are simply too many. It is quite common for installations to have many thousands. Reviewing these individually would be prohibitive in terms of cost and time. Furthermore, it would be hard to maintain consistency among several reviewers working over an extended period.
One effective solution is to categorize alarms as far as possible, for instance as ‘trip pre-alarm’, ‘failure to trip on demand’ or ‘product off-spec’. All alarms in a category are then assigned the same priority and other characteristics. This technique can be very effective, but it does require the reviewer to have a good understanding of the process and its operation.
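Category-based review means one decision per category rather than one per alarm. The sketch below assumes each alarm has already been tagged with a category; the priority chosen for each category is hypothetical, and anything left uncategorized is returned for individual review.

```python
# Hypothetical category rules: one decision applied to every alarm in a category.
CATEGORY_RULES = {
    "trip pre-alarm": {"priority": "medium"},
    "failure to trip on demand": {"priority": "high"},
    "product off-spec": {"priority": "low"},
}


def apply_categories(alarms):
    """alarms: dict of tag -> category.
    Returns (tag -> assigned settings, list of tags needing individual review)."""
    assigned, unassigned = {}, []
    for tag, category in alarms.items():
        rule = CATEGORY_RULES.get(category)
        if rule is None:
            unassigned.append(tag)
        else:
            assigned[tag] = dict(rule)  # copy so per-tag edits don't alter the rule
    return assigned, unassigned


assigned, review = apply_categories({
    "PAH-101": "trip pre-alarm",
    "XA-301": "failure to trip on demand",
    "TI-999": "uncategorized",
})
```

Consistency comes for free: two reviewers applying the same table cannot give two trip pre-alarms different priorities.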
On one offshore oil production platform, Amor Group identified 4397 alarms, of which 48% were high priority, 5% were medium priority and 47% low priority.
The booster and export pumps were configured as duty/standby pairs, with automatic logic which starts the standby pump should the duty item trip. The pressure in the export pipeline is subject to severe disturbances, originating on another installation, which regularly cause a pump to trip, usually on low flow or low suction pressure. Changeover to the standby pumps generated a flurry of alarms, few of which were of any value.
Following rationalization the number of alarms was reduced to 2554, with the new configuration having only 8% high priority alarms, 10% medium ones and 82% low priority. But this is not the whole story, as a number of other measures were also taken to reduce transient and consequential alarms, as well as those from out-of-service equipment.
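The percentages quoted for this platform can be turned back into approximate alarm counts. Because the published shares are rounded, the counts below are estimates and may not sum exactly to the totals.

```python
before_total, after_total = 4397, 2554
before_mix = {"high": 0.48, "medium": 0.05, "low": 0.47}
after_mix = {"high": 0.08, "medium": 0.10, "low": 0.82}


def approx_counts(total, mix):
    # Shares are rounded in the source, so these counts are approximate.
    return {p: round(total * s) for p, s in mix.items()}


before = approx_counts(before_total, before_mix)  # about 2111 / 220 / 2067
after = approx_counts(after_total, after_mix)     # about 204 / 255 / 2094
total_reduction = 1 - after_total / before_total
high_reduction = 1 - after["high"] / before["high"]
```

The total alarm count fell by about 42%, but high-priority alarms fell from roughly 2100 to roughly 200, a reduction of about 90%.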
The trip alarms on shutdown valves were replaced by alarms on failure to trip; duplicated alarms on the pump suctions were eliminated by using an alternative sensing point on the common suction line; the control logic and machinery condition alarms for each pump pair were grouped; and the low suction pressure, low flow and machinery condition alarms were suppressed for any pump that is stopped, and for 15 seconds following any pump start. If the entire export system is shut down, all alarms are suppressed apart from those indicating high pressure, high level or failure to trip (these conditions are hazardous even though the plant is not running). It should be noted that only these last conditions have potentially serious consequences. Their alarms are therefore the only ones set to ‘high’ priority.
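The suppression rules described for the export pumps can be sketched as a single decision function. This is a hypothetical illustration of the logic as described, not the platform's actual control configuration; the alarm type labels and the per-pump state record are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Alarm types suppressed on a stopped pump and briefly after a start.
SUPPRESSED_WHEN_IDLE = {"low suction pressure", "low flow", "machinery condition"}
# Conditions that remain hazardous even when the plant is not running.
ALWAYS_ANNUNCIATE = {"high pressure", "high level", "failure to trip"}
START_BLANKING_S = 15.0


@dataclass
class PumpState:
    running: bool
    started_at_s: Optional[float] = None  # time of the most recent start


def should_annunciate(alarm_type, now_s, pump, export_shut_down):
    """Return True if an alarm of this type on this pump should be shown."""
    if export_shut_down:
        # Whole export system down: only hazardous-at-rest conditions get through.
        return alarm_type in ALWAYS_ANNUNCIATE
    if alarm_type in SUPPRESSED_WHEN_IDLE:
        if not pump.running:
            return False
        if pump.started_at_s is not None and now_s - pump.started_at_s < START_BLANKING_S:
            return False  # still within 15 s of the pump start
    return True
```

During a duty/standby changeover, the tripped pump's low-flow alarm is suppressed because it has stopped, and the standby pump's is suppressed for its first 15 seconds, which removes much of the consequential flurry.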
Given the large numbers of alarms that need to be examined, software tools are essential. Spreadsheet software such as Microsoft’s Excel can be used to analyze data. However, a specialized database package such as TiPS LogMate provides valuable additional facilities for long-term maintenance and improvement programmes.
Following the initial review and rationalization exercise, achieving and sustaining good alarm system performance requires long-term commitment to a process of continuous review and improvement.
Amor Group has field-proven, cost- and time-effective techniques which have been used to review and rationalise 500,000 alarms on new and existing assets. The methodology comprises five phases: data gathering, standards and policies, fixing problems, applying advanced techniques and maintaining performance.
Improved alarm management can improve safety, reduce unplanned maintenance and its costs, reduce operating risk and insurance costs, and capture workforce knowledge. A pragmatic, cost-effective alarm management programme that does not tie up key resources can therefore improve production and profitability. OE