IIoT for R&M: The Real Struggle, Part 1

About 12 years ago, I worked with a pulp and paper mill that had built a home-grown model to forecast inventory levels up to a month into the future. The model pulled in process data, received some "biological" inputs (i.e., manual entries from people), and within a few months became very accurate at forecasting everything: chip pile inventories, liquor levels, finished-product volumes, and more. Even then, it was clear that the potential benefit of millwide process integration was huge, with a positive impact on the 4 Pillars of Manufacturing Excellence: Safety, Environmental, Quality, and Profitability.
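
For readers who want a feel for what such a forecast involves, here is a minimal sketch in Python of the underlying idea, not the mill's actual model: roll a simple mass balance forward from planned deliveries and expected consumption. Every number and name in it is invented for illustration.

```python
# Minimal sketch of the forecasting idea only (NOT the mill's actual
# model). All figures are invented: roll a simple mass balance forward
# to project chip pile inventory day by day.

def project_inventory(start_tons, daily_deliveries, daily_consumption):
    """Yield (day, projected tons) by rolling a mass balance forward."""
    inventory = start_tons
    for day, (in_tons, out_tons) in enumerate(
            zip(daily_deliveries, daily_consumption), start=1):
        inventory += in_tons - out_tons
        yield day, inventory

# One week of planned deliveries vs. expected digester consumption (tons/day).
deliveries  = [1200, 1150, 0, 1300, 1250, 1200, 0]   # no weekend receipts
consumption = [1100, 1100, 1100, 1150, 1150, 1100, 1100]

for day, tons in project_inventory(25_000, deliveries, consumption):
    print(f"day {day}: projected chip inventory {tons:,.0f} tons")
```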

The Industrial Internet of Things (IIoT) has been a popular topic in the Reliability and Maintenance (R&M) realm for some time, and interest is growing rapidly. Arguably, R&M IIoT has been quietly working here for a couple of decades. (Anyone have online vibration monitoring? Have you ever compared vibration readings with valve position to tune flow rates? That's R&M IIoT.) After almost two decades working around pulp and paper manufacturing, I have learned that we like lots of data, but the one thing we love even more than data is to ignore it!
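
To make that parenthetical concrete, here is a minimal sketch of the valve-position/vibration comparison. It is a hypothetical illustration: the samples are simulated, where in practice they would come from a historian or an online monitoring system, and the thresholds and units are stand-ins. The idea is simply to bin vibration readings by valve position and flag the throttling ranges where vibration runs high, so flow setpoints can be tuned away from them.

```python
# Minimal sketch of the valve-position/vibration comparison. Samples are
# simulated (heavy throttling below 25% open drives vibration up);
# thresholds and units are illustrative only.
from statistics import mean
import random

random.seed(1)

# Paired samples: (valve position in % open, vibration RMS in in/s).
samples = []
for _ in range(500):
    pos = random.uniform(10, 90)
    vib = 0.08 + (0.25 if pos < 25 else 0.0) + random.uniform(0, 0.05)
    samples.append((pos, vib))

BIN_WIDTH = 10     # % of valve travel per bin
VIB_LIMIT = 0.20   # illustrative vibration alarm limit, in/s

# Bin vibration readings by valve position.
bins = {}
for pos, vib in samples:
    bins.setdefault(int(pos // BIN_WIDTH) * BIN_WIDTH, []).append(vib)

# Report throttling ranges where average vibration runs high.
for lo in sorted(bins):
    avg = mean(bins[lo])
    flag = "  <-- tune flow away from this range" if avg > VIB_LIMIT else ""
    print(f"valve {lo:2d}-{lo + BIN_WIDTH:2d}% open: avg vibration {avg:.3f} in/s{flag}")
```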

I know the IIoT is trendy, cool, and capable of extraordinary improvements, but we've all seen similar things come, prove their worth, and fade away. In these two articles, I will discuss examples of how, in the presence of undeniable data, we choose the wrong course of action; why technologies similar to the IIoT have struggled to sustain themselves; and what it will take for R&M IIoT to be adopted.

The First Hurdle: Distrust of Technology

There seems to be a hardwired human distrust of systems we cannot see (or see into). If we do not know, in detail, how a system really works, we seem naturally skeptical about what it can do. This distrust is especially evident with complex "black box" systems. Here are a few examples, based on my mill experience.

Safety & Environmental Examples:

Stack monitors—Often, the boiler stack monitors are blamed first. Only once a monitor is pronounced good do we look into the process. The "smart" equipment and data said there was a process problem, but we subordinate that information to our biases, tribal knowledge, and culture.

Confined space entry monitors—I once saw a confined space entry where the hole watch attendant began to get an alarm for high carbon monoxide (CO). The attendant notified the entrants and they exited the confined space. The space was a couple of floors off the ground, and the first suspicion was that the problem was with the monitor: "There is no source of CO anywhere around here; the monitor must be bad." Operations and maintenance, hourly and salaried alike, all said so. Someone went to the Safety Office to get another monitor.

Guess what? That monitor alarmed for CO as well. Now guess who got the blame? Not the monitor, but the Safety Office, for "not having any good monitors." A third monitor was called for; it also alarmed high. At this point, maintenance called the Safety Office to request a waiver for entry related to CO, "because all the monitors are defective—no way there can be any CO." The Safety Office then brought a fourth monitor to the site, sampled the confined space, and found high CO. An effort was made to convince the Safety Officer to allow entry, but he stood his ground and instructed a walkdown of the area. Lo and behold, a contractor had started up a portable welding machine run by a diesel engine. The CO was real. All four monitors were correct. The welding machine was moved, the confined space was resampled and found safe, and work resumed. Again: we subordinate information to our biases, tribal knowledge, and culture.

Chlorine dioxide monitors—At one plant, I knew two process engineers: one fairly senior, the other with only a few years' experience. They were working together during an outage, conducting initial confined space entry air monitoring. While moving from one location to the next, they became aware that the local air monitors for chlorine dioxide (ClO2) were alarming in their immediate area. The junior engineer suggested they leave the area, as the alarm indicated. The senior engineer said the alarm was a malfunction, because there was nothing to produce ClO2 in the area, and that they should go investigate. The junior engineer left the area; the senior engineer proceeded into the hazard. Minutes later, the senior engineer returned and, with a slight cough, told the junior engineer that the ClO2 alarm was indeed real. We subordinate information to our biases, tribal knowledge, and culture.

My experience with "small" data, data analyzers, and process control has led me to the conclusion that we first blame an equipment failure on either the most complicated sub-system or the sub-system we know the least about. In the presence of clear environmental and/or safety hazards, otherwise intelligent, highly trained, and skilled employees ignore the data they are given and substitute their own experience and biases. For "big" data and the R&M IIoT to take root and grow, we must address this common failing of subordinating data to our biases, tribal knowledge, and culture.

Productivity Examples:

Black boxes—I have worked with many pieces of equipment that had a "black box": some sort of logic controller that cannot be repaired by the plant. When these black boxes fail, they must be replaced whole. One example was an integrated motor control with a PLC controller. Maintenance could not troubleshoot this "black box"; it could only be replaced with a spare. In many years of working around this equipment, I saw the black box blamed and replaced in dozens of failures while the failure remained unresolved, forcing maintenance to fix some other problem to resolve it. Eventually, I began to document the steps we were taking. Generally, maintenance would spend about 20 minutes troubleshooting, pronounce the black box the problem, order a replacement from the storeroom, change it out, and then begin looking for the real problem once everyone saw that the black box was not the issue.

When I had compiled enough data, I presented it to the maintenance folks and suggested that, on the next failure, we not just jump to changing the black box after 20 minutes of troubleshooting. On the very next failure, at about the 20-minute point, maintenance pegged the failure on the black box. I reminded them of the data and they scoffed. We ordered up the black box, changed it out… and then began looking for the real problem.

What we learned from this exercise was that all the data and logical explanation in the world could not overcome the desire to blame the thing we cannot control. So, we resolved that on the next failure, the very first thing we would do was change the black box. At least we stopped wasting the first 20 minutes troubleshooting! We subordinate information to our own biases, tribal knowledge, and culture.

Amazing maintenance supervisor—I knew a maintenance supervisor who was extremely skilled at understanding the operating process for his production area. He knew the setup and operation better than most operators. After years of being frustrated by calls to fix equipment that was not broken, he developed a way to deal with these false reports: he built his own operating page within the operations control system. He could track, trend, and see all the same data points as any operator. With this data at his fingertips, he began to rebut calls to repair equipment that he knew was not broken.

Most of the calls came from operations management and, early on, he took the time to explain what was going on with the process settings and how it manifested to look like equipment failure. Frustrated that operations management kept ignoring his "big" data, he found an easier way to get his point across: when the call came to fix the equipment, he would look at his data, determine where the problem was, and call the control room operators to help them correct the process upset. This worked well—until operations management wanted to know what he'd done to correct the problem. Once the supervisor explained the truth, operations management would force maintenance to shut down the process and "fix" the equipment. That led to the ultimate work-around: telling folks what they wanted to hear, then doing what was needed to keep the plant running. I retell this story because it is not that uncommon. We subordinate information to our own biases, tribal knowledge, and culture.
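
As a rough illustration of the kind of check his operating page enabled (not his actual page; every tag name, value, and limit below is invented), consider a "low flow" complaint: read the same tags the operators see and rule out process-settings causes before dispatching a crew.

```python
# Rough illustration only; not the supervisor's actual operating page.
# All tag names, values, and limits are invented. The idea: rule out
# process-settings causes for a "low flow" complaint before shutting
# anything down to "fix" the equipment.

def diagnose_low_flow(tags: dict) -> str:
    """Return a plain-language verdict from a snapshot of process tags."""
    if tags["suction_valve_pct"] < 90:
        return "Process issue: suction valve is throttled; open it before blaming the pump."
    if tags["recirc_valve_pct"] > 10:
        return "Process issue: recirculation valve is open; flow is returning to the tank."
    if tags["speed_setpoint_rpm"] < tags["speed_min_rpm"]:
        return "Process issue: speed setpoint is below the minimum for the target flow."
    return "No process cause found; dispatch maintenance to inspect the pump."

# Snapshot that would normally come from the control system or historian.
snapshot = {
    "suction_valve_pct": 45,      # percent open
    "recirc_valve_pct": 0,        # percent open
    "speed_setpoint_rpm": 1650,
    "speed_min_rpm": 1500,
}
print(diagnose_low_flow(snapshot))
```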

In Part 2 of this article, we will discuss what to do about this distrust of technology—plus two more hurdles that we’ll need to overcome before “big data” and the R&M IIoT can really take root in our mills.