Wednesday, September 24, 2014

Reliability Counts…………………

Hello all, and welcome to the blog site The Business of Maintenance. None of you know me or what my background is, and quite honestly, you probably don’t give a ‘Rat’s Butt’ about the things I want to say and publish. However, I think there are things that need to be said, especially those thoughts and concepts that may be considered maintenance taboos. These subjects are ‘taboo’ for many reasons, the most important being that you’re ‘squatin’ in another man’s posy patch. And that other guy (or gal) may be your boss; we don’t need to lose a job over an opinion, now do we? Many of the things I will talk about in this forum are topics that really, really torque me off when it comes to Maintenance Management in the 21st century: things like training and qualification, condition-based and predictive maintenance, performance-based testing and acceptance concepts, planning and work control, scheduling a project and sticking to the plan and budget, etc., etc., and etc.
So off we go, and as the lead-in states, ‘Reliability’ will be my first topic. What started me off on this subject was a disagreement over wording with a colleague. On one of the projects I have been working on recently, my colleague provided a peer review of a specification I put together for a client company’s Preventive Maintenance (PM) program. My colleague is a chemical engineer with a background in risk assessment and risk management. His maintenance background was never ‘hands-on’, but he does understand that Maintenance is a key factor in managing facility risk and, of course, Reliability and Availability. He’s just a bit weak in all of the things it takes to get there. And of course, he is substantially younger.
For this particular client, we are developing asset management processes for the emergency and backup systems that support critical assets. In my PM specification, it seems I had the audacity to include the concepts of Reliability and Availability in the document’s purpose and scope. And these concepts did not fit a truly analytical risk assessment or risk management definition; how dare I!!!
I had stated in the specification’s purpose and scope that the focus of the client’s PM program would be on Availability while maintaining a reasonable level of Reliability. My bad; too vague. There is no way to quantify a ‘reasonable’ level of reliability, and I could not use the term availability without including an analytical definition. The engineer stated that components, equipment and systems have an absolute numeric value for reliability, a calculated value based on the Probability of Failure (Pf = 1 - e^(-λt), where λ is a constant failure rate and t is the time interval), and that for availability, any definition must include Mean Time Between Failure and Mean Time To Repair (A = MTBF ÷ (MTBF + MTTR)). WOW!!! All this time, as far as Maintenance is concerned (not system design, not facility design, not upgrades and not retirement, but just the maintenance of existing systems, structures and components), I had always assumed Reliability and Availability were defined concepts, not calculated absolutes.
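For readers who want to see what the engineer’s two formulas actually compute, here is a minimal Python sketch. It assumes the usual exponential model (a constant failure rate λ, so Pf = 1 - e^(-λt) and λ = 1/MTBF); the MTBF and MTTR values are hypothetical, picked only to show the arithmetic, not data from any real system.

```python
import math

def probability_of_failure(failure_rate: float, hours: float) -> float:
    """Pf = 1 - exp(-lambda * t), assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-failure_rate * hours)

def inherent_availability(mtbf: float, mttr: float) -> float:
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)

# Hypothetical standby unit: MTBF = 5,000 h, MTTR = 8 h (illustrative numbers only)
mtbf_hours = 5000.0
mttr_hours = 8.0
lam = 1.0 / mtbf_hours  # constant failure rate, failures per hour

print(f"Pf over one year (8,760 h): {probability_of_failure(lam, 8760.0):.3f}")
print(f"Availability: {inherent_availability(mtbf_hours, mttr_hours):.4f}")
```

Note that both results hinge on numbers (λ, MTBF, MTTR) that only mean something if the plant still matches its documented configuration.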
Now, if I were to conceptualize a new process, a new plant or a new facility (or modify an existing one) and then specify a minimum level of reliability, i.e., establish a precise reliability criterion (fewer than 1 failure every 1,000 years), then I could use real-world data from various industries to evaluate things like failure rate, repair time, parts availability, etc., and perform any of several calculations, based on one of several statistical distributions, to demonstrate this acceptable failure rate (reliability). The same goes for availability: if I have a clean sheet and I am putting together the conceptual criteria for a new process or new facility, then yes, by all means, bring on the ‘spreadsheets’.
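As a rough illustration of the clean-sheet case, the sketch below turns pooled field data into a point estimate of failure rate and MTTR and checks it against a criterion like “fewer than 1 failure every 1,000 years”. The `FieldData` record and the numbers in it are hypothetical. A real demonstration would normally use confidence bounds from a chosen distribution rather than a bare point estimate; this only shows the basic arithmetic.

```python
from dataclasses import dataclass

@dataclass
class FieldData:
    """Hypothetical pooled field data for one component type (illustrative only)."""
    failures: int
    unit_years: float          # cumulative operating time across all units, in years
    repair_hours: list[float]  # observed time to repair for each failure, in hours

def failure_rate_per_year(data: FieldData) -> float:
    """Point estimate of a constant failure rate: failures / cumulative operating time."""
    return data.failures / data.unit_years

def mean_time_to_repair(data: FieldData) -> float:
    if not data.repair_hours:
        raise ValueError("no repair records to average")
    return sum(data.repair_hours) / len(data.repair_hours)

def meets_criterion(data: FieldData, max_failures_per_year: float = 1.0 / 1000.0) -> bool:
    """Compare the point estimate with a design criterion (default: < 1 failure per 1,000 years)."""
    return failure_rate_per_year(data) < max_failures_per_year

# Hypothetical example: 3 failures observed over 5,000 unit-years of operation
pooled = FieldData(failures=3, unit_years=5000.0, repair_hours=[6.0, 12.0, 9.0])
print(failure_rate_per_year(pooled))  # 0.0006 failures per year
print(mean_time_to_repair(pooled))    # 9.0 hours
print(meets_criterion(pooled))        # True
```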
If you like to play with numbers and look at conceptual values for the different reliability and availability models, you should visit the Reliability Analytics Toolkit web site. It is pretty complete. The link is:
However, if I have a dirty old plant that may or may not reflect a documented configuration, has unrecognized repair issues, or includes business practices that do not support the original design basis or a reliability-based work control process, then all I have left are concepts. Any absolutes have degraded with time, with money (profitability is the priority) and with management. So when discussing maintenance as a business, one must always see Reliability and Availability as concepts, not calculations.
So what are these conceptual definitions that I cling to so desperately? Where did they originate?
The answer is MIL-STD-721, Definitions of Terms for Reliability and Maintainability. This standard, originally published in the 1960s, was (and still is) intended as a common base for reliability and maintainability definitions, to reduce the possibility of conflicts, duplications and incorrect interpretations, either expressed or implied, elsewhere in documentation. These terms and their definitions are:

  • Important in the acquisition of weapons systems (or other items of a critical nature) for precise definition of reliability and maintainability criteria.
  • Unique in their definitions, allowing no other meaning.
  • Expressed clearly, preferably without mathematical symbols.

Since our modern concepts and calculations relating to reliability and maintainability, both in virtual design and in real-world ‘boots on the ground’ practice, are rooted in military design and procurement, it only seems fitting that we use these definitions in our day-to-day industrial and commercial maintenance business practices. I had assumed that the majority of engineers and maintenance professionals who deal with the conceptual and real-world application of these concepts understood these definitions, or at least their source; then again, I may have assumed incorrectly.

To head off future misconceptions, I have attached a copy of MIL-STD-721 to this blog site, so you can download it for reference anytime you like. Just click on the link in the upper left corner. Once you have gone through the definitions, we can discuss and pontificate on this subject in a professional manner, because we will be working from the same basis for our thoughts.

What are those definitions for Reliability and Availability?

Availability – A measure of the degree to which an item is in an operable and committable state for the mission, or is in a state of readiness. This definition does not include actual mission time.

Reliability – The duration or probability of failure-free performance under specified conditions or the probability that an item can perform its intended function(s) for a specified interval under specified conditions.

PMs for Critical Asset Emergency and Backup Systems………….
So just what in heck am I talking about here? What are critical asset emergency and backup systems?

These are your redundant processes, your backup, your standby and your life safety components, equipment and systems. These are the items that need to come on-line to prevent loss of life and/or loss of critical assets. Generally, these components, equipment and systems must start automatically (no operator action). Manual-start components, equipment or systems that function as backup to a primary are also included. When I call out a PM process where the focus is on availability, I want to make sure that our emergency and standby components, equipment and systems are mission ready and committable when needed.

So what kinds of PMs do this for us?

For items that have some type of remote and automatic start feature, or that must be manually started in short order, I want to make sure all of those little things are addressed: the fuel tank is topped off, the batteries are charged, the oil is checked, there are no obvious leaks, the cooling water is full, etc. Basically, I want to ensure that all of the things an operator would check prior to starting that backup are done on a routine basis. Then I want to routinely make sure that the backup starts; generally this is a manual start, not a forced start. However, I do need to include routine forced starts, just not as often; we will call this a surveillance, not a PM.
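One way to keep these readiness checks, routine starts and less frequent surveillances on a schedule is a simple due-date list, sketched below. The task names mirror the checks above, but the `Task` structure and every interval are hypothetical placeholders, not recommendations; real intervals should come from the OEM manual and site procedures.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Task:
    name: str
    kind: str           # "PM" or "Surveillance"
    interval_days: int  # hypothetical interval; set from site procedures / OEM guidance
    last_done: date

    def next_due(self) -> date:
        return self.last_done + timedelta(days=self.interval_days)

    def is_due(self, today: date) -> bool:
        return today >= self.next_due()

def due_tasks(tasks: list[Task], today: date) -> list[str]:
    """Return the tasks that are due (or overdue) on the given date."""
    return [f"{t.kind}: {t.name}" for t in tasks if t.is_due(today)]

# Hypothetical standby generator task list; intervals are placeholders only
start = date(2014, 9, 1)
generator_tasks = [
    Task("Top off fuel tank", "PM", 7, start),
    Task("Verify battery charge", "PM", 7, start),
    Task("Check oil level", "PM", 7, start),
    Task("Walkdown for obvious leaks", "PM", 7, start),
    Task("Verify cooling water level", "PM", 7, start),
    Task("Manual start and run", "PM", 30, start),
    Task("Forced start", "Surveillance", 90, start),
]

for line in due_tasks(generator_tasks, date(2014, 10, 1)):
    print(line)
```

Keeping the forced start as a separate, longer-interval “Surveillance” entry reflects the distinction drawn above between routine PMs and less frequent surveillances.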

So now that we have done everything we can to ensure that components, equipment or systems are ready to start, can start, and yes, routinely do start, how do we maintain a reasonable level of reliability?

There’s that pesky term again: reliability. Here we change the oil, replace batteries, replace filters, collect and analyze data during operation (oil pressure, charging voltage, vibration, temperature, etc.) and do all of those other things needed so that when that backup is actually functioning as a backup, it performs its specified function failure-free.
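The “collect and analyze data during operation” step can start as simply as checking each test run’s readings against alert bands and watching the trend from run to run. The sketch below assumes hypothetical parameter names, bands and readings for a standby diesel; it flags out-of-band values and reports a least-squares slope per run, which is one simple trending choice among many.

```python
from statistics import mean

# Hypothetical alert bands; real limits come from the OEM manual and site procedures
LIMITS = {
    "oil_pressure_psi": (40.0, 80.0),
    "charging_voltage_v": (13.2, 14.8),
    "vibration_in_per_s": (0.0, 0.30),
    "coolant_temp_f": (160.0, 210.0),
}

def slope(values: list[float]) -> float:
    """Least-squares slope per run, treating runs as equally spaced (0, 1, 2, ...)."""
    xs = range(len(values))
    x_bar, y_bar = mean(xs), mean(values)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def review(history: dict[str, list[float]]) -> list[str]:
    """Flag parameters whose latest reading is out of band, and report each trend."""
    findings = []
    for param, readings in history.items():
        low, high = LIMITS[param]
        latest = readings[-1]
        if not low <= latest <= high:
            findings.append(f"OUT OF BAND {param}: {latest} (band {low} to {high})")
        if len(readings) >= 3:  # need a few runs before a trend means anything
            findings.append(f"trend {param}: {slope(readings):+.3f} per run")
    return findings

# Hypothetical readings from the last four monthly test runs
runs = {
    "oil_pressure_psi": [58.0, 56.0, 55.0, 53.0],
    "charging_voltage_v": [14.1, 14.0, 14.1, 14.0],
    "vibration_in_per_s": [0.12, 0.15, 0.19, 0.24],
    "coolant_temp_f": [180.0, 182.0, 181.0, 183.0],
}
for finding in review(runs):
    print(finding)
```

In this made-up data nothing is out of band yet, but the steadily rising vibration is exactly the kind of trend that should prompt a work order before the backup is actually needed.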

Just to reiterate what should be expected from PM activities on emergency and standby components, equipment and systems: your efforts should focus on availability while maintaining a reasonable level of reliability. And within this opus, managing a high level of component and equipment availability actually improves overall system reliability. WOW!!!
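The claim that high component availability improves overall system reliability can be illustrated with a simple redundancy model. This sketch assumes identical, independent standby trains with no common-cause failures, and an exponential model for the emergency run; the failure rate, mission time and availability values are all hypothetical. Holding the hardware (failure rate) fixed and improving only readiness raises the chance that the system as a whole succeeds.

```python
import math

def parallel_availability(unit_availability: float, units: int) -> float:
    """P(at least one of `units` independent, identical trains is available on demand)."""
    return 1.0 - (1.0 - unit_availability) ** units

def train_success(unit_availability: float, failure_rate_per_hour: float, mission_hours: float) -> float:
    """One train is available at demand AND runs failure-free for the mission (exponential model)."""
    return unit_availability * math.exp(-failure_rate_per_hour * mission_hours)

def system_success(unit_availability: float, failure_rate_per_hour: float,
                   mission_hours: float, units: int) -> float:
    """At least one of `units` independent trains starts and carries the load for the mission."""
    p = train_success(unit_availability, failure_rate_per_hour, mission_hours)
    return 1.0 - (1.0 - p) ** units

# Hypothetical values: same hardware (same failure rate), two levels of standby readiness
lam = 1.0 / 5000.0  # failures per running hour
mission = 24.0      # hours of emergency operation
for a in (0.95, 0.99):
    print(f"A={a}: on-demand, 2 trains {parallel_availability(a, 2):.4f}; "
          f"24 h mission, 2 trains {system_success(a, lam, mission, 2):.4f}")
```

Real plants do have common-cause failures (shared fuel, shared batteries, shared neglect), so this independence assumption flatters redundancy; it is meant only to show the direction of the effect.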

 Maintenance, what a Concept!!!!

MMJennings
