DCS reliability analysis

Abstract: Because many generating units have adopted DCS, their level of automation has greatly improved and their operation now relies heavily on it, so we must ensure that the DCS does not fail. This paper analyzes the weak points of the DCS and proposes ways to improve its reliability.

I. Overview

With the DCS retrofit of existing units and the adoption of DCS on new units, the level of unit automation has greatly improved, and operating personnel depend more and more on the automatic control system. When a fault occurs, the safety protection must act reliably; otherwise equipment will be damaged, personal safety will be endangered, and the generation lost during the resulting outage will far exceed the price of the control system itself. The DCS must therefore operate stably and trouble-free over the long term. For this reason, we need to evaluate the reliability of the existing system and strengthen maintenance of its weak points to ensure the safe operation of the unit.

II. Reliability Indicators

2.1 Failure Rate of Electronic Products

Thermal control equipment consists mainly of electronic components, circuit boards, leads, switches, and similar parts. Of these, the circuit board fabrication process, the leads, and the working life of the electronic components dominate the reliability of thermal control equipment, and the working life of the electronic components matters most. It is characterized by the failure rate.

The failure rate is the ratio of the number of failures per unit time occurring after time t to the number of systems still intact at time t. It is denoted λ(t) and has units of reciprocal time (1/h). The relationship between the failure rate λ(t) of electronic products and time t is shown in the figure (the so-called bathtub curve).
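The definition above can be turned into an estimate from test or field records. This is a minimal sketch, assuming failures are counted in consecutive intervals of equal length Δt and that failed units are not replaced; the function name and the data are hypothetical. For each interval, λ is the number of failures divided by the number of units intact at the start of the interval times Δt.

```python
def empirical_failure_rate(n_total, failures_per_interval, dt_hours):
    """Estimate lambda(t) in 1/h for each consecutive interval of length dt_hours.

    n_total: units intact at t = 0 (hypothetical test population).
    failures_per_interval: failures observed in each interval.
    Failed units are assumed not to be replaced.
    """
    rates = []
    intact = n_total
    for failures in failures_per_interval:
        if intact <= 0:
            rates.append(float("nan"))  # nothing left to observe
            continue
        rates.append(failures / (intact * dt_hours))
        intact -= failures
    return rates


# Hypothetical data: 1000 boards, failures counted every 1000 h
print(empirical_failure_rate(1000, [20, 5, 2], 1000.0))
```

With real data the early intervals typically show a higher rate that falls off, which is the early part of the curve described next.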

The curve is divided into three parts. The first is the early failure period. Failures in this period are mainly caused by defects introduced during production, installation, and commissioning. As these defects are found and eliminated, such failures diminish rapidly, generally within less than a year.

The second is the random failure period, in which λ(t) is very small and approximately constant; this period is also called the useful life. Electronic devices that have undergone proper burn-in (aging) pass quickly through the early failure period into the random failure period, which can last a long time.

The third is the wear-out period. The product has reached the end of its life and the failure rate rises rapidly. In electronic products, for example, electrolytic capacitors dry out after long service and resistors age, and a single inexpensive component can cause a whole system to fail. The reliability discussed in this paper refers to the reliability indices of the product during the random failure period.
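The three-part curve can be illustrated numerically. A common modeling choice (our assumption, not the author's model) is to add three Weibull hazard terms, h(t) = (β/η)(t/η)^(β−1): β < 1 gives a falling rate (early failures), β = 1 a constant rate (random failures), and β > 1 a rising rate (wear-out). All parameter values below are hypothetical and chosen only to produce the bathtub shape.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t_hours):
    """Sum of three Weibull hazards; parameters are hypothetical."""
    early = weibull_hazard(t_hours, beta=0.5, eta=2.0e6)     # decreasing: early failures
    constant = weibull_hazard(t_hours, beta=1.0, eta=1.0e5)  # flat: random failures
    wearout = weibull_hazard(t_hours, beta=5.0, eta=1.5e5)   # increasing: wear-out
    return early + constant + wearout


for t in (100.0, 8760.0, 50000.0, 150000.0):
    print(f"t = {t:>9.0f} h  lambda = {bathtub_hazard(t):.2e} 1/h")
```

The printed rate falls over the first year, stays near the constant term in mid-life, and rises again at the end, which matches the three periods described above.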

2.2 Mean Life, Mean Time to Repair, and Availability

In general, for a constant failure rate, the mean life of a product is the reciprocal of its failure rate, written $m = 1/\lambda$. If system failures can be repaired, the life m represents the mean time between failures, abbreviated MTBF. Another important reliability index for a repairable system is the mean time to repair, abbreviated MTTR. MTTR is a statistical value; for instruments it is much smaller than the MTBF, that is, a short period of time. People with different skill levels need different amounts of time to repair the same fault, so MTTR can only be determined through experiments and experience.
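Both indices are usually estimated from maintenance records. A minimal sketch, assuming a constant failure rate so that MTBF = total operating time / number of failures (equivalently 1/λ), and taking MTTR as the mean of recorded repair times; the function names and records are hypothetical.

```python
from statistics import mean


def estimate_mtbf(total_operating_hours, n_failures):
    """Point estimate of MTBF in hours, m = 1/lambda with lambda = failures / hours.

    Assumes a constant failure rate and n_failures > 0.
    """
    return total_operating_hours / n_failures


def estimate_mttr(repair_hours):
    """MTTR in hours as the mean of recorded repair times."""
    return mean(repair_hours)


# Hypothetical records: 10 modules x 8760 h, 4 failures, four repair times
mtbf = estimate_mtbf(total_operating_hours=87600.0, n_failures=4)
mttr = estimate_mttr([1.5, 3.0, 0.5, 2.0])
print(mtbf, mttr)  # 21900.0 1.75
```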

Availability (also called the utilization rate) is a reliability index of a repairable product: A = MTBF/(MTBF+MTTR), that is, the ratio of the time the product works normally to the total time. To increase availability, we should on the one hand raise the product's MTBF as much as possible and on the other hand strive to reduce its MTTR.
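The formula shows why shortening repairs pays off even when MTBF is fixed. A minimal sketch, assuming steady-state availability and 8760 h per year; the MTBF and MTTR values are hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(a, hours_per_year=8760.0):
    """Expected downtime per year implied by availability a."""
    return (1.0 - a) * hours_per_year


# Hypothetical: same MTBF, MTTR halved from 8 h to 4 h
for mttr in (8.0, 4.0):
    a = availability(10000.0, mttr)
    print(f"MTTR = {mttr} h  A = {a:.5f}  downtime = {annual_downtime_hours(a):.2f} h/yr")
```

Halving the MTTR roughly halves the expected downtime, which is the second lever named above.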

III. Reliability Analysis of the Thermal Control System

From the standpoint of use, a thermal control system can be simplified into series systems and parallel systems. The reliability logic block diagrams are as follows:

Series system: in a series system, failure of any unit causes failure of the entire system.

Parallel system: in a parallel system, multiple units perform the same function independently, and the system works normally as long as at least one unit works normally. A parallel system can therefore improve system reliability; in theory, the more parallel units, the higher the reliability.

Note: series and parallel in a reliability logic diagram are different from series and parallel connections in an electrical circuit; they are two different concepts. Suppose four filter capacitors are connected in parallel between the supply rail and ground on a circuit board. From the reliability standpoint, a short-circuit failure of any one capacitor shorts the supply and disables the entire circuit board. The reliability logic diagram of these capacitors is therefore a series form.
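The two reliability logic structures reduce to two simple formulas: a series system works only if every unit works, R = ∏Ri, and a parallel system fails only if every unit fails, R = 1 − ∏(1 − Ri). A minimal sketch, assuming independent units and reliabilities evaluated over the same mission time; the numbers are hypothetical.

```python
import math


def series_reliability(rs):
    """Series logic: the system works only if every unit works, R = prod(R_i)."""
    return math.prod(rs)


def parallel_reliability(rs):
    """Parallel logic: the system fails only if all units fail, R = 1 - prod(1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


units = [0.9, 0.9]  # hypothetical unit reliabilities
print(series_reliability(units))    # lower than either unit
print(parallel_reliability(units))  # higher than either unit

# The four filter capacitors of the note belong in series logic,
# even though they are wired in parallel (hypothetical R = 0.999 each).
print(series_reliability([0.999] * 4))
```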

3.1 Reliability Analysis of Series Systems

A DCS is a complex system made up of many subsystems. Each subsystem consists of many modules with different functions, and the system also requires a network. A reliability analysis of the DCS must therefore start from the reliability of its modules and network.

On each module, regardless of how the components are physically connected, failure of any single component usually causes failure of the whole module. From the reliability standpoint, a module is a series system made up of all the components and mounting points on the board. The reliability of a module can therefore be expressed as the product of the reliabilities of all the devices on the board:

$$R_c(t) = \prod_{i=1}^{n} R_i(t) \qquad (1)$$

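where $n$ is the number of components on the module and $R_i(t)$ is the reliability of component $i$. Assuming each component has a constant failure rate $\lambda_i$, so that $R_i(t) = e^{-\lambda_i t}$, the module failure rate is $\lambda_c = \sum_{i=1}^{n} \lambda_i$, and the MTBF of the module can be expressed as: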

$$\mathrm{MTBF}_c = \frac{1}{\lambda_c} = \frac{1}{\sum_{i=1}^{n} \lambda_i} \qquad (2)$$

Extensive testing has yielded λ values for many commonly used devices. A module is built from many discrete devices, so by Equation (2) its MTBF is inversely proportional to the sum of their failure rates; for devices with similar failure rates, the MTBF is roughly inversely proportional to the number of devices. The more devices, the shorter the module's life.
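Equations (1) and (2) can be evaluated directly once component failure rates are known. A minimal sketch, assuming constant failure rates; the board of 200 parts at 5×10⁻⁸ 1/h (50 FIT) each is hypothetical.

```python
import math


def module_reliability(failure_rates_per_hour, t_hours):
    """Eq. (1) with constant rates: R_c(t) = prod(exp(-lambda_i * t))."""
    return math.prod(math.exp(-lam * t_hours) for lam in failure_rates_per_hour)


def module_mtbf(failure_rates_per_hour):
    """Eq. (2): MTBF_c = 1 / sum(lambda_i), in hours."""
    return 1.0 / sum(failure_rates_per_hour)


rates = [5e-8] * 200  # hypothetical board: 200 parts at 50 FIT each
print(module_mtbf(rates))                 # about 1e5 h
print(module_mtbf(rates * 2))             # twice the parts: about half the MTBF
print(module_reliability(rates, 8760.0))  # probability of surviving one year
```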

Take a simple automatic control loop, a water level control loop, as an example. Using Equation (2), the approximate λ values of the modules and devices in this loop are as follows.

A rough calculation gives an MTBF of about 1.3 years for this simple control loop; for an advanced system, such a low MTBF is hard to accept. In theory, the simpler the equipment, the lower its failure rate. One point needs to be clarified here: the more protection devices a system has, the higher the total failure rate of its protection equipment. Should we then reduce the number of protection devices to increase system reliability? No. The role of protection is to prevent further damage when a device fails, and the more protection functions there are, the less damage equipment faults will cause. This requires us to strengthen maintenance and scientific management. We must formulate regulations for the maintenance of the unit's DCS and for the DCS engineering station, in order to standardize our maintenance work.
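To see what an MTBF of 1.3 years means in practice, it can be converted into a failure rate and a probability of running one year without a failure. A minimal sketch, assuming a constant failure rate (the random failure period) and 8760 h per year.

```python
import math

HOURS_PER_YEAR = 8760.0


def survival_probability(mtbf_hours, t_hours):
    """R(t) = exp(-t / MTBF); valid only for a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


loop_mtbf_h = 1.3 * HOURS_PER_YEAR  # the rough loop MTBF quoted above
print(f"lambda = {1.0 / loop_mtbf_h:.2e} 1/h")
print(f"P(no failure in 1 year) = {survival_probability(loop_mtbf_h, HOURS_PER_YEAR):.2f}")
```

Under this assumption the loop has less than an even chance of running a full year without a failure, which is why the figure is hard to accept.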

3.2 Reliability Analysis of Parallel Systems

A parallel system can improve system reliability; in theory, the more parallel units, the higher the reliability. For a parallel system of two units:

$$\mathrm{MTBF} = \int_0^{\infty} R_p(t)\,dt, \qquad R_p(t) = 1 - [1 - R_1(t)][1 - R_2(t)] \qquad (3)$$

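If the two units have equal failure rates, $\lambda_1 = \lambda_2 = \lambda$, the system reliability becomes $2e^{-\lambda t} - e^{-2\lambda t}$ and Equation (3) simplifies to $\mathrm{MTBF} = \frac{3}{2\lambda}$.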

Similarly, for three units in parallel:

$$\mathrm{MTBF} = \frac{11}{6\lambda}$$

Thus, connecting two units in parallel increases the system MTBF by 1/2 of the single-unit value 1/λ, whereas adding a third unit increases it by only a further 1/3. Considering both reliability and economy, two units in parallel is the appropriate choice.
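The diminishing return follows a simple pattern. For n identical units in active parallel without repair, integrating $R(t) = 1 - (1 - e^{-\lambda t})^n$ gives MTBF = (1 + 1/2 + ... + 1/n)/λ, which reproduces 3/(2λ) and 11/(6λ). The sketch below computes these factors exactly and checks them with a midpoint-rule integral; the function names are our own, and λ = 1 is chosen only for convenience.

```python
import math
from fractions import Fraction


def parallel_mtbf_factor(n_units):
    """MTBF of n identical parallel units, in multiples of 1/lambda: 1 + 1/2 + ... + 1/n."""
    return sum(Fraction(1, k) for k in range(1, n_units + 1))


def parallel_mtbf_numeric(n_units, lam, t_max, steps=100_000):
    """Midpoint-rule integral of R(t) = 1 - (1 - exp(-lam*t))**n over [0, t_max]."""
    dt = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += (1.0 - (1.0 - math.exp(-lam * t)) ** n_units) * dt
    return total


for n in range(1, 5):
    factor = parallel_mtbf_factor(n)
    gain = factor - parallel_mtbf_factor(n - 1)  # extra MTBF from the n-th unit
    print(n, factor, gain)

print(parallel_mtbf_numeric(2, 1.0, 50.0))  # close to 1.5
```

Each added unit contributes only 1/n of the single-unit MTBF, so the gain from a third unit is smaller than from the second.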

IV. Factors Affecting DCS Reliability

The above analysis shows that many factors affect the reliability of the whole system. The reliability of a device, a component, or a system depends not only on its own quality and manufacturing process but also on external factors such as ambient temperature, vibration, humidity, and electromagnetic interference. Below we analyze these factors and identify the key ones.

4.1 Internal Factors

Internal factors are defects of the system itself, such as component failures, open circuits, and capacitor short circuits, sometimes including component design problems.

Components are the smallest units of a system and the most basic starting point for system reliability analysis. Manufacturers take this fully into account: before a device is assembled, they generally require component burn-in, careful component selection, selection of electrical characteristics and voltage and load derating, and attention to the manufacturing process. These issues need not concern us. As users we only use finished modules and cannot change their internal structure, so the internal factors affecting system reliability are left to the manufacturer. However, we must understand the random failure period of a piece of equipment or a system, so that when we judge that equipment is reaching the end of this period, we can carry out condition-based maintenance or replacement and take preventive measures to avoid major system failures. The random failure period is a statistical quantity: determining it for the different equipment of the thermal control system requires meticulous work to accumulate a large amount of data, and this work is difficult. If we had reliable data showing that a system will reach the end of its random failure period within a certain time window, the reliability of our equipment would be greatly improved.

4.2 External Factors

External factors are environmental influences on system reliability, such as ambient temperature, humidity, power supply fluctuations, strong electromagnetic interference, shock, vibration, corrosion, and loose wiring terminals.

Earlier we saw a module with a service life of up to 30 years; after 29 years of use, we are buying spare parts for it. Our analysis is that the service life of a module is a possible life, not a guaranteed one.

4.2.1 The modules operate in the control room, which is air-conditioned in winter and summer, so ambient temperature, humidity, shock, and vibration have little effect on them. However, the operating voltage of the ACM module is 24 V ± 5%, a tolerance of only ±1.2 V, which is quite strict, and supply voltage fluctuations strongly affect the life of electrical components. To improve the reliability of the control modules, maintenance of the 24 V power supply should be strengthened.
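The ±5% window works out to 22.8 V to 25.2 V. A minimal sketch of a check that a maintenance routine could apply to logged 24 V readings; the function names and the readings are hypothetical.

```python
def tolerance_band(nominal_v, tolerance_pct):
    """Return (low, high) limits for nominal +/- tolerance_pct percent."""
    delta = nominal_v * tolerance_pct / 100.0
    return nominal_v - delta, nominal_v + delta


def supply_ok(measured_v, nominal_v=24.0, tolerance_pct=5.0):
    """True if a reading lies inside the nominal +/- tolerance window."""
    low, high = tolerance_band(nominal_v, tolerance_pct)
    return low <= measured_v <= high


print(tolerance_band(24.0, 5.0))    # roughly (22.8, 25.2)
for reading in (23.1, 24.6, 25.3):  # hypothetical meter readings
    print(reading, supply_ok(reading))
```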

4.2.2 At one plant, the furnace flame-out protection failed to operate, resulting in a furnace deflagration. The immediate cause was that the lower-limit setting of the flame detector board was automatically lost. From the device itself, parameters stored in EPROM should not be lost, but measurements showed AC interference components on the DC 5 V supply circuit. This shows that the root cause of the flame-out protection's failure to operate was interference, specifically unequal-potential (ground potential difference) interference: the ground inside the flame detector board is not at a common point with the grounding grid. The resistance R between them is very small, so under normal conditions the current through R is small and the voltage drop across R is also very small. At an interference peak, however, the current through R becomes large, and so does the voltage drop, forming a high level; generally more than 0.7 V is treated as a high level, and the program memory's write and erase commands are generally active high. The flame detector schematic shows a resistance between the program memory and ground; under interference the voltage drop on this resistor rises and can act as a spurious write or erase command. Although we have adopted some measures, we have not solved this problem fundamentally. For the time being, we should therefore focus on checking the set values of the flame detectors and deal promptly with any abnormality found. In addition, after furnace shutdown, check the shield grounding of the probe cables: normally the shield should be grounded at one end only, and the resistance from the amplifier to ground should be less than 1 Ω.
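The mechanism is plain Ohm's law: the ground potential difference is U = I·R, so even a small R produces a "high" level once the surge current is large enough. A minimal sketch using the 0.7 V threshold quoted in the text; the 0.05 Ω ground path resistance and the currents are hypothetical.

```python
HIGH_LEVEL_THRESHOLD_V = 0.7  # threshold quoted in the text


def ground_drop(current_a, resistance_ohm):
    """Ohm's law: voltage between board ground and grid ground, U = I * R."""
    return current_a * resistance_ohm


def reads_as_high(current_a, resistance_ohm, threshold_v=HIGH_LEVEL_THRESHOLD_V):
    """True if the ground potential difference exceeds the high-level threshold."""
    return ground_drop(current_a, resistance_ohm) > threshold_v


R = 0.05  # hypothetical ground path resistance in ohms
for i in (0.1, 5.0, 20.0):  # hypothetical normal and surge currents in amperes
    print(f"I = {i:5.1f} A  U = {ground_drop(i, R):.3f} V  high = {reads_as_high(i, R)}")
```

With these values any surge above 0.7 / 0.05 = 14 A would be enough to produce a high level, which is why eliminating the potential difference, rather than lowering R, is the real fix.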

4.2.3 At one plant, the load of the DEH system suddenly could only be raised and not lowered. Inspection showed that a relay in the DCS load control circuit had failed. Many DCS automatic control and protection systems use relays for control or isolation. Relays have the lowest reliability of all the devices, and they are used in relatively large numbers. We suggest:

4.2.3.1 During unit overhauls, check the relays carefully and promptly replace any whose performance indicators are found to be out of specification.

4.2.3.2 In automatic control loops whose digital inputs pass through relays, the relays should be replaced periodically.

4.2.3.3 Relays used for protection should be replaced periodically according to their number of operations.

4.2.3.4 The quality requirements for purchased relays should be differentiated according to the importance of the protection or automatic control loop and the operating frequency of the control loop, balancing cost against system reliability.
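Suggestions 4.2.3.3 and 4.2.3.4 can be combined into a simple replacement rule based on operation counts. This is a minimal sketch under our own assumptions: each relay has a rated electrical life in operations, and a relay is flagged once its count reaches a fraction of that life, with a stricter (hypothetical) fraction for protection relays. The tags, ratings, and margins are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Relay:
    tag: str               # hypothetical loop tag
    rated_operations: int  # hypothetical rated electrical life
    operations: int        # actions counted so far
    protection: bool       # True if used in a protection circuit


def due_for_replacement(relay, protection_margin=0.5, control_margin=0.8):
    """Flag a relay once its action count reaches a fraction of its rated life.

    Protection relays use the stricter (lower) fraction.
    """
    margin = protection_margin if relay.protection else control_margin
    return relay.operations >= margin * relay.rated_operations


relays = [
    Relay("PROT-01", 100_000, 60_000, True),
    Relay("AUTO-07", 100_000, 60_000, False),
]
for r in relays:
    print(r.tag, due_for_replacement(r))  # PROT-01 True, AUTO-07 False
```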

4.2.4 There are two problems that currently affect the safe operation of DEH:

4.2.4.1 The surface temperature of the ±5 V and ±24 V power supply units in the Xinhua DEH control system is too high (up to about 50 °C). This affects the electrical characteristics and life of the components and greatly reduces system reliability. The heating is mainly due to an irrational design. At present we blow air directly onto the power supply units with an electric fan for forced cooling as a preventive measure. In the design, the two redundant power supply units should be separated with a gap between them, so that heat can dissipate and the units do not impede each other's cooling.

4.2.4.2 Governor valve vibration. The servo valves and LVDT feedback sensors of the large governor valves are all mounted on the valves themselves. Long-term high-frequency vibration causes wire abrasion and short circuits, loosened terminal screws, and poor contacts. In mild cases the valve position indication is lost and the governor valve closes, reducing the load; in severe cases the unit is shut down. We have taken measures suited to the actual condition of the equipment, such as padding the junction boxes with sponge to reduce vibration-induced wire abrasion and tightening the terminal screws once a week.

Both internal and external factors affect the reliability of a DCS. Internal factors are mainly the manufacturer's concern, but we must understand the concept of the random failure period; external factors are resolved by our maintenance and repair personnel. A complete system is complex, so we should carefully analyze the factors affecting its reliability, identify the key points, and resolve them; this is an important way to improve system reliability. Under different conditions, the factors affecting the reliability of the same equipment or the same system differ, so each case must be analyzed carefully and treated on its own merits.

V. Ways to Improve System Reliability

The discussion above has in effect proposed one way to improve system reliability: identify and control the factors that affect it. The other way is to reduce the MTTR, which involves two aspects.

5.1 Improving Maintenance Personnel's Familiarity with the System

The more familiar users are with the technology the system uses, the less likely they are to misunderstand it during maintenance or to make operating errors, which improves system reliability. This is in fact a requirement on our technical level. After the DCS retrofit of the unit, the thermal control equipment technology made a qualitative leap, and after a period of study our familiarity with and understanding of DCS have greatly improved. Further improving our familiarity with DCS by various means is of great significance for improving its reliability.

5.2 Spare Parts

In the discussion above, we emphasized preventing system failures. However, no matter what measures we take, guaranteeing that the system is 100% fault-free is impossible and impractical. Being realistic, we should acknowledge that the system can fail and, when a failure occurs, eliminate it in the shortest possible time to ensure safe and reliable operation. To eliminate a fault, our personnel must first have the technical level to locate the fault point and identify its cause, and then the ability to repair it. For a DCS, much of the equipment is difficult to repair without dedicated test equipment. From this point of view, we must keep the corresponding spare parts on hand.

VI. Conclusion

The analysis of DCS reliability shows that many specific tasks need to be done. As long as we carefully analyze the key factors affecting system reliability and formulate corresponding preventive measures, equipment reliability will be greatly improved.