# Estimation of Failure State of Buck Voltage Regulator

Gadila Prashanth Reddy, Rangaiah L, Justin Khoo, Rishab Mukherjee, Srinivasan, Srikanth Kaniyanoor

Abstract: Failure Mode Effect and Diagnostic Analysis (FMEDA) is a typical way to determine the failure-in-time (FIT) rate of a given design by performing fault analysis on each element of the design. However, it may not always accurately determine the erroneous state of a self-correcting design. An example of a self-correcting design is a buck voltage regulator (VR), in which the output voltage and the required voltage are continuously compared to achieve the desired output voltage. This paper assesses the true failure state of a buck regulator by performing a detailed FMEDA and, with a reasonable failure probability assigned to each element, applying a Markov state model to estimate the true failure state of the buck VR.

Keywords: FMEDA, Markov model, Buck VR

## I. INTRODUCTION

A voltage regulator maintains a constant output voltage for a given input by controlling the PWM duty cycle applied to the switching nodes. Inside an SoC, different IPs require different voltages, and each IP further requires a variable voltage depending on its power management state. The voltage regulators satisfy this demand for variable voltage. Based on the VID value demanded by the power management unit, the corresponding PWM signal is generated and fed to the switching nodes, which in turn produce the demanded voltage level. Feeding the output back to the comparator of the PWM generator further improves the regulator output. The power management chip of the SoC sends the required voltage to the controller over an I2C bus, which consists of a data line and a clock line. Analyzing each component of the voltage regulator makes it possible to determine the failure states of the regulator.
However, an actual failure might only be considered to occur when the voltage regulator output falls below Vmin. Failure modes and effects analysis leads to the calculation of the failure-in-time rate, but it gives neither the time spent in a given state nor the likelihood of being in that state. Our aim is therefore to apply a Markov model to overcome these limitations of failure modes and effects analysis and to identify the probability of being in an erroneous or a safe state.

Revised Manuscript Received on August 19, 2019.

**Gadila Prashanth Reddy**, Research Scholar, RajaRajeswari College of Engineering, VTU (prashanth.trr@gmail.com)

**Rangaiah L**, RajaRajeswari College of Engineering (rleburu@gmail.com)

**Justin Khoo**, Intel Corporation, Penang (justin.khoo@intel.com)

**Rishab Mukherjee**, Bengal Institute of Technology (rishab19aug@gmail.com)

**Srinivasan, Srikanth Kaniyanoor**, Intel Technology Pvt Ltd (srikanth.kaniyanoor.srinivasan@intel.com)

## II. POWER SUPPLY FAILURES AND DIAGNOSIS

A power supply is the basic building block of any electronic circuit, and a power supply failure severely affects the circuit it feeds. A power supply can fail because of a failure in the source or in the transmission line. On a PCB, stuck-at-1 faults occur due to photolithographic printing errors, conductive particle contamination, incomplete etching and metal polish, cracks in the insulator, or a gate oxide defect causing a pinhole. Similarly, stuck-at-0 faults are caused by photolithographic printing errors, step coverage, incomplete vias, electromigration, silicide agglomeration, incomplete via etch or foreign material in a via, and insulating particle contamination [1]. Together with the stuck-at-1 faults discussed above, these lead to a power supply failure with some constant output, or to internal failures.
Besides these systematic failures, a few random hardware failures are also included, such as aging of the device or electrolytic discharge of the components on the PCB, which also lead to stuck-at faults. In an IC, separate metal layers may also come into contact through a hole, leading to stuck-at-0 or stuck-at-1 faults and hence to failures in the final output of the power supply [2]. Along with these transmission failures, oscillation in passive components such as capacitors and inductors leads to glitches in the output of the voltage regulator, whereas oscillation in the PWM generator makes the output follow the input regardless of the output required. To control these failures, overvoltage or undervoltage is sensed sufficiently early, the current working state of the circuit connected to the power supply is saved in nonvolatile memory, and either a safe power-down routine is performed or the circuit is switched to a secondary power unit for continuous execution [3].

## III. VOLTAGE REGULATOR

Most latest-generation SoCs go into portable devices that work in low-power modes. Each IP inside the SoC, whether a core, the memory subsystem or the internal fabrics, requires a different voltage in each power state, which demands a variable-output voltage regulator. This can be achieved with I2C-based or SVID-based voltage regulators. Most voltage regulators are of the buck type; however, boost and buck-boost regulators are also used to power the display back-screen and eMMC IPs respectively. In this paper we analyze a buck voltage regulator in a detailed fault analysis in which every sub-block of the voltage regulator is identified and analyzed for faults and failure modes.

Figure 1: SoC Power Distribution

Consider the typical SoC power distribution map in Figure 1. A battery charger takes its input from a wall adapter, provides the power source to all VRs and, in parallel, charges the battery.
In the absence of a wall adapter, the battery charger provides source power to the VRs from the battery. Depending on the SoC IPs, various VRs are implemented to provide different voltages and isolation between IPs. Appropriate decoupling capacitors and bulk capacitors are connected to avoid ripple due to switching or PCB noise.

Each IP demands a different voltage in each power state, and this is managed by the Power Management Unit (PMU). Before the OS moves from the existing workload (e.g. idle) to the next workload (e.g. burst transfer), the PMU interacts with each voltage regulator through I2C or SVID and updates the Voltage Identification (VID) values to meet the demand of each IP block. Each VR regulates to the voltage corresponding to its VID value and reports back to the PMU once the voltage has settled to the desired value.

The main parts of the voltage regulator are described here, together with an analysis of the failure modes of each block in order to arrive at a safe model.

### A. I2C protocol

I2C is a bidirectional open-collector or open-drain bus used to connect devices. It consists of one data line and one clock line, used together to exchange information between two devices [4].

| Field | Entry |
|---|---|
| Potential failure mode | Time out, wrong address decoding, change of addresses caused by soft errors in the MMU registers, no or continuous arbitration |
| Element classification as per IEC | Bus |
| Potential effect(s) of failure | Incorrect data or address |
| FMEDA description | Incorrect data or address for I2C controller caused by I2C bus due to time out, wrong address decoding, change of addresses caused by soft errors in the MMU registers, no or continuous arbitration in I2C bus. |

### B. Register

It holds the bit values of the VID in order to feed them to the PWM generator.
| Field | Entry |
|---|---|
| Potential failure mode | Stuck-at for data and addresses, change of information caused by soft errors |
| Element classification as per IEC | Register |
| Potential effect(s) of failure | Incorrect data or address |
| FMEDA description | Incorrect data or address for DAC caused by Register due to stuck-at for data and addresses, change of information caused by soft errors in Register. |

### C. PWM generator

It segregates the output of the comparator into two phases.

| Field | Entry |
|---|---|
| Potential failure mode | Stuck at 1, stuck at 0, stuck at on, drift and oscillation |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Incorrect analog output |
| FMEDA description | Incorrect analog output for Buffer caused by PWM generator due to stuck at 1, stuck at 0, stuck at on, drift and oscillation in PWM generator. |

### D. Comparator

It is an op amp configured to act as a comparator. In this block the incoming voltage from the PWM generator is compared with the actual output voltage of the voltage regulator, in order to stabilize the value that the PWM generator feeds to the buffer and hence to the switching nodes.
| Field | Entry |
|---|---|
| Potential failure mode | Stuck at 1, stuck at 0, stuck at on, drift and oscillation |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Incorrect analog output |
| FMEDA description | Incorrect analog output for PWM generator caused by Comparator due to stuck at 1, stuck at 0, stuck at on, drift and oscillation in Comparator. |

### E. Buffer

It is mainly used to provide sufficient drive capability to pass the voltage to the next stage of the circuit, i.e. to the input of the switching gates.

| Field | Entry |
|---|---|
| Potential failure mode | Stuck at 1, stuck at 0, stuck at on, drift and oscillation |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Incorrect analog output |
| FMEDA description | Incorrect analog output for Switching nodes caused by Buffer due to stuck at 1, stuck at 0, stuck at on, drift and oscillation in Buffer. |

### F. Switching nodes

The two MOSFETs act as switches. While the PMOS is ON, the current through the inductor ramps up and charges the capacitor. Since the output voltage need not be as high as the input voltage, the PMOS is then turned OFF. The inductor current must keep flowing, which is facilitated by turning the NMOS ON. The faster the switching, the smoother the output voltage becomes.
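The switching behaviour of a buck stage can be illustrated with the textbook ideal-converter relations. The sketch below is not taken from the paper: it assumes an ideal buck converter in continuous conduction (lossless switches, so the output voltage equals the duty cycle times the input voltage), and the component values (5 V input, 1 V output, 1 µH inductor, 1 MHz and 2 MHz switching) as well as the function names are hypothetical. It shows that doubling the switching frequency halves the inductor ripple current, which is one sense in which faster switching gives a smoother output.

```python
def duty_cycle(v_in, v_out):
    """Ideal buck duty cycle D = Vout / Vin (continuous conduction, lossless)."""
    if v_in <= 0 or not 0 <= v_out <= v_in:
        raise ValueError("expected v_in > 0 and 0 <= v_out <= v_in")
    return v_out / v_in


def inductor_ripple(v_in, v_out, inductance, f_sw):
    """Peak-to-peak inductor ripple current in amperes.

    While the high-side switch is on, the inductor sees (Vin - Vout) for a
    time D / f_sw, so its current rises by (Vin - Vout) * D / (L * f_sw).
    """
    d = duty_cycle(v_in, v_out)
    return (v_in - v_out) * d / (inductance * f_sw)


# Hypothetical operating point, chosen only for illustration.
V_IN = 5.0        # input voltage in volts
V_OUT = 1.0       # demanded output voltage in volts
L_HENRY = 1e-6    # inductance in henries

ripple_1mhz = inductor_ripple(V_IN, V_OUT, L_HENRY, 1e6)
ripple_2mhz = inductor_ripple(V_IN, V_OUT, L_HENRY, 2e6)

print(f"duty cycle      : {duty_cycle(V_IN, V_OUT):.2f}")
print(f"ripple at 1 MHz : {ripple_1mhz:.2f} A")
print(f"ripple at 2 MHz : {ripple_2mhz:.2f} A")
```

A stuck switch corresponds to a duty cycle pinned at 0 or 1 in this idealized picture, which is why such faults appear as an incorrect analog output at the next stage.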
| Field | Entry |
|---|---|
| Potential failure mode | Stuck at 1, stuck at 0, stuck at on, drift and oscillation |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Incorrect analog output |
| FMEDA description | Incorrect analog output for passive components caused by Switching nodes due to stuck at 1, stuck at 0, stuck at on, drift and oscillation in Switching nodes. |

### G. Passive components

The passive components, such as the inductor and the capacitor, are used to store charge and to remove additional spikes in the output voltage waveform.

| Field | Entry |
|---|---|
| Potential failure mode | Stuck-at faults |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Wrong voltage |
| FMEDA description | Wrong voltage for SoC and reference circuit caused by Passive components due to stuck-at faults in passive components. |

### H. Reference voltage circuit

This circuit takes the output voltage to the input of the comparator so that it can be compared with the voltage generated by the PWM generator, giving the required correct output.
| Field | Entry |
|---|---|
| Potential failure mode | Stuck at 1, stuck at 0, stuck at on, drift and oscillation |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Wrong reference voltage to comparator |
| FMEDA description | Wrong reference voltage to comparator caused by reference voltage circuit due to stuck at 1, stuck at 0, stuck at on, drift and oscillation in Reference voltage circuit. |

### I. Controller

This device controls the entire voltage regulator, providing the respective supplies to the individual components.

| Field | Entry |
|---|---|
| Potential failure mode | Internal failures |
| Element classification as per IEC | Discrete hardware |
| Potential effect(s) of failure | Incorrect data or address |
| FMEDA description | Incorrect data or address for Register caused by Controller due to internal failures in Controller. |

Fig 2

## IV. FMEA

Failure Mode Effect Analysis [5] is the process of identifying all the failure modes of a design and their effects relative to the desired state. By defining the possible effects of each failure, we define the local and final effects and try to identify the possible causes and their effective solutions. We can thus determine all the local effects arising from the possible failure causes, determine the possible states of the voltage regulator, and apply a suitable mathematical tool to calculate how long the design would remain in each state.

## V. FAILURE STATES OF VOLTAGE REGULATOR

As can be seen from the failure modes, there is one zone that can lead to a dangerous undetected fault. If the voltage is in the critical region, the processor could be operating at the absolute minimum operating voltage, leading to unpredictable behavior. To model the probability of the voltage regulator operating in this region, we can represent it as a state machine and identify the probabilities of the transitions between the different states. A Markov model is a stochastic model used to model randomly changing systems, where the future state depends only on the current state. This assumption holds in our case because the transition from an operational state to the fail-safe state depends on the voltage dropping below the critical voltage.

## VI. MARKOV MODELLING & RESULTS

A Markov model is depicted as a state machine in which a transition from a source state to a destination state is represented by an arrow. As shown in Figure 4, the transition from the state voltage $< V_{Min}$ to the state ON has a probability of 50%. Remaining in the same state is represented by an arrow from the state back to itself. From the state machine, we can identify the states that lead to a safe fault and the states that lead to a dangerous fault that can potentially violate the safety goal. The dimension of the state transition matrix is determined by the number of states in the Markov model. The model helps in identifying the likelihood of the voltage regulator falling into the fail-dangerous state, thereby violating the safety goal.
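The steady-state probabilities of a Markov model of this kind can be obtained by solving the balance equations together with the condition that the probabilities sum to 1. The sketch below is an illustration, not the authors' tool. It uses the transition values of the matrix $T$ given in this section and assumes the state order ON, VCRIT, VMIN, FS; the helper names `solve_linear` and `steady_state` are hypothetical. The last row of $T$ as printed sums to 0.1 rather than 1, and the values are used as printed.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot to keep the division stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def steady_state(t):
    """Return p with p (I - T) = 0 and sum(p) = 1 for a row-oriented T."""
    n = len(t)
    # A = I - T
    a = [[(1.0 if i == j else 0.0) - t[i][j] for j in range(n)]
         for i in range(n)]
    # Transpose A so that each row is one balance equation.
    at = [[a[j][i] for j in range(n)] for i in range(n)]
    # Replace the first balance equation with the normalization condition.
    at[0] = [1.0] * n
    rhs = [1.0] + [0.0] * (n - 1)
    return solve_linear(at, rhs)


STATES = ["ON", "VCRIT", "VMIN", "FS"]  # assumed state order
T = [
    [0.6, 0.2, 0.2, 0.0],
    [0.2, 0.35, 0.0, 0.45],
    [0.5, 0.0, 0.4, 0.1],
    [0.0, 0.0, 0.0, 0.1],  # as printed; this row does not sum to 1
]

probabilities = steady_state(T)
for name, p in zip(STATES, probabilities):
    print(f"{name:5s} {p:.6f}")
```

With these inputs the sketch gives approximately 0.545879, 0.167963, 0.181960 and 0.104199 for ON, VCRIT, VMIN and FS.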
Figure 4: Markov model

The state transition matrix for the given state machine, with the states ordered as ON, VCRIT, VMIN and FS, is given below.

$$T = \begin{pmatrix} 0.6 & 0.2 & 0.2 & 0 \\ 0.2 & 0.35 & 0 & 0.45 \\ 0.5 & 0 & 0.4 & 0.1 \\ 0 & 0 & 0 & 0.1 \end{pmatrix}$$

$$A = I - T = \begin{pmatrix} 0.4 & -0.2 & -0.2 & 0 \\ -0.2 & 0.65 & 0 & -0.45 \\ -0.5 & 0 & 0.6 & -0.1 \\ 0 & 0 & 0 & 0.9 \end{pmatrix}$$

Transposing $A$ and replacing its first row with ones, which imposes the condition that the state probabilities sum to 1, gives

$$\tilde{A} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -0.2 & 0.65 & 0 & 0 \\ -0.2 & 0 & 0.6 & 0 \\ 0 & -0.45 & -0.1 & 0.9 \end{pmatrix}$$

Solving the matrix to obtain the probabilities for each state we get

$$\begin{pmatrix} ON \\ VCRIT \\ VMIN \\ FS \end{pmatrix} = \tilde{A}^{-1} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.5459 & -1.25972 & -1.01089 & -0.60653 \\ 0.1680 & 1.150855 & -0.31104 & -0.18663 \\ 0.18196 & -0.41991 & 1.329705 & -0.20218 \\ 0.104199 & 0.528771 & -0.00778 & 0.995334 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.545879 \\ 0.167963 \\ 0.18196 \\ 0.104199 \end{pmatrix}$$

From the given calculation we see that the likelihood of the voltage regulator operating in a dangerous condition is approximately 16% (the VCRIT entry).

## VII. CONCLUSION

A study was done taking off-the-shelf voltage regulators, and the FIT numbers were obtained from the supplier. The diagnostic coverage (DC) was estimated using the Markov model as described in this paper and was compared against the diagnostic coverage of 60% and 90% as prescribed by the standards. The results are shown in Table I. The residual FIT obtained using the Markov model was 0.16 FIT, which corresponds to 84% DC.

| Voltage regulator | FIT provided by vendor | Residual FIT with 60% DC | Residual FIT with 90% DC |
|---|---|---|---|
| 1 | 0.1 | 0.04 | 0.001 |
| 2 | 0.2 | 0.08 | 0.002 |
| 3 | 2.6 | 1.04 | 0.026 |
| 4 | 1.2 | 0.48 | 0.021 |

Table I: FIT estimation using Markov model

## VIII. SUMMARY

This paper describes a novel method of computing the residual dangerous FIT through analysis.
This approach can reduce the overall cost of the product by reducing the number of safety mechanisms that would otherwise be implemented under a pessimistic approach. Before recommending either hardware or software safety mechanisms for risk reduction, an analysis can be done using a Markov model to identify the potential residual FIT without any safety mechanism. If the residual FIT is low enough that it does not affect the overall PFDAvg calculation, then we can avoid recommending any additional hardware or software safety mechanisms.

## IX. ACKNOWLEDGMENT

This will be added later.

## REFERENCES

1. Dimitris Gizopoulos, Advances in Electronic Testing: Challenges and Methodologies, Springer.
2. Jerry M. Soden, R. Keith Treece, Michael R. Taylor and Charles F. Hawkins, CMOS IC Stuck-Open Fault Electrical Effects and Design Considerations.
3. IEC 61508, Part 7, Annex A, Table A.8.
4. Karthik Hemmanur, Inter Integrated Circuits, 2009.
5. D. H. Stamatis, Failure Mode and Effect Analysis: FMEA from Theory to Execution, 2003.
6. William M. Goble, Control Systems Safety Evaluation and Reliability.
7. Part no. TPS566250, Technical Documentation, Texas Instruments.
8. R. Billinton and R. N. Allan, Reliability Evaluation of Engineering Systems: Concepts and Techniques. NY: Plenum Press, 1983.