Root Cause Analysis / Defect Elimination Guide
Vapor bubble collapse causing pitting, noise, and reduced flow. Caused by insufficient NPSH, high suction lift, or restrictions.
Leakage at mechanical seal due to wear, misalignment, dry running, or improper flush. Often secondary to other issues.
Erosion or corrosion of impeller vanes causing imbalance, reduced head/flow. Common with abrasive or corrosive fluids.
Spalling, pitting, or seizure. Caused by misalignment, lubrication issues, contamination, or overload.
Angular or offset misalignment between pump and driver. Causes high vibration at 1X and 2X, coupling wear, seal damage.
Operation without fluid causing rapid seal and bearing damage. A sharp drop in motor current indicates loss of load.
Degradation of stator insulation due to heat, contamination, or voltage stress. Leads to ground faults or phase-to-phase shorts.
Most common motor failure. Caused by lubrication issues, contamination, shaft currents, or misalignment.
Cracked or broken rotor bars causing slip-related sidebands and current signature changes. Causes efficiency loss.
Shaft voltage discharge through bearings causing fluting/pitting. Requires shaft grounding or insulated bearings.
Uneven mass distribution causing 1X vibration. From manufacturing defects, buildup, or coupling issues.
Excessive temperature from overload, voltage imbalance, poor cooling, or high ambient. Accelerates insulation aging.
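The slip-related sidebands mentioned for rotor bar defects are commonly looked for at the line frequency multiplied by (1 ± 2s), where s is the per-unit slip. The sketch below works through that arithmetic. The function name and the example motor (60 Hz supply, 4 poles, 1764 rpm) are illustrative assumptions, not values from this guide.

```python
def rotor_bar_sidebands(line_hz: float, poles: int, rotor_rpm: float):
    """Return (slip, lower_sideband_hz, upper_sideband_hz) for an induction motor.

    Uses the widely quoted relationship f_sideband = f_line * (1 +/- 2*slip).
    """
    sync_rpm = 120.0 * line_hz / poles          # synchronous speed in rpm
    slip = (sync_rpm - rotor_rpm) / sync_rpm    # per-unit slip
    lower = line_hz * (1.0 - 2.0 * slip)
    upper = line_hz * (1.0 + 2.0 * slip)
    return slip, lower, upper


# Hypothetical motor used only for illustration.
slip, lower, upper = rotor_bar_sidebands(line_hz=60.0, poles=4, rotor_rpm=1764.0)
print(f"slip={slip:.3f}, sidebands at {lower:.1f} Hz and {upper:.1f} Hz")
```

Because slip changes with load, the sideband spacing moves with it, so current spectra are usually compared at similar load levels.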
Subsurface crack propagation leading to material flaking. Normal wear-out mode with identifiable bearing frequencies (BPFO, BPFI, BSF, FTF).
Insufficient, excessive, contaminated, or wrong lubricant. Causes metal-to-metal contact, heat generation, and rapid wear.
Ingress of particles, water, or process fluids causing abrasive wear and surface damage. Often from poor sealing.
Wear marks from small oscillations while stationary. Common in standby equipment exposed to external vibration.
Uneven load distribution from shaft misalignment causing localized wear patterns and reduced life.
Washboard pattern on raceways from electrical discharge. VFD-related or poor grounding. Distinct high-frequency noise.
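The bearing frequencies named above (BPFO, BPFI, BSF, FTF) can be estimated from bearing geometry. The following is a minimal sketch using the standard kinematic formulas, assuming a stationary outer race, a rotating inner race, and no slippage. The function name and the example geometry are illustrative assumptions; real analysis should use the manufacturer's data for the installed bearing.

```python
import math


def bearing_defect_frequencies(shaft_hz, n_elements, element_dia, pitch_dia,
                               contact_angle_deg=0.0):
    """Estimate bearing defect frequencies in Hz.

    Assumes the outer race is stationary and the inner race turns at shaft_hz.
    element_dia and pitch_dia must use the same length unit.
    """
    ratio = (element_dia / pitch_dia) * math.cos(math.radians(contact_angle_deg))
    ftf = 0.5 * shaft_hz * (1.0 - ratio)                 # cage (fundamental train)
    bpfo = 0.5 * n_elements * shaft_hz * (1.0 - ratio)   # outer race defect
    bpfi = 0.5 * n_elements * shaft_hz * (1.0 + ratio)   # inner race defect
    bsf = (pitch_dia / (2.0 * element_dia)) * shaft_hz * (1.0 - ratio ** 2)  # element spin
    return {"FTF": ftf, "BPFO": bpfo, "BPFI": bpfi, "BSF": bsf}


# Illustrative geometry only (9 elements, dimensions in mm), shaft at 30 Hz.
freqs = bearing_defect_frequencies(shaft_hz=30.0, n_elements=9,
                                   element_dia=7.94, pitch_dia=39.04)
for name, hz in freqs.items():
    print(f"{name}: {hz:.1f} Hz")
```

A quick sanity check on any result is that BPFO plus BPFI equals the number of rolling elements times the shaft frequency.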
Progressive wear from normal operation, misalignment, or contamination. Shows gear mesh frequencies and harmonics.
Fracture from fatigue, overload, or impact. Strong 1X component of the affected shaft, with sidebands around the gear mesh frequency.
Surface fatigue on tooth faces. Early pitting may self-heal; advanced pitting leads to tooth failure.
Oil degradation, contamination, or wrong viscosity causing inadequate film and accelerated wear.
Input/output shaft misalignment causing uneven tooth loading, high vibration, and bearing stress.
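The gear mesh frequency referred to above is the number of teeth on a gear multiplied by the rotational frequency of its shaft, and sidebands typically appear around it at multiples of shaft speed. The sketch below shows that arithmetic for a hypothetical single-stage reducer. The tooth counts, speed, and function names are assumptions chosen for illustration.

```python
def gear_mesh_frequency(teeth: int, shaft_rpm: float) -> float:
    """Gear mesh frequency in Hz: tooth count times shaft frequency."""
    return teeth * shaft_rpm / 60.0


def sidebands(gmf_hz: float, shaft_rpm: float, orders: int = 2):
    """Frequencies at GMF +/- k * shaft frequency for k = 1..orders."""
    shaft_hz = shaft_rpm / 60.0
    return [gmf_hz + k * shaft_hz for k in range(-orders, orders + 1) if k != 0]


# Hypothetical reducer: 23-tooth pinion at 1800 rpm driving a 69-tooth gear.
pinion_teeth, gear_teeth, input_rpm = 23, 69, 1800.0
output_rpm = input_rpm * pinion_teeth / gear_teeth

gmf = gear_mesh_frequency(pinion_teeth, input_rpm)
print(f"GMF = {gmf:.0f} Hz")
print("Input-shaft sidebands:", sidebands(gmf, input_rpm))
```

The mesh frequency is the same whichever gear of the pair is used to compute it, which makes a useful cross-check on tooth counts and speeds.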
| Category | Examples | Primary Detection | Secondary Detection | Root Cause Areas |
|---|---|---|---|---|
| Mechanical | Misalignment, imbalance, looseness, resonance | Vibration Analysis | Thermal, Ultrasound | Installation, foundation, coupling |
| Electrical | Insulation breakdown, rotor defects, connections | Motor Testing (MCE/ESA) | Thermal, Current | Power quality, VFD, overload |
| Lubrication | Under/over lubrication, contamination, wrong type | Ultrasound, Oil Analysis | Thermal, Vibration | Procedures, training, sealing |
| Process | Cavitation, dry running, off-design operation | Process Parameters | Vibration, Thermal | Operation, design, control |
| Wear | Fatigue, erosion, corrosion, abrasion | Vibration, Oil Analysis | Visual, NDT | Material, environment, age |
Earliest detection of bearing wear, lubrication issues. Particle counts, wear metals, ultrasonic dB levels.
Detects bearing defect frequencies, imbalance, misalignment. Requires baseline and trending.
Hot spots visible when significant friction/resistance present. Electrical connections, bearings, motors.
Obvious symptoms - grinding, squealing, smoke, visible damage. Failure often imminent.
Task Interval ≤ P-F Interval ÷ 2
This ensures at least one inspection falls inside the P-F window, so a developing failure is detected before functional failure. Example: If P-F = 3 months, inspect at least every 6 weeks.
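The rule above is simple enough to sketch in a few lines. The example applies it to two P-F ranges quoted in weeks in this guide's P-F table. Sizing the interval on the shortest value in each range is a conservative assumption made here, not something the guide prescribes, and the function name is hypothetical.

```python
def max_task_interval(pf_interval: float) -> float:
    """Longest inspection interval that satisfies Task Interval <= P-F / 2.

    The result is in the same time unit as pf_interval.
    """
    if pf_interval <= 0:
        raise ValueError("P-F interval must be positive")
    return pf_interval / 2.0


# P-F ranges in weeks (low, high).
pf_ranges_weeks = {
    "Seal degradation": (2, 8),
    "Electrical connections": (1, 4),
}

# Assumption: use the low end of each range to stay on the safe side.
intervals = {mode: max_task_interval(low)
             for mode, (low, _high) in pf_ranges_weeks.items()}

for mode, weeks in intervals.items():
    print(f"{mode}: inspect at least every {weeks:g} week(s)")
```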
| Failure Mode | P-F Interval | Detection Technology |
|---|---|---|
| Bearing fatigue (spalling) | 1-9 months | Vibration, Ultrasound, Oil |
| Lubrication breakdown | 1-6 months | Ultrasound, Oil Analysis |
| Gear tooth wear | 1-6 months | Vibration, Oil Analysis |
| Pump cavitation | 1-3 months | Vibration, Ultrasound |
| Motor insulation | 6-12 months | MCE, Megger, PD |
| Belt wear | 1-3 months | Visual, Vibration |
| Coupling wear | 1-4 months | Vibration, Visual |
| Seal degradation | 2-8 weeks | Visual, Leak detection |
| Electrical connections | 1-4 weeks | Thermal, Resistance |
| Structural fatigue | Months-Years | NDT, Visual inspection |
Ask "Why?" repeatedly (typically 5 times) to drill down from symptom to root cause. Simple but effective for straightforward failures.
Categorize potential causes into standard groups. Use the 6 M's for manufacturing/maintenance analysis.
Top-down deductive analysis starting from the undesired event (top event) and working down to identify all possible causes using logic gates.
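To make the logic gates concrete, here is a minimal sketch of how a small fault tree can be evaluated numerically. It assumes the basic events are statistically independent, which is a simplification. The tree structure and all probabilities are hypothetical and chosen only for illustration.

```python
from math import prod


def and_gate(probabilities):
    """Output occurs only if every input event occurs (independent events)."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output occurs if at least one input event occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical basic-event probabilities over some mission time.
p_lubrication_failure = 0.05
p_contaminant_present = 0.10
p_seal_degraded = 0.20

# Contamination reaches the bearing only if a contaminant is present
# AND the seal has degraded.
p_contamination_ingress = and_gate([p_contaminant_present, p_seal_degraded])

# Top event: bearing seizure from lubrication failure OR contamination ingress.
p_top = or_gate([p_lubrication_failure, p_contamination_ingress])
print(f"P(top event) = {p_top:.3f}")
```

Even without reliable probabilities, writing the tree this way forces each branch to state whether its causes must occur together or can act alone.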
Research by Winston Ledet shows that failures originate from three source categories:
"Careless" doesn't mean irresponsible - it means not providing the care the equipment needs. This includes design engineers specifying wrong materials, maintenance engineers omitting failure modes from PM programs, or managers not acting on performance trends. These are controllable!
Use Pareto analysis on downtime, maintenance cost, and failure frequency. 5-10% of equipment typically causes 80% of losses.
For significant failures, use 5 Whys, Fishbone, or FTA to find root causes. Focus on controllable causes.
Create SMART actions addressing root causes. Consider design changes, procedure updates, training, and PM optimization.
Enter actions into CMMS. Track completion and verify effectiveness at 6, 12, and 24 months.
Share lessons learned. Update procedures and training. The goal is "Fix Forever, Not Forever Fixing."
| Category | What to Look For | Detection Method | Typical Causes |
|---|---|---|---|
| Looseness | Loose bolts, guards, covers, foundation | Visual, vibration, touch | Vibration, improper torque, thermal cycling |
| Leakage | Oil, grease, water, process fluid leaks | Visual, UV dye, ultrasound | Seal wear, gasket failure, over-lubrication |
| Contamination | Dirt, debris, water ingress, product buildup | Visual, oil analysis | Poor sealing, housekeeping, environment |
| Corrosion | Rust, pitting, discoloration, scaling | Visual, UT thickness | Environment, chemical exposure, coating failure |
| Wear | Belts, couplings, guards, seals showing wear | Visual, measurement | Normal operation, misalignment, contamination |
| Heat | Hot spots on motors, bearings, connections | Thermal imaging, touch | Friction, electrical resistance, overload |
| Noise/Vibration | Unusual sounds, excessive vibration | Listen, feel, instruments | Imbalance, misalignment, looseness, wear |
| Documentation | Missing labels, outdated procedures, P&IDs | Review, audit | MOC gaps, poor practices |
"Bad Actors" are equipment that consistently underperforms or causes repeated failures. Typically 5-10% of equipment causes 80% of losses.
Analyze your CMMS data to find equipment with:
A "bad actor" pump may actually be a victim of upstream process issues, poor installation, or inadequate procedures. Always investigate the root cause before labeling equipment as problematic.
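The Pareto ranking used to find bad actors can be sketched as follows. The example totals downtime per asset, sorts in descending order, and keeps assets until a cumulative share of losses is reached. The record layout, asset tags, hours, and the 80% threshold parameter are assumptions for illustration; a real CMMS export will need its own field mapping.

```python
from collections import defaultdict


def pareto_bad_actors(work_orders, threshold=0.80):
    """Return [(asset, total_hours, cumulative_share)] covering `threshold` of downtime.

    work_orders is an iterable of (asset_id, downtime_hours) pairs.
    """
    totals = defaultdict(float)
    for asset, hours in work_orders:
        totals[asset] += hours

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    result, cumulative = [], 0.0
    for asset, hours in ranked:
        cumulative += hours
        share = cumulative / grand_total
        result.append((asset, hours, share))
        if share >= threshold:
            break  # enough assets to explain the target share of losses
    return result


# Hypothetical work-order history: (asset tag, downtime hours).
history = [
    ("P-101", 40), ("M-205", 15), ("P-101", 30),
    ("GB-310", 10), ("P-102", 5),
]

for asset, hours, share in pareto_bad_actors(history):
    print(f"{asset}: {hours:.0f} h, cumulative {share:.0%}")
```

The same function can be run on maintenance cost or failure count instead of downtime. Assets that rank high on more than one measure are the strongest candidates for root cause analysis.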
| Step | Action | Owner | Timeline | Deliverable |
|---|---|---|---|---|
| 1 | Extract maintenance data from CMMS (12-24 months) | Reliability Engineer | Week 1 | Data export, Pareto chart |
| 2 | Identify top 10 bad actors by cost/downtime | Reliability Engineer | Week 1 | Prioritized equipment list |
| 3 | Form cross-functional team for top 3 | Maintenance Manager | Week 2 | Team charter, meeting schedule |
| 4 | Conduct detailed failure analysis (RCA) | RCA Team | Weeks 2-4 | RCA reports with root causes |
| 5 | Develop corrective actions with cost/benefit | RCA Team | Week 4 | Action plan with ROI |
| 6 | Present and approve recommendations | Management | Week 5 | Approved action plan |
| 7 | Implement changes (CMMS, procedures, design) | Assigned owners | Weeks 6-12 | Completed actions |
| 8 | Monitor and verify effectiveness | Reliability Engineer | 6, 12, 24 months | Performance tracking report |
Research by Winston Ledet (DuPont/Manufacturing Game) shows:
Best-performing sites achieve >98% uptime through the combination of effective planning/scheduling, preventive/predictive maintenance, and defect elimination (DE). You cannot achieve world-class reliability without DE!