ATX PSU Testing and troubleshooting

PSUs don’t always *just* die, and they can cause no end of issues. Just thought I’d post the test procedure we use.

ALWAYS work with your bench/work area protected with RCDs

Learn (if you don’t know) how to use a multimeter and/or a scope. Cheap but good-enough portable scopes abound on eBay for under £100, if not less. On the scope you are looking for the presence of something, not measuring it. I have a nice Metex portable multimeter with a DMM that goes everywhere.

Until you have done the first two tests below, assume this machine is dangerous. Avoid touching exposed metal.

If you can, PAT test the PSU and its lead. If it fails, isolate what fails (lead or PSU) and replace it. Don’t even try to sort a PSU that can’t pass a PAT test. (There are brands that repeatedly fail PAT testing when done with a decent tester, even when new.)

If you can’t PAT test, check for continuity between the metal PC case and the screws on the socket at the wall. You are looking for no more than a few (<10) ohms. If it is significantly higher you need to find out where the earth issue lies before going on. Surge arrestors are a frequent cause, especially if they have been zapped.

With your meter on AC voltage, plug in just the mains lead, switch everything on and, if fitted, turn on the PSU power switch on the back. Meter between bare metal on the case and the screws of the wall socket it’s plugged into.

Up to a few volts is fine. Over about 10 you may have a minor problem; over 30 you have a serious issue. If you plug the monitor in and the voltage vanishes, the PSU is becoming leaky and the monitor (if it’s earthed) has grounded it out. This can be the cause of frequent monitor failures. It also means you aren’t using your PAT tester properly if it passed 🙂
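If you log your bench readings, the earth and leakage thresholds above are easy to turn into a quick check. This is only a sketch in Python: the function names are made up for illustration, and lumping everything up to 10V in as "ok" is an assumption, since the procedure only says a few volts is fine.

```python
def earth_continuity_ok(ohms):
    """Case-to-socket-earth resistance should be no more than a few ohms (<10)."""
    return 0 <= ohms < 10.0


def classify_case_voltage(volts_ac):
    """Grade the AC voltage metered between bare case metal and socket earth."""
    if volts_ac > 30.0:
        return "serious"  # over 30V: serious issue
    if volts_ac > 10.0:
        return "minor"    # over about 10V: may be a minor problem
    return "ok"           # assumption: anything up to 10V is treated as fine


for reading in (2.1, 14.0, 42.0):
    print(f"{reading:5.1f} V AC -> {classify_case_voltage(reading)}")
```

A result other than "ok" is a prompt to go and find the leak, not a diagnosis in itself.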

With power off and the PC disconnected from the wall (just the PC; leave anything else plugged in for this), check that all rails are zero. Any voltage on +5V or +5V Standby indicates a faulty USB device: many powered USB devices don’t stop their power supplies trying to power the bus. This isn’t really a good idea and can cause boot faults, power/no POST faults or no-power faults. Evo Labs, I’m looking at you here. Disconnect things until the voltage goes away, then bin the responsible item (or meter pins 1 and 4 on the device’s USB port to check, then bin it).

Meter all voltages while off with mains on. Check +5V USB/Standby is in tolerance; what you call in tolerance is down to you, but we will reject below 4.7V. Make sure everything else is off: more than 200mV on any supply line while off is a reject. It points at control IC issues or a breakdown happening in the transformer. Voltage on the +5V rail may indicate the presence of USB devices that aren’t playing ball, see above.
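The off-state check above boils down to two numbers: standby no lower than 4.7V, and no more than 200mV on anything else. A minimal Python sketch of that, assuming you jot the meter readings into a dict; the rail names and the function name are illustrative, not part of any tool.

```python
STANDBY_MIN_V = 4.7   # reject +5V standby below this (figure from the procedure)
OFF_RAIL_MAX_V = 0.2  # more than 200mV on any other rail while off is a reject


def check_off_state(readings, standby_name="+5VSB"):
    """Return a list of fault strings for readings taken with mains on, PC off.

    readings maps a rail name to the metered voltage in volts.
    """
    faults = []
    for rail, volts in readings.items():
        if rail == standby_name:
            if volts < STANDBY_MIN_V:
                faults.append(f"{rail}: {volts:.2f}V is below {STANDBY_MIN_V}V")
        elif abs(volts) > OFF_RAIL_MAX_V:
            faults.append(f"{rail}: {volts:.2f}V present while off")
    return faults


readings = {"+5VSB": 4.62, "+12V": 0.01, "+5V": 0.35, "+3.3V": 0.00}
for fault in check_off_state(readings):
    print("REJECT", fault)
```

Remember that a reading on +5V here may be a USB device back-feeding the bus rather than the PSU itself, so unplug things before condemning the supply.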

Any issues at this point will cause power/no POST, intermittent POST or a dead machine (often with a faintly glowing power LED).

Check the +5V USB/Standby is clean with the scope. It *should* appear as a flat line. If it doesn’t, check your earths, especially the meter earth lead. If you know how to read the display properly, artifacts at 50Hz are normally pickup by the meter but could indicate a primary (mains side) issue. Any higher frequency spikes above a few mV mean the PSU is failing, and in particular that the filter caps are probably on the way out or it’s badly overloaded. Disconnect all but the mains lead and check again. If the spikes have gone, work backwards, plugging things in till they reappear. USB novelties are the worst offenders here, along with cheap hubs that may be overloaded.

Again, any issues at this point will cause power/no POST, intermittent POST or a dead machine (often with a faintly glowing power LED), but a noisy 5V rail can also cause the machine to randomly lock up and in some cases turn itself on.

Noisy rails seem to be an issue with Evo Labs and clones, JueJye, earlier Enlight, and Acer’s own. The normal cause is poor quality control, removal of filter components to reduce cost, or falling victim to capacitor plague.

Power the machine up and work through the rails. Voltage first, then scope. Typical values are available at https://en.wikipedia.org/wiki/ATX but the rule of thumb is 10% allowable tolerance on all rails (the ATX spec itself is tighter, ±5% on the positive rails). If you start seeing more than one rail badly out then you have an unhappy or overloaded PSU.
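Working out percentages in your head at the bench gets old quickly. Here is a rough Python sketch of the 10% rule of thumb; the function names and the example readings are invented, and the tolerance is a parameter so you can tighten it if you prefer.

```python
def deviation_percent(nominal, measured):
    """Signed deviation of a reading from its nominal voltage, in percent."""
    return (measured - nominal) / abs(nominal) * 100.0


def rails_out_of_tolerance(readings, tolerance_percent=10.0):
    """Return the names of rails that are further out than the tolerance.

    readings maps rail name -> (nominal volts, measured volts).
    """
    return [
        rail
        for rail, (nominal, measured) in readings.items()
        if abs(deviation_percent(nominal, measured)) > tolerance_percent
    ]


readings = {
    "+12V": (12.0, 11.9),
    "+5V": (5.0, 4.4),
    "+3.3V": (3.3, 3.31),
    "-12V": (-12.0, -11.2),
}
bad = rails_out_of_tolerance(readings)
print("out of tolerance:", bad)
if len(bad) > 1:
    print("more than one rail badly out: unhappy or overloaded PSU")
```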

The spikes on the rails should be as low as possible, and regular if present at all. Excessive spikes (also called ripple) indicate a rail is way overloaded or its filters are failing. Excessive ripple will cause just about any random fault you can imagine, from making RAM seem faulty to cooking off graphics cards. High ripple is very, very hard on the PC; it will cause failures and can cause severe damage, especially if the PSU fails.
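If your scope can export samples, ripple is just the peak-to-peak swing of the trace. A small Python sketch follows. The limits in it are NOT from this procedure: they are the figures I understand the ATX12V design guide to give (120mV on 12V, 50mV on 5V and 3.3V), so treat them as an assumption and check them against the spec. The names are illustrative.

```python
# Assumed limits in mV peak to peak (believed to match the ATX12V design
# guide; verify before relying on them).
RIPPLE_LIMIT_MV = {"+12V": 120.0, "+5V": 50.0, "+3.3V": 50.0}


def ripple_mv(samples_v):
    """Peak-to-peak ripple in millivolts from scope samples given in volts."""
    return (max(samples_v) - min(samples_v)) * 1000.0


def ripple_ok(rail, samples_v):
    """True if the peak-to-peak ripple is within the assumed limit for the rail."""
    return ripple_mv(samples_v) <= RIPPLE_LIMIT_MV[rail]


trace = [5.00, 5.06, 4.97, 5.01]  # made-up samples from a +5V rail
print(f"+5V ripple: {ripple_mv(trace):.0f} mV p-p, ok={ripple_ok('+5V', trace)}")
```

This only measures how big the spikes are. Whether they are regular or irregular still needs eyes on the scope.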

Irregular spikes can indicate the PSU is close to its limit or something is going into protection somewhere. A few aren’t much to worry about, but if these spikes coincide with a lockup or crash they need to be investigated. If you look closely you may see the spikes on the monitor as flickers or faint bars. The latter typically points at filter caps on the board or graphics card having gone bad. A quick visual inspection will normally find bulged or ruptured caps.

All of these issues can be present on a working PSU. If you don’t go looking for them you’ll never find them, and you can end up chasing your tail for hours. A simple substitution is always a good way to start a fault-finding mission, as PSU faults can show up anywhere.

An overloaded PSU will fail, and don’t assume that because you are using 450W and you have a 500W PSU you are safe:
1) The PSU may not be capable of 450W; lower end PSUs often quote a peak power figure that they can only briefly sustain.
2) Your 450W may not be their 450W. Take all the rails listed on the side with their currents and manually convert them to watts, e.g. 12V*16A = 192W. You will often find on lower end PSUs that the power doesn’t add up when you sum them, or that all the power is on +3.3V or +5V when it’s needed on +3.3V AND +12V.
3) With the above info and the product data for your components, work out the power consumption on a PER RAIL basis. You may quickly find you don’t have enough power.
4) Give yourself headroom. A 450W load on a 500W PSU is too close. You don’t know how accurate those figures are, and as systems age they can draw more power as caps dry out, fans start to stiffen up etc.
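The per-rail sums are simple enough to script. This Python sketch converts the label figures to watts and flags rails where the estimated load eats too far into the rating. The label values and loads are invented for the example, and the 80% ceiling is a hypothetical figure of mine: the list above only says that 450W on a 500W supply is too close.

```python
def rail_watts(volts, amps):
    """Power available on one rail, e.g. 12V * 16A = 192W."""
    return volts * amps


def label_watts(label):
    """label maps rail name -> (volts, amps) as printed on the side of the PSU."""
    return {rail: rail_watts(v, a) for rail, (v, a) in label.items()}


def rails_short(label, needed_watts, max_load_fraction=0.8):
    """Rails whose estimated load exceeds the chosen fraction of the rating.

    max_load_fraction=0.8 is an assumed headroom figure, not a spec value.
    """
    available = label_watts(label)
    return [
        rail
        for rail, need in needed_watts.items()
        if need > available.get(rail, 0.0) * max_load_fraction
    ]


label = {"+3.3V": (3.3, 20), "+5V": (5.0, 25), "+12V": (12.0, 16)}
needed = {"+3.3V": 30, "+5V": 40, "+12V": 180}

print(label_watts(label))
print("sum of rails:", round(sum(label_watts(label).values())), "W")
print("short on:", rails_short(label, needed))
```

Comparing the printed sum against the headline wattage on the box is a quick way to spot a label that doesn’t add up.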

When you buy one:
1) Check the figures as per above. Add them up and look at where the power is.
2) If it’s split rail, check continuity between the 12V lines with your meter, with the PSU on the bench and disconnected. You should find two or more distinct 12V lines, normally motherboard and drives, then GFX and ATX12V. If you get 0 ohms between all the 12V lines then there is no split rail.
3) Look at the wiring. The wires should be substantial but not too thick. They certainly shouldn’t be like bell or phone wire. Thinner wires lose more voltage.

On the wiring, you need to watch things closely. Thinner wires mean (as a rule) more resistance, and this leads to something called Joule heating. Under a constant load this will reach a fixed point and won’t get any worse. However, as the voltage at the device end drops, more current is drawn; this causes a greater voltage drop and more Joule heating, and the result is either the device malfunctioning and shutting down OR catastrophic failure of the wire or device: either fire or a failure at a connector (this is what melts the SATA adaptors). If the device periodically and briefly demands high power, this can cause voltage dropouts and other issues without the burning. The biggest culprit for this is hard drives on the end of a long chain of devices. This will result in the drive rebooting, data loss and eventually the drive failing.
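To put rough numbers on that feedback loop, here is a Python sketch that models the device as a constant-power load on the end of a resistive wire. That is an idealisation (real devices only approximate it), and the function name and example figures are made up. The load needs (V - I*R) * I = P, which is a quadratic in I; if it has no real solution the wire simply cannot deliver that power and the voltage collapses.

```python
import math


def wire_operating_point(supply_v, wire_ohms, load_watts):
    """Return (current A, volts at device, watts lost in wire), or None.

    Solves wire_ohms*I^2 - supply_v*I + load_watts = 0 for the smaller root,
    which is the stable operating point of a constant-power load.
    None means no operating point exists: the load cannot be supplied.
    """
    if wire_ohms == 0:
        return load_watts / supply_v, supply_v, 0.0
    disc = supply_v ** 2 - 4.0 * wire_ohms * load_watts
    if disc < 0:
        return None
    current = (supply_v - math.sqrt(disc)) / (2.0 * wire_ohms)
    device_v = supply_v - current * wire_ohms
    wire_heat = current ** 2 * wire_ohms  # Joule heating, P = I^2 * R
    return current, device_v, wire_heat


for ohms in (0.02, 0.1, 0.5, 2.0):
    point = wire_operating_point(12.0, ohms, 24.0)
    if point is None:
        print(f"{ohms} ohm: cannot deliver 24W, voltage collapses")
    else:
        amps, volts, heat = point
        print(f"{ohms} ohm: {amps:.2f}A, {volts:.2f}V at device, {heat:.2f}W in wire")
```

The thing to notice in the output is that the heat in the wire rises faster than the resistance does, because the current goes up as well.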

Well that’s annoying – Pi I2C Woes & HAL

Well, that’s not quite what I said when I found my bug about 15 mins ago, but it has the same sort of general meaning.

We are using the Pi in more and more things, and to make that easier I’ve been writing our own HAL to make the GPIO and onboard peripherals easy to use: wrapping up things that others kinda did, adding new bits and dropping in native support for our hardware. It’s been going well, then a few weeks back I hit a road block or two.

The first is repeated start. The long answer is that using the IO wrappers for I2C you can’t do it. Looking deeper into it, the Python lib fudges it too by doing a bulk read, so no biggie there. It’s not fixable but easily worked around, and in many ways a bulk read is a nicer way to do it.

The second was driving me nuts. I couldn’t write a register. I could select it, but the peripheral always sent 0. After a few hours of head scratching I had to get on with other work, and it’s only today I’ve been able to get back to it.

The long and short of it is that I was trying to be too clever and not thinking about it in a *nix manner. I was opening the bus and then assuming that I could move addresses round as I needed to talk to things; turns out that doesn’t work (it may be tied in with the above too). Changing the target address once you’ve set it seems to bugger up the driver, and it then sits and sulks. I’m not sure why this should be so, but it is. Assigning a new file handle for each device (FPOpen()) and then setting the address works, so it looks like that’s the way I need to do this. The upshot is that the whole I2C section of the HAL (which to be fair wasn’t written) needs redoing. It also points to some issues that were going on with the IMU on the Tezero platform too, so it’s another point in the argument for moving Tezero onto the HAL.
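For anyone wanting to try the one-handle-per-device pattern, here it is sketched in Python rather than Lazarus, against the standard Linux /dev/i2c-N interface. It is not the HAL code: the function names are made up, os.open stands in for FPOpen(), and the bus number and addresses in the usage comment are hypothetical. I2C_SLAVE (0x0703) is the ioctl request from linux/i2c-dev.h.

```python
import fcntl
import os

I2C_SLAVE = 0x0703  # ioctl request from linux/i2c-dev.h: set target address


def open_i2c_device(bus, address):
    """Open a fresh handle on the bus and bind it to ONE device address."""
    if not 0 <= address <= 0x7F:
        raise ValueError("expected a 7-bit I2C address")
    fd = os.open(f"/dev/i2c-{bus}", os.O_RDWR)
    try:
        fcntl.ioctl(fd, I2C_SLAVE, address)
    except OSError:
        os.close(fd)
        raise
    return fd


def write_register(fd, register, value):
    """Write one byte to a register: register number, then the value."""
    os.write(fd, bytes([register, value]))


def read_registers(fd, start_register, count):
    """Select a register, then bulk read (a separate transfer, no repeated start)."""
    os.write(fd, bytes([start_register]))
    return os.read(fd, count)


# Usage on a Pi (hypothetical bus and addresses), one handle per device:
#   imu = open_i2c_device(1, 0x68)
#   adc = open_i2c_device(1, 0x48)
#   write_register(imu, 0x6B, 0x00)
#   data = read_registers(imu, 0x3B, 6)
#   os.close(imu); os.close(adc)
```

Each handle keeps its own address, so nothing ever has to change the target on a handle that is already in use.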

So what does work?
Well, GPIO mapping, control and I/O all work a treat. The HAL will map out SPI/UART/I2C pins if the peripheral is in use. Getting hold of interrupts is hard in Lazarus without scripting bits of Python to do it. The HAL now fudges it by using a thread to monitor the pins. The thread only watches pins that have interrupts assigned, so no interrupts = no CPU time used. With all available GPIOs running with interrupts the CPU load doesn’t register, so that’s good. There is a FIFO system in there so a long IRQ handler won’t hang the queue.
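The general shape of that fudge, sketched in Python rather than Lazarus: one thread polls only the pins that have a handler attached and pushes changes onto a FIFO, and a second thread drains the FIFO and runs the handlers, so a slow handler delays later events but doesn’t stop them being captured. This is not the HAL code; the PinWatcher class, its method names and the read_pin callable are all hypothetical.

```python
import queue
import threading
import time


class PinWatcher:
    def __init__(self, read_pin, poll_interval=0.001):
        self._read_pin = read_pin        # hypothetical callable: pin -> 0 or 1
        self._poll_interval = poll_interval
        self._handlers = {}              # pin -> callback(pin, level)
        self._last = {}                  # pin -> last level seen
        self._events = queue.Queue()     # FIFO of (pin, level) changes
        self._lock = threading.Lock()
        self._have_pins = threading.Event()
        self._stop = threading.Event()

    def attach(self, pin, handler):
        """Start watching a pin; handler runs on every level change."""
        with self._lock:
            self._handlers[pin] = handler
            self._last[pin] = self._read_pin(pin)
        self._have_pins.set()

    def poll_once(self):
        """Sample every watched pin once and queue any changes."""
        queued = 0
        with self._lock:
            for pin in self._handlers:
                level = self._read_pin(pin)
                if level != self._last[pin]:
                    self._last[pin] = level
                    self._events.put((pin, level))
                    queued += 1
        return queued

    def dispatch_pending(self):
        """Run handlers for everything currently queued, oldest first."""
        handled = 0
        while True:
            try:
                pin, level = self._events.get_nowait()
            except queue.Empty:
                return handled
            with self._lock:
                handler = self._handlers.get(pin)
            if handler is not None:
                handler(pin, level)
            handled += 1

    def _poll_loop(self):
        while not self._stop.is_set():
            self._have_pins.wait()       # blocks while nothing is attached
            if self._stop.is_set():
                break
            self.poll_once()
            time.sleep(self._poll_interval)

    def _dispatch_loop(self):
        while not self._stop.is_set():
            self.dispatch_pending()
            time.sleep(self._poll_interval)

    def start(self):
        for target in (self._poll_loop, self._dispatch_loop):
            threading.Thread(target=target, daemon=True).start()

    def stop(self):
        self._stop.set()
        self._have_pins.set()            # wake the poller so it can exit
```

Being a polling design, it will miss pulses shorter than the poll interval, which is the price of not having real interrupts.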

And to go?
Support for specific peripherals is next. How far I’ll go is uncertain, but as the HAL is going into the Touchdown kits I may go as far as adding specific I2C device support. What is definitely going in is support for things like thumbwheels and our TRNet bus system on UART0.

Can I have a copy?
Not yet, I’m still testing, but soon 🙂 When it’s about it’ll be released on http://forum.tswr.co.uk