
Some Hardware Diag Help Please


Hogie


Well, I'm at the end of the line for what I'm able to diag on this server that's causing me problems.

We have dual Opterons in 1U cases (yes, airflow is good), using RAID 1 (mirrored) drives. We have several of these machines, all exactly the same except for their Win2k3 product keys.

I have one box whose mobo (just the mobo) we sent in to Tyan for repair of a SATA controller problem around the new year (it crashed for the first time at about 12:30am on Jan 1, believe it or not). I got it back, and the tests that had been giving the disk system timeouts ran fine for 4 days without a hitch, so I put the system back into service.

Now it just randomly locks up, even under the load of just one CSS 100-tick box. I've run memtest on it, had it running for the last week with prime95 (x2, one for each proc), and am testing the disk system again as I type this. The only change I have made is to remove it from our cabinet and bring it home (it's on a small UPS here by itself, and plugged into my LAN via a secondary NIC as a DHCP client).

Now, it wouldn't lock up when I used it to copy files in from fileplanet, or when I created patches for tcadmin and then copied our zip files off it, so I am thinking it isn't a disk system problem (which Tyan was supposed to fix). I also tried to make the box overheat, which I couldn't do, so I'm at a loss as to what to do next.


Have you had it lock up on you since you brought it home? If not, and all your tests come back normal, I would start thinking it may be a power issue (dirty power)? Electrical interference (you wouldn't think so in a DC, but who knows)?

This is the statement that makes me suspect a power issue:

Quote: "it's on a small UPS here by itself"


I've thought about that too. But why would one box (which is connected to the same PDU as the others) be that way? None of the other machines have a problem with the power there. The only explanation I can see is a bad power supply, but that's hard to come by locally for a 1U server :\


I haven't switched out power supplies yet (I haven't even ordered a replacement). I've been too busy installing a security camera system, yelling at Dell about my laptop (they made me reinstall XP to help me diag a problem with my wifi card... I was running Linux only on it. From now on I will always keep a dual boot with Windows on boxes I order if I install Linux; this was too much of a PITA), and flying to our remote office to help with database issues.

Been a busy week (and weekend!). Hopefully I'll get a new power supply ordered early this week and get the box back to the datacenter to test.


  • 2 weeks later...
  • 2 weeks later...

This last month has been a killer for me. That server being down, the mobo in my laptop dying (everything worked except my mini-PCI, which was my wireless), and then last night the mobo in my desktop died on me while I was sitting here coding. I miss my dual monitors today, it makes me cry...

Just sitting here on my linux laptop, trying to keep up with everything happening.


  • 1 month later...

Thought I would update this...

It seems that it was bad memory slots. I say this because on Saturday night we lost 4 memory slots in this trouble server.

It has been stable since I moved all the memory into the other CPU's slots (the Tyan 2882 has 8 DIMM slots, 4 for each processor). Whether that was the problem before, I don't know, but I do know it wasn't booting up with memory in any of the 4 slots that I deemed bad.


I'm trying to get a replacement box for it. But I don't have the final say on purchases, and our boss is a bitch when I ask for her credit card. I'm about to tell them this server gave up the ghost to get them to really move on it.


