How-To Manually Test Probes

This forum supports the ESX Host Health Monitor plugin. When posting, include screenshots of the issue and any script and command logs listed in the probe consoles.
Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

How-To Manually Test Probes

Post by Cubert »

To manually test whether a probe is returning accurate CIM data from an ESX host, open a command prompt (DOS window) on the "agent" that was set up as a probe and run:

Code:

C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py -v -H hostIP -U root -P "ThePassword"

This should print the current status and the hardware build that the system is reporting.
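
If you want to script the same manual check, here is a minimal sketch in Python. It assumes the same paths as the command above; `build_probe_command` and `run_probe` are hypothetical helper names used for illustration and are not part of the plugin.

```python
import subprocess

# Paths taken from the command above; adjust them if your agent differs.
PYTHON_EXE = r"C:\Python3\Python.exe"
SCRIPT = r"C:\Python3\check_esxi_hardware.py"


def build_probe_command(host, user, password):
    """Return the argument list for the manual probe test shown above."""
    return [
        PYTHON_EXE, "-W", "ignore", SCRIPT,
        "-v",            # verbose: print the CIM elements as they are read
        "-H", host,
        "-U", user,
        "-P", password,
    ]


def run_probe(host, user, password, timeout=120):
    """Run the probe and return (exit code, combined stdout and stderr)."""
    result = subprocess.run(
        build_probe_command(host, user, password),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout + result.stderr


# Example use on the agent (placeholders, same as in the command above):
#   code, output = run_probe("hostIP", "root", "ThePassword")
#   print(output)
```

Passing the arguments as a list avoids quoting problems when the password contains spaces or special characters.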

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm

Re: How-To Manually Test Probes

Post by mspguyoi »

Hi Cubert,

Thanks for the tip, and sorry for the delayed response. I compared the raw output with what's shown in the CIM data viewer in the VMWare ESX Health Monitor, and they do not match.
It looks to me like not all of the raw data is making it into the system.
Is there anything we can do to control that?

I've attached a zip file with the raw output.

Regards,

Jeroen
Attachments
check_esxi_hardware - raw output.zip
(2.4 KiB)

Re: How-To Manually Test Probes

Post by Cubert »

I will have a look at this. Can you also post the CIM data from the viewer so I can see what's missing?

You should be getting everything that has an accompanying "OpStatus" code. So if an Element Name has no Op Status entry, it will be skipped.


An example: this HP data has a battery element name at the bottom of the output, but it does not provide a "good, bad or ugly" status, so there is no point in listing an item that does not generate a current operational status code.

Otherwise it should be in there...

Anyhow, I am in the direct path of Hurricane Ian, so I will be going offline today until it passes. I may be without power for several days, so if you do not hear back in 48 hours, assume I am powerless to stop it... Ha ha, a pun!

Re: How-To Manually Test Probes

Post by mspguyoi »

Hi Cubert,

Sorry, I posted my reply in the wrong thread. I had already created a new post, viewtopic.php?t=6117, and you can find the details there.

Should I move my previous message to the post I already created?

Thanks,

Jeroen

Re: How-To Manually Test Probes

Post by Cubert »

Ah, that's what was missing.

I was waiting for some more information about what your issue was and didn't tie the two posts together.


OK, so you're showing a bad battery and a RAID controller issue.

Let's look at the CIM output. In the data we look for two data points that need to be within the same element:
  • "Element Name =" - Name of hardware item
  • "Element Op Status =" - status of hardware item (0 and 2 are considered healthy in all systems but HP also uses 5)

Most hardware follows a common default CIM status code set. HP, on the other hand, does not: among other differences, HP also uses a status of 5 as a "Good" code.
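
To make the rule concrete, here is a minimal sketch of the status check as described in this post. It encodes only what is stated above (0 and 2 are healthy everywhere, HP additionally treats 5 as good); the "other differences" in the HP code set are not covered, and `is_healthy` is a hypothetical name, not a function from the plugin or from check_esxi_hardware.py.

```python
# Status codes treated as healthy on all hardware, per the post above.
COMMON_HEALTHY = {0, 2}

# Extra healthy codes per vendor. Only the HP case is stated in the post;
# any other entry here would be an assumption.
VENDOR_EXTRA_HEALTHY = {"hp": {5}}


def is_healthy(op_status, vendor="unknown"):
    """Return True if an Element Op Status counts as healthy for this vendor."""
    healthy = COMMON_HEALTHY | VENDOR_EXTRA_HEALTHY.get(vendor.lower(), set())
    return op_status in healthy


print(is_healthy(2))          # common "OK" code
print(is_healthy(5, "hp"))    # HP-specific "Good" code
print(is_healthy(5, "dell"))  # 5 is not treated as healthy for other vendors
```

This is also why the vendor setting matters: the same status code can be read as healthy or unhealthy depending on which code set is applied.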


You show an HP ProLiant:
Screenshot 2022-10-03 084409.png (15.01 KiB)


Let's take the first of the two error reports you show in your example.

The Battery Bad Issue

The CIM data returned for that element is:
20220927 17:23:21 Element Name = Battery 1 42-SuperCAP Max
20220927 17:23:21 sensorType = 2 - Temperature
20220927 17:23:21 BaseUnits = 2
20220927 17:23:21 Scaled by = 0.010000
20220927 17:23:21 Current Reading = 29.000000
20220927 17:23:21 Upper Threshold Critical = 65.000000
20220927 17:23:21 Element Op Status = 2

The element "Battery 1 42-SuperCAP Max" shows a status of 2, which in the HP status list is an "OK". Note that this is the status at the time of the scan, so the status of the element may have changed since. ESX may be keeping a cache or history that I'm not aware of; I can't say. You would need to know your hardware and maybe perform some system tests to confirm failures.

As for the HP Smart Array, the logs do not show a status for that element:
20220927 17:23:20 Element Name = Add-in Card 11:3
20220927 17:23:20 Element Op Status = 0
20220927 17:23:20 Element Name = Hardware Management Controller (Node 0)
20220927 17:23:20 Element Op Status = 0
20220927 17:23:20 Element Name = HP Smart Array P420i Controller : Embedded : HPSA1
20220927 17:23:20 Check classe CIM_NumericSensor
20220927 17:23:21 Element Name = System Board 10 Power Meter
20220927 17:23:21 sensorType = 4 - Current
20220927 17:23:21 BaseUnits = 7
20220927 17:23:21 Scaled by = 0.010000
20220927 17:23:21 Current Reading = 90.000000
20220927 17:23:21 Element Op Status = 2

"Element Name = HP Smart Array P420i Controller : Embedded : HPSA1" is followed by the next Element Name without ever returning an Element Op Status of its own.
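
The check I did by eye above can be sketched in a few lines of Python. This assumes the verbose log format shown in this post ("Element Name = ..." followed, when present, by "Element Op Status = ..."); `pair_elements` is a hypothetical helper, not part of the plugin, and the sample is the log excerpt from above.

```python
import re

ELEMENT_RE = re.compile(r"Element Name = (.+)$")
STATUS_RE = re.compile(r"Element Op Status = (\d+)$")


def pair_elements(lines):
    """Pair each Element Name with the Op Status that follows it, or None."""
    pairs = []
    current = None  # element name still waiting for a status line
    for line in lines:
        line = line.strip()
        match = ELEMENT_RE.search(line)
        if match:
            if current is not None:
                # A new element started before the previous one got a status.
                pairs.append((current, None))
            current = match.group(1)
            continue
        match = STATUS_RE.search(line)
        if match and current is not None:
            pairs.append((current, int(match.group(1))))
            current = None
    if current is not None:
        pairs.append((current, None))
    return pairs


SAMPLE = """\
20220927 17:23:20 Element Name = Add-in Card 11:3
20220927 17:23:20 Element Op Status = 0
20220927 17:23:20 Element Name = Hardware Management Controller (Node 0)
20220927 17:23:20 Element Op Status = 0
20220927 17:23:20 Element Name = HP Smart Array P420i Controller : Embedded : HPSA1
20220927 17:23:20 Check classe CIM_NumericSensor
20220927 17:23:21 Element Name = System Board 10 Power Meter
20220927 17:23:21 sensorType = 4 - Current
20220927 17:23:21 Element Op Status = 2
"""

pairs = pair_elements(SAMPLE.splitlines())
missing = [name for name, status in pairs if status is None]
print(missing)
```

Any name that ends up in `missing` is an element that would be skipped, because it has no operational status to report.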

Re: How-To Manually Test Probes

Post by mspguyoi »

Hi Cubert,

Thanks for the detailed response.

SuperCAP Max is not the same as the battery status, so it looks like there is no CIM data for the battery status at all. That is the same situation as for the "HP Smart Array P420i Controller : Embedded : HPSA1" element.

What can be done to make sure the information shows up in the CIM data?
Would that be a bug in ESXi or something on the HP side (e.g. the HP driver)?

Thanks,
J.

Re: How-To Manually Test Probes

Post by Cubert »

Actually, I would think it's most likely something in the following script.


check_esxi_hardware (formerly known as check_esx_wbem) is an open source monitoring plugin to monitor the hardware of ESXi (and previously ESX) servers. It queries the CIM (Common Information Model) server running on the ESXi server to retrieve the current status of all discovered hardware parts. The plugin can also be used as standalone script to check the hardware.


https://www.claudiokuenzler.com/monitor ... rdware.php


We pull the script from there if it is missing from the c:\python3x directory. You can try removing the file and letting it be downloaded again, then check whether anything changes. If nothing changes, you are most likely already on the current version.

Re: How-To Manually Test Probes

Post by mspguyoi »

Thanks for the quick response.

I checked the script that we are using and it is the latest version.
While playing with the different parameters of the script, I discovered that when I specify the --vendor hp parameter it does give me the battery status (and some other elements as well), but when I leave the vendor at its default it doesn't show.

With the --vendor hp parameter:
img2.png (19.25 KiB)
Without the --vendor hp parameter:
img3.png (8.04 KiB)
I checked the ESXi health monitor plug-in configuration, and there I am specifying the vendor explicitly, but I don't see the same Battery status element in the CIM data view. The only battery-related element is the SuperCAP, and that is not the same thing.
img1.png (19.77 KiB)
Any ideas what we can do to fix that?
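
For anyone who wants to repeat this comparison without reading screenshots, here is a minimal sketch. It assumes both runs were made with -v and that each run's output was captured as text, once with --vendor hp and once without; `element_names` and `only_in_first` are hypothetical helper names, not part of the script or the plugin.

```python
MARKER = "Element Name = "


def element_names(output):
    """Collect the element names that appear in one verbose run's output."""
    names = set()
    for line in output.splitlines():
        if MARKER in line:
            names.add(line.split(MARKER, 1)[1].strip())
    return names


def only_in_first(first_output, second_output):
    """Return the element names reported by the first run but not the second."""
    return sorted(element_names(first_output) - element_names(second_output))


# Example use, with the two captured outputs held in strings:
#   extra = only_in_first(output_with_vendor_hp, output_default)
#   print(extra)
```

The resulting list shows exactly which elements only appear when the vendor is set, which can then be compared against the CIM data view in the plug-in.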

Re: How-To Manually Test Probes

Post by Cubert »

Yes, that is accurate. This is why we ask for the hardware vendor during probe setup: we send it to the scanner with our scan requests.

So you should be getting that in your logs.

Re: How-To Manually Test Probes

Post by mspguyoi »

I've played with the ESXi host settings and found that when I set the vendor to auto, I get the battery CIM data and it shows a status of failed, but the ESXi health monitor shows it as okay.
Screenshot 2022-10-13 122024.png (13.4 KiB)
What can be done to change that?
