How-To Manually Test Probes

This forum supports the ESX Host Health Monitor plugin. When posting, include screenshots of the issue and any script and command logs listed in the probe consoles.
mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

Thanks for the quick resolution.
I'll be updating the plug-in today.

Re: How-To Manually Test Probes

Post by mspguyoi »

I've updated the plug-in and restarted the database agent, but still no error is reported. After that I ran the SQL query, but the status still shows the same.
Is there anything else I need to do to make it work?

I did notice in the updated SQL query that errors 25 and 30 are not listed.
Is that because you consider any error code of 20 or higher critical?

Battery status is still showing OK
Screenshot 2023-01-30 105800.png (27.3 KiB)

Cubert
Posts: 2457
Joined: Tue Dec 29, 2015 7:57 pm
8

Re: How-To Manually Test Probes

Post by Cubert »

Once the plugin is updated in plugin manager you should only need to restart the DBAgent service for it to reload the data in the types table.

Here is the SQL query it would be running:

Code: Select all

REPLACE INTO `plugin_p4l_vmware_healthmon_types` VALUES
(1,'hp',0,'Unknown','Good'),
(2,'hp',1,'Other','Good'),
(3,'hp',2,'OK','Good'),
(4,'hp',3,'Degraded','Critical'),
(5,'hp',4,'Stressed','Critical'),
(6,'hp',5,'OK','Good'),
(7,'hp',6,'Error','Critical'),
(8,'hp',7,'Non-Recoverable Error','Critical'),
(9,'hp',8,'Starting','Good'),
(10,'hp',9,'Stopping','Warning'),
(11,'hp',10,'Stopped','Warning'),
(12,'hp',11,'In Service','Good'),
(13,'hp',12,'No Contact','Warning'),
(14,'hp',13,'Lost Communication','Critical'),
(15,'hp',14,'Aborted','Critical'),
(16,'hp',15,'Minor Failure','Critical'),
(17,'hp',16,'Supporting Entity in Error','Critical'),
(18,'hp',17,'Completed','Good'),
(19,'hp',18,'Power Mode','Good'),
(20,'hp',19,'DTMF Reserved','Good'),
(21,'hp',20,'Major Failure','Critical'),
(22,'dell',0,'Not Available','Good'),
(23,'dell',1,'Other','Good'),
(24,'dell',2,'OK','Good'),
(25,'dell',3,'Degraded','Critical'),
(26,'dell',4,'Stressed','Critical'),
(27,'dell',5,'Predictive Failure','Warning'),
(28,'dell',6,'Error','Critical'),
(29,'dell',7,'Non-Recoverable Error','Critical'),
(30,'dell',8,'Starting','Good'),
(31,'dell',9,'Stopping','Good'),
(32,'dell',10,'Stopped','Good'),
(33,'dell',11,'In Service','Good'),
(34,'dell',12,'No Contact','Warning'),
(35,'dell',13,'Lost Communication','Critical'),
(36,'dell',14,'Aborted','Critical'),
(37,'dell',15,'Dormant','Good'),
(38,'dell',16,'Supporting Entity in Error','Critical'),
(39,'dell',17,'Completed','Good'),
(40,'dell',18,'Power Mode','Good'),
(41,'dell',19,'DTMF Reserved','Good'),
(42,'dell',20,'Vender Reserved','Good'),
(43,'intel',0,'Not Available','Good'),
(44,'intel',1,'Other','Good'),
(45,'intel',2,'OK','Good'),
(46,'intel',3,'Degraded','Critical'),
(47,'intel',4,'Stressed','Critical'),
(48,'intel',5,'Predictive Failure','Warning'),
(49,'intel',6,'Error','Critical'),
(50,'intel',7,'Non-Recoverable Error','Critical'),
(51,'intel',8,'Starting','Good'),
(52,'intel',9,'Stopping','Warning'),
(53,'intel',10,'Stopped','Warning'),
(54,'intel',11,'In Service','Good'),
(55,'intel',12,'No Contact','Warning'),
(56,'intel',13,'Lost Communication','Critical'),
(57,'intel',14,'Aborted','Critical'),
(58,'intel',15,'Dormant','Good'),
(59,'intel',16,'Supporting Entity in Error','Critical'),
(60,'intel',17,'Completed','Good'),
(61,'intel',18,'Power Mode','Good'),
(62,'intel',19,'DTMF Reserved','Good'),
(63,'intel',20,'Vender Reserved','Good'),
(64,'ibm',0,'Not Available','Good'),
(65,'ibm',1,'Other','Good'),
(66,'ibm',2,'OK','Good'),
(67,'ibm',3,'Degraded','Critical'),
(68,'ibm',4,'Stressed','Critical'),
(69,'ibm',5,'Predictive Failure','Warning'),
(70,'ibm',6,'Error','Critical'),
(71,'ibm',7,'Non-Recoverable Error','Critical'),
(72,'ibm',8,'Starting','Good'),
(73,'ibm',9,'Stopping','Warning'),
(74,'ibm',10,'Stopped','Warning'),
(75,'ibm',11,'In Service','Good'),
(76,'ibm',12,'No Contact','Warning'),
(77,'ibm',13,'Lost Communication','Critical'),
(78,'ibm',14,'Aborted','Critical'),
(79,'ibm',15,'Dormant','Good'),
(80,'ibm',16,'Supporting Entity in Error','Critical'),
(81,'ibm',17,'Completed','Good'),
(82,'ibm',18,'Power Mode','Good'),
(83,'ibm',19,'DTMF Reserved','Good'),
(84,'ibm',20,'Vender Reserved','Good'),
(85,'unknown',0,'Not Available','Good'),
(86,'unknown',1,'Other','Good'),
(87,'unknown',2,'OK','Good'),
(88,'unknown',3,'Degraded','Critical'),
(89,'unknown',4,'Stressed','Critical'),
(90,'unknown',5,'Predictive Failure','Warning'),
(91,'unknown',6,'Error','Critical'),
(92,'unknown',7,'Non-Recoverable Error','Critical'),
(93,'unknown',8,'Starting','Good'),
(94,'unknown',9,'Stopping','Warning'),
(95,'unknown',10,'Stopped','Warning'),
(96,'unknown',11,'In Service','Good'),
(97,'unknown',12,'No Contact','Warning'),
(98,'unknown',13,'Lost Communication','Critical'),
(99,'unknown',14,'Aborted','Critical'),
(100,'unknown',15,'Dormant','Good'),
(101,'unknown',16,'Supporting Entity in Error','Critical'),
(102,'unknown',17,'Completed','Good'),
(103,'unknown',18,'Power Mode','Good'),
(104,'unknown',19,'DTMF Reserved','Good'),
(105,'unknown',20,'Vender Reserved','Good'),
(106,'auto',0,'Not Available','Good'),
(107,'auto',1,'Other','Good'),
(108,'auto',2,'OK','Good'),
(109,'auto',3,'Degraded','Critical'),
(110,'auto',4,'Stressed','Critical'),
(111,'auto',5,'Predictive Failure','Warning'),
(112,'auto',6,'Error','Critical'),
(113,'auto',7,'Non-Recoverable Error','Critical'),
(114,'auto',8,'Starting','Good'),
(115,'auto',9,'Stopping','Warning'),
(116,'auto',10,'Stopped','Warning'),
(117,'auto',11,'In Service','Good'),
(118,'auto',12,'No Contact','Warning'),
(119,'auto',13,'Lost Communication','Critical'),
(120,'auto',14,'Aborted','Critical'),
(121,'auto',15,'Dormant','Good'),
(122,'auto',16,'Supporting Entity in Error','Critical'),
(123,'auto',17,'Completed','Good'),
(124,'auto',18,'Power Mode','Good'),
(125,'auto',19,'DTMF Reserved','Good'),
(126,'auto',20,'Major Failure','Critical'),
(127,'hp',30,'Non-recoverable Error','Critical'),
(128,'hp',25,'Critical Failure','Critical');

If you have SQL access to the Automate host, running this statement in SQLyog should produce the same result.

Re: How-To Manually Test Probes

Post by mspguyoi »

I restarted the DB agent and afterwards ran the query as well, but the result is the same both times.
The query you just sent is the same as the one you sent earlier, so I didn't run that one.

What else can we check?

Re: How-To Manually Test Probes

Post by Cubert »

Let's do a little SQL and see what we get as output.

This should produce the list and status codes for each entry. Let's find the failed battery entry and verify that it has the correct code.

Code: Select all

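-- The DataType filter sits in the ON clause so that CIM rows with no
-- matching 'hp' type row still appear (with NULLs in the type columns).
SELECT *
FROM plugin_p4l_vmware_healthmon_cimdata a
LEFT JOIN plugin_p4l_vmware_healthmon_types t
  ON a.CIM_Value = t.DataValue
  AND t.DataType = 'hp'
WHERE a.ProbeID = 'Place Agent ID here that probes ESX'
  AND a.ESXHost = 'Place ESX IP Here';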

Next, let's see whether the types table has actually been updated as it should have been.

Code: Select all

SELECT * FROM plugin_p4l_vmware_healthmon_types WHERE DataType = 'hp'

Next, let's verify that the probe configuration is set to "hp". If it is set to any other vendor type, we need to verify that the types table has that same status code for that type.


plugin_p4l_vmware_healthmon_cimdata holds the raw CIM data for each ESX/probe combination.

plugin_p4l_vmware_healthmon_types holds the mapping of vendor types to status codes.

When queried together, you should get the status code and description from the types table for each status code returned in the raw CIM data. We need to see whether the queries above return that correctly.
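
As a rough illustration of what that lookup does (this is not the plugin's actual code), the join can be sketched in Python. The TYPES entries are a subset copied from the REPLACE statement earlier in this thread; the sample CIM rows and the function name resolve are made up for the example.

```python
# Subset of the types rows from the REPLACE statement in this thread:
# (DataType, DataValue) -> (description, severity).
TYPES = {
    ("hp", 2): ("OK", "Good"),
    ("hp", 3): ("Degraded", "Critical"),
    ("hp", 20): ("Major Failure", "Critical"),
    ("hp", 25): ("Critical Failure", "Critical"),
    ("hp", 30): ("Non-recoverable Error", "Critical"),
}


def resolve(cim_rows, data_type):
    """Keep every CIM row and attach the matching type row, if one exists."""
    resolved = []
    for element, cim_value in cim_rows:
        # A missing (data_type, cim_value) pair yields (None, None),
        # the same way an unmatched LEFT JOIN row yields NULLs.
        description, severity = TYPES.get((data_type, cim_value), (None, None))
        resolved.append((element, cim_value, description, severity))
    return resolved


# Hypothetical CIM rows (element name, CIM_Value), for illustration only.
sample = [("Battery 1", 2), ("RAID Controller", 20), ("Fan 3", 99)]
for row in resolve(sample, "hp"):
    print(row)
```

A row that comes back with no description or severity means the types table has no entry for that vendor type and status code, which is the situation we are trying to rule out.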

I suspect that the types table is not updating for some reason, in which case we will see that the extra rows we added are not in the table.

Post your query results here so we can see what you're getting back.

Re: How-To Manually Test Probes

Post by mspguyoi »

Hi Cubert,

Attached are all the results from the database queries, server iLO status, VMware status and Python script.
Outputs.zip (143.77 KiB)

Thanks,

J.

Re: How-To Manually Test Probes

Post by Cubert »

I am seeing a few oddities, but I would like to see more data before making any assumptions.

First, I noticed that the raw CIM text you sent was produced without a "vendor type" on the command, so can I get you to rerun the test command with the vendor flag added?

Code: Select all

C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py -v -H -esxi101 -U root -P blank -V hp

Next, we need to see whether this is the command being sent by the script during a normal scan.

In the script logs / command logs of the agent probe, look for an "executing" entry that calls "C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py -v -H -esxi101 -U root -P blank" and gets a return from the probe.

We want to look at that command being sent to see if it has the "-V hp" flag anywhere in the command.
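
If the command logs are long, a small Python helper can do the check. This is only a sketch: the function name vendor_flag is made up, and it assumes the logged command is a plain space-separated string like the ones quoted in this post. The comparison is case-sensitive because lowercase "-v" is the verbose switch in these commands.

```python
def vendor_flag(command_line):
    """Return the value following a case-sensitive '-V' token, or None if absent."""
    tokens = command_line.split()
    # Stop one token early so a trailing '-V' with no value returns None.
    for i, token in enumerate(tokens[:-1]):
        if token == "-V":
            return tokens[i + 1]
    return None


# The two command strings quoted in this post.
with_vendor = (r"C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py "
               r"-v -H -esxi101 -U root -P blank -V hp")
without_vendor = (r"C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py "
                  r"-v -H -esxi101 -U root -P blank")

print(vendor_flag(with_vendor))
print(vendor_flag(without_vendor))
```

A result of None for the command taken from the probe logs would mean the vendor type is not being passed during the normal scan.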

The issue I am seeing is that the raw data is:
20230201 11:42:59 Element Name = Battery 1 Megacell Status: Failed
20230201 11:42:59 Element Op Status = 2

The number in "Element Op Status = <number>" represents the current status, which in your case is "2", so all the HP additions we made to the types table have no bearing on this: status codes 10, 15, 20 and 30 do not match status code 2. 2 is universal for "OK" or "Good". So either the Python script is not getting the correct vendor passed to it and therefore misreads the CIM data, or it is misreading this CIM data regardless of the vendor.
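
To pull these name/status pairs out of a long verbose dump, something like the following Python sketch can help. It assumes every relevant line looks like the two quoted above (an 8-digit date, a time, then "Element <field> = <value>"); the function name parse_elements is made up, and the real script output may contain other line shapes that this simply skips.

```python
import re

# Matches lines such as "20230201 11:42:59 Element Op Status = 2".
LINE = re.compile(r"^\d{8} \d{2}:\d{2}:\d{2} (Element [^=]+?) = (.*)$")


def parse_elements(raw_text):
    """Group each 'Element Name' line with the Element fields that follow it."""
    elements = []
    current = None
    for line in raw_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # ignore anything that is not an 'Element ... = ...' line
        key, value = match.groups()
        if key == "Element Name":
            current = {"name": value}
            elements.append(current)
        elif current is not None:
            # Store numeric fields as integers, everything else as text.
            current[key] = int(value) if value.isdigit() else value
    return elements


RAW = """20230201 11:42:59 Element Name = Battery 1 Megacell Status: Failed
20230201 11:42:59 Element Op Status = 2
"""

for element in parse_elements(RAW):
    print(element)
```

An element whose name says "Failed" while its status code is 2 is exactly the mismatch described above.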

I do not supply the Python script (check_esxi_hardware.py), but I'll have a look at it for any obvious issues with parsing the event if the vendor ID does not change the output when you run the above test.

Post that test data here.

Re: How-To Manually Test Probes

Post by mspguyoi »

I ran the command as you specified and I got a different output this time.

I noticed this now:

20230202 09:44:26 Element Name = HP Smart Array P440ar RAID Controller : Embedded : HPSA1
20230202 09:44:26 Element HealthState = 20
20230202 09:44:26 Element Name = Battery on HPSA1
20230202 09:44:26 Element HealthState = 0

Which matches what I see in VMware but I also see this:

20230202 09:44:25 Element Name = Battery 1 Megacell Status: Failed
20230202 09:44:25 Element HealthState = 5

Which indicates no issue.

I've attached the raw output in a file as well.
RAW output.zip (2.54 KiB)

Re: How-To Manually Test Probes

Post by Cubert »

Is the script also adding "-V hp" to the command?

If so, then your CIM data should be showing a RAID issue, as health state 20 is critical.
