How-To Manually Test Probes

This forum supports the ESX Host Health Monitor plugin. When posting, include screenshots of the issue and any script and command logs listed in the probe consoles.
Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

Re: How-To Manually Test Probes

Post by Cubert »

OK, can you run a manual test with -V auto and post the output? What status number was returned with the data? A status of 2 is considered "OK".

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm

Post by mspguyoi »

Attached is the RAW output and screen shots of the CIM data.
Attachments
check_esxi_hardware - raw output2.zip
Screenshot 2022-10-13 122024.png
Screenshot 2022-10-13 144225.png
Screenshot 2022-10-13 144313.png
Screenshot 2022-10-13 144333.png
Screenshot 2022-10-13 144440.png
Screenshot 2022-10-13 144509.png


Post by Cubert »

I am not finding MegaCell in your raw dump. Did you run that with the same vendor setting?



Screenshot 2022-10-13 153333.png


Post by mspguyoi »

I double checked and the screenshots from the ESX Health Monitor and raw script output are from the same server.
Screenshot 2022-10-13 161034.png


Post by Cubert »

Yes, but are you running the raw test with the same switches that the probe uses automatically?

I notice we do not remove old data during scan updates.

For example:
If you run the probe once for each server type, it may output more or different data types depending on the probe's settings at the time of the scan.

If the setting "AUTO" generates the CIM item "Battery 1", then the database will have a "Battery 1" row saved in the plugin's SQL table. If you then switch the setting to "HP", the probe may not return "Battery 1" but may instead return "MegaCell Battery", which is also saved to the SQL table. If you now query the SQL table for all data present for computerID XXX, you will get both "Battery 1" and "MegaCell Battery".

So you have the setting set to XXX, yet the raw data returned does not bear that item name.

This is a plausible bug.

To fix this now within your plugin, open the Scripts folder in the Control Center and find the Maintenance folder. Inside should be the "P4A ESX Health Monitor" script. Some Automate hosts do not have a Maintenance folder, or their folder layout does not follow Automate best practices, so you may need to search for the script by title on your Automate host.

Open the script and edit line 54.

Screenshot 2022-10-14 105558.png


Replace the following SQL with the new SQL query.

INSERT IGNORE INTO `plugin_p4l_vmware_healthmon_cimdata` (`ClientID`,`LocationID`,`ProbeID`,`ESXHost`,`CIM_Item`,`CIM_Value`) VALUES @CIMSQLDATA@ ON DUPLICATE KEY UPDATE CIM_Value=VALUES(CIM_Value)


New SQL code

DELETE FROM `plugin_p4l_vmware_healthmon_cimdata` WHERE ProbeID= '%ComputerID%' and ESXHost='@ESXHOSTIP@' ;INSERT IGNORE INTO `plugin_p4l_vmware_healthmon_cimdata` (`ClientID`,`LocationID`,`ProbeID`,`ESXHost`,`CIM_Item`,`CIM_Value`) VALUES @CIMSQLDATA@ ON DUPLICATE KEY UPDATE CIM_Value=VALUES(CIM_Value)

We added a DELETE of all records matching the ProbeID and ESX host IP before inserting the new records. This clears the old records before the new data is saved. The ON DUPLICATE KEY clause is now redundant, since there should never be an update at this point, but I left it in place; it will not cause issues with the query.
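
To illustrate why deleting first matters, here is a minimal Python sketch using an in-memory SQLite table. It is not the plugin's code: the table `cimdata`, its reduced column set, the host IP and the helper names are all hypothetical, and SQLite's INSERT OR REPLACE stands in for MySQL's ON DUPLICATE KEY UPDATE. It runs two scans of the same host that return different item names and compares what is left in the table with and without the DELETE.

```python
import sqlite3

# Hypothetical, simplified stand-in for plugin_p4l_vmware_healthmon_cimdata.
SCHEMA = """
CREATE TABLE cimdata (
    ProbeID   INTEGER,
    ESXHost   TEXT,
    CIM_Item  TEXT,
    CIM_Value TEXT,
    PRIMARY KEY (ProbeID, ESXHost, CIM_Item)
)
"""


def save_scan(conn, probe_id, esx_host, items, delete_first):
    """Store one scan result; optionally clear the host's old rows first."""
    with conn:  # single transaction: commit on success, roll back on error
        if delete_first:
            conn.execute(
                "DELETE FROM cimdata WHERE ProbeID = ? AND ESXHost = ?",
                (probe_id, esx_host),
            )
        # INSERT OR REPLACE is SQLite's upsert, used here in place of
        # MySQL's INSERT ... ON DUPLICATE KEY UPDATE.
        conn.executemany(
            "INSERT OR REPLACE INTO cimdata VALUES (?, ?, ?, ?)",
            [(probe_id, esx_host, name, value) for name, value in items.items()],
        )


def stored_items(conn, probe_id, esx_host):
    """Return the item names currently stored for one probe/host pair."""
    rows = conn.execute(
        "SELECT CIM_Item FROM cimdata "
        "WHERE ProbeID = ? AND ESXHost = ? ORDER BY CIM_Item",
        (probe_id, esx_host),
    )
    return [row[0] for row in rows]


def run_two_scans(delete_first):
    """Scan once as 'auto', then once as 'hp', and list what remains."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    save_scan(conn, 1, "10.0.0.5", {"Battery 1": "2"}, delete_first)
    save_scan(conn, 1, "10.0.0.5", {"MegaCell Battery": "2"}, delete_first)
    items = stored_items(conn, 1, "10.0.0.5")
    conn.close()
    return items


print(run_two_scans(delete_first=False))  # ['Battery 1', 'MegaCell Battery']
print(run_two_scans(delete_first=True))   # ['MegaCell Battery']
```

Without the DELETE, the item from the earlier scan stays in the table; with it, only the items from the latest scan remain.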

See if this change has any effect on what data you see when the options are changed to different hardware types.


Post by mspguyoi »

Hi Cubert,

We tried your update and it worked for that one server, so thanks for that. However, we now have three other HP ProLiant servers that do not show the proper battery and controller status in the plugin, although the script does show the correct status when I run it manually.
Screenshot 2023-01-24 113603.png
What can we do to resolve this?
Attachments
Screenshot 2023-01-24 113815.png


Post by Cubert »

Could you post the raw data from the two commands?

Pipe the output to a text file and post the text files here.

I will have a look at the parser code that reads these files to see if we are missing data points.


Post by mspguyoi »

I've attached a zip file with the two raw outputs.
Raw Output.zip


Post by Cubert »

OK, I am seeing HP report a CIM value of 20 for the RAID controller. If the RAID were in good standing, that value should be 5, which for HP means good and operational.

The value 20 for HP was set to 'Vender Reserved','Good' in our database, which is what the HP CIM chart showed for HP CIM data back when we first created the mappings.

I did a new search for HP value maps and found a new "Value Map" for HP:

The possible values are 0 to 30, where 5 means the element is entirely healthy and 30 means the element is completely non-functional. The following continuum is defined:

"Non-recoverable Error" (30): The element has completely failed, and recovery is not possible. All functionality provided by this element has been lost.
"Critical Failure" (25): The element is non-functional and recovery might not be possible.
"Major Failure" (20): The element is failing. It is possible that some or all of the functionality of this component is degraded or not working.
"Minor Failure" (15): All functionality is available but some might be degraded.
"Degraded/Warning" (10): The element is in working order and all functionality is provided. However, the element is not working to the best of its abilities. For example, the element might not be operating at optimal performance or it might be reporting recoverable errors.
"OK" (5): The element is fully functional and is operating within normal operational parameters and without error.
"Unknown" (0): The implementation cannot report on HealthState at this time.

DMTF has reserved the unused portion of the continuum for additional HealthStates in the future.

ValueMap { "0", "5", "10", "15", "20", "25", "30", ".." },
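
As a rough illustration of how these values translate into a status, here is a small Python sketch. It is not the plugin's parser: the `HEALTH_STATE` dictionary and `describe_health` function are hypothetical names, the labels come from the value map quoted above, and the Good/Warning/Critical column is an assumption modeled on the categories used in the plugin's types table.

```python
# HealthState value -> (label from the quoted value map, assumed status).
HEALTH_STATE = {
    0: ("Unknown", "Good"),
    5: ("OK", "Good"),
    10: ("Degraded/Warning", "Warning"),
    15: ("Minor Failure", "Critical"),
    20: ("Major Failure", "Critical"),
    25: ("Critical Failure", "Critical"),
    30: ("Non-recoverable Error", "Critical"),
}


def describe_health(value):
    """Return (label, status) for a CIM HealthState value.

    Accepts ints or numeric strings, since raw probe output is text.
    Values outside the map are flagged rather than treated as healthy
    (an assumption of this sketch, not documented plugin behavior).
    """
    try:
        return HEALTH_STATE[int(value)]
    except (KeyError, ValueError, TypeError):
        return ("Unmapped value", "Warning")


print(describe_health(20))   # ('Major Failure', 'Critical')
print(describe_health("5"))  # ('OK', 'Good')
```

Under this mapping, the value 20 reported for the RAID controller reads as a failure rather than as a reserved "Good" value.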


If you have SQL access to your environment, you can execute the query below to resolve the issue. Otherwise, you will need to download the latest version once I have posted it and update your plugin to get the new settings.


Here is the SQL query needed.

REPLACE INTO `plugin_p4l_vmware_healthmon_types` VALUES
(1,'hp',0,'Unknown','Good'),(2,'hp',1,'Other','Good'),(3,'hp',2,'OK','Good'),(4,'hp',3,'Degraded','Critical'),(5,'hp',4,'Stressed','Critical'),(6,'hp',5,'OK','Good'),(7,'hp',6,'Error','Critical'),(8,'hp',7,'Non-Recoverable Error','Critical'),(9,'hp',8,'Starting','Good'),(10,'hp',9,'Stopping','Warning'),(11,'hp',10,'Stopped','Warning'),(12,'hp',11,'In Service','Good'),(13,'hp',12,'No Contact','Warning'),(14,'hp',13,'Lost Communication','Critical'),(15,'hp',14,'Aborted','Critical'),(16,'hp',15,'Minor Failure','Critical'),(17,'hp',16,'Supporting Entity in Error','Critical'),(18,'hp',17,'Completed','Good'),(19,'hp',18,'Power Mode','Good'),(20,'hp',19,'DTMF Reserved','Good'),(21,'hp',20,'Major Failure','Critical'),
(22,'dell',0,'Not Available','Good'),(23,'dell',1,'Other','Good'),(24,'dell',2,'OK','Good'),(25,'dell',3,'Degraded','Critical'),(26,'dell',4,'Stressed','Critical'),(27,'dell',5,'Predictive Failure','Warning'),(28,'dell',6,'Error','Critical'),(29,'dell',7,'Non-Recoverable Error','Critical'),(30,'dell',8,'Starting','Good'),(31,'dell',9,'Stopping','Good'),(32,'dell',10,'Stopped','Good'),(33,'dell',11,'In Service','Good'),(34,'dell',12,'No Contact','Warning'),(35,'dell',13,'Lost Communication','Critical'),(36,'dell',14,'Aborted','Critical'),(37,'dell',15,'Dormant','Good'),(38,'dell',16,'Supporting Entity in Error','Critical'),(39,'dell',17,'Completed','Good'),(40,'dell',18,'Power Mode','Good'),(41,'dell',19,'DTMF Reserved','Good'),(42,'dell',20,'Vender Reserved','Good'),
(43,'intel',0,'Not Available','Good'),(44,'intel',1,'Other','Good'),(45,'intel',2,'OK','Good'),(46,'intel',3,'Degraded','Critical'),(47,'intel',4,'Stressed','Critical'),(48,'intel',5,'Predictive Failure','Warning'),(49,'intel',6,'Error','Critical'),(50,'intel',7,'Non-Recoverable Error','Critical'),(51,'intel',8,'Starting','Good'),(52,'intel',9,'Stopping','Warning'),(53,'intel',10,'Stopped','Warning'),(54,'intel',11,'In Service','Good'),(55,'intel',12,'No Contact','Warning'),(56,'intel',13,'Lost Communication','Critical'),(57,'intel',14,'Aborted','Critical'),(58,'intel',15,'Dormant','Good'),(59,'intel',16,'Supporting Entity in Error','Critical'),(60,'intel',17,'Completed','Good'),(61,'intel',18,'Power Mode','Good'),(62,'intel',19,'DTMF Reserved','Good'),(63,'intel',20,'Vender Reserved','Good'),
(64,'ibm',0,'Not Available','Good'),(65,'ibm',1,'Other','Good'),(66,'ibm',2,'OK','Good'),(67,'ibm',3,'Degraded','Critical'),(68,'ibm',4,'Stressed','Critical'),(69,'ibm',5,'Predictive Failure','Warning'),(70,'ibm',6,'Error','Critical'),(71,'ibm',7,'Non-Recoverable Error','Critical'),(72,'ibm',8,'Starting','Good'),(73,'ibm',9,'Stopping','Warning'),(74,'ibm',10,'Stopped','Warning'),(75,'ibm',11,'In Service','Good'),(76,'ibm',12,'No Contact','Warning'),(77,'ibm',13,'Lost Communication','Critical'),(78,'ibm',14,'Aborted','Critical'),(79,'ibm',15,'Dormant','Good'),(80,'ibm',16,'Supporting Entity in Error','Critical'),(81,'ibm',17,'Completed','Good'),(82,'ibm',18,'Power Mode','Good'),(83,'ibm',19,'DTMF Reserved','Good'),(84,'ibm',20,'Vender Reserved','Good'),
(85,'unknown',0,'Not Available','Good'),(86,'unknown',1,'Other','Good'),(87,'unknown',2,'OK','Good'),(88,'unknown',3,'Degraded','Critical'),(89,'unknown',4,'Stressed','Critical'),(90,'unknown',5,'Predictive Failure','Warning'),(91,'unknown',6,'Error','Critical'),(92,'unknown',7,'Non-Recoverable Error','Critical'),(93,'unknown',8,'Starting','Good'),(94,'unknown',9,'Stopping','Warning'),(95,'unknown',10,'Stopped','Warning'),(96,'unknown',11,'In Service','Good'),(97,'unknown',12,'No Contact','Warning'),(98,'unknown',13,'Lost Communication','Critical'),(99,'unknown',14,'Aborted','Critical'),(100,'unknown',15,'Dormant','Good'),(101,'unknown',16,'Supporting Entity in Error','Critical'),(102,'unknown',17,'Completed','Good'),(103,'unknown',18,'Power Mode','Good'),(104,'unknown',19,'DTMF Reserved','Good'),(105,'unknown',20,'Vender Reserved','Good'),
(106,'auto',0,'Not Available','Good'),(107,'auto',1,'Other','Good'),(108,'auto',2,'OK','Good'),(109,'auto',3,'Degraded','Critical'),(110,'auto',4,'Stressed','Critical'),(111,'auto',5,'Predictive Failure','Warning'),(112,'auto',6,'Error','Critical'),(113,'auto',7,'Non-Recoverable Error','Critical'),(114,'auto',8,'Starting','Good'),(115,'auto',9,'Stopping','Warning'),(116,'auto',10,'Stopped','Warning'),(117,'auto',11,'In Service','Good'),(118,'auto',12,'No Contact','Warning'),(119,'auto',13,'Lost Communication','Critical'),(120,'auto',14,'Aborted','Critical'),(121,'auto',15,'Dormant','Good'),(122,'auto',16,'Supporting Entity in Error','Critical'),(123,'auto',17,'Completed','Good'),(124,'auto',18,'Power Mode','Good'),(125,'auto',19,'DTMF Reserved','Good'),(126,'auto',20,'Major Failure','Critical'),
(127,'hp',30,'Non-recoverable Error','Critical'),(128,'hp',25,'Critical Failure','Critical');


Post by Cubert »

Here is the link to the plugin download if you need to update the entire plugin.


Build 5.0.0.6

https://delivery.shopifyapps.com/-/6f70 ... bf78e4f3d1
