How-To Manually Test Probes

This forum supports the ESX Host Health Monitor plugin. When posting, include screenshots of the issue and any script and command logs listed in the probe consoles.
mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

I've removed and reinstalled the Python script, but I get the same result.

I also ran the script manually again on all 4 servers. All of them return the correct results, but 2 of the 4 servers show the wrong status in the VMWare ESX Health Monitor window.
[Screenshot 2023-02-21 135731.png]
ESX05 and ESXi101 show the correct status, but ESXi102/201 both have battery issues and still show okay.

Though the script is showing the correct value so maybe we need to clear out the data table like we did for ESXi101 as that one does show the correct status?
[Screenshot 2023-02-21 140034.png]

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

Hi Cubert,

Any idea or update on a solution?

Thanks.
J.

Cubert
Posts: 2457
Joined: Tue Dec 29, 2015 7:57 pm
8

Re: How-To Manually Test Probes

Post by Cubert »

Sorry J,

I was waiting for the results of this:

"Though the script is showing the correct value so maybe we need to clear out the data table like we did for ESXi101 as that one does show the correct status?"

Did you do this? I am assuming either no, or it failed to help?

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

Hi Cubert,

Oh sorry, I missed that and did not do that for the other devices.
Let me remove the data from the data table for those other devices and I'll let you know if that fixed it.

J.

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

I've deleted the data from the CIM data table for the 4 devices in question. 2 of the 4 are reporting okay, but the other two are not.

This is me running the script manually. As you can see, 3 out of the 4 are returning an error.
[Screenshot 2023-03-01 161757.png]
When I check the plugin_p4l_vmware_healthmon_hosts table, it shows that 3 out of the 4 have no issues. This does not line up with the script that I ran manually.
[Screenshot 2023-03-01 161801.png]
The CIM data status for the 4 devices does match the status shown in the screenshot above, which in turn matches the status in the client VMWare ESX Health Monitor screen in Automate.

Why is it not ingesting the failed status into the hosts table and cimdata table?
I might be able to get you the output details from when the Automate script runs if that would help, though I'm not sure that gets us all the details.

Cubert
Posts: 2457
Joined: Tue Dec 29, 2015 7:57 pm
8

Re: How-To Manually Test Probes

Post by Cubert »

Interesting!

OK, let's do some script logging. For some reason, either we are not getting a successful status or SQL is not updating as expected in the script.

Just to give you a bit of the scripting logic we are using:

Name of script: P4A ESXi Health Monitor Service

Lines 1 - 29 deal with testing for and installing Python 3 and any supporting scripts and extensions on the probing agent.

Lines 30 and 31 test for the Python script and make sure it's readable as a script.

On line 33 we query for all ESX hosts that are assigned to the probing agent.

On line 34 we start to loop through the ESX hosts returned by the query.

We use several lines to validate the ESX data we have before executing the first of two probes against a single ESX host.

** On line 47 we execute the first probe command, which is not in verbose mode, so we should get just a simple return.

Code:

C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py -H @ESXHOSTIP@ -U @ESXUsername@ -P "@ESXPassword@" -V @ESXVender@
This is the command that is returning what you're seeing in the last post.

On line 48 we log this output to the agent's script log (PROBE -> [ @PROBEDATA@ ]).
This is a good point to verify that the data in the script log is accurate, as the SQL insert is the next step.
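
If you want to repeat the line 47 probe by hand against several hosts, here is a minimal Python sketch. It assumes the same install paths and flags as the command above; the function names build_probe_command and run_probe are hypothetical helpers and not part of the plugin.

```python
import subprocess

# Paths taken from the probe command in this thread; adjust if yours differ.
PYTHON_EXE = r"C:\Python3\Python.exe"
PROBE_SCRIPT = r"C:\Python3\check_esxi_hardware.py"


def build_probe_command(host, username, password, vendor, verbose=False):
    """Build the argument list for one probe run (same flags as script lines 47 and 51)."""
    cmd = [PYTHON_EXE, "-W", "ignore", PROBE_SCRIPT]
    if verbose:
        cmd.append("-v")  # verbose mode, as used for the CIM data run
    cmd += ["-H", host, "-U", username, "-P", password, "-V", vendor]
    return cmd


def run_probe(host, username, password, vendor, verbose=False):
    """Run the probe and return (exit_code, output_text)."""
    result = subprocess.run(
        build_probe_command(host, username, password, vendor, verbose),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()
```

Calling run_probe once per host and printing the text next to the exit code gives you something to compare directly with the PROBE -> [ ... ] entry in the agent's script log.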


On line 49 we create a script variable that holds the SQL query:

Code:

UPDATE plugin_p4l_vmware_healthmon_hosts SET LastScan = NOW(), Status = '@PROBEDATA@' WHERE ESXHost = '@ESXHOSTIP@' AND ProbeID = %computerid%
On line 50 we execute the SQL query, which at this point should be updating the top status portion of this view (above the CIM data).

Image


On line 51 we execute the same command again, but with verbose turned on, which spits out all our CIM data. We then save that to SQL on line 54.

Code:

C:\Python3\Python.exe -W ignore C:\Python3\check_esxi_hardware.py -v -H @ESXHOSTIP@ -U @ESXUsername@ -P "@ESXPassword@" -V @ESXVender@


Line 54:

Code:

INSERT IGNORE INTO `plugin_p4l_vmware_healthmon_cimdata` (`ClientID`,`LocationID`,`ProbeID`,`ESXHost`,`CIM_Item`,`CIM_Value`) VALUES @CIMSQLDATA@ ON DUPLICATE KEY UPDATE CIM_Value=VALUES(CIM_Value)
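
The exact format of @CIMSQLDATA@ is not shown in this thread, so the following is only a rough Python sketch of what a VALUES list for this multi-row insert could look like. The helper names and the sample item names are hypothetical; only the column order comes from the query above.

```python
def sql_quote(value):
    """Quote a value as a MySQL string literal (escapes backslashes and single quotes)."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_cim_values(client_id, location_id, probe_id, esx_host, cim_items):
    """Return comma-separated VALUES tuples, one per (CIM_Item, CIM_Value) pair.

    Column order follows the INSERT: ClientID, LocationID, ProbeID,
    ESXHost, CIM_Item, CIM_Value.
    """
    rows = []
    for item, value in cim_items:
        rows.append(
            f"({int(client_id)},{int(location_id)},{int(probe_id)},"
            f"{sql_quote(esx_host)},{sql_quote(item)},{sql_quote(value)})"
        )
    return ",".join(rows)
```

One row per CIM item is what lets ON DUPLICATE KEY UPDATE refresh CIM_Value for items that already exist, assuming the table has a unique key covering the host and item columns.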


So the first thing to test is: did we get the correct data printed to the agent's script log for line 48? That will show us what the script will try to put into SQL.

Then, did SQL take that data, or did it fail on line 50?

Add a new script line before line 50 that prints to the script log (like line 48):

Code:

Here is SQL query -> [ @SCANSQL@ ]
This will print out the actual SQL query with all data included. Copy and paste that query into your SQL client. Does SQL fail, and if so, with what return?
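
If you would rather build the same query by hand from a probe result, here is a small Python sketch. It only imitates the @NAME@ and %name% substitution as I would expect it to behave (that is an assumption about Automate, not its actual implementation), and render_query is a hypothetical helper name.

```python
import re

# Query text from script line 49 in this thread.
SCAN_SQL_TEMPLATE = (
    "UPDATE plugin_p4l_vmware_healthmon_hosts "
    "SET LastScan = NOW(), Status = '@PROBEDATA@' "
    "WHERE ESXHost = '@ESXHOSTIP@' AND ProbeID = %computerid%"
)


def render_query(template, variables):
    """Replace @NAME@ and %name% tokens with values from the variables dict.

    Tokens with no matching variable are left untouched so they stand out.
    """
    def substitute(match):
        name = match.group(1) or match.group(2)
        return str(variables.get(name, match.group(0)))

    return re.sub(r"@(\w+)@|%(\w+)%", substitute, template)
```

One thing worth checking in the rendered query is whether the probe text contains a single quote, since an unescaped quote inside Status = '...' would end the string early and make the statement fail.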

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

I added the extra line to the script. When I tested it, the results matched what the script is producing, so it's not an issue with ingesting the data, at least not one that I have been able to detect.

Now I have a different issue :(
The Python script isn't detecting the correct state anymore. I tried restarting the VMware management agent services, but that didn't make a difference.
When I run the exact same command on the same servers, which still have the same hardware issue as before, it now returns status okay for 3 out of the 4 servers!
In the vSphere client it does show the correct status, so now I'm confused about why this is happening.
I double-checked that we use the latest version of the script, and that's indeed the case.

Here are the snippets for each of the stages. It looks like the issue is with the script.
[ESXi102-1.png]
[ESXi102-2.png]
[ESXi102-3.png]
[ESXi102-iLO.png]
Is there anything else we can do other than maybe running a debug on the Python script or restarting the servers?

One other thing that I noticed is that the scheduled check of the servers once per hour is no longer happening.
Is there a way to reset the schedule?

Cubert
Posts: 2457
Joined: Tue Dec 29, 2015 7:57 pm
8

Re: How-To Manually Test Probes

Post by Cubert »

The first issue looks to be something with the check_esxi_hardware.py script file. The server shows one thing and ESX CIM shows another.

That's an interesting issue.

We use the health reader Python script from the guys over at Infiniroot.com: https://www.claudiokuenzler.com/monitor ... rdware.php

I will need to review the docs to see why a false reading would come across.
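
In the meantime, one thing that may help narrow it down: check_esxi_hardware.py is written as a Nagios plugin, so besides the text it prints, it also returns a standard Nagios exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). On Windows you can see it with echo %ERRORLEVEL% right after a manual run. Below is a rough Python sketch for lining those codes up against what vSphere or iLO shows; the function names and the way the expected states are supplied are hypothetical.

```python
# Standard Nagios plugin exit codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def classify_exit_code(code):
    """Map a plugin exit code to its Nagios state name."""
    return NAGIOS_STATES.get(code, "UNKNOWN")


def find_mismatches(probe_exit_codes, expected_states):
    """Return hosts whose probe state differs from the state seen elsewhere.

    probe_exit_codes: dict of host -> exit code from a manual probe run
    expected_states:  dict of host -> state name read by hand from vSphere or iLO
    """
    mismatches = {}
    for host, code in probe_exit_codes.items():
        probe_state = classify_exit_code(code)
        expected = expected_states.get(host)
        if expected is not None and expected != probe_state:
            mismatches[host] = (probe_state, expected)
    return mismatches
```

If a host exits 0 while vSphere shows a fault, that points at what the host's CIM service is reporting rather than at the SQL side.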


As for the automation stopping, try restarting the DBagent on Automate. That should reset all the backend automations taking place.

mspguyoi
Posts: 26
Joined: Thu Sep 08, 2022 1:20 pm
1

Re: How-To Manually Test Probes

Post by mspguyoi »

How long do you think you need to get the script issue sorted out?
Just so I know if I should put another solution in place until this is sorted out.

The restart of the database agent fixed the schedule issue, thanks for that.
