Error Messages and how to resolve

This forum supports the ESX Host Health Monitor plugin. When posting, include screenshots of issues and any script and command logs listed in the probe consoles.
reaves
Posts: 6
Joined: Thu Jun 29, 2017 5:02 pm

Error Messages and how to resolve

Post by reaves »

Allow me to preface this by saying that I absolutely love this plugin. However, we are receiving some error messages when adding hosts to the plugin that I'd appreciate your assistance with.

Error #1: UNKNOWN: Authentication Error

I have a few servers which are reporting this error. I've tried to perform due diligence and rule out user error. In each case I've logged on to a local system on the network, connected to the ESX host with vSphere and verified the credentials entered in the plugin are valid and allow login. Anything I'm missing here?

Error #2: Plugin never scans host

I've got 1 or 2 of these as well. I've entered the information into the plugin but the host is never scanned. There are no failure messages for these. Again, I've logged onto a system on the local network and connected to the ESX host via vSphere to make sure that I have the IP and credentials correct.

Error #3: OK

I have 2 of these, where the host appears to have been scanned, but the only response that I receive is "OK". Any more verbose logging or error syntax here?

Error #4: ConnectionError: Socket error: [Errno 10061] No connection could be made because the target machine actively refused it

I have several of these as well. Again, in all cases I've connected to the local network and then logged in to the ESX host with vSphere. My first guess is that there are services or daemons that aren't running, but I'm not sure which ones the plugin requires. I'm experiencing this on ESX hosts with and without SSH enabled. What can I do to correct this?

Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

Re: Error Messages and how to resolve

Post by Cubert »

Looks like you have several possible issues. If what I say here does not shed some light on your issues then email us at Helpdesk@plugins4labtech.com to open a ticket with us so we can investigate further.

The latest PYWBEM build for Python (v10) fixed issues with connecting to newer SSL services like ESX 6.5+, but it also came with a flaw: any probe can now query only 1 server per script run. The reason is a bit murky, but it looks to be based on a cache that the LT script holds during execution at the agent. This cache allows the first SSL connection to work, but the second fails, typically with "No connection could be made because the target machine actively refused it", which looks to be #4 on your list. We also see this behavior in some instances of authentication failures (#1 on your list).

The fix is to have 2 probes at the location, one for each ESX host. We hope the maintainers of PYWBEM resolve this in a future release, as it worked in the older ones (v6).

We also see that ESX 6.5 comes with CIM turned off by default. You must first administratively enable it at the command line and then start up its services. Just starting the services will look like it worked, but the backend services will fail to actually start.
See this post for the fix: http://www.squidworks.net/2017/02/vmwar ... y-default/

Lastly, the best tool for testing is the actual probe. To test that the probe actually works and can make the connections correctly, you can do 3 things.

#1 Run the probe's command manually to see the output yourself. Replace each @XXX@ placeholder with the real account data used to access the ESX host.

C:\Python27\Python.exe -W ignore C:\Python27\check_esxi_hardware.py -H @ESXHOSTIP@ -U @ESXUsername@ -P "@ESXPassword@" -V @ESXVender@

#2 Using Telnet, connect to port 5989 and see whether you get a rejection or a failure to open the port. If it is working correctly, you will connect and get a prompt in Telnet. If you do not, start looking at the CIM services and any firewalls running on ESX.

#3 If the Python install is missing files, make sure your AV software does not block zip file downloads from lp.plugins4labtech.com. If a proxy is being used (Barracuda), allow SSL and HTTP to lp.plugins4labtech.com. Then delete the Python folder and allow the probe to run a scan. It will see that Python is missing and will reinstall all parts of the probe. This typically solves probe install issues.
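
If no Telnet client is handy, the port test in #2 can be roughly sketched in Python. This is only an illustration and not part of the plugin: the function name check_port is made up here, and the sketch assumes Python 3 rather than the Python 2.7 that the probe installs.

```python
import socket
import sys


def check_port(host, port=5989, timeout=5.0):
    """Try a plain TCP connect and report what happened.

    Returns "open", "refused", "timeout", or "error: <details>".
    Port 5989 is the CIM port named in this thread.
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening (or a firewall rejects).
        return "refused"
    except socket.timeout:
        # No answer at all, often a firewall silently dropping packets.
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc


if __name__ == "__main__":
    # Hypothetical usage: python check_port.py <ESX host IP>
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(target, check_port(target))
```

A "refused" result lines up with the Errno 10061 message and points at the CIM services, while "timeout" more likely points at a firewall in between. An "open" result only shows that the port accepts connections; it does not prove the credentials are good.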



Do not point the ESX probe at a vCenter server. I do not believe that will work.

Let me know if any of this helps.

Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

Re: Error Messages and how to resolve

Post by Cubert »

As for #3 of your list: OK means just that, ESX is A-OK! Otherwise it will be Warning (Yellow) or FAILURE (Red). By right-clicking the agent and selecting view CIM data, you should get the verbose view of the ESX host, and any items in error will be marked with a colored dot.

reaves
Posts: 6
Joined: Thu Jun 29, 2017 5:02 pm

Re: Error Messages and how to resolve

Post by reaves »

Thank you for your replies. Sorry for my delayed response, but I just now had a chance to troubleshoot this further.

There were a few systems which had CIM Server set to start and stop with the host, but the service had stopped. Restarting the service via the CLI fixed those.

On the servers that were having authentication issues, I was able to manually run the script on each. In each case the script would run successfully. However, on looking at it more closely, I noticed that each of the servers which returned an authentication error for the automated script had either an "@" or multiple "!" characters in the password. The password worked in the manual script because it was enclosed in quotes, but it doesn't look like the Automate script passes it through properly. In each case I changed the password to remove those characters, and now the Automate script works fine.
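
For anyone else who runs into this, a rough pre-check along these lines could flag passwords likely to hit the same problem before they go into the plugin. It is a sketch, not the Automate script's actual logic: both function names are hypothetical, the two rules simply mirror the characters I ran into above, and "dell" is only an example vendor value.

```python
def flag_risky_password(password):
    """Return the reasons a password may trip the automated probe.

    The rules only reflect what was observed in this thread:
    an "@" anywhere, or more than one "!".
    """
    reasons = []
    if "@" in password:
        reasons.append("contains '@'")
    if password.count("!") > 1:
        reasons.append("contains multiple '!'")
    return reasons


def build_probe_argv(host, user, password, vendor,
                     python_exe=r"C:\Python27\Python.exe",
                     script=r"C:\Python27\check_esxi_hardware.py"):
    """Build the manual probe command as an argument list.

    Passing a list (for example to subprocess.run) keeps the password
    as one argument, so no shell has to interpret it.
    """
    return [python_exe, "-W", "ignore", script,
            "-H", host, "-U", user, "-P", password, "-V", vendor]


if __name__ == "__main__":
    # Placeholder values; substitute real host data before running anything.
    pw = "Exa@mple!!"
    print(flag_risky_password(pw))
    print(build_probe_argv("192.0.2.10", "root", pw, "dell"))
```

The argument list keeps the password intact for manual runs, which matches the behaviour I saw with the quoted password on the command line. It does nothing for the automated path, so changing the password is still the workaround there.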

Thanks again for all your help!

MrRat
Posts: 24
Joined: Thu Apr 20, 2017 4:53 pm

Re: Error Messages and how to resolve

Post by MrRat »

Cubert wrote: Thu Jul 06, 2017 12:36 pm As for #3 of your list. OK is just that ESX is A-OK!
Why is the icon not the green check mark?
The grey ellipses otherwise indicate an error, so having the same icon for some OK statuses keeps me from doing a quick visual scan to check the error state.

Also, it would be nice to have the plugin generate a plugin-related ticket when the status is not OK, so I wouldn't have to visually scan it to see if the script failed.

Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

Re: Error Messages and how to resolve

Post by Cubert »

There should be an internal monitor that watches the status and CIM data status areas. When either fails, it should "do something"; you have to set that something up in the monitor.

Turn on monitoring in the plugin, then look for the 3 P4L- CIM monitors.

Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

Re: Error Messages and how to resolve

Post by Cubert »

And yes,

The OK in #3 is not a good OK. I suspect it is a script timeout, with the OK coming from the LT agent saying, well, we just gave up...

The question is what was it doing just before that?

Cubert
Posts: 2430
Joined: Tue Dec 29, 2015 7:57 pm

Re: Error Messages and how to resolve

Post by Cubert »

To figure that out, we need to look at the probe's command and script logs to see what it was doing and why it returned an "OK".

Post some of that here so we can have a peek.
