[BUG] Windows agent service doesn't bind to port 6556 until restarted after installation #965

@zenwalk2013

Describe the bug
The CheckMK Windows agent service starts successfully after installation/registration but does not bind to port 6556 until the service is restarted. This causes the win_wait_for port verification task in the agent role to time out, even though the service is running.
After installation completes:
CheckmkService status shows as "Running"
Port 6556 is NOT listening (verified with netstat -an | findstr 6556)
After manually restarting the service with restart-service CheckmkService, port 6556 becomes active and listening

This appears to be a Windows-specific issue where the agent controller (cmk-agent-ctl.exe) doesn't immediately initialize the network listener after installation/registration, requiring a service restart to fully activate.
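
The symptom described above (service running, port closed) could be confirmed from Ansible with a few diagnostic tasks. This is only a sketch: the registered variable names `cmk_service_info` and `cmk_port_probe` are hypothetical, and the 10-second timeout is an arbitrary choice, not something taken from the role.

```yaml
# Diagnostic sketch: report service state and port status side by side.
- name: Read the state of the Checkmk agent service
  ansible.windows.win_service_info:
    name: CheckmkService
  register: cmk_service_info  # hypothetical variable name

- name: Probe port 6556 without aborting the play
  ansible.windows.win_wait_for:
    port: 6556
    timeout: 10  # assumed value, short on purpose
  register: cmk_port_probe  # hypothetical variable name
  ignore_errors: true

- name: Show both results
  ansible.builtin.debug:
    msg: >-
      Service state: {{ cmk_service_info.services | map(attribute='state') | list }},
      port 6556 listening: {{ cmk_port_probe is succeeded }}
```

If the bug is present, the last task should report a started service together with a failed port probe.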

Component Name
Component Name: ansible_collections/checkmk/general/roles

Ansible Version
ansible [core 2.20.0]
jinja version = 3.1.6
pyyaml version = 6.0.3 (with libyaml v0.2.5)


Checkmk Version and Edition
CheckMK version: 2.4.0p3

Collection Version
checkmk.general                          6.5.0

To Reproduce
Steps to reproduce the behavior:

1. Install the CheckMK agent on a Windows 11 host using the checkmk.general.agent role
2. Use a configuration with checkmk_agent_mode: 'pull' and checkmk_agent_tls: false
3. Wait for the role to complete agent installation and registration
4. Observe the win_wait_for task time out with the error:

TASK [checkmk.general.agent : Win32NT: Verify Checkmk Agent Port is open.] ****
fatal: [monitoring-host]: FAILED! => {
"changed": false,
"elapsed": 60.900359599999994,
"msg": "timeout while waiting for 127.0.0.1:6556 to start listening",
"wait_attempts": 20
}

Expected behavior
After the CheckMK agent installation and registration completes:

The CheckmkService should be running AND listening on port 6556
The win_wait_for task should successfully verify the port is open
No manual service restart should be required

Actual behavior
After installation completes:
The CheckmkService shows as "Running" but is NOT listening on port 6556
The win_wait_for task times out after 60 seconds
Manual service restart is required to bind port 6556

PS C:\Users\user> get-service CheckmkService

Status   Name               DisplayName
------   ----               -----------
Running  CheckmkService     Checkmk Service

PS C:\Users\user> netstat -an | findstr 6556
# No output - port not listening despite service running

PS C:\Users\user> restart-service CheckmkService
PS C:\Users\user> netstat -an | findstr 6556
  TCP    0.0.0.0:6556           0.0.0.0:0              LISTENING
  TCP    [::]:6556              [::]:0                 LISTENING

Minimum reproduction example

- name: Install CheckMk agent from monitoring server
  ansible.builtin.include_role:
    name: checkmk.general.agent
  vars:
    checkmk_agent_version: "2.4.0p3"
    checkmk_agent_server_protocol: https
    checkmk_agent_server_validate_certs: false
    checkmk_agent_server_port: 443
    checkmk_agent_configure_firewall: false
    checkmk_agent_site: 'sci_monitoring'
    checkmk_agent_user: "{{ checkmk_automation_user }}"
    checkmk_agent_secret: "{{ checkmk_automation_secret }}"
    checkmk_agent_registration_server_protocol: "https"
    checkmk_agent_add_host: false
    checkmk_agent_host_name: "{{ inventory_hostname }}"
    checkmk_agent_tls: false
    checkmk_agent_mode: 'pull'

Additional context
Workaround: add a service restart task and a port check task after the installation task in the playbook.

- name: Install CheckMk agent from monitoring server
  ansible.builtin.include_role:
    name: checkmk.general.agent
  vars:
    checkmk_agent_mode: 'ssh' # Skip port check to avoid timeout

- name: Restart CheckMK service to ensure port binding (Windows)
  ansible.windows.win_service:
    name: CheckmkService
    state: restarted
  when: ansible_os_family == "Windows"

- name: Wait for CheckMK agent port to be listening (Windows)
  ansible.windows.win_wait_for:
    port: 6556
    timeout: 30
  when: ansible_os_family == "Windows"

Suggested fix
The agent role's Win32NT.yml tasks should include an automatic service restart after agent installation/registration and before the port verification check. This would ensure the agent is fully initialized and listening on the correct port.
Proposed change location: roles/agent/tasks/Win32NT.yml - add a service restart task before line 123 (the win_wait_for port verification task).
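
One possible shape for such a change is sketched here as a block with a rescue section, so the service is only restarted when the first port check fails. This is an illustration under assumptions, not the role's actual code: the task names, the timeouts, and the `when` condition are made up for the example, and the real Win32NT.yml may need different conditions.

```yaml
# Sketch only: restart the service once if the port is not listening.
- name: Verify the agent port, restarting the service if needed
  when: ansible_os_family == "Windows"  # assumed condition
  block:
    - name: Check that port 6556 is listening
      ansible.windows.win_wait_for:
        port: 6556
        timeout: 20  # assumed value
  rescue:
    - name: Restart the Checkmk agent service
      ansible.windows.win_service:
        name: CheckmkService
        state: restarted

    - name: Check port 6556 again after the restart
      ansible.windows.win_wait_for:
        port: 6556
        timeout: 30  # assumed value
```

Compared with an unconditional restart, this variant leaves hosts that already listen on port 6556 untouched, at the cost of waiting out the first timeout on affected hosts.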

OS: Windows 11 (also reported on Windows Server 2019/2022)
Issue occurs: On both fresh installations and updates
Affected versions: previously observed with collection version 5.10.1, still present in version 6.5.0
Agent mode: pull mode without TLS
This is a Windows-specific issue. Linux agent installations work correctly.
