Problem Description:
Network Attached Storage (NAS) backups fail intermittently. The primary error message is: "failed to mount nas share as given network path not found". As a result, some backup jobs for a given NAS share fail while other jobs for the same share succeed.
Cause:
The root cause is a network communication failure between one specific NAS proxy server and the NAS storage device. In environments with multiple proxy servers, backup jobs are distributed among the proxies. A job fails when it is assigned to a proxy that cannot connect to the NAS device on the required SMB port (typically TCP 445, or TCP 139 when SMB runs over NetBIOS). That proxy cannot mount the NAS share and reports the "network path not found" error, even though other proxies in the same environment connect successfully. This is why the failures appear intermittent.
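The port check described above can be sketched as a small script run on each proxy. This is a minimal illustration, not a vendor tool: the function names check_port and check_smb_ports and the 5-second timeout are assumptions made for this example, and the NAS address is supplied by the operator.

```python
import socket

# 445: SMB directly over TCP; 139: SMB over the NetBIOS session service.
SMB_PORTS = (445, 139)


def check_port(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome.

    Returns "open", "timeout", "refused", or "error: <details>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No response at all: typical of a firewall silently dropping traffic.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, for example a name resolution or routing failure.
        return f"error: {exc}"


def check_smb_ports(host, ports=SMB_PORTS, timeout=5.0):
    """Check each SMB-related port and return {port: status}."""
    return {port: check_port(host, port, timeout) for port in ports}
```

Calling check_smb_ports with the NAS device address on a healthy proxy and on the suspect proxy, then comparing the two results, shows whether the suspect proxy is the only one that cannot reach the port. A "timeout" result on the failing proxy would be consistent with the timeout reported in the backup logs.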
Traceback:
The system logs show that the backup process fails while opening a connection to the storage device. The key error messages report a connection timeout, which points to blocked or dropped traffic (for example, a firewall rule or a routing problem) between the proxy and the NAS device. A generalized representation of the error is:
dial tcp <NAS_DEVICE_IP>:<SMB_PORT>: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
It is followed by messages indicating that the SMB (Server Message Block) filesystem, which is required to access the NAS share, could not be initialized or created:
Unable to create SMB Filesystem
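When many job logs need to be reviewed, the two messages above can be searched for automatically. The following is a rough sketch that assumes the logs are plain text with one message per line. The actual log layout is not specified here, so the pattern matches only the "dial tcp <host>:<port>:" fragment and the "Unable to create SMB Filesystem" text shown above. The name scan_log and the sample address 192.0.2.10 (a reserved documentation address) are placeholders.

```python
import re

# Matches the "dial tcp <host>:<port>:" fragment of the connection error.
DIAL_ERROR = re.compile(r"dial tcp (?P<host>\[[^\]]+\]|[^\s:]+):(?P<port>\d+):")
SMB_FS_ERROR = "Unable to create SMB Filesystem"


def scan_log(lines):
    """Return (list of (host, port) dial failures, count of SMB filesystem errors)."""
    dial_failures = []
    smb_fs_errors = 0
    for line in lines:
        match = DIAL_ERROR.search(line)
        if match:
            dial_failures.append((match["host"], int(match["port"])))
        if SMB_FS_ERROR in line:
            smb_fs_errors += 1
    return dial_failures, smb_fs_errors


# Placeholder log excerpt for illustration only.
SAMPLE_LOG = [
    "dial tcp 192.0.2.10:445: connectex: A connection attempt failed because "
    "the connected party did not properly respond after a period of time",
    "Unable to create SMB Filesystem",
    "some unrelated message",
]

print(scan_log(SAMPLE_LOG))  # ([('192.0.2.10', 445)], 1)
```

The extracted host and port identify which NAS device and which port the proxy could not reach, which is the information the network or firewall team needs.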
Resolution:
The immediate resolution keeps backups running by routing them through healthy proxies while the underlying network issue is permanently fixed.
Isolate the Faulty Proxy: Identify the proxy server that is failing to connect to the NAS device. This can be done by reviewing the logs of failed jobs. Once identified, move the problematic proxy into a different proxy pool or disable it to prevent it from being assigned new backup tasks. This ensures that all subsequent backups are handled by the remaining healthy proxies.
Address Network Connectivity: The underlying network issue must be resolved. Engage the network or firewall administration teams to diagnose the connectivity problem between the problematic proxy and the NAS device. This typically involves checking firewall rules, network routing, and port accessibility to ensure that communication is not being blocked.
Re-introduce the Proxy: After the network and firewall teams confirm that the connectivity issue has been resolved, the proxy server can be moved back to its original pool and re-enabled for backup operations.
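The first step, identifying the faulty proxy from failed jobs, can be approximated by tallying job outcomes per proxy. This sketch assumes the job history can be exported as pairs of proxy name and success flag. That record format, the function names, the proxy names, and the 50% threshold are all hypothetical and do not come from the product.

```python
from collections import Counter


def failure_counts(job_records):
    """Map each proxy name to (failed_jobs, total_jobs).

    job_records is an iterable of (proxy_name, succeeded) pairs.
    """
    totals = Counter()
    failures = Counter()
    for proxy, succeeded in job_records:
        totals[proxy] += 1
        if not succeeded:
            failures[proxy] += 1
    return {proxy: (failures[proxy], totals[proxy]) for proxy in totals}


def suspect_proxies(job_records, threshold=0.5):
    """Return proxies whose share of failed jobs is at or above the threshold."""
    counts = failure_counts(job_records)
    return sorted(
        proxy
        for proxy, (failed, total) in counts.items()
        if failed / total >= threshold
    )


# Hypothetical job history for illustration only.
jobs = [
    ("proxy-a", True),
    ("proxy-b", False),
    ("proxy-a", True),
    ("proxy-b", False),
    ("proxy-c", True),
]
print(suspect_proxies(jobs))  # ['proxy-b']
```

A proxy whose jobs for the affected share fail almost every time, while its peers succeed, is the likely candidate to isolate. The result is only a starting point and should be confirmed against the job logs before the proxy is moved or disabled.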
Verification:
Verify the resolution with the following actions:
Run a Manual Backup: After isolating the problematic proxy, initiate a manual backup for a previously failing backupset. Successful completion of this job confirms that the healthy proxies are working as expected.
Monitor Scheduled Backups: Observe the regularly scheduled backup jobs to ensure they now complete consistently without any failures.
Test the Resolved Proxy: Before fully re-introducing the fixed proxy into the production pool, perform a test backup using only that proxy to confirm its connectivity is fully restored.