While a thorough treatment of the issues that can arise when running jobs remotely on a Linux cluster is beyond the scope of this document, we can offer some comments based on our experience and that of other users. The most common issue in practice is a mismatch between the job configuration created by the LSF job scheduler and the configuration specified by the user. In that case you will probably need to consult your cluster administrator, but you may also be able to take the following steps:
Before your job starts, enter the LSF command bhosts in a terminal on the root node of your cluster. bhosts reports the status of each cluster node (up, down, offline, etc.) and the number of cores available on each. You may find that nodes you thought were available are in fact down or offline. Note, however, that not all clusters expose bhosts to users.
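As a quick way to spot problem nodes, you can filter the bhosts STATUS column for anything other than "ok". The sketch below is illustrative only: the bhosts output is sample data embedded in the script, not from a real cluster. On your root node you would pipe the output of bhosts itself through the same awk filter.

```shell
# Illustrative only: sample bhosts output stands in for a real cluster.
# On the root node you would run:  bhosts | awk 'NR > 1 && $2 != "ok" ...'
bhosts_output='HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
node01             ok              -      8      2      2      0      0      0
node02             closed          -      8      8      8      0      0      0
node03             unavail         -      8      0      0      0      0      0'

# Print every node whose status is not "ok" (skipping the header line).
printf '%s\n' "$bhosts_output" | awk 'NR > 1 && $2 != "ok" { print $1, $2 }'
# prints:
#   node02 closed
#   node03 unavail
```

Any node listed by this filter will not be available to your job, regardless of what you requested.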
Analyst uses bhosts to determine the cluster topology and treats those values as ceilings on the user's resource request. If your cluster does not expose bhosts, Analyst falls back on environment variables (see “Cluster Installation”); if it cannot find those variables either, it defaults to running your job on a single node.
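The fallback logic can be sketched as follows. Note that ANALYST_NODE_COUNT is a hypothetical placeholder name used only for illustration; the variable names Analyst actually reads are documented in “Cluster Installation”.

```shell
# Hypothetical sketch of the fallback: ANALYST_NODE_COUNT is a placeholder
# name, not necessarily the variable Analyst reads -- see "Cluster
# Installation" for the real names.
node_count="${ANALYST_NODE_COUNT:-}"

if [ -z "$node_count" ]; then
    # Neither bhosts nor the environment variable is available:
    # default to a single node.
    node_count=1
fi

echo "running on $node_count node(s)"
```

If your job unexpectedly runs on one node, check that the variables from “Cluster Installation” are actually set in the environment your job inherits.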
Your cluster should have the LSF services sbatchd, mbatchd, mbschd, and eauth running. To confirm this, enter the Linux command service --status-all in a terminal on the root node of your cluster. If any of these services is not running, start it with the command service [name] start.
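Checking each service by hand can be scripted. The sketch below is illustrative: the status_listing variable stands in for the output of service --status-all, so the script runs anywhere; on your root node you would substitute the real command.

```shell
# Illustrative only: status_listing stands in for the output of
# `service --status-all` on the root node.
status_listing='sbatchd (pid 1201) is running...
mbatchd (pid 1202) is running...
mbschd is stopped'

for svc in sbatchd mbatchd mbschd eauth; do
    if printf '%s\n' "$status_listing" | grep -q "^$svc .*running"; then
        echo "$svc: running"
    else
        # On a real cluster you would start it with:  service $svc start
        echo "$svc: NOT running"
    fi
done
```

With this sample listing, mbschd and eauth would be reported as not running and would need to be started before submitting jobs.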
Once a job is running, you can gather information about it by entering the LSF command bjobs on the root node of your cluster.
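For example, you can summarize how many of your jobs are running versus pending from the bjobs STAT column. The bjobs output below is sample data for illustration; on the root node you would pipe bjobs itself through the same filter (or run bjobs -l for full detail on a single job).

```shell
# Illustrative only: sample bjobs output stands in for a live cluster.
# On the root node:  bjobs | awk 'NR > 1 { count[$3]++ } ...'
bjobs_output='JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1042    alice   RUN   normal     node01      node02      analysis   Oct  1 09:14
1043    alice   PEND  normal     node01                  analysis   Oct  1 09:15'

# Tally jobs by state (RUN, PEND, etc.) from the STAT column.
printf '%s\n' "$bjobs_output" | awk 'NR > 1 { count[$3]++ } END { for (s in count) print s, count[s] }'
```

A job stuck in PEND usually means the requested resources are not yet available, which is worth checking against the bhosts output described above.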