-
David Anderson authored
- client: change work fetch policy to avoid starving GPUs in situations where GPU exclusions are used. - client: fix bug in round-robin simulation when GPU exclusions are used. Note: this fixes a major problem (starvation) with project-level GPU exclusion. However, project-level GPU exclusion interferes with most of the client's scheduling policies. E.g., round-robin simulation doesn't take GPU exclusion into account, and the resulting completion estimates and device shortfalls can be wrong by an order of magnitude. The only way I can see to fix this would be to model each GPU instance as a separate resource, and to associate each job with a particular GPU instance. This would be a sweeping change in both client and server.
777f1f11