1985 Volume 28 Issue 3 Pages 252-267
We consider the problem of minimizing the long-run average (expected) cost per unit time in a semi-Markov decision process that includes an unknown parameter. For general state and action spaces and a compact parameter space, we construct an adaptive policy which has good properties under identifiability conditions weaker than those required for strong consistency of the estimator. As an example, we treat the age replacement problem with an unknown failure distribution.
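For readers unfamiliar with the age replacement example, the following is a minimal sketch of the long-run average cost criterion when the failure distribution is known; it is not the adaptive policy of the paper. It uses the standard renewal-reward ratio (expected cost per cycle divided by expected cycle length). The Weibull family, the cost values `c_planned` and `c_failure`, the trapezoidal integration, and the grid search are all assumptions made for illustration.

```python
import math


def weibull_survival(t, shape, scale):
    """P(lifetime > t) for a Weibull lifetime (assumed family, for illustration)."""
    return math.exp(-((t / scale) ** shape))


def average_cost(T, shape, scale, c_planned=1.0, c_failure=5.0, steps=2000):
    """Long-run average cost per unit time when replacing at age T or at failure.

    Renewal-reward ratio: expected cost per cycle / expected cycle length.
    The cost values are hypothetical defaults.
    """
    h = T / steps
    # Trapezoidal rule for the expected cycle length, integral_0^T S(t) dt.
    total = 0.5 * (weibull_survival(0.0, shape, scale)
                   + weibull_survival(T, shape, scale))
    for i in range(1, steps):
        total += weibull_survival(i * h, shape, scale)
    expected_length = total * h

    survive = weibull_survival(T, shape, scale)
    expected_cost = c_planned * survive + c_failure * (1.0 - survive)
    return expected_cost / expected_length


def best_replacement_age(shape, scale, grid):
    """Replacement age on the grid with the smallest average cost."""
    return min(grid, key=lambda T: average_cost(T, shape, scale))


if __name__ == "__main__":
    grid = [0.1 * k for k in range(1, 51)]
    # Increasing failure rate (shape > 1): an interior replacement age is best.
    print(best_replacement_age(2.0, 1.0, grid))
```

In the adaptive setting described in the abstract, the parameter of the failure distribution is not known in advance, so a function like `best_replacement_age` would have to be applied to an estimate that is updated as failures are observed.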