**ERIC Number:** ED124161

**Record Type:** Non-Journal

**Publication Date:** 1976-Mar

**Pages:** 25

**Abstractor:** N/A

**Reference Count:** N/A

**ISBN:** N/A

**ISSN:** N/A

Partially Observable Markov Decision Processes Over an Infinite Planning Horizon with Discounting. Technical Report No. 77.

Wollmer, Richard D.

The true state of the system considered here is characterized by a probability vector. At each stage, an action must be chosen from a finite set; each action yields an expected reward, moves the system to a new state according to a Markov transition matrix, and produces an observable outcome. The problem of finding the maximum total discounted reward as a function of the probability state vector can be formulated as a linear program with an infinite number of constraints. The reward function can be expressed as a partial N-dimensional Maclaurin series, whose coefficients are likewise determined as an optimal solution to a linear program with infinitely many constraints. Solving a sequence of related, finitely constrained linear programs generates solutions that converge to a local minimum of the infinitely constrained program. The model is applicable to computer-assisted instruction systems as well as to other situations. (Author/CH)
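The belief-state dynamics the abstract describes can be illustrated with a minimal sketch: a Bayes update of the probability state vector after an action and observation, followed by a one-step discounted Bellman backup over beliefs. This is an illustrative example with hypothetical two-state numbers, not the report's linear-programming method; all model parameters (`T`, `O`, `R`, `GAMMA`) are assumptions invented for the sketch.

```python
# Hypothetical two-state, two-action, two-observation POMDP (illustrative only).
GAMMA = 0.9  # discount factor

# T[a][s][s2]: probability of moving from state s to s2 under action a
T = {
    0: [[0.7, 0.3], [0.2, 0.8]],
    1: [[0.5, 0.5], [0.9, 0.1]],
}
# O[a][s2][o]: probability of observing o after action a lands in state s2
O = {
    0: [[0.8, 0.2], [0.3, 0.7]],
    1: [[0.6, 0.4], [0.4, 0.6]],
}
# R[a][s]: expected immediate reward for action a in state s
R = {0: [1.0, 0.0], 1: [0.0, 2.0]}


def belief_update(b, a, o):
    """Bayes update of the belief (probability state vector) after
    taking action a and observing outcome o."""
    unnorm = [
        O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(2))
        for s2 in range(2)
    ]
    z = sum(unnorm)
    return [p / z for p in unnorm] if z > 0 else b


def backup(b, V):
    """One-step discounted Bellman backup at belief b, where V is any
    function estimating the value of a belief vector."""
    best = float("-inf")
    for a in (0, 1):
        expected_r = sum(R[a][s] * b[s] for s in range(2))
        future = 0.0
        for o in (0, 1):
            # probability of observing o given belief b and action a
            p_o = sum(
                O[a][s2][o] * T[a][s][s2] * b[s]
                for s in range(2) for s2 in range(2)
            )
            if p_o > 0:
                future += p_o * V(belief_update(b, a, o))
        best = max(best, expected_r + GAMMA * future)
    return best
```

For example, `backup([0.5, 0.5], lambda b: 0.0)` evaluates only the immediate expected rewards and picks the better action. Iterating such backups is one standard way to approximate the discounted value function over the belief simplex; the report instead characterizes that function via infinitely constrained linear programs.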

**Publication Type:** Reports - Descriptive

**Education Level:** N/A

**Audience:** N/A

**Language:** N/A

**Sponsor:** Advanced Research Projects Agency (DOD), Washington, DC.; Office of Naval Research, Arlington, VA. Personnel and Training Research Programs Office.

**Authoring Institution:** University of Southern California, Los Angeles. Behavioral Technology Labs.