ERIC Number: ED124161
Record Type: Non-Journal
Publication Date: 1976-Mar
Reference Count: N/A
Partially Observable Markov Decision Processes Over an Infinite Planning Horizon with Discounting. Technical Report No. 77.
Wollmer, Richard D.
The true state of the system described here is not directly observable and is characterized by a probability vector over the possible states. At each stage an action must be chosen from a finite set of actions. Each possible action yields an expected reward, transforms the system to a new state in accordance with a Markov transition matrix, and yields an observable outcome. The problem of finding the maximum total discounted reward as a function of the probability state vector may be formulated as a linear program with an infinite number of constraints. The reward function may be expressed as a partial N-dimensional Maclaurin series, whose coefficients are also determined as an optimal solution to a linear program with an infinite number of constraints. A sequence of related finitely constrained linear programs is solved, generating a sequence of solutions that converge to a local minimum for the infinitely constrained program. This model is applicable to computer-assisted instruction systems as well as to other situations. (Author/CH)
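The probability state vector in the abstract can be illustrated with a small sketch. The code below is not taken from the report. It shows the standard Bayesian update of a state-probability vector after one action and one observed outcome, plus the expected one-stage reward under that vector. The function names, the two-state numbers, and the assumption that the outcome probability depends on the state reached after the transition are all hypothetical choices made for illustration.

```python
def belief_update(belief, transition, observation, outcome):
    """Return the posterior state probabilities after one action.

    belief[i]         : prior probability that the system is in state i
    transition[i][j]  : P(next state j | state i) under the chosen action
    observation[j][k] : P(outcome k | next state j) under the chosen action
                        (assumed form of the outcome model)
    outcome           : index k of the outcome actually observed
    """
    n = len(belief)
    # Predict: push the prior through the Markov transition matrix.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight each state by the likelihood of the observed outcome.
    unnormalized = [predicted[j] * observation[j][outcome] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed outcome has zero probability under this belief")
    return [u / total for u in unnormalized]


def expected_reward(belief, reward):
    """Expected one-stage reward when reward[i] is earned in state i."""
    return sum(b * r for b, r in zip(belief, reward))


# Hypothetical two-state example for a single action.
prior = [0.5, 0.5]
transition = [[0.9, 0.1],
              [0.2, 0.8]]
observation = [[0.8, 0.2],
               [0.3, 0.7]]
posterior = belief_update(prior, transition, observation, outcome=0)
stage_reward = expected_reward(prior, [1.0, 0.0])
```

Because the updated vector can land anywhere in the probability simplex, a value function defined over such vectors has to satisfy one optimality constraint per vector and action, which is consistent with the abstract's description of a linear program with infinitely many constraints.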
Publication Type: Reports - Descriptive
Education Level: N/A
Sponsor: Advanced Research Projects Agency (DOD), Washington, DC.; Office of Naval Research, Arlington, VA. Personnel and Training Research Programs Office.
Authoring Institution: University of Southern California, Los Angeles. Behavioral Technology Labs.