A general Classification framework to estimate the optimal dynamic treatment regime
A dynamic treatment regime is a sequence of decision rules, each corresponding to a decision point, that determine that next treatment based on each individual’s own available characteristics and treatment history up to that point. We show that identifying the optimal dynamic treatment regime can be recast as a sequential optimization problem and propose a direct sequential optimization method to estimate the optimal treatment regimes. In particular, at each decision point, the optimization is equivalent to sequentially minimizing a weighted expected misclassification error. Based on this classification perspective, we propose a powerful and flexible C-learning algorithm to learn the optimal dynamic treatment regimes backward sequentially from the last stage until the first stage. C-learning is a direct optimization method that directly targets optimizing decision rules by exploiting powerful optimization/classification techniques and it allows incorporation of patient’s characteristics and treatment history to improve performance, hence enjoying advantages of both the traditional outcome regression based methods (Q-and A-learning) and the more recent direct optimization methods. The superior performance and flexibility of the proposed methods are illustrated through extensive simulation studies.