- Successive Over-Relaxation $Q$-Learning. csysl, 4(1):55-60, 2020. [doi]

- Learning Active Spine Behaviors for Dynamic and Efficient Locomotion in Quadruped Robots. ro-man 2019: 1-6 [doi]
- Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning. atal 2019: 1931-1933 [doi]
- Predictive and Prescriptive Analytics for Performance Optimization: Framework and a Case Study on a Large-Scale Enterprise System. icmla 2019: 876-881 [doi]
- Efficient Adaptive Resource Provisioning for Cloud Applications using Reinforcement Learning. saso 2019: 271-272 [doi]
- An Online Sample-Based Method for Mode Estimation Using ODE Analysis of Stochastic Approximation Algorithms. csysl, 3(3):697-702, 2019. [doi]

- A stochastic approximation approach to active queue management. telsys, 68(1):89-104, 2018. [doi]
- Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks. wcl, 7(5):712-715, 2018. [doi]
- A unified decision making framework for supply and demand management in microgrid networks. smartgridcomm 2018: 1-7 [doi]
- An incremental off-policy search in a model-free Markov decision process using a single sample path. ml, 107(6):969-1011, 2018. [doi]
- An online prediction algorithm for reinforcement learning with linear function approximation using cross entropy method. ml, 107(8-10):1385-1429, 2018. [doi]
- Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning. mor, 43(1):130-151, 2018. [doi]
- A Linearly Relaxed Approximate Linear Program for Markov Decision Processes. tac, 63(4):1185-1191, 2018. [doi]
- Analysis of Gradient Descent Methods With Nondiminishing Bounded Errors. tac, 63(5):1465-1471, 2018. [doi]

- Adaptive System Optimization Using Random Directions Stochastic Approximation. tac, 62(5):2223-2238, 2017. [doi]
- A model based search method for prediction in model-free Markov decision process. ijcnn 2017: 170-177 [doi]
- Adaptive mean queue size and its rate of change: queue management with random dropping. telsys, 65(2):281-295, 2017. [doi]
- Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach. IEEEcloud 2017: 375-382 [doi]
- Quasi-Newton smoothed functional algorithms for unconstrained and constrained simulation optimization. coap, 66(3):533-556, 2017. [doi]
- A stability criterion for two timescale stochastic approximation schemes. automatica, 79:108-114, 2017. [doi]
- A Generalization of the Borkar-Meyn Theorem for Stochastic Recursive Inclusions. mor, 42(3):648-661, 2017. [doi]

- A constrained optimization perspective on actor-critic algorithms and application to network routing. scl, 92:46-51, 2016. [doi]
- Multiscale Q-learning with linear function approximation. deds, 26(3):477-509, 2016. [doi]
- Actor-Critic Algorithms with Online Feature Adaptation. tomacs, 26(4):24, 2016. [doi]

- Simultaneous perturbation methods for adaptive labor staffing in service systems. simulation, 91(5):432-455, 2015. [doi]
- Simultaneous Perturbation Newton Algorithms for Simulation Optimization. jota, 164(2):621-643, 2015. [doi]
- Energy Sharing for Multiple Sensor Nodes With Finite Buffers. tcom, 63(5):1811-1823, 2015. [doi]

- Two timescale convergent Q-learning for sleep-scheduling in wireless sensor networks. winet, 20(8):2589-2604, 2014. [doi]
- A simulation-based algorithm for optimal pricing policy under demand uncertainty. itor, 21(5):737-760, 2014. [doi]
- q-Gaussian Distributions. tomacs, 24(3), 2014. [doi]
- Newton-based stochastic optimization using q-Gaussian smoothed functional algorithms. automatica, 50(10):2606-2614, 2014. [doi]
- A Markov Decision Process Framework for Predictable Job Completion Times on Crowdsourcing Platforms. hcomp 2014: [doi]
- Approximate Dynamic Programming with (min; +) linear function approximation for Markov decision processes. cdc 2014: 1588-1593 [doi]
- Universal Option Models. nips 2014: 990-998 [doi]

- Feature Search in the Grassmanian in Online Reinforcement Learning. jstsp, 7(5):746-758, 2013. [doi]

- An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes. jota, 153(3):688-708, 2012. [doi]
- q-Gaussian based Smoothed Functional algorithms for stochastic optimization. isit 2012: 1059-1063 [doi]
- A novel Q-learning algorithm with function approximation for constrained Markov decision processes. allerton 2012: 400-405 [doi]
- General-sum stochastic games: Verifiability conditions for Nash equilibria. automatica, 48(11):2923-2930, 2012. [doi]
- Threshold Tuning Using Stochastic Optimization for Graded Signal Control. tvt, 61(9):3865-3880, 2012. [doi]
- Optimal multi-layered congestion based pricing schemes for enhanced QoS. cn, 56(4):1249-1262, 2012. [doi]

- Reinforcement learning with average cost for adaptive control of traffic lights at intersections. itsc 2011: 1640-1645 [doi]
- Stochastic Algorithms for Discrete Parameter Simulation Optimization. tase, 8(4):780-793, 2011. [doi]
- Stochastic approximation algorithms for constrained optimization via simulation. tomacs, 21(3):15, 2011. [doi]
- An Optimized SDE Model for Slotted Aloha. tcom, 59(6):1502-1508, 2011. [doi]
- Smoothed Functional and Quasi-Newton Algorithms for Routing in Multi-stage Queueing Network with Constraints. icdcit 2011: 175-186 [doi]
- Reinforcement Learning With Function Approximation for Traffic Signal Control. tits, 12(2):412-421, 2011. [doi]

- An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes. scl, 59(12):760-766, 2010. [doi]
- Optimized Policies for the Retransmission Probabilities in Slotted Aloha. simulation, 86(4):247-261, 2010. [doi]
- An efficient algorithm for scheduling in bluetooth piconets and scatternets. winet, 16(7):1799-1816, 2010. [doi]

- Optimal parameter trajectory estimation in parameterized SDEs: An algorithmic procedure. tomacs, 19(2), 2009. [doi]
- Natural actor-critic algorithms. automatica, 45(11):2471-2482, 2009. [doi]
- A proof of convergence of the B-RED and P-RED algorithms for random early detection. icl, 13(10):809-811, 2009. [doi]
- Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation. nips 2009: 1204-1212 [doi]
- A probabilistic constrained nonlinear optimization framework to optimize RED parameters. pe, 66(2):81-104, 2009. [doi]

- Simulation-Based Optimization Algorithms for Finite-Horizon Markov Decision Processes. simulation, 84(12):577-600, 2008. [doi]
- New algorithms of the Q-learning type. automatica, 44(4):1111-1119, 2008. [doi]
- An efficient ad recommendation system for TV programs. mms, 14(2):73-87, 2008. [doi]

- Reinforcement Learning Based Algorithms for Average Cost Markov Decision Processes. deds, 17(1):23-52, 2007. [doi]
- Adaptive Newton-based multivariate smoothed functional algorithms for simulation optimization. tomacs, 18(1), 2007. [doi]
- Discrete parameter simulation optimization algorithms with applications to admission control with dependent service times. cdc 2007: 2986-2991 [doi]
- Link route pricing for enhanced QoS. cdc 2007: 1504-1509 [doi]