<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://sherstan.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://sherstan.com/" rel="alternate" type="text/html" /><updated>2025-10-30T13:55:19+00:00</updated><id>https://sherstan.com/feed.xml</id><title type="html">Craig Sherstan</title><author><name>Craig Sherstan</name></author><entry><title type="html">Two Tales of Reward Design</title><link href="https://sherstan.com/two-tales-of-reward-design/" rel="alternate" type="text/html" title="Two Tales of Reward Design" /><published>2025-10-05T00:00:00+00:00</published><updated>2025-10-05T00:00:00+00:00</updated><id>https://sherstan.com/two-tales-of-reward-design</id><content type="html" xml:base="https://sherstan.com/two-tales-of-reward-design/"><![CDATA[<p>At the end of August, 2025, I gave a talk at the University of Alberta.
The talk was titled <a href="https://youtu.be/PrsKX5ZWt_4">“Two Tales of Reward Design”</a>.</p>

<p>I talked about two different projects that I’ve worked on at Sony AI.</p>
<ol>
  <li>
    <p><a href="https://www.nature.com/articles/s41586-021-04357-7">GT Sophy</a>:
a reinforcement learning agent that plays Gran Turismo at a superhuman level.</p>
  </li>
  <li>
    <p><a href="https://arxiv.org/abs/2206.13901">Value Function Decomposition for Iterative Design of Reinforcement Learning Agents</a>:
by decomposing the value function into components, we can gain insight into the agent’s learning behavior and use this to improve agent design.</p>
  </li>
</ol>

<p>A guiding theme of the talk is the importance of reward design in reinforcement learning, something that has largely
been ignored in the community.</p>]]></content><author><name>Craig Sherstan</name></author><category term="reinforcement learning" /><category term="reward design" /><category term="value function decomposition" /><summary type="html"><![CDATA[At the end of August, 2025, I gave a talk at the University of Alberta. The talk was titled “Two Tales of Reward Design”.]]></summary></entry><entry><title type="html">First Tokyo AI RL Talk</title><link href="https://sherstan.com/blog/i-recently-gave-a-talk-at-the-first-reinforcement/" rel="alternate" type="text/html" title="First Tokyo AI RL Talk" /><published>2024-11-16T08:39:34+00:00</published><updated>2024-11-16T08:39:34+00:00</updated><id>https://sherstan.com/blog/i-recently-gave-a-talk-at-the-first-reinforcement</id><content type="html" xml:base="https://sherstan.com/blog/i-recently-gave-a-talk-at-the-first-reinforcement/"><![CDATA[<p>I recently gave a talk at the first Reinforcement Learning-focused session of Tokyo AI Talks.<br/><br/>I gave a 10-minute introduction to RL and then talked about GT Sophy, the RL-based agent we trained to outrace the top human racers in Gran Turismo.</p>]]></content><author><name>Craig Sherstan</name></author><category term="Blog" /><category term="reinforcement learning" /><category term="Gran Turismo" /><category term="Tokyo" /><category term="TAI" /><summary type="html"><![CDATA[I recently gave a talk at the first Reinforcement Learning-focused session of Tokyo AI Talks. I gave a 10-minute introduction to RL and then talked about GT Sophy, the RL-based agent we trained to outrace the top human racers in Gran Turismo.]]></summary></entry><entry><title type="html">ALA 2020: Temporally Extended Auxiliary Tasks</title><link href="https://sherstan.com/ala-2020-temporally-extended-auxiliary-tasks/" rel="alternate" type="text/html" title="ALA 2020: Temporally Extended Auxiliary Tasks" 
/><published>2020-04-07T08:16:54+00:00</published><updated>2020-04-07T08:16:54+00:00</updated><id>https://sherstan.com/ala-2020-temporally-extended-auxiliary-tasks</id><content type="html" xml:base="https://sherstan.com/ala-2020-temporally-extended-auxiliary-tasks/"><![CDATA[<p>My paper “Work in Progress: Temporally Extended Auxiliary Tasks” has been accepted to the <a href="https://ala2020.vub.ac.be/">Adaptive and Learning Agents</a> workshop at AAMAS 2020.<br/><br/>This work was done in collaboration with Bilal Kartal, Pablo Hernandez-Leal and Matt Taylor while I was an intern at Borealis AI last summer. It was a pleasure to work with all of them.<br/><br/>The paper focuses on the question: what effect does the prediction timescale of a GVF auxiliary task have on policy learning? If that doesn’t make sense, then hopefully this will help. Auxiliary tasks are additional losses placed on a neural network whose sole purpose is to provide gradients for training a core network (at least that’s how I define them). GVFs (general value functions) are a type of predictor.<br/><br/>In short, we haven’t yet found a clear relationship between the timescale and policy learning. However, we do note that adding the GVF auxiliary tasks allows us to shorten the trajectory length used in the A2C algorithm.</p>]]></content><author><name>Craig Sherstan</name></author><category term="reinforcement learning" /><category term="general value functions" /><category term="auxiliary tasks" /><category term="borealisai" /><summary type="html"><![CDATA[My paper “Work in Progress: Temporally Extended Auxiliary Tasks” has been accepted to the Adaptive and Learning Agents workshop at AAMAS 2020. This work was done in collaboration with Bilal Kartal, Pablo Hernandez-Leal and Matt Taylor while I was an intern at Borealis AI last summer. It was a pleasure to work with all of them. The paper focuses on the question: what effect does the prediction timescale of a GVF auxiliary task have on policy learning? 
If that doesn’t make sense, then hopefully this will help. Auxiliary tasks are additional losses placed on a neural network whose sole purpose is to provide gradients for training a core network (at least that’s how I define them). GVFs (general value functions) are a type of predictor. In short, we haven’t yet found a clear relationship between the timescale and policy learning. However, we do note that adding the GVF auxiliary tasks allows us to shorten the trajectory length used in the A2C algorithm.]]></summary></entry><entry><title type="html">Gamma-Nets: Generalizing Value Estimation over Timescale</title><link href="https://sherstan.com/gamma-nets-generalizing-value-estimation-over/" rel="alternate" type="text/html" title="Gamma-Nets: Generalizing Value Estimation over Timescale" /><published>2019-12-21T00:44:00+00:00</published><updated>2019-12-21T00:44:00+00:00</updated><id>https://sherstan.com/gamma-nets-generalizing-value-estimation-over</id><content type="html" xml:base="https://sherstan.com/gamma-nets-generalizing-value-estimation-over/"><![CDATA[<p>I’m happy to announce that my paper, “Gamma-Nets: Generalizing Value Estimation over Timescale”, has been accepted for oral presentation at AAAI 2020!<br/><br/>You can find the arXiv copy here: <a href="https://arxiv.org/abs/1911.07794">https://arxiv.org/abs/1911.07794</a><br/><br/>This is work that I started as an intern with Cogitai in 2017.</p>]]></content><author><name>Craig Sherstan</name></author><category term="Reinforcement Learning" /><category term="general value functions" /><category term="Cogitai" /><summary type="html"><![CDATA[I’m happy to announce that my paper, “Gamma-Nets: Generalizing Value Estimation over Timescale”, has been accepted for oral presentation at AAAI 2020! You can find the arXiv copy here: https://arxiv.org/abs/1911.07794 This is work that I started as an intern with Cogitai in 2017.]]></summary></entry><entry><title type="html">Candidacy Passed!</title><link 
href="https://sherstan.com/candidacy-passed/" rel="alternate" type="text/html" title="Candidacy Passed!" /><published>2019-03-01T08:38:50+00:00</published><updated>2019-03-01T08:38:50+00:00</updated><id>https://sherstan.com/candidacy-passed</id><content type="html" xml:base="https://sherstan.com/candidacy-passed/"><![CDATA[<p>This news is a bit old, but I successfully passed my PhD candidacy exam back in November, 2018. So all I have left to graduate is to finish my research (which is nearly complete), write my thesis, and defend it.</p><p>On that note, I’m officially on the market for a real adult job. I’m looking for a research scientist position in any of the following areas: robots, human augmentation, neural interfaces, computational neuroscience, drug discovery, genetic editing, and synthetic biology.</p>]]></content><author><name>Craig Sherstan</name></author><summary type="html"><![CDATA[This news is a bit old, but I successfully passed my PhD candidacy exam back in November, 2018. So all I have left to graduate is to finish my research (which is nearly complete), write my thesis, and defend it. On that note, I’m officially on the market for a real adult job. I’m looking for a research scientist position in any of the following areas: robots, human augmentation, neural interfaces, computational neuroscience, drug discovery, genetic editing, and synthetic biology.]]></summary></entry><entry><title type="html">Borealis AI Internship!</title><link href="https://sherstan.com/borealis-ai-internship/" rel="alternate" type="text/html" title="Borealis AI Internship!" /><published>2019-03-01T08:29:43+00:00</published><updated>2019-03-01T08:29:43+00:00</updated><id>https://sherstan.com/borealis-ai-internship</id><content type="html" xml:base="https://sherstan.com/borealis-ai-internship/"><![CDATA[<p>Good news (for me at least)! I’ll be starting an internship with Borealis AI, Edmonton in April. 
I’ll get to work with some great researchers, including <a href="https://www.borealisai.com/en/team/prof-matthew-e-taylor/">Matt Taylor</a>.</p>]]></content><author><name>Craig Sherstan</name></author><category term="BorealisAI" /><category term="internship" /><category term="reinforcement learning" /><summary type="html"><![CDATA[Good news (for me at least)! I’ll be starting an internship with Borealis AI, Edmonton in April. I’ll get to work with some great researchers, including Matt Taylor.]]></summary></entry><entry><title type="html">Generalizing Value Estimation over Timescale</title><link href="https://sherstan.com/generalizing-value-estimation-over-timescale/" rel="alternate" type="text/html" title="Generalizing Value Estimation over Timescale" /><published>2018-08-01T03:33:20+00:00</published><updated>2018-08-01T03:33:20+00:00</updated><id>https://sherstan.com/generalizing-value-estimation-over-timescale</id><content type="html" xml:base="https://sherstan.com/generalizing-value-estimation-over-timescale/"><![CDATA[<a href="https://drive.google.com/file/d/1K3f13VIWWOcSDsDTViiQkaJeyPaSxfvt/view?usp=sharing">Generalizing Value Estimation over Timescale</a><br/><p><b>Sherstan, C., MacGlashan, J., &amp; Pilarski, P. M. (2018). Generalizing Value Estimation over Timescale. <i>FAIM Workshop: Prediction and Generative Modeling in Reinforcement Learning (PGMRL)</i>. Stockholm, Sweden. July 15.</b><br/></p><p>General value functions (GVFs) are an approach
to representing models of an agent’s world as
a collection of predictive questions. A GVF is
defined by: a policy, a prediction target, and a
timescale. Traditionally predictions for a given
timescale must be specified by the engineer and
each timescale learned independently. Here we
present γ-nets, a method for generalizing value
function estimation over timescale, allowing a
given GVF to be trained and queried for any
fixed timescale. The key to our approach is to
use timescale as one of the network inputs. The
prediction target for any fixed timescale is then
available at every timestep and we are free to train
on any number of timescales. We present preliminary
results on a simple test signal.</p>]]></content><author><name>Craig Sherstan</name></author><category term="general value functions" /><category term="Reinforcement Learning" /><category term="ICML2018" /><summary type="html"><![CDATA[Generalizing Value Estimation over Timescale. Sherstan, C., MacGlashan, J., Pilarski, P. M. (2018) Generalizing Value Estimation over Timescale. FAIM Workshop: Prediction and Generative Modeling in Reinforcement Learning (PGMRL). Stockholm, Sweden. July 15. General value functions (GVFs) are an approach to representing models of an agent’s world as a collection of predictive questions. A GVF is defined by: a policy, a prediction target, and a timescale. Traditionally predictions for a given timescale must be specified by the engineer and each timescale learned independently. Here we present γ-nets, a method for generalizing value function estimation over timescale, allowing a given GVF to be trained and queried for any fixed timescale. The key to our approach is to use timescale as one of the network inputs. The prediction target for any fixed timescale is then available at every timestep and we are free to train on any number of timescales. We present preliminary results on a simple test signal.]]></summary></entry><entry><title type="html">Successor Representation Literature</title><link href="https://sherstan.com/srliterature/" rel="alternate" type="text/html" title="Successor Representation Literature" /><published>2018-07-27T00:59:36+00:00</published><updated>2018-07-27T00:59:36+00:00</updated><id>https://sherstan.com/srliterature</id><content type="html" xml:base="https://sherstan.com/srliterature/"><![CDATA[<p>This is a list of literature related to the successor representation. It is not exhaustive and I have not read it all. Right now it is just a list; if I have time I’ll add summaries for those papers I’ve read.</p><p><b>Learning the SR</b></p><ul><li>Dayan, P. (1993). 
Improving Generalization for Temporal Difference Learning: The Successor Representation. <i>Neural Computation</i>, 5(4), 613–624.<br/></li><li>Gehring, C. A. (2015). Approximate Linear Successor Representation. In <i>Reinforcement Learning Decision Making</i>. Retrieved from <a href="http://people.csail.mit.edu/gehring/publications/clement-gehring-rldm-2015.pdf">http://people.csail.mit.edu/gehring/publications/clement-gehring-rldm-2015.pdf</a><br/></li><li>White, L. M. (1996). <i>Temporal Difference Learning: Eligibility Traces and the Successor Representation for Actions</i>. University of Toronto.<br/></li><li>Pitis, S. (2018). Source Traces for Temporal Difference Learning. In <i>AAAI Conference on Artificial Intelligence</i>. New Orleans, Louisiana, USA.<br/></li></ul><p><b>Transfer</b></p><ul><li>Barreto, A., Dabney, W., Munos, R., Hunt, J., Schaul, T., Silver, D., &amp; van Hasselt, H. (2017). Transfer in Reinforcement Learning with Successor Features and Generalised Policy Improvement. In <i>Lifelong Learning: A Reinforcement Learning Approach Workshop @ICML</i>. Sydney, Australia. <br/></li><li>Barreto, A., Munos, R., Schaul, T., &amp; Silver, D. (2016). Successor Features for Transfer in Reinforcement Learning. arXiv: 1606.05312.<br/></li><li>Barreto, A., Borsa, D., Quan, J., Schaul, T., Silver, D., Hessel, M., … Munos, R. (2018). Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement. <i>International Conference on Machine Learning (ICML).</i><br/></li><li>Zhang, J., Springenberg, J. T., Boedecker, J., &amp; Burgard, W. (2017). Deep Reinforcement Learning with Successor Features for Navigation across Similar Environments. In <i>The International Conference on Intelligent Robots and Systems (IROS)</i> (pp. 2371–2378). Vancouver, Canada.<br/></li><li>Zhu, Y., Gordon, D., Kolve, E., Fox, D., Fei-Fei, L., Gupta, A., … Farhadi, A. (2017). Visual Semantic Planning using Deep Successor Representations. 
<i>International Conference on Computer Vision</i>, <i>2</i>(4), 7.<br/></li><li>Sherstan, C., Machado, M. C., &amp; Pilarski, P. M. (2018). Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation. <i>IROS</i>. Madrid, Spain.<br/></li><li>Kulkarni, T. D., Saeedi, A., Gautam, S., &amp; Gershman, S. J. (2016). Deep Successor Reinforcement Learning. arXiv: 1606.02396.<br/></li><li>Lehnert, L., Tellex, S., &amp; Littman, M. L. (2017). Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning. arXiv: 1708.00102.<br/></li><li>Ma, C., Wen, J., &amp; Bengio, Y. (2018). Universal Successor Representations for Transfer Reinforcement Learning. arXiv: 1804.03758.<br/></li></ul><p><b>Exploration</b></p><ul><li>Machado, M. C., Rosenbaum, C., Guo, X., Liu, M., Tesauro, G., &amp; Campbell, M. (2018). Eigenoption Discovery Through The Deep Successor Representation. In <i>International Conference on Learning Representations</i>. Vancouver, Canada.<br/></li></ul><p><b>Neuroscience</b></p><ul><li>Stachenfeld, K. L., Botvinick, M. M., &amp; Gershman, S. J. (2017). The Hippocampus as a Predictive Map. <i>Nature Neuroscience</i>, <i>20</i>, 1643–1653. <br/></li><li>Stachenfeld, K. L., Botvinick, M. M., &amp; Gershman, S. J. (2014). Design Principles of the Hippocampal Cognitive Map. <i>Advances in Neural Information Processing Systems</i>, 1–9.<br/></li><li>Russek, E. M., Momennejad, I., Botvinick, M. M., Gershman, S. J., &amp; Daw, N. D. (2017). Predictive Representations Can Link Model-based Reinforcement Learning to Model-free Mechanisms. <i>PLoS Computational Biology</i>, <i>13</i>(9), 1–42.<br/></li><li>Gershman, S. J., Moore, C. D., Todd, M. T., Norman, K. A., &amp; Sederberg, P. B. (2012). The Successor Representation and Temporal Context. <i>Neural Computation</i>, <i>24</i>(6), 1553–1568. <br/></li><li>Banino, A., Barry, C., Uria, B., Blundell, C., Lillicrap, T., Mirowski, P., … Kumaran, D. (2018). 
Vector-based Navigation using Grid-like Representations in Artificial Agents. <i>Nature</i>, <i>26</i>.<br/></li><li>Ducarouge, A., &amp; Sigaud, O. (2017). The Successor Representation as a Model of Behavioural Flexibility. ?<br/></li><li>Foster, D. J., Morris, R. G. M., &amp; Dayan, P. (2000). A Model of Hippocampally Dependent Navigation, Using the Temporal Difference Learning Rule. <i>Hippocampus</i>, <i>10</i>(1), 1–16.<br/></li><li>Gershman, S. J. (2017). Predicting the past, remembering the future. <i>Current Opinion in Behavioral Sciences</i>, <i>17</i>, 7–13.<br/></li><li>Momennejad, I., Russek, E. M., Cheong, J. H., Botvinick, M. M., Daw, N. D., &amp; Gershman, S. J. (2017). The Successor Representation in Human Reinforcement Learning. <i>Nature Human Behaviour</i>, <i>1</i>(9), 680–692.<br/></li></ul><p><b>Other</b></p><p>There is also a handful of papers that do not explicitly make the connection to the SR but use it nonetheless.</p><ul><li>Yao, H., Szepesvári, C., Sutton, R., Modayil, J., &amp; Bhatnagar, S. (2014). Universal Option Models. <i>Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS)</i>, 1–9.<br/></li></ul><p>There are a number of papers in the field of imitation learning that <i>may</i> be related:</p><ul><li>Syed, U., Bowling, M., &amp; Schapire, R. E. (2008). Apprenticeship Learning Using Linear Programming. ICML.<br/></li><li>Abbeel, P., &amp; Ng, A. Y. (2004). Apprenticeship Learning Via Inverse Reinforcement Learning. ICML.<br/></li><li>Syed, U., &amp; Schapire, R. E. (2007). A Game-Theoretic Approach to Apprenticeship Learning. 
NIPS.<br/></li></ul>]]></content><author><name>Craig Sherstan</name></author><category term="successor representation" /><category term="successor features" /><category term="reinforcement learning" /><category term="planning" /><category term="model-based" /><category term="transfer learning" /><category term="hippocampus" /><summary type="html"><![CDATA[This is a list of literature related to the successor representation. It is not exhaustive and I have not read it all. Right now it is just a list; if I have time I’ll add summaries for those papers I’ve read.]]></summary></entry><entry><title type="html">Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation</title><link href="https://sherstan.com/accelerating-learning-in-constructive-predictive/" rel="alternate" type="text/html" title="Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation" /><published>2018-03-27T23:23:09+00:00</published><updated>2018-03-27T23:23:09+00:00</updated><id>https://sherstan.com/accelerating-learning-in-constructive-predictive</id><content type="html" xml:base="https://sherstan.com/accelerating-learning-in-constructive-predictive/"><![CDATA[<a href="https://arxiv.org/abs/1803.09001">Accelerating Learning in Constructive Predictive Frameworks with the
  Successor Representation</a><br/><p>I’ve got a new paper with Marlos C. Machado and Patrick M. Pilarski on using the successor representation to speed up learning of incrementally added general value functions.</p>]]></content><author><name>Craig Sherstan</name></author><category term="successor representation" /><category term="ai" /><category term="general value functions" /><category term="continual learning" /><category term="Reinforcement Learning" /><summary type="html"><![CDATA[Accelerating Learning in Constructive Predictive Frameworks with the Successor Representation. I’ve got a new paper with Marlos C. Machado and Patrick M. Pilarski on using the successor representation to speed up learning of incrementally added general value functions.]]></summary></entry><entry><title type="html">Directly Estimating the Variance of the lambda-Return Using Temporal-Difference Methods</title><link href="https://sherstan.com/directly-estimating-the-variance-of-the/" rel="alternate" type="text/html" title="Directly Estimating the Variance of the lambda-Return Using Temporal-Difference Methods" /><published>2018-01-26T06:49:31+00:00</published><updated>2018-01-26T06:49:31+00:00</updated><id>https://sherstan.com/directly-estimating-the-variance-of-the</id><content type="html" xml:base="https://sherstan.com/directly-estimating-the-variance-of-the/"><![CDATA[<a href="https://arxiv.org/abs/1801.08287">Directly Estimating the Variance of the lambda-Return Using
  Temporal-Difference Methods</a><br/><p>I’ve got a new paper on arXiv on using TD methods to directly estimate the variance of the lambda-return.</p>]]></content><author><name>Craig Sherstan</name></author><summary type="html"><![CDATA[Directly Estimating the Variance of the lambda-Return Using Temporal-Difference Methods. I’ve got a new paper on arXiv on using TD methods to directly estimate the variance of the lambda-return.]]></summary></entry></feed>