I find that a better way to approach reinforcement learning is to treat it as an experiment. For instance, the Mujoco physics environments don't come with detailed descriptions, so I need to figure out exactly what every observation means. The clues are the gym environment file and the Mujoco xml file.

There are readily available agent libraries such as tianshou and tensorforce, so one can just play with those agents and learn from their example files.

Maybe instead of focusing on the overarching theory, one can treat reinforcement learning as an experimental science.

Paper Behind Mujoco

  1. Hopper env.observation_space.shape = (11,) env.action_space.shape=(3,)

x_position = self.sim.data.qpos[0]. According to the xml file, there are three joints: thigh, leg, and foot. There are also three actuators, one at each joint, each with a control range of (-1, 1). The observation is the concatenation of qpos[1:] (torso height z, torso angle, and the three joint angles) and the six velocities in qvel clipped to (-10, 10), which gives 5 + 6 = 11 values. x_position itself is excluded by default.
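To check the count of 11, here is a minimal sketch of how the observation could be assembled. It assumes the layout I read from the gym Hopper file (six generalized positions, six generalized velocities, the first position dropped, velocities clipped to ±10). The function name `hopper_observation` and the numbers in `qpos` and `qvel` are made up for illustration and are not taken from a real simulation.

```python
def hopper_observation(qpos, qvel, clip=10.0):
    """Assemble an observation from positions and velocities (assumed layout)."""
    # Drop qpos[0], the x_position, and keep the remaining five positions.
    positions = list(qpos[1:])
    # Clip every velocity into [-clip, clip].
    velocities = [max(-clip, min(clip, v)) for v in qvel]
    return positions + velocities


# Made-up state, for illustration only.
qpos = [0.3, 1.25, 0.0, -0.1, 0.2, 0.05]   # x, z, torso angle, thigh, leg, foot
qvel = [1.0, -0.5, 0.0, 12.0, -15.0, 0.3]  # one velocity per position

obs = hopper_observation(qpos, qvel)
print(len(obs))  # 5 positions + 6 velocities
```

Comparing the output of a sketch like this against what the real environment returns is a quick way to confirm or reject a guess about what each entry means.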

z, angle = self.sim.data.qpos[1:3]. The defaults are healthy_state_range = (-100, 100), healthy_z_range = (0.7, float('inf')), and healthy_angle_range = (-0.2, 0.2).
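The three ranges can be read as a single "is the hopper still healthy" test. The sketch below is my own reconstruction under that assumption, not a copy of the gym code. The name `is_healthy`, the argument `other_state`, and the choice of which state entries are checked against the state range are all hypothetical.

```python
import math

# Default ranges as I read them from the gym Hopper file.
HEALTHY_STATE_RANGE = (-100.0, 100.0)
HEALTHY_Z_RANGE = (0.7, math.inf)
HEALTHY_ANGLE_RANGE = (-0.2, 0.2)


def is_healthy(z, angle, other_state):
    """Return True if height, torso angle, and remaining state are all in range."""
    min_state, max_state = HEALTHY_STATE_RANGE
    min_z, max_z = HEALTHY_Z_RANGE
    min_angle, max_angle = HEALTHY_ANGLE_RANGE

    # Assumption: every remaining state entry must lie strictly inside the range.
    state_ok = all(min_state < s < max_state for s in other_state)
    z_ok = min_z < z < max_z
    angle_ok = min_angle < angle < max_angle
    return state_ok and z_ok and angle_ok


# Made-up values, for illustration only.
print(is_healthy(1.25, 0.0, [0.1, -3.0]))  # upright and within all ranges
print(is_healthy(0.5, 0.0, [0.1, -3.0]))   # torso too low
```

An episode would presumably end once this check fails, which is why a hopper that falls below z = 0.7 or tilts more than 0.2 rad stops collecting reward.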

  2. Walker