Cooperative Multi-Agent Reinforcement Learning Framework for Scalping Trading

Long-term investment in the stock market means buying stocks whose intrinsic value is higher than their current market price. This investment style requires company analysis based on the reports that companies release periodically.

Stock market analysis. Image credit: Sergei Tokmakov via Pixabay, free licence

The so-called scalping strategy takes the opposite approach. It is a day-trading style that specializes in profiting from small price changes by reselling quickly, prioritizing a high volume of small gains. Machine learning researchers are actively exploring how algorithmic methods can be applied in this area too.

Uk Jo, Taehyun Jo, Wanjun Kim, Iljoo Yoon, Dongseok Lee, and Seungho Lee have discussed multi-agent reinforcement learning for scalping trading in their research paper. The research paper is titled “Cooperative Multi-Agent Reinforcement Learning Framework for Scalping Trading” and forms the basis of the following text. 

Importance of the Study

Traders carry out intra-day trading based on buy/sell orders and candle charts. Since this data is readily available, reinforcement learning could capture traders’ intentions and habits from it and use them to maximize investment return. To leverage this understanding of traders, a model that maximizes profit has to be developed.

If a reinforcement learning agent can predict and execute buy/sell decisions with acceptable accuracy, substantial amounts of money can be made from the stock market. 

Research Methodology

For this research, the researchers used data from the Korean stock market covering April to July 2018. The proposed reinforcement learning agent comprises four sub-agents, each with a specific role and a primary reward tied to that role. A secondary reward, based on the performance (return) of the whole agent, is added to the overall reward function. The four sub-agents are:

  • Buy Signal Agent (BSA): BSA predicts when the stock is expected to rise steadily for 2 minutes. 
  • Buy Order Agent (BOA): Once a buy signal is given, the BOA tries to purchase the stock at the lowest possible price. 
  • Sell Signal Agent (SSA): The SSA predicts when the stock is expected to fall for 2 minutes. 
  • Sell Order Agent (SOA): The SOA predicts when the agent could sell the stock at the maximum price.  

Image credit: arXiv:1904.00441 [cs.AI]

In the words of the researchers,

We explore deep Reinforcement Learning (RL) algorithms for scalping trading and knew that there is no appropriate trading gym and agent examples. Thus we propose gym and agent like Open AI gym in finance. Not only that, we introduce new RL framework based on our hybrid algorithm which leverages between supervised learning and RL algorithm and uses meaningful observations such order book and settlement data from experience watching scalpers trading. That is very crucial information for traders behavior to be decided. To feed these data into our model, we use spatio-temporal convolution layer, called Conv3D for order book data and temporal CNN, called Conv1D for settlement data. Those are preprocessed by episode filter we developed. Agent consists of four sub agents divided to clarify their own goal to make best decision. Also, we adopted value and policy based algorithm to our framework. With these features, we could make agent mimic scalpers as much as possible. In many fields, RL algorithm has already begun to transcend human capabilities in many domains. This approach could be a starting point to beat human in the financial stock market, too and be a good reference for anyone who wants to design RL algorithm in real world domain. Finally, we experiment our framework and gave you experiment progress.

Research Result

The reward rose over successive episodes, which is encouraging for the use of reinforcement learning in scalping.

Image credit: arXiv:1904.00441 [cs.AI]


Although the reward increased over time, the researchers note that the transaction cost (tax + fee) in the Korean stock market is higher than in the Chinese and Hong Kong markets. Hence, they expect scalping to produce better results in the Chinese and Hong Kong markets. The researchers have also suggested expanding the proposed model from a short-term recommendation robo-advisor service to a long-term investment robo-advisor service.
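
To see why transaction costs matter so much for scalping, consider the following Python sketch of the net return of a single round trip. The cost model (a proportional fee on both legs and a tax on the sell leg only) and the rates used are hypothetical placeholders, not figures from the paper or from any exchange.

```python
def net_return(gross_return: float, buy_fee: float, sell_fee: float, sell_tax: float) -> float:
    """Net return of one round trip after proportional costs.

    Assumed model: the fee applies to both legs, the tax to the sell leg only.
    """
    paid = 1.0 + buy_fee                                           # cost per unit of purchase price
    received = (1.0 + gross_return) * (1.0 - sell_fee - sell_tax)  # proceeds per unit of purchase price
    return received / paid - 1.0


def break_even_gross(buy_fee: float, sell_fee: float, sell_tax: float) -> float:
    """Smallest gross price move at which the round trip stops losing money."""
    return (1.0 + buy_fee) / (1.0 - sell_fee - sell_tax) - 1.0


# Hypothetical rates, chosen only for illustration.
costs = dict(buy_fee=0.00015, sell_fee=0.00015, sell_tax=0.003)

small_gain = net_return(0.002, **costs)  # a 0.2% price move ends up negative after costs
threshold = break_even_gross(**costs)    # gross move needed just to break even
```

Since scalping targets many small price moves, any move below the break-even threshold turns into a loss, so a market with lower taxes and fees leaves more of those small gains intact.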

Source: Uk Jo, Taehyun Jo, Wanjun Kim, Iljoo Yoon, Dongseok Lee and Seungho Lee’s “Cooperative Multi-Agent Reinforcement Learning Framework for Scalping Trading”