ESTIMATING PRODUCT SUBSTITUTION RATES USING POINT OF SALE DATA
COBENA Business Analytics and Strategy, Inc.
Sirius
The Sirius system, developed in Python using TensorFlow, implements the two-step method of estimating substitution rates. As a proof of concept, we illustrate the system using simulated point-of-sale (POS) data for a product set of five items named a, b, c, d, and e.
Step 1: Estimating Inventory Status
Our approach divides the time horizon into short time intervals. During each time interval, the inventory status of a product indicates whether or not it is available. The actual available inventory count is not needed. To estimate inventory status, we borrow the approach of Karabati et al. (2009). In each time interval, we check whether any sales of the product occurred, and we then examine each run of consecutive no-sales intervals. Based on the demand rate observed in time intervals when the product was surely in stock (i.e. sales were observed), we compute the probability that a run of no-sales intervals of the observed length would occur. If this probability is lower than a chosen threshold, we decide that the product was out of stock during that run of intervals.
Step 2: Inference of Demand and Substitution Rates
During each time interval, the expected sales (Y) of a product can be expressed as the sum of the demand from customers who intended to buy the product (D) and the substitutions (S) from customers who intended to buy other products that were out of stock. Hence Y = D + S if the product is available; otherwise Y = 0. We formulate this equation as a Bayesian network that estimates the demand rate of each product and the substitution rate between each ordered pair of products. Given the estimated inventory status of each product from step 1, our model observes, for every ordered pair of products (i, j), the increase in sales of product j whenever product i is out of stock.
Figure 1 - Illustration of the simulation model to generate POS data.
Retailer Simulation
Using Monte Carlo simulation, we generated POS data for a retailer. Figure 1 illustrates our simulation model. To keep the discussion manageable, we generated data for five hypothetical products, namely a, b, c, d, and e. We provide hypothetical simulation parameters for each product (see Appendix 1) and substitution rates between products (see Appendix 2). We simulate 200,000 time intervals to produce sample POS data and inventory data. The simulated inventory data is used to test the accuracy and consistency of the inventory status estimates. Table 1 shows the structure of the sample inventory status data.
Table 1 - Sample Inventory Status Data
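The simulation described above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not taken from the report: demand is Poisson, stock is refilled to a fixed capacity every `restock_every` intervals, and a customer who finds the intended product empty tries at most one substitute. All names (`simulate_pos`, `pick_substitute`, `params`) and all parameter values are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def pick_substitute(rng, wanted, subst, stock):
    """Return the product a customer switches to, or None for a lost sale."""
    draw, cumulative = rng.random(), 0.0
    for (source, target), rate in subst.items():
        if source != wanted:
            continue
        cumulative += rate
        if draw < cumulative:
            # Assumption: the customer tries one substitute only.
            return target if stock[target] > 0 else None
    return None  # the customer leaves without buying

def simulate_pos(rates, subst, capacity, restock_every, n_intervals, seed=0):
    """Return one (interval, sales, in_stock) row per time interval."""
    rng = random.Random(seed)
    stock = dict(capacity)
    rows = []
    for t in range(n_intervals):
        if t % restock_every == 0:
            stock = dict(capacity)  # assumed periodic replenishment
        # Inventory status is recorded at the start of the interval.
        in_stock = {p: stock[p] > 0 for p in rates}
        sales = {p: 0 for p in rates}
        for p, lam in rates.items():
            for _ in range(poisson(rng, lam)):
                choice = p if stock[p] > 0 else pick_substitute(rng, p, subst, stock)
                if choice is not None:
                    stock[choice] -= 1
                    sales[choice] += 1
        rows.append((t, sales, in_stock))
    return rows

# Hypothetical two-product example; the report itself uses five products.
params = dict(
    rates={"a": 1.0, "b": 0.5},
    subst={("b", "a"): 0.6},
    capacity={"a": 5, "b": 1},
    restock_every=10,
    n_intervals=50,
)
rows = simulate_pos(**params, seed=1)
```

Each row pairs the sales of an interval with the true inventory status, which is the information needed to check an inventory status estimate against the simulated ground truth.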
Estimating Inventory Status
Using the approach discussed in step 1, we estimate the inventory status of each product. As recommended by Karabati et al. (2009), we use a probability threshold of 0.0001. We achieve an overall accuracy of 98.5%: Table 2 shows that 19.1% of observations are true positives and 79.4% are true negatives. We also compute the accuracy and the Type I and Type II error rates for each product (see Appendix 3).
Table 2 - Overall accuracy of Inventory status estimation
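The run-length test of step 1 and the accuracy figures above can be illustrated with a minimal sketch. It assumes Poisson demand, so that a run of n no-sales intervals has probability exp(-rate * n), and it treats "out of stock" as the positive class; neither assumption is stated in the report. The rate is taken as the mean sales over intervals with at least one sale, which is a simplification. The function names and the sample data are hypothetical.

```python
import math

def estimate_out_of_stock(sales, threshold=1e-4):
    """Flag every interval that belongs to an improbably long no-sales run."""
    flags = [False] * len(sales)
    positive = [s for s in sales if s > 0]
    if not positive:
        return flags
    # Demand rate from intervals in which the product was surely in stock.
    rate = sum(positive) / len(positive)
    p_no_sale = math.exp(-rate)  # Poisson probability of zero sales in one interval
    run_start = None
    for t, s in enumerate(list(sales) + [1]):  # sentinel closes a trailing run
        if s == 0:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            run_length = t - run_start
            if p_no_sale ** run_length < threshold:
                flags[run_start:t] = [True] * run_length
            run_start = None
    return flags

def confusion_shares(estimated, actual):
    """Return the shares of true positives, true negatives, and their sum."""
    n = len(actual)
    tp = sum(e and a for e, a in zip(estimated, actual)) / n
    tn = sum((not e) and (not a) for e, a in zip(estimated, actual)) / n
    return tp, tn, tp + tn

# Hypothetical sales series: one isolated zero and one run of eight zeros.
sales = [2, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1]
actual = [False] * 4 + [True] * 8 + [False] * 2
flags = estimate_out_of_stock(sales)
tp, tn, accuracy = confusion_shares(flags, actual)
```

With these sample values the isolated zero is left as in stock, because a single no-sales interval is not improbable enough, while the run of eight zeros falls below the 0.0001 threshold and is flagged.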
Inference of Demand and Substitution Rates
We implemented step 2 using TensorFlow on Python 3, which allows us to accelerate our computations using GPUs if necessary. We applied variational inference to infer the substitution and demand rates. The simulated POS data and the estimated inventory status served as input data. In contrast to other (non-PGM) approaches, which produce only point estimates, variational inference produces a probability distribution for each of the substitution and demand rates. This provides additional information, such as the level of uncertainty in each estimated rate.
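The relation Y = D + S from step 2 can be made concrete with a small sketch of the expected sales in one interval, given demand rates, substitution rates, and inventory status. This is not the report's TensorFlow model; it is a plain-Python illustration that assumes substitution happens in a single step, from an out-of-stock product i to an available product j. The names `expected_sales`, `demand`, `subst`, and `available`, and the numbers used, are hypothetical.

```python
def expected_sales(demand, subst, available):
    """Expected sales per product: Y = D + S if available, otherwise 0."""
    sales = {}
    for j in demand:
        if not available[j]:
            sales[j] = 0.0
            continue
        # S: demand for out-of-stock products i that spills over to product j.
        spill = sum(
            demand[i] * subst.get((i, j), 0.0)
            for i in demand
            if i != j and not available[i]
        )
        sales[j] = demand[j] + spill
    return sales

# Hypothetical rates: half of the customers of d accept e as a substitute.
demand = {"d": 2.0, "e": 1.0}
subst = {("d", "e"): 0.5}
both_in_stock = expected_sales(demand, subst, {"d": True, "e": True})
d_out_of_stock = expected_sales(demand, subst, {"d": False, "e": True})
```

Comparing the two results shows the signal that the inference exploits: expected sales of e rise from 1.0 to 2.0 when d is out of stock, and the size of that rise identifies the substitution rate from d to e.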
We sample the posterior distributions for the demand rate of each product and the substitution rate between each pair of products. As depicted in Figures 2 and 3, we represent each distribution as a histogram with an orange axial line denoting the mean value, a green axial line denoting the actual value, and gray axial lines denoting the 95% confidence limits.
Figure 2 - Posterior Distributions of Demand Rates
We find that the majority of the posterior distributions, for both demand and substitution rates, are centered close to the actual values. Most of the distributions also have a narrow spread, indicating low uncertainty in the estimated values. We take the mean of the posterior distribution of each variable as our estimate for the demand rates (see Appendix 4) and substitution rates (see Appendix 6).
Figure 3 - Posterior Distributions of Substitution Rates
We also demonstrate that the posterior distributions can be used to generate confidence intervals for demand rates (Appendix 5) and substitution rates (Appendix 7).
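As a rough illustration of how a point estimate and a 95% interval can be read off posterior samples, the sketch below takes the sample mean and the empirical 2.5% and 97.5% order statistics. The function name `summarize_posterior` and the evenly spaced stand-in samples are hypothetical, and the simple index-based percentile rule is an assumption rather than the report's method.

```python
import math

def summarize_posterior(samples, level=0.95):
    """Return (mean, lower, upper) for an equal-tailed interval of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    # Round outward so the interval covers at least the requested level.
    lower = ordered[int(math.floor(tail * (n - 1)))]
    upper = ordered[int(math.ceil((1.0 - tail) * (n - 1)))]
    mean = sum(ordered) / n
    return mean, lower, upper

# Stand-in for posterior draws of one substitution rate (hypothetical values).
samples = [i / 100 for i in range(101)]
mean, lower, upper = summarize_posterior(samples)
```

The mean plays the role of the orange line in the histograms and the two bounds play the role of the gray lines; a narrow gap between the bounds corresponds to a rate that the data pin down well.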
Assessment of Assortment Policy
Figure 4 demonstrates the effect of product substitution in a modified simulation with more frequent stock-outs of product d. Observe that the sales of product e increase when product d is out of stock. Simulating this behavior allows us to estimate the monetary impact of assortment policy changes.
Figure 4 - Demonstration of Effect of Product Substitution
We demonstrate that the estimated substitution rates can be used to simulate the effects of dropping specific products from the retailer’s assortment. We assign per-unit revenues (see Appendix 1) for each product prior to simulation. In our base case simulation with complete product assortment, we obtain output values as summarized in Table 3.
Table 3 - Outputs of Base Case Simulation
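To show how estimated rates and unit revenues combine in such an assessment, here is a simplified expected-value sketch rather than the full simulation. It assumes that every product kept in the assortment is always available and that customers of a dropped product keep arriving at the same rate. The function name `expected_revenue` and all numbers are hypothetical.

```python
def expected_revenue(demand, subst, unit_revenue, dropped=()):
    """Expected revenue per interval when the products in `dropped` are removed."""
    total = 0.0
    for j in demand:
        if j in dropped:
            continue  # a dropped product earns nothing
        # Demand for dropped products that substitutes to product j.
        spill = sum(demand[i] * subst.get((i, j), 0.0) for i in dropped)
        total += (demand[j] + spill) * unit_revenue[j]
    return total

# Hypothetical inputs: half of the customers of b switch to a.
demand = {"a": 2.0, "b": 1.0}
subst = {("b", "a"): 0.5}
unit_revenue = {"a": 10.0, "b": 12.0}

base = expected_revenue(demand, subst, unit_revenue)
without_b = expected_revenue(demand, subst, unit_revenue, dropped=("b",))
loss = base - without_b
```

In this toy case product b contributes 12.0 per interval, yet dropping it costs only 7.0, because part of its demand is served by product a.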
We re-run our original simulation with the same time horizon, but this time dropping product b. That is, customers for product b continue to arrive at the same rate but product b is always unavailable. The capacity originally allocated for product b is not reallocated. Table 4 shows revenue values when product b was dropped without reallocating capacity.
Table 4 - Outputs of Case 1: Drop Product b without Reallocating Capacity
Observe that, while product b accounted for 60,234 of revenue in the original simulation (see Table 3), our model estimates that the retailer loses only 37,125 in total revenue. This is due to the increase in sales of other products, particularly product a, which is the main substitute for product b.
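The two figures above imply how much of product b's revenue is recovered through the other products. Since product b earns nothing once it is dropped, the difference between its base-case revenue and the total loss is the net gain of the remaining products. The variable names below are illustrative.

```python
# Figures quoted in the text above.
revenue_b_base = 60234  # revenue of product b in the base case
total_loss = 37125      # drop in total revenue after removing product b

# Net revenue gained by the remaining products through substitution.
recovered = revenue_b_base - total_loss
recovered_share = recovered / revenue_b_base

print(recovered, round(recovered_share, 3))
```

The remaining products therefore make up 23,109 of the lost revenue, roughly 38% of what product b contributed.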
It is more likely that the retailer would reallocate the capacity previously used by product b. We re-run the same simulation, this time reallocating the capacity solely to product a. We find that the retailer loses only 251 in total revenue, practically matching the total revenue of our base case. This is because the added capacity raises the availability of product a, allowing it to absorb the additional demand from substitution. Table 5 summarizes the results of the simulation.
Table 5 - Outputs of Case 2: Drop Product b and Reallocate Capacity to Product a
It is also possible that other policy sets may produce improved total revenue. We run another simulation reallocating the capacity equally among remaining products. The results are summarized in Table 6.
Table 6 - Outputs of Case 3: Drop Product b and Reallocate Capacity Equally among Remaining Products
We find that total revenue increased by 4,205, which demonstrates that dropping a product can improve total revenue. This is due both to substitution and to the improved availability of the remaining products. Note that this outcome is specific to this scenario; distributing capacity equally may not always be the optimal policy. The optimal policy is highly dependent on the input parameters: demand and substitution rates, unit revenues, ordering costs, unit capacity consumptions, and other constraints.
Our simulations demonstrate the effect of product substitution and the importance of estimating substitution rates. These rates should be considered when modifying assortment policies in order to accurately assess the impact of any changes.
References
- [1] Anupindi, R., Dada, M., & Gupta, S. (1998). Estimation of consumer demand with stock-out based substitution: An application to vending machine products. Marketing Science, 17(4): 406-423.
- [2] Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14: 1303-1347.
- [3] Karabati, S., Tan, B., & Öztürk, Ö. C. (2009). A method for estimating stock-out-based substitution rates by using point-of-sale data. IIE Transactions, 41(5): 408-420. doi:10.1080/07408170802512578
- [4] Kök, A. G., & Fisher, M. L. (2007). Demand estimation and assortment optimization under substitution: Methodology and application. Operations Research, 55(6): 1001-1021.
- [5] Letham, B., Letham, L. M., & Rudin, C. (2016, August). Bayesian inference of arrival rate and substitution behavior from sales transaction data with stockouts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1695-1704).
- [6] Musalem, A., Olivares, M., Bradlow, E. T., Terwiesch, C., & Corsten, D. (2010). Structural estimation of the effect of out-of-stocks. Management Science, 56(7): 1180-1197.
- [7] Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017). Deep probabilistic programming. arXiv preprint arXiv:1701.03757.
- [8] Vulcano, G., Van Ryzin, G., & Ratliff, R. (2012). Estimating primary demand for substitutable products from sales transaction data. Operations Research, 60(2): 313-334.
Appendix
Simulation Product Parameters
Simulated product substitution rates
Accuracy Measures for Inventory Estimation
Comparison of Estimated and Simulated Demand Rates