White Paper

Store Inventory Management Optimization



COBENA Business Analytics and Strategy, Inc.

In this brief, we discuss a two-step process for estimating the substitution rates of each item in a product set using point-of-sale (POS) data as input. The first step produces an estimate of the inventory status of each product based on zero-sale interval lengths. In the second step, the problem is formulated as a Bayesian network, which allows the substitution rates of each product to be estimated using Variational Inference. Monte Carlo simulation is used to produce sample POS and inventory data. Accuracy of 98.5% is achieved in each step of the process. Using simulation, we demonstrate the financial impact of modifying the product assortment in light of the estimated substitution rates.


Most studies on estimating substitution rates use inventory status data in addition to point-of-sale (POS) data[1][4][6][8]. However, many retailers do not track inventory at a sufficient granularity to complement POS data. Furthermore, there may be changes in inventory that are not reflected in POS data, such as perished goods and losses due to pilferage. It is therefore often necessary to estimate substitution rates using only POS data. Karabati et al. addressed this problem using a Quadratically Constrained Quadratic Programming (QCQP) approach[3]. This approach, however, is not scalable, since the numbers of decision variables and constraints in the proposed model grow exponentially with the number of products and time intervals.

Probabilistic graphical models (PGMs) scale better to large inputs. We formulated the problem as a Bayesian network, a particular kind of PGM. This formulation provides several advantages over a QCQP formulation:

  • Highly scalable algorithms such as Stochastic Variational Inference[2] can be used for inference.
  • Uncertainty in substitution rate estimates can be measured through confidence intervals.
  • Domain-specific knowledge about substitution rates between particular products or product categories can be injected through prior distributions.
  • Bayesian networks can be implemented using TensorFlow, allowing GPU-accelerated inference.


  1. The Bayesian network formulation allows estimation of substitution rates and provides metrics to assess accuracy and confidence.
  2. Using simulated data, we demonstrate the possibility of increasing total revenue, or minimizing losses, when assortment and capacity are changed.


SUBSTITUTION RATE is the probability that a customer who intends to buy product 1 would buy product 2 if product 1 is not available.

POSTERIOR DISTRIBUTION is the probability distribution of a parameter being estimated after the observed data have been taken into account. Estimates and measures of uncertainty for the parameter are derived from it.

MONTE CARLO SIMULATION is a method of simulating a scenario by producing random numbers from specified distributions and resolving their interactions.


The Sirius system, developed in Python using TensorFlow, is an implementation of the two-step method of estimating substitution rates. As a proof of concept, we illustrate the system using simulated point-of-sale (POS) data for a product set with 5 items named a, b, c, d, and e.

Step 1: Estimating Inventory Status

Our approach divides the time horizon into short time intervals. During each time interval, the inventory status of a product indicates whether or not it is available. The actual available inventory count is not needed. To estimate inventory status, we borrow the approach of Karabati et al. (2009). In each time interval, we examine if any sales of the product occurred.

The procedure examines consecutive no-sales time intervals. Based on the observed demand rate of the product in time intervals when it was surely in-stock (i.e. sales were observed), we compute the probability that a series of no-sales time intervals of the observed length would occur. If this probability is lower than a certain threshold value, we decide that the product was out-of-stock during this series of time intervals.
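The run-length test described above can be sketched as follows. This is a minimal illustration, not the Sirius implementation: it assumes Poisson demand (so the probability of n consecutive no-sale intervals is exp(-rate * n)) and uses a crude rate estimate taken from the intervals with sales. The function name `estimate_out_of_stock` and the sample data are hypothetical.

```python
import math


def estimate_out_of_stock(sales, threshold=1e-4):
    """Return one boolean per interval: True means 'estimated out-of-stock'.

    Assumption: demand per interval is Poisson, so a run of n no-sale
    intervals has probability exp(-rate * n) while the product is in stock.
    """
    positive = [s for s in sales if s > 0]
    if not positive:
        return [False] * len(sales)
    # Crude rate estimate from intervals with sales (assumption: this
    # overstates the true rate because zero-sale in-stock intervals are
    # excluded).
    rate = sum(positive) / len(positive)

    status = [False] * len(sales)
    i = 0
    while i < len(sales):
        if sales[i] > 0:
            i += 1
            continue
        # Find the end of the current run of no-sale intervals.
        j = i
        while j < len(sales) and sales[j] == 0:
            j += 1
        run_length = j - i
        # Flag the whole run if it is too unlikely under in-stock demand.
        if math.exp(-rate * run_length) < threshold:
            for k in range(i, j):
                status[k] = True
        i = j
    return status


# Hypothetical sales counts for one product over 12 intervals.
sample_sales = [2, 1, 3, 0, 2, 0, 0, 0, 0, 0, 0, 1]
sample_status = estimate_out_of_stock(sample_sales)
```

In this sample, the single no-sale interval is treated as ordinary demand variation, while the run of six no-sale intervals falls below the threshold and is flagged as a stock-out.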

Step 2: Inference of Demand and Substitution Rates

During each time interval, the expected sales (Y) of a product can be expressed as the sum of demand of customers (D) who intended to buy the product plus substitutions (S) from customers who intended to buy other products which were out-of-stock.  Hence Y = D + S if the product is available; otherwise Y = 0.

We formulate this equation as a Bayesian network which estimates the demand rate of each product and the substitution rates between products. Given the estimated inventory status of each product from step 1, our model observes increases in the sales of product j whenever product i is out-of-stock, for every ordered pair of products (i, j).
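The relation Y = D + S can be made concrete with a small sketch. It assumes single-level substitution (a customer whose first choice is out-of-stock tries at most one substitute) and uses hypothetical names and rates; it illustrates the expected-sales relation only, not the Bayesian network itself.

```python
def expected_sales(demand, subst, available):
    """Expected sales per product for one time interval.

    demand[i]     demand rate of product i (hypothetical values)
    subst[i][j]   probability that a customer of i buys j when i is out
    available[i]  True if product i is in stock during the interval
    """
    result = {}
    for j in demand:
        if not available[j]:
            result[j] = 0.0  # Y = 0 when the product is unavailable
            continue
        # S: demand redirected to j from products that are out-of-stock.
        substituted = sum(
            demand[i] * subst.get(i, {}).get(j, 0.0)
            for i in demand
            if i != j and not available[i]
        )
        result[j] = demand[j] + substituted  # Y = D + S
    return result


# Hypothetical three-product example with product b out-of-stock.
rates = {"a": 2.0, "b": 1.0, "c": 0.5}
substitution = {"b": {"a": 0.5, "c": 0.25}}
in_stock = {"a": True, "b": False, "c": True}
sales = expected_sales(rates, substitution, in_stock)
```

Here half of product b's demand moves to product a and a quarter moves to product c; the remaining quarter is lost.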

Figure 1 – Illustration of the simulation model to generate POS data.

Retailer Simulation

Using Monte Carlo simulation, we generated POS data for a retailer. Figure 1 illustrates our simulation model. To keep the discussion manageable, we generated data for five hypothetical products, namely a, b, c, d, and e.

We provide hypothetical simulation parameters for each product (see Appendix 1) and substitution rates between products (see Appendix 2). We simulate 200,000 time intervals to produce sample POS data and inventory data. The simulated inventory data is used to test the accuracy and consistency of the inventory status estimates. Table 1 shows the structure of the sample inventory status data.
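A simulation of this kind can be sketched in a few lines. The version below is a simplified stand-in for the model in Figure 1, under assumptions that are ours rather than the paper's: at most one customer per product per interval (a Bernoulli arrival), periodic restocking to full capacity, and a single substitution attempt. All parameter values and function names are hypothetical and are not the values in Appendices 1 and 2.

```python
import random


def pick_substitute(rng, rates):
    """Draw a substitute from `rates`, or None if the sale is lost."""
    u = rng.random()
    cumulative = 0.0
    for product, rate in rates.items():
        cumulative += rate
        if u < cumulative:
            return product
    return None


def simulate_pos(arrival_prob, subst, capacity, restock_every, n_intervals, seed=0):
    """Return (sales_log, status_log), one dict per time interval."""
    rng = random.Random(seed)
    stock = dict(capacity)
    sales_log, status_log = [], []
    for t in range(n_intervals):
        if t % restock_every == 0:
            stock = dict(capacity)  # assumption: restock to full capacity
        # Inventory status at the start of the interval.
        status_log.append({p: stock[p] > 0 for p in stock})
        sales = {p: 0 for p in stock}
        for product, prob in arrival_prob.items():
            if rng.random() >= prob:
                continue  # no customer for this product in this interval
            choice = product
            if stock[product] == 0:
                choice = pick_substitute(rng, subst.get(product, {}))
            if choice is not None and stock[choice] > 0:
                stock[choice] -= 1
                sales[choice] += 1
        sales_log.append(sales)
    return sales_log, status_log


# Hypothetical parameters for three products.
ARRIVAL_PROB = {"a": 0.30, "b": 0.20, "c": 0.10}
SUBST = {"a": {"b": 0.4}, "b": {"a": 0.5, "c": 0.2}, "c": {"a": 0.3}}
CAPACITY = {"a": 10, "b": 4, "c": 5}

sales_log, status_log = simulate_pos(ARRIVAL_PROB, SUBST, CAPACITY, 30, 2000, seed=1)
```

The sales log plays the role of the POS data, and the status log is the ground truth against which estimated inventory status can be compared.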

Table 1 - Sample Inventory Status Data

Estimating Inventory Status     

Using the approach discussed in step 1, we estimate the inventory status of each product. As recommended by Karabati et al. (2009), we use a probability threshold of 0.0001. We achieve an overall accuracy of 98.5%: from Table 2, 19.1% of the estimates are true positives and 79.4% are true negatives. We also compute the accuracy as well as the type 1 and type 2 errors for each product (see Appendix 3).
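The accuracy measures above can be computed by comparing the simulated and estimated status series interval by interval. The sketch below assumes that "positive" means out-of-stock, which the source does not state explicitly; the function name and the sample data are hypothetical.

```python
def inventory_accuracy(actual_out, estimated_out):
    """Shares of true/false positives and negatives, plus overall accuracy.

    Assumption: 'positive' means out-of-stock. A false positive is then a
    type 1 error and a false negative is a type 2 error.
    """
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, estimated in zip(actual_out, estimated_out):
        if actual and estimated:
            counts["tp"] += 1
        elif not actual and not estimated:
            counts["tn"] += 1
        elif estimated:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    n = len(actual_out)
    shares = {key: value / n for key, value in counts.items()}
    shares["accuracy"] = (counts["tp"] + counts["tn"]) / n
    return shares


# Hypothetical status series for ten intervals (True = out-of-stock).
actual = [True, True, False, False, False, False, False, False, False, False]
estimated = [True, False, False, False, False, False, False, False, True, False]
report = inventory_accuracy(actual, estimated)
```

Overall accuracy is the sum of the true-positive and true-negative shares, which is how the 98.5% figure follows from the 19.1% and 79.4% shares reported in Table 2.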

Table 2 - Overall accuracy of Inventory status estimation

Inference of Demand and Substitution Rates

We implemented step 2 using TensorFlow on Python 3, which allows us to accelerate our computations using GPUs if necessary. We applied Variational Inference to infer the substitution and demand rates. The simulated POS data and the estimated inventory status served as input data. In contrast to other (non-PGM) approaches, which produce only point estimates, Variational Inference produces a probability distribution for each of the substitution and demand rates. This provides more information, such as the level of uncertainty in the estimated rates.

We sample the posterior distributions for the demand rate of each product and the substitution rates between each pair of products. As depicted in Figures 2 and 3, we represent each distribution as a histogram with an orange axial line denoting the mean value, a green axial line denoting the actual value, and gray axial lines denoting the 95% confidence limits.

Figure 2 - Posterior Distributions of Demand Rates

We find that the majority of the posterior distributions, for both demand and substitution rates, yield estimated values close to the actual values. The majority of the distributions also have a narrow spread, indicating certainty in the estimated values. We take the mean of the posterior distribution of each variable as our estimate for the demand rates (see Appendix 4) and substitution rates (see Appendix 6).

Figure 3 - Posterior Distributions of Substitution Rates

We also demonstrate that the posterior distributions can be used to generate confidence intervals for demand rates (Appendix 5) and substitution rates (Appendix 7).
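As an illustration of how a point estimate and a 95% interval can be read off posterior samples, consider the sketch below. The samples are stand-in draws from a Gamma distribution with mean 0.5, not output of the Sirius system, and the percentile method used here is one simple choice among several; the function name is hypothetical.

```python
import random
import statistics


def summarize_posterior(samples, level=0.95):
    """Return (mean, lower, upper) using empirical percentiles."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * n)]
    upper = ordered[min(n - 1, int((1.0 - tail) * n))]
    return statistics.fmean(ordered), lower, upper


# Stand-in posterior draws for one substitution rate (assumption:
# Gamma with shape 50 and scale 0.01, so the mean is 0.5).
rng = random.Random(0)
posterior_samples = [rng.gammavariate(50.0, 0.01) for _ in range(10000)]
mean, lower, upper = summarize_posterior(posterior_samples)
```

The mean serves as the estimate, and the distance between the lower and upper limits indicates how certain that estimate is.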

Assessment of Assortment Policy

Figure 4 demonstrates the effect of product substitution in a modified simulation with more frequent stock-outs of product d. Observe that the sales of product e increase when product d is out-of-stock. Simulating this behavior allows us to estimate the monetary impact of assortment policy changes.

Figure 4 - Demonstration of Effect of Product Substitution

We demonstrate that the estimated substitution rates can be used to simulate the effects of dropping specific products from the retailer’s assortment. We assign per-unit revenues (see Appendix 1) for each product prior to simulation. In our base case simulation with complete product assortment, we obtain output values as summarized in Table 3.

Table 3 - Outputs of Base Case Simulation

We re-run our original simulation with the same time horizon, but this time dropping product b. That is, customers for product b continue to arrive at the same rate but product b is always unavailable. The capacity originally allocated for product b is not reallocated. Table 4 shows revenue values when product b was dropped without reallocating capacity.

Table 4 - Outputs of Case 1: Drop Product b without Reallocating Capacity

Observe that, while product b accounted for 60,234 of revenue in the original simulation (see Table 3), our model estimates that the retailer loses only 37,125 in total revenue. This is due to the increase in sales of other products, particularly product a, which is the main substitute for product b.

It is more likely that the retailer would reallocate the capacity previously used by product b. We re-run the same simulation, this time reallocating the capacity solely to product a. We find that the retailer loses only 251 in total revenue, which is practically equal to the total revenue of our base case. This is because the increased availability of product a allows it to absorb the additional demand from substitutions. Table 5 provides a summary of the results of the simulation.

Table 5 - Outputs of Case 2: Drop Product b and Reallocate Capacity to Product a

It is also possible that other policy sets may produce improved total revenue. We run another simulation reallocating the capacity equally among remaining products.  The results are summarized in Table 6.

Table 6 - Outputs of Case 3: Drop Product b and Reallocate Capacity Equally among Remaining Products

We find that total revenue increased by 4,205, which demonstrates the possibility of improving total revenue. This is due both to substitution and to the improved availability of the remaining products. Note that this outcome is specific to this scenario; distributing capacity equally may not always be the optimal policy. The optimal policy is highly dependent on input parameters: demand and substitution rates, unit revenues, ordering costs, unit capacity consumptions, and other constraints.
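The three cases can be compared by asking what share of product b's base-case revenue is made up by the remaining products. The sketch below uses only the figures quoted in the text (60,234 for product b's revenue and the three net changes in total revenue); the function and variable names are hypothetical.

```python
REVENUE_B_BASE = 60_234  # revenue of product b in the base case

# Net change in total revenue relative to the base case, from the text.
NET_CHANGE = {
    "case 1: no reallocation": -37_125,
    "case 2: capacity to product a": -251,
    "case 3: capacity split equally": 4_205,
}


def recovered_share(net_change, dropped_revenue):
    """Share of the dropped product's revenue made up by other products."""
    return (dropped_revenue + net_change) / dropped_revenue


recovered = {
    case: recovered_share(change, REVENUE_B_BASE)
    for case, change in NET_CHANGE.items()
}
```

Without reallocation, roughly 38% of product b's revenue is recovered; reallocating the capacity to product a recovers nearly all of it, and an equal split recovers more than was lost.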

Our simulations demonstrate the effect of product substitution and the importance of estimating substitution rates. These rates should be considered when modifying assortment policies in order to accurately assess the impact of any changes.


  • [1] Anupindi, R., Dada, M., & Gupta, S. (1998). Estimation of consumer demand with stock-out based substitution: An application to vending machine products. Marketing Science, 17(4): 406-423.
  • [2] Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic Variational Inference. Journal of Machine Learning Research, 14: 1303-1347.
  • [3] Karabati, S., Tan, B., & Öztürk, Ö. C. (2009). A method for estimating stock-out-based substitution rates by using point-of-sale data. IIE Transactions, 41(5): 408-420. doi:10.1080/07408170802512578
  • [4] Kök, A. G., & Fisher, M. L. (2007). Demand estimation and assortment optimization under substitution: Methodology and application. Operations Research, 55(6): 1001-1021.
  • [5] Letham, B., Letham, L. M., & Rudin, C. (2016, August). Bayesian inference of arrival rate and substitution behavior from sales transaction data with stockouts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1695-1704).
  • [6] Musalem, A., Olivares, M., Bradlow, E. T., Terwiesch, C., & Corsten, D. (2010). Structural estimation of the effect of out-of-stocks. Management Science, 56(7): 1180-1197.
  • [7] Tran, D., Hoffman, M. D., Saurous, R. A., Brevdo, E., Murphy, K., & Blei, D. M. (2017). Deep probabilistic programming. arXiv preprint arXiv:1701.03757.
  • [8] Vulcano, G., Van Ryzin, G., & Ratliff, R. (2012). Estimating primary demand for substitutable products from sales transaction data. Operations Research, 60(2): 313-334.


Simulation Product Parameters

Simulated product substitution rates 

Accuracy Measures for Inventory Estimation

Comparison of Estimated and Simulated Demand Rates

