Early Detection of User Exits from Clickstream Data: A Markov Modulated Marked Point Process Model

Most users leave e-commerce websites with no purchase. Hence, it is important for website owners to detect users at risk of exiting and intervene early (e. g., adapting website content or offering price promotions). Prior approaches make widespread use of clickstream data; however, state-of-the-art algorithms only model the sequence of web pages visited and not the time spent on them. In this paper, we develop a novel Markov modulated marked point process (M3PP) model for detecting users at risk of exiting with no purchase from clickstream data. It accommodates clickstream data in a holistic manner: our proposed M3PP models both the sequence of pages visited and the temporal dynamics between them, i. e., the time spent on pages. This is achieved by a continuous-time marked point process. Different from previous Markovian clickstream models, our M3PP is the first model in which the continuous nature of time is considered. The marked point process is modulated by a continuous-time Markov process in order to account for different latent shopping phases. As a secondary contribution, we suggest a risk assessment framework. Rather than predicting future page visits, we compute a user’s risk of exiting with no purchase. For this purpose, we build upon sequential hypothesis testing in order to suggest a risk score for user exits. Our computational experiments draw upon real-world clickstream data provided by a large online retailer. Based on this, we find that state-of-the-art algorithms are consistently outperformed by our M3PP model in terms of both AUROC (+ 6.24 percentage points) and so-called time of early warning (+ 12.93 %). Accordingly, our M3PP model allows for timely detections of user exits and thus provides sufficient time for e-commerce website owners to trigger dynamic online interventions.


INTRODUCTION
In 2018, over 97 % of global e-commerce website users exited the website with no purchase [38]. Because of this, there exists considerable potential for online retailers to increase revenue by improving their conversion rate. Accordingly, online retailers demand solutions that help target users with online interventions in order to prevent them from exiting.
Detecting exiting users from clickstream data is a challenging undertaking. First, clickstreams are highly variable and, therefore, difficult to predict [36]. Second, the objective in practice is not only to detect users just before their exits but to detect users at risk of exiting at an early stage of their session. This is important in order to provide online retailers with sufficient time for interventions. Such interventions can have various forms, for instance, presenting marketing content, dynamically adapting website design, or offering price promotions [14,24], with the goal of steering users that otherwise would have exited with no purchase towards conversion (e. g., turning a user into a buyer).
The effectiveness of interventions depends on the accuracy with which users at risk of exiting with no purchase are identified [11]. In order to make such predictions, different modeling approaches have been developed. Markov models are widely applied to predicting future pages in clickstream data [22,34]. Hidden Markov models have become popular as they further accommodate latent shopping phases (e. g., "browsing-oriented" phase vs. "deliberation-oriented" phase), based on which interventions are suggested [11,28]. However, the aforementioned approaches are discrete-time models: they assume that a clickstream process evolves in unit time-steps. As such, they are unable to model the time spent on a page (TSP), and thus an important source of heterogeneity in clickstream data is missed.
As our primary contribution, we develop a novel Markov modulated marked point process (M3PP) model with the objective of detecting users that exit with no purchase from clickstream data. Our model addresses the above-mentioned research gap: it models both the sequence of pages visited and the TSP. Thereby, our model adheres to prior literature suggesting that TSP is highly relevant for predicting future user actions [9,15,18]. Accommodating the TSP, which is continuous in time, requires a continuous-time process. For this reason, our model builds upon a continuous-time marked point process: here the marks refer to the visited pages and the points to the TSP. The marked point process is modulated by a Markov jump process, which is a continuous-time extension of the discrete-time Markov chain. This allows the user's clickstream to be driven by the evolution of the user's latent shopping phases, the existence of which has been theoretically and empirically verified [26,27,29,31].
As our secondary contribution, we suggest a new risk assessment framework: it quantifies a user's risk of exiting with no purchase based on sequential hypothesis testing [40]. This is in contrast to previous work, which focuses on predicting the next m pages [36]. However, predicting the next m pages has limitations: there does not exist a particular exit page, since user exits can occur on any page. Hence, predicting the next m pages might miss user exits. In addition to common performance metrics, we further evaluate the model using the time of early warning, which focuses on how early a user at risk of exiting with no purchase is detected. Early detection is crucial in order to provide enough time for steering users appropriately.
Results: We conducted computational experiments based on a real-world clickstream dataset collected during 2019. The dataset was provided by Digitec Galaxus, the largest online retailer in Switzerland offering more than a million products. The results show that our M3PP model consistently outperforms state-of-the-art algorithms. Our M3PP model detects exiting users more accurately and earlier, that is, it improves the AUROC (+6.24 percentage points) and the so-called time of early warning (+12.93 %). On the basis of this, e-commerce website owners can target users at risk of exiting with no purchase at an earlier stage of their sessions in order to convert them into buyers.
Contributions: In summary, our main contributions are the following:
(1) M3PP model: We propose a new M3PP model that models both the sequence of individual pages visited and the time spent on pages. Formally, this task requires a tailored model: instead of assuming that the clickstream is a discrete-time process with unit time-steps, our setting demands a continuous-time process. Accordingly, our model captures a user's complete clickstream. Hence, when detecting users at risk of exiting with no purchase, it achieves superior performance. Code is available at https://github.com/tobhatt/M3PP.
(2) Risk assessment framework: We suggest a new risk assessment framework for detecting users at risk of exiting with no purchase. We provide a risk score that computes a user's probability of eventually exiting with no purchase in real-time. Based on our framework, the model performance is evaluated by standard metrics (ROC) and a metric called time of early warning, which focuses on early detection of users at risk. Early detection is crucial in order to provide sufficient time for triggering successful interventions. Note that our framework can be built on top of any clickstream model and thus ensures widespread applicability.

RELATED WORK
Several approaches have been proposed for modeling clickstream data [cf. 25, for an overview]. Particularly popular have been approaches designed for handling sequential data: deep learning and Markov models. The former builds upon, e. g., recurrent neural networks [e. g., 21,35,37,39] and serves as one of our baselines. The latter, Markov models, are the focus of this paper and are reviewed in the following.
Markov models: Most of the previous work on modeling clickstream data describes the sequence of pages visited using a discrete-time Markov chain. These models are applied to capture users' clickstreams where each state of the model corresponds to a page type (e. g., home, account, checkout) [4,13,23,34,36,42]. Based on this, a user's next m visited pages are predicted. However, the family of discrete-time Markov chains assumes that the process evolves in unit time-steps. Hence, the heterogeneity of the time spent on pages is ignored.
Relational Markov models are an extension of Markov models where states can be of different types [3]. Higher-order Markov chains relax the memoryless property of Markov chains and can improve predictive performance on clickstream data [5,10,22]. Variable-length Markov chains extend higher-order Markov chains by allowing variable-length history [6]. Although higher-order Markov chains and variable-length Markov chains are usually more accurate for predicting users' clickstreams, they come at computational costs due to their exponentially large state space. Moreover, the increase in the number of states may even result in worse predictive performance and can substantially restrict their usefulness for applications requiring fast predictions, such as inferring a user's risk of exiting with no purchase in real time [10].
Hidden Markov models: Hidden Markov models (HMMs) describe the user's latent shopping phases using a discrete-time Markov chain, with the pages visited being the emissions dependent on the current shopping phase [11,28]. HMMs build upon research according to which users' sessions undergo different latent shopping phases. These shopping phases represent a user's current goals or state of mind, e. g., "goal-directed" and "exploratory" search [20,26], "flow" and "non-flow" phases [17], or "browsing-oriented" and "deliberation-oriented" phases [28]. The predictive performance of HMMs suggests that the clickstream may reflect a user's goals, which could be helpful to predict future movements on a website. However, they fail to holistically model both the sequence of pages visited and the TSP.
Marked point processes: In addition to Markov models, we review marked point processes, which have become popular due to their ability to capture both the timing of events and the type of the event. Marked point processes were successfully used to model check-in data [12,32] and patients' clinical states in a medical context [1,2,19]. However, they have not been tailored to clickstream data yet.
Research gap: The family of discrete-time Markov chains cannot capture both the sequence of pages visited and the time spent on pages, since it assumes that the process evolves by unit time-steps. Hence, this requires a family of models that can deal with processes that evolve in continuous time. In addition, previous work has not addressed early detection of users at risk of exiting with no purchase.
Usually, websites do not have pages dedicated to exiting; user exits can happen on any page. Hence, predicting a user's next m pages does not allow for predicting a user exit.

CLICKSTREAM DATA
In this section, we describe the typical structure of clickstream data. A clickstream dataset comprises a set of sessions; each session is a sequence of pages that have been visited by a user together with a timestamp when it was visited. Note that sessions can have variable lengths (i. e., number of pages visited).
We denote a clickstream dataset by $\mathcal{D} = \{S^d\}_{d=1}^{D}$, which comprises $D$ sessions $S^d = \{(p^d_m, t^d_m)\}_{m=1}^{M_d}$, with $p^d_m \in \mathcal{P}$ being the m-th page visited at time $t^d_m$, and $\mathcal{P}$ being the set of available pages on the website. The total number of pages observed during this session is denoted by $M_d$ and the duration of the session is denoted by $T^d$. The time between two successive page visits is the TSP, i. e., $t^d_m - t^d_{m-1}$. The session starts when the first page is visited and is concluded by the outcome of the session. The outcome of the session is either a purchase or an exit with no purchase, which we aim to predict.
Usually, only the sequence of pages visited is considered, i. e., $\{p^d_m\}_{m=1}^{M_d}$. In Fig. 1, we illustrate an example of a user's sequence of pages visited. We can see from the figure that any temporal information on when the user visited a page and on the TSP is lost, since the time when a page is visited is discarded. Most existing modeling approaches have not been tailored to adequately capture such information. To do so, we include the irregularly spaced time instances at which the pages are visited, i. e., $\{t^d_m\}_{m=1}^{M_d}$. However, this requires moving from the perspective that time evolves in unit time-steps, as in Fig. 1, to a perspective in which time evolves continuously. In Fig. 2, we depict the continuous-time perspective of a user's session. This perspective enables us to include the irregularly spaced time instances at which pages are visited. We can see from the figure that the user spends more time on some pages than on others.
However, adopting a continuous-time perspective is not possible in previous work, since this requires a continuous-time model. As a remedy, we introduce our M3PP model in the following.
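The session structure described in this section can be illustrated with a minimal sketch. The class name `Session`, its fields, and the example values are hypothetical and only serve to show how the TSP follows from the visit times; they are not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Session:
    """One user session: visited page types and their visit times (in seconds)."""
    pages: List[str]
    times: List[float]
    outcome: int  # 0 = purchase, 1 = exit with no purchase

    def time_spent_on_pages(self) -> List[float]:
        """TSP between successive visits: t_m - t_{m-1}.

        The TSP of the last page is not included, because it would require
        the end time of the session, which is not stored here.
        """
        return [b - a for a, b in zip(self.times, self.times[1:])]


# Illustrative session with made-up values.
session = Session(
    pages=["Home", "Overview", "Product", "Product"],
    times=[0.0, 12.5, 20.0, 95.0],
    outcome=1,
)
tsp = session.time_spent_on_pages()
```

A model that only sees `session.pages` cannot distinguish this session from one in which all pages were visited in quick succession, which is the information loss discussed above.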

THE PROPOSED MARKOV-MODULATED MARKED POINT PROCESS MODEL
Now we present our Markov modulated marked point process, M3PP, for detecting users at risk of exiting with no purchase based on clickstream data $\{S^d\}_{d=1}^{D}$. The model has two components: (i) a Markov jump process modeling the latent shopping phases, and (ii) a marked point process modeling the clickstream, both of which are specified in the following.
For ease of reading, we omit the superscript d in the subsequent sections.

Modeling the Latent Shopping Phases
We model each user's session as being driven by an underlying latent process that represents different latent shopping phases over time. These latent shopping phases cannot be observed directly; however, their presence has been validated empirically [26,27,29,31] and, particularly, manifests in clickstream data [28]. We use a Markov jump process $X(t)$ to model a user's latent shopping phases. A Markov jump process is a continuous-time extension of a discrete-time Markov chain. Its realization is a piecewise-constant function transitioning between $N$ phases. Formally, a realization of a Markov jump process is given by $X(t) = \sum_{n=1}^{K} X_n \, \mathbb{1}\{\tau_n \le t < \tau_{n+1}\}$ for all $t \in [0, T]$, where every new phase $X_n$ belongs to a finite space $\mathcal{X} = \{1, 2, \ldots, N\}$ and lasts from a jump time $\tau_n$ until another jump time $\tau_{n+1}$. During a session, a total number of $K$ phases are realized, which is a random variable itself. Note that $t = 0$ corresponds to the beginning of a session, whereas $T$ corresponds to the end of the session (i. e., the time when the user either purchases or exits with no purchase). For the phase $X_n$, we define the waiting time $\Delta\tau_n \triangleq \tau_{n+1} - \tau_n$, with $\tau_1 = 0$. The waiting time $\Delta\tau_n$ is drawn from an exponential distribution, i. e., $\Delta\tau_n \mid X_n = i \sim \mathrm{Exp}(\gamma_i)$, whose rate parameter, $\gamma_i$, depends on the current phase $X_n = i$ for $i \in \mathcal{X}$.
The transitions among phases are governed by a Markov transition matrix $A = \{a_{ij}\}_{i,j \in \mathcal{X}}$ with $a_{ij} = P(X_{n+1} = j \mid X_n = i)$, where self-transitions are eliminated, i. e., $a_{ii} = 0$ for all $i \in \mathcal{X}$, since these are captured by the waiting times. The initial phase distribution is given by $\pi = \{\pi_i\}_{i=1}^{N}$, where $\pi_i = P(X(0) = i)$, and describes the probability of starting a session in phase $i \in \mathcal{X}$. Fig. 3 depicts an exemplary realization of a Markov jump process. A user's latent shopping phase $X(t)$ manifests in a user's clickstream in two ways: (i) It modulates the distribution of the TSP, for instance, users in a "browsing-oriented" phase might spend more time on pages than in another phase. (ii) It modulates the distribution of the sequence of pages visited, for instance, users in a "deliberation-oriented" phase might be more likely to visit product pages than in another phase. We capture these two effects via a marked point process in the next section.
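To make the generative mechanism of the latent process concrete, the following is a minimal simulation sketch of a Markov jump process with exponential waiting times, a zero-diagonal transition matrix, and an initial phase distribution. The function name `simulate_jump_process` and all parameter values are hypothetical, and phases are indexed from 0 rather than from 1 for convenience.

```python
import random
from typing import List, Tuple


def simulate_jump_process(
    pi: List[float],          # initial phase distribution
    A: List[List[float]],     # transition matrix with zero diagonal
    gamma: List[float],       # exponential rate of the waiting time per phase
    T: float,                 # session length
    rng: random.Random,
) -> List[Tuple[float, int]]:
    """Return [(jump_time, phase), ...] describing a piecewise-constant path on [0, T)."""
    phases = list(range(len(pi)))
    t = 0.0
    phase = rng.choices(phases, weights=pi)[0]
    path = [(t, phase)]
    while True:
        t += rng.expovariate(gamma[phase])  # waiting time ~ Exp(gamma_i)
        if t >= T:
            return path
        # Zero diagonal in A means the process always moves to a different phase.
        phase = rng.choices(phases, weights=A[phase])[0]
        path.append((t, phase))


# Two phases with made-up rates; a fixed seed keeps the run reproducible.
path = simulate_jump_process(
    pi=[0.5, 0.5],
    A=[[0.0, 1.0], [1.0, 0.0]],
    gamma=[0.1, 0.05],
    T=120.0,
    rng=random.Random(0),
)
```

Each entry of `path` marks the start of a new phase, so the phase at any time t is the phase of the last entry whose jump time does not exceed t.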

Modeling the Clickstream
We model a user's clickstream using a marked point process and allow the parameters of the marked point process to depend on the latent shopping phase X (t).
A marked point process is a continuous-time process where the points are the time instances when pages are visited and the marks are the pages themselves, i. e., $\{(p_m, t_m)\}_{m=1}^{M}$. The density $f$ of a marked point process is given by $f(\{(p_m, t_m)\}_{m=1}^{M}) = \prod_{m=1}^{M} f(t_m \mid H_{t_{m-1}}) \, f(p_m \mid H_{t_{m-1}})$, where the history $H_{t_{m-1}}$ denotes the clickstream up to time $t_{m-1}$.
Owing to the simplicity of this form, one can design a variety of parameterizations depending on the application; joint models can be adapted in a straightforward manner. In the following sections, we derive parameterizations for (i) $f(t_m \mid H_{t_{m-1}})$, i. e., the conditional density of the time instances when pages are visited, and (ii) $f(p_m \mid H_{t_{m-1}})$, i. e., the conditional density of the pages visited.
The conditional density of the time instances when pages are visited is given by $f(t_m \mid H_{t_{m-1}}) = \lambda^*(t_m) \exp\left(-\int_{t_{m-1}}^{t_m} \lambda^*(s) \, ds\right)$ (5), where $\lambda^*(t) = \lambda(t \mid H_{t^-})$ is the intensity function conditioned on the history up to, but not including, time $t$. It describes the probability of the occurrence of a new page visit within a small window $[t, t + dt)$ [33]. Note that in Eq. 5, modeling the time when the next page is visited, $t_m$, also determines the time between two successive page visits, $t_m - t_{m-1}$, which is precisely the TSP. Formally, a point process is specified by the conditional intensity function $\lambda^*(t)$, which is given by $\lambda^*(t) \, dt = P(\text{page visit in } [t, t + dt) \mid H_{t^-})$. The functional forms of the conditional intensity function $\lambda^*(t)$ are often designed to describe the phenomena of interest [8]. In particular, we model $\{t_m\}_{m=1}^{M}$ as a one-dimensional Hawkes process [16] whose conditional intensity function is modulated by the latent shopping phase $X(t)$ and defined as $\lambda^*_i(t) = \mu_i + \alpha_i \sum_{m : \, \tau \le t_m < t} \phi_i(t, t_m)$ (7) for all $i \in \mathcal{X}$, where $\mu_i \ge 0$ is the phase-dependent baseline intensity, $\alpha_i$ weights the contribution of past page visits, $\tau < t$ is the time of the most recent jump in the latent shopping phase process $X(t)$, and $\phi_i(t, t_m) \ge 0$ is the kernel capturing temporal dynamics between pages. The sum over the kernel terms makes the conditional intensity dependent on the history from the most recent jump in the latent shopping phase process. We choose an exponential kernel $\phi_i(t, t_m) = e^{-\beta_i (t - t_m)}$ such that the temporal dependence between pages decays over time. For $\beta_i = \infty$ or $\alpha_i = 0$, we recover a modulated Poisson process as a special case [32].
A distinctive feature of the Hawkes process is that the occurrence of each page visit increases the conditional intensity by a certain amount and, hence, the probability of another page visit. In Fig. 4, we illustrate the conditional intensity function of a Hawkes process. We can see that the close succession of page visits increases the conditional intensity function and, therefore, the probability of another page visit. We choose a Hawkes process due to this "exciting effect", since the display of information (e. g., advertisement) is likely to stimulate subsequent page visits [41]. In other words, prior exposure to a page can positively affect the probability of subsequent page visits. In the light of this argument, the modulated Hawkes process appears to be a suitable model for $\{t_m\}_{m=1}^{M}$ as it captures both the time-varying intensity and the "exciting effects" among page visits.
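The "exciting effect" can be sketched numerically. The snippet below assumes that, within a phase, the conditional intensity takes the form of a baseline mu plus alpha times a sum of exponential kernels exp(-beta * (t - t_m)) over the page visits since the most recent phase jump. The function name `hawkes_intensity`, the inclusion of a visit that coincides with the jump time, and all parameter values are assumptions made for illustration.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    visit_times: Sequence[float],
    mu: float,
    alpha: float,
    beta: float,
    last_jump: float = 0.0,
) -> float:
    """Intensity mu + alpha * sum exp(-beta * (t - t_m)) over visits in [last_jump, t)."""
    excitation = sum(
        math.exp(-beta * (t - t_m))
        for t_m in visit_times
        if last_jump <= t_m < t
    )
    return mu + alpha * excitation


visits = [0.0, 1.0, 1.5]
# Long after the last visit, the intensity has decayed back towards the baseline.
calm = hawkes_intensity(10.0, visits, mu=0.2, alpha=0.8, beta=1.0)
# Right after a burst of visits, the intensity is elevated.
busy = hawkes_intensity(1.6, visits, mu=0.2, alpha=0.8, beta=1.0)
# With alpha = 0 the excitation vanishes and only the baseline remains.
poisson = hawkes_intensity(1.6, visits, mu=0.2, alpha=0.0, beta=1.0)
```

Setting `alpha=0.0` reproduces the special case of a constant intensity within a phase, which corresponds to the modulated Poisson process mentioned above.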

Mark Process for the Sequence of Pages. We capture the conditional density $f(p_m \mid H_{t_{m-1}})$ in the marked point process by a mark process, which describes the sequence of pages visited by a user. We build upon previous work on clickstream modeling that extensively studied suitable models for the sequence of pages [e. g., 36]. We describe the probability of choosing the next page depending on the current page, i. e., $f(p_m \mid H_{t_{m-1}}) = P(p_m \mid p_{m-1})$. In particular, the sequence of pages visited, $\{p_m\}_{m=1}^{M}$, is modeled using a discrete-time Markov chain given by $P(p_m = k \mid p_{m-1} = j, X(t) = i) = q^i_{jk}$, where $Q_i = \{q^i_{jk}\}_{jk} \in \mathbb{R}^{|\mathcal{P}| \times |\mathcal{P}|}$ describes the probability of visiting the k-th page conditioned on being on the j-th page and the latent shopping phase $X(t) = i$. In addition, the probability of the first page is given by the initial distribution $o_i$. We call the above discrete-time Markov chain a switching Markov chain due to its dependence on the current latent shopping phase $X(t) = i$ and in order to clearly distinguish it from the Markov jump process that describes the latent shopping phases. Our M3PP model is then given by the combination of the point process and the mark process, both modulated by the latent shopping phase process $X(t)$. It describes a user's clickstream in a holistic manner. Fig. 6 presents the entire model for a user's session, where both the conditional intensity and the distribution of the sequence of pages visited depend on the latent shopping phase.
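As an illustration of the switching Markov chain, the sketch below evaluates the log-probability of a page sequence when the latent phase at each page visit is assumed to be known. The three page types, the two phases, and all probabilities are made up; in the model itself the phases are latent and must be marginalized out.

```python
import math
from typing import Dict, List

# Hypothetical parameters for two latent phases and three page types.
PAGES = ["Home", "Product", "Checkout"]
INITIAL: Dict[int, List[float]] = {
    0: [0.7, 0.2, 0.1],
    1: [0.2, 0.5, 0.3],
}
TRANSITION: Dict[int, List[List[float]]] = {
    0: [[0.5, 0.4, 0.1], [0.3, 0.6, 0.1], [0.4, 0.4, 0.2]],
    1: [[0.1, 0.6, 0.3], [0.1, 0.4, 0.5], [0.1, 0.2, 0.7]],
}


def page_log_prob(pages: List[str], phases: List[int]) -> float:
    """Log-probability of a page sequence given the phase active at each visit."""
    idx = [PAGES.index(p) for p in pages]
    log_p = math.log(INITIAL[phases[0]][idx[0]])
    for m in range(1, len(idx)):
        q = TRANSITION[phases[m]]  # page transition matrix of the current phase
        log_p += math.log(q[idx[m - 1]][idx[m]])
    return log_p


# The user starts in phase 0 and switches to phase 1 before the third page.
lp = page_log_prob(["Home", "Product", "Checkout"], phases=[0, 0, 1])
```

In this example, the move from Product to Checkout is scored with the matrix of phase 1, where it is far more likely (0.5) than under phase 0 (0.1).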

Estimating the Model Parameters
In this section, we describe the estimation procedure for the M3PP model parameters using the clickstream data $\{S^d\}_{d=1}^{D}$. In particular, we state the likelihood of the complete M3PP model, which can subsequently be used for any likelihood-based estimation procedure. We provide the code of our model at https://github.com/tobhatt/M3PP. For every latent phase $i \in \mathcal{X}$, we denote the Hawkes process parameters as $\Lambda_i = (\mu_i, \alpha_i, \beta_i)$ and the switching Markov chain parameters as $M_i = (o_i, Q_i)$. The set of model parameters is given by $\Omega = \{A, \{\pi_i\}_{i=1}^{N}, \{\gamma_i\}_{i=1}^{N}, \{\Lambda_i\}_{i=1}^{N}, \{M_i\}_{i=1}^{N}\}$, where $A$ is the Markov transition matrix of the Markov jump process $X(t)$ and $\{\pi_i\}_{i=1}^{N}$ its initial phase distribution. Estimating the model parameters is a daunting task due to the user's shopping phase $X(t)$ being latent. Usually, the jump times among the latent phases, $\{\tau^d_n\}_{n=1}^{K+1}$, cannot be observed, since transitions can take place at any point in time.
However, in our specific setting, the users themselves make the decision when to visit another page. Hence, we assume that changes in the latent shopping phases are associated with the user taking direct action and visiting another page. Therefore, the times when the user visits another page, i. e., $\{t_m\}_{m=1}^{M}$, are the times at which the latent shopping phase can change. Using the above, the likelihood of our model is expressed in terms of forward probabilities. These are computed recursively for our M3PP model; the recursion involves $\lambda^*_i(s)$, the conditional intensity function of the Hawkes process in Eq. 7 depending on the latent shopping phase $i \in \mathcal{X}$. The derivation of the likelihood can be found in Section A.
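The idea of a forward recursion can be sketched generically. The code below is not the recursion of the M3PP model, whose exact form is derived in Section A; it is a textbook forward algorithm in log-space under the simplifying assumptions that the phase can only change at page visits, that phase changes follow a hypothetical discrete transition matrix `log_trans`, and that a user-supplied function `log_emission(m, i)` scores the m-th page visit (page and timing) under phase i.

```python
import math
from typing import Callable, List


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(
    n_visits: int,
    log_pi: List[float],
    log_trans: List[List[float]],
    log_emission: Callable[[int, int], float],
) -> float:
    """Generic forward recursion; returns the log-likelihood of one session."""
    n = len(log_pi)
    # Initialization: start in phase i and emit the first page visit.
    log_alpha = [log_pi[i] + log_emission(0, i) for i in range(n)]
    for m in range(1, n_visits):
        # Sum over the previous phase i, move to phase j, emit visit m.
        log_alpha = [
            log_sum_exp([log_alpha[i] + log_trans[i][j] for i in range(n)])
            + log_emission(m, j)
            for j in range(n)
        ]
    return log_sum_exp(log_alpha)


# Toy check with made-up numbers: every visit has density 0.5 in every phase.
log_pi = [math.log(0.6), math.log(0.4)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.2), math.log(0.8)]]
ll = forward_log_likelihood(3, log_pi, log_trans, lambda m, i: math.log(0.5))
```

Because the emission density does not depend on the phase in this toy check, the latent phases marginalize out and the likelihood reduces to 0.5 cubed, which makes the recursion easy to verify by hand.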

RISK ASSESSMENT FRAMEWORK
We now describe our risk assessment framework that computes a user's risk of exiting with no purchase. For this, we introduce a real-time risk score based on sequential hypothesis testing [40]. This risk score can be used to detect users at risk whenever it exceeds a predefined threshold and, thereby, determines the so-called time of early warning.

Assessing a User's Exiting Risk
We denote the risk score at time t as R(t), which is the probability that a user exits with no purchase given the observed clickstream up to time t.
More formally, the risk score is confronted with two hypotheses: (i) the null hypothesis $H_0$, which corresponds to the hypothesis that the user will ultimately purchase something, and (ii) the alternative hypothesis $H_1$, which corresponds to the hypothesis that the user will exit with no purchase. The risk score aims to test the null hypothesis as more clickstream data from a user becomes available. Hence, the risk score is a sequential hypothesis test [40], where the true hypothesis is observed at the end of the session through the outcome of the session (i. e., purchase or exit). We denote the session outcome by the variable $l^d \in \{0, 1\}$, where $l^d = 0$ if the null hypothesis holds true, and $l^d = 1$ if the alternative hypothesis holds true, i. e.,
$l^d = \begin{cases} 0, & \text{if } H_0 \text{ (i. e., user purchases)}, \\ 1, & \text{if } H_1 \text{ (i. e., user exits with no purchase)}. \end{cases}$ (12)
Hence, following [40], we view the user's risk score as the test statistic of a sequential hypothesis test. That is, the user's risk score $R(t)$ at time $t$ is the posterior probability of hypothesis $H_1$ given the observations up to time $t$, i. e., $R(t) \triangleq P(H_1 \mid \{p^d_m, t^d_m \le t\})$. Using Bayes' rule, the risk score can be rewritten as
$R(t) = \frac{P(\{p^d_m, t^d_m \le t\} \mid H_1) \, P(H_1)}{P(\{p^d_m, t^d_m \le t\} \mid H_0) \, P(H_0) + P(\{p^d_m, t^d_m \le t\} \mid H_1) \, P(H_1)}$, (13)
where $P(H_0)$ and $P(H_1)$ are the prior probabilities of a user purchasing and exiting with no purchase, respectively (i. e., the rate of purchasing and exiting sessions). This means that the risk score $R(t)$ determines under which hypothesis the observed clickstream is more likely to occur. In practice, a threshold $\rho$ must be set on the computed risk score $R(t)$ in order to detect users at risk of exiting with no purchase.
Whenever the risk score exceeds that threshold, a user is classified as at risk of exiting with no purchase (i. e., the hypothesis H 1 is declared). This is similar to the sequential hypothesis test, where the null hypothesis is rejected whenever the test statistic crosses a predefined threshold [40].
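The posterior computation behind the risk score can be sketched in a few lines. The snippet assumes that two fitted models provide the log-likelihood of the clickstream observed so far under each hypothesis; the function name `risk_score` and the example numbers are hypothetical. Working with log-likelihoods avoids numerical underflow for long sessions.

```python
import math


def risk_score(log_lik_h0: float, log_lik_h1: float, prior_h1: float) -> float:
    """Posterior probability of H1 (exit with no purchase) via Bayes' rule."""
    log_num = log_lik_h1 + math.log(prior_h1)        # likelihood under H1 times P(H1)
    log_alt = log_lik_h0 + math.log(1.0 - prior_h1)  # likelihood under H0 times P(H0)
    # num / (num + alt) rewritten as a logistic function of the log-ratio.
    diff = log_alt - log_num
    if diff > 700:  # exp would overflow; the posterior is numerically zero
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Equal evidence under both hypotheses: the score equals the prior.
r_prior = risk_score(-5.0, -5.0, prior_h1=0.97)
# Clickstream is much more likely under H1: the score approaches one.
r_high = risk_score(-1000.0, -990.0, prior_h1=0.5)
```

The score would be recomputed after every page visit with the updated log-likelihoods and compared against the threshold.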

Time of Early Warning
We define the time at which the risk score first reaches or exceeds the threshold $\rho$ as the stopping time $T_s(\rho)$, i. e., $T_s(\rho) = \inf\{t \in [0, T^d] : R(t) \ge \rho\}$, where we set $\inf \emptyset = \infty$. Then the user's time of early warning (TEW) is the time between a user's stopping time, $T_s(\rho)$, and the end of the session, $T^d$, i. e., $T^d - T_s(\rho)$. Fig. 7 depicts an example of the evolution of a computed risk score during a user's session. The threshold $\rho$ can be varied in order to manipulate the time of early warning and control alarm fatigue. The smaller $\rho$, the earlier it is exceeded by the risk score. This yields an early stopping time, $T_s(\rho)$, and, therefore, a longer time period until the end of the session, $T^d$. However, this also results in a sensitive risk assessment with many false alarms. On the contrary, the larger $\rho$, the later it is exceeded by the risk score, which yields a late stopping time, $T_s(\rho)$, and, therefore, a shorter time period until the end of the session, $T^d$. This yields an insensitive risk assessment, which misses many users that exit with no purchase. Note that users are classified as not at risk of exiting with no purchase (i. e., hypothesis $H_1$ is rejected) if $R(t) < \rho$ for all $t \in [0, T^d]$.
This risk assessment framework can be used for an arbitrary model for a given clickstream. In particular, we are using it for both our M3PP model and all baselines.
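A minimal sketch of the stopping time and the time of early warning is given below. It assumes that the risk score is only evaluated at page-visit times and is supplied as (time, score) pairs, and that a session without an alarm contributes a TEW of zero; both conventions, as well as the function names, are assumptions made for this illustration.

```python
import math
from typing import List, Tuple


def stopping_time(scores: List[Tuple[float, float]], rho: float) -> float:
    """First time at which the risk score reaches rho; math.inf if it never does."""
    for t, r in scores:
        if r >= rho:
            return t
    return math.inf


def time_of_early_warning(
    scores: List[Tuple[float, float]], rho: float, session_end: float
) -> float:
    """Time between the alarm and the end of the session (0.0 if no alarm, by assumption)."""
    ts = stopping_time(scores, rho)
    return session_end - ts if ts < session_end else 0.0


# Made-up risk scores evaluated at four page visits of a 110-second session.
scores = [(0.0, 0.55), (12.5, 0.61), (20.0, 0.83), (95.0, 0.90)]
tew_strict = time_of_early_warning(scores, rho=0.8, session_end=110.0)
tew_loose = time_of_early_warning(scores, rho=0.6, session_end=110.0)
tew_none = time_of_early_warning(scores, rho=0.95, session_end=110.0)
```

Lowering the threshold from 0.8 to 0.6 moves the alarm from the third to the second page visit and thus lengthens the TEW, which mirrors the trade-off described above.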

EXPERIMENTAL SETTING
In this section, we describe our clickstream dataset and the experimental settings. In particular, we introduce a variety of baselines and performance metrics, which we are going to use later for model comparison.

Data Description
We evaluate our M3PP on a real-world clickstream dataset provided by Digitec Galaxus, the largest online retailer in Switzerland offering more than a million products. Our partner company sells a large variety of products, ranging from electronic devices to fashion. Its website also contains content that provides information to prospective customers, for instance, product reviews.
Our clickstream dataset $\mathcal{D} = \{S^d\}_{d=1}^{D}$ displays the typical structure of clickstream data as described in Section 3. As a reminder, the d-th user's session is defined as $S^d = \{(p^d_m, t^d_m)\}_{m=1}^{M_d}$. In addition, we observe the session outcome $l^d \in \{0, 1\}$ for each session. This variable is used in our risk assessment framework, where it encodes whether the null hypothesis or the alternative hypothesis held true in Eq. 12, and is required to compute a user's risk score $R(t)$ from Eq. 13 and to evaluate the predictive performance.
Our real-world clickstream dataset has the following two benefits. (i) Our dataset comprises user sessions collected during 2019 and is thus recent (different from many public datasets). (ii) We obtained the actual session outcomes, $l^d$. The session outcomes cannot be extracted from the clickstreams themselves, as there is no page dedicated to a purchase or exit. Using a proxy by assuming that Checkout or Shopping Cart pages are associated with purchases yields noisy variables $l^d$.
6.1.1 Data Preprocessing. The following preprocessing was applied analogously to previous research on clickstream modeling [13,22,28]. (i) We extracted the pages that were actually visited (i. e., rendering of a page request in the user's browser window). (ii) We encoded pages according to seven distinct page types: Home, Account, Overview, Product, Marketing Content, Community, and Checkout (incl. Shopping Cart). (iii) We excluded sessions from the dataset that originated from web crawlers, as these were not conducted by humans and thus had different objectives. For this purpose, we made use of a built-in tool provided by our partner company. (iv) In addition, many sessions had only a few page visits. We excluded sessions with fewer than three pages visited. (v) We excluded sessions that contain more than 50 page visits. (vi) We considered a session to have ended if the TSP was more than 20 minutes, since some users kept the website open for a long period of time and continued with the session later or on another day. This results in the above dataset containing 2,223 sessions.
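The session-level rules (iv) to (vi) can be sketched as follows. The snippet assumes that each user's raw log is a time-ordered list of (page type, timestamp in seconds) pairs, and that the inactivity split is applied before the length filters; the order of these steps, the function names, and the example log are assumptions and are not taken from the original pipeline.

```python
from typing import List, Tuple

Visit = Tuple[str, float]  # (page type, timestamp in seconds)

MAX_GAP = 20 * 60   # a TSP above 20 minutes ends the session
MIN_PAGES = 3       # sessions with fewer page visits are excluded
MAX_PAGES = 50      # sessions with more page visits are excluded


def split_on_inactivity(visits: List[Visit]) -> List[List[Visit]]:
    """Cut one user's visit log wherever the gap between visits exceeds MAX_GAP."""
    sessions: List[List[Visit]] = []
    for visit in visits:
        if sessions and visit[1] - sessions[-1][-1][1] <= MAX_GAP:
            sessions[-1].append(visit)
        else:
            sessions.append([visit])
    return sessions


def keep_session(session: List[Visit]) -> bool:
    """Apply the length filters: at least 3 and at most 50 page visits."""
    return MIN_PAGES <= len(session) <= MAX_PAGES


# Made-up log: three visits, a long break, then two more visits.
log = [("Home", 0), ("Overview", 30), ("Product", 90),
       ("Home", 5000), ("Product", 5030)]
sessions = [s for s in split_on_inactivity(log) if keep_session(s)]
```

In this example, the break of more than 20 minutes splits the log into two sessions, and the second one is dropped because it contains fewer than three page visits.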

Baselines
We compare our M3PP model with the following state-of-the-art baselines analogous to the models in Section 2: (1) Markov chain (MC): We compare to Markov chains with orders from 1 to 3 [12,22], denoted as MC-1, MC-2, and MC-3. (5) Long short-term memory network (LSTM): Long short-term memory networks have been recently applied to clickstream data [e. g., 21,37]. We train an LSTM, which is state-of-the-art in clickstream modeling, following the architecture of [37]. The LSTM has 3 layers and 60 neurons in each layer. The network is trained for 1,000 epochs using the Adam optimizer and the cross-entropy loss. The input consists of a user's clickstream; the label $l^d \in \{0, 1\}$ represents the target variable. We also compare to a marked temporal point process (MTPP) without latent shopping phases. This resembles the model from [12] but is based on a classic Hawkes process. The model from [32] is not applicable to clickstream data since it assumes continuous marks. For our M3PP model and the baselines (1)-(3), we use the risk score in Eq. 13. For the baselines (4)-(5), the expression in Eq. 13 cannot be computed. However, given the way we trained these two models, we can directly compute the posterior probability of hypothesis $H_1$, i. e., $P(l^d = 1 \mid \{p^d_m, t^d_m \le t\})$. This yields the same risk score, since $l^d = 1$ for $H_1$.

Performance Metrics
We compare our M3PP model to the baselines in Section 6.2 in two ways: (i) the accuracy of whether a user is detected correctly; and, if detected correctly, (ii) the time of early warning.
For (i), we use the receiver operating characteristic (ROC), which is commonly used in machine learning in order to evaluate the performance of classification problems. The ROC is the trade-off between the false positive rate (FPR) and the true positive rate (TPR). The FPR is the rate with which users are classified as at risk of exiting with no purchase although they are going to purchase, i. e.,
$\mathrm{FPR} = \frac{\#\,\text{users with } l^d = 0 \text{ and } R(t) \ge \rho \text{ for some } t < T^d}{\#\,\text{users with } l^d = 0}$.
The TPR is the rate with which users are classified as at risk of exiting with no purchase and they are going to exit with no purchase, i. e.,
$\mathrm{TPR} = \frac{\#\,\text{users with } l^d = 1 \text{ and } R(t) \ge \rho \text{ for some } t < T^d}{\#\,\text{users with } l^d = 1}$. (16)
For (ii), we use the time of early warning as introduced in Section 5.2. In particular, we use the trade-off between the FPR and the TEW, which is tailored to early warnings but closely resembles the familiar ROC curves [7]. We use the expected TEW in order to account for all users in the dataset. The threshold $\rho$ can be varied in order to trade off between the FPR and the TPR, as well as to trade off between the FPR and the TEW. The performance metrics above evaluate the model performance in terms of its accuracy, false alarm rate, and time of early warning in detecting users at risk of exiting with no purchase.
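The two rates can be computed directly from the session outcomes and the risk scores. The sketch below assumes that, for each session, the largest risk score observed before the end of the session has already been extracted, since the condition "R(t) ≥ ρ for some t" is equivalent to this maximum reaching ρ. The function name `rates` and the example values are hypothetical.

```python
from typing import List, Tuple


def rates(max_scores: List[float], outcomes: List[int], rho: float) -> Tuple[float, float]:
    """Return (FPR, TPR) when a session is flagged if its risk score ever reaches rho.

    max_scores[d] is the largest risk score observed before the end of session d;
    outcomes[d] is 0 for a purchase and 1 for an exit with no purchase.
    """
    flagged = [s >= rho for s in max_scores]
    fp = sum(f for f, l in zip(flagged, outcomes) if l == 0)  # flagged purchasers
    tp = sum(f for f, l in zip(flagged, outcomes) if l == 1)  # flagged exiting users
    negatives = outcomes.count(0)
    positives = outcomes.count(1)
    return fp / negatives, tp / positives


# Five made-up sessions: two exits (outcome 1) and three purchases (outcome 0).
max_scores = [0.9, 0.4, 0.7, 0.2, 0.85]
outcomes = [1, 0, 1, 0, 0]
fpr, tpr = rates(max_scores, outcomes, rho=0.6)
```

Sweeping `rho` over a grid of values and recording the resulting pairs traces out the ROC curve.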

Estimation Details
In this section, we provide the detailed steps of how the model was estimated and, based on this, how the risk score was computed.
(1) We split our clickstream dataset into a training set (75 %) and a test set (25 %).
(2) We further divide the training set into two sets: one set on which $H_0$ holds true for all sessions and one set on which $H_1$ holds true for all sessions. For this, the session outcomes $l^d$ are required.
(3) Then, we estimate the model parameters $\Omega$ on both sets from (2) using the likelihood from Section 4.3. We take a Bayesian approach to estimating the parameters via maximum a posteriori estimation. We place weakly informative priors on the unknowns $A$, $\pi$, $\gamma$, $\Lambda$, and $M$. In particular, we place normal priors on the rows of $A$, a Dirichlet prior on $\pi$, and gamma priors on $\gamma$. We further place gamma priors on the Hawkes process parameters $\Lambda$ and Dirichlet priors on the switching Markov chain parameters $(M_i)_{i \in \mathcal{X}}$. This yields one model estimate for $P(\{p^d_m, t^d_m \le t\} \mid H_0)$ and another model estimate for $P(\{p^d_m, t^d_m \le t\} \mid H_1)$.
(4) Afterwards, we determine hyperparameters. Note that the number of latent shopping phases $N$ is the only hyperparameter that has to be selected. Using the Bayesian information criterion, we selected $N = 2$ shopping phases for each of the two model estimates.
(5) Finally, we compute the risk score $R(t)$ given by Eq. 13 on the test set using the two model estimates. The risk score is dynamically updated as more clickstream data from a user becomes available.

RESULTS
In this section, we present the predictive performance on the test sessions and report the parameter estimates of the M3PP model.

Predictive Performance
We compare our M3PP model with our baselines across different performance metrics. In practice, the threshold ρ has to be set in order to classify a user as at risk of exiting with no purchase. Here, we instead vary the threshold ρ and report the area under the curve (AUC) of the performance metrics. The AUC of the ROC curve is denoted by AUROC, and the AUC of the FPR vs. TEW curve is denoted by AUTEW.
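The threshold sweep behind an AUC value can be sketched as follows. The sketch assumes that each session is summarized by the maximum of its risk score before the session ends, so that a session is flagged at threshold ρ exactly when this maximum is at least ρ; it then integrates the resulting curve with the trapezoidal rule. Function names and data are hypothetical, and the paper may compute the areas differently.

```python
def roc_points(scores, labels):
    """Sweep rho over all observed scores and collect (FPR, TPR) points."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody is flagged
    for rho in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= rho)
        fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= rho)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical maximum risk scores and outcomes (1 = exit with no purchase).
scores = [0.9, 0.8, 0.7, 0.4]
labels = [1, 0, 1, 0]
print(trapezoid_auc(roc_points(scores, labels)))
```

The same `trapezoid_auc` helper applies to an FPR vs. TEW curve once the TPR in each point is replaced by the expected TEW at that threshold.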
We report the AUC values for all models under consideration in Table 1. We can see that our M3PP model outperforms the state-of-the-art baselines: it achieves an AUROC of 0.728 and an AUTEW of 5.12. The best performing baseline is the first-order Markov chain (i. e., MC-1) with an AUROC of 0.666 and an AUTEW of 4.53. Hence, our proposed model offers an improvement over the MC-1 of 6.2 percentage points (p.p.) in AUROC and 12.9 % in AUTEW.
As mentioned in Section 2, the increase in the number of states results in inferior predictive performance for MC-2 and MC-3. The LSTM achieves an AUROC of 0.597 and an AUTEW of 4.36. It is outperformed by our proposed M3PP model by 13.2 p.p. in AUROC and 17.4 % in AUTEW.
For better illustration, we now consider an example with a fixed threshold ρ. This differs from the above comparison (where we varied the value of ρ). In practice, the threshold ρ should be set such that the resulting FPR is relatively low in order to avoid many false alarms and, therefore, wasted online interventions (e. g., price promotions). For each model, we choose the threshold ρ such that the resulting FPR is 0.15. Note that the thresholds differ across models, since this is needed in order to achieve the same FPR.
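One simple way to calibrate such a model-specific threshold is sketched below. It assumes that the calibration uses the maximum risk score of each session that ended with a purchase and that the smallest observed score meeting the target FPR is chosen; the paper does not state its exact procedure, so this is an illustration only. The two score lists are hypothetical.

```python
import math


def threshold_for_fpr(neg_scores, target_fpr=0.15):
    """Smallest observed threshold rho whose FPR does not exceed target_fpr.

    neg_scores holds the maximum risk score of each session with a purchase;
    such a session is a false positive at rho if its score is >= rho.
    """
    n = len(neg_scores)
    for rho in sorted(set(neg_scores)):
        fpr = sum(1 for s in neg_scores if s >= rho) / n
        if fpr <= target_fpr:
            return rho
    return math.inf  # no observed score meets the target: never raise an alarm


# Two hypothetical models whose risk scores live on different scales.
model_a = [i / 20 for i in range(1, 21)]       # scores 0.05, 0.10, ..., 1.00
model_b = [10.0 * s for s in model_a]          # same ranking, different scale
print(threshold_for_fpr(model_a), threshold_for_fpr(model_b))
```

Both hypothetical models end up with the same FPR but with thresholds that differ by a factor of ten, which illustrates why a single shared ρ would not give a fair comparison.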
In Fig. 8, we depict the ROC curve, which shows the trade-off between the FPR and the TPR as a function of ρ. The dashed line in Fig. 8 denotes an FPR of 0.15. Fig. 9 depicts the FPR vs. TEW curve, which shows the trade-off between the FPR and the TEW as a function of ρ. Again, the dashed line in Fig. 9 denotes an FPR of 0.15. On average, our M3PP model triggers alarms 25.99 seconds before the end of the session. Compared to this, the MC-1 baseline triggers alarms 14.57 seconds before the session ends. Hence, our proposed model detects users who will exit with no purchase on average 78.4 % earlier. If we measure the TEW in number of pages instead of seconds, then the M3PP detects a user at risk of exiting with no purchase 3 pages before the exit. Compared to this, the MC-1 detects a user at risk of exiting with no purchase 2 pages before the exit. This is considerable compared to the average session, which contains 10 pages. Note that for lower FPR, the performance difference tends to increase in favor of our M3PP model. In summary, our M3PP not only detects more users at risk of exiting with no purchase correctly, but also does so substantially earlier than the best baseline.

Interpretation of Model Parameters
Now we present the estimated parameters for our M3PP model. In particular, we highlight that the marked point process parameters (Λ_i, M_i) for each latent shopping phase i ∈ X differ substantially. In the following, we use these differences to attach an interpretation to the latent shopping phases using the terminology in [28].
As a result of the estimation in Section 6.4, we obtained two sets of estimated parameters: one set for the distribution conditioned on the null hypothesis, H_0, denoted by Ω_0, and one set for the distribution conditioned on the alternative hypothesis, H_1, denoted by Ω_1. We only report the estimated parameters Ω_1 due to space constraints; the latent shopping phases for the estimated parameters Ω_0 have the same interpretation. Recall our model selection: using the Bayesian information criterion, we selected 2 latent shopping phases for the Markov jump process X(t), i. e., X = {1, 2}.
We modeled the temporal dynamics between page visits using a Hawkes process in Eq. 7. The Hawkes process' conditional intensity functions for Ω_1 are given by the parameter estimates λ*_1(t) = 0.68 and λ*_2(t) = 12.04 for each of the two latent shopping phases, where λ*_i(t) is measured in units of 10 seconds.
We interpret phase 1 as a "browsing-oriented" phase in which the user does not have a specific goal in mind. During this phase, the user tends to spend more time on an individual page, since the estimated baseline intensity, μ_1, is much lower than in phase 2 (i. e., 0.68 vs. 12.04). Phase 2 is interpreted as a "deliberation-oriented" phase in which the user has a specific goal in mind. Because of this, the user switches between pages much more quickly and spends less time on an individual page, since the estimated baseline intensity, μ_2, is much higher than in phase 1. The temporal dynamics between pages visited also differ from one phase to another: during a "deliberation-oriented" phase, past page visits influence the conditional intensity function more strongly than during a "browsing-oriented" phase (i. e., larger α_2 and smaller β_2).
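The role of the three parameters can be illustrated with a small Python sketch. It assumes the common exponential-kernel form of a Hawkes intensity, λ*(t) = μ + Σ_{t_j < t} α exp(-β (t - t_j)); Eq. 7 is not reproduced in this section, so this form is an assumption. The baseline intensities 0.68 and 12.04 are the estimates reported above, whereas the values of α and β and the visit times are hypothetical.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with an exponential kernel.

    Each past page visit at time t_j < t adds alpha * exp(-beta * (t - t_j)),
    so a larger alpha or a smaller beta makes past visits matter more.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - tj)) for tj in event_times if tj < t
    )
    return mu + excitation


def baseline_mean_gap(mu):
    """Mean time between events if only the baseline intensity were active."""
    return 1.0 / mu


visits = [0.0, 0.5, 0.8]  # hypothetical page visit times in model time units

# Phase 1 ("browsing-oriented") and phase 2 ("deliberation-oriented");
# alpha and beta are hypothetical, chosen with alpha_2 > alpha_1, beta_2 < beta_1.
phase_1 = hawkes_intensity(1.0, visits, mu=0.68, alpha=0.2, beta=2.0)
phase_2 = hawkes_intensity(1.0, visits, mu=12.04, alpha=1.0, beta=0.5)
print(phase_1, phase_2)
print(baseline_mean_gap(0.68), baseline_mean_gap(12.04))
```

With these placeholder values, the intensity in phase 2 exceeds that of phase 1 both through the higher baseline and through the slower decay of past visits, matching the qualitative interpretation given above.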

Estimated Mark Process Parameters.
We modeled the sequence of pages visited using a switching Markov chain in Eq. 8. As above, the estimated parameters of the switching Markov chain substantially differ from one latent shopping phase to another. For each of the two latent shopping phases, Table 2 depicts the estimated parameters of the switching Markov chain, i. e., Q_1 and Q_2. We find that users in a "browsing-oriented" phase tend to visit Overview and Product pages more often, whereas users in a "deliberation-oriented" phase tend to visit Account pages and Marking Content more often. This can be observed by summing over the columns in Table 2 for each phase. These findings coincide with previous findings on latent shopping phases [11,28].
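The column-sum reading of a transition matrix can be sketched as follows. The matrices below are hypothetical placeholders and not the estimates from Table 2; they only assume that each row holds the transition probabilities out of one page type, so that a column sum aggregates the probability mass flowing into a page type.

```python
def column_sums(matrix):
    """Sum each column of a row-stochastic transition matrix."""
    n_cols = len(matrix[0])
    return [sum(row[j] for row in matrix) for j in range(n_cols)]


pages = ["Overview", "Product", "Account"]

# Hypothetical transition matrices (rows: current page, columns: next page).
q_browsing = [
    [0.50, 0.40, 0.10],
    [0.40, 0.50, 0.10],
    [0.45, 0.45, 0.10],
]
q_deliberation = [
    [0.20, 0.20, 0.60],
    [0.20, 0.30, 0.50],
    [0.10, 0.20, 0.70],
]

for name, q in [("browsing", q_browsing), ("deliberation", q_deliberation)]:
    print(name, dict(zip(pages, column_sums(q))))
```

In this toy example, the Overview and Product columns dominate in the first matrix and the Account column dominates in the second, which is the kind of pattern the column sums are meant to reveal.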
In summary, the above latent shopping phase-specific differences underline the importance of modulating the marked point processes: it allows us to integrate the user's clickstream together with her latent shopping phase while learning the user's clickstream behavior.

CONCLUSION
This paper proposes a novel Markov modulated marked point process model for detecting e-commerce website users at risk of exiting with no purchase using clickstream data. Our proposed model captures not only the sequence of individual page visits by a user but also the temporal dynamics between them, i. e., the time spent on pages. This requires extending previous discrete-time models to continuous time.
As a secondary contribution, we suggested a risk assessment framework that assesses a user's risk of eventually exiting with no purchase using a real-time risk score. It computes a risk score based on sequential hypothesis testing in order to detect users at risk of exiting with no purchase early during their sessions.
Our computational experiments use a real-world clickstream dataset provided by our partner company, the largest online retailer in Switzerland. The results demonstrate that our M3PP model consistently outperforms state-of-the-art models in terms of both AUROC (+6.24 percentage points) and AUTEW (+12.93 %). At the same level of the FPR, our model detects users exiting with no purchase not only more accurately, but also 78.4 % earlier, which provides more time for interventions.
For online retailers, the substantial predictive performance offered by our M3PP model promises a great improvement in the quality and timing of online interventions. By utilizing the proposed model combined with the risk assessment framework, e-commerce website owners can better focus their attention on users at risk of exiting with no purchase, and can intervene in a timely and effective manner via online interventions (e. g., adapting website content or offering price promotions).
Given the high exit rate on e-commerce websites, deploying our M3PP model can help convert users into buyers; we are currently implementing the proposed M3PP model at our partner company.