Obtaining consistent time series from Google Trends

Google Trends data are a popular data source for research, but raw data are frequency-inconsistent: daily data fail to capture long-run trends. This issue has gone unnoticed in the literature. In addition, sampling noise can be substantial. We develop a procedure (available in an R-package) that solves both issues at once. We apply this procedure to construct long-run, frequency-consistent daily economic indices for three German-speaking countries. The resulting indices are significantly correlated with traditional leading economic indicators while being available in real time. We discuss potential applications across disciplines and spanning well beyond business cycle analysis.

latest information about the economy. A recent example is the outbreak of the COVID-19 pandemic, during which macroeconomic conditions sometimes changed daily.
Although GSV data is in principle available on a daily basis for any country or region of the world, much of the prior research has focused on large countries and used monthly rather than weekly or daily GSV data. This may explain why so far the literature has failed to identify an important limitation of GSV data: the raw data at the daily level is inconsistent across different time frequencies (e.g., daily vs. monthly data). As a result, daily data fail to capture long-run trends. Researchers therefore face a trade-off between using high-frequency daily series and time-consistent series, where search volumes are comparable at different and distant points in time. Business cycle analysis and forecasting models, however, typically require data spanning more than a decade.
A second limitation of the publicly available GSV data arises from random sampling by Google. Depending on the underlying "population size," this introduces substantial sampling variation, affecting the results. While some of the prior research based on GSV data has addressed this issue in practice (e.g., D'Amuri & Marcucci, 2017; Matsa et al., 2017; McLaren & Shanbhogue, 2011; Narita & Yin, 2018; Vosen & Schmidt, 2012), we document the magnitude of the problem (as in Carrière-Swallow & Labbé, 2013), and how it varies with population size.
The two limitations arise from a combination of the following three factors: (i) For privacy reasons, Google only provides an index of search volumes, rather than the actual number of searches. Scaling of the index varies with the chosen time window: within each window, the index lies in the range between zero and 100. Only long time windows allow comparing magnitudes of specific events over time or studying long-run trends. (ii) GSV default to weekly (monthly) data for time spans longer than 9 months (5.25 years). In combination, (i) and (ii) imply that it is not possible to directly extract long-run daily GSV data from Google Trends that is consistent with the long-run trend captured by the monthly data. (iii) For a chosen time frame, GSV data are based on a random sub-sample drawn by Google. We show that in small countries or sub-national regions, the sampling variation of returned GSV series turns out to be substantial. Even in large administrative units, there can be substantial sampling noise for keywords with limited search volume. 1 For example, in an area with one million inhabitants, the standard deviation of the daily results for "recession" can be as large as its mean value.
To overcome these issues, we present a two-step procedure to construct frequency-consistent daily, weekly, and monthly series of GSV over a long period, which at the same time reduces sampling noise. 2 In a first step, we address the sampling variation by drawing multiple random samples of GSV data and averaging the series. In a second step, we combine the information from monthly and weekly series into a single daily series that is consistent with the weekly and monthly series. For this purpose, we apply Chow and Lin's (1971) disaggregation routine twice. First, to disaggregate the monthly series of GSV to weekly data and, second, to disaggregate these weekly results to daily data. This ensures consistency across the three frequencies. We implement these two steps in the R-package trendecon that interacts directly with the Google Trends API. This allows researchers to obtain frequency-consistent, long-run daily GSV data with just one line of open-source code.
To our knowledge, we are the first to raise and solve the issue of frequency inconsistency in GSV data. Our procedure allows us to build daily economic sentiment indices (DESI) for the (mainly) German-speaking countries Germany, Austria, and Switzerland over the period 2007-2020, including the beginning of the COVID-19 pandemic. These indices are based on GSV for several keywords, including "economic crisis" ("Wirtschaftskrise") and "unemployed" ("arbeitslos"). To aggregate the series into a DESI, we extract the common signal in the keyword series. 3 We show that our indices are significantly correlated with other leading indices in all three countries. The correlations for the much smaller economies Austria and Switzerland are of similar magnitude as for Germany, where GSV data suffer significantly less from sampling variation. We conclude that twice adjusted GSV data can be useful to monitor economic activity at a high frequency, even in smaller countries or regions. As our procedure to construct consistent long-run daily GSV series allows for improved use of GSV data, our contribution can be applied in a variety of other settings and disciplines to study the development of public sentiment as well as public interest in certain topics or products in real time as well as ex-post. Economic forecasting is just one of many possible areas of application.

| RELATED RESEARCH
The procedure we suggest contributes to improving the quality of GSV data, which is being used in a growing body of empirical research. Given that most Internet searches are performed by private households, the focus of economic studies usually lies on target variables related to the labor market or consumption. Vosen and Schmidt (2011), for example, show that Google search data outperforms survey-based indicators in forecasting US private consumption. Building on this, Woo and Owen (2019) further document the usefulness of incorporating online search data in forecasting models for US private consumption. Several studies have explored using Google Trends for forecasting unemployment, including Smith (2016) for the United Kingdom, González-Fernández and González-Velasco (2018) for Spain, and Maas (2019) for the United States. Ferrara and Simoni (2019) and Götz and Knetsch (2019) show that Google search data can also be useful for nowcasting quarterly gross domestic product (GDP) of Germany and the Euro Area, respectively. Narita and Yin (2018) show that GSV data significantly correlate with macroeconomic variables such as real GDP, inflation, and capital flows in low-income developing countries. Götz and Knetsch (2019) and Maas (2019) provide comprehensive overviews of the literature using GSV.
To our knowledge, the previous literature has not been aware of the frequency inconsistency of GSV data. This may be because studies have either used long-run monthly GSV series, which correctly reflect trends, or short-run daily series, neglecting long-run trends.
The sampling variation inherent to GSV data, the second issue we document and solve with our procedure, has been addressed in at least some of the previous literature, including Carrière-Swallow and Labbé (2013), D'Amuri and Marcucci (2017), Matsa et al. (2017), McLaren and Shanbhogue (2011), Narita and Yin (2018), and Vosen and Schmidt (2012). These studies address the problem using averages of several samples of monthly data drawn on different days. That procedure, however, does not allow extracting long weekly or daily GSV data, and hence one of the great advantages Google Trends offers is lost.
Since we use GSV data to construct economic sentiment indicators for a set of German-speaking countries, our paper further relates to alternative indices used to monitor the state of the economy. The most prominent example is Baker et al. (2016), who have developed a new index of economic policy uncertainty (EPU) based on newspaper coverage frequency for the United States and a number of other countries. Castelnuovo and Tran (2017) use monthly GSV data for the United States and Australia to construct an uncertainty index based on uncertainty-related keywords such as "bankruptcy," "stock market," "economic reforms," and "debt stabilization." Da et al. (2015) construct a Financial and Economic Attitudes Revealed by Search (FEARS) index as a new measure of investor sentiment. Kostopoulos et al. (2020) follow their approach and construct a FEARS index for Germany. 4 Closely related to the macroeconomic context of our paper, Lourenço and Rua (2021) use a latent variable of several daily series within a factor model framework to develop a daily economic indicator for Portugal.

| DAILY GSV DATA
The aim is to obtain frequency-consistent long-run daily GSV series for a keyword search in a given region or administrative unit. This section describes the two potential sources of errors inherent to raw GSV data, and explains in detail the step-wise solution we introduce with this paper.

| Large sampling variation across random draws
Queries of GSV are based on a random sub-sample of all searches. 5 This introduces variation between the samples drawn for different time periods. This problem has been recognized and addressed by some researchers, while others using GSV data do not mention sampling at all.
We suspect that sampling variation was more severe during the first decade of the 2000s, and that it then also affected larger countries (e.g., Germany; see Vosen & Schmidt, 2012). Although the use of Google as a search engine has increased dramatically over the past decade, sampling still is a cause for concern. The problem worsens for less popular keywords, for smaller geographic regions, and for higher data frequency. Indeed, for most US states or smaller countries like Switzerland, results from search queries can differ substantially between draws. Figure 1 illustrates the problem. The thin, pale lines show 24 draws of daily query results for the term "recession" in the state of Massachusetts. The first draw runs from January 1 to April 4, 2020, the second from January 2 to April 4, 2020, and so forth. The last draw runs from January 24 to April 4, 2020. The resulting series show strikingly different results, with the average of all 24 draws being summarized in the bold dark line. This illustrates how the usage of the raw Google Trends data, for example in a forecast exercise, may lead to misleading results.
EICHENAUER ET AL.
Figure 2 provides further systematic evidence of the sampling problem. For all 50 US states and the (mainly) German-speaking countries Austria, Germany, and Switzerland, we query monthly, weekly, and daily GSV time series for "recession" ("Rezession" in German) for overlapping periods between January 1, 2007, and April 4, 2020. Figure 2a shows how the mean of the standard deviation of the draws decreases as the population of a US state or a country increases. For these monthly data, visual inspection suggests that individual monthly queries are particularly noisy if the population is smaller than 10 million inhabitants. Figure 2b shows the corresponding mean of the standard deviation for queries at daily frequency. Again, the series show more randomness for smaller regions.
Furthermore, the comparison between Panels (a) and (b) shows that for any given region, the instability problem is more severe in daily than in monthly data. For weekly data (not shown), the standard deviations are almost identical in magnitude as for daily data.
FIGURE 2 Average of the within-month and within-day sampling standard deviations, respectively, of the Google search volumes (GSV) query results for "recession" by population size across US states, Austria, Germany, and Switzerland. For German-speaking countries, we use the translation "Rezession." (a) Highlights monthly results in blue, results for daily data are shown in gray. (b) Highlights daily results in blue and monthly data are shown in gray. For weekly data, the standard deviations are almost identical in magnitude as for daily data. We omit them from the figure for readability. Source: GSV data; own calculations

3.2 | Sampling adjustment: averaging multiple time-shifted draws
To obtain sample-consistent daily series, we draw multiple samples for each keyword and frequency. Regarding the number of draws, we need to strike a balance between the reduction of the sampling problem and the need not to overstress the Google API, as Google blocks the IP after too many queries within a short period. We choose a minimum of 12 draws, as this reduces the variance of the mean values of the series at a given point in time by approximately 90%. Monthly series are drawn exactly 12 times; daily (weekly) series are drawn by applying a rolling window that shifts by 15 days (11 weeks). We describe the intuition for the rolling window at the daily level in Section 3.1. For each frequency separately, we then take the daily (or, respectively, weekly or monthly) average of the 12 or more draws to obtain a sample-consistent series.
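The averaging step and the roughly 90% variance reduction can be checked with a small simulation. The following Python sketch is illustrative only: it assumes that draws differ by independent, identically distributed noise around a common signal (the actual sampling mechanism used by Google is not documented here), and the function name `average_draws` and all simulated values are hypothetical.

```python
import numpy as np

def average_draws(draws):
    """Average noisy draws of the same series; draws has shape (..., n_draws, n_days)."""
    return np.mean(np.asarray(draws, dtype=float), axis=-2)

rng = np.random.default_rng(0)
n_days, n_draws, n_sims = 90, 12, 2000

# Hypothetical "true" daily series; not real Google Trends data.
signal = 50 + 10 * np.sin(np.linspace(0, 3, n_days))

# Assumption: each draw equals the signal plus independent Gaussian noise.
draws = signal + rng.normal(0, 5, size=(n_sims, n_draws, n_days))

# Variance across simulations of a single draw vs. the 12-draw average.
single_var = draws[:, 0, :].var(axis=0).mean()
mean_var = average_draws(draws).var(axis=0).mean()
reduction = 1 - mean_var / single_var

print(f"variance reduction with {n_draws} draws: {reduction:.1%}")
```

Under the independence assumption, the variance of the mean of 12 draws is 1/12 of the variance of a single draw, so the expected reduction is 1 - 1/12, or about 92%, in line with the approximate figure quoted in the text.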

| Long-run daily data are frequency-inconsistent
The second challenge when using long daily GSV data is frequency inconsistency. For any given keyword, the aggregation of the sample-consistent daily, weekly, and monthly series to lower frequency would lead to different results. For example, the weekly average of the daily series obtained after resampling and averaging as described in Section 3.2 does not correspond to the resampled and averaged weekly series. To the best of our knowledge, we are the first to identify the frequency inconsistency between daily, weekly, and monthly series.
The reason is that daily (weekly) data cannot be downloaded from the Google API for time spans longer than nine months (5.25 years). In contrast, monthly data queries allow downloading the entire time span of available data. Therefore, although Google indexes the search volumes such that within a time window the highest value is normalized to 100 and the lowest value to zero, the long-term trend and relative changes are still captured correctly in the monthly series. This is not the case for daily and, to a lesser extent, for weekly data. Because daily data can only be queried for shorter time spans, and because of the within-time-window normalization applied by Google, the relative search volumes cannot be compared over distant points in time. By simply chaining daily data, one misses the long-term trend observed in monthly data.
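A stylized numerical example may help to see why chaining separately normalized windows loses the long-run trend. The Python sketch below assumes a simple rescaling in which the maximum of each window is set to 100; the function `normalize_window` and the volume figures are hypothetical and serve only to illustrate the mechanism.

```python
import numpy as np

def normalize_window(x):
    """Rescale a window so that its maximum equals 100 (assumed indexing rule)."""
    x = np.asarray(x, dtype=float)
    return 100 * x / x.max()

# Hypothetical true search volumes: a high early level, a low later level.
volume = np.concatenate([np.full(30, 200.0), np.full(30, 50.0)])
volume[10] = 1000.0   # large early spike
volume[45] = 250.0    # later spike, one quarter of the early one

# One long window keeps the relative size of the two spikes.
full = normalize_window(volume)

# Two short windows are each rescaled to their own maximum.
first = normalize_window(volume[:30])
second = normalize_window(volume[30:])

print("long window :", full[10], full[45])
print("short windows:", first[10], second[15])
```

In the long window the later spike is a quarter of the early one, whereas in the two short windows both spikes reach 100, so the difference in levels between distant periods is no longer visible.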
To illustrate this issue, Figure 3 compares a monthly series for the search term "recession" constructed from chained daily results, with the monthly series obtained from a window that covers the entire time span. For both series, we use averages of more than 10 draws, hence the series are sample-consistent. The series for the original monthly series and the chain-linked daily data differ starkly. According to the original monthly series displayed in blue, GSV for "recession" were much higher during the 2007/2008 financial crisis and the 2020 COVID-19 outbreak. If one were to rely on the chain-linked daily series instead (the dashed black line in Figure 3), search frequencies for the term "recession" would appear to have hovered at relatively low volumes between 2007 and 2020, and the magnitudes observed at the outbreaks of the 2020 COVID-19 crisis and the Great Recession would suggest that these crises are of similar economic importance. Since we know that the original monthly data queried for the entire period correctly captures the relative magnitude of search volumes between the two recessions, we conclude that daily and weekly data queried over the shorter time windows imposed by the Google API do not correctly capture long-term trends.
FIGURE 3 Google search volumes (GSV) query results for the term "recession" in Massachusetts. The solid blue line shows the standardized monthly search volume based on raw monthly GSV data. The dashed black line shows the monthly search volume based on raw, chained daily GSV data aggregated to monthly frequency. We do not show the consistent monthly series from our corrected daily series because, by construction, they are identical to the original monthly series. Source: GSV data; own calculations

| Frequency-consistency: combining monthly, weekly, and daily data
We obtain a frequency-consistent daily series by combining the information from monthly (and weekly) series into a single daily series that is consistent with the weekly and monthly series. We use a two-step procedure based on Chow and Lin's (1971) disaggregation routine. The method is commonly used by statistical offices to disaggregate a low-frequency time series to a higher-frequency series. The goal is that the average (or sum) of the resulting high-frequency series is consistent with the low-frequency series. One (or several) high-frequency series can be used as indicators. These indicator series will determine the shape of the disaggregated series. At the low-frequency level, the framework provided by Chow and Lin (1971) performs a generalized least squares (GLS) regression of the series of interest on the aggregated indicator series. It also distributes the residuals so that the average (or sum) of the resulting series is consistent with the low-frequency series. The problem at hand is therefore an ideal application, as the higher-frequency GSV are natural indicators to disaggregate the lower-frequency series. A Chow-Lin method applicable to daily or higher frequencies is available in R (Sax & Steiner, 2013). Compared with the Denton (1971) method, which could be used for this purpose as well, it is computationally more efficient and leads to similar results.
Our procedure is based on three assumptions:
1. Monthly data captures the long-term trend in search activity in the most accurate way.
2. Weekly data is best to analyze the searches over a few weeks.
3. Daily data is best to analyze short term behavior over several days.
In a first step, we temporally disaggregate the weekly data by using the daily series as an indicator using the Chow and Lin (1971) framework. This preserves the movement of the daily series and ensures that weekly averages are identical to the original (i.e., resampled and averaged) weekly series.
In a second step, we apply the same procedure using these adjusted weekly series from the first step as an indicator to temporally disaggregate the monthly values. This produces a series that maintains most of the movement of the daily and the weekly series, but has the same monthly averages as the original monthly series.
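The two steps can be sketched in a few lines of Python. Note that this is a deliberately simplified variant, not the Chow and Lin (1971) routine itself: it assumes uncorrelated residuals, so the GLS regression reduces to ordinary least squares and each period's residual is spread evenly over its days. The function `disaggregate`, the stylized calendar (7 days per week, 4 weeks per "month"), and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def disaggregate(low, indicator, n_per):
    """Bend a high-frequency indicator so that its period averages equal `low`.

    Simplified variant with uncorrelated residuals (not full Chow-Lin GLS).
    """
    low = np.asarray(low, dtype=float)
    indicator = np.asarray(indicator, dtype=float)
    n_low = low.size
    assert indicator.size == n_low * n_per
    # Aggregation matrix: each row averages one block of n_per observations.
    C = np.kron(np.eye(n_low), np.full((1, n_per), 1.0 / n_per))
    X = np.column_stack([np.ones(indicator.size), indicator])
    # Regression at the low frequency on the aggregated indicator.
    beta, *_ = np.linalg.lstsq(C @ X, low, rcond=None)
    prelim = X @ beta
    resid = low - C @ prelim
    # With identity covariance, residuals are distributed via C'(CC')^{-1}.
    D = C.T @ np.linalg.inv(C @ C.T)
    return prelim + D @ resid

rng = np.random.default_rng(0)
n_months, weeks_per_month, days_per_week = 6, 4, 7   # stylized calendar
n_weeks = n_months * weeks_per_month
n_days = n_weeks * days_per_week

daily = 50 + rng.normal(0, 5, n_days)        # hypothetical chained daily data
weekly = 40 + rng.normal(0, 3, n_weeks)      # hypothetical weekly data
monthly = np.linspace(80, 20, n_months)      # hypothetical long-run trend

# Step 1: daily series made consistent with the weekly series.
step1 = disaggregate(weekly, daily, days_per_week)
# Step 2: result of step 1 made consistent with the monthly series.
step2 = disaggregate(monthly, step1, weeks_per_month * days_per_week)

print(np.round(step2.reshape(n_months, -1).mean(axis=1), 2))
```

After the second step, the monthly averages of the daily series reproduce the monthly series exactly, while the day-to-day movements still come from the indicator. Real calendars, in which weeks do not align with months, require a more general aggregation matrix than the block structure assumed here.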

| ROBUST DESI
We use the method described above to obtain consistent GSV data for several keywords, and construct DESI for the (mainly) German-speaking countries Germany, Austria, and Switzerland over the period 2007-2020. Below we describe how we choose the keywords and time frame, how we seasonally adjust the robust GSV series, how we aggregate them into a daily index, and how the resulting indices compare to existing economic indicators.

| Time frame and keyword choice
GSV data is in principle available since 2004, yet Internet usage was not as widespread in early years and Google made improvements to their database over time, too. Thus, data quality may be worse in earlier years. Our index starts in January 2007 for two reasons. First, this is the year the first iPhone was introduced, an innovation that triggered a dramatic increase in mobile Internet use. Second, it covers 20 months before the collapse of Lehman Brothers, which marked the onset of the subsequent recession in 2008/2009. This will allow us to evaluate our DESI over a long time span that contains two major economic downturns.
We start with a set of German keywords related to the state of the economy. These keywords are commonly used in the economic policy debate, as well as in daily life. To protect users' privacy, GSV data default to zero for small search volumes (these are caused by low popularity of a search term, the small size of the country or region, or a combination of both). For this reason, we discard keywords that frequently return zeros. The keywords must therefore be general enough to be used frequently over the whole time period. We only consider keywords that show significant changes over time that coincide with changes in economic activity, such as the financial crisis.
As it turns out, search terms with a positive connotation, such as "economic recovery" ("Wirtschaftsaufschwung"), "economic growth" ("Wirtschaftswachstum"), or "invest" ("investieren") either do not coincide with changes in economic activity or have very low frequencies, such that they mostly default to zero or return no result at all. We conclude that people's interest in the state of the economy reflected in Internet searches is asymmetric: it increases during busts but not during booms. Google search activity related to the state of the economy therefore carries especially valuable, timely information during downturns, which in turn is reflected by the rise in search terms with a negative connotation.
Finally, we consider the factor loadings of the first principal component to choose the keywords on which the DESI will be based (see Section 4.2). We end up with four keywords: economic crisis ("Wirtschaftskrise"), short-time work 6 ("Kurzarbeit"), unemployed ("arbeitslos"), and bankruptcy ("Insolvenz"). GSV for four keywords may not seem like a lot to build an economic index. However, as Baker et al. (2016) show in their seminal contribution on economic uncertainty indices, adding more words does not always significantly increase the information content. 7

| Seasonal adjustment and principal component
For each keyword, we construct daily indices using the procedure described in Section 3. These daily series feature weekly, monthly, and yearly seasonalities. Moreover, there are irregular holidays, such as Easter, that occur at different dates each year. We use the Prophet procedure (Taylor & Letham, 2018) to seasonally adjust each time series on a daily basis. Framing the forecasting problem as a curve-fitting exercise, the fully automated procedure uses an additive model with non-linear trends, periodic changes (e.g., yearly and weekly seasonality), and irregular events such as holidays. However, the Prophet procedure assumes that yearly and weekly seasonal effects are constant over time, and the procedure disregards any possible monthly seasonality patterns not captured by the weekly seasonal effects.
To aggregate the seasonally adjusted series into a DESI, we perform a principal component analysis on the normalized series and extract the first principal component as the common signal in the four keyword series. Each DESI shows the relative change in search volumes over time. Because Google only shares relative search term frequency due to privacy concerns, we normalize all series such that the long-run average equals zero and the standard deviation is one. An index value of 2 therefore means that the search volume for the index is two standard deviations above its long-term average.
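The aggregation step can be illustrated as follows. This Python sketch standardizes each keyword series, extracts the first principal component via a singular value decomposition, and rescales it to mean zero and unit standard deviation. It is a minimal illustration on simulated data; the function name, the simulated loadings, and the sign convention are assumptions rather than details taken from the procedure described above.

```python
import numpy as np

def first_principal_component(series):
    """Return the standardized first principal component and its loadings.

    `series` has one column per keyword and one row per day.
    """
    X = np.asarray(series, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # normalize each keyword
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[0]                              # loadings of the first component
    if loadings.sum() < 0:                        # the sign of a component is arbitrary
        loadings = -loadings
    pc = Z @ loadings
    return (pc - pc.mean()) / pc.std(), loadings

rng = np.random.default_rng(0)
n_days = 500
common = rng.normal(size=n_days)                  # hypothetical common signal
true_loadings = np.array([1.0, 0.8, 0.9, 0.7])    # hypothetical keyword sensitivities
keywords = common[:, None] * true_loadings + rng.normal(0, 0.5, size=(n_days, 4))

index, loadings = first_principal_component(keywords)
corr = np.corrcoef(index, common)[0, 1]
print("loadings:", np.round(loadings, 2))
print("correlation with common signal:", round(corr, 2))
```

Because the resulting index has mean zero and unit standard deviation, a value of 2 can be read as two standard deviations above the long-run average, as in the interpretation given in the text.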

| DESI for German-speaking countries
Using the procedure described above, we construct DESI for Germany, Austria, and Switzerland. We use the same four German keywords in all three countries. 8 Figure 4a depicts the German DESI in comparison with the monthly consumer goods production index (Destatis, 2020) during the full time horizon for a first visual inspection. Although much more volatile due to its daily frequency, the DESI clearly mirrors changes in the medium term, such as the Great Recession in 2008/2009. Figure 4b provides an impression of the three DESI during the period of the COVID-19 outbreak in 2020. Broadly in line with the spread of the virus and lock-down policy measures, the DESI for Austria declined and bottomed out slightly earlier than the indices for the other two countries. Subsequently, it took somewhat longer for the German and Swiss DESI to recover, as Austria was earlier in easing the lock-down measures. The DESI for Switzerland, the smallest of the three countries, evolves almost identically to the German index, but still reveals somewhat more volatility on a daily basis.

| Correlations with other main economic indicators
We assess the goodness of fit of our indices compared to several other, established monthly economic indicators. To remove autocorrelation from the variables, we fit an AR(p) process to each series following Neusser (2016). To determine the lag length p, we use the AIC criterion. We use the residuals to test for significant cross-correlations with existing economic indicators.
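The prewhitening logic can be sketched as follows. This Python example fits AR(p) models by ordinary least squares, selects p by AIC on a common estimation sample, and then computes cross-correlations between the residuals of two series. It is a minimal sketch on simulated data and not necessarily identical to the implementation in Neusser (2016); all function names, the maximum lag of 6, and the approximate 95% band of 1.96 divided by the square root of the sample size are assumptions.

```python
import numpy as np

def fit_ar(x, p, max_p):
    """OLS fit of an AR(p) with intercept on the sample starting at max_p."""
    x = np.asarray(x, dtype=float)
    y = x[max_p:]
    cols = [np.ones(y.size)] + [x[max_p - k : x.size - k] for k in range(1, p + 1)]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    aic = y.size * np.log(resid @ resid / y.size) + 2 * (p + 1)
    return resid, aic

def prewhiten(x, max_p=6):
    """Choose p in 0..max_p by AIC and return the AR(p) residuals and p."""
    fits = [fit_ar(x, p, max_p) for p in range(max_p + 1)]
    best = min(range(max_p + 1), key=lambda p: fits[p][1])
    return fits[best][0], best

def cross_corr(a, b, lag):
    """Correlation between a[t] and b[t + lag]; lag > 0 means a leads b."""
    if lag > 0:
        a, b = a[:-lag], b[lag:]
    elif lag < 0:
        a, b = a[-lag:], b[:lag]
    return np.corrcoef(a, b)[0, 1]

# Simulated example: x leads y by one period; both are autocorrelated.
rng = np.random.default_rng(1)
n = 400
e, v = rng.normal(size=n), rng.normal(size=n)
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + e[t]
    y[t] = 0.7 * y[t - 1] + e[t - 1] + 0.5 * v[t]

rx, px = prewhiten(x)
ry, py = prewhiten(y)
lead = cross_corr(rx, ry, 1)
same = cross_corr(rx, ry, 0)
band = 1.96 / np.sqrt(rx.size)    # approximate 95% band under no correlation
print(f"selected lags: {px}, {py}; band: +/-{band:.2f}")
print(f"corr at lag 1: {lead:.2f}; corr at lag 0: {same:.2f}")
```

Fitting all candidate lag lengths on the same sample keeps the AIC values comparable and leaves the residuals of both series aligned in time, which makes the lead and lag interpretation of the cross-correlations straightforward.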
Cross-correlation tests of the series, prewhitened as just described, show that for all three countries the respective DESI are indeed significantly correlated with other leading economic indicators at a monthly frequency. Both the German and the Austrian DESI feature a significant lead as well as concurrent correlation with the consumer confidence index of the respective country (GFK Group, 2020; Ipsos Austria, 2020) when considering the period from January 2006 to spring 2020 (Figure 5, left and middle). 9 For Switzerland, where no monthly data of consumer confidence exist, the DESI also shows a significant concurrent and leading correlation with the most prominent leading composite indicator for the Swiss economy, the monthly KOF Barometer (Abberger et al., 2014), as the right panel of Figure 5 shows. We examine the robustness of these correlations by varying the time window. While correlations are lower during the time window 2011-2019 without substantial economic downturns, they become very high during the crisis episodes before and after this time window. In Germany and Austria, the DESI also shows significant concurrent correlations with real variables, such as industrial production. 10 This is a valuable feature for real-time business cycle analysis, since our index can be calculated at the end of the month, while German industrial production, for example, is published with a lag of around 40 days.

| Comparison to unsampled and frequency inconsistent indices
We have stated that both sampling and frequency adjustment contribute to more reliable indices. What is the effect of each of these adjustments? To illustrate this, we recompute daily DESI for Switzerland with and without the two adjustments. Figure 6 shows the effect of the two adjustments:
1. In each panel, the effect of the sampling adjustment is shown as the difference between the thin and the thick lines. The thin lines show 24 individual draws of an "unsampled" DESI, where the four keywords are drawn directly from
Sampling adjustment reduces the variance in Google Trends data that arises from Google's sampling and thus has no economic interpretation. Frequency adjustment ensures in addition that the series has a meaningful interpretation over longer periods of time. Medium- and longer-term trends are thus best reflected in a sampling- and frequency-adjusted index.

| DISCUSSION AND FURTHER APPLICATIONS
In this section, we take a brief look at the implications of our new sampling technique implemented in the trendecon package for empirical research in economics and beyond. Its application to DESI is only one of many possible examples of how this new method can be put to use.
One of the big advantages of GSV data is that it is easily accessible and free, and that it covers many years, allowing researchers to study different phenomena of interest over time. Because the data is freely and easily available, GSV data are also a popular data source for robustness checks in academic research. Last but not least, the data is available in real time, a feature in high demand for both business and policy analysis. Our approach makes an important contribution to improving GSV series, opening new possibilities to use this data for empirical research not only in economics but also in other social sciences. The trendecon R-package provides an easy-to-use tool for researchers and policy analysts alike who are interested in the salience of different types of issues and topics with the broad public over time and space.
Below, we provide some examples of the manifold possible applications. One area of application is the consistent comparison of real time daily GSV with earlier times and events. In this paper, we use our technique to benchmark our GSV-based DESI to other leading economic indices. Our technique allows benchmarking an ongoing event with daily information to similar occurrences in the past, for which robust and established indicators are available.
Another type of application uses daily GSV data to determine the occurrence and the relative importance over time and space of new or recurring events. Examples include the spread of Black Friday sales, public sentiment about (COVID-19) vaccines, the transmission of fake news, or the interest in Super Bowl advertisement. Public attention for election results or election fraud (and, more generally, political scandals) may be of interest for studies in political economy.
Since the technique produces consistent daily series, it also opens new possibilities for event studies at a daily level. Daily GSV data allows researchers, for example, to study public worries or anti-immigrant sentiments following terrorist attacks; monitor the spread of diseases after natural disasters; or to measure public awareness about environmental risks and disasters and thus predict the likelihood for humanitarian aid (Eisensee & Strömberg, 2007) or local disaster preparedness (Magontier, 2020).
As a final remark, we want to highlight the importance of using consistent GSV from a scientific point of view. Since our method produces more stable GSV data series, we strongly advise using these corrected series to obtain more reliable and better reproducible results, especially when working with small areas or not very frequent search terms. The easy implementation via the R-package trendecon makes the use of our method straightforward, and the open-source nature of the software allows for adjustments if needed.

| CONCLUSION
Google Trends queries suffer from a small sample problem and from frequency inconsistency between daily, weekly, and monthly series. The sampling technique introduced in this paper solves these two issues with Google Trends data. First, it generates stable daily Google search results that are consistent with weekly and monthly queries. Second, the technique provides stable series for medium-sized administrative units like US states or Switzerland. Using the open source R-package trendecon, which we have developed for this purpose, one can directly query sample-robust GSV data even for small countries or subnational regions.
We apply this technique to construct DESI for German-speaking countries, including small countries such as Switzerland or Austria. A comparison to other, well-established leading economic indicators shows that our DESI are significantly correlated with these leading indicators. Our indices seem to be particularly useful during crises, such as at the outbreak of the COVID-19 pandemic. This makes the indicators especially useful for policy analysis: they capture large downturns in real-time, while at the same time crisis episodes are typically marked by an increased need for timely data.