We took advantage of publicly available, anonymized, and aggregated national-level data from Google’s Symptom Search Dataset (SSD), which reports the relative frequency of Internet searches on 420 signs, symptoms, and health conditions with well-documented privacy protections.31 For comparison, we used data from (1) the Centers for Disease Control and Prevention (CDC) National Syndromic Surveillance Program (NSSP), which tracks emergency department (ED) visits for a variety of health conditions at facilities across 48 U.S. states,6 and (2) the U.S. Census Bureau’s Household Pulse Survey (HPS), which assesses the social and economic impact of the pandemic.7 The main features of these data sets are summarized in Table 1.
The SSD is publicly available30 and provides daily and weekly time series of the relative volume of searches in the United States, in English or Spanish, for common symptoms and conditions. The data are available at the national, state, and county levels in the United States and in five other English-speaking countries. Search queries related to each symptom are aggregated and anonymized using differential privacy32 and then normalized by the total search volume in that region, as detailed elsewhere.31
The SSD was created using Google’s web search tools, which map queries to Knowledge Graph33,34 entities by constantly learning the associations between the words in user queries and the entities described in the web pages viewed following those queries. The 420 symptoms and conditions included in the SSD represent the most frequently searched entities (by query volume). Each entity (symptom or condition) is related to tens or hundreds of thousands of individual queries issued by Google users on desktops or laptops. Quotation marks and capitalization in queries are ignored, and spelling errors are automatically corrected. Sample queries included [lexapro], [depression test], or [signs of depression] for depression; [trazodone], [agoraphobia], or [panic attack] for anxiety; and [I want to die], [how to die], and [I want to kill myself] for suicidal ideation.
For the present study, we focused on SSD search queries related to anxiety, depression, and suicidal ideation between January 1, 2018, and December 31, 2020. We selected these entities a priori because they represent common conditions that are often searched for and because of their high importance to the mental health of the population. We also considered searches related to motion sickness as a putative negative control in a subset of our analyses.
We compared national-level, weekly Internet search data as measured by the SSD with national-level data on ED visits as reported by the NSSP. The NSSP is a CDC-led partnership to collect, analyze, and share electronic health data from approximately 3,500 emergency rooms, emergency and outpatient care centers, health care facilities, and laboratories (collectively referred to here as ED facilities) across 48 states (excluding Hawaii and Wyoming) and Washington, DC.6 These facilities account for approximately 70% of all U.S. ED facilities. The data used in this analysis were previously used by Holland et al. (2021)20 and are reused in the present study with the permission of the authors.
We focused on two variables reported by Holland et al. (2021)20: (1) national counts of weekly ED visits for mental health conditions associated with natural or man-made disasters, such as stress, anxiety, symptoms associated with acute stress disorder or post-traumatic stress disorder, and panic, and (2) national counts of weekly ED visits for suicide attempts. The database included weekly ED visit counts from December 30, 2018, to October 10, 2020.
We also compared Internet search data with HPS data. The HPS is a national survey designed to measure the effects of the COVID-19 pandemic on the economic, physical, and mental health of American households.7 Phase 1 of the survey took place between April 23, 2020, and July 21, 2020; Phase 2 took place between August 19, 2020, and October 26, 2020; and Phase 3 took place between October 28, 2020, and March 29, 2021. Although the survey is still ongoing, in the current analysis we used HPS data from these three phases.35
Questions about symptoms of anxiety and depression were administered in all phases of the survey, while questions about mental health care were included in Phases 2 and 3. The questions about symptoms of anxiety and depression comprised 4 items that are a modified version of the two-item Patient Health Questionnaire (PHQ-2) and the two-item Generalized Anxiety Disorder scale (GAD-2). For each question, answers covered the last 7 days and were coded as follows: not at all = 0, several days = 1, more than half of the days = 2, and nearly every day = 3. Scores for anxiety and depression were obtained by summing the answers across the two questions for each construct. The percentage of respondents scoring 3 or more on these summed scores is used in the analysis of survey results. Items indexing mental health care estimated the percentage of adults who, in the past 4 weeks, reported taking prescription medication, receiving counseling or therapy from a mental health professional, or needing counseling or therapy from a mental health professional but not receiving it (i.e., unmet needs).
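The scoring rule described above can be illustrated with a minimal Python sketch. It is not the authors' code (their analyses were run in R), the function names are hypothetical, and it computes a simple unweighted percentage, whereas published HPS estimates may rely on survey weights that are not reproduced here.

```python
from typing import Iterable, Tuple

# Item codes as described in the text: 0 = not at all ... 3 = most frequent option.
VALID_CODES = {0, 1, 2, 3}


def scale_score(item_1: int, item_2: int) -> int:
    """Sum the two item codes of one construct (PHQ-2 or GAD-2 style)."""
    for code in (item_1, item_2):
        if code not in VALID_CODES:
            raise ValueError(f"item code must be 0-3, got {code!r}")
    return item_1 + item_2


def percent_at_or_above(
    responses: Iterable[Tuple[int, int]], threshold: int = 3
) -> float:
    """Unweighted percentage of respondents whose summed score >= threshold.

    Assumption: no survey weights are applied in this sketch.
    """
    scores = [scale_score(a, b) for a, b in responses]
    if not scores:
        raise ValueError("no responses supplied")
    positives = sum(1 for s in scores if s >= threshold)
    return 100.0 * positives / len(scores)


if __name__ == "__main__":
    # Four hypothetical respondents; summed scores are 0, 3, 6, and 2.
    sample = [(0, 0), (1, 2), (3, 3), (1, 1)]
    print(percent_at_or_above(sample))
```

With the four hypothetical respondents shown, two of the four summed scores reach the threshold of 3.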
We first used graphical approaches and descriptive statistics to identify temporal patterns in Internet searches related to anxiety, depression, and suicidal ideation. We then fit a generalized linear model with a log-link function to quantify the effects on relative search volumes associated with the weeks of the Thanksgiving and Christmas holidays and with the onset of the COVID-19 pandemic (defined as the first 4 weeks of March 2020), adjusting for calendar year and season.
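As an illustration of how the covariates in such a model might be coded, the following Python sketch derives the indicators from a week's start date. It does not fit the model, and it is not the authors' code. Several choices are assumptions not stated in the text: a week counts as a holiday week if it contains Thanksgiving Day or December 25, the pandemic-onset indicator covers weeks starting between March 1 and March 28, 2020, and seasons follow the meteorological (month-based) convention. With a log link, the exponentiated coefficient of each indicator can be read as a relative change in search volume.

```python
from datetime import date, timedelta


def thanksgiving(year: int) -> date:
    """Return U.S. Thanksgiving Day: the fourth Thursday of November."""
    first = date(year, 11, 1)
    # date.weekday(): Monday = 0 ... Thursday = 3
    days_to_first_thursday = (3 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_thursday + 21)


def season(month: int) -> str:
    """Meteorological season of a month (an assumed definition)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (3, 4, 5):
        return "spring"
    if month in (6, 7, 8):
        return "summer"
    return "fall"


def week_covariates(week_start: date) -> dict:
    """Build model covariates for the 7-day week beginning at week_start."""
    days = [week_start + timedelta(days=i) for i in range(7)]
    holidays = set()
    for year in {d.year for d in days}:
        holidays.add(thanksgiving(year))
        holidays.add(date(year, 12, 25))
    # Assumed reading of "the first 4 weeks of March 2020".
    onset_start, onset_end = date(2020, 3, 1), date(2020, 3, 28)
    return {
        "holiday_week": int(any(d in holidays for d in days)),
        "pandemic_onset": int(onset_start <= week_start <= onset_end),
        "year": week_start.year,
        "season": season(week_start.month),
    }


if __name__ == "__main__":
    print(week_covariates(date(2020, 3, 8)))
    print(week_covariates(date(2020, 11, 22)))
```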
Second, we quantified the change in search volumes associated with the pandemic by calculating the percentage change in search frequency for each topic relative to the same week 1 year earlier for the period from January 1, 2020, to December 31, 2020. We similarly assessed the change in rates of ED visits for mental health symptoms and suicide attempts reported by the NSSP.
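The year-over-year comparison can be sketched in Python as follows. This is an illustration and not the authors' code. It assumes that "the same week 1 year earlier" means the week starting exactly 52 weeks (364 days) before, and the function name and input layout are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict


def yoy_percent_change(
    series: Dict[date, float], lag_weeks: int = 52
) -> Dict[date, float]:
    """Percentage change of each weekly value versus the value lag_weeks earlier.

    Weeks without a matching earlier week, or with an earlier value of
    zero, are skipped because the change is undefined for them.
    """
    changes = {}
    for week_start in sorted(series):
        prior = series.get(week_start - timedelta(weeks=lag_weeks))
        if prior is None or prior == 0:
            continue
        changes[week_start] = 100.0 * (series[week_start] - prior) / prior
    return changes


if __name__ == "__main__":
    # Hypothetical relative search volumes keyed by week start date.
    volumes = {
        date(2019, 3, 3): 10.0,
        date(2019, 3, 10): 8.0,
        date(2020, 3, 1): 12.5,
        date(2020, 3, 8): 6.0,
    }
    for week, change in yoy_percent_change(volumes).items():
        print(week.isoformat(), round(change, 1))
```

Matching on a fixed 52-week offset keeps the comparison on the same day of the week. Other alignments, such as matching by ISO week number, are equally plausible readings of the text.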
Third, we calculated pairwise Pearson correlation coefficients between simultaneous measurements derived from the SSD, NSSP, and HPS. Results were not materially different when using Spearman rather than Pearson correlation coefficients. We also used scatter plots to visualize in more depth the relationship between specific pairs of markers. In sensitivity analyses, we considered the possibility of a 1- or 2-week delay between a change in search volumes and a change in rates of ED visits for mental health conditions or suicide attempts. Specifically, we used a generalized linear model with a log-link function to quantify the relative change in ED visits associated with searches in the same week, the previous week, and 2 weeks earlier. We fit separate models for each search concept. All analyses were performed using R (version 4.0.2). The code to reproduce these analyses is publicly available on GitHub at https://github.com/anthonysun95/Google_SSD_and_Mental_Health.
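To make the lag structure concrete, the following Python sketch pairs ED visits in week t with searches in week t minus the lag and computes a Pearson correlation for lags of 0, 1, and 2 weeks. It is a simplified illustration and not the authors' code: it substitutes lagged correlations for the log-link generalized linear model used in the sensitivity analyses, and all function names and data are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def lagged_pairs(
    searches: Sequence[float], visits: Sequence[float], lag: int
) -> Tuple[List[float], List[float]]:
    """Align visits in week t with searches in week t - lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(searches) != len(visits):
        raise ValueError("series must cover the same weeks")
    if lag == 0:
        return list(searches), list(visits)
    return list(searches[:-lag]), list(visits[lag:])


def lagged_correlations(
    searches: Sequence[float],
    visits: Sequence[float],
    lags: Sequence[int] = (0, 1, 2),
) -> Dict[int, float]:
    """Pearson correlation between visits and searches at each lag."""
    return {k: pearson(*lagged_pairs(searches, visits, k)) for k in lags}


if __name__ == "__main__":
    # Hypothetical weekly series in which visits follow searches by one week.
    searches = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
    visits = [0.0] + [10.0 * s for s in searches[:-1]]
    print(lagged_correlations(searches, visits))
```

In the hypothetical series, visits are an exact multiple of the previous week's searches, so the lag-1 alignment is the one that yields a perfect correlation.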