move dip warning to limitations section (#1587)
nmdefries authored Jan 28, 2025
1 parent 60df975 commit 8cfb191
Showing 1 changed file with 10 additions and 3 deletions.
13 changes: 10 additions & 3 deletions docs/api/covidcast-signals/google-symptoms.md
@@ -16,15 +16,15 @@ nav_order: 1
* **Time type:** day (see [date format docs](../covidcast_times.md))
* **License:** To download or use the data, you must agree to the Google [Terms of Service](https://policies.google.com/terms)

-<div style="background-color:#ff00001c; padding: 10px 30px;"><strong>Data issue:</strong> Between May 13 2024 and August 6 2024, signals values were 25%-50% lower compared to previous time periods. This affects <i>all</i> signals and symptom sets. Currently there is no explanation for the decrease in search volume, and the issue is under investigation by our data source partners.</div>

## Overview

This data source is based on the [COVID-19 Search Trends symptoms
dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=en-GB). Using
this search data, we estimate the volume of searches mapped to symptom sets related
to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals in the same location but NOT across geographic regions**. For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`. However, we cannot compare `s01_smoothed_search` between states. Larger numbers represent increased relative popularity of symptom-related searches.

+Between May 13 2024 and August 6 2024, [signal values were much lower](#limitations) compared to previous time periods due to a data outage.

#### Symptom sets

* _s01_: Cough, Phlegm, Sputum, Upper respiratory tract infection
@@ -94,7 +94,7 @@ population-weighted averaging.

For aggregation purposes only, we assign a value of 0 to source regions that
have no data provided due to quality or privacy issues for a certain day (see
-Limitations for details). We do not report aggregated regions if none of their
+[Limitations](#limitations) for details). We do not report aggregated regions if none of their
source regions have data. Because of this censoring behavior, the resulting data
for aggregated regions does not fully match the _actual_ search volume for these
regions (which is not provided to us).
@@ -106,6 +106,13 @@ As a result the delay can range from 3 to 10 days or even more. We check for
updates every day and provide the most up-to-date data.

## Limitations

+Between May 13 2024 and August 6 2024, signal values were 25%-50% lower compared to previous time periods.
+This affected _all_ signals and symptom sets.
+The drop does not reflect actual search term popularity during the affected period.
+The apparent decrease in search volume was caused by an outage in the data pipeline on the source side.
+The data was unfortunately not recoverable and the dip can not be repaired, but data outside the listed time period is unaffected.

When daily volume in a region does not meet quality or privacy thresholds, set
by Google, no daily value is reported. Weekly data may be available from Google
in these cases, but we do not yet support importation using weekly data.