diff --git a/.github/workflows/python_ci.yml b/.github/workflows/python_ci.yml
index 60198d6e..27a78102 100644
--- a/.github/workflows/python_ci.yml
+++ b/.github/workflows/python_ci.yml
@@ -11,8 +11,10 @@ on:
 
 jobs:
   build:
-    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: Python-packages/covidcast-py/
     strategy:
       matrix:
         python-version: [3.6]
 
@@ -25,12 +27,10 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
-        pip install -r Python-packages/covidcast-py/requirements.txt
+        pip install -r requirements_ci.txt
     - name: Lint with pylint and mypy
       run: |
-        pylint Python-packages/covidcast-py/covidcast/ --rcfile Python-packages/covidcast-py/.pylintrc
-        mypy Python-packages/covidcast-py/covidcast --config-file Python-packages/covidcast-py/mypy.ini
+        make lint
     - name: Test with pytest
       run: |
-        pytest Python-packages/covidcast-py/ -W ignore::UserWarning
-
+        make test
diff --git a/.github/workflows/r_ci.yml b/.github/workflows/r_ci.yml
new file mode 100644
index 00000000..05340103
--- /dev/null
+++ b/.github/workflows/r_ci.yml
@@ -0,0 +1,48 @@
+# This workflow uses actions that are not certified by GitHub.
+# They are provided by a third-party and are governed by
+# separate terms of service, privacy policy, and support
+# documentation.
+#
+# See https://github.com/r-lib/actions/tree/master/examples#readme for
+# additional example workflows available for the R community.
+
+name: R
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    defaults:
+      run:
+        working-directory: R-packages/covidcast/
+    strategy:
+      matrix:
+        r-version: [3.5]
+
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up R ${{ matrix.r-version }}
+        uses: r-lib/actions/setup-r@ffe45a39586f073cc2e9af79c4ba563b657dc6e3
+        with:
+          r-version: ${{ matrix.r-version }}
+      - name: Install libcurl
+        run: sudo apt-get install libcurl4-openssl-dev
+      - name: Cache R packages
+        uses: actions/cache@v2
+        with:
+          path: ${{ env.R_LIBS_USER }}
+          key: ${{ runner.os }}-r-1-
+      - name: Install dependencies
+        run: |
+          install.packages(c("remotes", "rcmdcheck"))
+          remotes::install_deps(dependencies = TRUE)
+        shell: Rscript {0}
+      - name: Check
+        run: |
+          rcmdcheck::rcmdcheck(args = c("--no-manual", "--ignore-vignettes", "--as-cran"), build_args = c("--no-build-vignettes"), error_on = "error")
+        shell: Rscript {0}
diff --git a/Python-packages/covidcast-py/DEVELOP.md b/Python-packages/covidcast-py/DEVELOP.md
index 295ee2fa..8f502f74 100644
--- a/Python-packages/covidcast-py/DEVELOP.md
+++ b/Python-packages/covidcast-py/DEVELOP.md
@@ -1,33 +1,73 @@
 # Developing the covidcast package
 
+## Structure
+From `covidcast/Python-packages/covidcast-py`, the Python library files are located in the
+`covidcast/` folder, with corresponding tests in `tests/covidcast/`.
+Currently, the primary user-facing functions across the modules are imported in `covidcast/__init__.py`
+for organization and namespace purposes.
+
+Sphinx documentation is in the `docs/` folder. See "Building the Package and Documentation" below
+for information on how to build the documentation.
+
+The CI workflow is stored in the repo's top-level directory in `.github/workflows/python_ci.yml`.
+
+## Development
+These are general recommendations for developing. They do not have to be strictly followed,
+but are encouraged.
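The Python CI job above now delegates to the package Makefile, so the same checks can be reproduced locally. A minimal sketch, assuming a checkout of the repository and the same working directory the workflow uses (the targets come from the Makefile introduced later in this patch):

```sh
cd Python-packages/covidcast-py/
pip install -r requirements_ci.txt   # same dependencies the CI job installs
make lint   # pylint + mypy + pydocstyle
make test   # pytest tests/ -W ignore::UserWarning
```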
+ +__Environment__ +- A virtual environment is recommended, which can be started with the following commands: + + ```sh + python3 -m venv env + source env/bin/activate + ``` + this will create an `env/` folder containing files required in the environment, which + is gitignored. The environment can be deactived by running `deactivate`, and reactived by + rerunning `source env/bin/activate`. To create a new environment, you can delete the + `env/` folder and rerun the above commands if you do not require the old one anymore, + or rerun the above command with a new environment name in place of `env`. + +__Style__ +- Run `make lint` from `Python-packages/covidcast-py/` to run the lint commands. +- `mypy`, `pylint`, and `pydocstyle` are used for linting, with associated configurations for +`pylint` in `.pylintrc` and for `mypy` in `mypy.ini`. + +__Testing__ +- Run `make test` from `Python-packages/covidcast-py/` to run the test commands. +- `pytest` is the framework used in this package. +- Each function should have corresponding unit tests. +- Tests should be deterministic. +- Similarly, tests should not make network calls. + +__Documentation__ +- New public methods should have comprehensive docstrings and +an entry in the Sphinx documentation. +- Usage examples in Sphinx are recommended. + +## Building the Package and Documentation The package is fairly straightforward in structure, following the basic [packaging documentation](https://packaging.python.org/tutorials/packaging-projects/) and a few other pieces I found. -When you develop a new package version, there are several steps to consider: +When you develop a new package version, there are several steps to consider. +These are written from the `Python-packages/covidcast-py/` directory: 1. Increment the package version in `setup.py` and in Sphinx's `conf.py`. -2. Rebuild the package. You will need to install the `wheel` package: - - ```sh - python3 setup.py clean - python3 setup.py sdist bdist_wheel - ``` - - Verify the build worked without errors. -3. Locally install the package with `python3 setup.py install`. -4. Install dependencies with `pip3 install -r requirements.txt` +2. Install the requirements needed to build the package and documentation with `make install-requirements` +3. Rebuild and install the package locally with `make build-and-install` 5. Rebuild the documentation. The documentation lives in `docs/` and is built by [Sphinx](https://www.sphinx-doc.org/en/master/), which automatically reads the function docstrings and formats them. `docs/index.rst` contains the main documentation and the `.. autofunction::` directives insert documentation of specified functions. - To rebuild the documentation, install the `sphinx` package and run + To rebuild the documentation, run ```sh cd docs/ + make clean make html ``` @@ -36,7 +76,7 @@ When you develop a new package version, there are several steps to consider: If you make changes to `index.rst`, you can simply run `make html` to rebuild without needing to reinstall the package. -4. Upload to PyPI. It should be as easy as +6. Upload to PyPI. 
It should be as easy as ```sh twine upload dist/covidcast-0.0.9* diff --git a/Python-packages/covidcast-py/Makefile b/Python-packages/covidcast-py/Makefile new file mode 100644 index 00000000..1e445596 --- /dev/null +++ b/Python-packages/covidcast-py/Makefile @@ -0,0 +1,18 @@ +.PHONY = lint, test, install-requirements, build-and-install + +install-requirements: + pip install -r requirements_dev.txt + pip install -r requirements_ci.txt + +build-and-install: install-requirements + python3 setup.py clean + python3 setup.py sdist bdist_wheel + pip3 install -e . + +lint: + pylint covidcast/ --rcfile .pylintrc + mypy covidcast --config-file mypy.ini + pydocstyle covidcast/ + +test: + pytest tests/ -W ignore::UserWarning diff --git a/Python-packages/covidcast-py/bubble.png b/Python-packages/covidcast-py/bubble.png new file mode 100644 index 00000000..6484eae2 Binary files /dev/null and b/Python-packages/covidcast-py/bubble.png differ diff --git a/Python-packages/covidcast-py/covidcast/__init__.py b/Python-packages/covidcast-py/covidcast/__init__.py index 64c74ed4..5536e589 100644 --- a/Python-packages/covidcast-py/covidcast/__init__.py +++ b/Python-packages/covidcast-py/covidcast/__init__.py @@ -13,6 +13,6 @@ """ from .covidcast import signal, metadata, aggregate_signals -from .plotting import plot_choropleth, get_geo_df, animate +from .plotting import plot, plot_choropleth, get_geo_df, animate from .geography import (fips_to_name, cbsa_to_name, abbr_to_name, name_to_abbr, name_to_cbsa, name_to_fips) diff --git a/Python-packages/covidcast-py/covidcast/covidcast.py b/Python-packages/covidcast-py/covidcast/covidcast.py index aa63649a..dfcb709d 100644 --- a/Python-packages/covidcast-py/covidcast/covidcast.py +++ b/Python-packages/covidcast-py/covidcast/covidcast.py @@ -7,6 +7,8 @@ import pandas as pd from delphi_epidata import Epidata +from .errors import NoDataWarning + # Point API requests to the AWS endpoint Epidata.BASE_URL = "https://api.covidcast.cmu.edu/epidata/api.php" @@ -101,45 +103,52 @@ def signal(data_source: str, columns: ``geo_value`` - identifies the location, such as a state name or county FIPS code. The + Identifies the location, such as a state name or county FIPS code. The geographic coding used by COVIDcast is described in the `API documentation here `_. + ``signal`` + Name of the signal, same as the value of the ``signal`` input argument. Used for + downstream functions to recognize where this signal is from. + ``time_value`` - contains a `pandas Timestamp object + Contains a `pandas Timestamp object `_ identifying the date this estimate is for. ``issue`` - contains a `pandas Timestamp object + Contains a `pandas Timestamp object `_ identifying the date this estimate was issued. For example, an estimate with a ``time_value`` of June 3 might have been issued on June 5, after the data for June 3rd was collected and ingested into the API. ``lag`` - an integer giving the difference between ``issue`` and ``time_value``, + Integer giving the difference between ``issue`` and ``time_value``, in days. ``value`` - the signal quantity requested. For example, in a query for the + The signal quantity requested. For example, in a query for the ``confirmed_cumulative_num`` signal from the ``usa-facts`` source, this would be the cumulative number of confirmed cases in the area, as of the ``time_value``. ``stderr`` - the value's standard error, if available. + The value's standard error, if available. 
``sample_size`` - indicates the sample size available in that geography on that day; + Indicates the sample size available in that geography on that day; sample size may not be available for all signals, due to privacy or other constraints. - ``direction`` - uses a local linear fit to estimate whether the signal in this region is - currently increasing or decreasing (reported as -1 for decreasing, 1 for - increasing, and 0 for neither). + ``geo_type`` + Geography type for the signal, same as the value of the ``geo_type`` input argument. + Used for downstream functions to parse ``geo_value`` correctly + + ``data_source`` + Name of the signal source, same as the value of the ``data_source`` input argument. Used for + downstream functions to recognize where this signal is from. Consult the `signal documentation `_ @@ -147,7 +156,6 @@ def signal(data_source: str, specific signals. """ - if geo_type not in VALID_GEO_TYPES: raise ValueError("geo_type must be one of " + ", ".join(VALID_GEO_TYPES)) @@ -250,7 +258,6 @@ def metadata() -> pd.DataFrame: ``max_lag`` Largest lag from observation to issue, in days. """ - meta = Epidata.covidcast_meta() if meta["result"] != 1: @@ -362,7 +369,6 @@ def _fetch_single_geo(data_source: str, entries. """ - as_of_str = _date_to_api_string(as_of) if as_of is not None else None issues_strs = _dates_to_api_strings(issues) if issues is not None else None @@ -379,10 +385,14 @@ def _fetch_single_geo(data_source: str, issues=issues_strs, lag=lag) # Two possible error conditions: no data or too much data. - if day_data["message"] != "success": - warnings.warn("Problem obtaining data on {day}: {message}".format( - day=day_str, - message=day_data["message"])) + if day_data["message"] == "no results": + warnings.warn(f"No {data_source} {signal} data found on {day_str} " + f"for geography '{geo_type}'", + NoDataWarning) + if day_data["message"] not in {"success", "no results"}: + warnings.warn(f"Problem obtaining {data_source} {signal} data on {day_str} " + f"for geography '{geo_type}': {day_data['message']}", + RuntimeWarning) # In the too-much-data case, we continue to try putting the truncated # data in our results. 
In the no-data case, skip this day entirely, @@ -394,7 +404,7 @@ def _fetch_single_geo(data_source: str, if len(dfs) > 0: out = pd.concat(dfs) - + out.drop("direction", axis=1, inplace=True) out["time_value"] = pd.to_datetime(out["time_value"], format="%Y%m%d") out["issue"] = pd.to_datetime(out["issue"], format="%Y%m%d") out["geo_type"] = geo_type @@ -409,7 +419,6 @@ def _signal_metadata(data_source: str, signal: str, # pylint: disable=W0621 geo_type: str) -> dict: """Fetch metadata for a single signal as a dict.""" - meta = metadata() mask = ((meta.data_source == data_source) & @@ -434,13 +443,11 @@ def _signal_metadata(data_source: str, def _date_to_api_string(date: date) -> str: # pylint: disable=W0621 """Convert a date object to a YYYYMMDD string expected by the API.""" - return date.strftime("%Y%m%d") def _dates_to_api_strings(dates: Union[date, list, tuple]) -> str: """Convert a date object, or pair of (start, end) objects, to YYYYMMDD strings.""" - if not isinstance(dates, (list, tuple)): return _date_to_api_string(dates) diff --git a/Python-packages/covidcast-py/covidcast/errors.py b/Python-packages/covidcast-py/covidcast/errors.py new file mode 100644 index 00000000..8f783baa --- /dev/null +++ b/Python-packages/covidcast-py/covidcast/errors.py @@ -0,0 +1,5 @@ +"""Custom warnings and exceptions for covidcast functions.""" + + +class NoDataWarning(Warning): + """Warning raised when no data is returned on a given day by covidcast.signal().""" diff --git a/Python-packages/covidcast-py/covidcast/geography.py b/Python-packages/covidcast-py/covidcast/geography.py index 61be0cf0..480496cb 100644 --- a/Python-packages/covidcast-py/covidcast/geography.py +++ b/Python-packages/covidcast-py/covidcast/geography.py @@ -1,3 +1,4 @@ +"""Functions for converting and mapping between geographic types.""" import re import warnings from typing import Union, Iterable @@ -237,7 +238,7 @@ def _lookup(key: Union[str, Iterable], def _get_first_tie(dict_list: list) -> list: - """Return a list with the first value for the first key for each of the input dicts + """Return a list with the first value for the first key for each of the input dicts. Needs to be Python 3.6+ for this to work, since earlier versions don't preserve insertion order. diff --git a/Python-packages/covidcast-py/covidcast/plotting.py b/Python-packages/covidcast-py/covidcast/plotting.py index 7ef10f80..2eb0e78b 100644 --- a/Python-packages/covidcast-py/covidcast/plotting.py +++ b/Python-packages/covidcast-py/covidcast/plotting.py @@ -1,17 +1,17 @@ """This contains the plotting and geo data management methods for the COVIDcast signals.""" import io +import warnings from datetime import date, timedelta from typing import Tuple, Any - import geopandas as gpd import imageio -import matplotlib.figure import numpy as np import pandas as pd import pkg_resources from matplotlib import pyplot as plt +from matplotlib import figure, axes from tqdm import tqdm from .covidcast import _detect_metadata, _signal_metadata @@ -39,10 +39,12 @@ "46", "47", "48", "49", "50", "51", "53", "54", "55", "56"} -def plot_choropleth(data: pd.DataFrame, - time_value: date = None, - **kwargs: Any) -> matplotlib.figure.Figure: - """Given the output data frame of :py:func:`covidcast.signal`, plot a choropleth map. +def plot(data: pd.DataFrame, + time_value: date = None, + plot_type: str = "choropleth", + combine_megacounties: bool = True, + **kwargs: Any) -> figure.Figure: + """Given the output data frame of :py:func:`covidcast.signal`, plot a choropleth or bubble map. 
Projections used for plotting: @@ -64,52 +66,62 @@ def plot_choropleth(data: pd.DataFrame, documentation `_. + Bubble maps use a purple bubble by default, with all values discretized into 8 bins between 0.1 + and the signal's historical mean value + 3 standard deviations. Values below 0 have no + bubble but have the region displayed in white, and values above the mean + 3 std dev are binned + into the highest bubble. Bubbles are scaled by area. + :param data: Data frame of signal values, as returned from :py:func:`covidcast.signal`. :param time_value: If multiple days of data are present in ``data``, map only values from this day. Defaults to plotting the most recent day of data in ``data``. + :param combine_megacounties: For each state, display all counties without a signal value as a + single polygon with the megacounty value, as opposed to plotting all the county boundaries. + Defaults to `True`. :param kwargs: Optional keyword arguments passed to ``GeoDataFrame.plot()``. + :param plot_type: Type of plot to create. Either choropleth (default) or bubble map. :return: Matplotlib figure object. """ - + if plot_type not in {"choropleth", "bubble"}: + raise ValueError("`plot_type` must be 'choropleth' or 'bubble'.") data_source, signal, geo_type = _detect_metadata(data) # pylint: disable=W0212 meta = _signal_metadata(data_source, signal, geo_type) # pylint: disable=W0212 # use most recent date in data if none provided day_to_plot = time_value if time_value else max(data.time_value) day_data = data.loc[data.time_value == pd.to_datetime(day_to_plot), :] - data_w_geo = get_geo_df(day_data) - - kwargs["vmin"] = kwargs.get("vmin", 0) kwargs["vmax"] = kwargs.get("vmax", meta["mean_value"] + 3 * meta["stdev_value"]) - kwargs["cmap"] = kwargs.get("cmap", "YlOrRd") kwargs["figsize"] = kwargs.get("figsize", (12.8, 9.6)) - fig, ax = plt.subplots(1, figsize=kwargs["figsize"]) - ax.axis("off") - sm = plt.cm.ScalarMappable(cmap=kwargs["cmap"], - norm=plt.Normalize(vmin=kwargs["vmin"], vmax=kwargs["vmax"])) - # this is to remove the set_array error that occurs on some platforms - sm._A = [] # pylint: disable=W0212 - plt.title(f"{data_source}: {signal}, {day_to_plot.strftime('%Y-%m-%d')}") + fig, ax = _plot_background_states(kwargs["figsize"]) + ax.set_title(f"{data_source}: {signal}, {day_to_plot.strftime('%Y-%m-%d')}") + if plot_type == "choropleth": + _plot_choro(ax, day_data, combine_megacounties, **kwargs) + else: + _plot_bubble(ax, day_data, geo_type, **kwargs) + return fig - # plot all states as light grey first - state_shapefile_path = pkg_resources.resource_filename(__name__, SHAPEFILE_PATHS["state"]) - state = gpd.read_file(state_shapefile_path) - for state in _project_and_transform(state, "STATEFP"): - state.plot(color="0.9", ax=ax) - for shape in _project_and_transform(data_w_geo): - if not shape.empty: # only plot nonempty ones to avoid matplotlib warning. - shape.plot("value", ax=ax, **kwargs) - plt.colorbar(sm, ticks=np.linspace(kwargs["vmin"], kwargs["vmax"], 8), ax=ax, - orientation="horizontal", fraction=0.045, pad=0.04, format="%.2f") - return fig +def plot_choropleth(data: pd.DataFrame, + time_value: date = None, + combine_megacounties: bool = True, + **kwargs: Any) -> figure.Figure: + """Plot choropleths for a signal. This method is deprecated and has been generalized to plot(). + + :param data: Data frame of signal values, as returned from :py:func:`covidcast.signal`. + :param time_value: If multiple days of data are present in ``data``, map only values from this + day. 
Defaults to plotting the most recent day of data in ``data``. + :param kwargs: Optional keyword arguments passed to ``GeoDataFrame.plot()``. + :return: Matplotlib figure object. + """ + warnings.warn("Function `plot_choropleth` is deprecated. Use `plot()` instead.") + return plot(data, time_value, "choropleth", combine_megacounties, **kwargs) def get_geo_df(data: pd.DataFrame, geo_value_col: str = "geo_value", geo_type_col: str = "geo_type", - join_type: str = "right") -> gpd.GeoDataFrame: + join_type: str = "right", + combine_megacounties: bool = False) -> gpd.GeoDataFrame: """Augment a :py:func:`covidcast.signal` data frame with the shape of each geography. This method takes in a pandas DataFrame object and returns a GeoDataFrame @@ -127,9 +139,11 @@ def get_geo_df(data: pd.DataFrame, ``outer``, and ``inner`` joins are also supported and can be selected with the ``join_type`` argument. - For right joins on counties, all counties without a signal value will be - given the value of the megacounty (if present). Other joins will not use - megacounties. See the `geographic coding documentation + If ``combine_megacounties=False`` (default) all counties without a signal value will be + given the value of the megacounty if present. If ``combine_megacounties=True``, a left join + will be conducted and the megacounty rows will be given a polygon of the union of all + constituent counties without a value. Other joins will not use megacounties. + See the `geographic coding documentation `_ for information about megacounties. @@ -149,6 +163,8 @@ def get_geo_df(data: pd.DataFrame, :param geo_type_col: Name of column containing geography type. :param join_type: Type of join to do between input data (left side) and geo data (right side). Must be one of `right` (default), `left`, `outer`, or `inner`. + :param combine_megacounties: For each state, return all counties without a signal value as a + single row and polygon with the megacounty value. Defaults to `False`. :return: GeoDataFrame containing all columns from the input ``data``, along with a ``geometry`` column (containing a polygon) and a ``state_fips`` column (a two-digit FIPS code identifying the US state containing this @@ -157,7 +173,6 @@ def get_geo_df(data: pd.DataFrame, WGS84 for HRRs. """ - if join_type == "right" and any(data[geo_value_col].duplicated()): raise ValueError("join_type `right` is incompatible with duplicate values in a " "given region. Use `left` or ensure your input data is a single signal for" @@ -178,7 +193,7 @@ def get_geo_df(data: pd.DataFrame, geo_info["geometry"] = geo_info["geometry"].translate(0, -0.185) # fix projection shift bug output = _join_hrr_geo_df(data, geo_value_col, geo_info, join_type) else: # geo_type must be "county" - output = _join_county_geo_df(data, geo_value_col, geo_info, join_type) + output = _join_county_geo_df(data, geo_value_col, geo_info, join_type, combine_megacounties) return output @@ -194,7 +209,7 @@ def animate(data: pd.DataFrame, filepath: str, fps: int = 3, dpi: int = 150, **k :param filepath: Path where video will be saved. Filename must contain supported extension. :param fps: Frame rate in frames per second for animation. Defaults to 3. :param dpi: Dots per inch for output video. Defaults to 150 on a 12.8x9.6 figure (1920x1440). - :param kwargs: Optional keyword arguments passed to :py:func:`covidcast.plot_choropleth`. + :param kwargs: Optional keyword arguments passed to :py:func:`covidcast.plot`. 
:return: None """ # probesize is set to avoid warning by ffmpeg on frame rate up to 4k resolution. @@ -203,7 +218,7 @@ def animate(data: pd.DataFrame, filepath: str, fps: int = 3, dpi: int = 150, **k day_list = [min(data.time_value) + timedelta(days=x) for x in range(num_days+1)] for d in tqdm(day_list): buf = io.BytesIO() - plot_choropleth(data, time_value=d, **kwargs) + plot(data, time_value=d, **kwargs) plt.savefig(buf, dpi=dpi) plt.close() buf.seek(0) @@ -211,6 +226,83 @@ def animate(data: pd.DataFrame, filepath: str, fps: int = 3, dpi: int = 150, **k writer.close() +def _plot_choro(ax: axes.Axes, + data: gpd.GeoDataFrame, + combine_megacounties: bool, + **kwargs: Any) -> None: + """Generate a choropleth map on a given Figure/Axes from a GeoDataFrame. + + :param ax: Matplotlib axes to plot on. + :param data: GeoDataFrame with information to plot. + :param kwargs: Optional keyword arguments passed to ``GeoDataFrame.plot()``. + :return: Matplotlib axes with the plot added. + """ + kwargs["vmin"] = kwargs.get("vmin", 0) + kwargs["cmap"] = kwargs.get("cmap", "YlOrRd") + data_w_geo = get_geo_df(data, combine_megacounties=combine_megacounties) + for shape in _project_and_transform(data_w_geo): + if not shape.empty: + shape.plot(column="value", ax=ax, **kwargs) + sm = plt.cm.ScalarMappable(cmap=kwargs["cmap"], + norm=plt.Normalize(vmin=kwargs["vmin"], vmax=kwargs["vmax"])) + # this is to remove the set_array error that occurs on some platforms + sm._A = [] # pylint: disable=W0212 + plt.colorbar(sm, ticks=np.linspace(kwargs["vmin"], kwargs["vmax"], 8), ax=ax, + orientation="horizontal", fraction=0.045, pad=0.04, format="%.2f") + + +def _plot_bubble(ax: axes.Axes, data: gpd.GeoDataFrame, geo_type: str, **kwargs: Any) -> None: + """Generate a bubble map on a given Figure/Axes from a GeoDataFrame. + + The maximum bubble size is set to the figure area / 1.5, with a x3 multiplier if the geo_type + is ``state``. + + :param ax: Matplotlib axes to plot on. + :param data: GeoDataFrame with information to plot. + :param kwargs: Optional keyword arguments passed to ``GeoDataFrame.plot()``. + :return: Matplotlib axes with the plot added. 
+ """ + kwargs["vmin"] = kwargs.get("vmin", 0.1) + kwargs["color"] = kwargs.get("color", "purple") + kwargs["alpha"] = kwargs.get("alpha", 0.5) + data_w_geo = get_geo_df(data, join_type="inner") + label_bins = np.linspace(kwargs["vmin"], kwargs["vmax"], 8) # set bin labels + value_bins = list(label_bins) + [np.inf] # set ranges for bins by adding +inf for largest bin + # set max bubble size proportional to figure size, with a multiplier for state plots + state_multiple = 3 if geo_type == "state" else 1 + bubble_scale = np.prod(kwargs["figsize"]) / 1.5 / kwargs["vmax"] * state_multiple + # discretize data and scale labels to correct sizes + data_w_geo["binval"] = pd.cut(data_w_geo.value, labels=label_bins, bins=value_bins, right=False) + data_w_geo["binval"] = data_w_geo.binval.astype(float) * bubble_scale + for shape in _project_and_transform(data_w_geo): + if not shape.empty and not shape.binval.isnull().values.all(): + shape.plot(color="1", ax=ax, legend=True, edgecolor="0.8", linewidth=0.5) + shape["geometry"] = shape["geometry"].centroid # plot bubbles at each polgyon centroid + shape.plot(markersize="binval", color=kwargs["color"], ax=ax, alpha=kwargs["alpha"]) + # to generate the legend, need to plot the reference points as scatter plots off the map + for b in label_bins: + ax.scatter([1e10], [1e10], color=kwargs["color"], alpha=kwargs["alpha"], + s=b*bubble_scale, label=round(b, 2)) + ax.legend(frameon=False, ncol=8, loc="lower center", bbox_to_anchor=(0.5, -0.1)) + + +def _plot_background_states(figsize: tuple) -> tuple: + """Plot US states in light grey as the background for other plots. + + :param figsize: Dimensions of plot. + :return: Matplotlib figure and axes. + """ + fig, ax = plt.subplots(1, figsize=figsize) + ax.axis("off") + state_shapefile_path = pkg_resources.resource_filename(__name__, SHAPEFILE_PATHS["state"]) + state = gpd.read_file(state_shapefile_path) + for state in _project_and_transform(state, "STATEFP"): + state.plot(color="0.9", ax=ax, edgecolor="0.8", linewidth=0.5) + ax.set_xlim(plt.xlim()) + ax.set_ylim(plt.ylim()) + return fig, ax + + def _project_and_transform(data: gpd.GeoDataFrame, state_col: str = "state_fips") -> Tuple: """Segment and break GeoDF into Contiguous US, Alaska, Puerto Rico, and Hawaii for plotting. @@ -245,6 +337,8 @@ def _join_state_geo_df(data: pd.DataFrame, :param data: DF with state info :param state_col: name of column in `data` containing state info to join on :param geo_info: GeoDF of state shape info read from Census shapefiles + :param join_type: Type of join to do between input data (left side) and geo data (right side). + Must be one of {‘left’, ‘right’, ‘outer’, ‘inner’}. :return: ``data`` with state polygon and state FIPS joined. """ input_cols = list(data.columns) @@ -259,7 +353,8 @@ def _join_state_geo_df(data: pd.DataFrame, def _join_county_geo_df(data: pd.DataFrame, county_col: str, geo_info: gpd.GeoDataFrame, - join_type: str = "right") -> gpd.GeoDataFrame: + join_type: str = "right", + combine_megacounties: bool = False) -> gpd.GeoDataFrame: """Join DF information to polygon information in a GeoDF at the county level. Counties with no direct key in the data DF will have the megacounty value joined. @@ -267,25 +362,81 @@ def _join_county_geo_df(data: pd.DataFrame, :param data: DF with county info. :param county_col: name of column in `data` containing county info to join on. :param geo_info: GeoDF of county shape info read from Census shapefiles. 
+ :param join_type: Type of join to do between input data (left side) and geo data (right side). + Must be one of {‘left’, ‘right’, ‘outer’, ‘inner’}. + :param combine_megacounties: For each state, return all counties without a signal value as a + single polygon with the megacounty value. :return: ``data`` with county polygon and state fips joined. """ input_cols = list(data.columns) # create state FIPS code in copy, otherwise original gets modified data = data.assign(state=[i[:2] for i in data[county_col]]) + if combine_megacounties: + merged = _combine_megacounties(data, county_col, geo_info) + else: + merged = _distribute_megacounties(data, county_col, geo_info, join_type) + merged[county_col] = merged.GEOID.combine_first(merged[county_col]) + merged.rename(columns={"STATEFP": "state_fips"}, inplace=True) + return gpd.GeoDataFrame(merged[input_cols + ["state_fips", "geometry"]]) + + +def _combine_megacounties(data: pd.DataFrame, + county_col: str, + geo_info: gpd.GeoDataFrame) -> gpd.GeoDataFrame: + """Join a DataFrame of county signals with a GeoDataFrame of polygons for plotting. + + Merges a DataFrame of counties and signals with a DataFrame of county polygons. Megacounties, + if present, are assigned a polygon which is the union of all counties in the state with no + signal value. + + :param data: DataFrame of county signals. + :param county_col: Name of column containing county. + :parm geo_info: GeoDataFrame of counties and corresponding polygons. + :return: ``data`` with county polygon and state fips joined. No polgyon information is + provided for counties without a signal value since they are captured by the megacounty + polygon. + """ + merged = data.merge(geo_info, how="left", left_on=county_col, right_on="GEOID", sort=True) + missing = set(geo_info.GEOID) - set(data[county_col]) + for i, row in merged.iterrows(): + if _is_megacounty(row[county_col]): + state = row[county_col][:2] + state_missing = [j for j in missing if j.startswith(state)] + combined_poly = geo_info.loc[geo_info.GEOID.isin(state_missing), "geometry"].unary_union + # pandas has a bug when assigning MultiPolygons, so you need to do this weird workaround + # https://github.com/geopandas/geopandas/issues/992 + merged.loc[[i], "geometry"] = gpd.GeoSeries(combined_poly).values + merged.loc[[i], "STATEFP"] = state + return merged + + +def _distribute_megacounties(data: pd.DataFrame, + county_col: str, + geo_info: gpd.GeoDataFrame, + join_type: str = "right") -> gpd.GeoDataFrame: + """Join a DataFrame of county signals with a GeoDataFrame of polygons for plotting. + + Merges a DataFrame of counties and signals with a DataFrame of county polygons. Counties + without a value but with a corresponding megacounty take on the megacounty value. + + :param data: DataFrame of county signals. + :param county_col: Name of column containing county. + :param geo_info: GeoDataFrame of counties and corresponding polygons. + :param join_type: Type of join to do between input data (left side) and geo data (right side). + Must be one of {‘left’, ‘right’, ‘outer’, ‘inner’}. + :return: ``data`` with county polygon and state fips joined. No polgyon information is + provided for megacounties. 
+ """ # join all counties with valid FIPS - merged = data.merge(geo_info, how=join_type, left_on=county_col, right_on="GEOID", sort=True) - mega_county_df = data.loc[[i.endswith("000") for i in data[county_col]], :] - if not mega_county_df.empty and join_type == "right": + merged = data.merge(geo_info, how=join_type, left_on=county_col, right_on="GEOID", sort=True) + mega_df = data.loc[[_is_megacounty(i) for i in data[county_col]], :] + if not mega_df.empty and join_type == "right": # if mega counties exist, join them on state - merged = merged.merge(mega_county_df, how="left", left_on="STATEFP", - right_on="state", sort=True) + merged = merged.merge(mega_df, how="left", left_on="STATEFP", right_on="state", sort=True) # if no county value present, us the megacounty values - for c in input_cols: + for c in data.columns: merged[c] = merged[f"{c}_x"].combine_first(merged[f"{c}_y"]) - # use the full county FIPS list in the return - merged[county_col] = merged.GEOID.combine_first(merged[county_col]) - merged.rename(columns={"STATEFP": "state_fips"}, inplace=True) - return gpd.GeoDataFrame(merged[input_cols + ["state_fips", "geometry"]]) + return merged def _join_msa_geo_df(data: pd.DataFrame, @@ -299,6 +450,8 @@ def _join_msa_geo_df(data: pd.DataFrame, :param data: DF with state info :param msa_col: cname of column in `data` containing state info to join on :param geo_info: GeoDF of state shape info read from Census shapefiles + :param join_type: Type of join to do between input data (left side) and geo data (right side). + Must be one of {‘left’, ‘right’, ‘outer’, ‘inner’}. :return: ``data`` with cbsa polygon and state fips joined. """ geo_info = geo_info[geo_info.LSAD == "M1"] # only get metro and not micropolitan areas @@ -320,6 +473,8 @@ def _join_hrr_geo_df(data: pd.DataFrame, :param data: DF with state info :param msa_col: cname of column in `data` containing state info to join on :param geo_info: GeoDF of state shape info read from Census shapefiles + :param join_type: Type of join to do between input data (left side) and geo data (right side). + Must be one of {‘left’, ‘right’, ‘outer’, ‘inner’}. :return: ``data`` with HRR polygon and state fips joined. """ geo_info["hrr_num"] = geo_info.hrr_num.astype("int").astype(str) # original col was a float @@ -330,3 +485,13 @@ def _join_hrr_geo_df(data: pd.DataFrame, # get the first state, which will be the first two characters in the HRR name merged["state_fips"] = [STATE_ABBR_TO_FIPS.get(i[:2]) for i in merged.hrr_name] return gpd.GeoDataFrame(merged[input_cols + ["state_fips", "geometry"]]) + + +def _is_megacounty(fips: str) -> bool: + """Determine if a code is a megacounty. + + :param fips: FIPS code to test. + :return: Boolean for if the input code is a megacounty or not. 
+ + """ + return fips.endswith("000") and len(fips) == 5 diff --git a/Python-packages/covidcast-py/docs/changelog.rst b/Python-packages/covidcast-py/docs/changelog.rst index 87cf900c..132b402e 100644 --- a/Python-packages/covidcast-py/docs/changelog.rst +++ b/Python-packages/covidcast-py/docs/changelog.rst @@ -1,6 +1,11 @@ Changelog ========= +v0.1.1, TODO +- ``Direction`` is no longer supported and has been removed from the output + of :py:func:`covidcast.signal` + + v0.1.0, October 1, 2020 ----------------------- diff --git a/Python-packages/covidcast-py/docs/getting_started.rst b/Python-packages/covidcast-py/docs/getting_started.rst index 2ffedbb2..9862ff59 100644 --- a/Python-packages/covidcast-py/docs/getting_started.rst +++ b/Python-packages/covidcast-py/docs/getting_started.rst @@ -3,19 +3,73 @@ Getting Started =============== + +Overview +------------ + This package provides access to data from the `COVIDcast API `_, which -provides numerous COVID-related data streams, updated daily. To begin, you'll -want to browse the available data streams and determine which signals are useful -to you. The `COVIDcast interactive map `_ displays a -selection of the signals, and by selecting the Export Data feature, you can get -example Python code to access each of the signals. +provides numerous COVID-related data streams, updated daily. The data is retrieved +live from the server when you make a request, and is not stored within the package +itself. This means that each time you make a request, you will be receiving the latest +data available. If you are conducting an analysis or powering a service which will require +repeated access to the same signal, please download the data rather than making repeated +requests. + + +Installation +------------ + +This package is available on PyPI as `covidcast +`_, and can be installed using ``pip`` or +your favorite Python package manager: -To browse in more detail, the `data sources and signal documentation +.. code-block:: sh + + pip install covidcast + +This will install the package as well as all required dependencies. + +Signal Overview +--------------- +The `API documentation `_ lists -the available signals, including many not shown on the interactive map. Simply -obtain the data source and signal names from the list. Below we will demonstrate -how to use these names to access each data stream. +the available signals, including many not shown on the +`COVIDcast interactive map +`_. + +The data come from a variety of sources and cover information including official case counts, +internet search trends, hospital encounters, survey responses, and more. Below is +a brief overview of each source, with links to their full descriptions. + +- `Doctor Visits `_ + - Outpatient visits with COVID-related symptoms. +- `Google Health Trends `_ + - COVID-related Google search volume. +- `Hospital Admissions `_ + - Hospital admissions with COVID-associated diagnoses. +- `Indicator Combination `_ + - Aggregated signal of other sources to provide a single COVID activity indicator. +- `JHU Cases and Deaths `_ + - Confirmed COVID cases and deaths based on reports made available by Johns Hopkins University. +- `Quidel `_ + - Positive COVID antigen tests. +- `SafeGraph Mobility `_ + - Mobility (movement) data based on phone location data +- `Symptom Surveys `_ + - Various responses to the CMU symptom survey. +- `USAFacts Cases and Deaths `_ + - Confirmed COVID cases and deaths based on reports made available by USAFacts. 
+ +To specify a signal, you will need the "Source Name" and "Signal" value, which are +listed for each source/signal combination on their respective page. +For example, to obtain the raw Google search volume for COVID-related topics, +the source would be ``ght`` and the signal would be ``raw_search``, as shown on the +`Google Health Trends page +`_. +These values will be provided as the arguments for the :py:func:`covidcast.signal` function to +retrieve the desired data. + Basic examples -------------- @@ -30,12 +84,12 @@ distributed through Facebook, for every county in the United States between ... date(2020, 5, 1), date(2020, 5, 7), ... "county") >>> data.head() - direction geo_value issue lag sample_size stderr time_value value -0 0.0 01000 2020-05-23 22 1722.4551 0.125573 2020-05-01 0.823080 -1 1.0 01001 2020-05-23 22 115.8025 0.800444 2020-05-01 1.261261 -2 0.0 01003 2020-05-23 22 584.3194 0.308680 2020-05-01 0.665129 -3 0.0 01015 2020-05-23 22 122.5577 0.526590 2020-05-01 0.574713 -4 NaN 01031 2020-05-23 22 114.8318 0.347450 2020-05-01 0.408163 + geo_value issue lag sample_size stderr time_value value +0 01000 2020-05-23 22 1722.4551 0.125573 2020-05-01 0.823080 +1 01001 2020-05-23 22 115.8025 0.800444 2020-05-01 1.261261 +2 01003 2020-05-23 22 584.3194 0.308680 2020-05-01 0.665129 +3 01015 2020-05-23 22 122.5577 0.526590 2020-05-01 0.574713 +4 01031 2020-05-23 22 114.8318 0.347450 2020-05-01 0.408163 Each row represents one observation in one county on one day. The county FIPS code is given in the ``geo_value`` column, the date in the ``time_value`` @@ -61,12 +115,12 @@ example, we obtain ``smoothed_cli`` in each state for every day since >>> data = covidcast.signal("fb-survey", "smoothed_cli", ... date(2020, 5, 1), geo_type="state") >>> data.head() - direction geo_value issue lag sample_size stderr time_value value -0 -1.0 ak 2020-05-23 22 1606.0000 0.158880 2020-05-01 0.460772 -1 0.0 al 2020-05-23 22 7540.2437 0.082553 2020-05-01 0.699511 -2 -1.0 ar 2020-05-23 22 4921.4827 0.103651 2020-05-01 0.759798 -3 0.0 az 2020-05-23 22 11220.9587 0.061794 2020-05-01 0.566937 -4 0.0 ca 2020-05-23 22 51870.1382 0.022803 2020-05-01 0.364908 + geo_value issue lag sample_size stderr time_value value +0 ak 2020-05-23 22 1606.0000 0.158880 2020-05-01 0.460772 +1 al 2020-05-23 22 7540.2437 0.082553 2020-05-01 0.699511 +2 ar 2020-05-23 22 4921.4827 0.103651 2020-05-01 0.759798 +3 az 2020-05-23 22 11220.9587 0.061794 2020-05-01 0.566937 +4 ca 2020-05-23 22 51870.1382 0.022803 2020-05-01 0.364908 Using the ``geo_values`` argument, we can request data for a specific geography, such as the state of Pennsylvania for the month of May 2020: @@ -75,12 +129,12 @@ such as the state of Pennsylvania for the month of May 2020: ... date(2020, 5, 1), date(2020, 5, 31), ... 
geo_type="state", geo_values="pa") >>> pa_data.head() - direction geo_value issue lag sample_size stderr time_value value -0 -1 pa 2020-05-23 22 31576.0165 0.030764 2020-05-01 0.400011 -0 -1 pa 2020-05-23 21 31344.0168 0.030708 2020-05-02 0.394774 -0 0 pa 2020-05-23 20 30620.0162 0.031173 2020-05-03 0.396340 -0 -1 pa 2020-05-23 19 30419.0163 0.029836 2020-05-04 0.357501 -0 0 pa 2020-05-23 18 29245.0172 0.030176 2020-05-05 0.354521 + geo_value issue lag sample_size stderr time_value value +0 pa 2020-05-23 22 31576.0165 0.030764 2020-05-01 0.400011 +0 pa 2020-05-23 21 31344.0168 0.030708 2020-05-02 0.394774 +0 pa 2020-05-23 20 30620.0162 0.031173 2020-05-03 0.396340 +0 pa 2020-05-23 19 30419.0163 0.029836 2020-05-04 0.357501 +0 pa 2020-05-23 18 29245.0172 0.030176 2020-05-05 0.354521 We can request multiple states by providing a list, such as ``["pa", "ny", "mo"]``. @@ -149,8 +203,8 @@ the ``as_of`` argument: ... start_day=date(2020, 5, 1), end_day=date(2020, 5, 1), ... geo_type="state", geo_values="pa", ... as_of=date(2020, 5, 7)) - direction geo_value issue lag sample_size stderr time_value value -0 -1 pa 2020-05-07 6 None None 2020-05-01 2.32192 + geo_value issue lag sample_size stderr time_value value +0 pa 2020-05-07 6 None None 2020-05-01 2.32192 This shows that an estimate of about 2.3% was issued on May 7. If we don't specify ``as_of``, we get the most recent estimate available: @@ -158,8 +212,8 @@ specify ``as_of``, we get the most recent estimate available: >>> covidcast.signal("doctor-visits", "smoothed_cli", ... start_day=date(2020, 5, 1), end_day=date(2020, 5, 1), ... geo_type="state", geo_values="pa") - direction geo_value issue lag sample_size stderr time_value value -0 -1 pa 2020-07-04 64 None None 2020-05-01 5.075015 + geo_value issue lag sample_size stderr time_value value +0 pa 2020-07-04 64 None None 2020-05-01 5.075015 Note the substantial change in the estimate, to over 5%, reflecting new data that became available *after* May 7 about visits occurring on May 1. This @@ -175,16 +229,16 @@ period: ... start_day=date(2020, 5, 1), end_day=date(2020, 5, 1), ... geo_type="state", geo_values="pa", ... issues=(date(2020, 5, 1), date(2020, 5, 15))) - direction geo_value issue lag sample_size stderr time_value value -0 -1 pa 2020-05-05 4 None None 2020-05-01 1.693061 -1 -1 pa 2020-05-06 5 None None 2020-05-01 2.524167 -2 -1 pa 2020-05-07 6 None None 2020-05-01 2.321920 -3 0 pa 2020-05-08 7 None None 2020-05-01 2.897032 -4 0 pa 2020-05-09 8 None None 2020-05-01 2.956456 -5 0 pa 2020-05-12 11 None None 2020-05-01 3.190634 -6 0 pa 2020-05-13 12 None None 2020-05-01 3.220023 -7 0 pa 2020-05-14 13 None None 2020-05-01 3.231314 -8 0 pa 2020-05-15 14 None None 2020-05-01 3.239970 + geo_value issue lag sample_size stderr time_value value +0 pa 2020-05-05 4 None None 2020-05-01 1.693061 +1 pa 2020-05-06 5 None None 2020-05-01 2.524167 +2 pa 2020-05-07 6 None None 2020-05-01 2.321920 +3 pa 2020-05-08 7 None None 2020-05-01 2.897032 +4 pa 2020-05-09 8 None None 2020-05-01 2.956456 +5 pa 2020-05-12 11 None None 2020-05-01 3.190634 +6 pa 2020-05-13 12 None None 2020-05-01 3.220023 +7 pa 2020-05-14 13 None None 2020-05-01 3.231314 +8 pa 2020-05-15 14 None None 2020-05-01 3.239970 This estimate was clearly updated many times as new data for May 1st arrived. Note that these results include only data issued or updated between 2020-05-01 @@ -199,12 +253,12 @@ issues 7 days after the corresponding ``time_value``: >>> covidcast.signal("doctor-visits", "smoothed_cli", ... 
start_day=date(2020, 5, 1), end_day=date(2020, 5, 7), ... geo_type="state", geo_values="pa", lag=7) - direction geo_value issue lag sample_size stderr time_value value -0 0 pa 2020-05-08 7 None None 2020-05-01 2.897032 -0 -1 pa 2020-05-09 7 None None 2020-05-02 2.802238 -0 0 pa 2020-05-12 7 None None 2020-05-05 3.483125 -0 0 pa 2020-05-13 7 None None 2020-05-06 2.968670 -0 0 pa 2020-05-14 7 None None 2020-05-07 2.400255 + geo_value issue lag sample_size stderr time_value value +0 pa 2020-05-08 7 None None 2020-05-01 2.897032 +0 pa 2020-05-09 7 None None 2020-05-02 2.802238 +0 pa 2020-05-12 7 None None 2020-05-05 3.483125 +0 pa 2020-05-13 7 None None 2020-05-06 2.968670 +0 pa 2020-05-14 7 None None 2020-05-07 2.400255 Note that though this query requested all values between 2020-05-01 and 2020-05-07, May 3rd and May 4th were *not* included in the results set. This is @@ -215,12 +269,12 @@ on May 10th (a 7-day lag), but in fact the value was not updated on that day: ... start_day=date(2020, 5, 3), end_day=date(2020, 5, 3), ... geo_type="state", geo_values="pa", ... issues=(date(2020, 5, 9), date(2020, 5, 15))) - direction geo_value issue lag sample_size stderr time_value value -0 -1 pa 2020-05-09 6 None None 2020-05-03 2.749537 -1 -1 pa 2020-05-12 9 None None 2020-05-03 2.989626 -2 -1 pa 2020-05-13 10 None None 2020-05-03 3.006860 -3 -1 pa 2020-05-14 11 None None 2020-05-03 2.970561 -4 -1 pa 2020-05-15 12 None None 2020-05-03 3.038054 + geo_value issue lag sample_size stderr time_value value +0 pa 2020-05-09 6 None None 2020-05-03 2.749537 +1 pa 2020-05-12 9 None None 2020-05-03 2.989626 +2 pa 2020-05-13 10 None None 2020-05-03 3.006860 +3 pa 2020-05-14 11 None None 2020-05-03 2.970561 +4 pa 2020-05-15 12 None None 2020-05-03 3.038054 Dealing with geographies ------------------------ @@ -243,10 +297,10 @@ We can use these functions to quickly query data for specific regions: ... start_day=date(2020, 5, 1), end_day=date(2020, 5, 1), ... geo_values=counties) >>> df - geo_value signal time_value direction issue lag value stderr sample_size geo_type data_source -0 42003 smoothed_cli 2020-05-01 -1 2020-07-04 64 1.336086 None None county doctor-visits -0 06037 smoothed_cli 2020-05-01 0 2020-07-04 64 5.787655 None None county doctor-visits -0 12086 smoothed_cli 2020-05-01 -1 2020-07-04 64 6.405477 None None county doctor-visits + geo_value signal time_value issue lag value stderr sample_size geo_type data_source +0 42003 smoothed_cli 2020-05-01 2020-07-04 64 1.336086 None None county doctor-visits +0 06037 smoothed_cli 2020-05-01 2020-07-04 64 5.787655 None None county doctor-visits +0 12086 smoothed_cli 2020-05-01 2020-07-04 64 6.405477 None None county doctor-visits We can also quickly convert back from the IDs returned by the API to diff --git a/Python-packages/covidcast-py/docs/index.rst b/Python-packages/covidcast-py/docs/index.rst index 39d6c52b..d1c33855 100644 --- a/Python-packages/covidcast-py/docs/index.rst +++ b/Python-packages/covidcast-py/docs/index.rst @@ -23,6 +23,8 @@ all the data sources and signals available through this API. The package source code and bug tracker can be found `on GitHub `_. +To get started, check out :ref:``. + .. note :: **You should consider subscribing** to the `API mailing list `_ to be @@ -39,23 +41,6 @@ The package source code and bug tracker can be found `on GitHub research product and not warranted for a particular purpose. 
-Installation ------------- - -This package is available on PyPI as `covidcast -`_, and can be installed using ``pip`` or -your favorite Python package manager: - -.. code-block:: sh - - pip install covidcast - -The package requires `pandas `_, `requests -`_, and several other packages; -these should be installed automatically. It also uses the `delphi-epidata -`_ package to access Delphi's Epidata -API. - Contents -------- diff --git a/Python-packages/covidcast-py/docs/plot_examples.rst b/Python-packages/covidcast-py/docs/plot_examples.rst index 9c9eab42..85d10026 100644 --- a/Python-packages/covidcast-py/docs/plot_examples.rst +++ b/Python-packages/covidcast-py/docs/plot_examples.rst @@ -6,11 +6,11 @@ Plotting Examples Built-in functionality ---------------------- The returned DataFrame from :py:func:`covidcast.signal` can be plotted using the built-in -:py:func:`covidcast.plot_choropleth`. Currently, state, county, hospital referral regions +:py:func:`covidcast.plot`. Currently, state, county, hospital referral regions (HRR), and metropolitan statistical area (MSA) geography types are supported. -County-level maps show estimates for each county, and color each state by the -megacounty estimates, if available. (Megacounties represent all counties with +County-level maps show estimates for each county, and color each state as a single +polygon with the megacounty estimates, if available. (Megacounties represent all counties with insufficient sample size to report in that state; see the `geographic coding documentation `_ for @@ -24,7 +24,7 @@ details.) ... date(2020, 8, 3), ... date(2020, 8, 4), ... geo_type="county") ->>> covidcast.plot_choropleth(data) +>>> covidcast.plot(data) >>> plt.show() .. plot:: @@ -33,7 +33,7 @@ details.) from datetime import date from matplotlib import pyplot as plt data = covidcast.signal("fb-survey", "smoothed_cli", start_day = date(2020,8,4), end_day = date(2020,8,4), geo_type = "county") - covidcast.plot_choropleth(data) + covidcast.plot(data) plt.show() State-level data can also be mapped: @@ -43,7 +43,7 @@ State-level data can also be mapped: ... date(2020, 8, 3), ... date(2020, 8, 4), ... geo_type="state") ->>> covidcast.plot_choropleth(data) +>>> covidcast.plot(data) >>> plt.show() .. plot:: @@ -52,7 +52,7 @@ State-level data can also be mapped: from datetime import date from matplotlib import pyplot as plt data = covidcast.signal("fb-survey", "smoothed_cli", start_day = date(2020,8,4), end_day = date(2020,8,4), geo_type = "state") - covidcast.plot_choropleth(data) + covidcast.plot(data) plt.show() Regions where no information is present are presented in light grey, as demonstrated by these MSA @@ -63,7 +63,7 @@ and HRR plots. ... date(2020, 8, 3), ... date(2020, 8, 4), ... geo_type="msa") ->>> covidcast.plot_choropleth(data) +>>> covidcast.plot(data) >>> plt.show() .. plot:: @@ -72,13 +72,13 @@ and HRR plots. from datetime import date from matplotlib import pyplot as plt data = covidcast.signal("fb-survey", "smoothed_cli", start_day = date(2020,8,4), end_day = date(2020,8,4), geo_type = "msa") - covidcast.plot_choropleth(data) + covidcast.plot(data) plt.show() >>> data = covidcast.signal("fb-survey", "smoothed_cli", ... date(2020, 8, 3), date(2020, 8, 4), ... geo_type="hrr") ->>> covidcast.plot_choropleth(data) +>>> covidcast.plot(data) >>> plt.show() .. plot:: @@ -87,18 +87,32 @@ and HRR plots. 
from datetime import date from matplotlib import pyplot as plt data = covidcast.signal("fb-survey", "smoothed_cli", start_day = date(2020,8,4), end_day = date(2020,8,4), geo_type = "hrr") - covidcast.plot_choropleth(data) + covidcast.plot(data) plt.show() +As an alternative to choropleths, bubble plots can be created with the ``plot_type="bubble"`` +argument. + +>>> covidcast.plot(data, plot_type="bubble") +>>> plt.show() + +.. plot:: + + import covidcast + from datetime import date + from matplotlib import pyplot as plt + data = covidcast.signal("fb-survey", "smoothed_cli", start_day = date(2020,8,4), end_day = date(2020,8,4), geo_type = "msa") + covidcast.plot(data, plot_type="bubble") + plt.show() Additional keyword arguments can also be provided. These correspond to most of the arguments available for the `GeoPandas plot() function `_. ->>> covidcast.plot_choropleth(data, -... cmap="viridis", -... edgecolor="0.8") +>>> covidcast.plot(data, +... cmap="viridis", +... edgecolor="0.8") >>> plt.show() .. plot:: @@ -107,14 +121,14 @@ available for the from datetime import date from matplotlib import pyplot as plt data = covidcast.signal("fb-survey", "smoothed_cli", start_day=date(2020,8,3), end_day=date(2020,8,4), geo_type="county") - covidcast.plot_choropleth(data, cmap="viridis", edgecolor="0.8") + covidcast.plot(data, cmap="viridis", edgecolor="0.8") plt.show() The function returns a `Matplotlib Figure object `_ which can be stored and altered further. ->>> fig = plotting.plot_choropleth(data) +>>> fig = plotting.plot(data) >>> fig.set_dpi(100) Animations @@ -138,7 +152,7 @@ the month of August. Video format, frame rate, and resolution are adjustable. Like the static maps, additional plotting -keyword arguments can be provided and are passed to :py:func:`covidcast.plot_choropleth`. +keyword arguments can be provided and are passed to :py:func:`covidcast.plot`. >>> covidcast.animate(df, ... "test_plot2.mp4", @@ -165,7 +179,8 @@ The :py:func:`covidcast.get_geo_df` method can return different joins depending default, it will try to compute the right join between the input data (left side of join) to the geometry data (right side of join), so that the returned GeoDataFrame will contain all the possible geometries with the signal values filled if present. When mapping counties, those that do not have values but have -a corresponding megacounty will inherit the megacounty values. +a corresponding megacounty will inherit the megacounty values. To have a singe polygon returned for each +megacounty, use the ``combine_megacounties=True`` argument. This operation depends on having only one row of signal information per geographic region. If this is not the the case, you must specify another join @@ -177,35 +192,35 @@ with the ``join_type`` argument. ... date(2020, 8, 4), ... geo_type = "county") >>> covidcast.get_geo_df(data) - geo_value time_value direction issue lag value stderr sample_size geo_type data_source signal geometry state_fips -0 24510 2020-08-04 NaN 2020-08-06 2.0 0.375601 0.193356 587.6289 county fb-survey smoothed_cli POLYGON ((-76.71131 39.37193, -76.62619 39.372... 24 -1 31169 2020-08-04 NaN 2020-08-06 2.0 0.928208 0.168783 1059.8130 county fb-survey smoothed_cli POLYGON ((-97.82082 40.35054, -97.36869 40.350... 31 -2 37077 2020-08-04 NaN 2020-08-06 2.0 0.627742 0.081884 3146.0176 county fb-survey smoothed_cli POLYGON ((-78.80252 36.21349, -78.80235 36.220... 
37 -3 46091 2020-08-04 NaN 2020-08-06 2.0 0.589745 0.161989 778.7429 county fb-survey smoothed_cli POLYGON ((-97.97924 45.76257, -97.97878 45.935... 46 -4 39075 2020-08-04 NaN 2020-08-06 2.0 0.785641 0.099959 2767.5054 county fb-survey smoothed_cli POLYGON ((-82.22066 40.66758, -82.12620 40.668... 39 -... ... ... ... ... ... ... ... ... ... ... ... ... ... -3228 53055 2020-08-04 NaN 2020-08-06 2.0 0.440817 0.143404 944.1731 county fb-survey smoothed_cli MULTIPOLYGON (((-122.97714 48.79345, -122.9379... 53 -3229 39133 2020-08-04 NaN 2020-08-06 2.0 0.040082 0.089324 310.8495 county fb-survey smoothed_cli POLYGON ((-81.39328 41.02544, -81.39322 41.040... 39 -3230 08025 2020-08-04 NaN 2020-08-06 2.0 0.440306 0.123763 1171.5823 county fb-survey smoothed_cli POLYGON ((-104.05840 38.26084, -104.05392 38.5... 08 -3231 13227 2020-08-04 NaN 2020-08-06 2.0 1.009511 0.092993 3605.8731 county fb-survey smoothed_cli POLYGON ((-84.65437 34.54895, -84.52139 34.550... 13 -3232 21145 2020-08-04 NaN 2020-08-06 2.0 1.257862 0.915558 150.4266 county fb-survey smoothed_cli POLYGON ((-88.93308 37.22775, -88.93174 37.227... 21 + geo_value time_value issue lag value stderr sample_size geo_type data_source signal geometry state_fips +0 24510 2020-08-04 2020-08-06 2.0 0.375601 0.193356 587.6289 county fb-survey smoothed_cli POLYGON ((-76.71131 39.37193, -76.62619 39.372... 24 +1 31169 2020-08-04 2020-08-06 2.0 0.928208 0.168783 1059.8130 county fb-survey smoothed_cli POLYGON ((-97.82082 40.35054, -97.36869 40.350... 31 +2 37077 2020-08-04 2020-08-06 2.0 0.627742 0.081884 3146.0176 county fb-survey smoothed_cli POLYGON ((-78.80252 36.21349, -78.80235 36.220... 37 +3 46091 2020-08-04 2020-08-06 2.0 0.589745 0.161989 778.7429 county fb-survey smoothed_cli POLYGON ((-97.97924 45.76257, -97.97878 45.935... 46 +4 39075 2020-08-04 2020-08-06 2.0 0.785641 0.099959 2767.5054 county fb-survey smoothed_cli POLYGON ((-82.22066 40.66758, -82.12620 40.668... 39 +... ... ... ... ... ... ... ... ... ... ... ... ... ... +3228 53055 2020-08-04 2020-08-06 2.0 0.440817 0.143404 944.1731 county fb-survey smoothed_cli MULTIPOLYGON (((-122.97714 48.79345, -122.9379... 53 +3229 39133 2020-08-04 2020-08-06 2.0 0.040082 0.089324 310.8495 county fb-survey smoothed_cli POLYGON ((-81.39328 41.02544, -81.39322 41.040... 39 +3230 08025 2020-08-04 2020-08-06 2.0 0.440306 0.123763 1171.5823 county fb-survey smoothed_cli POLYGON ((-104.05840 38.26084, -104.05392 38.5... 08 +3231 13227 2020-08-04 2020-08-06 2.0 1.009511 0.092993 3605.8731 county fb-survey smoothed_cli POLYGON ((-84.65437 34.54895, -84.52139 34.550... 13 +3232 21145 2020-08-04 2020-08-06 2.0 1.257862 0.915558 150.4266 county fb-survey smoothed_cli POLYGON ((-88.93308 37.22775, -88.93174 37.227... 21 [3233 rows x 13 columns] Note that there are 3233 output rows for the 3233 counties present in the Census shapefiles. >>> covidcast.get_geo_df(covid, join_type="left") - geo_value time_value direction issue lag value stderr sample_size geo_type data_source signal geometry state_fips -0 01000 2020-08-04 None 2020-08-06 2 1.153447 0.136070 1759.8539 county fb-survey smoothed_cli None NaN -1 01001 2020-08-04 None 2020-08-06 2 0.539568 0.450588 107.9345 county fb-survey smoothed_cli POLYGON ((-86.91759 32.66417, -86.81657 32.660... 01 -2 01003 2020-08-04 None 2020-08-06 2 1.625496 0.522036 455.2964 county fb-survey smoothed_cli POLYGON ((-88.02927 30.22271, -88.02399 30.230... 
01 -3 01015 2020-08-04 None 2020-08-06 2 0.000000 0.378788 115.2302 county fb-survey smoothed_cli POLYGON ((-86.14371 33.70913, -86.12388 33.710... 01 -4 01051 2020-08-04 None 2020-08-06 2 0.786565 0.435877 112.5569 county fb-survey smoothed_cli POLYGON ((-86.41333 32.75059, -86.37497 32.753... 01 -.. ... ... ... ... ... ... ... ... ... ... ... ... ... -840 55141 2020-08-04 None 2020-08-06 2 1.190476 0.867751 144.3682 county fb-survey smoothed_cli POLYGON ((-90.31605 44.42450, -90.31596 44.424... 55 -841 56000 2020-08-04 None 2020-08-06 2 0.822092 0.254670 628.9937 county fb-survey smoothed_cli None NaN -842 56021 2020-08-04 None 2020-08-06 2 0.269360 0.315094 197.9646 county fb-survey smoothed_cli POLYGON ((-105.28064 41.33100, -105.27824 41.6... 56 -843 56025 2020-08-04 None 2020-08-06 2 0.170940 0.304654 192.0237 county fb-survey smoothed_cli POLYGON ((-107.54353 42.78156, -107.50142 42.7... 56 -844 72000 2020-08-04 None 2020-08-06 2 0.000000 0.228310 100.9990 county fb-survey smoothed_cli None NaN + geo_value time_value issue lag value stderr sample_size geo_type data_source signal geometry state_fips +0 01000 2020-08-04 2020-08-06 2 1.153447 0.136070 1759.8539 county fb-survey smoothed_cli None NaN +1 01001 2020-08-04 2020-08-06 2 0.539568 0.450588 107.9345 county fb-survey smoothed_cli POLYGON ((-86.91759 32.66417, -86.81657 32.660... 01 +2 01003 2020-08-04 2020-08-06 2 1.625496 0.522036 455.2964 county fb-survey smoothed_cli POLYGON ((-88.02927 30.22271, -88.02399 30.230... 01 +3 01015 2020-08-04 2020-08-06 2 0.000000 0.378788 115.2302 county fb-survey smoothed_cli POLYGON ((-86.14371 33.70913, -86.12388 33.710... 01 +4 01051 2020-08-04 2020-08-06 2 0.786565 0.435877 112.5569 county fb-survey smoothed_cli POLYGON ((-86.41333 32.75059, -86.37497 32.753... 01 +.. ... ... ... ... ... ... ... ... ... ... ... ... +840 55141 2020-08-04 2020-08-06 2 1.190476 0.867751 144.3682 county fb-survey smoothed_cli POLYGON ((-90.31605 44.42450, -90.31596 44.424... 55 +841 56000 2020-08-04 2020-08-06 2 0.822092 0.254670 628.9937 county fb-survey smoothed_cli None NaN +842 56021 2020-08-04 2020-08-06 2 0.269360 0.315094 197.9646 county fb-survey smoothed_cli POLYGON ((-105.28064 41.33100, -105.27824 41.6... 56 +843 56025 2020-08-04 2020-08-06 2 0.170940 0.304654 192.0237 county fb-survey smoothed_cli POLYGON ((-107.54353 42.78156, -107.50142 42.7... 56 +844 72000 2020-08-04 2020-08-06 2 0.000000 0.228310 100.9990 county fb-survey smoothed_cli None NaN [845 rows x 13 columns] With the left join, there are 845 rows since the signal returned information for 845 counties and diff --git a/Python-packages/covidcast-py/docs/plotting.rst b/Python-packages/covidcast-py/docs/plotting.rst index 9a22ae49..772e31c8 100644 --- a/Python-packages/covidcast-py/docs/plotting.rst +++ b/Python-packages/covidcast-py/docs/plotting.rst @@ -11,7 +11,7 @@ signal and generates a choropleth map, using `matplotlib `_ underneath. Detailed examples are provided in the :ref:`usage examples `. -.. autofunction:: covidcast.plot_choropleth +.. 
autofunction:: covidcast.plot Animate a signal over time -------------------------- diff --git a/Python-packages/covidcast-py/requirements.txt b/Python-packages/covidcast-py/requirements_ci.txt similarity index 90% rename from Python-packages/covidcast-py/requirements.txt rename to Python-packages/covidcast-py/requirements_ci.txt index 8660836d..7147792b 100644 --- a/Python-packages/covidcast-py/requirements.txt +++ b/Python-packages/covidcast-py/requirements_ci.txt @@ -1,5 +1,6 @@ pylint pytest +pydocstyle delphi_epidata pandas geopandas diff --git a/Python-packages/covidcast-py/requirements_dev.txt b/Python-packages/covidcast-py/requirements_dev.txt new file mode 100644 index 00000000..b81d6f03 --- /dev/null +++ b/Python-packages/covidcast-py/requirements_dev.txt @@ -0,0 +1,3 @@ +wheel +sphinx +sphinx-autodoc-typehints \ No newline at end of file diff --git a/Python-packages/covidcast-py/tests/covidcast/test_covidcast.py b/Python-packages/covidcast-py/tests/covidcast/test_covidcast.py index ad8a4f2c..dc536bf7 100644 --- a/Python-packages/covidcast-py/tests/covidcast/test_covidcast.py +++ b/Python-packages/covidcast-py/tests/covidcast/test_covidcast.py @@ -1,14 +1,17 @@ +import warnings from datetime import date, datetime from unittest.mock import patch # Force tests to use a specific backend, so they reproduce across platforms import matplotlib + matplotlib.use("AGG") import pandas as pd import numpy as np import pytest from covidcast import covidcast +from covidcast.errors import NoDataWarning def sort_df(df): @@ -22,7 +25,9 @@ def sort_df(df): @patch("delphi_epidata.Epidata.covidcast") def test_signal(mock_covidcast, mock_metadata): mock_covidcast.return_value = {"result": 1, # successful API response - "epidata": [{"time_value": 20200622, "issue": 20200724}], + "epidata": [{"time_value": 20200622, + "issue": 20200724, + "direction": None}], "message": "success"} mock_metadata.return_value = {"max_time": pd.Timestamp("2020-08-04 00:00:00"), "min_time": pd.Timestamp("2020-08-03 00:00:00")} @@ -196,13 +201,18 @@ def test__detect_metadata(): def test__fetch_single_geo(mock_covidcast): # not generating full DF since most attributes used mock_covidcast.side_effect = [{"result": 1, # successful API response - "epidata": [{"time_value": 20200622, "issue": 20200724}], + "epidata": [{"time_value": 20200622, + "issue": 20200724, + "direction": None}], "message": "success"}, {"result": 1, # second successful API - "epidata": [{"time_value": 20200821, "issue": 20200925}], + "epidata": [{"time_value": 20200821, + "issue": 20200925}], "message": "success"}, - {"message": "error: failed"}, # unsuccessful API response - {"message": "success"}] # no epidata + {"message": "failed"}, # unknown failed API response + {"message": "no results"}, # no data API response + {"message": "success"}, # no epidata + {"message": "success"}] # test happy path with 2 day range response = covidcast._fetch_single_geo( @@ -216,20 +226,33 @@ def test__fetch_single_geo(mock_covidcast): index=[0, 0]) assert sort_df(response).equals(sort_df(expected)) - # test warning is raised if unsuccessful API response - with pytest.warns(UserWarning): - covidcast._fetch_single_geo(None, None, date(2020, 4, 2), date(2020, 4, 2), - None, None, None, None, None) + # test warning when an unknown bad response is received + with warnings.catch_warnings(record=True) as w: + covidcast._fetch_single_geo("source", "signal", date(2020, 4, 2), date(2020, 4, 2), + "*", None, None, None, None) + assert len(w) == 1 + assert str(w[0].message) == \ + 
"Problem obtaining source signal data on 20200402 for geography '*': failed" + assert w[0].category is RuntimeWarning + + # test warning when a no data response is received + with warnings.catch_warnings(record=True) as w: + covidcast._fetch_single_geo("source", "signal", date(2020, 4, 2), date(2020, 4, 2), + "county", None, None, None, None) + assert len(w) == 1 + assert str(w[0].message) == "No source signal data found on 20200402 for geography 'county'" + assert w[0].category is NoDataWarning # test no epidata yields nothing - assert not covidcast._fetch_single_geo(None, None, date(2020, 4, 2), date(2020, 4, 1), + assert not covidcast._fetch_single_geo(None, None, date(2020, 4, 1), date(2020, 4, 1), None, None, None, None, None) # test end_day < start_day yields nothing - assert not covidcast._fetch_single_geo(None, None, date(2020, 4, 2), date(2020, 4, 1), + assert not covidcast._fetch_single_geo(None, None, date(2020, 4, 1), date(2020, 4, 1), None, None, None, None, None) + @patch("covidcast.covidcast.metadata") def test__signal_metadata(mock_metadata): mock_metadata.return_value = pd.DataFrame({"data_source": ["usa-facts", "doctor-visits"], diff --git a/Python-packages/covidcast-py/tests/covidcast/test_plotting.py b/Python-packages/covidcast-py/tests/covidcast/test_plotting.py index 34027205..aa2dbd2d 100644 --- a/Python-packages/covidcast-py/tests/covidcast/test_plotting.py +++ b/Python-packages/covidcast-py/tests/covidcast/test_plotting.py @@ -22,14 +22,21 @@ "sample_size", "geo_type", "data_source", "signal", "state_fips"] +def _convert_to_array(fig: matplotlib.figure.Figure) -> np.array: + """Covert Matplotlib Figure into an numpy array for comparison.""" + return np.frombuffer(fig.canvas.tostring_rgb(), dtype=np.uint8) # get np array representation + + @pytest.mark.skipif(platform.system() != "Linux", reason="Linux specific plot rendering expected.") @patch("covidcast.plotting._signal_metadata") -def test_plot_choropleth(mock_metadata): +def test_plot(mock_metadata): mock_metadata.side_effect = [ + {"mean_value": 0.5330011, "stdev_value": 0.4683431}, {"mean_value": 0.5330011, "stdev_value": 0.4683431}, {"mean_value": 0.5330011, "stdev_value": 0.4683431}, {"mean_value": 0.5304083, "stdev_value": 0.235302}, {"mean_value": 0.5705364, "stdev_value": 0.4348706}, + {"mean_value": 0.5705364, "stdev_value": 0.4348706}, ] matplotlib.use("agg") # load expected choropleth as an array @@ -41,32 +48,45 @@ def test_plot_choropleth(mock_metadata): test_county["time_value"] = test_county.time_value.astype("datetime64[D]") test_county["value"] = test_county.value.astype("float") - fig1 = plotting.plot_choropleth(test_county, time_value=date(2020, 8, 4)) - data1 = np.frombuffer(fig1.canvas.tostring_rgb(), dtype=np.uint8) # get np array representation + # w/o megacounties + no_mega_fig1 = plotting.plot(test_county, + time_value=date(2020, 8, 4), + combine_megacounties=False) # give margin of +-2 for floating point errors and weird variations (1 isn't consistent) - assert np.allclose(data1, expected["expected_1"], atol=2, rtol=0) + assert np.allclose(_convert_to_array(no_mega_fig1), expected["no_mega_1"], atol=2, rtol=0) - fig2 = plotting.plot_choropleth(test_county, cmap="viridis", figsize=(5, 5), edgecolor="0.8") - data2 = np.frombuffer(fig2.canvas.tostring_rgb(), dtype=np.uint8) - assert np.allclose(data2, expected["expected_2"], atol=2, rtol=0) + no_mega_fig2 = plotting.plot_choropleth(test_county, + cmap="viridis", + figsize=(5, 5), + edgecolor="0.8", + combine_megacounties=False) + assert 
np.allclose(_convert_to_array(no_mega_fig2), expected["no_mega_2"], atol=2, rtol=0) + + # w/ megacounties + mega_fig = plotting.plot_choropleth(test_county, time_value=date(2020, 8, 4)) + # give margin of +-2 for floating point errors and weird variations (1 isn't consistent) + assert np.allclose(_convert_to_array(mega_fig), expected["mega"], atol=2, rtol=0) # test state test_state = pd.read_csv( os.path.join(CURRENT_PATH, "../reference_data/test_input_state_signal.csv"), dtype=str) test_state["time_value"] = test_state.time_value.astype("datetime64[D]") test_state["value"] = test_state.value.astype("float") - fig3 = plotting.plot_choropleth(test_state) - data3 = np.frombuffer(fig3.canvas.tostring_rgb(), dtype=np.uint8) - assert np.allclose(data3, expected["expected_3"], atol=2, rtol=0) + state_fig = plotting.plot(test_state) + assert np.allclose(_convert_to_array(state_fig), expected["state"], atol=2, rtol=0) # test MSA test_msa = pd.read_csv( os.path.join(CURRENT_PATH, "../reference_data/test_input_msa_signal.csv"), dtype=str) test_msa["time_value"] = test_msa.time_value.astype("datetime64[D]") test_msa["value"] = test_msa.value.astype("float") - fig4 = plotting.plot_choropleth(test_msa) - data4 = np.frombuffer(fig4.canvas.tostring_rgb(), dtype=np.uint8) - assert np.allclose(data4, expected["expected_4"], atol=2, rtol=0) + msa_fig = plotting.plot(test_msa) + assert np.allclose(_convert_to_array(msa_fig), expected["msa"], atol=2, rtol=0) + + # test bubble + msa_bubble_fig = plotting.plot(test_msa, plot_type="bubble") + from matplotlib import pyplot as plt + assert np.allclose(_convert_to_array(msa_bubble_fig), expected["msa_bubble"], atol=2, rtol=0) def test_get_geo_df(): @@ -167,20 +187,31 @@ def test__join_county_geo_df(): "test_value": [1.5, 2.5, 3], "test_value2": [21.5, 32.5, 34]}) geo_info = gpd.read_file(os.path.join(CURRENT_PATH, SHAPEFILE_PATHS["county"])) + # test w/o megacounty combine # test right join - output1 = plotting._join_county_geo_df(test_input, "county_code", geo_info) - assert type(output1) is gpd.GeoDataFrame - expected1 = gpd.read_file( - os.path.join(CURRENT_PATH, "../reference_data/expected__join_county_geo_df_right.gpkg"), + no_mega_r = plotting._join_county_geo_df(test_input, "county_code", geo_info) + assert type(no_mega_r) is gpd.GeoDataFrame + expected_no_mega_r = gpd.read_file( + os.path.join(CURRENT_PATH, + "../reference_data/expected__join_county_geo_df_no_mega_right.gpkg"), dtype={"geo_value": str}) - pd.testing.assert_frame_equal(expected1, output1) + pd.testing.assert_frame_equal(expected_no_mega_r, no_mega_r) # test left join - output2 = plotting._join_county_geo_df(test_input, "county_code", geo_info, "left") - expected2 = gpd.read_file( - os.path.join(CURRENT_PATH, "../reference_data/expected__join_county_geo_df_left.gpkg"), + no_mega_l = plotting._join_county_geo_df(test_input, "county_code", geo_info, "left") + expected_no_mega_l = gpd.read_file( + os.path.join(CURRENT_PATH, + "../reference_data/expected__join_county_geo_df_no_mega_left.gpkg"), dtype={"geo_value": str}) - pd.testing.assert_frame_equal(expected2, output2) + pd.testing.assert_frame_equal(expected_no_mega_l, no_mega_l) + + # test w/ megacounty combine + mega = plotting._join_county_geo_df(test_input, "county_code", geo_info, "left", True) + expected_mega = gpd.read_file( + os.path.join(CURRENT_PATH, + "../reference_data/expected__join_county_geo_df_mega.gpkg"), + dtype={"geo_value": str}) + pd.testing.assert_frame_equal(expected_mega, mega) def test__join_msa_geo_df(): @@ -227,3 
+258,9 @@ def test__join_hrr_geo_df(): os.path.join(CURRENT_PATH, "../reference_data/expected__join_hrr_geo_df_left.gpkg"), dtype={"geo_value": str}) pd.testing.assert_frame_equal(expected2, output2) + + +def test__is_megacounty(): + assert plotting._is_megacounty("12000") + assert not plotting._is_megacounty("12001") + assert not plotting._is_megacounty("120000") diff --git a/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_mega.gpkg b/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_mega.gpkg new file mode 100644 index 00000000..45ee65ac Binary files /dev/null and b/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_mega.gpkg differ diff --git a/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_left.gpkg b/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_no_mega_left.gpkg similarity index 100% rename from Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_left.gpkg rename to Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_no_mega_left.gpkg diff --git a/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_right.gpkg b/Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_no_mega_right.gpkg similarity index 100% rename from Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_right.gpkg rename to Python-packages/covidcast-py/tests/reference_data/expected__join_county_geo_df_no_mega_right.gpkg diff --git a/Python-packages/covidcast-py/tests/reference_data/expected_plot_arrays.npz b/Python-packages/covidcast-py/tests/reference_data/expected_plot_arrays.npz index cfc07f7f..cc183d94 100644 Binary files a/Python-packages/covidcast-py/tests/reference_data/expected_plot_arrays.npz and b/Python-packages/covidcast-py/tests/reference_data/expected_plot_arrays.npz differ diff --git a/R-packages/covidcast/.Rbuildignore b/R-packages/covidcast/.Rbuildignore new file mode 100644 index 00000000..701247c9 --- /dev/null +++ b/R-packages/covidcast/.Rbuildignore @@ -0,0 +1,6 @@ +^.*\.Rproj$ +^\.Rhistory$ +^\.Rproj\.user$ +^docs$ +^_pkgdown\.yml$ +^index\.md \ No newline at end of file diff --git a/R-packages/covidcast/DESCRIPTION b/R-packages/covidcast/DESCRIPTION index cfc7a198..66c3bd71 100644 --- a/R-packages/covidcast/DESCRIPTION +++ b/R-packages/covidcast/DESCRIPTION @@ -1,18 +1,36 @@ Package: covidcast Type: Package Title: Client for Delphi's COVIDcast API -Version: 0.3.0 -Authors@R: as.person(c( - "Jacob Bien [aut]", - "Logan Brooks [aut]", - "David Farrow [aut]", - "Pedrito Maynard-Zhang [aut]", - "Alex Reinhart [aut,cre]", - "Ryan Tibshirani [aut]" - )) -URL: https://cmu-delphi.github.io/covidcast/covidcastR/, https://github.com/cmu-delphi/covidcast +Version: 0.3.1 +Authors@R: + c( + person(given = "Jacob", + family = "Bien", + role = "aut"), + person(given = "Logan", + family = "Brooks", + role = "aut"), + person(given = "David", + family = "Farrow", + role = "aut"), + person(given = "Jed", + family = "Grabman", + role = "ctb"), + person(given = "Pedrito", + family = "Maynard-Zhang", + role = "ctb"), + person(given = "Alex", + family = "Reinhart", + role = c("aut", "cre"), + email = "areinhar@stat.cmu.edu"), + person(given = "Ryan", + family = "Tibshirani", + role = "aut")) +URL: https://cmu-delphi.github.io/covidcast/covidcastR/, + https://github.com/cmu-delphi/covidcast BugReports: 
https://github.com/cmu-delphi/covidcast/issues -Description: R tools surrounding Delphi's COVIDcast API: data access tools for our COVID-19 indicators, maps and time series plotting, and basic signal processing. +Description: Tools for Delphi's COVIDcast API: data access (for our COVID-19 + indicators), maps and time series plotting, and basic signal processing. Depends: R (>= 3.5.0) License: MIT + file LICENSE Encoding: UTF-8 diff --git a/R-packages/covidcast/NAMESPACE b/R-packages/covidcast/NAMESPACE index aae15411..20536d2c 100644 --- a/R-packages/covidcast/NAMESPACE +++ b/R-packages/covidcast/NAMESPACE @@ -7,11 +7,13 @@ S3method(print,covidcast_meta) S3method(print,covidcast_signal) S3method(summary,covidcast_meta) S3method(summary,covidcast_signal) +export(abbr_to_fips) export(abbr_to_name) export(cbsa_to_name) export(covidcast_cor) export(covidcast_meta) export(covidcast_signal) +export(fips_to_abbr) export(fips_to_name) export(name_to_abbr) export(name_to_cbsa) diff --git a/R-packages/covidcast/R/cor.R b/R-packages/covidcast/R/cor.R index db94b188..27fa17f6 100644 --- a/R-packages/covidcast/R/cor.R +++ b/R-packages/covidcast/R/cor.R @@ -31,8 +31,8 @@ covidcast_cor = function(x, y, dt_x = 0, dt_y = 0, method = c("pearson", "kendall", "spearman")) { x = latest_issue(x) y = latest_issue(y) - if (dt_x < 0 || dt_y < 0) stop("Both dt_x and dt_y must be nonnegative") - if (dt_x > 0 && dt_y > 0) stop("Only one of dt_x and dt_y can be positive") + if (dt_x < 0 || dt_y < 0) stop("Both `dt_x` and `dt_y` must be nonnegative.") + if (dt_x > 0 && dt_y > 0) stop("Only one of `dt_x` and `dt_y` can be positive.") by = match.arg(by) method = match.arg(method) diff --git a/R-packages/covidcast/R/covidcast.R b/R-packages/covidcast/R/covidcast.R index 2d29ff5f..c3cb4ef7 100644 --- a/R-packages/covidcast/R/covidcast.R +++ b/R-packages/covidcast/R/covidcast.R @@ -9,7 +9,7 @@ COVIDCAST_BASE_URL <- 'https://api.covidcast.cmu.edu/epidata/api.php' packageStartupMessage(paste(msg, collapse = "\n")) } -#' Produce a data frame for one signal. +#' Obtain a data frame for one COVIDcast signal #' #' Obtains data for selected date ranges for all geographic regions of the #' United States. Available data sources and signals are documented in the @@ -35,8 +35,8 @@ COVIDCAST_BASE_URL <- 'https://api.covidcast.cmu.edu/epidata/api.php' #' returns the most recent issue available for every observation. The `as_of`, #' `issues`, and `lag` parameters allow the user to select specific issues #' instead, or to see all updates to observations. These options are mutually -#' exclusive; if you specify more than one, `as_of` will take priority over -#' `issues`, which will take priority over `lag`. +#' exclusive, and you should only specify one; if you specify more than one, you +#' may get an error or confusing results. #' #' Note that the API only tracks the initial value of an estimate and *changes* #' to that value. If a value was first issued on June 5th and never updated, @@ -67,62 +67,61 @@ COVIDCAST_BASE_URL <- 'https://api.covidcast.cmu.edu/epidata/api.php' #' documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) #' for a list of available signals. #' @param start_day Query data beginning on this date. Date object, or string in -#' the form `"YYYY-MM-DD"`. If `start_day` is `NULL`, defaults to first day +#' the form "YYYY-MM-DD". If `start_day` is `NULL`, defaults to first day #' data is available for this signal. #' @param end_day Query data up to this date, inclusive. 
Date object or string -#' in the form `"YYYY-MM-DD"`. If `end_day` is `NULL`, defaults to the most +#' in the form "YYYY-MM-DD". If `end_day` is `NULL`, defaults to the most #' recent day data is available for this signal. #' @param geo_type The geography type for which to request this data, such as -#' `"county"` or `"state"`. Defaults to `"county"`. See the [geographic coding +#' "county" or "state". Defaults to "county". See the [geographic coding #' documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html) #' for details on which types are available. -#' @param geo_values Which geographies to return. The default, `"*"`, fetches +#' @param geo_values Which geographies to return. The default, "*", fetches #' all geographies. To fetch specific geographies, specify their IDs as a #' vector or list of strings. See the [geographic coding #' documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html) #' for details on how to specify these IDs. #' @param as_of Fetch only data that was available on or before this date, -#' provided as a `Date` object or string in the form `"YYYY-MM-DD"`. If -#' `NULL`, the default, return the most recent available data. +#' provided as a `Date` object or string in the form "YYYY-MM-DD". If +#' `NULL`, the default, return the most recent available data. Note that only +#' one of `as_of`, `issues`, and `lag` should be provided; it does not make +#' sense to specify more than one. #' @param issues Fetch only data that was published or updated ("issued") on #' these dates. Provided as either a single `Date` object (or string in the -#' form `"YYYY-MM-DD"`), indicating a single date to fetch data issued on, or +#' form "YYYY-MM-DD"), indicating a single date to fetch data issued on, or #' a vector specifying two dates, start and end. In this case, return all data #' issued in this range. There may be multiple rows for each observation, #' indicating several updates to its value. If `NULL`, the default, return the #' most recently issued data. -#' @param lag Integer. If, for example, `lag=3`, fetch only data that was -#' published or updated exactly 3 days after the date. For example, a row with -#' `time_value` of June 3 will only be included in the results if its data was -#' issued or updated on June 6. If `NULL`, the default, return the most -#' recently issued data regardless of its lag. -#' +#' @param lag Integer. If, for example, `lag = 3`, then we fetch only data that +#' was published or updated exactly 3 days after the date. For example, a row +#' with `time_value` of June 3 will only be included in the results if its +#' data was issued or updated on June 6. If `NULL`, the default, return the +#' most recently issued data regardless of its lag. +#' #' @return Data frame with matching data. Each row is one observation of one #' signal on one day in one geographic location. Contains the following #' columns: #' #' \item{data_source}{Data source from which this observation was obtained.} -#' \item{signal}{The signal from which this observation was obtained.} -#' \item{geo_value}{identifies the location, such as a state name or county -#' FIPS code} -#' \item{time_value}{a `Date` object identifying the date of this observation} -#' \item{issue}{a `Date` object identifying the date this estimate was issued. 
+#' \item{signal}{Signal from which this observation was obtained.} +#' \item{geo_value}{String identifying the location, such as a state name or +#' county FIPS code.} +#' \item{time_value}{Date object identifying the date of this observation.} +#' \item{issue}{Date object identifying the date this estimate was issued. #' For example, an estimate with a `time_value` of June 3 might have been #' issued on June 5, after the data for June 3rd was collected and ingested #' into the API.} -#' \item{lag}{an integer giving the difference between ``issue`` and -#' ``time_value``, in days.} -#' \item{value}{the signal quantity requested. For example, in a query for the -#' `confirmed_cumulative_num` signal from the `usa-facts` source, this would -#' be the cumulative number of confirmed cases in the area, as of the +#' \item{lag}{Integer giving the difference between `issue` and `time_value`, +#' in days.} +#' \item{value}{Signal value being requested. For example, in a query for the +#' "confirmed_cumulative_num" signal from the "usa-facts" source, this would +#' be the cumulative number of confirmed cases in the area, as of the given #' `time_value`.} -#' \item{stderr}{the value's standard error, if available} -#' \item{sample_size}{indicates the sample size available in that geography on -#' that day; sample size may not be available for all signals, due to privacy -#' or other constraints, in which case they will be `NA`.} -#' \item{direction}{uses a local linear fit to estimate whether the signal in -#' this region is currently increasing or decreasing (reported as -1 for -#' decreasing, 1 for increasing, and 0 for neither).} +#' \item{stderr}{Associated standard error of the signal value, if available.} +#' \item{sample_size}{Integer indicating the sample size available in that +#' geography on that day; sample size may not be available for all signals, +#' due to privacy or other constraints, in which case it will be `NA`.} #' #' Consult the signal documentation for more details on how values and #' standard errors are calculated for specific signals. @@ -207,8 +206,7 @@ covidcast_signal <- function(data_source, signal, } if (start_day > end_day) { - stop("end_day must be on or after start_day, but start_day = '", - start_day, "' and end_day = '", end_day, "'") + stop("`end_day` must be on or after `start_day`.") } if (!is.null(as_of)) { @@ -301,7 +299,7 @@ summary.covidcast_signal = function(object, ...) { #' @param plot_type One of "choro", "bubble", "line" indicating whether to plot #' a choropleth map, bubble map, or line (time series) graph, respectively. #' The default is "choro". -#' @param time_value Date object (or string in the form `"YYYY-MM-DD"`) +#' @param time_value Date object (or string in the form "YYYY-MM-DD") #' specifying the day to map, for choropleth and bubble maps. If `NULL`, the #' default, then the last date in `x` is used for the maps. Time series plots #' always include all available time values in `x`. @@ -463,7 +461,7 @@ plot.covidcast_signal <- function(x, plot_type = c("choro", "bubble", "line"), ########## -#' Obtain multiple signals in one data frame. +#' Obtain multiple COVIDcast signals in one data frame #' #' This convenience function uses `covidcast_signal()` to obtain multiple #' signals, potentially from multiple data sources, in one data frame. See the @@ -518,7 +516,7 @@ covidcast_signals <- function(signals, start_day = NULL, end_day = NULL, ########## -#' Fetch Delphi's COVID-19 Surveillance Streams metadata. 
+#' Obtain COVIDcast metadata #' #' Obtains a data frame of metadata describing all publicly available data #' streams from the COVIDcast API. @@ -537,11 +535,11 @@ covidcast_signals <- function(signals, start_day = NULL, end_day = NULL, #' \item{num_locations}{Number of distinct geographic locations available for #' this signal. For example, if `geo_type` is county, the number of counties #' for which this signal has ever been reported.} -#' \item{min_value}{The smallest value that has ever been reported.} -#' \item{max_value}{The largest value that has ever been reported.} -#' \item{mean_value}{The arithmetic mean of all reported values.} -#' \item{stdev_value}{The sample standard deviation of all reported values.} -#' \item{max_issue}{The most recent issue date for this signal.} +#' \item{min_value}{Smallest value that has ever been reported.} +#' \item{max_value}{Largest value that has ever been reported.} +#' \item{mean_value}{Arithmetic mean of all reported values.} +#' \item{stdev_value}{Sample standard deviation of all reported values.} +#' \item{max_issue}{Most recent issue date for this signal.} #' \item{min_lag}{Smallest lag from observation to issue, in `time_type` units} #' \item{max_lag}{Largest lag from observation to issue, in `time_type` units} #' @@ -555,7 +553,7 @@ covidcast_meta <- function() { meta <- .request(list(source='covidcast_meta', cached="true")) if (meta$message != "success") { - stop("Failed to obtain metadata: ", meta$message) + stop("Failed to obtain metadata: ", meta$message, ".") } meta <- meta$epidata %>% @@ -685,14 +683,14 @@ single_geo <- function(data_source, signal, start_day, end_day, geo_type, return(df) } -# Fetch Delphi's COVID-19 Surveillance Streams +# Fetch Delphi's COVID-19 indicators covidcast <- function(data_source, signal, time_type, geo_type, time_values, geo_value, as_of, issues, lag) { # Check parameters if(missing(data_source) || missing(signal) || missing(time_type) || missing(geo_type) || missing(time_values) || missing(geo_value)) { - stop('`data_source`, `signal`, `time_type`, `geo_type`, `time_values`, and ', - '`geo_value` are all required') + stop("`data_source`, `signal`, `time_type`, `geo_type`, `time_values`, and ", + "`geo_value` are all required.") } # Set up request @@ -718,7 +716,7 @@ covidcast <- function(data_source, signal, time_type, geo_type, time_values, } else if (length(issues) == 1) { params$issues <- date_to_string(issues) } else { - stop("`issues` must be either a single date or a date interval") + stop("`issues` must be either a single date or a date interval.") } } diff --git a/R-packages/covidcast/R/data.R b/R-packages/covidcast/R/data.R index 2a30067d..6b2f3978 100644 --- a/R-packages/covidcast/R/data.R +++ b/R-packages/covidcast/R/data.R @@ -6,7 +6,7 @@ #' states and DC). There are many columns. The most crucial are: #' #' \describe{ -#' \item{FIPS}{5-digit county FIPS codes. These are unique identifiers +#' \item{FIPS}{Five-digit county FIPS codes. 
These are unique identifiers #' used, for example, as the `geo_values` argument to `covidcast_signal()` to #' request data from a specific county.} #' \item{CTYNAME}{County name, to help find counties by name.} @@ -18,8 +18,7 @@ #' @references Census Bureau documentation of all columns and their meaning: #' \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.pdf} #' -#' @source -#' United States Census Bureau, at +#' @source United States Census Bureau, at #' \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv} "county_census" @@ -49,7 +48,7 @@ #' @references Census Bureau documentation of all columns and their meaning: #' \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.pdf} #' -#' @source +#' @source United States Census Bureau, at #' \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.csv} "msa_census" @@ -66,43 +65,44 @@ #' \item{POPESTIMATE2019}{Estimate of the state's resident population in 2019.} #' } #' -#' @source +#' @source United States Census Bureau, at #' \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/state/detail/SCPRC-EST2019-18+POP-RES.csv} "state_census" #' County latitudes and longitudes #' -#' Data set on latitudes and longitudes of county centroids, from the National -#' Weather Service. +#' Data set on latitudes and longitudes of county centroids, from the `usmap` +#' package. #' -#' @format Data frame with 3331 rows, each representing one county. Columns: +#' @format Data frame with 3142 rows, each representing one county. Columns: #' #' \describe{ -#' \item{COUNTYNAME}{Name of the county.} -#' \item{FIPS}{5-digit county FIPS code.} -#' \item{STATE}{Two-letter state abbreviation.} -#' \item{LON}{Longitude of county centroid.} -#' \item{LAT}{Latitude of county centroid.} +#' \item{x}{Longitude of county centroid.} +#' \item{y}{Latitude of county centroid.} +#' \item{fips}{Five-digit county FIPS code.} +#' \item{abbr}{Two-letter state abbreviation.} +#' \item{full}{State name.} +#' \item{county}{County name.} #' } #' -#' @source -#' \url{https://www.weather.gov/gis/Counties} +#' @source `usmap` "county_geo" #' State latitudes and longitudes #' -#' Data set on latitudes and longitudes of state centroids, from Google's DSPL. +#' Data set on latitudes and longitudes of state centroids, from the `usmap` +#' package. #' -#' @format Data frame with 52 rows, each representing one state (including -#' Puerto Rico and the District of Columbia). Columns: +#' @format Data frame with 51 rows, each representing one state (including the +#' District of Columbia). 
Columns: #' #' \describe{ -#' \item{STATE}{Two-letter state abbreviation.} -#' \item{LAT}{Latitude of state centroid.} -#' \item{LON}{Longitude of state centroid.} -#' \item{NAME}{Name of state.} +#' \item{x}{Longitude of state centroid.} +#' \item{y}{Latitude of state centroid.} +#' \item{fips}{Five-digit county FIPS code.} +#' \item{abbr}{Two-letter state abbreviation.} +#' \item{full}{State name.} #' } #' -#' @source -#' \url{https://developers.google.com/public-data/docs/canonical/states_csv} +#' @source `usmap` "state_geo" diff --git a/R-packages/covidcast/R/plot.R b/R-packages/covidcast/R/plot.R index efd36fc7..58685bf4 100644 --- a/R-packages/covidcast/R/plot.R +++ b/R-packages/covidcast/R/plot.R @@ -57,7 +57,7 @@ plot_choro = function(x, time_value = NULL, include = c(), range, # For intensity, create a discrete color function, if we need to else if (!direction && !is.null(breaks)) { if (length(breaks) != length(col)) { - stop("'breaks' must have length equal to the number of colors.") + stop("`breaks` must have length equal to the number of colors.") } col_fun = function(val, alpha = 1) { alpha_str = substr(grDevices::rgb(0, 0, 0, alpha = alpha), 8, 9) @@ -73,7 +73,7 @@ plot_choro = function(x, time_value = NULL, include = c(), range, # For direction, create a discrete color function else { if (length(dir_col) != 3) { - stop("'dir_col' must have length 3.") + stop("`dir_col` must have length 3.") } col_fun = function(val, alpha = 1) { alpha_str = substr(grDevices::rgb(0, 0, 0, alpha = alpha), 8, 9) @@ -147,7 +147,7 @@ plot_choro = function(x, time_value = NULL, include = c(), range, geom_args$mapping = aes(x = x, y = y, group = group) geom_args$data = map_df polygon_layer = do.call(ggplot2::geom_polygon, geom_args) - + # For intensity and continuous color scale, create a legend layer if (!direction && is.null(breaks)) { # Create legend breaks and legend labels, if we need to @@ -262,7 +262,7 @@ plot_bubble = function(x, time_value = NULL, include = c(), range = NULL, if (is.null(legend_width)) legend_width = 15 if (is.null(legend_digits)) legend_digits = 2 if (is.null(legend_pos)) legend_pos = "bottom" - + # Create breaks, if we need to breaks = params$breaks if (!is.null(breaks)) num_bins = length(breaks) @@ -294,7 +294,7 @@ plot_bubble = function(x, time_value = NULL, include = c(), range = NULL, for (i in 1:length(breaks)) val_out[val >= breaks[i]] = breaks[i] return(val_out) } - + # Set some basic layers element_text = ggplot2::element_text margin = ggplot2::margin @@ -346,22 +346,18 @@ plot_bubble = function(x, time_value = NULL, include = c(), range = NULL, geom_args$data = map_df polygon_layer = do.call(ggplot2::geom_polygon, geom_args) - # Set the lats and lons for counties + # Retrieve coordinates for mapping + # Reading from usmap files to ensure consistency with borders if (attributes(x)$geo_type == "county") { - g = county_geo[county_geo$FIPS %in% map_geo, ] - cur_geo = g$FIPS - cur_lon = g$LON - cur_lat = g$LAT + centroids = covidcast::county_geo[covidcast::county_geo$fips %in% map_geo, ] + cur_geo = centroids$fips cur_val = rep(NA, length(cur_geo)) } - - # Set the lats and lons for states else if (attributes(x)$geo_type == "state") { - state_geo$STATE = tolower(state_geo$STATE) - g = state_geo[state_geo$STATE %in% map_geo, ] - cur_geo = g$STATE - cur_lon = g$LON - cur_lat = g$LAT + centroids = covidcast::state_geo + centroids$abbr = tolower(centroids$abbr) + centroids = centroids[centroids$abbr %in% map_geo, ] + cur_geo = centroids$abbr cur_val = rep(NA, 
length(cur_geo)) } @@ -377,12 +373,11 @@ plot_bubble = function(x, time_value = NULL, include = c(), range = NULL, cur_val[cur_val == 0] = NA levels(cur_val)[levels(cur_val) == 0] = NA } - + # Create the bubble layer - bubble_df = data.frame(lon = cur_lon, lat = cur_lat, val = cur_val) - suppressWarnings({ bubble_trans = usmap::usmap_transform(bubble_df) }) - bubble_layer = ggplot2::geom_point(aes(x = lon.1, y = lat.1, size = val), - data = bubble_trans, color = col, + bubble_df = data.frame(lat = centroids$x, lon = centroids$y, val = cur_val) + bubble_layer = ggplot2::geom_point(aes(x = lat, y = lon, size = val), + data = bubble_df, color = col, alpha = alpha, na.rm = TRUE) # Create the scale layer @@ -391,7 +386,7 @@ plot_bubble = function(x, time_value = NULL, include = c(), range = NULL, scale_layer = ggplot2::scale_size_manual(values = sizes, breaks = breaks, labels = labels, drop = FALSE, guide = guide) - + # Put it all together and return return(ggplot2::ggplot() + polygon_layer + ggplot2::coord_equal() + title_layer + bubble_layer + scale_layer + theme_layer) @@ -412,13 +407,13 @@ plot_line = function(x, range = NULL, title = NULL, params = list()) { if (is.null(ylab)) ylab = "Value" if (is.null(stderr_bands)) stderr_bands = FALSE if (is.null(stderr_alpha)) stderr_alpha = 0.5 - + # Grab the values df = x %>% dplyr::select(value, time_value, geo_value, stderr) # Set the range, if we need to if (is.null(range)) range = base::range(df$value, na.rm = TRUE) - + # Create label and theme layers label_layer = ggplot2::labs(title = title, x = xlab, y = ylab) theme_layer = ggplot2::theme_bw() + diff --git a/R-packages/covidcast/R/utils.R b/R-packages/covidcast/R/utils.R index de9f955f..353f3f1f 100644 --- a/R-packages/covidcast/R/utils.R +++ b/R-packages/covidcast/R/utils.R @@ -1,4 +1,4 @@ -#' Fetch only the latest issue for each observation in a data frame. +#' Fetch only the latest issue for each observation in a data frame #' #' Since `covidcast_signal()` can, with the right options, return multiple #' issues for a single observation in a single geo, we may want only the most @@ -23,7 +23,7 @@ latest_issue <- function(df) { return(df) } -#' Fetch only the earliest issue for each observation in a data frame. +#' Fetch only the earliest issue for each observation in a data frame #' #' Since `covidcast_signal()` can, with the right options, return multiple #' issues for a single observation in a single geo, we may want only the most @@ -66,10 +66,10 @@ earliest_issue <- function(df) { #' @param state Two letter state abbreviation (case insensitive) indicating a #' parent state used to restrict the search. For example, when `state = "NY"`, #' then `name_to_fips()` searches only over only counties lying in New York -#' state, whereas `name_to_cbsa()` searches over the metropolitan areas lying, +#' state, whereas `name_to_cbsa()` searches over the metropolitan areas lying, #' either fully or partially (as a metropolitan area can span several states), #' in New York state. If `NULL`, the default, then the search is performed -#' US-wide (not restricted to any state in particular). +#' US-wide (not restricted to any state in particular). #' #' @return A vector of FIPS or CBSA codes if `ties_method` equals "first", and a #' list of FIPS or CBSA codes otherwise. 
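As a quick illustration of the `state` and `ties_method` arguments documented above, here is a minimal sketch of how these lookup helpers might be called; the county and metro names used below are illustrative inputs, not values taken from this changeset:

```r
library(covidcast)

# Restricting a county-name lookup to New York keeps "Kings" from matching
# counties of the same name in other states.
name_to_fips("Kings", state = "NY")

# The same restriction works for metro-area lookups; ties_method = "all"
# returns every match as a list rather than only the first one.
name_to_cbsa("New York", state = "NY", ties_method = "all")
```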
@@ -86,13 +86,13 @@ earliest_issue <- function(df) { name_to_fips = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all"), state = NULL) { # Leave states in county_census (so we can find state fips) - df = county_census # %>% dplyr::filter(COUNTY != 0) + df = covidcast::county_census # %>% dplyr::filter(COUNTY != 0) # Restrict to a particular state, if we're asked to if (!is.null(state)) { df = df %>% dplyr::filter(STNAME == abbr_to_name(toupper(state))) } - + # Now perform the grep-based look up grep_lookup(key = name, keys = df$CTYNAME, values = df$FIPS, ignore.case = ignore.case, perl = perl, fixed = fixed, @@ -104,10 +104,10 @@ name_to_fips = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, name_to_cbsa = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all"), state = NULL) { # Restrict msa_census to metro areas - df = msa_census %>% dplyr::filter(LSAD == "Metropolitan Statistical Area") + df = covidcast::msa_census %>% dplyr::filter(LSAD == "Metropolitan Statistical Area") # Restrict to a particular state, if we're asked to - if (!is.null(state)) { + if (!is.null(state)) { df = df %>% dplyr::slice(grep(toupper(state), df$STATE)) } @@ -131,8 +131,8 @@ name_to_cbsa = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, #' @param ties_method If "first", then only the first match for each code is #' returned. If "all", then all matches for each code are returned. #' -#' @return A vector of FIPS or CBSA codes if `ties_method` equals "first", and a -#' list of FIPS or CBSA codes otherwise. +#' @return A vector of county or metro names if `ties_method` equals "first", +#' and a list of county or metro names otherwise. #' #' @examples #' fips_to_name("42003") @@ -148,7 +148,7 @@ name_to_cbsa = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, fips_to_name = function(code, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all")) { # Leave states in county_census (so we can find state fips) - df = county_census # %>% dplyr::filter(COUNTY != 0) + df = covidcast::county_census # %>% dplyr::filter(COUNTY != 0) # Now perform the grep-based look up grep_lookup(key = code, keys = df$FIPS, values = df$CTYNAME, @@ -161,7 +161,7 @@ fips_to_name = function(code, ignore.case = FALSE, perl = FALSE, fixed = FALSE, cbsa_to_name = function(code, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all")) { # Restrict msa_census to metro areas - df = msa_census %>% dplyr::filter(LSAD == "Metropolitan Statistical Area") + df = covidcast::msa_census %>% dplyr::filter(LSAD == "Metropolitan Statistical Area") # Now perform the grep-based look up grep_lookup(key = code, keys = df$CBSA, values = df$NAME, @@ -173,7 +173,7 @@ cbsa_to_name = function(code, ignore.case = FALSE, perl = FALSE, fixed = FALSE, #' #' Look up state abbreviations by state names (including District of Columbia #' and Puerto Rico); this function is based on `grep()`, and hence allows for -#' regular expressions. +#' regular expressions. #' #' @param name Vector of state names to look up. #' @param ignore.case,perl,fixed Arguments to pass to `grep()`, with the same @@ -184,19 +184,19 @@ cbsa_to_name = function(code, ignore.case = FALSE, perl = FALSE, fixed = FALSE, #' returned. If "all", then all matches for each name are returned. #' #' @return A vector of state abbreviations if `ties_method` equals "first", and -#' a list of state abbreviations otherwise.
+#' a list of state abbreviations otherwise. #' #' @examples -#' name_to_abbr("Penn") +#' name_to_abbr("Penn") #' name_to_abbr(c("Penn", "New"), ties_method = "all") #' #' @seealso [abbr_to_name()] #' @export name_to_abbr = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all")) { - # First get rid of United States from state_census - df = state_census %>% dplyr::filter(STATE > 0) - + # First get rid of United States from state_census + df = covidcast::state_census %>% dplyr::filter(STATE > 0) + # Now perform the grep-based look up grep_lookup(key = name, keys = df$NAME, values = df$ABBR, ignore.case = ignore.case, perl = perl, fixed = fixed, @@ -207,7 +207,7 @@ name_to_abbr = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, #' #' Look up state names by state abbreviations (including District of Columbia #' and Puerto Rico); this function is based on `grep()`, and hence allows for -#' regular expressions. +#' regular expressions. #' #' @param abbr Vector of state abbreviations to look up. #' @param ignore.case,perl,fixed Arguments to pass to `grep()`, with the same @@ -218,7 +218,7 @@ name_to_abbr = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, #' returned. If "all", then all matches for each name are returned. #' #' @return A vector of state names if `ties_method` equals "first", and a list -#' of state names otherwise. +#' of state names otherwise. #' #' @examples #' abbr_to_name("PA") @@ -228,15 +228,98 @@ name_to_abbr = function(name, ignore.case = FALSE, perl = FALSE, fixed = FALSE, #' @export abbr_to_name = function(abbr, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all")) { - # First get rid of United States from state_census - df = state_census %>% dplyr::filter(STATE > 0) - + # First get rid of United States from state_census + df = covidcast::state_census %>% dplyr::filter(STATE > 0) + # Perform the grep-based look up grep_lookup(key = abbr, keys = df$ABBR, values = df$NAME, ignore.case = ignore.case, perl = perl, fixed = fixed, ties_method = ties_method) } +#' Get FIPS codes from state abbreviations +#' +#' Look up FIPS codes by state abbreviations (including District of Columbia and +#' Puerto Rico); this function is based on `grep()`, and hence allows for +#' regular expressions. +#' +#' @param abbr Vector of state abbreviations to look up. +#' @param ignore.case,perl,fixed Arguments to pass to `grep()`, with the same +#' defaults as in the latter function, except for `ignore.case = TRUE`. Hence, +#' by default, regular expressions are used; to match against a fixed string +#' (no regular expressions), set `fixed = TRUE`. +#' @param ties_method If "first", then only the first match for each name is +#' returned. If "all", then all matches for each name are returned. +#' +#' @return A vector of FIPS codes if `ties_method` equals "first", and a list of +#' FIPS codes otherwise. These FIPS codes have five digits (ending in "000"). 
+#' +#' @examples +#' abbr_to_fips("PA") +#' abbr_to_fips(c("PA", "PR", "DC")) +#' +#' # Note that name_to_fips() works for state names too: +#' name_to_fips("^Pennsylvania$") +#' +#' @seealso [abbr_to_name()] +#' @export +abbr_to_fips = function(abbr, ignore.case = TRUE, perl = FALSE, fixed = FALSE, + ties_method = c("first", "all")) { + # First get rid of United States from state_census, then convert FIPS codes to + # appropriate character format + df = covidcast::state_census %>% dplyr::filter(STATE > 0) %>% + dplyr::mutate(STATE = format_state_fips(STATE)) + + # Now perform the grep-based look up + grep_lookup(key = abbr, keys = df$ABBR, values = df$STATE, + ignore.case = ignore.case, perl = perl, fixed = fixed, + ties_method = ties_method) +} + +#' Get state abbreviations from FIPS codes +#' +#' Look up state abbreviations by FIPS codes (including District of Columbia and +#' Puerto Rico); this function is based on `grep()`, and hence allows for +#' regular expressions. +#' +#' @param code Vector of FIPS codes to look up; codes can have either two digits +#' (as in "42") or five digits (as in "42000"), either is allowed. +#' @param ignore.case,perl,fixed Arguments to pass to `grep()`, with the same +#' defaults as in the latter function, except for `ignore.case = TRUE`. Hence, +#' by default, regular expressions are used; to match against a fixed string +#' (no regular expressions), set `fixed = TRUE`. +#' @param ties_method If "first", then only the first match for each code is +#' returned. If "all", then all matches for each code are returned. +#' +#' @return A vector of state abbreviations if `ties_method` equals "first", +#' and a list of state abbreviations otherwise.
+#' +#' @examples +#' fips_to_abbr("42000") +#' fips_to_abbr(c("42", "72", "11")) +#' +#' # Note that fips_to_name() works for state names too: +#' fips_to_name("42000") +#' +#' @seealso [abbr_to_fips()] +#' @export +fips_to_abbr = function(code, ignore.case = TRUE, perl = FALSE, fixed = FALSE, + ties_method = c("first", "all")) { + # First get rid of United States from state_census, then convert FIPS codes to + # appropriate character format + df = covidcast::state_census %>% dplyr::filter(STATE > 0) %>% + dplyr::mutate(STATE = format_state_fips(STATE)) + + # Now perform the grep-based look up + grep_lookup(key = code, keys = df$STATE, values = df$ABBR, + ignore.case = ignore.case, perl = perl, fixed = fixed, + ties_method = ties_method) +} + # This is the core lookup function grep_lookup = function(key, keys, values, ignore.case = FALSE, perl = FALSE, fixed = FALSE, ties_method = c("first", "all")) { @@ -259,3 +342,7 @@ grep_lookup = function(key, keys, values, ignore.case = FALSE, perl = FALSE, } return(sapply(res, `[`, 1)) } + +# Simple convenience functions for FIPS formatting +format_fips = function(fips) { sprintf("%05d", fips) } +format_state_fips = function(fips) { sprintf("%02d000", fips) } diff --git a/R-packages/covidcast/data/county_geo.rda b/R-packages/covidcast/data/county_geo.rda index 97774936..0d66bb0b 100644 Binary files a/R-packages/covidcast/data/county_geo.rda and b/R-packages/covidcast/data/county_geo.rda differ diff --git a/R-packages/covidcast/data/state_geo.rda b/R-packages/covidcast/data/state_geo.rda index 0a237c5e..9e2724cc 100644 Binary files a/R-packages/covidcast/data/state_geo.rda and b/R-packages/covidcast/data/state_geo.rda differ diff --git a/R-packages/covidcast/man/abbr_to_fips.Rd b/R-packages/covidcast/man/abbr_to_fips.Rd new file mode 100644 index 00000000..cef7feed --- /dev/null +++ b/R-packages/covidcast/man/abbr_to_fips.Rd @@ -0,0 +1,45 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/utils.R +\name{abbr_to_fips} +\alias{abbr_to_fips} +\title{Get FIPS codes from state abbreviations} +\usage{ +abbr_to_fips( + abbr, + ignore.case = TRUE, + perl = FALSE, + fixed = FALSE, + ties_method = c("first", "all") +) +} +\arguments{ +\item{abbr}{Vector of state abbreviations to look up.} + +\item{ignore.case, perl, fixed}{Arguments to pass to \code{grep()}, with the same +defaults as in the latter function, except for \code{ignore.case = TRUE}. Hence, +by default, regular expressions are used; to match against a fixed string +(no regular expressions), set \code{fixed = TRUE}.} + +\item{ties_method}{If "first", then only the first match for each name is +returned. If "all", then all matches for each name are returned.} +} +\value{ +A vector of FIPS codes if \code{ties_method} equals "first", and a list of +FIPS codes otherwise. These FIPS codes have five digits (ending in "000"). +} +\description{ +Look up FIPS codes by state abbreviations (including District of Columbia and +Puerto Rico); this function is based on \code{grep()}, and hence allows for +regular expressions. 
+} +\examples{ +abbr_to_fips("PA") +abbr_to_fips(c("PA", "PR", "DC")) + +# Note that name_to_fips() works for state names too: +name_to_fips("^Pennsylvania$") + +} +\seealso{ +\code{\link[=abbr_to_name]{abbr_to_name()}} +} diff --git a/R-packages/covidcast/man/county_census.Rd b/R-packages/covidcast/man/county_census.Rd index 1814eb0c..decae840 100644 --- a/R-packages/covidcast/man/county_census.Rd +++ b/R-packages/covidcast/man/county_census.Rd @@ -9,7 +9,7 @@ A data frame with 3193 rows, one for each county (along with the 50 states and DC). There are many columns. The most crucial are: \describe{ -\item{FIPS}{5-digit county FIPS codes. These are unique identifiers +\item{FIPS}{Five-digit county FIPS codes. These are unique identifiers used, for example, as the \code{geo_values} argument to \code{covidcast_signal()} to request data from a specific county.} \item{CTYNAME}{County name, to help find counties by name.} diff --git a/R-packages/covidcast/man/county_geo.Rd b/R-packages/covidcast/man/county_geo.Rd index 413f6b59..2df47aa9 100644 --- a/R-packages/covidcast/man/county_geo.Rd +++ b/R-packages/covidcast/man/county_geo.Rd @@ -5,24 +5,25 @@ \alias{county_geo} \title{County latitudes and longitudes} \format{ -Data frame with 3331 rows, each representing one county. Columns: +Data frame with 3142 rows, each representing one county. Columns: \describe{ -\item{COUNTYNAME}{Name of the county.} -\item{FIPS}{5-digit county FIPS code.} -\item{STATE}{Two-letter state abbreviation.} -\item{LON}{Longitude of county centroid.} -\item{LAT}{Latitude of county centroid.} +\item{x}{Longitude of county centroid.} +\item{y}{Latitude of county centroid.} +\item{fips}{Five-digit county FIPS code.} +\item{abbr}{Two-letter state abbreviation.} +\item{full}{State name.} +\item{county}{County name.} } } \source{ -\url{https://www.weather.gov/gis/Counties} +\code{usmap} } \usage{ county_geo } \description{ -Data set on latitudes and longitudes of county centroids, from the National -Weather Service. +Data set on latitudes and longitudes of county centroids, from the \code{usmap} +package. } \keyword{datasets} diff --git a/R-packages/covidcast/man/covidcast_meta.Rd b/R-packages/covidcast/man/covidcast_meta.Rd index dba11b23..21a9c31b 100644 --- a/R-packages/covidcast/man/covidcast_meta.Rd +++ b/R-packages/covidcast/man/covidcast_meta.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/covidcast.R \name{covidcast_meta} \alias{covidcast_meta} -\title{Fetch Delphi's COVID-19 Surveillance Streams metadata.} +\title{Obtain COVIDcast metadata} \usage{ covidcast_meta() } @@ -21,11 +21,11 @@ own metadata.} \item{num_locations}{Number of distinct geographic locations available for this signal. 
For example, if \code{geo_type} is county, the number of counties for which this signal has ever been reported.} -\item{min_value}{The smallest value that has ever been reported.} -\item{max_value}{The largest value that has ever been reported.} -\item{mean_value}{The arithmetic mean of all reported values.} -\item{stdev_value}{The sample standard deviation of all reported values.} -\item{max_issue}{The most recent issue date for this signal.} +\item{min_value}{Smallest value that has ever been reported.} +\item{max_value}{Largest value that has ever been reported.} +\item{mean_value}{Arithmetic mean of all reported values.} +\item{stdev_value}{Sample standard deviation of all reported values.} +\item{max_issue}{Most recent issue date for this signal.} \item{min_lag}{Smallest lag from observation to issue, in \code{time_type} units} \item{max_lag}{Largest lag from observation to issue, in \code{time_type} units} } diff --git a/R-packages/covidcast/man/covidcast_signal.Rd b/R-packages/covidcast/man/covidcast_signal.Rd index 4a65bfae..2fe55213 100644 --- a/R-packages/covidcast/man/covidcast_signal.Rd +++ b/R-packages/covidcast/man/covidcast_signal.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/covidcast.R \name{covidcast_signal} \alias{covidcast_signal} -\title{Produce a data frame for one signal.} +\title{Obtain a data frame for one COVIDcast signal} \usage{ covidcast_signal( data_source, @@ -26,39 +26,42 @@ see the \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals. for a list of available signals.} \item{start_day}{Query data beginning on this date. Date object, or string in -the form \code{"YYYY-MM-DD"}. If \code{start_day} is \code{NULL}, defaults to first day +the form "YYYY-MM-DD". If \code{start_day} is \code{NULL}, defaults to first day data is available for this signal.} \item{end_day}{Query data up to this date, inclusive. Date object or string -in the form \code{"YYYY-MM-DD"}. If \code{end_day} is \code{NULL}, defaults to the most +in the form "YYYY-MM-DD". If \code{end_day} is \code{NULL}, defaults to the most recent day data is available for this signal.} \item{geo_type}{The geography type for which to request this data, such as -\code{"county"} or \code{"state"}. Defaults to \code{"county"}. See the \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html}{geographic coding documentation} +"county" or "state". Defaults to "county". See the \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html}{geographic coding documentation} for details on which types are available.} -\item{geo_values}{Which geographies to return. The default, \code{"*"}, fetches +\item{geo_values}{Which geographies to return. The default, "*", fetches all geographies. To fetch specific geographies, specify their IDs as a vector or list of strings. See the \href{https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html}{geographic coding documentation} for details on how to specify these IDs.} \item{as_of}{Fetch only data that was available on or before this date, -provided as a \code{Date} object or string in the form \code{"YYYY-MM-DD"}. If -\code{NULL}, the default, return the most recent available data.} +provided as a \code{Date} object or string in the form "YYYY-MM-DD". If +\code{NULL}, the default, return the most recent available data. 
Note that only +one of \code{as_of}, \code{issues}, and \code{lag} should be provided; it does not make +sense to specify more than one.} + \item{issues}{Fetch only data that was published or updated ("issued") on these dates. Provided as either a single \code{Date} object (or string in the -form \code{"YYYY-MM-DD"}), indicating a single date to fetch data issued on, or +form "YYYY-MM-DD"), indicating a single date to fetch data issued on, or a vector specifying two dates, start and end. In this case, return all data issued in this range. There may be multiple rows for each observation, indicating several updates to its value. If \code{NULL}, the default, return the most recently issued data.} -\item{lag}{Integer. If, for example, \code{lag=3}, fetch only data that was -published or updated exactly 3 days after the date. For example, a row with -\code{time_value} of June 3 will only be included in the results if its data was -issued or updated on June 6. If \code{NULL}, the default, return the most -recently issued data regardless of its lag.} +\item{lag}{Integer. If, for example, \code{lag = 3}, then we fetch only data that +was published or updated exactly 3 days after the date. For example, a row +with \code{time_value} of June 3 will only be included in the results if its +data was issued or updated on June 6. If \code{NULL}, the default, return the +most recently issued data regardless of its lag.} } \value{ Data frame with matching data. Each row is one observation of one @@ -66,27 +69,24 @@ signal on one day in one geographic location. Contains the following columns: \item{data_source}{Data source from which this observation was obtained.} -\item{signal}{The signal from which this observation was obtained.} -\item{geo_value}{identifies the location, such as a state name or county -FIPS code} -\item{time_value}{a \code{Date} object identifying the date of this observation} -\item{issue}{a \code{Date} object identifying the date this estimate was issued. +\item{signal}{Signal from which this observation was obtained.} +\item{geo_value}{String identifying the location, such as a state name or +county FIPS code.} +\item{time_value}{Date object identifying the date of this observation.} +\item{issue}{Date object identifying the date this estimate was issued. For example, an estimate with a \code{time_value} of June 3 might have been issued on June 5, after the data for June 3rd was collected and ingested into the API.} -\item{lag}{an integer giving the difference between \code{issue} and -\code{time_value}, in days.} -\item{value}{the signal quantity requested. For example, in a query for the -\code{confirmed_cumulative_num} signal from the \code{usa-facts} source, this would -be the cumulative number of confirmed cases in the area, as of the +\item{lag}{Integer giving the difference between \code{issue} and \code{time_value}, +in days.} +\item{value}{Signal value being requested. 
For example, in a query for the
+"confirmed_cumulative_num" signal from the "usa-facts" source, this would
+be the cumulative number of confirmed cases in the area, as of the
 given \code{time_value}.}
-\item{stderr}{the value's standard error, if available}
-\item{sample_size}{indicates the sample size available in that geography on
-that day; sample size may not be available for all signals, due to privacy
-or other constraints, in which case they will be \code{NA}.}
-\item{direction}{uses a local linear fit to estimate whether the signal in
-this region is currently increasing or decreasing (reported as -1 for
-decreasing, 1 for increasing, and 0 for neither).}
+\item{stderr}{Associated standard error of the signal value, if available.}
+\item{sample_size}{Integer indicating the sample size available in that
+geography on that day; sample size may not be available for all signals,
+due to privacy or other constraints, in which case it will be \code{NA}.}
 
 Consult the signal documentation for more details on how values and
 standard errors are calculated for specific signals.
@@ -116,8 +116,8 @@ data, creating a new issue on June 8th. By default, \code{covidcast_signal()}
 returns the most recent issue available for every observation. The \code{as_of},
 \code{issues}, and \code{lag} parameters allow the user to select specific issues
 instead, or to see all updates to observations. These options are mutually
-exclusive; if you specify more than one, \code{as_of} will take priority over
-\code{issues}, which will take priority over \code{lag}.
+exclusive, and you should only specify one; if you specify more than one, you
+may get an error or confusing results.
 
 Note that the API only tracks the initial value of an estimate and \emph{changes}
 to that value. If a value was first issued on June 5th and never updated,
diff --git a/R-packages/covidcast/man/earliest_issue.Rd b/R-packages/covidcast/man/earliest_issue.Rd
index 088d3df3..57a9bcf0 100644
--- a/R-packages/covidcast/man/earliest_issue.Rd
+++ b/R-packages/covidcast/man/earliest_issue.Rd
@@ -2,7 +2,7 @@
 % Please edit documentation in R/utils.R
 \name{earliest_issue}
 \alias{earliest_issue}
-\title{Fetch only the earliest issue for each observation in a data frame.}
+\title{Fetch only the earliest issue for each observation in a data frame}
 \usage{
 earliest_issue(df)
 }
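The `as_of`, `issues`, and `lag` arguments described above are easiest to see side by side. The sketch below is illustrative rather than part of the patch; it reuses the source, signal, and dates from the package vignette and assumes a live connection to the COVIDcast API.

```r
library(covidcast)

# Default behavior: the most recently issued estimate for each observation.
latest <- covidcast_signal("doctor-visits", "smoothed_cli",
                           start_day = "2020-05-01", end_day = "2020-05-01",
                           geo_type = "state", geo_values = "pa")

# Only what was known as of May 7, 2020 (useful for backtesting forecasts).
as_of_may7 <- covidcast_signal("doctor-visits", "smoothed_cli",
                               start_day = "2020-05-01", end_day = "2020-05-01",
                               geo_type = "state", geo_values = "pa",
                               as_of = "2020-05-07")

# Every update issued between May 1 and May 15, 2020; one row per issue.
all_issues <- covidcast_signal("doctor-visits", "smoothed_cli",
                               start_day = "2020-05-01", end_day = "2020-05-01",
                               geo_type = "state", geo_values = "pa",
                               issues = c("2020-05-01", "2020-05-15"))

# Only values issued exactly 7 days after their time_value.
lag7 <- covidcast_signal("doctor-visits", "smoothed_cli",
                         start_day = "2020-05-01", end_day = "2020-05-07",
                         geo_type = "state", geo_values = "pa", lag = 7)

# Per the revised wording above, supply these arguments one at a time;
# combining them may produce an error or confusing results.
```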
diff --git a/R-packages/covidcast/man/fips_to_abbr.Rd b/R-packages/covidcast/man/fips_to_abbr.Rd
new file mode 100644
index 00000000..d6763263
--- /dev/null
+++ b/R-packages/covidcast/man/fips_to_abbr.Rd
@@ -0,0 +1,46 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/utils.R
+\name{fips_to_abbr}
+\alias{fips_to_abbr}
+\title{Get state abbreviations from FIPS codes}
+\usage{
+fips_to_abbr(
+  code,
+  ignore.case = TRUE,
+  perl = FALSE,
+  fixed = FALSE,
+  ties_method = c("first", "all")
+)
+}
+\arguments{
+\item{code}{Vector of FIPS codes to look up; codes can have either two digits
+(as in "42") or five digits (as in "42000").}
+
+\item{ignore.case, perl, fixed}{Arguments to pass to \code{grep()}, with the same
+defaults as in that function. Hence, by default, regular expressions
+are used; to match against a fixed string (no regular expressions), set
+\code{fixed = TRUE}.}
+
+\item{ties_method}{If "first", then only the first match for each code is
+returned. If "all", then all matches for each code are returned.}
+}
+\value{
+A vector of state abbreviations if \code{ties_method} equals "first", and
+a list of state abbreviations otherwise.
+}
+\description{
+Look up state abbreviations by FIPS codes (including District of Columbia and
+Puerto Rico); this function is based on \code{grep()}, and hence allows for
+regular expressions.
+}
+\examples{
+fips_to_abbr("42000")
+fips_to_abbr(c("42", "72", "11"))
+
+# Note that fips_to_name() works for state FIPS codes too:
+fips_to_name("42000")
+
+}
+\seealso{
+\code{\link[=abbr_to_fips]{abbr_to_fips()}}
+}
diff --git a/R-packages/covidcast/man/fips_to_name.Rd b/R-packages/covidcast/man/fips_to_name.Rd
index 5eb76036..9d0d7deb 100644
--- a/R-packages/covidcast/man/fips_to_name.Rd
+++ b/R-packages/covidcast/man/fips_to_name.Rd
@@ -33,8 +33,8 @@ are used; to match against a fixed string (no regular expressions), set
 returned. If "all", then all matches for each code are returned.}
 }
 \value{
-A vector of FIPS or CBSA codes if \code{ties_method} equals "first", and a
-list of FIPS or CBSA codes otherwise.
+A vector of county or metro names if \code{ties_method} equals "first",
+and a list of county or metro names otherwise.
 }
 \description{
 Look up county or metropolitan area names by FIPS or CBSA codes,
diff --git a/R-packages/covidcast/man/latest_issue.Rd b/R-packages/covidcast/man/latest_issue.Rd
index ea8e9272..be742d62 100644
--- a/R-packages/covidcast/man/latest_issue.Rd
+++ b/R-packages/covidcast/man/latest_issue.Rd
@@ -2,7 +2,7 @@
 % Please edit documentation in R/utils.R
 \name{latest_issue}
 \alias{latest_issue}
-\title{Fetch only the latest issue for each observation in a data frame.}
+\title{Fetch only the latest issue for each observation in a data frame}
 \usage{
 latest_issue(df)
 }
diff --git a/R-packages/covidcast/man/msa_census.Rd b/R-packages/covidcast/man/msa_census.Rd
index a27363b5..48d37e2f 100644
--- a/R-packages/covidcast/man/msa_census.Rd
+++ b/R-packages/covidcast/man/msa_census.Rd
@@ -22,6 +22,7 @@ July 1, 2019.}
 }
 }
 \source{
+United States Census Bureau, at
 \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/metro/totals/cbsa-est2019-alldata.csv}
 }
 \usage{
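The lookup helpers documented above (abbr_to_fips(), fips_to_abbr(), name_to_fips(), fips_to_name()) are meant to feed covidcast_signal()'s geo_values argument. A brief sketch, not part of the patch, using only calls and example values that appear in these man pages and in the package vignette:

```r
library(covidcast)

# State-level round trips:
abbr_to_fips(c("PA", "PR", "DC"))
fips_to_abbr(c("42", "72", "11"))

# County and metro lookups are grep()-based; the vignette shows
# name_to_fips("Allegheny") returning "42003" and
# fips_to_name("42003") returning "Allegheny County".
name_to_fips("Allegheny")
fips_to_name("42003")

# The returned codes drop straight into a query:
cli <- suppressMessages(
  covidcast_signal("fb-survey", "smoothed_cli",
                   start_day = "2020-05-01", end_day = "2020-05-07",
                   geo_values = name_to_fips("Allegheny"))
)
```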
diff --git a/R-packages/covidcast/man/plot.covidcast_signal.Rd b/R-packages/covidcast/man/plot.covidcast_signal.Rd
index d7079072..45800979 100644
--- a/R-packages/covidcast/man/plot.covidcast_signal.Rd
+++ b/R-packages/covidcast/man/plot.covidcast_signal.Rd
@@ -32,7 +32,7 @@ mapped or plotted.}
 a choropleth map, bubble map, or line (time series) graph, respectively.
 The default is "choro".}
 
-\item{time_value}{Date object (or string in the form \code{"YYYY-MM-DD"})
+\item{time_value}{Date object (or string in the form "YYYY-MM-DD")
 specifying the day to map, for choropleth and bubble maps. If \code{NULL}, the
 default, then the last date in \code{x} is used for the maps. Time series plots
 always include all available time values in \code{x}.}
diff --git a/R-packages/covidcast/man/state_census.Rd b/R-packages/covidcast/man/state_census.Rd
index 2bb372cf..dd62e508 100644
--- a/R-packages/covidcast/man/state_census.Rd
+++ b/R-packages/covidcast/man/state_census.Rd
@@ -15,6 +15,7 @@ Important columns:
 }
 }
 \source{
+United States Census Bureau, at
 \url{https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/state/detail/SCPRC-EST2019-18+POP-RES.csv}
 }
 \usage{
diff --git a/R-packages/covidcast/man/state_geo.Rd b/R-packages/covidcast/man/state_geo.Rd
index 0a98ff35..c0ac4871 100644
--- a/R-packages/covidcast/man/state_geo.Rd
+++ b/R-packages/covidcast/man/state_geo.Rd
@@ -5,23 +5,25 @@
 \alias{state_geo}
 \title{State latitudes and longitudes}
 \format{
-Data frame with 52 rows, each representing one state (including
-Puerto Rico and the District of Columbia). Columns:
+Data frame with 51 rows, each representing one state (including the
+District of Columbia). Columns:
 \describe{
-\item{STATE}{Two-letter state abbreviation.}
-\item{LAT}{Latitude of state centroid.}
-\item{LON}{Longitude of state centroid.}
-\item{NAME}{Name of state.}
+\item{x}{Longitude of state centroid.}
+\item{y}{Latitude of state centroid.}
+\item{fips}{Two-digit state FIPS code.}
+\item{abbr}{Two-letter state abbreviation.}
+\item{full}{State name.}
 }
 }
 \source{
-\url{https://developers.google.com/public-data/docs/canonical/states_csv}
+\code{usmap}
 }
 \usage{
 state_geo
 }
 \description{
-Data set on latitudes and longitudes of state centroids, from Google's DSPL.
+Data set on latitudes and longitudes of state centroids, from the \code{usmap}
+package.
 }
 \keyword{datasets}
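With county_geo and state_geo now following usmap's column layout (x, y, fips, abbr, full, and, for counties, county), a typical use is attaching centroids to query results. This is an illustrative sketch rather than package documentation; it assumes county_geo$fips and covidcast_signal()'s geo_value column share the same five-digit character format.

```r
library(covidcast)
library(dplyr)

head(county_geo)  # x, y, fips, abbr, full, county
head(state_geo)   # x, y, fips, abbr, full

# Attach county centroids to one day of a signal, e.g. for a custom map:
cli <- suppressMessages(
  covidcast_signal("fb-survey", "smoothed_cli",
                   start_day = "2020-05-01", end_day = "2020-05-01")
)
cli_geo <- left_join(cli, county_geo, by = c("geo_value" = "fips"))
```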
diff --git a/R-packages/covidcast/vignettes/covidcast.Rmd b/R-packages/covidcast/vignettes/covidcast.Rmd
index d6157169..f66986dc 100644
--- a/R-packages/covidcast/vignettes/covidcast.Rmd
+++ b/R-packages/covidcast/vignettes/covidcast.Rmd
@@ -25,6 +25,17 @@ devtools::install_github("cmu-delphi/covidcast", ref = "main",
                          subdir = "R-packages/covidcast")
 ```
 
+Building the vignettes, such as this Getting Started guide, takes a substantial
+amount of time. They are not included in the package by default. If you wish to
+include vignettes, use this modified command:
+
+```{r, eval = FALSE}
+devtools::install_github("cmu-delphi/covidcast", ref = "main",
+                         subdir = "R-packages/covidcast",
+                         build_vignettes = TRUE,
+                         dependencies = TRUE)
+```
+
 ## Basic examples
 
 For full usage information, see the [function
diff --git a/R-packages/data-raw/c_03mr20/c_03mr20.dbf b/R-packages/data-raw/c_03mr20/c_03mr20.dbf
deleted file mode 100755
index 08d64977..00000000
Binary files a/R-packages/data-raw/c_03mr20/c_03mr20.dbf and /dev/null differ
diff --git a/R-packages/data-raw/c_03mr20/c_03mr20.prj b/R-packages/data-raw/c_03mr20/c_03mr20.prj
deleted file mode 100755
index 440136e0..00000000
--- a/R-packages/data-raw/c_03mr20/c_03mr20.prj
+++ /dev/null
@@ -1 +0,0 @@
-GEOGCS["GCS_North_American_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.017453292519943295],VERTCS["NAD_1983",DATUM["D_North_American_1983",SPHEROID["GRS_1980",6378137.0,298.257222101]],PARAMETER["Vertical_Shift",0.0],PARAMETER["Direction",1.0],UNIT["Meter",1.0]]]
\ No newline at end of file
diff --git a/R-packages/data-raw/c_03mr20/c_03mr20.shp b/R-packages/data-raw/c_03mr20/c_03mr20.shp
deleted file mode 100755
index 3866f5a7..00000000
Binary files a/R-packages/data-raw/c_03mr20/c_03mr20.shp and /dev/null differ
diff --git a/R-packages/data-raw/c_03mr20/c_03mr20.shx b/R-packages/data-raw/c_03mr20/c_03mr20.shx
deleted file mode 100755
index 025f5463..00000000
Binary files a/R-packages/data-raw/c_03mr20/c_03mr20.shx and /dev/null differ
diff --git a/R-packages/data-raw/make.R b/R-packages/data-raw/make.R
index 4cb57dc0..5f0d0946 100644
--- a/R-packages/data-raw/make.R
+++ b/R-packages/data-raw/make.R
@@ -30,22 +30,15 @@ state_abbr["Puerto Rico Commonwealth"] = "PR"
 state_census$ABBR = state_abbr
 save(state_census, file = "../covidcast/data/state_census.rda")
 
-# County geo data from https://www.weather.gov/gis/Counties
-library(sf)
-county_geo = st_read("c_03mr20/c_03mr20.shp")
-county_geo$STATE = as.character(county_geo$STATE)
-county_geo$CWA = as.character(county_geo$CWA)
-county_geo$FE_AREA = as.character(county_geo$FE_AREA)
-county_geo$COUNTYNAME = as.character(county_geo$COUNTYNAME)
-county_geo$TIME_ZONE = NULL
-county_geo$geometry = NULL
-county_geo = data.frame(county_geo)
+# County geo centroids from usmap
+county_col_classes = c("numeric", "numeric", "character", "character",
+                       "character", "character")
+county_file = system.file("extdata", "us_counties_centroids.csv", package = "usmap")
+county_geo = utils::read.csv(county_file, colClasses = county_col_classes)
 save(county_geo, file = "../covidcast/data/county_geo.rda", compress = "bzip2")
 
-county_census$FIPS[!county_census$FIPS %in% county_geo$FIPS] # Just the states themselves
-county_geo[!county_geo$FIPS %in% county_census$FIPS, ] # AS, PR, VI, GU, etc.
- -# State geo data from https://developers.google.com/public-data/docs/canonical/states_csv -state_geo = read.table("state-geo.txt", sep = "\t", stringsAsFactors = FALSE) -colnames(state_geo) = c("STATE", "LAT", "LON", "NAME") +# State geo centroids from usmap +state_col_classes = c("numeric", "numeric", "character", "character", "character") +state_file = system.file("extdata", "us_states_centroids.csv", package = "usmap") +state_geo = utils::read.csv(state_file, colClasses = state_col_classes) save(state_geo, file = "../covidcast/data/state_geo.rda") diff --git a/R-packages/data-raw/state-geo.txt b/R-packages/data-raw/state-geo.txt deleted file mode 100644 index 36a90ca6..00000000 --- a/R-packages/data-raw/state-geo.txt +++ /dev/null @@ -1,52 +0,0 @@ -AK 63.588753 -154.493062 Alaska -AL 32.318231 -86.902298 Alabama -AR 35.20105 -91.831833 Arkansas -AZ 34.048928 -111.093731 Arizona -CA 36.778261 -119.417932 California -CO 39.550051 -105.782067 Colorado -CT 41.603221 -73.087749 Connecticut -DC 38.905985 -77.033418 District of Columbia -DE 38.910832 -75.52767 Delaware -FL 27.664827 -81.515754 Florida -GA 32.157435 -82.907123 Georgia -HI 19.898682 -155.665857 Hawaii -IA 41.878003 -93.097702 Iowa -ID 44.068202 -114.742041 Idaho -IL 40.633125 -89.398528 Illinois -IN 40.551217 -85.602364 Indiana -KS 39.011902 -98.484246 Kansas -KY 37.839333 -84.270018 Kentucky -LA 31.244823 -92.145024 Louisiana -MA 42.407211 -71.382437 Massachusetts -MD 39.045755 -76.641271 Maryland -ME 45.253783 -69.445469 Maine -MI 44.314844 -85.602364 Michigan -MN 46.729553 -94.6859 Minnesota -MO 37.964253 -91.831833 Missouri -MS 32.354668 -89.398528 Mississippi -MT 46.879682 -110.362566 Montana -NC 35.759573 -79.0193 North Carolina -ND 47.551493 -101.002012 North Dakota -NE 41.492537 -99.901813 Nebraska -NH 43.193852 -71.572395 New Hampshire -NJ 40.058324 -74.405661 New Jersey -NM 34.97273 -105.032363 New Mexico -NV 38.80261 -116.419389 Nevada -NY 43.299428 -74.217933 New York -OH 40.417287 -82.907123 Ohio -OK 35.007752 -97.092877 Oklahoma -OR 43.804133 -120.554201 Oregon -PA 41.203322 -77.194525 Pennsylvania -PR 18.220833 -66.590149 Puerto Rico -RI 41.580095 -71.477429 Rhode Island -SC 33.836081 -81.163725 South Carolina -SD 43.969515 -99.901813 South Dakota -TN 35.517491 -86.580447 Tennessee -TX 31.968599 -99.901813 Texas -UT 39.32098 -111.093731 Utah -VA 37.431573 -78.656894 Virginia -VT 44.558803 -72.577841 Vermont -WA 47.751074 -120.740139 Washington -WI 43.78444 -88.787868 Wisconsin -WV 38.597626 -80.454903 West Virginia -WY 43.075968 -107.290284 Wyoming diff --git a/README.md b/README.md index 83c465fd..3f5c5b1b 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # covidcast -Public-facing documents and tools supporting Delphi's -[COVIDcast](https://covidcast.cmu.edu) effort. +Public-facing tools supporting [Delphi's](https://delphi.cmu.edu) +[COVIDcast](https://covidcast.cmu.edu) effort. 
## API clients @@ -9,8 +9,8 @@ This repository includes two clients for the [COVIDcast API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), which provides access to Delphi's COVID-19 indicators and related data: -* R: [covidcast](https://cmu-delphi.github.io/covidcast/covidcastR/) -* Python: [covidcast](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) +- R: [covidcast](https://cmu-delphi.github.io/covidcast/covidcastR/) +- Python: [covidcast](https://cmu-delphi.github.io/covidcast/covidcast-py/html/) ## R notebooks @@ -24,5 +24,18 @@ our signals: This repository also includes some talks we've given about the COVIDcast project: -- [COVIDcast intro, our API](https://cmu-delphi.github.io/covidcast/talks/intro-api/talk.html) -- [Our Surveys Through Facebook](https://cmu-delphi.github.io/covidcast/talks/fb-survey/talk.html) +- [Intro to COVIDcast and API](https://cmu-delphi.github.io/covidcast/talks/intro-api/talk.html) +- [Survey Through Facebook](https://cmu-delphi.github.io/covidcast/talks/fb-survey/talk.html) +- [Medical Claims Indicators](https://docs.google.com/presentation/d/1Pt2qMwIHyyuyGwwigZyndPGjcjILAS6RYxBcXKuuQ4U/edit?usp=sharing) +- [Forecast Evaluation Toolkit](https://cmu-delphi.github.io/covidcast/talks/evalcast/talk.html) + +## Related repos + +- [delphi-epidata](https://github.com/cmu-delphi/delphi-epidata/): Back end for + Delphi's Epidata API +- [covidcast-indicators](https://github.com/cmu-delphi/covidcast-indicators/): + Back end for Delphi's COVID indicators +- [www-covidcast](https://github.com/cmu-delphi/www-covidcast/): Front end for + Delphi's [COVIDcast map](https://covidcast.cmu.edu) +- [covid-19-forecast](https://github.com/cmu-delphi/covid-19-forecast/): Public + repo for Delphi's COVID forecasters diff --git a/docs/covidcastR/404.html b/docs/covidcastR/404.html index 71821b72..dc2f8b07 100644 --- a/docs/covidcastR/404.html +++ b/docs/covidcastR/404.html @@ -151,7 +151,7 @@

Contents

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/LICENSE-text.html b/docs/covidcastR/LICENSE-text.html index 327cbb44..f8876cec 100644 --- a/docs/covidcastR/LICENSE-text.html +++ b/docs/covidcastR/LICENSE-text.html @@ -153,7 +153,7 @@

Contents

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/articles/correlation-utils.html b/docs/covidcastR/articles/correlation-utils.html index ebd24fe4..3df20129 100644 --- a/docs/covidcastR/articles/correlation-utils.html +++ b/docs/covidcastR/articles/correlation-utils.html @@ -85,7 +85,7 @@ -
+
-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-3-1.png b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-3-1.png index 0eb53e5e..eb499dc6 100644 Binary files a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-3-1.png and b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-4-1.png b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-4-1.png index 34aa389d..35e63bbb 100644 Binary files a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-4-1.png and b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-5-1.png b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-5-1.png index 49b51502..87580830 100644 Binary files a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-5-1.png and b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-6-1.png b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-6-1.png index 45f6b0ad..831ce8d6 100644 Binary files a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-6-1.png and b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-6-1.png differ diff --git a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-7-1.png b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-7-1.png index f946d64f..ddddc165 100644 Binary files a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-7-1.png and b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-8-1.png b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-8-1.png index 3f5f22be..44169e17 100644 Binary files a/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-8-1.png and b/docs/covidcastR/articles/correlation-utils_files/figure-html/unnamed-chunk-8-1.png differ diff --git a/docs/covidcastR/articles/covidcast.html b/docs/covidcastR/articles/covidcast.html index 8aad6405..3ca314b4 100644 --- a/docs/covidcastR/articles/covidcast.html +++ b/docs/covidcastR/articles/covidcast.html @@ -85,7 +85,7 @@ -
+

Basic examples

For full usage information, see the function documentation.

To obtain smoothed estimates of COVID-like illness from our Facebook survey for every county in the United States between 2020-05-01 and 2020-05-07, we can use covidcast_signal():

-
library(covidcast)
+
+library(covidcast)
 
-cli <- suppressMessages(
-  covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli",
-                   start_day = "2020-05-01", end_day = "2020-05-07",
-                   geo_type = "county")
-)
-head(cli)
+cli <- suppressMessages( + covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli", + start_day = "2020-05-01", end_day = "2020-05-07", + geo_type = "county") +) +head(cli)
##   geo_value       signal time_value direction      issue lag     value
 ## 1     01000 smoothed_cli 2020-05-01        NA 2020-09-03 125 0.8254101
 ## 2     01001 smoothed_cli 2020-05-01        NA 2020-09-03 125 1.2994255
@@ -133,10 +141,11 @@ 

## 4 0.5485655 122.5577 fb-survey ## 5 0.3608268 114.8318 fb-survey ## 6 0.7086324 110.6544 fb-survey

-

Each row represents one observation in one county on one day. The county FIPS code is given in the geo_value column, the date in the time_value column. Here value is the requested signal—in this case, the smoothed estimate of the percentage of people with COVID-like illness, based on the symptom surveys, and stderr is its standard error. See the covidcast_signal() documentation for details on the returned data frame.

+

Each row represents one observation in one county on one day. The county FIPS code is given in the geo_value column, the date in the time_value column. Here value is the requested signal---in this case, the smoothed estimate of the percentage of people with COVID-like illness, based on the symptom surveys, and stderr is its standard error. See the covidcast_signal() documentation for details on the returned data frame.

Notice the use of suppressMessages() to hide progress reporting from the function as it downloads the data; if you download particularly large amounts of data, you may prefer to allow the progress reporting so you know how long to wait.

To get a basic summary of the returned data frame:

-
summary(cli)
+
+summary(cli)
## A `covidcast_signal` data frame with 7080 rows and 10 columns.
 ## 
 ## data_source : fb-survey
@@ -147,12 +156,13 @@ 

## last date : 2020-05-07 ## median number of geo_values per day : 1021

To request estimates for states instead of counties:

-
cli <- suppressMessages(
-  covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli",
-                   start_day = "2020-05-01", end_day = "2020-05-07",
-                   geo_type = "state")
-)
-head(cli)
+
+cli <- suppressMessages(
+  covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli", 
+                   start_day = "2020-05-01", end_day = "2020-05-07", 
+                   geo_type = "state")
+)
+head(cli)
##   geo_value       signal time_value direction      issue lag     value
 ## 1        ak smoothed_cli 2020-05-01        NA 2020-09-03 125 0.4607721
 ## 2        al smoothed_cli 2020-05-01        NA 2020-09-03 125 0.6994761
@@ -168,12 +178,13 @@ 

## 5 0.0228051 51870.138 fb-survey ## 6 0.0684792 10105.894 fb-survey

One can also select a specific geographic region by its ID. For example, this is the FIPS code for Allegheny County, Pennsylvania:

-
cli <- suppressMessages(
-  covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli",
-                   start_day = "2020-05-01", end_day = "2020-05-07",
-                   geo_type = "county", geo_value = "42003")
-)
-head(cli)
+
+cli <- suppressMessages(
+  covidcast_signal(data_source = "fb-survey", signal = "smoothed_cli", 
+                   start_day = "2020-05-01", end_day = "2020-05-07", 
+                   geo_type = "county", geo_value = "42003")
+)
+head(cli)
##   geo_value       signal time_value direction      issue lag     value
 ## 1     42003 smoothed_cli 2020-05-01        NA 2020-09-03 125 0.3990346
 ## 2     42003 smoothed_cli 2020-05-02        NA 2020-09-03 124 0.3804239
@@ -194,33 +205,39 @@ 

Finding counties and metro areas

The COVIDcast API identifies counties by their 5-digit FIPS code and metropolitan areas by their CBSA ID number. This means that to query a specific county or metropolitan area, we must have some way to quickly find its identifier.

This package includes several utilities intended to make the process easier. For example, if we look at ?county_census, we find that the package provides census data on every county in the United States, including its FIPS code. To find Allegheny County, we could search the data frame however is most convenient to us:

-
library(dplyr)
+
+library(dplyr)
 
-county_census %>%
-  filter(CTYNAME == "Allegheny County") %>%
-  select(FIPS, CTYNAME, STNAME, POPESTIMATE2019)
+county_census %>% + filter(CTYNAME == "Allegheny County") %>% + select(FIPS, CTYNAME, STNAME, POPESTIMATE2019)
##    FIPS          CTYNAME       STNAME POPESTIMATE2019
 ## 1 42003 Allegheny County Pennsylvania         1216045

Hence we see that 42003 is the code to provide to geo_value to select this county. We also see we can obtain a 2019 population estimate, which could be useful in your analyses to normalize or scale values by population.

Similarly, to find the Pittsburgh metropolitan area, we can use the ?msa_census data provided in the package:

-
msa_census %>%
-  filter(startsWith(NAME, "Pittsburgh")) %>%
-  select(CBSA, NAME, LSAD, POPESTIMATE2019)
+
+msa_census %>%
+  filter(startsWith(NAME, "Pittsburgh")) %>%
+  select(CBSA, NAME, LSAD, POPESTIMATE2019)
##    CBSA           NAME                          LSAD POPESTIMATE2019
 ## 1 38300 Pittsburgh, PA Metropolitan Statistical Area         2317600

We can see that the Pittsburgh metropolitan area has CBSA ID 38300, and we can also get its 2019 census-estimated population. We could pass this ID to covidcast_signal() when using geo_type = 'msa'. (Note: the msa_census data includes types of area beyond metropolitan statistical areas, including micropolitan statistical areas. The LSAD column identifies the type of each area. The COVIDcast API only provides estimates for metropolitan statistical areas, not for their divisions or for micropolitan areas.)

Beyond this, the package provides convenience functions name_to_fips() and name_to_cbsa() for grep()-based searching of county or metropolitan area names in order to find FIPS or CBSA codes, respectively:

-
name_to_fips("Allegheny")
+
+name_to_fips("Allegheny")
## Allegheny County 
 ##          "42003"
-
name_to_cbsa("Pittsburgh")
+
+name_to_cbsa("Pittsburgh")
## Pittsburgh, PA 
 ##        "38300"

The package also provides inverse mappings fips_to_name() and cbsa_to_name() that work in the analogous way:

-
fips_to_name("42003")
+
+fips_to_name("42003")
##              42003 
 ## "Allegheny County"
-
cbsa_to_name("38300")
+
+cbsa_to_name("38300")
##            38300 
 ## "Pittsburgh, PA"

See their documentation for more details (for example, the options for handling matches when counties have the same name).

@@ -230,43 +247,46 @@

Signal metadata

If we are interested in exploring the available signals and their metadata, we can use covidcast_meta() to fetch a data frame of the available signals:

-
meta <- covidcast_meta()
-head(meta)
+
+meta <- covidcast_meta()
+head(meta)
##     data_source           signal time_type geo_type   min_time   max_time
-## 1 doctor-visits smoothed_adj_cli       day   county 2020-02-01 2020-09-15
-## 2 doctor-visits smoothed_adj_cli       day      hrr 2020-02-01 2020-09-15
-## 3 doctor-visits smoothed_adj_cli       day      msa 2020-02-01 2020-09-15
-## 4 doctor-visits smoothed_adj_cli       day    state 2020-02-01 2020-09-15
-## 5 doctor-visits     smoothed_cli       day   county 2020-02-01 2020-09-15
-## 6 doctor-visits     smoothed_cli       day      hrr 2020-02-01 2020-09-15
+## 1 doctor-visits smoothed_adj_cli       day   county 2020-02-01 2020-10-04
+## 2 doctor-visits smoothed_adj_cli       day      hrr 2020-02-01 2020-10-04
+## 3 doctor-visits smoothed_adj_cli       day      msa 2020-02-01 2020-10-04
+## 4 doctor-visits smoothed_adj_cli       day    state 2020-02-01 2020-10-04
+## 5 doctor-visits     smoothed_cli       day   county 2020-02-01 2020-10-04
+## 6 doctor-visits     smoothed_cli       day      hrr 2020-02-01 2020-10-04
 ##   num_locations min_value max_value mean_value stdev_value last_update
-## 1          2514         0  87.67083   2.573882    3.358055  1600470729
-## 2           306         0  47.95543   2.997670    3.132400  1600470729
-## 3           380         0  32.76515   2.696446    3.050960  1600470729
-## 4            52         0  32.79256   3.065387    2.743400  1600470729
-## 5          2517         0  76.56961   2.167686    2.872418  1600470729
-## 6           306         0  44.29551   2.823368    3.063776  1600470729
+## 1          2514         0  87.67083   2.694707    3.358173  1602113529
+## 2           306         0  47.95543   3.109883    3.095318  1602113529
+## 3           380         0  36.49523   2.813566    3.063314  1602113529
+## 4            52         0  32.44975   3.156312    2.694015  1602113529
+## 5          2517         0  76.56961   2.269245    2.869633  1602113529
+## 6           306         0  44.29551   2.926747    3.026804  1602113529
 ##    max_issue min_lag max_lag
-## 1 2020-09-18       2     129
-## 2 2020-09-18       3     129
-## 3 2020-09-18       3     129
-## 4 2020-09-18       3     129
-## 5 2020-09-18       2     129
-## 6 2020-09-18       3     129
+## 1 2020-10-07 2 129 +## 2 2020-10-07 3 129 +## 3 2020-10-07 3 129 +## 4 2020-10-07 3 129 +## 5 2020-10-07 2 129 +## 6 2020-10-07 3 129

The covidcast_meta() documentation describes the columns and their meanings. The metadata data frame can be filtered and sliced as desired to obtain information about signals of interest. To get a basic summary of the metadata:

-
summary(meta)
+
+summary(meta)

(We silenced the evaluation because the output of summary() here is still quite long.)

Tracking issues and updates

-

The COVIDcast API records not just each signal’s estimate for a given location on a given day, but also when that estimate was made, and all updates to that estimate.

+

The COVIDcast API records not just each signal's estimate for a given location on a given day, but also when that estimate was made, and all updates to that estimate.

For example, consider using our doctor visits signal, which estimates the percentage of outpatient doctor visits that are COVID-related, and consider a result row with time_value 2020-05-01 for geo_values = "pa". This is an estimate for the percentage in Pennsylvania on May 1, 2020. That estimate was issued on May 5, 2020, the delay being due to the aggregation of data by our source and the time taken by the COVIDcast API to ingest the data provided. Later, the estimate for May 1st could be updated, perhaps because additional visit data from May 1st arrived at our source and was reported to us. This constitutes a new issue of the data.

By default, covidcast_signal() fetches the most recent issue available. This is the best option for users who simply want to graph the latest data or construct dashboards. But if we are interested in knowing when data was reported, we can request specific data versions.

First, we can request the data that was available as of a specific date, using the as_of argument:

-
covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
-                 start_day = "2020-05-01", end_day = "2020-05-01",
-                 geo_type = "state", geo_values = "pa", as_of = "2020-05-07")
+
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-01",
+                 geo_type = "state", geo_values = "pa", as_of = "2020-05-07")
## A `covidcast_signal` data frame with 1 rows and 10 columns.
 ## 
 ## data_source : doctor-visits
@@ -277,10 +297,11 @@ 

## 1 pa smoothed_cli 2020-05-01 -1 2020-05-07 6 2.32192 NA ## sample_size data_source ## 1 NA doctor-visits

-

This shows that an estimate of about 2.3% was issued on May 7. If we don’t specify as_of, we get the most recent estimate available:

-
covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
-                 start_day = "2020-05-01", end_day = "2020-05-01",
-                 geo_type = "state", geo_values = "pa")
+

This shows that an estimate of about 2.3% was issued on May 7. If we don't specify as_of, we get the most recent estimate available:

+
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-01",
+                 geo_type = "state", geo_values = "pa")
## A `covidcast_signal` data frame with 1 rows and 10 columns.
 ## 
 ## data_source : doctor-visits
@@ -293,10 +314,11 @@ 

## 1 NA doctor-visits

Note the substantial change in the estimate, to over 5%, reflecting new data that became available after May 7 about visits occurring on May 1. This illustrates the importance of issue date tracking, particularly for forecasting tasks. To backtest a forecasting model on past data, it is important to use the data that would have been available at the time, not data that arrived much later.

By using the issues argument, we can request all issues in a certain time period:

-
covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
-                 start_day = "2020-05-01", end_day = "2020-05-01",
-                 geo_type = "state", geo_values = "pa",
-                 issues = c("2020-05-01", "2020-05-15"))
+
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-01",
+                 geo_type = "state", geo_values = "pa",
+                 issues = c("2020-05-01", "2020-05-15"))
## A `covidcast_signal` data frame with 7 rows and 10 columns.
 ## 
 ## data_source : doctor-visits
@@ -321,9 +343,10 @@ 

## 7 NA doctor-visits

This estimate was clearly updated many times as new data for May 1st arrived. Note that these results include only data issued or updated between 2020-05-01 and 2020-05-15. If a value was first reported on 2020-04-15, and never updated, a query for issues between 2020-05-01 and 2020-05-15 will not include that value among its results.

Finally, we can use the lag argument to request only data reported with a certain lag. For example, requesting a lag of 7 days means to request only issues 7 days after the corresponding time_value:

-
covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
-                 start_day = "2020-05-01", end_day = "2020-05-07",
-                 geo_type = "state", geo_values = "pa", lag = 7)
+
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
+                 start_day = "2020-05-01", end_day = "2020-05-07",
+                 geo_type = "state", geo_values = "pa", lag = 7)
## Warning: Fetching smoothed_cli from doctor-visits for 20200503 in geography
 ## 'pa': no results
## Warning: Fetching smoothed_cli from doctor-visits for 20200504 in geography
@@ -347,10 +370,11 @@ 

## 4 NA doctor-visits ## 5 NA doctor-visits

Note that though this query requested all values between 2020-05-01 and 2020-05-07, May 3rd and May 4th were not included in the results set. This is because the query will only include a result for May 3rd if a value were issued on May 10th (a 7-day lag), but in fact the value was not updated on that day:

-
covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
-                 start_day = "2020-05-03", end_day = "2020-05-03",
-                 geo_type = "state", geo_values = "pa",
-                 issues = c("2020-05-09", "2020-05-15"))
+
+covidcast_signal(data_source = "doctor-visits", signal = "smoothed_cli",
+                 start_day = "2020-05-03", end_day = "2020-05-03",
+                 geo_type = "state", geo_values = "pa",
+                 issues = c("2020-05-09", "2020-05-15"))
## A `covidcast_signal` data frame with 5 rows and 10 columns.
 ## 
 ## data_source : doctor-visits
@@ -387,7 +411,7 @@ 

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/articles/index.html b/docs/covidcastR/articles/index.html index 175be19d..56600e5d 100644 --- a/docs/covidcastR/articles/index.html +++ b/docs/covidcastR/articles/index.html @@ -154,7 +154,7 @@

All vignettes

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/articles/plotting-signals.html b/docs/covidcastR/articles/plotting-signals.html index 8a1fdedf..c5226ef5 100644 --- a/docs/covidcastR/articles/plotting-signals.html +++ b/docs/covidcastR/articles/plotting-signals.html @@ -85,7 +85,7 @@ -
+

Bubble maps

As an alternative to choropleth maps, we can also quickly plot bubble maps. By default, bubble maps have 8 bubble size bins evenly spaced over the range, where zero always means zero bubble size. The legend shows all bins, interpreted as each bubble size meaning at least the corresponding value.

-
plot(df_inum, plot_type = "bubble")
+
+plot(df_inum, plot_type = "bubble")

As before, we can of course set customized breaks. As values to the left of the first bin do not get drawn, this map is much sparser, and highlights areas with larger case counts.

-
plot(df_inum, plot_type = "bubble", bubble_params = list(breaks = seq(20, 200, len = 6)))
+
+plot(df_inum, plot_type = "bubble", 
+     bubble_params = list(breaks = seq(20, 200, len = 6)))

-

As a final example, suppose we want to plot only counties in the state of Texas. We’d like to compare counts per 100,000 against absolute counts, so we fetch the proportion signal:

-
df_iprop <- suppressMessages(
-  covidcast_signal(data_source = "jhu-csse",
-                   signal = "confirmed_7dav_incidence_prop",
-                   start_day = "2020-07-01", end_day = "2020-07-14")
-)
+

As a final example, suppose we want to plot only counties in the state of Texas. We'd like to compare counts per 100,000 against absolute counts, so we fetch the proportion signal:

+
+df_iprop <- suppressMessages(
+  covidcast_signal(data_source = "jhu-csse",
+                   signal = "confirmed_7dav_incidence_prop",
+                   start_day = "2020-07-01", end_day = "2020-07-14")
+)

Then we make two maps side-by-side with custom ranges:

-
library(gridExtra)
+
+library(gridExtra)
 
-breaks1 <- c(0, 1, 10, 100, 1000)
-breaks2 <- c(0, 10, 50, 100, 500)
+breaks1 <- c(1, 10, 100, 1000)
+breaks2 <- c(10, 50, 100, 500)
 
-p1 <- plot(df_inum, plot_type = "bubble", bubble_params = list(breaks = breaks1, max_size = 6),
-           include = "TX", bubble_col = "red",
-           title = paste("Incidence Number on", max(df_inum$time_value)))
-p2 <- plot(df_iprop, plot_type = "bubble", bubble_params = list(breaks = breaks2, max_size = 6),
-           include = "TX", bubble_col = "red",
-           title = paste("Incidence Proportion on", max(df_iprop$time_value)))
+p1 <- plot(df_inum, plot_type = "bubble", 
+           bubble_params = list(breaks = breaks1, max_size = 6),
+           include = "TX", bubble_col = "red",
+           title = paste("Incidence number on", max(df_inum$time_value)))
+p2 <- plot(df_iprop, plot_type = "bubble", 
+           bubble_params = list(breaks = breaks2, max_size = 6),
+           include = "TX", bubble_col = "red",
+           title = paste("Incidence rate on", max(df_iprop$time_value)))
 
-grid.arrange(p1, p2, nrow = 1)
+grid.arrange(p1, p2, nrow = 1)

Time series plots

-

Let’s fetch the combination indicator and case counts, but for all states rather than for all counties. This will make the time series plots more manageable.

-
suppressMessages({
-  df_comb_st <- covidcast_signal(data_source = "indicator-combination",
-                                 signal = "nmf_day_doc_fbc_fbs_ght",
-                                 start_day = "2020-04-15", end_day = "2020-07-01",
-                                 geo_type = "state")
-  df_inum_st <- covidcast_signal(data_source = "jhu-csse",
-                                 signal = "confirmed_7dav_incidence_num",
-                                 start_day = "2020-04-15", end_day = "2020-07-01",
-                                 geo_type = "state")
-})
-

By default, time series plots show all available data, including all geographies. A line for every state would be unmanageable, so let’s select a few states and plot all data for them:

-
library(dplyr)
+

Let's fetch the combination indicator and case counts, but for all states rather than for all counties. This will make the time series plots more manageable.

+
+suppressMessages({
+df_comb_st <- covidcast_signal(data_source = "indicator-combination",
+                               signal = "nmf_day_doc_fbc_fbs_ght",
+                               start_day = "2020-04-15", end_day = "2020-07-01",
+                               geo_type = "state")
+df_inum_st <- covidcast_signal(data_source = "jhu-csse",
+                               signal = "confirmed_7dav_incidence_num",
+                               start_day = "2020-04-15", end_day = "2020-07-01",
+                               geo_type = "state")
+})
+

By default, time series plots show all available data, including all geographies. A line for every state would be unmanageable, so let's select a few states and plot all data for them:

+
+library(dplyr)
 
-states <- c("ca", "pa", "tx", "ny")
-plot(df_comb_st %>% filter(geo_value %in% states), plot_type = "line")
+states <- c("ca", "pa", "tx", "ny") +plot(df_comb_st %>% filter(geo_value %in% states), plot_type = "line")

-
plot(df_inum_st %>% filter(geo_value %in% states), plot_type = "line")
+
+plot(df_inum_st %>% filter(geo_value %in% states), plot_type = "line")

Notice how in Texas, the combined indicator rose several weeks in advance of confirmed cases, suggesting the signal could be predictive. Delphi is investigating these signals for their usefulness in forecasting, as well as hotspot detection and will publish results when they are available.

@@ -228,50 +245,51 @@

Manual plotting

Using ggplot2 or your favorite plotting package, we can easily plot time series manually, without using the plot.covidcast_signal() method. You can use this to customize the appearance of your plots however you choose.

For example:

-
library(ggplot2)
+
+library(ggplot2)
 
-suppressMessages({
-  df_comb_md <- covidcast_signal(data_source = "indicator-combination",
-                                 signal = "nmf_day_doc_fbc_fbs_ght",
-                                 start_day = "2020-06-01", end_day = "2020-07-15",
-                                 geo_values = name_to_fips("Miami-Dade"))
-  df_inum_md <- covidcast_signal(data_source = "jhu-csse",
-                                 signal = "confirmed_7dav_incidence_num",
-                                 start_day = "2020-06-01", end_day = "2020-07-15",
-                                 geo_values = name_to_fips("Miami-Dade"))
-})
+suppressMessages({
+df_comb_md <- covidcast_signal(data_source = "indicator-combination",
+                               signal = "nmf_day_doc_fbc_fbs_ght",
+                               start_day = "2020-06-01", end_day = "2020-07-15",
+                               geo_values = name_to_fips("Miami-Dade"))
+df_inum_md <- covidcast_signal(data_source = "jhu-csse",
+                               signal = "confirmed_7dav_incidence_num",
+                               start_day = "2020-06-01", end_day = "2020-07-15",
+                               geo_values = name_to_fips("Miami-Dade"))
+})
 
 # Compute the ranges of the two signals
-range1 <- df_inum_md %>% select("value") %>% range
-range2 <- df_comb_md %>% select("value") %>% range
+range1 <- df_inum_md %>% select("value") %>% range
+range2 <- df_comb_md %>% select("value") %>% range
 
 # Function to transform from one range to another
-trans <- function(x, from_range, to_range) {
-  (x - from_range[1]) / (from_range[2] - from_range[1]) *
-    (to_range[2] - to_range[1]) + to_range[1]
-}
+trans <- function(x, from_range, to_range) {
+  (x - from_range[1]) / (from_range[2] - from_range[1]) *
+    (to_range[2] - to_range[1]) + to_range[1]
+}
 
 # Convenience functions for our two signal ranges
-trans12 <- function(x) trans(x, range1, range2)
-trans21 <- function(x) trans(x, range2, range1)
+trans12 <- function(x) trans(x, range1, range2)
+trans21 <- function(x) trans(x, range2, range1)
 
 # Transform the combined signal to the incidence range, then stack
 # these rowwise into one data frame
-df <- select(rbind(df_comb_md %>% mutate_at("value", trans21),
-                   df_inum_md), c("time_value", "value"))
-df$signal <- c(rep("Combined indicator", nrow(df_comb_md)),
-               rep("New COVID-19 cases", nrow(df_inum_md)))
+df <- select(rbind(df_comb_md %>% mutate_at("value", trans21),
+                   df_inum_md), c("time_value", "value"))
+df$signal <- c(rep("Combined indicator", nrow(df_comb_md)),
+               rep("New COVID-19 cases", nrow(df_inum_md)))   
 
 # Finally, plot both signals
-ggplot(df, aes(x = time_value, y = value)) +
-  labs(x = "Date", title = "Miami-Dade County") +
-  geom_line(aes(color = signal)) +
-  scale_y_continuous(
-    name = "New COVID-19 cases (7-day trailing avg)",
-    sec.axis = sec_axis(trans12, name = "Combination of COVID-19 indicators")
-  ) +
-  theme(legend.position = "bottom",
-        legend.title = ggplot2::element_blank())
+ggplot(df, aes(x = time_value, y = value)) + + labs(x = "Date", title = "Miami-Dade County") + + geom_line(aes(color = signal)) + + scale_y_continuous( + name = "New COVID-19 cases (7-day trailing average)", + sec.axis = sec_axis(trans12, name = "Combination of COVID-19 indicators") + ) + + theme(legend.position = "bottom", + legend.title = ggplot2::element_blank())

Again, we see that the combined indicator starts rising several days before the new COVID-19 cases do, an exciting phenomenon that Delphi is studying now.

@@ -292,7 +310,7 @@

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-11-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-11-1.png index 5ebb2209..077465b4 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-11-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-11-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-1.png index 4fc0d50b..50336cdf 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-2.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-2.png index 0f837be0..c698920e 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-2.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-14-2.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-15-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-15-1.png index 35172964..2d367fdb 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-15-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-15-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-3-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-3-1.png index c56ecb5d..6f119238 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-3-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-3-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-4-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-4-1.png index 8ecb964b..d58cdf91 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-4-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-4-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-5-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-5-1.png index 1d720545..18c3b3a6 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-5-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-5-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-6-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-6-1.png index 447ef6cf..33c3e8d6 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-6-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-6-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-7-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-7-1.png index bc9a4749..f03a85c7 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-7-1.png and 
b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-7-1.png differ diff --git a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-8-1.png b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-8-1.png index 0fafa2ff..2d92b5e6 100644 Binary files a/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-8-1.png and b/docs/covidcastR/articles/plotting-signals_files/figure-html/unnamed-chunk-8-1.png differ diff --git a/docs/covidcastR/authors.html b/docs/covidcastR/authors.html index 6cfaa941..80e673e0 100644 --- a/docs/covidcastR/authors.html +++ b/docs/covidcastR/authors.html @@ -170,7 +170,7 @@

Authors

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/index.html b/docs/covidcastR/index.html index 20d4d4e2..cb2d8a54 100644 --- a/docs/covidcastR/index.html +++ b/docs/covidcastR/index.html @@ -142,7 +142,7 @@

Developers

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/news/index.html b/docs/covidcastR/news/index.html index e85342f1..54c51336 100644 --- a/docs/covidcastR/news/index.html +++ b/docs/covidcastR/news/index.html @@ -132,9 +132,9 @@

Changelog

Source: NEWS.md
-
+

-covidcast 0.3.0

+covidcast 0.3.0

Released August 22, 2020.

@@ -147,9 +147,9 @@

-
+

-covidcast 0.2.0

+covidcast 0.2.0

Released July 26, 2020.

@@ -169,9 +169,9 @@

-
+

-covidcast 0.1.0

+covidcast 0.1.0
  • First major release.
@@ -193,7 +193,7 @@

Contents

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/pkgdown.css b/docs/covidcastR/pkgdown.css index c01e5923..1273238d 100644 --- a/docs/covidcastR/pkgdown.css +++ b/docs/covidcastR/pkgdown.css @@ -244,14 +244,14 @@ nav[data-toggle='toc'] .nav .nav > .active:focus > a { .ref-index th {font-weight: normal;} -.ref-index td {vertical-align: top;} +.ref-index td {vertical-align: top; min-width: 100px} .ref-index .icon {width: 40px;} .ref-index .alias {width: 40%;} .ref-index-icons .alias {width: calc(40% - 40px);} .ref-index .title {width: 60%;} .ref-arguments th {text-align: right; padding-right: 10px;} -.ref-arguments th, .ref-arguments td {vertical-align: top;} +.ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px} .ref-arguments .name {width: 20%;} .ref-arguments .desc {width: 80%;} diff --git a/docs/covidcastR/pkgdown.yml b/docs/covidcastR/pkgdown.yml index 559834ec..eac41722 100644 --- a/docs/covidcastR/pkgdown.yml +++ b/docs/covidcastR/pkgdown.yml @@ -1,9 +1,9 @@ -pandoc: 2.7.3 -pkgdown: 1.5.1 +pandoc: 1.19.2.1 +pkgdown: 1.6.1 pkgdown_sha: ~ articles: correlation-utils: correlation-utils.html covidcast: covidcast.html plotting-signals: plotting-signals.html -last_built: 2020-09-20T02:10Z +last_built: 2020-10-08T21:55Z diff --git a/docs/covidcastR/reference/Rplot001.png b/docs/covidcastR/reference/Rplot001.png new file mode 100644 index 00000000..17a35806 Binary files /dev/null and b/docs/covidcastR/reference/Rplot001.png differ diff --git a/docs/covidcastR/reference/abbr_to_name.html b/docs/covidcastR/reference/abbr_to_name.html index c3c58a6c..22a8b3d2 100644 --- a/docs/covidcastR/reference/abbr_to_name.html +++ b/docs/covidcastR/reference/abbr_to_name.html @@ -142,13 +142,13 @@

Get state names from state abbreviations

regular expressions.

-
abbr_to_name(
-  abbr,
-  ignore.case = FALSE,
-  perl = FALSE,
-  fixed = FALSE,
-  ties_method = c("first", "all")
-)
+
abbr_to_name(
+  abbr,
+  ignore.case = FALSE,
+  perl = FALSE,
+  fixed = FALSE,
+  ties_method = c("first", "all")
+)

Arguments

@@ -180,8 +180,10 @@

See a

Examples

-
abbr_to_name("PA")
#> PA -#> "Pennsylvania"
abbr_to_name(c("PA", "PR", "DC"))
#> PA PR +
abbr_to_name("PA") +
#> PA +#> "Pennsylvania"
abbr_to_name(c("PA", "PR", "DC")) +
#> PA PR #> "Pennsylvania" "Puerto Rico Commonwealth" #> DC #> "District of Columbia"
@@ -201,7 +203,7 @@

Contents

-

Site built with pkgdown 1.5.1.

+

Site built with pkgdown 1.6.1.

diff --git a/docs/covidcastR/reference/county_census.html b/docs/covidcastR/reference/county_census.html
index 6fc0443c..d15e84d3 100644
--- a/docs/covidcastR/reference/county_census.html
+++ b/docs/covidcastR/reference/county_census.html
@@ -138,14 +138,14 @@
[regenerated HTML for the county_census data page (2019 US Census county populations, 3193 rows): usage block and format list re-rendered with pkgdown 1.6.1 markup; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/county_geo.html b/docs/covidcastR/reference/county_geo.html
index 3616c98f..38d73d6f 100644
--- a/docs/covidcastR/reference/county_geo.html
+++ b/docs/covidcastR/reference/county_geo.html
@@ -140,13 +140,13 @@
[regenerated HTML for the county_geo data page (county latitudes and longitudes, 3331 rows): usage block and format list re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/covidcast_cor.html b/docs/covidcastR/reference/covidcast_cor.html
index ce3881ee..63cc5ee7 100644
--- a/docs/covidcastR/reference/covidcast_cor.html
+++ b/docs/covidcastR/reference/covidcast_cor.html
@@ -6,7 +6,7 @@
-Compute correlations between two <code>covidcast_signal</code> data frames — covidcast_cor • covidcast
+Compute correlations between two covidcast_signal data frames — covidcast_cor • covidcast
[remaining hunks are regenerated HTML: the covidcast_cor(x, y, dt_x = 0, dt_y = 0, by = c("geo_value", "time_value"), use = "na.or.complete", method = c("pearson", "kendall", "spearman")) usage block re-rendered with pkgdown 1.6.1 markup; footer updated to pkgdown 1.6.1]
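For context, the covidcast_cor() signature shown above is unchanged in substance; only its rendering changed. A minimal usage sketch follows, assuming the covidcast package is installed and the API is reachable; the specific source/signal pairs are illustrative and not taken from this diff.

```r
library(covidcast)

# Fetch two signals over the same period (source/signal names illustrative).
cases <- covidcast_signal("jhu-csse", "confirmed_7dav_incidence_prop",
                          start_day = "2020-06-01", end_day = "2020-08-01")
cli   <- covidcast_signal("fb-survey", "raw_cli",
                          start_day = "2020-06-01", end_day = "2020-08-01")

# One correlation per day, computed across locations ...
cor_by_time <- covidcast_cor(cases, cli, by = "time_value")
# ... or one per location, computed across days.
cor_by_geo  <- covidcast_cor(cases, cli, by = "geo_value")

# dt_x / dt_y shift a signal in time before correlating; see the page's
# argument documentation for the exact sign convention.
cor_lagged  <- covidcast_cor(cases, cli, dt_x = 10, by = "geo_value")
```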

diff --git a/docs/covidcastR/reference/covidcast_meta.html b/docs/covidcastR/reference/covidcast_meta.html
index f60f2c08..3b92258f 100644
--- a/docs/covidcastR/reference/covidcast_meta.html
+++ b/docs/covidcastR/reference/covidcast_meta.html
@@ -140,7 +140,7 @@
[regenerated HTML for the covidcast_meta() reference page: usage block re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/covidcast_signal.html b/docs/covidcastR/reference/covidcast_signal.html
index 4b2045fe..01d2f111 100644
--- a/docs/covidcastR/reference/covidcast_signal.html
+++ b/docs/covidcastR/reference/covidcast_signal.html
@@ -150,17 +150,17 @@
[regenerated HTML for the covidcast_signal() reference page: the usage block (data_source, signal, start_day, end_day, geo_type, geo_values, as_of, issues, lag), the geo_values argument description, and the examples block (fetching counties, states, and metro areas for "fb-survey", "raw_cli") re-rendered with pkgdown 1.6.1 markup; footer updated to pkgdown 1.6.1]
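The examples on the regenerated page above are unchanged in substance. For readability, here is a cleaned-up restatement of the documented call pattern, as a sketch assuming the package is installed and the API is reachable.

```r
library(covidcast)

# All counties, 2020-05-10 through 2020-05-12.
cli_counties <- covidcast_signal("fb-survey", "raw_cli",
                                 start_day = "2020-05-10",
                                 end_day = "2020-05-12")

# The same signal at the state level, restricted to Pennsylvania and New Jersey.
cli_pa_nj <- covidcast_signal("fb-survey", "raw_cli", geo_type = "state",
                              geo_values = c("pa", "nj"))

# The Pittsburgh metropolitan area, looked up by name.
cli_pgh <- covidcast_signal("fb-survey", "raw_cli", geo_type = "msa",
                            geo_values = name_to_cbsa("Pittsburgh"))
```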

diff --git a/docs/covidcastR/reference/earliest_issue.html b/docs/covidcastR/reference/earliest_issue.html
index 0734e140..cdff014f 100644
--- a/docs/covidcastR/reference/earliest_issue.html
+++ b/docs/covidcastR/reference/earliest_issue.html
@@ -142,7 +142,7 @@
[regenerated HTML for the earliest_issue(df) reference page: usage block re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/fips_to_name.html b/docs/covidcastR/reference/fips_to_name.html
index ea7aca7d..670bdd38 100644
--- a/docs/covidcastR/reference/fips_to_name.html
+++ b/docs/covidcastR/reference/fips_to_name.html
@@ -142,21 +142,21 @@
[regenerated HTML for the fips_to_name()/cbsa_to_name() reference page: usage blocks and example output (fips_to_name("42003"), cbsa_to_name("38300"), the ties_method = "all" examples, and the per-state county count example) re-rendered with pkgdown 1.6.1 markup; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/index.html b/docs/covidcastR/reference/index.html
index 76a5141d..5614cc00 100644
--- a/docs/covidcastR/reference/index.html
+++ b/docs/covidcastR/reference/index.html
@@ -272,7 +272,7 @@
[regenerated HTML for the function reference index: footer updated from pkgdown 1.5.1 to 1.6.1]
diff --git a/docs/covidcastR/reference/latest_issue.html b/docs/covidcastR/reference/latest_issue.html
index 1d00fff5..0e1f05c1 100644
--- a/docs/covidcastR/reference/latest_issue.html
+++ b/docs/covidcastR/reference/latest_issue.html
@@ -142,7 +142,7 @@
[regenerated HTML for the latest_issue(df) reference page: usage block re-rendered; footer updated to pkgdown 1.6.1]
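earliest_issue() and latest_issue() (this page and the earliest_issue page above) each reduce a covidcast_signal data frame to one row per observation. A sketch of how they fit together, assuming the issues argument is given a start/end pair to request multiple data revisions; the source, signal, and date ranges here are illustrative and not taken from this diff.

```r
library(covidcast)

# Request a window of issue dates so each observation can appear once per
# revision (assumption: this signal was revised during the window).
df <- covidcast_signal("jhu-csse", "confirmed_incidence_num",
                       start_day = "2020-07-01", end_day = "2020-07-14",
                       geo_type = "state",
                       issues = as.Date(c("2020-07-15", "2020-09-01")))

first_reported <- earliest_issue(df)  # the value as first published
most_recent    <- latest_issue(df)    # the value after all revisions
```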

diff --git a/docs/covidcastR/reference/msa_census.html b/docs/covidcastR/reference/msa_census.html
index dabbfb42..f32005f3 100644
--- a/docs/covidcastR/reference/msa_census.html
+++ b/docs/covidcastR/reference/msa_census.html
@@ -144,7 +144,7 @@
[regenerated HTML for the msa_census data page (metro area population data, keyed by CBSA code): usage block and format list re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/name_to_abbr.html b/docs/covidcastR/reference/name_to_abbr.html
index f0ba33d1..660efa99 100644
--- a/docs/covidcastR/reference/name_to_abbr.html
+++ b/docs/covidcastR/reference/name_to_abbr.html
@@ -142,13 +142,13 @@
[regenerated HTML for the name_to_abbr() reference page: usage block and example output (name_to_abbr("Penn"), name_to_abbr(c("Penn", "New"), ties_method = "all")) re-rendered with pkgdown 1.6.1 markup; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/name_to_fips.html b/docs/covidcastR/reference/name_to_fips.html
index d3657455..ad283dea 100644
--- a/docs/covidcastR/reference/name_to_fips.html
+++ b/docs/covidcastR/reference/name_to_fips.html
@@ -142,23 +142,23 @@
[regenerated HTML for the name_to_fips()/name_to_cbsa() reference page: usage blocks and example output (name_to_fips("Allegheny"), name_to_cbsa("Pittsburgh"), name_to_fips("Miami", ties_method = "all")) re-rendered with pkgdown 1.6.1 markup; footer updated to pkgdown 1.6.1]
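Taken together with fips_to_name()/cbsa_to_name() above, these helpers map in both directions between names and FIPS/CBSA codes. The calls below restate the pages' own examples in plain form, assuming the covidcast package is installed.

```r
library(covidcast)

name_to_fips("Allegheny")    #> Allegheny County: "42003"
name_to_cbsa("Pittsburgh")   #> Pittsburgh, PA:   "38300"
fips_to_name("42003")        #> "Allegheny County"
cbsa_to_name("38300")        #> "Pittsburgh, PA"

# Ambiguous names: return every match rather than just the first.
name_to_fips("Miami", ties_method = "all")
```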

diff --git a/docs/covidcastR/reference/plot.covidcast_signal.html b/docs/covidcastR/reference/plot.covidcast_signal.html
index 9161b622..3027076c 100644
--- a/docs/covidcastR/reference/plot.covidcast_signal.html
+++ b/docs/covidcastR/reference/plot.covidcast_signal.html
@@ -6,7 +6,7 @@
-Plot <code>covidcast_signal</code> objects — plot.covidcast_signal • covidcast
+Plot covidcast_signal objects — plot.covidcast_signal • covidcast
[remaining hunks are regenerated HTML for the plot.covidcast_signal() reference page, carrying several substantive documentation changes:
 - the usage block and the argument table no longer list line_col and line_type;
 - the range description now states that when NULL, the map min and max default to the metadata mean +/- 3 standard deviations for the given data source and signal, while the time series plot defaults to the observed min and max over the given time period;
 - num_bins clarifies that bubble size means bubble area, and its documented default changes from 6 to 8;
 - legend_n is noted as ignored for discrete color scales (manual breaks) and for direction maps, and a new legend_digits entry documents the number of decimal places shown on legend labels;
 - new line-graph options stderr_bands (draw standard error bands) and stderr_alpha (their transparency) are documented;
 - footer updated to pkgdown 1.6.1]
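The plotting changes summarized above are the one substantive documentation update in this batch of regenerated pages. A usage sketch of the updated method, assuming the package is installed; the source/signal pair is illustrative and not taken from this diff.

```r
library(covidcast)

dv <- covidcast_signal("doctor-visits", "smoothed_adj_cli",
                       start_day = "2020-05-01", end_day = "2020-05-14")

# Choropleth for one day; with range = NULL the color scale runs from the
# metadata mean +/- 3 standard deviations, as documented above.
plot(dv, plot_type = "choro", time_value = as.Date("2020-05-14"))

# Bubble map; bins are evenly spaced over the range (default is now 8 bins,
# and bubble "size" means bubble area).
plot(dv, plot_type = "bubble", time_value = as.Date("2020-05-14"), num_bins = 8)

# Time series with the newly documented standard error band options.
plot(dv, plot_type = "line",
     line_params = list(stderr_bands = TRUE, stderr_alpha = 0.3))
```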

diff --git a/docs/covidcastR/reference/print.covidcast_meta.html b/docs/covidcastR/reference/print.covidcast_meta.html
index 87e55250..de35e116 100644
--- a/docs/covidcastR/reference/print.covidcast_meta.html
+++ b/docs/covidcastR/reference/print.covidcast_meta.html
@@ -6,7 +6,7 @@
-Print <code>covidcast_meta</code> object — print.covidcast_meta • covidcast
+Print covidcast_meta object — print.covidcast_meta • covidcast
[remaining hunks: the print(x, ...) usage block re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/print.covidcast_signal.html b/docs/covidcastR/reference/print.covidcast_signal.html
index a1966192..d8ead43f 100644
--- a/docs/covidcastR/reference/print.covidcast_signal.html
+++ b/docs/covidcastR/reference/print.covidcast_signal.html
@@ -6,7 +6,7 @@
-Print <code>covidcast_signal</code> objects — print.covidcast_signal • covidcast
+Print covidcast_signal objects — print.covidcast_signal • covidcast
[remaining hunks: the print(x, ...) usage block re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/state_census.html b/docs/covidcastR/reference/state_census.html
index a2426be6..c364515d 100644
--- a/docs/covidcastR/reference/state_census.html
+++ b/docs/covidcastR/reference/state_census.html
@@ -138,7 +138,7 @@
[regenerated HTML for the state_census data page (2019 US Census state populations, 53 rows): usage block and format list re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/state_geo.html b/docs/covidcastR/reference/state_geo.html
index f5336d8e..3a0ba884 100644
--- a/docs/covidcastR/reference/state_geo.html
+++ b/docs/covidcastR/reference/state_geo.html
@@ -138,14 +138,14 @@
[regenerated HTML for the state_geo data page (state centroid latitudes and longitudes, 52 rows): usage block and format list re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/summary.covidcast_meta.html b/docs/covidcastR/reference/summary.covidcast_meta.html
index b64ba231..0f6d1e47 100644
--- a/docs/covidcastR/reference/summary.covidcast_meta.html
+++ b/docs/covidcastR/reference/summary.covidcast_meta.html
@@ -6,7 +6,7 @@
-Summarize <code>covidcast_meta</code> object — summary.covidcast_meta • covidcast
+Summarize covidcast_meta object — summary.covidcast_meta • covidcast
[remaining hunks: the summary(object, ...) usage block re-rendered; footer updated to pkgdown 1.6.1]
diff --git a/docs/covidcastR/reference/summary.covidcast_signal.html b/docs/covidcastR/reference/summary.covidcast_signal.html
index c148313f..9ea57057 100644
--- a/docs/covidcastR/reference/summary.covidcast_signal.html
+++ b/docs/covidcastR/reference/summary.covidcast_signal.html
@@ -6,7 +6,7 @@
-Summarize <code>covidcast_signal</code> objects — summary.covidcast_signal • covidcast
+Summarize covidcast_signal objects — summary.covidcast_signal • covidcast
[remaining hunks: the summary(object, ...) usage block re-rendered; footer updated to pkgdown 1.6.1]