Photovoltaic Field Array Time-Series (PVDAQ)

Description

The Photovoltaic field array (PVDAQ) data consists of raw time-series performance data collected by a variety of sensors connected to a PV array. The data is typically recorded at 15 sec averaged resolution, but this can vary between systems. NREL source data is typically aggregated into the main database every 24 hours, and is then processed into the NREL PVDAQ data lake on a monthly basis.

Our researchers use the data to monitor the durability of PV systems under a wide variety of conditions. Similar data within NREL archives also provides insights into experimental emerging technology systems. Additionally, the data has proven useful in the development of data quality assurance software, data analysis tools, and machine learning tools.

Data Dictionary

The PVDAQ data is partitioned by system_id, year, month, and day. Raw data is reported at 15 minute increments with ISO 8601 dates and times. The timestamp is stripped and data is averaged daily. An example file output is included here.

Data Tables

  • pvdaq_inverters - metadata about the inverter hardware on the system
  • pvdaq_meters - metadata about the meter hardware on the system
  • pvdaq_metrics - metadata about the sensor values captured as part of the PV time-series
  • pvdaq_mount - mounting configuration of the array or subsets of the array
  • pvdaq_other_instruments - metadata about other ancillary equipment fielded on the system
  • pvdaq_site - geo location details of a PV array
  • pvdaq_system - basic details about a PV array
  • pvdaq_pvdata - PV time series data.

Table Schemas

pvdaq_inverters

  • inverter_id (string) - database primary key
  • name (string) - alias given to the inverter by the array owner or autogenerated
  • manufacturer (string)
  • model (string)
  • serial_num (string)
  • num_strings (string) - how many strings are tied to the inverter
  • modules_per_string (string) - how many modules are tied to each string
  • type (string) - type of inverter, such as micro, string, central, etc.
  • quantity (string) - number of inverters fielded at the array site
  • time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
  • site_id (string) - associated site
  • system_id (bigint) - associated system
  • comments (string) - any additional details
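
The time_interval field says where a timestamp sits inside its acquisition interval. The sketch below assumes the usual reading of those codes (L marks the interval start, C the midpoint, R the end) and a caller-supplied interval length, since the schema does not state one. `to_interval_start` is a hypothetical helper, not part of PVDAQ.

```python
from datetime import datetime, timedelta

# Fraction of the interval to subtract to reach its start, per alignment code.
# Assumption: L = timestamp at interval start, C = midpoint, R = interval end.
_ALIGNMENT_SHIFT = {"L": 0.0, "C": 0.5, "R": 1.0}


def to_interval_start(ts: datetime, alignment: str, interval: timedelta) -> datetime:
    """Return the start of the acquisition interval that `ts` labels."""
    try:
        shift = _ALIGNMENT_SHIFT[alignment.strip().upper()]
    except KeyError:
        raise ValueError(f"unknown alignment code: {alignment!r}") from None
    return ts - interval * shift
```

Normalizing every source to interval starts in this way can make readings from differently aligned instruments easier to compare on a common time axis.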

pvdaq_meters

  • meter_id (string) - primary key of the meter
  • name (string) - alias given to the meter by the array owner or autogenerated
  • manufacturer (string)
  • model (string)
  • serial_num (string)
  • time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
  • type (string) - type of meter: production, site, or revenue
  • site_id (string) - associated site
  • system_id (bigint) - associated system
  • comments (string) - any additional details

pvdaq_metrics

  • system_id (int) - associated system for the metric
  • metric_id (int) - primary key of the metric
  • sensor_name (string) - referenced name produced by the instrumentation or tagged by array owner
  • common_name (string) - a general grouping of sensor types (e.g. DC voltage, AC energy, POA irradiance)
  • raw_units (string) - raw unscaled or uncalibrated units of the values produced by the sensor
  • units (string) - units of the values produced by the sensor. May be raw_units modified by calc_scale and calc_offset.
  • calc_scale (double) - scaling for adjusting the sensor values (default 1)
  • calc_offset (double) - offset for adjusting the sensor values (default 0)
  • calc_details (string) - mathematical equation used to calculate the sensor value, if needed.
  • aggregation_type (string) - avg, min, max, sample, union, median, or calculated
  • source_type (string) - what generates the sensor value (inverters, meters, or other instruments). Can be NULL
  • source_id (int) - the associated primary key of the sensor type generating the value. Can be NULL
  • comments (string) - any additional details
  • standard_name (string) - a unique autogenerated name based on either the primary key and sensor_name or a combination of common_name, sensor_type, and sensor_id
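
The calc_scale and calc_offset fields describe how a raw reading maps to its reported units. The sketch below assumes the adjustment is linear (raw value times calc_scale, plus calc_offset), which the schema implies but does not state. When calc_details holds an equation, that equation should take precedence. It is also not stated whether values in pvdaq_pvdata already have this adjustment applied, so check before using it. `apply_calibration` is a hypothetical helper.

```python
from typing import Optional


def apply_calibration(raw_value: float,
                      calc_scale: Optional[float] = 1.0,
                      calc_offset: Optional[float] = 0.0) -> float:
    """Scale, then offset, a raw sensor reading.

    Assumption: the correction is linear, raw * calc_scale + calc_offset.
    Missing (None) fields fall back to the documented defaults of 1 and 0.
    """
    scale = 1.0 if calc_scale is None else calc_scale
    offset = 0.0 if calc_offset is None else calc_offset
    return raw_value * scale + offset
```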

pvdaq_modules

  • module_id (string) - the module primary key
  • name (string) - alias given to the module by the array owner or autogenerated
  • inverter_id (string) - the associated inverter primary key tied to this module, if known
  • manufacturer (string)
  • model (string)
  • serial_num (string)
  • type (string) - module technology: CdTe, Crystalline Si, multicrystalline Si, etc.
  • quantity (string) - number of modules installed on system
  • reference_module (string) - whether this is a reference module
  • start_on (string) - date module was installed
  • end_on (string) - date module was removed
  • site_id (string) - associated site
  • system_id (bigint) - associated system
  • comments (string) - any additional details

pvdaq_mount

  • mount_id (bigint) - the primary key for the mount
  • name (string) - alias given to the mount by the array owner or autogenerated
  • manufacturer (string)
  • model (string)
  • azimuth (string) - compass direction the mount points, in decimal degrees. 0 degrees = north, 90 degrees = east
  • tilt (string) - tilt angle of the mount in degrees
  • tracking (string) - whether the mount is tracking or fixed
  • type (string) - configuration of the mount: ground, roof, canopy, etc.
  • site_id (string) - associated site
  • system_id (bigint) - associated system
  • comments (string) - any additional details
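
The azimuth convention above (0 degrees = north, 90 degrees = east) is a clockwise compass bearing, and the column is typed as a string. A small sketch of parsing it and mapping it to the nearest of eight compass points follows. `azimuth_to_compass` is a hypothetical helper, and it assumes the stored string parses as a plain decimal number.

```python
# Eight compass points in clockwise order, starting at north (0 degrees).
_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def azimuth_to_compass(azimuth: str) -> str:
    """Map an azimuth string in decimal degrees to the nearest compass point."""
    degrees = float(azimuth) % 360.0
    # Each point covers a 45 degree sector centered on its bearing.
    index = int((degrees + 22.5) // 45) % 8
    return _POINTS[index]
```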

pvdaq_other_instruments

  • instrument_id (string) - the primary key of the instrument
  • name (string) - alias given to the other instrument by the array owner or autogenerated
  • manufacturer (string)
  • model (string)
  • serial_num (string)
  • time_interval (string) - whether the data is left (L), center (C), or right (R) aligned within the acquisition interval
  • type (string) - identifies what the instrument is: ref cell, weather station, thermocouple, etc.
  • site_id (string) - associated site
  • system_id (bigint) - associated system
  • comments (string) - any additional details

pvdaq_site

  • site_id (string) - primary key of the site
  • system_id (bigint) - associated system
  • public_name (string) - unique name given to the site
  • location (string) - descriptive name of the site location. Could include street address details
  • latitude (string) - latitude in decimal degrees
  • longitude (string) - longitude in decimal degrees
  • elevation (string) - distance in meters above sea level, if known
  • av_pressure (string) - average annual atmospheric pressure at site in psi
  • av_temp (string) - average ambient temperature at site in degrees Celsius
  • climate_type (string) - the Koppen-Geiger classification for the site location

pvdaq_system

  • system_id (bigint) - primary key of the system
  • site_id (bigint) - associated site representing geolocation details for system
  • public_name (string) - unique name given to the array
  • area (string) - covered area of the array in square meters
  • power (string) - maximum calculated or nameplate DC power of the array in kW
  • started_on (string) - date system became active
  • ended_on (string) - date system was deactivated
  • comments (string) - any additional details

pvdaq_pvdata

  • system_id (string) (Partitioned) - associated system for the data
  • measured_on (timestamp) - local timestamp as generated by the instrumentation. Could include DST.
  • utc_measured_on (timestamp) - calculated UTC timestamp from the measured_on value. Could include DST.
  • metric_id (int) - associated metric_id for the data
  • value (double) - value of the data. Join to the pvdaq_metrics record on metric_id for units or other details.

Note: not every site or system_id will contain data for each attribute included in the data dictionary.
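
The value column in pvdaq_pvdata only becomes meaningful once it is joined to its metric record. The sketch below shows that join in pandas. The two small frames stand in for pvdaq_pvdata and pvdaq_metrics, and every row in them is invented for illustration.

```python
import pandas as pd

# Invented sample rows; real frames would come from Athena or the Parquet files.
pvdata = pd.DataFrame({
    "system_id": ["10", "10", "10"],
    "metric_id": [1, 2, 1],
    "value": [350.2, 24.8, 351.0],
})
metrics = pd.DataFrame({
    "metric_id": [1, 2],
    "common_name": ["DC voltage", "Ambient temperature"],
    "units": ["V", "C"],
})

# Left join keeps every reading, even if its metric record is missing.
# validate raises if metric_id is not unique on the metrics side.
labeled = pvdata.merge(
    metrics[["metric_id", "common_name", "units"]],
    on="metric_id",
    how="left",
    validate="many_to_one",
)
```

The join uses metric_id alone because it is the primary key of pvdaq_metrics. The system_id column is typed differently in the two tables (string in pvdaq_pvdata, int in pvdaq_metrics), so it would need a cast before being used as a join key.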

Data Format

The PVDAQ Dataset is made available in Parquet format on AWS and is partitioned by year, month, day in AWS Glue and Athena. The schema may change across dataset years on S3.

Partition keys of the pvdaq_pvdata table:

  • year (string) (Partitioned)
  • month (string) (Partitioned)
  • day (string) (Partitioned)
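
Because the partition keys are strings, a query should compare them against quoted literals. Filtering on partition columns also lets Athena skip partitions that cannot match. The sketch below builds such a WHERE clause. `partition_where` is a hypothetical helper, and since this document does not say whether month and day values are zero-padded, it passes them through exactly as given.

```python
def partition_where(system_id: str, year: str, month: str, day: str) -> str:
    """Build a WHERE clause body that selects one system and one day partition."""
    parts = {"system_id": system_id, "year": year, "month": month, "day": day}
    clauses = []
    for key, value in parts.items():
        # Double any single quote so the value stays inside its SQL literal.
        escaped = str(value).replace("'", "''")
        clauses.append(f"{key} = '{escaped}'")
    return " AND ".join(clauses)
```

The returned string can be placed after WHERE in a query against the pvdaq_pvdata table, with the partition values copied from the Glue catalog.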

S3 Paths

  • s3://oedi-data-lake/pvdaq/inverters/*.parquet
  • s3://oedi-data-lake/pvdaq/meters/*.parquet
  • s3://oedi-data-lake/pvdaq/metrics/*.parquet
  • s3://oedi-data-lake/pvdaq/mount/*.parquet
  • s3://oedi-data-lake/pvdaq/other_instruments/*.parquet
  • s3://oedi-data-lake/pvdaq/site/*.parquet
  • s3://oedi-data-lake/pvdaq/system/*.parquet
  • s3://oedi-data-lake/pvdata/system_id=/year=/month=/day=/*.parquet
  • s3://oedi-data-lake/pvdaq/
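
The time-series path above is a template whose system_id, year, month, and day values are left blank. The sketch below fills it in for one day. `pvdata_prefix` is a hypothetical helper, its default base is copied from the template in this list, and the values are passed through unchanged because the padding convention is not documented here.

```python
def pvdata_prefix(system_id: str, year: str, month: str, day: str,
                  base: str = "s3://oedi-data-lake/pvdata") -> str:
    """Return the S3 prefix holding one system's Parquet files for one day."""
    return (f"{base.rstrip('/')}/system_id={system_id}/year={year}"
            f"/month={month}/day={day}/")
```

Listing the bucket first is a reasonable way to confirm the actual key layout before relying on a constructed prefix.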

Model, Methods, and Analysis Tools

Rd Tools

RdTools is an open-source library to support reproducible technical analysis of time series data from photovoltaic energy systems, particularly degradation effects.
Rd Tools

PV Lib

A toolbox that provides a set of well-documented functions for simulating the performance of photovoltaic energy systems.
pv_lib-toolbox

PVAnalytics

PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, and feature labeling, along with other tools supporting the analysis of PV system-level data. [PV_Analytics](https://github.com/pvlib/pvanalytics)

Other Data Sources

DuraMAT

A multi-institution consortium focused on discovery, development, de-risking, and enabling the commercialization of new materials and designs for PV modules.
Main Site

Additional Resources

https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html

https://www.nrel.gov/docs/fy17osti/69131.pdf

Python Connection Examples

Athena data connection using PyAthena:

import pandas as pd
from pyathena import connect

conn = connect(
    s3_staging_dir='s3://<user-defined>/<>',  # user-defined staging directory
    region_name='us-west-2',
    work_group='<USER SPECIFIC WORKGROUP>'  # specify the workgroup, if one exists
)

Example #1: Querying with a limit:

df = pd.read_sql("SELECT * FROM oedi.<> limit 8;", conn)

For a Jupyter notebook example, see our notebook, which includes partitions and the data dictionary: examples repository

Disclaimer and Attribution

Copyright (c) 2020, Alliance for Sustainable Energy LLC, All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.