FloodScan - NA/NULL values in stats (mean) - admin 2 #20

Open
zackarno opened this issue Nov 27, 2024 · 4 comments
@zackarno

There appear to be NA/NULL values in the zonal stats.

Here is the SQL query to see them:

SELECT "iso3", "pcode", "valid_date", "adm_level", "band", "mean"
FROM "floodscan"
WHERE ("adm_level" = 2.0) AND ("band" = 'SFED') AND (("mean" IS NULL))

They are all from MOZ and NGA. There are a lot of occurrences across the dates:

iso3 num_occurences
MOZ 9813
NGA 127582

But only 14 pcodes in total have this issue on mean. I assume it's the same for the other stats except count and sum.

iso3 num_pcodes
MOZ 1
NGA 13
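
For anyone wanting to reproduce the two summaries above, here is a minimal sketch of the aggregation, run against an in-memory SQLite table. It assumes the same column names as the query in this comment; the rows, pcodes, and dates are made-up toy data, and the output column name `num_occurrences` is only illustrative.

```python
import sqlite3

# Toy rows only: pcodes, dates, and values are hypothetical, not real FloodScan data.
rows = [
    ("NGA", "NG-A", "2024-01-01", 2, "SFED", None),
    ("NGA", "NG-A", "2024-01-02", 2, "SFED", None),
    ("NGA", "NG-B", "2024-01-01", 2, "SFED", 0.12),
    ("MOZ", "MZ-A", "2024-01-01", 2, "SFED", None),
    ("MOZ", "MZ-A", "2024-01-01", 1, "SFED", 0.05),
]

con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE floodscan ("iso3" TEXT, "pcode" TEXT, "valid_date" TEXT, '
    '"adm_level" INTEGER, "band" TEXT, "mean" REAL)'
)
con.executemany("INSERT INTO floodscan VALUES (?, ?, ?, ?, ?, ?)", rows)

# Same filter as the query above, grouped per country:
# number of NULL rows and number of distinct pcodes affected.
summary = con.execute(
    """
    SELECT "iso3",
           COUNT(*) AS num_occurrences,
           COUNT(DISTINCT "pcode") AS num_pcodes
    FROM "floodscan"
    WHERE "adm_level" = 2 AND "band" = 'SFED' AND "mean" IS NULL
    GROUP BY "iso3"
    ORDER BY "iso3"
    """
).fetchall()

for iso3, num_occurrences, num_pcodes in summary:
    print(iso3, num_occurrences, num_pcodes)
```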
@hannahker
Collaborator

@zackarno I'll take a look into this! On first look though, it's not unexpected to see some NULL values (especially at the adm2 level). This happens in cases where the admin polygon is too small to have any pixel centroids contained within it.
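
To make the centroid point concrete, here is a small sketch of a centroid-based zonal mean. It is not the pipeline's actual code: the function name `centroid_mean` is hypothetical, and an axis-aligned box stands in for an admin polygon to keep the geometry trivial. A box that sits inside one pixel but misses its centroid selects no pixels, so the mean comes out as NaN.

```python
import math

def centroid_mean(grid, cell_size, box):
    """Mean of the cells whose centroid falls inside an axis-aligned box.

    grid: list of rows of pixel values; row r, column c covers
          [c*cell_size, (c+1)*cell_size] x [r*cell_size, (r+1)*cell_size].
    box:  (xmin, ymin, xmax, ymax), a stand-in for an admin polygon.
    Returns NaN when no pixel centroid lies inside the box.
    """
    xmin, ymin, xmax, ymax = box
    selected = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx = (c + 0.5) * cell_size  # pixel centroid, x
            cy = (r + 0.5) * cell_size  # pixel centroid, y
            if xmin <= cx <= xmax and ymin <= cy <= ymax:
                selected.append(value)
    return sum(selected) / len(selected) if selected else math.nan

grid = [[0.1, 0.2],
        [0.3, 0.4]]

large_box = (0.0, 0.0, 2.0, 2.0)  # covers all four centroids
small_box = (0.6, 0.6, 0.9, 0.9)  # inside one pixel, but misses its centroid at (0.5, 0.5)

print(centroid_mean(grid, 1.0, large_box))
print(centroid_mean(grid, 1.0, small_box))
```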

@zackarno
Author

zackarno commented Nov 27, 2024

Yeah, that makes sense. So we'd either need to adjust the raster stats method or develop guidance on how we should deal with this in downstream analysis.

For the FloodScan use case we are publishing datasets at the admin 2 level, so it seems not ideal to have to exclude admins from the datasets.

@hannahker
Collaborator

So we'd either need to adjust the raster stats method or develop guidance on how we should deal with this in downstream analysis.

Yeah @zackarno I think our options would be to:

  1. Switch to a weighted calculation method (a la exactextract)
  2. Publish with the disclaimer that some admins will have NA values. IMO, we'd say something like:

"Note that some administrative boundaries may not have summary statistics available. This happens when administrative polygons are sufficiently small relative to the size of the input raster dataset. In this case, we'd recommend performing your analysis across a larger spatial scale. For example, if you find values missing for a particular Admin 2 boundary, you may want to instead consider performing your analysis at the Admin 1 level."

I think we should do 1. eventually, but not prioritize at the moment and for now go with 2.
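
For reference, option 1 could look roughly like the sketch below: each pixel is weighted by the fraction of its area that the polygon covers, so a polygon smaller than a pixel still gets a value. This only illustrates the idea and is not the exactextract API; `coverage_weighted_mean` is a hypothetical name, and an axis-aligned box again stands in for the admin polygon.

```python
import math

def coverage_weighted_mean(grid, cell_size, box):
    """Mean of pixel values weighted by the fraction of each pixel the box covers.

    grid: list of rows of pixel values; row r, column c covers
          [c*cell_size, (c+1)*cell_size] x [r*cell_size, (r+1)*cell_size].
    box:  (xmin, ymin, xmax, ymax), a stand-in for an admin polygon.
    Returns NaN only when the box does not overlap the grid at all.
    """
    xmin, ymin, xmax, ymax = box
    weighted_sum = 0.0
    total_weight = 0.0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            x0, y0 = c * cell_size, r * cell_size
            # Width and height of the overlap between this pixel and the box.
            overlap_w = max(0.0, min(xmax, x0 + cell_size) - max(xmin, x0))
            overlap_h = max(0.0, min(ymax, y0 + cell_size) - max(ymin, y0))
            fraction = (overlap_w * overlap_h) / (cell_size ** 2)
            weighted_sum += fraction * value
            total_weight += fraction
    return weighted_sum / total_weight if total_weight > 0 else math.nan

grid = [[0.1, 0.2],
        [0.3, 0.4]]

small_box = (0.6, 0.6, 0.9, 0.9)      # contains no pixel centroid
straddling_box = (0.5, 0.0, 1.5, 1.0)  # half of pixel 0.1, half of pixel 0.2

print(coverage_weighted_mean(grid, 1.0, small_box))
print(coverage_weighted_mean(grid, 1.0, straddling_box))
```

With this weighting, the small box that has no centroid inside it takes the value of the pixel it sits in instead of coming out as NA.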

@zackarno
Author

Yeah, perhaps exactextract can be used in a future iteration/version. I think what you wrote sounds pretty good, but let's leave this open until a decision is made.

There are some additional complexities coming to mind. One is that we will need to use both NA and Inf values in the outputs, for different reasons. For example, if all values in the historical record are 0, or there is 0 variance, we need to use something like NA, but we will also have an RP threshold above which values will be Inf. I'm still trying to think of the best way to do all this, given that we want users with Excel-only skills to be able to work easily with the data, and with this column specifically, in a quantitative way (i.e. we can't mix in strings, etc.).
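
As a rough sketch of the two cases described above, the column could stay purely floating-point, with NaN for the "no usable history" case and Inf for the "above the RP threshold" case. Everything here is an assumption for illustration: the function name `encode_return_period`, the `rp_cap` parameter, and the idea that the return period `rp` has already been computed elsewhere. How these values then appear once exported for Excel users is exactly the open question and is not addressed here.

```python
import math
import statistics

def encode_return_period(rp, history, rp_cap):
    """Return a float for the RP column, using NaN and Inf as numeric flags.

    rp:      return period already computed elsewhere (assumed input).
    history: historical values for this admin unit.
    rp_cap:  hypothetical RP threshold above which values are reported as Inf.
    """
    # All-zero or otherwise constant history: no meaningful return period.
    if len(history) < 2 or statistics.pvariance(history) == 0:
        return math.nan
    # Beyond the threshold we are willing to report.
    if rp > rp_cap:
        return math.inf
    return float(rp)

column = [
    encode_return_period(5.0, [0.0, 0.0, 0.0], rp_cap=100.0),    # zero variance
    encode_return_period(250.0, [0.0, 0.1, 0.4], rp_cap=100.0),  # above the cap
    encode_return_period(12.5, [0.0, 0.1, 0.4], rp_cap=100.0),   # ordinary value
]

# Every entry is a float, so no strings are mixed into the column.
print(all(isinstance(v, float) for v in column))
```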

4 participants
@zackarno @hannahker @isatotun and others