FloodScan - NA/NULL values in stats (mean) - admin 2 #20

zackarno · 2024-11-27T20:02:20Z

There appear to be NA/NULL values in the zonal stats.

Here is the SQL query to see them:

SELECT "iso3", "pcode", "valid_date", "adm_level", "band", "mean"
FROM "floodscan"
WHERE ("adm_level" = 2.0) AND ("band" = 'SFED') AND (("mean" IS NULL))

They are all from MOZ and NGA, with a lot of occurrences across the dates:

iso3	num_occurences
MOZ	9813
NGA	127582

But only 14 pcodes in total have this issue on mean. I assume it's the same for the other stats, except count and sum.

iso3	num_pcodes
MOZ	1
NGA	13
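The two tallies above could be produced with a single grouped query. This is a minimal sketch against an in-memory SQLite table, not the actual database: only the table and column names come from the query above, and the sample rows and pcodes are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; pcodes and values are invented for illustration.
rows = [
    ("MOZ", "MZ0101", "2024-01-01", 2, "SFED", None),
    ("MOZ", "MZ0101", "2024-01-02", 2, "SFED", None),
    ("NGA", "NG001001", "2024-01-01", 2, "SFED", None),
    ("NGA", "NG001002", "2024-01-01", 2, "SFED", None),
    ("NGA", "NG001003", "2024-01-01", 2, "SFED", 0.02),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE floodscan (iso3 TEXT, pcode TEXT, valid_date TEXT, "
    "adm_level INTEGER, band TEXT, mean REAL)"
)
con.executemany("INSERT INTO floodscan VALUES (?, ?, ?, ?, ?, ?)", rows)

# Count NULL rows and distinct affected pcodes per country.
null_summary = con.execute(
    """
    SELECT iso3,
           COUNT(*) AS num_occurrences,
           COUNT(DISTINCT pcode) AS num_pcodes
    FROM floodscan
    WHERE adm_level = 2 AND band = 'SFED' AND mean IS NULL
    GROUP BY iso3
    ORDER BY iso3
    """
).fetchall()
print(null_summary)
```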


hannahker · 2024-11-27T20:06:36Z

@zackarno I'll take a look into this! On first look though, it's not unexpected to see some NULL values (especially at the adm2 level). This happens in cases where the admin polygon is too small to have any pixel centroids contained within it.
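To illustrate the mechanism described above, here is a minimal pure-Python sketch of a centroid-based zonal mean. It assumes a toy grid and uses an axis-aligned rectangle as a stand-in for an admin polygon; the function name and data are hypothetical and this is not the pipeline's actual code.

```python
def centroid_zonal_mean(grid, cell_size, bbox):
    """Mean of the cells whose centroids fall inside bbox, or None if none do.

    grid[r][c] covers x in [c, c + 1) * cell_size and y in [r, r + 1) * cell_size.
    bbox is (xmin, ymin, xmax, ymax), a rectangle standing in for an admin polygon.
    """
    xmin, ymin, xmax, ymax = bbox
    selected = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx = (c + 0.5) * cell_size  # centroid x of this cell
            cy = (r + 0.5) * cell_size  # centroid y of this cell
            if xmin <= cx <= xmax and ymin <= cy <= ymax:
                selected.append(value)
    # No centroid inside the zone: there is nothing to average.
    return sum(selected) / len(selected) if selected else None


grid = [[0.0, 0.2], [0.4, 0.6]]
large_admin = (0.0, 0.0, 2.0, 2.0)  # contains all four centroids
small_admin = (0.9, 0.1, 1.1, 0.3)  # overlaps two cells but contains no centroid

print(centroid_zonal_mean(grid, 1.0, large_admin))
print(centroid_zonal_mean(grid, 1.0, small_admin))
```

The small zone overlaps real pixels but still yields no value, which is the situation that shows up as NULL in the table.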

zackarno · 2024-11-27T20:28:32Z

Yeah, that makes sense. So we'd either need to adjust the raster stats method or develop guidance on how we should deal with this in downstream analysis.

For the FloodScan use case, we are publishing datasets at the admin 2 level, so it is not ideal to have to exclude admins from the datasets.

hannahker · 2024-11-27T23:56:41Z

> So we'd either need to adjust the raster stats method or develop guidance on how we should deal with this in downstream analysis.

Yeah @zackarno I think our options would be to:

1. Switch to a weighted calculation method (a la exactextract)
2. Publish with the disclaimer that some admins will have NA values. IMO, we'd say something like:

"Note that some administrative boundaries may not have summary statistics available. This happens when administrative polygons are sufficiently small relative to the size of the input raster dataset. In this case, we'd recommend performing your analysis across a larger spatial scale. For example, if you find values missing for a particular Admin 2 boundary, you may want to instead consider performing your analysis at the Admin 1 level."

I think we should do 1 eventually, but not prioritize it at the moment, and go with 2 for now.
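For context on the weighted option, the sketch below shows the general coverage-fraction idea: each pixel contributes in proportion to how much of it the zone covers, so a zone that overlaps any pixel gets a value. It is not exactextract itself. It assumes a toy grid and an axis-aligned rectangle in place of a real polygon, and the function name is hypothetical.

```python
def coverage_weighted_mean(grid, cell_size, bbox):
    """Mean of grid values weighted by the fraction of each cell covered by bbox.

    bbox is (xmin, ymin, xmax, ymax), a rectangle standing in for an admin polygon.
    Returns None only if the zone does not overlap the grid at all.
    """
    xmin, ymin, xmax, ymax = bbox
    weighted_sum = 0.0
    total_weight = 0.0
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            # Overlap between the zone and this cell along each axis.
            dx = min(xmax, (c + 1) * cell_size) - max(xmin, c * cell_size)
            dy = min(ymax, (r + 1) * cell_size) - max(ymin, r * cell_size)
            if dx <= 0 or dy <= 0:
                continue  # the zone does not touch this cell
            weight = (dx * dy) / (cell_size * cell_size)  # coverage fraction
            weighted_sum += weight * value
            total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None


grid = [[0.0, 0.2], [0.4, 0.6]]
small_admin = (0.9, 0.1, 1.1, 0.3)  # contains no cell centroid

print(coverage_weighted_mean(grid, 1.0, small_admin))
```

Here the small zone straddles the cells holding 0.0 and 0.2 about equally, so the result is close to 0.1 rather than missing.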

zackarno · 2024-11-28T19:45:29Z

Yeah, perhaps exactextract can be used in a future iteration/version. I think what you wrote sounds pretty good, but let's leave this open until a decision is made.

Some additional complexities are coming to mind. One is that we will need to use both NA and Inf values in the outputs, for different reasons. For example, if all values in the historical record are 0, or there is 0 variance, we need to use something like NA, but we will also have an RP threshold above which values will be Inf. I'm still trying to think of the best way to do all this, given that we want users with Excel-only skills to be able to work easily with the data, and with this column specifically, in a quantitative way (i.e., we can't mix in strings, etc.).
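One way to keep such a column purely numeric is to encode both cases as special float values. This is a minimal sketch, assuming RP means return period and using a simple empirical estimate. The function name, the `(n + 1) / rank` formula and the `max_rp` cap are illustrative assumptions, not the method used in the pipeline.

```python
import math


def return_period(value, history, max_rp=100.0):
    """Empirical return period of value, as a float in every case.

    Returns NaN when the historical record has no variance (e.g. all zeros),
    and Inf when the estimate exceeds max_rp or value was never reached.
    """
    if len(set(history)) <= 1:
        return math.nan  # undefined: the record has zero variance
    exceed = sum(1 for h in history if h >= value)
    if exceed == 0:
        return math.inf  # value is above everything in the record
    rp = (len(history) + 1) / exceed
    return rp if rp <= max_rp else math.inf


history = [0.0, 0.1, 0.2, 0.4]
print(return_period(0.2, history))
print(return_period(0.5, history))
print(return_period(0.0, [0.0, 0.0, 0.0]))
```

Downstream code can then separate the cases with `math.isnan` and `math.isinf` without the column ever holding strings.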

zackarno assigned hannahker and isatotun Nov 27, 2024

zackarno mentioned this issue Nov 29, 2024

FloodScan - NA/NULL values in stats (mean) - admin 1 #28

Open

zackarno mentioned this issue Dec 20, 2024

correct np.inf values & dealing w/ nan OCHA-DAP/hdx-floodscan#9

Merged
