correct `np.inf` values & dealing w/ `nan` #9

zackarno · 2024-12-20T15:44:31Z

Infinity values

I noticed too many blank values in the Excel output on the Jira stage, so I decided it was better to explicitly set the upper range of the empirical RP interpolation function to `np.inf`.
This seems to take care of it, as noted in the update to exploration/add_return_periods.py.
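A minimal sketch of the idea, assuming the empirical RPs use the Weibull plotting position RP = (n + 1) / rank and that the current value is interpolated against the historical values with `np.interp`. `empirical_rp_sketch` and `interpolate_rp_sketch` are hypothetical names, not the functions in exploration/add_return_periods.py. The relevant piece is `right=np.inf`: a value above the largest historical value gets an infinite RP rather than NaN (a blank cell).

```python
import numpy as np


def empirical_rp_sketch(values):
    """Empirical RPs via the Weibull plotting position RP = (n + 1) / rank.

    Rank 1 is the largest value. Ties are NOT handled specially here:
    argsort simply breaks them by position.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    order = np.argsort(-values)          # indices from largest to smallest
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)   # rank 1 = largest value
    return (n + 1) / ranks


def interpolate_rp_sketch(current_value, historical_values):
    """Interpolate the RP of current_value from the historical record."""
    hist = np.asarray(historical_values, dtype=float)
    rps = empirical_rp_sketch(hist)
    idx = np.argsort(hist)               # np.interp needs increasing x-coordinates
    # Below the smallest historical value np.interp returns the smallest RP
    # (its default `left`); above the largest it returns np.inf instead of NaN.
    return float(np.interp(current_value, hist[idx], rps[idx], right=np.inf))


hist = [1.0, 2.0, 3.0]
print(interpolate_rp_sketch(2.5, hist))  # 3.0
print(interpolate_rp_sketch(5.0, hist))  # inf
```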

nan values

This relates to FloodScan - NA/NULL values in stats (mean) - admin 2 ds-raster-stats#20 and FloodScan - NA/NULL values in stats (mean) - admin 1 ds-raster-stats#28, but I'm moving those issues here because we need to discuss and deal with them specifically for this project.
I've shown how the problem will affect this output in the notebook exploration/07_informing_user_of_NA.py, but I think it would be good to discuss options for presenting this information. @hannahker wrote a nice, succinct disclaimer that we can use or modify with any of the options here.

Options:

1. We exclude them from the tabular Excel dataset and include a metadata table listing the missing admins, either in the README or in another tab. We would then also need some sort of disclaimer like Hannah's.
2. We keep the admins in with blank SFED values and then include a disclaimer.

I like option 2, as it avoids both the need to maintain a table in the README and placing outsized emphasis on this relatively small issue. I think it will also make the final disclaimer less wordy.

@isatotun - I think option 2 is also pretty straightforward to implement in the notebook, and I have added one implementation of it to the bottom of exploration/07_informing_user_of_NA.py.
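One way option 2 could look in pandas, as a sketch rather than the implementation in the notebook: it assumes a table of all admins and a table of zonal stats sharing a hypothetical `pcode` key, with SFED in a hypothetical `sfed` column. A left join keeps every admin, and admins without zonal stats get NaN, which `DataFrame.to_excel` writes as empty cells by default (`na_rep=""`).

```python
import pandas as pd


def fill_missing_admins(all_admins, stats, key="pcode"):
    """Left-join zonal stats onto the full admin list.

    Admins with no zonal stats are kept and get NaN in the stats columns.
    Row order follows all_admins.
    """
    return all_admins.merge(stats, on=key, how="left")


all_admins = pd.DataFrame({"pcode": ["A1", "A2", "A3"]})
stats = pd.DataFrame({"pcode": ["A1", "A3"], "sfed": [0.12, 0.05]})
out = fill_missing_admins(all_admins, stats)
print(out)  # A2 is present with a NaN (blank in Excel) SFED value
```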

Also, the latest version of the sample README is on the Jira ticket.

zackarno · 2024-12-20T16:16:05Z

Ugh, one annoying complication that also affects this issue: currently the Inf values (when RP > 10) in the staged Excel file (will share the link on Slack) are recognized as text, which converts the whole column to character/string.

So now, if we have blank (NA) values representing rasters with no zonal stats, how do we represent RP values > 10 in a way that keeps them numeric? We could either a.) not include the admins with no zonal stats, list them in an additional table tab (option 1 above), and then use blanks for the RP > 10 values, or b.) since there is no infinity class, resign ourselves to using a string/character value, but make it more explicit (">10") and document that in the README, leaving it to the analyst to parse. It is fairly straightforward to create a conditional numeric column next to it, or to set it to Inf in Python or R.
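For option b., parsing on the analyst's side can be a couple of lines. A minimal pandas sketch, assuming the column comes back from Excel as an object column mixing numbers, the literal string ">10" and blanks; `parse_rp_column` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def parse_rp_column(col):
    """Turn an RP column read back from Excel into floats.

    Numeric cells (or numeric strings) become floats, the ">10" label
    becomes np.inf, and blank cells stay NaN.
    """
    numeric = pd.to_numeric(col, errors="coerce")  # ">10" and blanks -> NaN
    return numeric.mask(col.eq(">10"), np.inf)     # restore ">10" as infinity


col = pd.Series([2.5, ">10", np.nan], dtype=object)
print(parse_rp_column(col).tolist())  # [2.5, inf, nan]
```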

zackarno · 2024-12-20T19:37:42Z

OK, I chatted with Hannah and Tristan on Slack. In summary:

- It's best to go with option 2 above.
- Since Excel has no infinity class (for RP > 10 values), and since option 2 will have NaN values (converted to blanks when writing to Excel), we decided it's best to reclassify RPs as categorical and simply include a >10 RP class. With this approach, blanks only indicate missing admins, and we don't have to list all of them in a table in the README.
- For the reclassification, we went back and forth between equal intervals of 0.5 (i.e. 1-1.5, 1.5-2, ..., 9.5-10) and more geometric breaks. We ultimately decided on the geometric breaks 1-1.5, 1.5-2, 2-3, 3-4, 4-5, 5-7, 7-10, >10. Nonetheless, it would be easy to modify the breaks in the future as needed.
- We decided to document in the README which interval bounds are inclusive or exclusive, and to follow (lower, upper], i.e. the lower bound is excluded and the upper bound is included.
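The agreed breaks map naturally onto `pd.cut` with `right=True`, which produces (lower, upper] intervals. This is a sketch, not the `reclassify_rp()` added in 559b469; `reclassify_rp_sketch`, `RP_BREAKS` and `RP_LABELS` are hypothetical names. Using `include_lowest=True` so that an RP of exactly 1 falls into the first class is an assumption, since a strict (1, 1.5] interval would exclude it.

```python
import numpy as np
import pandas as pd

# Breaks agreed above; the last class is open-ended (> 10).
RP_BREAKS = [1, 1.5, 2, 3, 4, 5, 7, 10, np.inf]
RP_LABELS = ["1-1.5", "1.5-2", "2-3", "3-4", "4-5", "5-7", "7-10", ">10"]


def reclassify_rp_sketch(rp):
    """Bin RP values into (lower, upper] classes.

    include_lowest=True also puts an RP of exactly 1 into the first class.
    np.inf lands in ">10"; NaN (missing admins) stays NaN, i.e. blank in Excel.
    """
    return pd.cut(rp, bins=RP_BREAKS, labels=RP_LABELS,
                  right=True, include_lowest=True)


rp = pd.Series([1.0, 1.5, 2.5, 10.0, 12.0, np.inf, np.nan])
print(reclassify_rp_sketch(rp).tolist())
```

Because the breaks and labels live in two lists, changing the classes later only means editing those lists.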

I implemented what was discussed in 559b469

@t-downing - I think you can go ahead and review the code.

@isatotun - good to be aware of this reclassification step and admin filling.

t-downing

The classification (in reclassify_rp()) looks good!

But in reviewing it, I think I uncovered an issue with the empirical return period calculation... which is of course ironic, because I think it was adapted from a function that I wrote in the first place. The issue is that empirical_rp() still assigns shared (average) RANK values, and therefore RP values, to rows whose `value` values are tied. In the extreme, if all the `value` values are 0 (i.e. there has never been flooding in the admin), the RANK is set to 13.5 for all rows (and the RP is set to 2.0). Then, when you interpolate with a value of 0 for the current period, it will return 2.0 as well. So you get a return period of 2 years for a value of 0. This happens in plenty of admins (and less extreme versions happen in many more, where there are several years in which the maximum flood extent was 0.0).
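The tie behaviour can be reproduced in a few lines. This sketch assumes RP = (n + 1) / rank with tied values given their average rank, and a 26-year record, which is inferred from the quoted rank of 13.5 = (1 + 26) / 2 rather than stated in the code under review.

```python
import numpy as np
import pandas as pd

# Assumption: 26 years of record, all with a maximum flood extent of 0.
values = pd.Series(np.zeros(26))
rank = values.rank(ascending=False, method="average")  # every row ties -> 13.5
rp = (len(values) + 1) / rank                          # 27 / 13.5 = 2.0

# Interpolating a current value of 0 against this record also returns 2.0.
current_rp = np.interp(0.0, values.to_numpy(), rp.to_numpy(), right=np.inf)
print(rank.unique(), rp.unique(), current_rp)
```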

Is this the behaviour we want? I feel like we should always return the minimum RP value (i.e. 1) when the value is 0.
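The behaviour proposed here could be sketched as a post-processing step on the interpolated RPs. `floor_rp_for_zero` is a hypothetical helper; it only patches the zero case and does not change how ties are ranked in the RP calculation itself.

```python
import numpy as np


def floor_rp_for_zero(current_values, rps, min_rp=1.0):
    """Return min_rp wherever the current value is 0, else the interpolated RP."""
    current_values = np.asarray(current_values, dtype=float)
    rps = np.asarray(rps, dtype=float)
    return np.where(current_values == 0, min_rp, rps)


print(floor_rp_for_zero([0.0, 0.3], [2.0, 3.5]))
```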

zackarno · 2025-01-21T00:29:50Z

Nice catch - I pulled this into #10 and will comment from there. reclassify_rp() should continue to work after we properly deal with rank ties in the RP calcs themselves.

zackarno added 2 commits December 20, 2024 10:06

add np.inf as upper range to interpolation function rather than np.nan and notebook to discuss nan sfed

8293d9f

add example of option 2 implementation

3d3de38

zackarno requested a review from t-downing December 20, 2024 15:44

add RP classification step

559b469

t-downing reviewed Jan 20, 2025

zackarno mentioned this pull request Jan 21, 2025

rank ties & RP value #10

Closed

zackarno merged commit 92ce970 into main Jan 21, 2025
