Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

correct np.inf values & dealing w/ nan #9

Merged
merged 3 commits into from
Jan 21, 2025
Merged

correct np.inf values & dealing w/ nan #9

merged 3 commits into from
Jan 21, 2025

Conversation

zackarno
Copy link
Collaborator

@zackarno zackarno commented Dec 20, 2024

Infinity values

  • I noticed too many blank values in the excel output on the jira stage so I decided it was better to more explicitly set the upper range as np.inf in the empirical rp interpolation function
  • This seems to take care of it as noted in the update to exploration/add_return_periods.py

nan values

@t-downing @hannahker

Options:

  1. We exclude them from the tabular excel data set and include a metadata table that contains the admin missing admins either in the readme or in another tab. Then we need to put some sort of disclaimer like hannahs.
  2. We keep the admins in with blank SFED values and then include a disclaimer

I like option 2 as it avoids both the need to update a table in the in readme as well as placing an outsized emphasis on this relatively smally issue. I think it will also make the ultimate disclaimer less wordy.

@isatotun - i think the implementation of option 2 is also pretty straight forward to implement in the notebook and have added one implementation of it to the bottom of exploration/07_informing_user_of_NA.py

Also l latest version of sample readme is on the Jira ticket

@zackarno zackarno requested a review from t-downing December 20, 2024 15:44
@zackarno
Copy link
Collaborator Author

zackarno commented Dec 20, 2024

ugh one annoying complication that affects this issue as well is that currently the Inf values (when RP>10) in the staged excel file (will share link on slack) are recognized as text, converting the whole column to character/string.

So now if we have blank (NA) values representing rasters w/ no zonal stats how do we represent RP values >10 in a way that would preserve them as numeric? We could either a. ) not include the no zonal stat admins and list them in an additional table tab (option 1 above) and then use blanks for those, or b.) if no infinity class just resign to using a string/character value, but instead make it more explicit ">10" and add that to the readme - leaving it to the analyst to parse -- fairly straight forward to add create a conditional numeric column next to it or set as Inf in python or R

@zackarno
Copy link
Collaborator Author

ok chatted w/ Hannah and Tristan on slack, in summary:

  • Best to go w/ option 2 above
  • Since no infinity class in excel (for >10 RP values) and since w/ option 2 will have NaN values (converted to blank in excel writing) we decided best to reclassify RPs to categorical and simply include a >10 RP class. With this approach blanks will only indicate missing admins and we don't have to list all of them in a table in the readme.
  • For reclassification we went back and fourth one equal interval by 0.5 (i.e 1-1.5, 1.5-2,...9.5-10) and more geometric breaks. We ultimately decided on a more geometric breaks of 1-1.5,1.5-2,2-3,3-4,4-5,5-7,7-10,>10. Nonetheless, it would be easy to modify breaks as needed in future
  • We decided to place the information of left/right exclusive vs non-inclusive in readme and follow (lower-upper]

I implemented what was discussed in 559b469

@t-downing - i think you can go ahead an review code.

@isatotun - good to be aware of this reclassification step and admin filling.

Copy link
Collaborator

@t-downing t-downing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The classification (in reclassify_rp ()) looks good!

But, in reviewing it, I think I uncovered an issue with the empirical return period calculation... Which is of course ironic because I think it was adapted from a function that I wrote in the first place. The issue is that empirical_rp() still returns RANK values (and therefore RP values) when the value values are tied. In the extreme, if all the value values are 0 (i.e. there has never been flooding in the admin), the RANK is set to 13.5 for all rows (and RP is set to 2.0). Then, when you interpolate with a value of 0 for the current period, it will return 2.0 as well. So you have have a return period of 2 years for a value of 0. This happens in plenty of admins (and less extreme versions happen in many more, where there are several years where the maximum flood extent was 0.0).

Is this the behaviour we want? I feel like we should always return the minimum RP value (i.e. 1) when the value is 0.

@zackarno zackarno mentioned this pull request Jan 21, 2025
@zackarno
Copy link
Collaborator Author

nice catch - i pulled this into #10 and will comment from there. This func to reclassify_rp() should continue to work after we properly deal w/ rank ties in the RP calcs themseleves

@zackarno zackarno merged commit 92ce970 into main Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants