
Add percentage valid points in get_stats() #644

Open
wants to merge 7 commits into
base: main
from

Conversation

vschaffn
Contributor

@vschaffn vschaffn commented Jan 24, 2025

Resolves GlacioHack/xdem#679.

Description

  • Compute the "valid points" and "percentage valid points" statistics; the percentage is the number of finite values divided by the number of values in the unmasked Raster.
  • Add both statistics, with aliases, to the dict returned by get_stats().
  • Cover both statistics in the test_stats() method in test_raster.
  • Add "valid points no mask" and "percentage valid points no mask", which compute the same statistics as above but without masking values.
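
For readers following along, here is a minimal sketch of how these four statistics could be computed from a masked array. It is not the code from this PR: the function name `valid_point_stats`, the dictionary keys, and the exact meaning of the "no mask" variants (counting over the underlying array, ignoring the mask) are assumptions made for illustration.

```python
import numpy as np


def valid_point_stats(data: np.ma.MaskedArray) -> dict:
    """Hypothetical helper, not the geoutils implementation."""
    # Masked variant: drop masked cells, then count the finite values left.
    unmasked = data.compressed()
    valid = int(np.count_nonzero(np.isfinite(unmasked)))

    # "No mask" variant (assumed meaning): look at the raw array under the mask.
    raw = np.ma.getdata(data)
    valid_no_mask = int(np.count_nonzero(np.isfinite(raw)))

    def percentage(count: int, total: int) -> float:
        # Avoid dividing by zero when every cell is masked or the array is empty.
        return 100.0 * count / total if total else float("nan")

    return {
        "valid_points": valid,
        "percentage_valid_points": percentage(valid, unmasked.size),
        "valid_points_no_mask": valid_no_mask,
        "percentage_valid_points_no_mask": percentage(valid_no_mask, raw.size),
    }


example = np.ma.masked_array(
    [1.0, np.nan, 3.0, 4.0], mask=[False, False, True, False]
)
stats = valid_point_stats(example)
```

In this example the masked variant sees three unmasked cells, two of them finite, while the unmasked variant sees four cells, three of them finite.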

@vschaffn vschaffn force-pushed the 679-valid_points_stat branch from 2b4958b to 5368572 Compare January 24, 2025 10:18
@rhugonnet
Member

Nice addition, and good catch on the redundancy of the RMSE in this case! 😄

Two small remarks:

Last thought: I didn't see any function to calculate the valid points in the changes, so maybe it's not there yet! In that case I would recommend np.count_nonzero applied to np.isfinite rather than to ~np.isnan, as the latter only excludes NaNs and not +/- infinity, which is often unusable in stats (and I've had some misadventures with infinities being propagated in raster data before!).
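
A small sketch of the difference between the two counts, using a made-up array (the variable names are illustrative only):

```python
import numpy as np

values = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# ~np.isnan only rejects NaN, so both infinities are still counted.
not_nan_count = int(np.count_nonzero(~np.isnan(values)))

# np.isfinite rejects NaN, +inf and -inf in one pass.
finite_count = int(np.count_nonzero(np.isfinite(values)))
```

Here `not_nan_count` is 4 while `finite_count` is 2, which is the gap the comment above warns about.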

@adehecq
Member

adehecq commented Jan 29, 2025

Good addition!
Two other thoughts:

  • I believe your implementation does not exclude masked values. Remember that self.data is a masked array, so invalid pixels are masked instead of being set to NaN. One quick way to get the total number of unmasked values is self.data.compressed().size. It would be good to add 1-2 tests for this. For example, the "exploradores_aster_dem" example has data gaps.
  • The name of your variable is incorrect. It should be "percentage", not "percentile".
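
To illustrate the first point with plain NumPy (a sketch on a toy array, independent of the Raster class): `compressed()` returns a 1-D array containing only the unmasked values, so its size is the unmasked count. `MaskedArray.count()` gives the same number without building the intermediate array.

```python
import numpy as np

data = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0]],
    mask=[[False, True], [False, False]],
)

n_total = data.size                   # all cells, masked or not
n_unmasked = data.compressed().size   # cells that survive the mask
n_unmasked_alt = int(data.count())    # same count, no intermediate copy
```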

Regarding @rhugonnet's comment:

It might be useful to some users to know the total point count (NaNs included)? If we report a "total count" and "valid count", then users can derive the percent of valid points themselves by dividing the two.

It is not so easy to figure out which are the most useful among total count, valid count and fraction of valid pixels. It does not make much sense to give all 3, as one can derive the 3rd from the other 2... But I think the percentage of valid pixels is more useful than a count (which is generally gonna be large and not very convenient). Also, the total count can easily be derived from the data shape... So I would go for either just the percentage of valid pixels, or the percentage + total valid count.
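
As a rough sketch of that redundancy (toy data, illustrative variable names): the total count follows from the shape, so reporting the valid count or the percentage is enough to recover the other.

```python
import numpy as np

# masked_invalid masks both NaN and +/- infinity.
raster = np.ma.masked_invalid(
    np.array([[1.0, np.nan, 3.0], [4.0, 5.0, np.inf]])
)

total_count = int(np.prod(raster.shape))  # derived from the shape alone
valid_count = int(raster.count())         # unmasked cells
percentage_valid = 100.0 * valid_count / total_count
```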

@vschaffn vschaffn force-pushed the 679-valid_points_stat branch from dc529a5 to 4b35217 Compare February 4, 2025 10:48
@vschaffn
Contributor Author

vschaffn commented Feb 4, 2025

@rhugonnet the function to calculate the valid points was already there, but it used ~np.isnan; I have now switched to np.isfinite following your feedback.

@adehecq the stats are computed on the unmasked data:

data = data.compressed()

Following your feedback I have changed percentile to percentage, and I added the valid points stat 😃

@vschaffn vschaffn changed the title Add percentile valid points in get_stats() Add percentage valid points in get_stats() Feb 4, 2025
@vschaffn vschaffn force-pushed the 679-valid_points_stat branch 3 times, most recently from 8089811 to d4d21d0 Compare February 4, 2025 15:59
@vschaffn vschaffn force-pushed the 679-valid_points_stat branch from 8216872 to 30c6300 Compare February 11, 2025 13:40
Development

Successfully merging this pull request may close these issues.

Add percent_valid_points statistic to the get_stats() method in the Raster class
3 participants