Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor remaining functions from Issue #11 #31

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

annalea-albright
Copy link
Collaborator

Added remaining functions from Issue #11

To do's

  • Improve definitions for inversion base height and add @Geet-George's definitions

  • I added inversion_layer_height.py to a folder 'inversion_layers'. Adapt to have similar structure as boundary_layer code, i.e. from eurec4a_environment.variables import boundary_layer and da_rh_peak = boundary_layer.mixed_layer_height.calc_peak_RH(ds=ds) --> inversion analog from eurec4a_environment.variables import inversion_layers and da_ss = inversion_layers.inversion_layer_height.calc_inversion_static_stability(ds) in test_inversion_layers.py file, but it does not work yet

@leosaffin
Copy link
Collaborator

leosaffin commented Aug 20, 2020

You can fix the tests for inversion_layers by adding an __init__.py to that folder and modifying the imports. Copy the changes here (b916922)

Beyond fixing that I'm not sure how useful my review will be as I wasn't too involved in this part of the code. I'll flag a few things that others might want to look at but you/they may decide to ignore it anyway.

Copy link
Collaborator

@leosaffin leosaffin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the tests currently test that the output has the right sort of numbers. Would it be worth testing the output to a higher precision? Then it would flag if any small changes to the answer are caused through other changes or we unintentionally change the testing data. I could raise an issue for this and fix myself if it sounds useful?

Note that function is slow and should be optimized
"""

def calculateHmix_var(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to define this outside of the function in case it could be used elsewhere? If you don't want it as a generally visible function you can rename it with an underscore in front (i.e. _calculateHmix_var)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raising an issue for higher precision tests sounds great -- thank you!

Copy link
Collaborator Author

@annalea-albright annalea-albright Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yes that sounds good with _calculateHmix_var, will do. Thanks for the tip

da.attrs["units"] = "m"
return da

# sns.distplot(inv_base_height_gradients)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this accidental leftovers or something you want to incorporate later?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking we could possibly incorporate this as a plot, but it's not very high priority

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think doing plotting in a different module sounds like a good idea. You could always add a second pull-request with some plots for this :)

Copy link
Collaborator

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks really great in general @annalea-albright! I just have a few comments, what do you think about them?

return hmix

# call inner function
from eurec4a_environment.variables.calc_density import calc_density
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to move this import to top of module, that way you're sure the import is successful before we attempt to call the function


da_density = calc_density(ds)

hmix_vec = np.zeros(len(ds[var]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this slightly so that it doesn't necessarily assume there is just one other dimension (here that is time). You could do

dims = ds.dims
del dims[altiude]

# and then apply your function that works on one column
def _calc_height(ds_column):
   # add function definition here
   
da_height = ds.stack(dict(n=dims)).groupby("n").apply(_calc_height).unstack("n")

This is a general thing across all the functions we've implemented I think. But just wondering what you think?

Copy link
Collaborator Author

@annalea-albright annalea-albright Aug 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I'm wondering how to do something similar to the for-loop to load individual variable and density profiles?

da_height = ds.stack(dict(n=dims)).groupby("n").apply(_calc_height).unstack("n")

vs.

hmix_vec = np.zeros(len(ds[var]))
    for i in range((len(ds[var]))):
        density_profile = da_density.isel({time: i})
        var_profile = ds[var].isel({time: i})
        hmix_vec[i] = _calc_height(
            density_profile,
            var_profile,
            threshold,
            z_min,
            altitude="height",
            time="sounding",
        )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this looks good. I hadn't really considered this earlier, but most of the calculations we do operate on a single-column at a time, so this is probably a general pattern we want to follow. Where the calculation only works if a single column is used at a time we can use stack(..).groupby(...).apply(...).unstack(...).

Also, where you're defining a nested function (here _calc_height) inside another function you don't need to pass in the variables already available (for example z_min). Actually it's best to avoid doing that in case you accidentally redefine what a variable means within a function (by passing in the wrong thing for for example z_min or setting it to something else in the call).

Copy link
Collaborator

@leifdenby leifdenby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you've done looks good to me 😄 I'll try resolve the merge conflicts for you here on github and you can have a look

Copy link
Collaborator

@JuleRadtke JuleRadtke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great for me too!

@leifdenby
Copy link
Collaborator

Hi @annalea-albright I wanted to return to this and get your work added in ☺️ I've been cleaning the code up a bit, and I've gotten to the mixed-layer height estimation using "linear RH peak" and I'm a bit confused 😕 (you can see what the code for that routine looks like here)

Reading through your code I think the idea is to

  1. make a linear fit to part of the boundary layer which assumed to well-mixed
  2. extrapolate that linear fit and find the points where this linear is closest to peaks identified in the original profile

Is this about right?

I've added some plotting routines to the function that does the calculation for one column, and made this:

linfit_peak_RH

I don't quite understand why a peak in RH should happen to reach a peak value that is close to the linear fit. Isn't it quite possible that the peak will be larger or smaller than a linear extrapolation from the mixed layer? Also, very small wriggles in the RH profile could create peaks lower down too which might fit with the linear fit, which would make this very sensitive to how you define a peak, no? Sorry if I'm missing something obvious about the physics here... :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants