In Groundhog, every layer currently has `self.params`: a list of the parameters its output depends on. As Jan thoughtfully pointed out about a month ago, this is not necessary, since they can all be retrieved by traversing the computation graph. The elements of `self.params_grad_scale` should then be attached to the parameters themselves, which could probably be done by subclassing the shared variable class (though it is not quite clear what to subclass...).
All Theano expressions (shared variables and ordinary symbolic variables) have a `tag` attribute to which we can attach such information.
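A minimal sketch of the idea, without depending on Theano itself: the class below mimics a shared variable whose `tag` accepts arbitrary annotations, and a trainer could read those annotations when building updates. The class name, attribute names, and constants here are invented for illustration.

```python
import types

class SharedVariable:
    """Stand-in for a Theano shared variable; only the `tag` behaviour matters here."""
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.tag = types.SimpleNamespace()  # holds arbitrary per-variable metadata

W = SharedVariable("layer1_W", [[0.1, 0.2]])
W.tag.grad_scale = 0.5      # optimization trick: scale this parameter's gradient
W.tag.weight_decay = 1e-4   # regularization constant

def effective_grad(var, grad):
    # Fall back to 1.0 when no grad_scale annotation is present.
    return getattr(var.tag, "grad_scale", 1.0) * grad
```

With this, `effective_grad(W, 2.0)` would return `1.0`, while an unannotated variable's gradient passes through unchanged.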
Another option I am testing right now is to decouple the computation from optimization tricks (gradient scaling) and regularization (weight decay, column norms). Since parameters have unique and often meaningful names, it is easy to write regexps or something similar to set rules such as: all weights of layer X are decayed by...
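A sketch of what such name-based rules could look like. The parameter names, patterns, and decay constants are made up for the example; later rules override earlier ones, so more specific patterns go last.

```python
import re

# Hypothetical parameter names, as a layer might generate them.
param_names = ["layer1_W", "layer1_b", "layer2_W", "softmax_W"]

# (pattern, weight-decay constant) rules, applied in order.
decay_rules = [
    (r".*_W$", 1e-4),        # decay all weight matrices
    (r"softmax_.*", 1e-3),   # but decay the softmax layer more strongly
]

def decay_for(name, rules, default=0.0):
    """Return the weight-decay constant for a parameter name."""
    value = default
    for pattern, constant in rules:
        if re.match(pattern, name):
            value = constant  # later matches win
    return value

decays = {name: decay_for(name, decay_rules) for name in param_names}
```

Here `layer1_b` gets no decay, both `_W` matrices get `1e-4`, and `softmax_W` matches both rules, so the later, more specific one wins.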
In general I like the idea. However, a question arises where should all this information (gradient scaling constants, weight decay constants, etc.) be stored. I still think that layers are good candidates for that.
We could do it like this:

- every annotated variable keeps a reference to the layer whose output it is, just as Theano variables refer to Apply nodes
- we provide the user with a simple recursive function that scans the computation graph and returns all the layers used in it
- the user can select layers however they like and apply modifications to them
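The three steps above can be sketched as follows. All class and function names are invented; the point is only the mechanism: variables point back to their producing layer, and a recursive scan collects every layer reachable from an output.

```python
class Layer:
    def __init__(self, name):
        self.name = name
        self.weight_decay = 0.0  # an annotation the user may later modify

class AnnotatedVariable:
    def __init__(self, layer=None, inputs=()):
        self.layer = layer      # the layer whose output this is (like an Apply node)
        self.inputs = inputs    # upstream variables in the computation graph

def collect_layers(variable, seen=None):
    """Recursively scan the graph and return all layers used in it."""
    if seen is None:
        seen = []
    if variable.layer is not None and variable.layer not in seen:
        seen.append(variable.layer)
    for inp in variable.inputs:
        collect_layers(inp, seen)
    return seen

# Build a tiny graph: input -> hidden -> output
x = AnnotatedVariable()
h = AnnotatedVariable(layer=Layer("hidden"), inputs=(x,))
y = AnnotatedVariable(layer=Layer("output"), inputs=(h,))

# The user selects layers and applies a modification to them:
for layer in collect_layers(y):
    if layer.name == "hidden":
        layer.weight_decay = 1e-4
```

The input variable `x` has no producing layer, so the scan naturally skips it.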