Implementation of state of the art deep network tools into PetaVision #22
For more information on the standalone implementation, here are the final writeup and presentation of the project: https://docs.google.com/document/d/1ahbASGYgRrncbBUHdT38h13L7Vy9bdLJJmGaDrKnL_c/edit?usp=sharing https://docs.google.com/presentation/d/1eSMSeFLNq2ul-bIx1BEhgC1vTBurL842IUpwTkzBpUk/edit?usp=sharing
This is great, Sheng! Would you mind elaborating more on what you mean by the "automatic gradient layer generator"? I know that TensorFlow and Theano both have the capability of auto-differentiating a function for backprop. Is this what you are describing? I am not making the connection for how this solves your problem of implementing backprop in PetaVision. Of the three options you listed, I actually like the first the best. I think adding the ability for any layer & conn to propagate a signal backwards down the network would be valuable. This one upgrade would make it easier to implement standard ML algorithms, as well as novel semi-supervised models that combine LCA dynamics with back-propagated label error signals. Of course, the default backprop functionality would have to allow all of the current models to run uninhibited, which might be difficult to do. Whatever you choose, make sure you document it well. Let me know if there is anything I can do to help!
I think it's related. In TensorFlow, you build the feedforward net, and gradients are automatically calculated based on the feedforward net. I assumed that an equivalent backwards computation automatically gets added in as needed for every feedforward computation, but it could very well be based on an empirical calculation of the gradient.
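As a minimal sketch (not PetaVision or TensorFlow code), this is the "one backward computation per forward computation" idea for a single fully connected layer: the forward pass computes y = W x and the matching backward pass computes dL/dx = W^T dL/dy.

```cpp
// Sketch only: each forward computation has a matching backward computation.
#include <cstddef>
#include <cstdio>
#include <vector>

// forward: y[i] = sum_j W[i][j] * x[j]
std::vector<float> forward(const std::vector<std::vector<float>> &W,
                           const std::vector<float> &x) {
    std::vector<float> y(W.size(), 0.0f);
    for (std::size_t i = 0; i < W.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += W[i][j] * x[j];
    return y;
}

// backward: dx[j] = sum_i W[i][j] * dy[i]  (transpose of the forward pass)
std::vector<float> backward(const std::vector<std::vector<float>> &W,
                            const std::vector<float> &dy) {
    std::vector<float> dx(W[0].size(), 0.0f);
    for (std::size_t i = 0; i < W.size(); ++i)
        for (std::size_t j = 0; j < dx.size(); ++j)
            dx[j] += W[i][j] * dy[i];
    return dx;
}

int main() {
    std::vector<std::vector<float>> W = {{1.0f, 2.0f}, {3.0f, 4.0f}};
    std::vector<float> x  = {1.0f, 1.0f};
    std::vector<float> y  = forward(W, x);     // {3, 7}
    std::vector<float> dy = {1.0f, 1.0f};      // pretend gradient coming from the loss
    std::vector<float> dx = backward(W, dy);   // {4, 6}
    std::printf("y = {%g, %g}, dx = {%g, %g}\n", y[0], y[1], dx[0], dx[1]);
    return 0;
}
```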
After much thought into how exactly to implement this, I seem to be stuck. While encapsulating backprop into layers and connections would make building deep convolutional nets much easier, such a model does not fit well into the current implementation, which is a simple model of data delivery between layers via connections. By encapsulating backprop into special layers and connections, we create massive dependencies between those layers and connections, which makes it hard for future users to combine the two (for example, backpropagating a classification error along with a sparse reconstruction error, although plasticCloneConns may still make this possible). The alternative, however, is the current way we're trying to do deep networks in PetaVision, with a complicated set of connections to achieve backprop, which also makes it very hard to utilize cuDNN's gradient calculations. From here, I see two options. We can separate the backprop architecture from the core part of PV (and create lots of dependencies between specific backprop layers and connections), or we can try to extend the current implementation for backprop (with very complicated networks that would probably get even more complicated once the GPU is incorporated). What do you guys think?
Talking to Will, it seems that a third option is to incorporate backprop functionality into all of PetaVision (where the default is to not do any backprop). I'm thinking this could actually work. All plastic connections would have the option of learning either off of the activity (what is currently being done) or off of the gradient (backprop). This way, the user has the option of building a backprop network using the layers and connections we have now. Furthermore, each layer would incorporate the gradient calculations as well. This would also mean adopting the backwardsUpdateState approach that Dylan has backed. I like this option, and unless someone says otherwise or brings up any caveats to this plan, I will start fleshing out the details and start implementing.
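To make this concrete, here is a rough sketch of what the combined forward/backward interface could look like. This is hypothetical code, not the real PetaVision class hierarchy; the names Layer, Conn, and learnFromGradient, and the exact form of the backwardsUpdateState hook, are illustrative assumptions about the design described above.

```cpp
// Hypothetical interface sketch only -- not actual PetaVision code.
#include <cstddef>
#include <vector>

struct Layer {
    std::vector<float> activity;   // forward state (what layers hold today)
    std::vector<float> gradient;   // backward state held alongside it

    virtual void updateState(double t, double dt) {}           // existing forward update
    virtual void backwardsUpdateState(double t, double dt) {}  // proposed backward update
    virtual ~Layer() {}
};

struct Conn {
    Layer *pre  = nullptr;
    Layer *post = nullptr;
    float  lr   = 0.01f;
    std::vector<std::vector<float>> W;   // W[post][pre]
    bool learnFromGradient = false;      // default false: existing behavior untouched

    void updateWeights() {
        for (std::size_t i = 0; i < W.size(); ++i) {
            for (std::size_t j = 0; j < W[i].size(); ++j) {
                if (learnFromGradient) {
                    // backprop-style rule: descend the error gradient
                    W[i][j] -= lr * post->gradient[i] * pre->activity[j];
                } else {
                    // current activity-based (Hebbian-style) rule, unchanged
                    W[i][j] += lr * post->activity[i] * pre->activity[j];
                }
            }
        }
    }
};
```

The key point of this sketch is that the gradient-based rule is opt-in per connection, so every existing network keeps its current activity-based behavior by default.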
This sounds like the best option. Weights can implement a forward learning rule and/or a backward learning rule. The conns receive forward activity and backward gradients. Love it! I can't wait for you to get this done :-D Let me know what I can do to help. |
Hi all,
As many of you know, I have a standalone AlexNet implementation in C++, and I would like to incorporate this functionality into PetaVision. However, there are some fundamental differences between my implementation and the current PetaVision implementation that need to be addressed. I'm starting this thread both to pick your brains for implementation details and to lay out what to expect.
First off, here are the advantages of my implementation over the AlexNet implementation currently being attempted in PetaVision.
I plan on keeping this part of the code encapsulated in the mlearning auxlib, but I can see how many of these features can be useful in the core toolbox as well (pooling on GPUs, updating weights on GPUs, etc). I'll have to figure out which parts go where.
One major design decision I have to make has to do with encapsulating the gradient calculations into a single layer. As AlexNet and the like depend on a feedforward stage and a backprop stage, encapsulating both in a single layer is tricky. Currently, we control stages via the phase parameter. However, backprop must execute in the opposite order of the feedforward stage. When these two computations were separate, a user explicitly set phases to achieve the desired result, which is no longer possible when the computations are combined into a single layer. Here are several possible solutions to this problem.
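To make the ordering constraint itself concrete, here is a small sketch (hypothetical scheduler code, not PetaVision; the layer names and phase values are made up): the forward sweep visits layers in ascending phase order, while the backward sweep must visit the same layers in reverse order.

```cpp
// Illustration of the ordering constraint only.
#include <algorithm>
#include <cstdio>
#include <vector>

struct LayerEntry {
    const char *name;
    int phase;
};

int main() {
    std::vector<LayerEntry> layers = {{"conv1", 0}, {"pool1", 1}, {"fc1", 2}, {"cost", 3}};

    // forward sweep: ascending phase order
    std::sort(layers.begin(), layers.end(),
              [](const LayerEntry &a, const LayerEntry &b) { return a.phase < b.phase; });
    std::printf("forward:  ");
    for (const LayerEntry &l : layers) std::printf("%s ", l.name);   // conv1 pool1 fc1 cost

    // backward sweep: same layers, reverse order
    std::printf("\nbackward: ");
    for (auto it = layers.rbegin(); it != layers.rend(); ++it)
        std::printf("%s ", it->name);                                // cost fc1 pool1 conv1
    std::printf("\n");
    return 0;
}
```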
One final minor item would be to split up the GPU timing info so that memcpys and computations are timed separately. I know Nvidia provides timing facilities with CUDA, which we might be able to integrate into our current timing implementation.
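For reference, here is a minimal sketch of how CUDA events (one of the timing facilities NVIDIA ships with CUDA) could separate memcpy time from compute time; the kernel and buffer names are made up for illustration, and this is not tied to PetaVision's existing timers.

```cpp
// Sketch: timing the memcpy and the kernel as separate segments with CUDA events.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n]();
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);  // memcpy segment
    cudaEventRecord(t1);
    scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);                  // compute segment
    cudaEventRecord(t2);
    cudaEventSynchronize(t2);

    float memcpyMs = 0.0f, kernelMs = 0.0f;
    cudaEventElapsedTime(&memcpyMs, t0, t1);
    cudaEventElapsedTime(&kernelMs, t1, t2);
    std::printf("memcpy: %.3f ms, kernel: %.3f ms\n", memcpyMs, kernelMs);

    cudaFree(d);
    delete[] h;
    return 0;
}
```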
I'm sure this post is nowhere near a thorough review of all the problems I will run into, but it's a good start, and a good place for further discussion of new issues that may come up later as well.
Sheng