Initial Commit.

princeton-vl · Sep 15, 2021 · commit 5c13878

Showing 24 changed files with 2,682 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2021 Princeton Vision & Learning Lab
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/RAFTStereo.png b/RAFTStereo.png
diff --git a/README.md b/README.md
@@ -0,0 +1,113 @@
+# RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching
+This repository contains the source code for our paper:
+
+[RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching](https://www.google.com)<br/>
+Lahav Lipson, Zachary Teed and Jia Deng<br/>
+
+<img src="RAFTStereo.png">
+
+## Requirements
+The code has been tested with PyTorch 1.7 and CUDA 10.2.
+```Shell
+conda env create -f environment.yaml
+conda activate raftstereo
+```
+
+
+
+
+## Required Data
+To evaluate/train RAFT-Stereo, you will need to download the required datasets.
+* [Sceneflow](https://lmb.informatik.uni-freiburg.de/resources/datasets/SceneFlowDatasets.en.html#:~:text=on%20Academic%20Torrents-,FlyingThings3D,-Driving) (includes FlyingThings3D, Driving & Monkaa)
+* [Middlebury](https://vision.middlebury.edu/stereo/data/)
+* [ETH3D](https://www.eth3d.net/datasets#low-res-two-view-test-data)
+* [KITTI](http://www.cvlibs.net/datasets/kitti/eval_scene_flow.php?benchmark=stereo)
+
+To download the ETH3D and Middlebury test datasets for the [demos](#demos), run 
+```Shell
+chmod ug+x download_datasets.sh && ./download_datasets.sh
+```
+
+By default, `stereo_datasets.py` searches for the datasets in the locations below. If the datasets were downloaded elsewhere, you can create symbolic links to them inside the `datasets` folder.
+
+```Shell
+├── datasets
+    ├── FlyingThings3D
+        ├── frames_cleanpass
+        ├── frames_finalpass
+        ├── disparity
+    ├── Monkaa
+        ├── frames_cleanpass
+        ├── frames_finalpass
+        ├── disparity
+    ├── Driving
+        ├── frames_cleanpass
+        ├── frames_finalpass
+        ├── disparity
+    ├── KITTI
+        ├── testing
+        ├── training
+        ├── devkit
+    ├── Middlebury
+        ├── MiddEval3
+    ├── ETH3D
+        ├── lakeside_1l
+        ├── ...
+        ├── tunnel_3s
+```
+
+## Demos
+Pretrained models can be downloaded by running
+```Shell
+chmod ug+x download_models.sh && ./download_models.sh
+```
+or downloaded from [Google Drive](https://drive.google.com/drive/folders/1booUFYEXmsdombVuglatP0nZXb5qI89J).
+
+You can demo a trained model on pairs of images. To predict disparity for Middlebury, run
+```Shell
+python demo.py --restore_ckpt models/raftstereo-sceneflow.pth
+```
+Or for ETH3D:
+```Shell
+python demo.py --restore_ckpt models/raftstereo-eth3d.pth -l=datasets/ETH3D/*/im0.png -r=datasets/ETH3D/*/im1.png
+```
+Using our fastest model:
+```Shell
+python demo.py --restore_ckpt models/raftstereo-realtime.pth  --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru 
+```
+
+To save the disparity values as `.npy` files, run any of the demos with the `--save_numpy` flag. 
+
+## Converting Disparity to Depth 
+
+If the camera focal length and camera baseline are known, disparity predictions can be converted to depth values using
+
+<img src="depth_eq.png" width="320">
+
+Note that the focal length must be given in _pixels_, not millimeters.
+
+## Evaluation
+
+To evaluate a trained model on a validation set (e.g. Middlebury), run
+```Shell
+python evaluate_stereo.py --restore_ckpt models/raftstereo-middlebury.pth --dataset middlebury_H
+```
+
+## Training
+
+Our model is trained on two RTX-6000 GPUs using the following command. Training logs are written to `runs/` and can be visualized with TensorBoard.
+
+```Shell
+python train_stereo.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision
+```
+To train using significantly less memory, change `--n_downsample 2` to `--n_downsample 3`. This will slightly reduce accuracy.
+
+## (Optional) Faster Implementation
+
+We provide a faster CUDA implementation of the correlation volume which works with mixed precision feature maps.
+```Shell
+cd sampler && python setup.py install && cd ..
+```
+Running `demo.py`, `train_stereo.py` or `evaluate_stereo.py` with `--corr_implementation reg_cuda` together with `--mixed_precision` will speed up the model without impacting performance.
+
+To significantly decrease memory consumption on high resolution images, use `--corr_implementation alt`. This implementation is slower than the default, however.
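
Review note on the README's "Converting Disparity to Depth" section: the conversion formula is only available as an image (`depth_eq.png`). Assuming it is the standard pinhole stereo relation, depth = focal length × baseline / disparity, a minimal NumPy sketch might look like the following. The function name `disparity_to_depth`, its parameters, and the example camera values are hypothetical and not part of this commit; the sketch also assumes positive disparities and marks non-positive ones as infinitely far away.

```python
import numpy as np


def disparity_to_depth(disparity, focal_length_px, baseline):
    """Convert a disparity map (in pixels) to depth.

    Assumes the standard relation depth = focal_length * baseline / disparity.
    The focal length is in pixels, so depth comes out in the baseline's units.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0  # zero or negative disparity has no finite depth here
    depth[valid] = focal_length_px * baseline / disparity[valid]
    return depth


# Hypothetical camera: 1280 px focal length, 0.1 m baseline.
disp = np.array([[64.0, 32.0], [16.0, 0.0]])
depth = disparity_to_depth(disp, focal_length_px=1280.0, baseline=0.1)
```

A disparity array saved with `--save_numpy` could presumably be loaded with `np.load` and passed to such a function, provided its sign convention matches the assumption above.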
diff --git a/core/__init__.py b/core/__init__.py
diff --git a/core/corr.py b/core/corr.py
@@ -0,0 +1,188 @@
+import torch
+import torch.nn.functional as F
+from core.utils.utils import bilinear_sampler
+
+try:
+    import corr_sampler  # optional compiled CUDA extension
+except ImportError:
+    pass  # corr_sampler is not compiled
+
+try:
+    import alt_cuda_corr
+except ImportError:
+    # alt_cuda_corr is not compiled
+    pass
+
+
+class CorrSampler(torch.autograd.Function):
+    @staticmethod
+    def forward(ctx, volume, coords, radius):
+        ctx.save_for_backward(volume,coords)
+        ctx.radius = radius
+        corr, = corr_sampler.forward(volume, coords, radius)
+        return corr
+    @staticmethod
+    def backward(ctx, grad_output):
+        volume, coords = ctx.saved_tensors
+        grad_output = grad_output.contiguous()
+        grad_volume, = corr_sampler.backward(volume, coords, grad_output, ctx.radius)
+        return grad_volume, None, None
+
+class CorrBlockFast1D:
+    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
+        self.num_levels = num_levels
+        self.radius = radius
+        self.corr_pyramid = []
+        # all pairs correlation
+        corr = CorrBlockFast1D.corr(fmap1, fmap2)
+        batch, h1, w1, dim, w2 = corr.shape
+        corr = corr.reshape(batch*h1*w1, dim, 1, w2)
+        for i in range(self.num_levels):
+            self.corr_pyramid.append(corr.view(batch, h1, w1, -1, w2//2**i))
+            corr = F.avg_pool2d(corr, [1,2], stride=[1,2])
+
+    def __call__(self, coords):
+        out_pyramid = []
+        bz, _, ht, wd = coords.shape
+        coords = coords[:, [0]]
+        for i in range(self.num_levels):
+            corr = CorrSampler.apply(self.corr_pyramid[i].squeeze(3), coords/2**i, self.radius)
+            out_pyramid.append(corr.view(bz, -1, ht, wd))
+        return torch.cat(out_pyramid, dim=1)
+
+    @staticmethod
+    def corr(fmap1, fmap2):
+        B, D, H, W1 = fmap1.shape
+        _, _, _, W2 = fmap2.shape
+        fmap1 = fmap1.view(B, D, H, W1)
+        fmap2 = fmap2.view(B, D, H, W2)
+        corr = torch.einsum('aijk,aijh->ajkh', fmap1, fmap2)
+        corr = corr.reshape(B, H, W1, 1, W2).contiguous()
+        return corr / torch.sqrt(torch.tensor(D).float())
+
+
+class PytorchAlternateCorrBlock1D:
+    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
+        self.num_levels = num_levels
+        self.radius = radius
+        self.corr_pyramid = []
+        self.fmap1 = fmap1
+        self.fmap2 = fmap2
+
+    def corr(self, fmap1, fmap2, coords):
+        B, D, H, W = fmap2.shape
+        # map grid coordinates to [-1,1]
+        xgrid, ygrid = coords.split([1,1], dim=-1)
+        xgrid = 2*xgrid/(W-1) - 1
+        ygrid = 2*ygrid/(H-1) - 1
+
+        grid = torch.cat([xgrid, ygrid], dim=-1)
+        output_corr = []
+        for grid_slice in grid.unbind(3):
+            fmapw_mini = F.grid_sample(fmap2, grid_slice, align_corners=True)
+            corr = torch.sum(fmapw_mini * fmap1, dim=1)
+            output_corr.append(corr)
+        corr = torch.stack(output_corr, dim=1).permute(0,2,3,1)
+
+        return corr / torch.sqrt(torch.tensor(D).float())
+
+    def __call__(self, coords):
+        r = self.radius
+        coords = coords.permute(0, 2, 3, 1)
+        batch, h1, w1, _ = coords.shape
+        fmap1 = self.fmap1
+        fmap2 = self.fmap2
+        out_pyramid = []
+        for i in range(self.num_levels):
+            dx = torch.zeros(1)
+            dy = torch.linspace(-r, r, 2*r+1)
+            delta = torch.stack(torch.meshgrid(dy, dx), axis=-1).to(coords.device)
+            centroid_lvl = coords.reshape(batch, h1, w1, 1, 2).clone()
+            centroid_lvl[...,0] = centroid_lvl[...,0] / 2**i
+            coords_lvl = centroid_lvl + delta.view(-1, 2)
+            corr = self.corr(fmap1, fmap2, coords_lvl)
+            fmap2 = F.avg_pool2d(fmap2, [1, 2], stride=[1, 2])
+            out_pyramid.append(corr)
+        out = torch.cat(out_pyramid, dim=-1)
+        return out.permute(0, 3, 1, 2).contiguous().float()
+
+
+class CorrBlock1D:
+    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
+        self.num_levels = num_levels
+        self.radius = radius
+        self.corr_pyramid = []
+
+        # all pairs correlation
+        corr = CorrBlock1D.corr(fmap1, fmap2)
+
+        batch, h1, w1, dim, w2 = corr.shape
+        corr = corr.reshape(batch*h1*w1, dim, 1, w2)
+
+        self.corr_pyramid.append(corr)
+        for i in range(self.num_levels):
+            corr = F.avg_pool2d(corr, [1,2], stride=[1,2])
+            self.corr_pyramid.append(corr)
+
+    def __call__(self, coords):
+        r = self.radius
+        coords = coords[:, :1].permute(0, 2, 3, 1)
+        batch, h1, w1, _ = coords.shape
+
+        out_pyramid = []
+        for i in range(self.num_levels):
+            corr = self.corr_pyramid[i]
+            dx = torch.linspace(-r, r, 2*r+1)
+            dx = dx.view(1, 1, 2*r+1, 1).to(coords.device)
+            x0 = dx + coords.reshape(batch*h1*w1, 1, 1, 1) / 2**i
+            y0 = torch.zeros_like(x0)
+
+            coords_lvl = torch.cat([x0,y0], dim=-1)
+            corr = bilinear_sampler(corr, coords_lvl)
+            corr = corr.view(batch, h1, w1, -1)
+            out_pyramid.append(corr)
+
+        out = torch.cat(out_pyramid, dim=-1)
+        return out.permute(0, 3, 1, 2).contiguous().float()
+
+    @staticmethod
+    def corr(fmap1, fmap2):
+        B, D, H, W1 = fmap1.shape
+        _, _, _, W2 = fmap2.shape
+        fmap1 = fmap1.view(B, D, H, W1)
+        fmap2 = fmap2.view(B, D, H, W2)
+        corr = torch.einsum('aijk,aijh->ajkh', fmap1, fmap2)
+        corr = corr.reshape(B, H, W1, 1, W2).contiguous()
+        return corr / torch.sqrt(torch.tensor(D).float())
+
+
+class AlternateCorrBlock:
+    def __init__(self, fmap1, fmap2, num_levels=4, radius=4):
+        raise NotImplementedError
+        self.num_levels = num_levels
+        self.radius = radius
+
+        self.pyramid = [(fmap1, fmap2)]
+        for i in range(self.num_levels):
+            fmap1 = F.avg_pool2d(fmap1, 2, stride=2)
+            fmap2 = F.avg_pool2d(fmap2, 2, stride=2)
+            self.pyramid.append((fmap1, fmap2))
+
+    def __call__(self, coords):
+        coords = coords.permute(0, 2, 3, 1)
+        B, H, W, _ = coords.shape
+        dim = self.pyramid[0][0].shape[1]
+
+        corr_list = []
+        for i in range(self.num_levels):
+            r = self.radius
+            fmap1_i = self.pyramid[0][0].permute(0, 2, 3, 1).contiguous()
+            fmap2_i = self.pyramid[i][1].permute(0, 2, 3, 1).contiguous()
+
+            coords_i = (coords / 2**i).reshape(B, 1, H, W, 2).contiguous()
+            corr, = alt_cuda_corr.forward(fmap1_i, fmap2_i, coords_i, r)
+            corr_list.append(corr.squeeze(1))
+
+        corr = torch.stack(corr_list, dim=1)
+        corr = corr.reshape(B, -1, H, W)
+        return corr / torch.sqrt(torch.tensor(dim).float())
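
Review note on `core/corr.py`: the idea behind `CorrBlock1D` (an all-pairs correlation along each image row, a pyramid built by average-pooling the last axis, and a lookup of `2*radius+1` values around the current x-coordinate at every level) can be sketched without PyTorch. The following is a NumPy re-expression for illustration only, not the implementation in this commit. The function names are hypothetical, and the zero padding outside the volume is an assumption, since `bilinear_sampler` is not shown in this part of the diff.

```python
import numpy as np


def corr_volume_1d(fmap1, fmap2):
    """Row-wise all-pairs correlation of (B, D, H, W) maps -> (B, H, W1, W2)."""
    d = fmap1.shape[1]
    corr = np.einsum('aijk,aijh->ajkh', fmap1, fmap2)  # sum over channels
    return corr / np.sqrt(d)


def build_pyramid(corr, num_levels=4):
    """Halve the last (W2) axis at each level by averaging neighbouring pairs."""
    pyramid = [corr]
    for _ in range(num_levels - 1):
        w2 = corr.shape[-1] // 2 * 2  # drop a trailing odd column
        corr = 0.5 * (corr[..., 0:w2:2] + corr[..., 1:w2:2])
        pyramid.append(corr)
    return pyramid


def lookup(pyramid, x, radius=4):
    """Linearly interpolate 2*radius+1 samples around x / 2**i at each level.

    x has shape (B, H, W1) and holds x-coordinates in the second image.
    Samples falling outside the volume contribute zero (an assumption).
    """
    offsets = np.arange(-radius, radius + 1, dtype=np.float64)
    out = []
    for i, corr in enumerate(pyramid):
        w2 = corr.shape[-1]
        pos = x[..., None] / 2**i + offsets  # (B, H, W1, 2r+1)
        lo = np.floor(pos).astype(np.int64)
        frac = pos - lo

        def gather(idx):
            inside = (idx >= 0) & (idx < w2)
            vals = np.take_along_axis(corr, np.clip(idx, 0, w2 - 1), axis=-1)
            return np.where(inside, vals, 0.0)

        out.append((1 - frac) * gather(lo) + frac * gather(lo + 1))
    return np.concatenate(out, axis=-1)


rng = np.random.default_rng(0)
f1 = rng.standard_normal((1, 8, 2, 16))
f2 = rng.standard_normal((1, 8, 2, 16))
pyr = build_pyramid(corr_volume_1d(f1, f2), num_levels=4)
x = np.tile(np.arange(16, dtype=np.float64), (1, 2, 1))  # zero-disparity guess
feat = lookup(pyr, x, radius=4)
print(feat.shape)  # (1, 2, 16, 36): 4 levels x 9 samples per pixel
```

In the diff above, `CorrBlock1D.__call__` additionally permutes the result to channels-first `(B, C, H, W)`; the sketch keeps the sample axis last for readability.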