An implementation of Apache's svndumpfilter that solves some common problems.
Latest version of svndumpfilterIN was updated on:
- Python 3.9.18
- Pytest 8.0.0
- Ubuntu 20.04.06 LTS
WARNING: Before starting the filter, make sure that the user running it has sufficient permissions to run svnlook on your target directory.
Usage: svndumpfilter.py [OPTIONS] <input_dump> <SUBCOMMAND> [args]
Example Usage:
sudo python3 svndumpfilter.py input_name.dump -r repo_path -o output_name.dump include directory_name
Runs svndumpfilter on input_name.dump from repo_path to carve out directory_name and saves the result to output_name.dump.
See also python3 svndumpfilter.py --help.
The filter relies on svnlook to pull excluded files/directories that are eventually moved into included directories.
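A minimal sketch of the kind of svnlook lookup this implies: fetching the contents of a path as of a given revision. The helper names `svnlook_cat_command` and `fetch_file` are hypothetical and are not taken from svndumpfilter.py. The sketch assumes `svnlook` is on the PATH and that the calling user can read the repository.

```python
import subprocess


def svnlook_cat_command(repo_path, revision, file_path):
    """Build the argument list for `svnlook cat -r REV REPOS_PATH FILE_PATH`."""
    return ["svnlook", "cat", "-r", str(revision), repo_path, file_path]


def fetch_file(repo_path, revision, file_path):
    """Return the raw bytes of file_path as it existed at the given revision.

    Raises subprocess.CalledProcessError if svnlook exits with an error,
    for example when the user lacks permission to read the repository.
    """
    cmd = svnlook_cat_command(repo_path, revision, file_path)
    return subprocess.run(cmd, check=True, capture_output=True).stdout
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces.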
- Drops an empty revision record when all of its node records are excluded.
Example: You have a revision record, but all of its paths are in excluded directories. The result is that the revision record will not show up in the final dump file.
- Renumbers revisions based on revisions that were dropped.
Example: There are 5 revisions and 3 revisions are empty because all their node records are for excluded paths. You will have an output dump file with 2 revisions, numbered Revision 1 and Revision 2.
- Scan-only mode, in which a quick scan of the dump file detects whether untangling repositories will be necessary.
Example of untangling: You have a node record with a copyfrom-path that refers to an excluded directory. You will need to untangle this by retrieving information about the file that you are copying from and adding it to a prior node record.
- Ability to strip svn:mergeinfo properties.
Example: svnadmin tries to resolve merge info from svn:mergeinfo properties, and after heavy filtering these properties are broken because of the dropped revisions. Such dumps cause svnadmin to fail on import, so you can strip the properties instead.
Arguments: -x, --strip-mergeinfo.
- Ability to start filtering at any revision.
Example: You can start filtering at revision 100 if you have already loaded the first 100 previously from another dump file.
- Automatically untangles revisions.
Example: Whenever you reference an excluded path from an included node-path, you will automatically have the excluded data loaded in a prior record.
- Path matching is done on more than just the top level.
Example: You can match to repo/dir1/dir2, which is deeper than repo/dir1/, which is as deep as some filters can match.
- Dependent directories are added when matching at more than the top level.
Example: If you match deeper than the top level, dependents must be added for paths that are more than 1 level deep. For example, if you only include repo/dir1, a node record that adds repo is needed before the node record that adds repo/dir1.
- Paths to include/exclude can now be read from a file.
Example: You can now add --file to specify a file to read matched paths from.
- Property tags are added to differentiate items generated by the dump filter.
Example: In the property block, a key (K 23, svndumpfilter generated) is appended with a value (V 4, True).
- A custom log message can be added when exclusions remove all nodes from a revision, making the revision empty. If this option, --empty-rev-message, is used, only the date and log properties will remain.
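The empty-revision dropping, renumbering, and deeper-than-top-level path matching described in the list above can be sketched as follows. This is a simplified illustration, not the code in svndumpfilter.py: the function names and the (revision, paths) tuple layout are hypothetical, and real dump handling (revision 0, copyfrom untangling, properties) is left out.

```python
def path_matches(path, include_prefixes):
    """True if path equals an included prefix or lies beneath one.

    Comparing against prefix + "/" keeps repo/dir1/dir22 from matching
    an include of repo/dir1/dir2.
    """
    return any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in include_prefixes
    )


def filter_and_renumber(revisions, include_prefixes, first_number=1):
    """Drop revisions with no included node paths and renumber the rest.

    revisions is a list of (original_revision, [node_paths]) tuples.
    Returns a list of (new_revision, original_revision, kept_paths).
    """
    result = []
    next_number = first_number
    for original, paths in revisions:
        kept = [p for p in paths if path_matches(p, include_prefixes)]
        if not kept:
            # Every node record was excluded, so the revision is dropped.
            continue
        result.append((next_number, original, kept))
        next_number += 1
    return result


# Five revisions, three of which touch only excluded paths.
revisions = [
    (1, ["repo/other/a"]),
    (2, ["repo/dir1/dir2/f"]),
    (3, ["repo/dir1/x"]),
    (4, ["repo/dir1/dir2", "repo/other/b"]),
    (5, ["repo/dir1/dir22/g"]),
]
kept = filter_and_renumber(revisions, ["repo/dir1/dir2"])
# kept == [(1, 2, ["repo/dir1/dir2/f"]), (2, 4, ["repo/dir1/dir2"])]
```

Keeping the original revision number next to the new one makes it possible to translate references to old revisions later on.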
To file issue reports, use the project's issue tracker on GitHub.
When creating an issue, please provide a sample of the dump file that is creating the problem or provide a method to reproduce it.
PRs that fix the problems you encounter are also welcome.
- Use Python version 3.10.13
- Set up a virtual environment:
% cd <your_git_workspace>
% python3 -m venv venv
% source venv/bin/activate
See Virtual Environments and Packages for more details.
- Use pytest for running unit tests. To set up (once in your virtual environment):
% pip install pytest
See the pytest documentation for more details.
- Use pycodestyle for adherence to style guidelines. To set up (once in your virtual environment):
% pip install pycodestyle
See the pycodestyle documentation for more details.
- Use Bandit to locate common security issues. To set up (once in your virtual environment):
% pip install bandit
See the Bandit documentation (https://bandit.readthedocs.io/en/latest/) for more details.
- Check that your changes adhere to the style guidelines by ensuring the following passes with no complaints:
% pycodestyle *.py test/*.py
- Add unit test(s) to test_svndumpfilter.py demonstrating the problem and the fix in your patch.
- Ensure all unit tests pass using pytest:
% pytest test
- Ensure no security issues have been introduced using Bandit:
% bandit --ini tox.ini --exclude ./venv -r .
- The requirements above for submitting a PR are also checked by an equivalent set of GitHub Actions.
- The most useful documentation on the svndump format can be found on the Apache Subversion server.
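As a small illustration of that format, the sketch below encodes a property block in the length-prefixed K/V style that dump files use, with the "svndumpfilter generated" tag mentioned earlier as the example. The function names are hypothetical and not taken from svndumpfilter.py. The surrounding record headers (such as the content-length headers) are deliberately omitted.

```python
def encode_property(key, value):
    """Encode one key/value pair in dump-file property syntax.

    Each part is preceded by its length in bytes: "K <len>" before the
    key and "V <len>" before the value.
    """
    key_bytes = key.encode("utf-8")
    value_bytes = value.encode("utf-8")
    return b"K %d\n%s\nV %d\n%s\n" % (
        len(key_bytes), key_bytes, len(value_bytes), value_bytes
    )


def encode_property_block(properties):
    """Encode a dict of properties, terminated by the PROPS-END marker."""
    body = b"".join(encode_property(k, v) for k, v in properties.items())
    return body + b"PROPS-END\n"


tag = encode_property("svndumpfilter generated", "True")
# tag == b"K 23\nsvndumpfilter generated\nV 4\nTrue\n"
```

Lengths are counted in bytes rather than characters, which matters for property values that contain non-ASCII text.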