A Python tool that detects and crops multiple photos from a single scanned image. It is designed for scans with white borders (as scanners commonly produce), where several photos were scanned together on one page. The tool identifies each photo and saves it as a separate file.
- Detects multiple photos in scanned images with white borders
- Automatically crops and saves individual photos
- Supports multi-threading for faster processing
- Configurable via command-line arguments or environment variables
- Supports various image formats (PNG, JPG, JPEG)
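The detection idea behind these features can be illustrated with a small, dependency-free sketch. This is not the code in crop.py, which relies on OpenCV. It assumes a grayscale image given as a list of rows, treats pixels brighter than a threshold (240 here, mirroring the tool's default) as white background, groups the remaining pixels into connected regions, and keeps only bounding boxes that meet a minimum width and height. The name find_photo_boxes and its parameters are hypothetical.

```python
from collections import deque


def find_photo_boxes(gray, threshold=240, min_width=2, min_height=2):
    """Return (x, y, w, h) boxes of non-white regions in a grayscale grid.

    Hypothetical helper for illustration; not part of crop.py.
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or gray[y][x] > threshold:
                continue  # already visited, or white background
            # Flood-fill one connected region of non-white pixels,
            # tracking the extremes to build its bounding box.
            seen[y][x] = True
            queue = deque([(x, y)])
            x0 = x1 = x
            y0 = y1 = y
            while queue:
                cx, cy = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and gray[ny][nx] <= threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            w, h = x1 - x0 + 1, y1 - y0 + 1
            # Discard specks smaller than the minimum size.
            if w >= min_width and h >= min_height:
                boxes.append((x0, y0, w, h))
    return boxes


W, D = 255, 40  # white background and a dark "photo" pixel
page = [
    [W, W, W, W, W, W, W, W],
    [W, D, D, D, W, D, D, W],
    [W, D, D, D, W, D, D, W],
    [W, W, W, W, W, D, D, W],
    [W, D, W, W, W, W, W, W],  # a single-pixel speck, filtered out
    [W, W, W, W, W, W, W, W],
]
boxes = find_photo_boxes(page)
print(boxes)
```

On this toy page the sketch reports two boxes and ignores the speck, which is the role the minimum contour width and height settings play in the real tool.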
- Make sure you have Poetry installed (see the Poetry Installation Guide).
- Clone this repository:
git clone https://github.com/hspedro/crop-scanned-photos.git
cd crop-scanned-photos
- Install dependencies:
poetry install
poetry run python crop.py --help
Available arguments:
- --input-folder: Input folder containing scanned images (default: "raw")
- --output-folder: Output folder for cropped images (default: "output_images")
- --threads: Number of processing threads (default: 1)
- --threshold-value: Threshold value for image processing (default: 240)
- --threshold-max: Maximum threshold value (default: 255)
- --min-contour-width: Minimum contour width to process (default: 50)
- --min-contour-height: Minimum contour height to process (default: 50)
- --allowed-extensions: Comma-separated list of allowed file extensions (default: .png,.jpg,.jpeg)
You can also configure the tool using environment variables:
export INPUT_FOLDER="scans"
export OUTPUT_FOLDER="cropped"
export THREADS="4"
export THRESHOLD_VALUE="230"
export THRESHOLD_MAX="255"
export MIN_CONTOUR_WIDTH="100"
export MIN_CONTOUR_HEIGHT="100"
export ALLOWED_EXTENSIONS=".png,.jpg,.jpeg"
poetry run python crop.py
Note: Command-line arguments take precedence over environment variables.
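One common way to get this precedence is to use each environment variable as the default for the matching argparse option, so an explicit flag always overrides it. The sketch below assumes that approach and covers only a few of the options; build_parser is a hypothetical helper, and crop.py may be organized differently.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults come from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(description="Crop scanned photos (sketch)")
    # Each default falls back from the environment variable to the documented default.
    parser.add_argument("--input-folder", default=env.get("INPUT_FOLDER", "raw"))
    parser.add_argument("--output-folder", default=env.get("OUTPUT_FOLDER", "output_images"))
    parser.add_argument("--threads", type=int, default=int(env.get("THREADS", "1")))
    parser.add_argument("--threshold-value", type=int,
                        default=int(env.get("THRESHOLD_VALUE", "240")))
    return parser


# Simulated environment: INPUT_FOLDER and THREADS are set, the rest are not.
env = {"INPUT_FOLDER": "scans", "THREADS": "4"}
args = build_parser(env).parse_args(["--threads", "2"])
print(args.input_folder, args.output_folder, args.threads, args.threshold_value)
```

Here --threads 2 on the command line wins over THREADS="4", INPUT_FOLDER supplies "scans", and the unset options keep their documented defaults.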
crop-scanned-photos/
├── raw/ # Default input directory for scanned images
├── output_images/ # Default output directory for cropped images
├── examples/ # Default output directory for test images
├── crop.py # Main script
├── create_test_image.py # Script to generate test images
├── pyproject.toml # Poetry configuration
└── README.md # This file
The project includes a script to generate test images that simulate scanned photos with white borders. These test images are useful for development and testing of the cropping functionality.
Use the create_test_image.py script to generate test images:
poetry run python create_test_image.py -n 4 -o test_4_photos.jpg -f examples
This will create a test image with 4 photos in the examples directory.
To run the cropping script on the test images, use the following command:
poetry run python crop.py --input-folder examples --output-folder examples/cropped
This will process the test images in the examples directory and save the cropped images in the examples/cropped directory.
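As a rough illustration of how a folder run like this might be organized, the sketch below filters files by allowed extension and hands them to a thread pool. It is an assumption about the general approach, not the code in crop.py; list_scans, process_folder, and process_one are hypothetical names, and the per-file work is replaced by a stand-in function.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_scans(input_folder, allowed_extensions=(".png", ".jpg", ".jpeg")):
    """Return files in input_folder with an allowed extension (case-insensitive)."""
    folder = Path(input_folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in allowed_extensions
    )


def process_folder(input_folder, process_one, threads=1):
    """Apply process_one to every scan using up to `threads` worker threads."""
    scans = list_scans(input_folder)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(process_one, scans))


# Demo with empty placeholder files; a real run would crop each scan instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("page1.JPG", "page2.png", "notes.txt"):
        (Path(tmp) / name).touch()
    names = process_folder(tmp, lambda p: p.name, threads=4)
print(names)
```

The text file is skipped because its extension is not in the allowed list, and the two images are processed by the pool.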
create_test_image.py accepts the following options:
- -n, --num_photos: Number of photos to include (default: 4)
- -o, --output: Output filename (default: test_N_scan.jpg, where N is the number of photos)
- -f, --folder: Output folder path (default: examples)
The generated test images have the following characteristics:
- A4 scan proportions (2000x2800 pixels)
- White background simulating scanner bed
- Colored rectangles representing photos (for easy visual testing)
- Automatic grid layout based on number of photos
- 100px white margins between photos
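The grid layout can be sketched as follows. The README does not state how create_test_image.py chooses rows and columns, so this assumes a near-square grid (columns = ceil(sqrt(n))) with a 100px margin around and between cells on a 2000x2800 page. The function grid_cells is hypothetical.

```python
import math


def grid_cells(num_photos, page_w=2000, page_h=2800, margin=100):
    """Return (x, y, w, h) for each photo cell on the page (hypothetical helper)."""
    # Assumed layout rule: a near-square grid, filled row by row.
    cols = math.ceil(math.sqrt(num_photos))
    rows = math.ceil(num_photos / cols)
    # Space left after margins is split evenly between columns and rows.
    cell_w = (page_w - (cols + 1) * margin) // cols
    cell_h = (page_h - (rows + 1) * margin) // rows
    cells = []
    for i in range(num_photos):
        row, col = divmod(i, cols)
        x = margin + col * (cell_w + margin)
        y = margin + row * (cell_h + margin)
        cells.append((x, y, cell_w, cell_h))
    return cells


cells = grid_cells(4)
for cell in cells:
    print(cell)
```

Under these assumptions, four photos form a 2x2 grid of 850x1250 cells, each separated from its neighbors and from the page edge by 100px of white.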
After running the cropping script, the output directory will contain the following files:
- test_1_scan.jpg (blue region)
- test_2_scan.jpg (red region)
- test_3_scan.jpg (green region)
- test_4_scan.jpg (yellow region)
When processing these test images, the cropping script should:
- Detect the boundaries between the white background and colored regions
- Create separate image files for each detected region
- Remove the white borders around each photo
- Preserve the original aspect ratio of each photo
- Name the output files based on the input filename with sequential numbering
For example, processing test_4_scan.jpg
should produce:
- test_4_scan_1.jpg (blue region)
- test_4_scan_2.jpg (red region)
- test_4_scan_3.jpg (green region)
- test_4_scan_4.jpg (yellow region)
Each output image should contain only the colored rectangle without any white borders.
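The naming scheme in this example can be expressed in a few lines. This is a sketch of the pattern shown above (input stem, an underscore, a 1-based index, and the original extension); output_paths is a hypothetical helper rather than a function from crop.py.

```python
from pathlib import Path


def output_paths(input_path, count, output_folder="output_images"):
    """Build sequentially numbered output paths for one scan (hypothetical helper)."""
    src = Path(input_path)
    return [
        Path(output_folder) / f"{src.stem}_{i}{src.suffix}"
        for i in range(1, count + 1)
    ]


paths = output_paths("examples/test_4_scan.jpg", 4, "examples/cropped")
names = [p.name for p in paths]
print(names)
```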
- Python 3.8 or higher
- OpenCV (installed automatically via Poetry)
MIT License
Copyright (c) 2025 Pedro Soares