From 5862f914dc7635aefdb05d5c5ba5ae8d69b6e218 Mon Sep 17 00:00:00 2001 From: Dinne Kopelevich Date: Fri, 17 Jan 2025 07:46:17 -0700 Subject: [PATCH 1/4] Add bulk_extractor doc Signed-off-by: Dinne Kopelevich --- outbound/GitLeaks vs. Bulk Extractor.md | 127 ++++++++++++++++++++++++ 1 file changed, 127 insertions(+) create mode 100644 outbound/GitLeaks vs. Bulk Extractor.md diff --git a/outbound/GitLeaks vs. Bulk Extractor.md b/outbound/GitLeaks vs. Bulk Extractor.md new file mode 100644 index 0000000..d522d9a --- /dev/null +++ b/outbound/GitLeaks vs. Bulk Extractor.md @@ -0,0 +1,127 @@ +# **Outbound Reviews: Scanning Tools for PII and Secrets** + +**Bulk\_extractor** is a forensic tool used to extract useful information from disk images, files and other digital media. It looks for strings of interest, such as email addresses, URLs, credit card numbers and other artifacts. Unlike traditional forensic tools, **bulk\_extractor** does not mount or parse the file system, making it faster and more suitable for unstructured data. It is commonly used for cybersecurity investigations, data recovery and digital forensics. + +**Gitleaks** is an open-source tool designed to detect and prevent secrets (like API keys, tokens and passwords) from being committed to Git repositories. It scans repositories, including their entire history, for hardcoded secrets and generates reports to help developers address potential security vulnerabilities. **Gitleaks** is specifically tailored to source code repositories, making it highly effective in development environments. 
+ +### **Uses:** + +| Checks for | GitLeaks | Bulk-Extractor | +| :---- | :---- | :---- | +| Emails | | ✅ | +| Credit Cards | | ✅ | +| File fragments (JPEGs, PDF headers) | | ✅ | +| JSON snippets | | ✅ | +| Phone numbers | | ✅ | +| URLs | | ✅ | +| IP addresses | | ✅ | +| DNS queries | | ✅ | +| Base64-encoded data | | ✅ | +| PII | | ✅ | +| API keys | ✅ | | +| SSH keys | ✅ | | +| Hardcoded passwords | ✅ | | +| Tokens | ✅ | | +| Secrets | ✅ | ✅ | + +### **Key Differences:** + +Bulk-extractor: + +* Analyzes raw data and disk images +* Digital forensics and data recovery +* Source: Files, disk images, raw data + +GitLeaks: + +* Scan for Git repo secrets +* Source code and secrets +* Used for repos + +### **Bulk-Extractor Installation for Mac:** + +Ensure you are installing bulk\_extractor in your local drive not within the repository or disk you are checking. + +* \`brew install autoconf\` +* \`brew install automake\` +* \`brew install libtool\` +* \`brew install pkg-config\` + +* \`git clone \--recurse-submodules [https://github.com/simsong/bulk\_extractor.git](https://github.com/simsong/bulk_extractor.git)\` + +* \`cd bulk\_extractor\` + +* \`./bootstrap.sh\` +* \`./configure\` +* \`make\` +* \`make install\` + +### **Bulk-Extractor Installation for Windows** + +* Download the Bulk Extractor Windows binary from the [official website](https://github.com/simsong/bulk_extractor). +* Extract the downloaded zip file to your preferred directory. +* Add the Bulk Extractor binary folder to your system PATH: + * Press Win \+ R, type sysdm.cpl, and hit Enter. + * Navigate to the "Advanced" tab and click on "Environment Variables." + * Under "System Variables," locate and edit the Path variable. + * Add the path to the Bulk Extractor folder and click OK. 
+ + or + +* Start with a clean Virtual Machine (VM) and use these commands: + * $ git clone \--recurse-submodules [https://github.com/simsong/bulk\_extractor.git](https://github.com/simsong/bulk_extractor.git) + * $ cd bulk\_extractor/etc + * $ bash CONFIGURE\_FEDORA36\_win64.bash + * $ cd .. + * $ make win64 + +* Open Command Prompt and test the installation: + * \`bulk\_extractor.exe \-h\` + +Note: Currently bulk\_extractor 2.1 does not build on windows, but bulk\_extractor 2.0 does. + +### **Bulk-Extractor Installation for Linux** + +* Update your package manager: + * \`sudo apt-get update\` +* Install required dependencies + * \`sudo apt-get install \-y build-essential cmake\` +* Clone the Bulk Extractor repository and build the source code: + * \`git clone [https://github.com/simsong/bulk\_extractor.git](https://github.com/simsong/bulk_extractor.git)\` + * \`cd bulk\_extractor\` + * \`mkdir build && cd build\` + * \`cmake ..\` + * \`make\` + * \`sudo make install\` +* Verify installation: + * \`bulk\_extractor \-h\` + +### **How to use:** + +* Create the directory you would like to save the output into. This can be in the repository being checked but the location is up to you. +* Alternatively, if the repository you are checking has a directory already available for storing test results, you can use that. + +You can run the basic: +\`bulk\_extractor \-o \ \\` + +You can also extract only certain files: +Example: +\`bulk\_extractor \-o \ \-E email,input \\` + +Outputs are saved as \`.txt\` files. + +--- + +### **When Not to Use Bulk Extractor:** + +* **Highly Structured File Systems**: When specific file system metadata needs to be preserved. +* **Real-Time Analysis**: It’s not designed for active/live systems. +* **Deep Contextual Analysis**: Bulk Extractor focuses on raw data extraction and lacks contextual file relationships. 
+ +### **Language use:** + +Python \- Great +R \- Meh +C++ \- Great +Java \- Ok +JavaScript \- Ok From a8320dbc054709f8b5fb04a02094923527980669 Mon Sep 17 00:00:00 2001 From: Dinne Kopelevich Date: Fri, 17 Jan 2025 07:47:40 -0700 Subject: [PATCH 2/4] Add bulk_extractor doc Signed-off-by: Dinne Kopelevich --- outbound/{GitLeaks vs. Bulk Extractor.md => bulk_extractor.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename outbound/{GitLeaks vs. Bulk Extractor.md => bulk_extractor.md} (100%) diff --git a/outbound/GitLeaks vs. Bulk Extractor.md b/outbound/bulk_extractor.md similarity index 100% rename from outbound/GitLeaks vs. Bulk Extractor.md rename to outbound/bulk_extractor.md From f1f1c16a333058660b41d751ec02fac7141de1a7 Mon Sep 17 00:00:00 2001 From: Dinne Kopelevich Date: Fri, 17 Jan 2025 15:13:37 -0700 Subject: [PATCH 3/4] Add bulk extractor doc Signed-off-by: Dinne Kopelevich --- outbound/bulk_extractor.md | 70 +++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/outbound/bulk_extractor.md b/outbound/bulk_extractor.md index d522d9a..817b2d2 100644 --- a/outbound/bulk_extractor.md +++ b/outbound/bulk_extractor.md @@ -1,6 +1,6 @@ # **Outbound Reviews: Scanning Tools for PII and Secrets** -**Bulk\_extractor** is a forensic tool used to extract useful information from disk images, files and other digital media. It looks for strings of interest, such as email addresses, URLs, credit card numbers and other artifacts. Unlike traditional forensic tools, **bulk\_extractor** does not mount or parse the file system, making it faster and more suitable for unstructured data. It is commonly used for cybersecurity investigations, data recovery and digital forensics. +**Bulk_extractor** is a forensic tool used to extract useful information from disk images, files and other digital media. It looks for strings of interest, such as email addresses, URLs, credit card numbers and other artifacts. 
Unlike traditional forensic tools, **bulk_extractor** does not mount or parse the file system, making it faster and more suitable for unstructured data. It is commonly used for cybersecurity investigations, data recovery and digital forensics. **Gitleaks** is an open-source tool designed to detect and prevent secrets (like API keys, tokens and passwords) from being committed to Git repositories. It scans repositories, including their entire history, for hardcoded secrets and generates reports to help developers address potential security vulnerabilities. **Gitleaks** is specifically tailored to source code repositories, making it highly effective in development environments. @@ -42,19 +42,19 @@ GitLeaks: Ensure you are installing bulk\_extractor in your local drive not within the repository or disk you are checking. -* \`brew install autoconf\` -* \`brew install automake\` -* \`brew install libtool\` -* \`brew install pkg-config\` +`$ brew install autoconf` +`$ brew install automake` +`$ brew install libtool` +`$ brew install pkg-config` -* \`git clone \--recurse-submodules [https://github.com/simsong/bulk\_extractor.git](https://github.com/simsong/bulk_extractor.git)\` +`$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git` -* \`cd bulk\_extractor\` +`$ cd bulk\_extractor` -* \`./bootstrap.sh\` -* \`./configure\` -* \`make\` -* \`make install\` +`$ ./bootstrap.sh` +`$ ./configure` +`$ make` +`$ make install` ### **Bulk-Extractor Installation for Windows** @@ -69,32 +69,32 @@ Ensure you are installing bulk\_extractor in your local drive not within the rep or * Start with a clean Virtual Machine (VM) and use these commands: - * $ git clone \--recurse-submodules [https://github.com/simsong/bulk\_extractor.git](https://github.com/simsong/bulk_extractor.git) - * $ cd bulk\_extractor/etc - * $ bash CONFIGURE\_FEDORA36\_win64.bash - * $ cd .. 
- * $ make win64 + `$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git` + `$ cd bulk_extractor/etc` + `$ bash CONFIGURE_FEDORA36_win64.bash` + `$ cd ..` + `$ make win64` * Open Command Prompt and test the installation: - * \`bulk\_extractor.exe \-h\` +`bulk_extractor.exe -h` -Note: Currently bulk\_extractor 2.1 does not build on windows, but bulk\_extractor 2.0 does. +**Note:** Currently bulk\_extractor 2.1 does not build on windows, but bulk\_extractor 2.0 does. ### **Bulk-Extractor Installation for Linux** * Update your package manager: - * \`sudo apt-get update\` + `$ sudo apt-get update` * Install required dependencies - * \`sudo apt-get install \-y build-essential cmake\` + `$ sudo apt-get install -y build-essential cmake` * Clone the Bulk Extractor repository and build the source code: - * \`git clone [https://github.com/simsong/bulk\_extractor.git](https://github.com/simsong/bulk_extractor.git)\` - * \`cd bulk\_extractor\` - * \`mkdir build && cd build\` - * \`cmake ..\` - * \`make\` - * \`sudo make install\` + `$ git clone https://github.com/simsong/bulk_extractor.git` + `$ cd bulk_extractor` + `$ mkdir build && cd build` + `$ cmake ..` + `$ make` + `$ sudo make install` * Verify installation: - * \`bulk\_extractor \-h\` + `$ bulk_extractor -h` ### **How to use:** @@ -102,13 +102,13 @@ Note: Currently bulk\_extractor 2.1 does not build on windows, but bulk\_extract * Alternatively, if the repository you are checking has a directory already available for storing test results, you can use that. You can run the basic: -\`bulk\_extractor \-o \ \\` +`$ bulk_extractor -o ` You can also extract only certain files: Example: -\`bulk\_extractor \-o \ \-E email,input \\` +`$ bulk_extractor -o -E email,input ` -Outputs are saved as \`.txt\` files. +Outputs are saved as `.txt` files. --- @@ -120,8 +120,8 @@ Outputs are saved as \`.txt\` files. 
### **Language use:** -Python \- Great -R \- Meh -C++ \- Great -Java \- Ok -JavaScript \- Ok +Python - Great +R - Meh +C++ - Great +Java - Ok +JavaScript - Ok From aa730266f53cd3a6b5e064ab95d255ae07fcbb93 Mon Sep 17 00:00:00 2001 From: Dinne Kopelevich Date: Fri, 17 Jan 2025 15:16:55 -0700 Subject: [PATCH 4/4] Change code blocks Signed-off-by: Dinne Kopelevich --- outbound/bulk_extractor.md | 117 ++++++++++++++++++++++--------------- 1 file changed, 70 insertions(+), 47 deletions(-) diff --git a/outbound/bulk_extractor.md b/outbound/bulk_extractor.md index 817b2d2..2d99f1a 100644 --- a/outbound/bulk_extractor.md +++ b/outbound/bulk_extractor.md @@ -1,10 +1,10 @@ # **Outbound Reviews: Scanning Tools for PII and Secrets** -**Bulk_extractor** is a forensic tool used to extract useful information from disk images, files and other digital media. It looks for strings of interest, such as email addresses, URLs, credit card numbers and other artifacts. Unlike traditional forensic tools, **bulk_extractor** does not mount or parse the file system, making it faster and more suitable for unstructured data. It is commonly used for cybersecurity investigations, data recovery and digital forensics. +**[Bulk_extractor](https://github.com/simsong/bulk_extractor)** is a forensic tool used to extract useful information from disk images, files and other digital media. It looks for strings of interest, such as email addresses, URLs, credit card numbers and other artifacts. Unlike traditional forensic tools, **bulk_extractor** does not mount or parse the file system, making it faster and more suitable for unstructured data. It is commonly used for cybersecurity investigations, data recovery and digital forensics. -**Gitleaks** is an open-source tool designed to detect and prevent secrets (like API keys, tokens and passwords) from being committed to Git repositories. 
It scans repositories, including their entire history, for hardcoded secrets and generates reports to help developers address potential security vulnerabilities. **Gitleaks** is specifically tailored to source code repositories, making it highly effective in development environments. +**[Gitleaks](https://github.com/gitleaks/gitleaks)** is an open-source tool designed to detect and prevent secrets (like API keys, tokens and passwords) from being committed to Git repositories. It scans repositories, including their entire history, for hardcoded secrets and generates reports to help developers address potential security vulnerabilities. **Gitleaks** is specifically tailored to source code repositories, making it highly effective in development environments. -### **Uses:** +### **Uses** | Checks for | GitLeaks | Bulk-Extractor | | :---- | :---- | :---- | @@ -24,7 +24,7 @@ | Tokens | ✅ | | | Secrets | ✅ | ✅ | -### **Key Differences:** +### **Key Differences** Bulk-extractor: @@ -38,90 +38,113 @@ GitLeaks: * Source code and secrets * Used for repos -### **Bulk-Extractor Installation for Mac:** +### **Bulk-Extractor Installation for Mac** Ensure you are installing bulk\_extractor in your local drive not within the repository or disk you are checking. -`$ brew install autoconf` -`$ brew install automake` -`$ brew install libtool` -`$ brew install pkg-config` +``` + $ brew install autoconf + $ brew install automake + $ brew install libtool + $ brew install pkg-config -`$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git` + $ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git -`$ cd bulk\_extractor` + $ cd bulk_extractor -`$ ./bootstrap.sh` -`$ ./configure` -`$ make` -`$ make install` + $ ./bootstrap.sh + $ ./configure + $ make + $ make install + ``` ### **Bulk-Extractor Installation for Windows** +**Note:** Currently bulk_extractor 2.1 does not build on Windows, but bulk_extractor 2.0 does. 
* Download the Bulk Extractor Windows binary from the [official website](https://github.com/simsong/bulk_extractor). * Extract the downloaded zip file to your preferred directory. * Add the Bulk Extractor binary folder to your system PATH: - * Press Win \+ R, type sysdm.cpl, and hit Enter. + * Press Win + R, type sysdm.cpl, and hit Enter. * Navigate to the "Advanced" tab and click on "Environment Variables." * Under "System Variables," locate and edit the Path variable. * Add the path to the Bulk Extractor folder and click OK. - or + Or * Start with a clean Virtual Machine (VM) and use these commands: - `$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git` - `$ cd bulk_extractor/etc` - `$ bash CONFIGURE_FEDORA36_win64.bash` - `$ cd ..` - `$ make win64` +``` + $ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git + $ cd bulk_extractor/etc + $ bash CONFIGURE_FEDORA36_win64.bash + $ cd .. + $ make win64 + ``` * Open Command Prompt and test the installation: -`bulk_extractor.exe -h` +``` +bulk_extractor.exe -h +``` -**Note:** Currently bulk\_extractor 2.1 does not build on windows, but bulk\_extractor 2.0 does. ### **Bulk-Extractor Installation for Linux** -* Update your package manager: - `$ sudo apt-get update` +* Update your package manager: +``` + $ sudo apt-get update +``` * Install required dependencies - `$ sudo apt-get install -y build-essential cmake` -* Clone the Bulk Extractor repository and build the source code: - `$ git clone https://github.com/simsong/bulk_extractor.git` - `$ cd bulk_extractor` - `$ mkdir build && cd build` - `$ cmake ..` - `$ make` - `$ sudo make install` -* Verify installation: - `$ bulk_extractor -h` +``` + $ sudo apt-get install -y build-essential cmake +``` -### **How to use:** +* Clone the Bulk Extractor repository and build the source code: +``` + $ git clone https://github.com/simsong/bulk_extractor.git + $ cd bulk_extractor + $ mkdir build && cd build + $ cmake .. 
+ $ make + $ sudo make install +``` +* Verify installation: +``` + $ bulk_extractor -h +``` + +### **How to use** * Create the directory you would like to save the output into. This can be in the repository being checked but the location is up to you. * Alternatively, if the repository you are checking has a directory already available for storing test results, you can use that. You can run the basic: -`$ bulk_extractor -o ` +``` +$ bulk_extractor -o +``` You can also extract only certain files: -Example: -`$ bulk_extractor -o -E email,input ` +Example: +``` +$ bulk_extractor -o -E email,input +``` Outputs are saved as `.txt` files. --- -### **When Not to Use Bulk Extractor:** +### **When Not to Use Bulk Extractor** * **Highly Structured File Systems**: When specific file system metadata needs to be preserved. * **Real-Time Analysis**: It’s not designed for active/live systems. * **Deep Contextual Analysis**: Bulk Extractor focuses on raw data extraction and lacks contextual file relationships. -### **Language use:** +### **Language use** +| Language | Suitability | Notes | +| :------- | :---------- | :---- | +| Python | ✅ Great | Great for libraries and easy to use. | +| Java | ✅ Great | Well suited. | +| C++ | ✅ Great | High performance and control over resources. | +| JavaScript | ⚠️ Moderate | Good for web-based extraction, but less efficient for bulk extraction. | +| Ruby | ⚠️ Moderate | Can handle small- to medium-scale tasks. | +| R | ⚠️ Moderate | Excellent for data analysis, but less efficient for bulk extraction. | +| Bash | ❌ Poor | Limited for large-scale processing. | -Python - Great -R - Meh -C++ - Great -Java - Ok -JavaScript - Ok