Adding files
mshamsrainey committed Jul 27, 2021
1 parent b82e2a4 commit db042fe
Showing 4,035 changed files with 10,968,784 additions and 0 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
74 changes: 74 additions & 0 deletions create_executable_instructions.txt
@@ -0,0 +1,74 @@
Directions to Create Description Audit Executable:

------------------------

# FAQ:

## Who is this guide for?

This guide is intended for interested parties with at least a working knowledge of Python programming,
command line interfaces, and Git. Users interested in reproducing the executable program should
first check whether there is already an up-to-date executable for the operating system on their computer.

## Why do operating systems matter?

Python executables built with PyInstaller can only be packaged on the operating system that will be used to run them.
Machines running a Linux operating system, for example, can create an executable that will run on other
Linux machines, but there is no easily accessible way to create an executable for other operating systems
from that one machine. Especially with intellectual property laws pertaining to the use of Mac OS X by
non-Apple users, the best workaround that we've found is to provide directions for prospective users/interested
parties to compile the application as an executable for their respective operating systems.

If you create an executable for your operating system and the project doesn't have it available yet,
we'd love it if you submitted the executable back to us with a merge request for others to use! This
project is approaching a natural closing point at the Rubenstein Library, but updates, additions, etc.
are welcomed from other interested parties.

## Why is this project distributed as an executable?

This project is primarily intended to support library archivists and other members of the library community
focused on anti-racism and social justice in archival work. Although library archivists have a number of
niche skills which are very valuable on this project, a complex knowledge of computer science isn't exactly
in their job descriptions! We have chosen to create an executable with a graphical user interface (GUI) for
ease of use by the people most likely to be engaging with this work. An executable created with PyInstaller
allows users to click on the executable and run the project in minutes, without having to independently download
Python or any other dependencies used in the project. All of the code is 'under the hood' to present an easy,
accessible interface to end users with limited computer science background.

This project can still be run from the command line or from a Python IDE. Please see our README for installation instructions
for the full project, command line argument descriptions, etc.

------------------------

**If you have already forked the project onto your local machine and installed dependencies, skip to step 5**

1. Fork and clone project to local machine from Git.
2. Create a virtual environment to house dependencies for this project. I typically use venv on the command line within
the project directory for this, as seen below, but feel free to use your favorite virtual environment:
on Linux or macOS: python3 -m venv env
on Windows: py -m venv env
(venv ships with the standard library in Python 3.3 and later. If it is unavailable on your system, the separate
virtualenv package can be installed via pip on Linux or macOS and on Windows using
python3 -m pip install --user virtualenv OR py -m pip install --user virtualenv, respectively)
3. Activate virtual environment. As long as this virtual environment is activated, pip will install packages into
this specific virtual environment, preventing any possible interaction between dependencies for this project and others
on your device. Deactivate this virtual environment by typing 'deactivate' in your command line, and activate it as follows:
on Linux or macOS: source env/bin/activate
on Windows: .\env\Scripts\activate
4. Install project dependencies from requirements.txt. This will install all libraries and packages mentioned in this file,
therefore establishing needed dependencies to run project normally:
pip install -r requirements.txt
5. Install PyInstaller to create project executable:
pip install PyInstaller
6. This guide runs PyInstaller in two steps, first building the executable as a directory (one-folder) bundle. When doing so, pass
--additional-hooks-dir so that PyInstaller picks up the manually created hook-spacy.py, which collects spaCy's hidden dependencies:
pyinstaller CLI.py --additional-hooks-dir=.
7. PyInstaller output is very verbose, and it can take a few minutes to run depending on your machine, but once this has completed, repeat
using the --onefile flag to create a single executable for distribution:
pyinstaller CLI.py --additional-hooks-dir=. --onefile
8. The up-to-date executable for your operating system should be in the 'dist' folder in your local project directory. Depending on your
machine/operating system, the file extension may vary, but on Windows there is a single executable file CLI.exe that will run the entire project.
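
Steps 6 to 8 can be scripted. The following is a minimal sketch, not part of the project: the function names
build_pyinstaller_command and expected_executable_name are hypothetical, and the assumption that non-Windows builds
carry no file extension is ours rather than something this guide states. Only the two flags shown in the steps above are used.

```python
import sys


def build_pyinstaller_command(script, hooks_dir=".", onefile=False):
    """Return the argument list for one PyInstaller run (hypothetical helper)."""
    command = ["pyinstaller", script, f"--additional-hooks-dir={hooks_dir}"]
    if onefile:
        # Second pass from step 7: bundle everything into a single file.
        command.append("--onefile")
    return command


def expected_executable_name(script, platform=sys.platform):
    """Guess the file name to look for in 'dist'.

    The '.exe' suffix on Windows comes from step 8; the bare name on other
    platforms is an assumption.
    """
    stem = script.rsplit(".", 1)[0]
    return stem + ".exe" if platform.startswith("win") else stem
```

Passing each returned list to subprocess.run(command, check=True), first with onefile=False and then with onefile=True,
would reproduce the two runs described above.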

------------------------

Good luck! Raise any issues on Git with questions.

69 changes: 69 additions & 0 deletions description_audit.py
@@ -0,0 +1,69 @@
import argparse
import os
import sys
from scripts.description_audit_driver import main
from scripts.description_audit_GUI import main as run_gui

guiparser = argparse.ArgumentParser()
guiparser.add_argument('--nogui', default=False, action="store_true")

if __name__ == '__main__':

    preargs = guiparser.parse_known_args()

    if not preargs[0].nogui:
        # launch GUI
        args_from_gui = run_gui()
        lexicon_csv_path = args_from_gui[0]
        lexicon_test = args_from_gui[1]
        hatebase_include = args_from_gui[2]
        output_path = args_from_gui[3]
        ead_path = args_from_gui[4]
        marcxml_path = args_from_gui[5]

    else:
        noguiparser = guiparser
        noguiparser.add_argument('lexicon_csv_path', type=str, help="Path to CSV file containing lexicons")
        noguiparser.add_argument('lexicon_test', type=str, help="Headers to CSV indicating lexicons to match to. "
                                                                "To use multiple, separate by underscores. To use all, "
                                                                "type 'ALL'.")
        # If you have any particularly lengthy or false positive-prone lexicons that you want to only include if
        # they are explicitly declared, modify references to the below variable in parse_lexicon() driver function.
        noguiparser.add_argument('hatebase_include', type=int, help="Boolean True or False indicating "
                                                                    "whether the lengthy HateBase "
                                                                    "lexicons should be included. "
                                                                    "Default is False.")
        noguiparser.add_argument('output_path', type=str, help="Path to folder where CSV reports should be stored")
        noguiparser.add_argument('ead_path', type=str, help="Path to folder comprised of EAD archive files in XML.")
        noguiparser.add_argument('marcxml_path', type=str, help="Path to XML file containing MARCXML archive.")

        args = noguiparser.parse_args()
        print(args)
        lexicon_csv_path = args.lexicon_csv_path
        lexicon_test = args.lexicon_test
        hatebase_include = args.hatebase_include
        output_path = args.output_path
        ead_path = args.ead_path
        marcxml_path = args.marcxml_path

    if not os.path.isfile(lexicon_csv_path):
        print("The lexicon CSV file specified does not exist on this path.")
        sys.exit()

    if not os.path.isdir(output_path):
        print("The output path given is not a file directory.")
        sys.exit()

    if marcxml_path == ead_path:
        print("Path to at least one archival structure must be specified")
        sys.exit()

    if not (os.path.isdir(ead_path) or (ead_path == "NONE")):
        print("The EAD path given does not lead to a directory of archival information.")
        sys.exit()

    if not (os.path.isfile(marcxml_path) or (marcxml_path == "NONE")):
        print("The MARCXML archival structure does not exist on this path.")
        sys.exit()

    main(lexicon_csv_path, lexicon_test, hatebase_include, output_path, ead_path, marcxml_path)
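
The script above parses its arguments in two stages: parse_known_args() checks --nogui first, and the positional
arguments are only declared when the GUI is skipped. The following is a minimal, standalone sketch of that pattern.
The names parse_cli and str_to_bool are hypothetical and not part of this commit; in particular, the script itself
declares hatebase_include with type=int, and the boolean converter here is only one possible way to accept
"True"/"False" as the help text describes.

```python
import argparse


def str_to_bool(value):
    """Hypothetical helper: map common true/false spellings to a bool."""
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "1"):
        return True
    if lowered in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def parse_cli(argv):
    """Two-stage parse: check --nogui first, then add positionals if needed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--nogui", default=False, action="store_true")

    # Stage 1: parse_known_args tolerates arguments that are not declared yet.
    known, _unparsed = parser.parse_known_args(argv)
    if not known.nogui:
        return None  # the caller would launch the GUI instead

    # Stage 2: declare the positionals and parse strictly.
    parser.add_argument("lexicon_csv_path", type=str)
    parser.add_argument("lexicon_test", type=str)
    parser.add_argument("hatebase_include", type=str_to_bool)
    parser.add_argument("output_path", type=str)
    parser.add_argument("ead_path", type=str)
    parser.add_argument("marcxml_path", type=str)
    return parser.parse_args(argv)
```

Calling parse_cli(["--nogui", "lex.csv", "ALL", "False", "out", "ead", "NONE"]) returns a namespace whose
hatebase_include is the bool False, while a call without --nogui returns None.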
34 changes: 34 additions & 0 deletions description_audit.spec
@@ -0,0 +1,34 @@
# -*- mode: python ; coding: utf-8 -*-


block_cipher = None


a = Analysis(['description_audit.py'],
pathex=['C:\\Users\\msham\\OneDrive\\Documents\\description-audit'],
binaries=[],
datas=[],
hiddenimports=[],
hookspath=['.'],
runtime_hooks=[],
excludes=[],
win_no_prefer_redirects=False,
win_private_assemblies=False,
cipher=block_cipher,
noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
cipher=block_cipher)
exe = EXE(pyz,
a.scripts,
a.binaries,
a.zipfiles,
a.datas,
[],
name='description_audit',
debug=False,
bootloader_ignore_signals=False,
strip=False,
upx=True,
upx_exclude=[],
runtime_tmpdir=None,
console=True )
3 changes: 3 additions & 0 deletions dist/description_audit.exe
Git LFS file not shown
42 changes: 42 additions & 0 deletions hook-spacy.py
@@ -0,0 +1,42 @@
from PyInstaller.utils.hooks import collect_all

# ----------------------------- SPACY -----------------------------
data = collect_all('spacy')

datas = data[0]
binaries = data[1]
hiddenimports = data[2]

# ----------------------------- THINC -----------------------------
data = collect_all('thinc')

datas += data[0]
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- CYMEM -----------------------------
data = collect_all('cymem')

datas += data[0]
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- PRESHED -----------------------------
data = collect_all('preshed')

datas += data[0]
binaries += data[1]
hiddenimports += data[2]

# ----------------------------- BLIS -----------------------------

data = collect_all('blis')

datas += data[0]
binaries += data[1]
hiddenimports += data[2]
# This hook file is a bit of a hack - really, all of the libraries should be in separate

# ----------------------------- OTHER -----------------------------

hiddenimports += ['bs4', 'pandas', 'srsly.msgpack.util']
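
The hook above repeats the same three lines for each package. The following is a minimal sketch of the same
accumulation written as a loop. The function name merge_collected is hypothetical, and the collector is passed
in as a parameter so the sketch runs without PyInstaller installed; in the hook itself, the collector would be
collect_all from PyInstaller.utils.hooks, which returns a (datas, binaries, hiddenimports) tuple.

```python
def merge_collected(packages, collect):
    """Accumulate (datas, binaries, hiddenimports) across several packages.

    `collect` is any callable that takes a package name and returns a
    3-tuple of lists, as collect_all does in the hook above.
    """
    datas, binaries, hiddenimports = [], [], []
    for package in packages:
        pkg_datas, pkg_binaries, pkg_hiddenimports = collect(package)
        datas += pkg_datas
        binaries += pkg_binaries
        hiddenimports += pkg_hiddenimports
    return datas, binaries, hiddenimports
```

In hook-spacy.py this could be called as merge_collected(['spacy', 'thinc', 'cymem', 'preshed', 'blis'], collect_all),
with the extra hidden imports from the OTHER section appended afterwards.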
54 changes: 54 additions & 0 deletions lexicons/FullLexiconCSV_May12.csv
@@ -0,0 +1,54 @@
Aggrandizement,RaceEuphemisms,RaceTerms,SlaveryTerms,GenderTerms
acclaimed,color blind,aboriginal,abolition,miss
ambitious,colored,aboriginals,abolitionist,mistress
celebrated,coloured,aborigines,antislavery,mrs.
distinguished,negro,aliens,anti-slavery,muse
eminent,race relations,arab,bill of sale,spouse
esteemed,race situation,arabs,bills of sale,wife
expert,race-based,asians,enslaved,
father of,racial,asiatic,freed slave,
foremost,racism,blacks,freed slaves,
founding father,riot,bushman,freedman,
genius,troubles,bushmen,freedmen,
gentleman,unruly,bushwoman,manumission,
important,,chink,manumitted,
influential,,civilized,negro,
man of letters,,coolie,overseer,
masterpiece,,coolies,plantation,
notable,,creole,planter,
pioneer,,creoles,runaway slave,
plantation owner,,dyke,runaway slaves,
planter,,ethnic,slave,
preeminent,,exotic,slave holder,
prestigious,,fag,slave master,
prolific,,gook,slave owner,
prominent,,gypsies,slave owner,
renowned,,gypsy,slaveholder,
respected,,hispanics,slavery,
revolutionary,,illegal alien,slaves,
seminal,,illegal aliens,,
successful,,illegal immigrant,,
wealthy,,illegal immigrants,,
,,illegals,,
,,indian,,
,,indians,,
,,japs,,
,,mammy,,
,,mulatto,,
,,mulattoes,,
,,mulattos,,
,,native americans,,
,,natives,,
,,negoes,,
,,negro,,
,,negros,,
,,oriental,,
,,primitive people,,
,,primitives,,
,,pygmies,,
,,pygmy,,
,,sambo,,
,,savages,,
,,squaw,,
,,squaws,,
,,uncivilized,,
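
Each column of this CSV is one lexicon, and shorter columns are padded with empty cells. The following is a
minimal sketch of reading such a file column by column, assuming the selection convention from the lexicon_test
help text in description_audit.py (headers joined by underscores, or 'ALL'). The function name load_lexicons and
the two-row SAMPLE are hypothetical; the project's own parse_lexicon() is not shown in this diff and may differ.

```python
import csv
import io


def load_lexicons(csv_text, lexicon_test="ALL"):
    """Return {header: [terms]} for the requested columns, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    wanted = headers if lexicon_test == "ALL" else lexicon_test.split("_")
    lexicons = {header: [] for header in wanted if header in headers}
    for row in reader:
        for header in lexicons:
            term = (row.get(header) or "").strip()
            if term:  # padding cells in shorter columns are empty
                lexicons[header].append(term)
    return lexicons


# Header plus two rows copied from the file above; the second row has an empty last cell.
SAMPLE = (
    "Aggrandizement,RaceEuphemisms,RaceTerms,SlaveryTerms,GenderTerms\n"
    "acclaimed,color blind,aboriginal,abolition,miss\n"
    "expert,race-based,asians,enslaved,\n"
)
```

With this sample, load_lexicons(SAMPLE, "Aggrandizement_GenderTerms") keeps only those two columns and drops the
empty GenderTerms cell from the second row.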