


occrp/clcnn-classifier


clcnn-classifier

Examples of how convolutional neural networks can work with data from Aleph.

To get started, clone the repository, install Miniconda, then import and activate the included environment. A small raw dataset of Kyrgyz company and personal names is included in the data folder. The same dataset, mixed, encoded and split, is in the prepared_data folder.
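The repository does not describe its exact encoding scheme here, so the following is only a minimal sketch of what "mixed, encoded and split" can mean for a character-level model. The alphabet, the reserved indices (0 for padding, 1 for unknown characters), MAX_LEN, the label values and the helpers encode and prepare are all hypothetical; notebooks/data_tokenization.ipynb holds the actual procedure.

```python
import random

# Hypothetical character inventory: Latin and Cyrillic letters (including the
# extra Kyrgyz letters), digits and a few punctuation marks.
# Index 0 is reserved for padding and index 1 for unknown characters.
ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
    "өүң"
    "0123456789 .,-\"'"
)
CHAR_TO_INDEX = {ch: i + 2 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 64  # assumed fixed input length


def encode(name, max_len=MAX_LEN):
    """Turn a name into a fixed-length list of character indices."""
    indices = [CHAR_TO_INDEX.get(ch, 1) for ch in name.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))  # right-pad with zeros


def prepare(companies, people, test_fraction=0.2, seed=42):
    """Label (0 = company, 1 = person), shuffle, encode and split the names."""
    labeled = [(n, 0) for n in companies] + [(n, 1) for n in people]
    random.Random(seed).shuffle(labeled)  # fixed seed for a reproducible mix
    encoded = [(encode(n), label) for n, label in labeled]
    split = int(len(encoded) * (1 - test_fraction))
    return encoded[:split], encoded[split:]


train, test = prepare(["ОсОО Пример", "Example LLC"], ["Асанов Бакыт", "Иванова Анна"])
print(len(train), len(test))  # prints 3 1
```

Padding every name to the same length lets a batch of names be stacked into one rectangular array, which is what a convolutional layer expects as input.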

Pretrained models are:

  • company_person_99.h5 - distinguishes between most European and former Soviet Union personal and company names.
  • company_person_kg.h5 - trained on the included dataset; distinguishes well between Kyrgyz personal and company names.
  • male_female_96.h5 - distinguishes between most European and former Soviet Union male and female full names.
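Each pretrained model is a binary classifier, so its raw outputs have to be mapped back to class names. The sketch below assumes the model returns either one sigmoid score per name or one softmax row per name; the label order, the 0.5 threshold and the helper classify are assumptions, not taken from the notebooks. A real model would typically be loaded with keras.models.load_model("company_person_99.h5") and fed a NumPy array of names encoded exactly as during training; a stub stands in here so the sketch runs without TensorFlow.

```python
def classify(model, encoded_names, labels, threshold=0.5):
    """Turn a binary classifier's scores into (label, score) pairs.

    model: any object with a Keras-style predict() method.
    encoded_names: inputs encoded the same way as the training data.
    labels: (class_0_name, class_1_name); this order is an assumption.
    """
    scores = model.predict(encoded_names)
    results = []
    for row in scores:
        if len(row) == 1:
            # Single sigmoid unit: the score is the probability of class 1.
            score = float(row[0])
            index = 1 if score >= threshold else 0
        else:
            # Softmax over classes: pick the most probable one.
            index = max(range(len(row)), key=lambda i: row[i])
            score = float(row[index])
        results.append((labels[index], score))
    return results


class StubModel:
    """Stand-in for a loaded Keras model with fixed sigmoid outputs."""

    def predict(self, inputs):
        return [[0.93], [0.08]]


print(classify(StubModel(), [[0] * 64, [0] * 64], ("company", "person")))
# prints [('person', 0.93), ('company', 0.08)]
```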

The model architecture and a training example can be found in notebooks/clcnn_classifier_model.ipynb. Examples of the trained model classifying single and multiple unlabeled inputs are in notebooks/predict.ipynb. Examples of data preparation and encoding are in notebooks/mixed_labeled_list_preparation.ipynb and notebooks/data_tokenization.ipynb.
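To illustrate what a character-level CNN computes, here is a pure-Python forward pass through a generic stack: character embedding, a 1D convolution with ReLU, global max pooling and one sigmoid output unit. This is a sketch of the general technique, not the architecture in notebooks/clcnn_classifier_model.ipynb; the layer sizes, the missing convolution biases and the helper names are assumptions, and the weights are random rather than trained. In Keras the same stack would roughly correspond to the Embedding, Conv1D, GlobalMaxPooling1D and Dense layers.

```python
import math
import random


def embed(indices, table):
    """Look up one embedding vector per character index."""
    return [table[i] for i in indices]


def conv1d_relu(seq, filters):
    """'Valid' 1D convolution (cross-correlation, as in Keras Conv1D) plus ReLU.

    seq: L vectors of size D. filters: F kernels, each a width x D matrix.
    Returns L - width + 1 rows with F activations each. Biases are omitted.
    """
    width = len(filters[0])
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        row = []
        for kernel in filters:
            s = sum(
                k * x
                for k_row, x_row in zip(kernel, window)
                for k, x in zip(k_row, x_row)
            )
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Keep each filter's strongest response over all positions."""
    return [max(column) for column in zip(*feature_map)]


def dense_sigmoid(vector, weights, bias):
    """Single output unit with a numerically stable sigmoid."""
    z = sum(w * v for w, v in zip(weights, vector)) + bias
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(indices, params):
    """Score one encoded name: embedding -> conv -> max pool -> sigmoid."""
    seq = embed(indices, params["embedding"])
    features = global_max_pool(conv1d_relu(seq, params["filters"]))
    return dense_sigmoid(features, params["dense_w"], params["dense_b"])


def random_params(vocab_size, dim=8, n_filters=4, width=3, seed=0):
    """Untrained, randomly initialised weights (training is not shown here)."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-0.1, 0.1)

    return {
        "embedding": [[u() for _ in range(dim)] for _ in range(vocab_size)],
        "filters": [
            [[u() for _ in range(dim)] for _ in range(width)]
            for _ in range(n_filters)
        ],
        "dense_w": [u() for _ in range(n_filters)],
        "dense_b": 0.0,
    }


params = random_params(vocab_size=70)
print(forward([2, 3, 4] + [0] * 61, params))  # an untrained probability in (0, 1)
```

Global max pooling is what makes the model position-independent: a filter that responds to, say, a company-form suffix fires wherever that character pattern appears in the name.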

Installation

  1. Install Miniconda
  2. Go to the environments folder
  3. In the terminal, enter: conda env create -f clcnn.yml (you can try clcnn_gpu.yml to run the neural network on your GPU, but it will work only if drivers for a compatible NVIDIA GPU are installed)
  4. Activate the environment with the source activate clcnn command (on Windows: activate clcnn; newer conda versions use conda activate clcnn on all platforms)
  5. Go to the notebooks folder and run the jupyter notebook command; it will open a new browser window
  6. Open the notebook of your choice
