Python-based program for training an agent for text localization using reinforcement learning with ChainerRL

midl19t3/text-localization-agent

 
 

text-localization-agent

The code to train and evaluate the agent.

Prerequisites

You need Python 3 (preferably 3.6) installed, as well as the requirements from requirements.txt:

$ pip install -r requirements.txt 

To evaluate the agent, we require Object-Detection-Metrics. Because it is not available as a pip package, you need to make it available as a module named detection_metrics in the agent directory:

git clone https://github.com/rafaelpadilla/Object-Detection-Metrics
cp -r Object-Detection-Metrics/lib text-localization-agent/detection_metrics

Furthermore, you need to install the text-localization-environment by following its Installation instructions.

Usage

Training an agent requires different files based on the dataset used. For the simple dataset, you need two files:

  1. A text file where each line contains the path to one image in the training dataset
  2. A numpy file (.npy) that contains the bounding boxes associated with each image. For n images this file contains a list with n entries where each entry is a list of bounding boxes in the format ((xtopleft, ytopleft), (xbottomright, ybottomright))

Datasets generated by the dataset generator fulfill these requirements.
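If you want to assemble a small dataset by hand, the two files described above can be sketched as follows. This is a minimal illustration, not the dataset generator's implementation: the function name write_simple_dataset and the file names image_locations.txt and bounding_boxes.npy are hypothetical, and storing the boxes as a NumPy object array is an assumption made so that images may have different numbers of boxes.

```python
import os

import numpy as np


def write_simple_dataset(out_dir, image_paths, bounding_boxes):
    """Write an image list and a matching .npy file of bounding boxes.

    bounding_boxes[i] is the list of boxes for image_paths[i]; each box is
    ((x_topleft, y_topleft), (x_bottomright, y_bottomright)).
    File names below are hypothetical placeholders.
    """
    if len(image_paths) != len(bounding_boxes):
        raise ValueError("need exactly one list of boxes per image")

    list_path = os.path.join(out_dir, "image_locations.txt")
    boxes_path = os.path.join(out_dir, "bounding_boxes.npy")

    # One image path per line.
    with open(list_path, "w") as f:
        f.write("\n".join(image_paths) + "\n")

    # Assumption: an object array keeps per-image lists of varying length.
    boxes = np.empty(len(bounding_boxes), dtype=object)
    for i, image_boxes in enumerate(bounding_boxes):
        boxes[i] = image_boxes
    np.save(boxes_path, boxes)

    return list_path, boxes_path
```

A file written this way has to be read back with np.load(path, allow_pickle=True), because object arrays are stored using pickle.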

You specify the dataset in a config file similar to example.ini using the dataset and dataset_path options. The dataset loader currently supports the values simple, sign, and synthtext.
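To illustrate how such options might look and be checked, here is a small sketch using Python's standard configparser. The option names dataset, dataset_path, experiment_id and use_tensorboard come from this README, but the section name [training], the example values, and the helper read_dataset_options are assumptions for illustration; consult example.ini for the actual layout.

```python
import configparser

SUPPORTED_DATASETS = {"simple", "sign", "synthtext"}

# Hypothetical config contents; the section name and values are assumptions.
EXAMPLE_CONFIG = """
[training]
experiment_id = my_experiment
dataset = simple
dataset_path = ./data/simple
use_tensorboard = True
"""


def read_dataset_options(config_text, section="training"):
    """Parse the dataset-related options and reject unsupported datasets."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    options = parser[section]

    dataset = options["dataset"]
    if dataset not in SUPPORTED_DATASETS:
        raise ValueError(
            "unsupported dataset %r, expected one of %s"
            % (dataset, sorted(SUPPORTED_DATASETS))
        )

    return {
        "dataset": dataset,
        "dataset_path": options["dataset_path"],
        "use_tensorboard": options.getboolean("use_tensorboard", fallback=False),
    }
```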

Overview of Python executables:

| File | Purpose | Example Command |
| --- | --- | --- |
| train_agent.py | Train the agent. This creates a new folder in ./experiments/<experiment_id> using the experiment_id specified in the config. This output directory contains saved models, logfiles (incl. tensorboard logs) and training plots. | python train_agent.py --config ./config/example.ini |
| eval_agent.py | Evaluate the agent. Tests the agent with n images and creates a report under <path-to-experiment>/evaluation containing metrics and visualizations. | python eval_agent.py -e <path-to-experiment> --config ./config/example.ini |
| scripts/render_graph.py | Creates a .dot file of the computational graph of the agent | python ./scripts/visualize_agent_graph.py --config ./config/example.ini |

Monitoring with TensorBoard

If you would like the program to generate log-files appropriate for visualization in TensorBoard, you need to:

  • Install tensorflow

    pip install tensorflow

    (If you use Python 3.7 and the installation fails, use pip install --upgrade https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-1.12.0-py3-none-any.whl instead. See here for the reason.)

  • Run the text-localization-agent program with the --tensorboard flag (or set use_tensorboard = True in the config)

    python train_agent.py --config ./config/example.ini --tensorboard
  • Start TensorBoard pointing to the tensorboard/ directory inside the experiment's directory

    tensorboard --logdir=<path-to-text-localization-agent>/experiments/<experiment>/tensorboard/

    When running tensorboard on the chair's server, you additionally need to pass --bind_all to the command.

    If you want to compare multiple experiments, copy their tensorboard logs into a single folder and point tensorboard to it.

  • Open the TensorBoard UI via the link that is printed when the tensorboard program starts (http://localhost:6006 or http://<hostname>:6006)
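Collecting the logs of several experiments into one folder, as mentioned above, could be sketched like this. The helper collect_tensorboard_logs and the target folder are hypothetical; the sketch only assumes the experiments/<experiment>/tensorboard/ layout described in this README.

```python
import shutil
from pathlib import Path


def collect_tensorboard_logs(experiments_dir, target_dir):
    """Copy each <experiment>/tensorboard folder to <target_dir>/<experiment>.

    Returns the list of destination folders that were written.
    """
    experiments_dir = Path(experiments_dir)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for log_dir in sorted(experiments_dir.glob("*/tensorboard")):
        if not log_dir.is_dir():
            continue
        destination = target_dir / log_dir.parent.name
        # Replace a stale copy so reruns always reflect the current logs.
        if destination.exists():
            shutil.rmtree(str(destination))
        shutil.copytree(str(log_dir), str(destination))
        copied.append(destination)
    return copied
```

Pointing tensorboard --logdir at the target folder then shows one run per experiment subfolder.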

Training on the chair's servers

To run the training on one of the chair's servers you need to:

  • Clone the necessary repositories

  • Create a new virtual environment. The Python version needs to be at least 3.6 for everything to run. The server's default may be older, in which case you must select the correct version explicitly. You can pass the Python version to virtualenv via the -p parameter, for example

    $ virtualenv -p python3.6 <envname>
  • Activate the environment via

    $ source <envname>/bin/activate
  • Install the required packages (see section "Prerequisites"). Don't forget cupy, tb_chainer and tensorflow!

  • Prepare the training data (either generate it using the dataset-generator or transfer existing data to the server)

  • To avoid stopping the training after disconnecting from the server, you might want to use a terminal multiplexer such as tmux or screen

  • Set the CUDA_PATH and LD_LIBRARY_PATH variables if they are not already set. The command should be something like

    $ export CUDA_PATH=/usr/local/cuda
    $ export LD_LIBRARY_PATH=$CUDA_PATH/lib64:$LD_LIBRARY_PATH
  • Download the ResNet-50 caffemodel (it isn't downloaded automatically; see link) and save it where required. If you try to create a TextLocEnv without it, the error message tells you the expected location.

  • Start training!

These instructions assume you are starting from scratch. If, for example, a suitable virtual environment already exists, you don't need to create a new one.
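Before starting a long training run, it can help to verify the requirements listed above. The following is a minimal pre-flight sketch, not part of the repository: the function preflight_problems is hypothetical, and it only checks the Python version and the two CUDA-related environment variables mentioned in this section.

```python
import os
import sys


def preflight_problems(environ=None, version_info=None):
    """Return a list of human-readable problems; an empty list means OK."""
    environ = os.environ if environ is None else environ
    version_info = sys.version_info if version_info is None else version_info

    problems = []
    if tuple(version_info[:2]) < (3, 6):
        problems.append("Python 3.6 or newer is required")

    cuda_path = environ.get("CUDA_PATH")
    if not cuda_path:
        problems.append("CUDA_PATH is not set")
    else:
        # LD_LIBRARY_PATH should contain $CUDA_PATH/lib64.
        lib_dir = os.path.join(cuda_path, "lib64")
        entries = environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
        if lib_dir not in entries:
            problems.append("LD_LIBRARY_PATH does not contain " + lib_dir)
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print("WARNING:", problem)
```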
