Add pagereactor target
Jakob Bauer committed Sep 19, 2014
1 parent 858132d commit a5b347e
Showing 2 changed files with 32 additions and 6 deletions.
15 changes: 13 additions & 2 deletions Makefile
@@ -89,6 +89,7 @@ unsupervised2 := $(outdir)/unsupervised2.txt
semi_supervised0 := $(outdir)/semi_supervised0.txt
semi_supervised1 := $(outdir)/semi_supervised1.txt
semi_supervised2 := $(outdir)/semi_supervised2.txt
+pagereactor0 := $(outdir)/pagereactor0.txt

# ------------------------------------------------------------------------------

@@ -101,9 +102,9 @@ EM_MAIN := "try, All_BIC_ExplEM_Main; catch, end, exit"

# ==============================================================================

-# perform baseline and exploratory clustering
+# perform baseline, exploratory and pagereactor clustering
.DELETE_ON_ERROR:
-all: baseline explore
+all: baseline explore pagereactor

# ==============================================================================

@@ -346,6 +347,16 @@ $(semi_supervised2): $(rid_fid_weight) $(rid_lid_score) $(qid_rid) \
	$(PYTHON) $(srcdir)/exploratory.py $(FORMATTING_FLAGS) \
		$(assgn_suffix) $(qid_rid) $(qid_eid) > $@

+# ------------------------------------------------------------------------------
+
+# pagereactor clustering
+.PHONY: pagereactor
+pagereactor: $(pagereactor0)
+
+# pagereactor output grouped by string
+$(pagereactor0): $(qid_tacid) | $(outdir)
+	cp $(qid_tacid) $@
+
# ==============================================================================

# create virtualenv
23 changes: 19 additions & 4 deletions README
@@ -190,15 +190,30 @@ following exploreEM versions:
Detailed information regarding the clustering algorithm and its parameters can
be found in the ExploreEM documentation.

-- "make" or "make all" is shorthand for calling "make raw", "make baseline" and
-"make explore".
+- "make pagereactor" performs clustering based on the pagereactor output. At the
+moment, there is only one version:
+-- pagereactor0: String only. The assignment is performed as a two-step
+process. In a first step, pagereactor entities whose
+tacid is NULL and whose generic type is not NULL and not
+OTHER are grouped by their wp14 name and assigned a nid.
+In a second step, those pagereactor entities that have
+either an eid or a nid are matched with their qid. The
+matching is performed based on string identity and
+appearance, e.g. if there are three tac queries with the
+string "abraham_lincoln" and five pagereactor queries
+with the string "abraham_lincoln" then the first three
+of those pagereactor queries will be matched with the
+qids from the three tac queries.
+
+- "make" or "make all" is shorthand for calling "make raw", "make baseline",
+"make explore", and "make pagereactor".

Cleaning up
-----------

- "make clean" removes all the input generated by "make raw" (i.e., the data
-directory) and the output generated by "make baseline" and "make explore" (i.e.,
-the output directory).
+directory) and the output generated by "make baseline", "make explore", and
+"make pagereactor" (i.e., the output directory).

- "make cleandist" removes the same targets as "make clean" and, in addition
to that, also removes the virtual environment. Note that in general "make clean"

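The two-step assignment that the README hunk describes for pagereactor0 is not part of this commit (the Makefile rule only copies $(qid_tacid)). A minimal sketch of how the description could be read is given below. It is an illustration under assumptions, not the repository's code: the record fields ("string", "tacid", "generic_type", "wp14_name"), the nid format, and the function names are hypothetical, the README's "eid" is assumed to be a non-NULL tacid, and entities that have neither identifier are assumed to be skipped without using up a query.

```python
from collections import defaultdict, deque


def assign_nids(entities):
    """Step 1: entities without a tacid share one nid per wp14 name."""
    nid_by_name = {}
    for ent in entities:
        # Only entities with tacid NULL and a generic type other than
        # NULL/OTHER receive a nid.
        if ent["tacid"] is None and ent["generic_type"] not in (None, "OTHER"):
            name = ent["wp14_name"]
            if name not in nid_by_name:
                # Hypothetical nid format; the README does not specify one.
                nid_by_name[name] = "nid{}".format(len(nid_by_name))
            ent["nid"] = nid_by_name[name]
    return entities


def match_queries(tac_queries, entities):
    """Step 2: pair qids with identified entities by string, in order."""
    # Queue the qids of each query string in order of appearance.
    pending = defaultdict(deque)
    for qid, string in tac_queries:
        pending[string].append(qid)

    matches = []
    for ent in entities:
        # Assumption: "eid" in the README means a non-NULL tacid.
        label = ent["tacid"] or ent.get("nid")
        if label is None:
            continue  # assumption: unidentified entities are skipped
        queue = pending.get(ent["string"])
        if queue:
            # The first unmatched tac query with the same string gets
            # this entity; surplus entities are left unmatched.
            matches.append((queue.popleft(), label))
    return matches
```

With three tac queries and five pagereactor entities that all carry the string "abraham_lincoln", match_queries pairs the three qids with the first three identified entities and leaves the remaining two unmatched, which is the behaviour the README example describes.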