Merge pull request #264 from JohnSnowLabs/162-release-candidate
162 release candidate
saif-ellafi authored Aug 20, 2018
2 parents 4cfae9d + 0a55b3f commit 64c421a
Showing 9 changed files with 92 additions and 48 deletions.
44 changes: 44 additions & 0 deletions CHANGELOG
@@ -1,3 +1,47 @@
========
1.6.2
========
---------------
Overview
---------------
In this release, we focused on reviewing our streaming performance by measuring the number of sentences processed per second through a LightPipeline.
We increased Norvig Spell Checker throughput by more than 300% by disabling DoubleVariants and improving the algorithm's ordering. It is now reported capable of 42K sentences per second.
Symmetric Delete Spell Checker is also more performant, although it has been reported to process 2K sentences per second.
NerCRF has been reported to process about 300 sentences per second, while NerDL is about twice as fast (about 700 sentences per second).
Vivekn Sentiment Analysis was improved and is now capable of processing 100K sentences per second (before, it was below 500).
Finally, SentenceDetector performance improved by 40%, from ~30K rows processed per second to ~40K. However, we have now enabled Abbreviation processing by default, which reduces the final speed to 22K rows per second: a net slowdown, in exchange for better accuracy.
Again, thanks to the community for helping with feedback. We welcome everyone asking questions or giving feedback in our Slack channel or reporting issues on GitHub.
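
The "sentences per second" figures above can be reproduced in spirit with a small timing harness. The following is only a minimal sketch in plain Python: `toy_annotate` is a hypothetical stand-in for whatever annotation call is being measured (it is not a Spark-NLP API), and `sentences_per_second` and `relative_change` are illustrative helper names, not part of the library.

```python
import time
from typing import Callable, Iterable, List


def sentences_per_second(annotate: Callable[[str], object],
                         sentences: Iterable[str]) -> float:
    """Return how many sentences `annotate` processes per second."""
    batch: List[str] = list(sentences)
    start = time.perf_counter()
    for sentence in batch:
        annotate(sentence)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast runs.
    return len(batch) / elapsed if elapsed > 0 else float("inf")


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Hypothetical stand-in annotator: a plain whitespace tokenizer.
def toy_annotate(text: str) -> list:
    return text.split()


if __name__ == "__main__":
    rate = sentences_per_second(toy_annotate, ["Hello world ."] * 10_000)
    print(f"{rate:,.0f} sentences/second")
    # SentenceDetector figures quoted above: ~30K -> ~22K rows per second.
    print(f"net change: {relative_change(30_000, 22_000):+.0%}")
```

Measuring the same batch before and after a change, and reporting the relative change, keeps comparisons such as the SentenceDetector trade-off explicit.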

---------------
Enhancements
---------------
* OCR now features kernel segmentation, which significantly improves image-based PDF processing
* Vivekn Sentiment Analysis prediction performance improved through better data structures
* Both Norvig and Symmetric Delete spell checkers now have improved performance
* SentenceDetector accuracy improved through better handling of abbreviations. UseAbbreviations is now also turned ON by default
* SentenceDetector performance improved significantly through better preloading of rules

---------------
Bug fixes
---------------
* Fixed NerDL not training correctly (broken since 1.6.0). Pretrained models are not affected
* Fixed NerConverter not properly considering multiple sentences per row (after using SentenceDetector), which caused an unhandled exception in some scenarios
* Tensorflow sessions now all support allow_soft_placement, allowing GPU-based graphs to work with or without a GPU
* Fixed a missing step in the Norvig Spell Checker algorithm that checks for additional variants. May improve accuracy
* Norvig Spell Checker now disables DoubleVariants by default. It was not improving accuracy significantly and was hurting performance badly

---------------
Developer API
---------------
* New FeatureSet allows HashSet params

---------------
Models
---------------
* Vivekn Sentiment Pipeline doesn't have Spell Checker anymore
* Fixed Vivekn Sentiment pretrained model, with improved accuracy


========
1.6.1
========
40 changes: 20 additions & 20 deletions README.md
@@ -14,18 +14,18 @@ Questions? Feedback? Request access sending an email to [email protected]

This library has been uploaded to the spark-packages repository https://spark-packages.org/package/JohnSnowLabs/spark-nlp .

To use the most recent version just add `--packages JohnSnowLabs:spark-nlp:1.6.1` to your spark command
To use the most recent version just add `--packages JohnSnowLabs:spark-nlp:1.6.2` to your spark command

```sh
spark-shell --packages JohnSnowLabs:spark-nlp:1.6.1
spark-shell --packages JohnSnowLabs:spark-nlp:1.6.2
```

```sh
pyspark --packages JohnSnowLabs:spark-nlp:1.6.1
pyspark --packages JohnSnowLabs:spark-nlp:1.6.2
```

```sh
spark-submit --packages JohnSnowLabs:spark-nlp:1.6.1
spark-submit --packages JohnSnowLabs:spark-nlp:1.6.2
```

## Jupyter Notebook
@@ -35,23 +35,23 @@ export SPARK_HOME=/path/to/your/spark/folder
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages JohnSnowLabs:spark-nlp:1.6.1
pyspark --packages JohnSnowLabs:spark-nlp:1.6.2
```

## Apache Zeppelin
This way will work for both Scala and Python
```
export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.6.1"
export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.6.2"
```
Alternatively, add the following Maven Coordinates to the interpreter's library list
```
com.johnsnowlabs.nlp:spark-nlp_2.11:1.6.1
com.johnsnowlabs.nlp:spark-nlp_2.11:1.6.2
```

## Python without explicit Spark installation
If you installed pyspark through pip, you can now install sparknlp through pip
```
pip install --index-url https://test.pypi.org/simple/ spark-nlp==1.6.1
pip install --index-url https://test.pypi.org/simple/ spark-nlp==1.6.2
```
Then you'll have to create a SparkSession manually, for example:
```
@@ -84,11 +84,11 @@ sparknlp {

## Pre-compiled Spark-NLP and Spark-NLP-OCR
You may download fat-jar from here:
[Spark-NLP 1.6.1 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/spark-nlp-assembly-1.6.1.jar)
[Spark-NLP 1.6.2 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/spark-nlp-assembly-1.6.2.jar)
or non-fat from here
[Spark-NLP 1.6.1 PKG JAR](http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.6.1/spark-nlp_2.11-1.6.1.jar)
[Spark-NLP 1.6.2 PKG JAR](http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.6.2/spark-nlp_2.11-1.6.2.jar)
Spark-NLP-OCR Module (Requires native Tesseract 4.x+ for image based OCR. Does not require Spark-NLP to work but highly suggested)
[Spark-NLP-OCR 1.6.1 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/spark-nlp-ocr-assembly-1.6.1.jar)
[Spark-NLP-OCR 1.6.2 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/spark-nlp-ocr-assembly-1.6.2.jar)
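
The non-fat jar link above follows the standard Maven repository layout, so it can be derived from the Maven coordinates. A minimal sketch, assuming that layout; `maven_jar_url` is a hypothetical helper name, and the default base URL is simply the one written in this README:

```python
def maven_jar_url(group_id: str, artifact_id: str, version: str,
                  base: str = "http://repo1.maven.org/maven2") -> str:
    """Build a jar URL from Maven coordinates (standard repository layout)."""
    # Dots in the group id become path separators.
    group_path = group_id.replace(".", "/")
    return f"{base}/{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.jar"


if __name__ == "__main__":
    # Coordinates com.johnsnowlabs.nlp:spark-nlp_2.11:1.6.2 from this README.
    print(maven_jar_url("com.johnsnowlabs.nlp", "spark-nlp_2.11", "1.6.2"))
```

Only the version argument needs to change to point at another release.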

## Maven central

@@ -100,19 +100,19 @@ Our package is deployed to maven central. In order to add this package as a depe
<dependency>
<groupId>com.johnsnowlabs.nlp</groupId>
<artifactId>spark-nlp_2.11</artifactId>
<version>1.6.1</version>
<version>1.6.2</version>
</dependency>
```

#### SBT
```sbtshell
libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.6.1"
libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.6.2"
```

If you are using `scala 2.11`

```sbtshell
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.6.1"
libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.6.2"
```
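
The two SBT forms above resolve to the same artifact when the project's Scala binary version is 2.11: with `%%`, sbt appends the Scala binary version to the artifact name, while `%` uses the name as written. A small Python sketch of that naming rule (the function name is illustrative only and models sbt's default binary cross-versioning, nothing more):

```python
def sbt_artifact_name(name: str, scala_binary_version: str,
                      cross_versioned: bool) -> str:
    """Mimic how sbt names an artifact for `%` (plain) versus `%%` (cross-versioned)."""
    if cross_versioned:
        # `%%` appends "_<scala binary version>" to the artifact name.
        return f"{name}_{scala_binary_version}"
    # `%` uses the artifact name exactly as written.
    return name


if __name__ == "__main__":
    explicit = sbt_artifact_name("spark-nlp_2.11", "2.11", cross_versioned=False)
    cross = sbt_artifact_name("spark-nlp", "2.11", cross_versioned=True)
    print(explicit, cross, explicit == cross)
```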

## Using the jar manually
@@ -133,17 +133,17 @@ The preferred way to use the library when running spark programs is using the `-

If you have trouble using pretrained() models in your environment, here is a list of links to various models (only valid for latest versions).
If a model listed here is from a version older than the current one, it still works with the current version.
### Updated for 1.6.1
### Updated for 1.6.2
### Pipelines
* [Basic Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_basic_en_1.6.1_2_1533856444797.zip)
* [Advanced Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_advanced_en_1.6.1_2_1533856478690.zip)
* [Vivekn Sentiment Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_vivekn_en_1.6.1_2_1533942424443.zip)
* [Advanced Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_advanced_en_1.6.2_2_1534781366259.zip)
* [Vivekn Sentiment Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_vivekn_en_1.6.2_2_1534781342094.zip)

### Models
* [PerceptronModel (POS)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pos_fast_en_1.6.1_2_1533853928168.zip)
* [ViveknSentimentModel (Sentiment)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vivekn_fast_en_1.6.1_2_1533942419063.zip)
* [SymmetricDeleteModel (Spell Checker)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spell_sd_fast_en_1.6.1_2_1533854712643.zip)
* [NorvigSweetingModel (Spell Checker)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spell_fast_en_1.6.1_2_1533854544551.zip)
* [ViveknSentimentModel (Sentiment)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/vivekn_fast_en_1.6.2_2_1534781337758.zip)
* [SymmetricDeleteModel (Spell Checker)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spell_sd_fast_en_1.6.2_2_1534781178138.zip)
* [NorvigSweetingModel (Spell Checker)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spell_fast_en_1.6.2_2_1534781328404.zip)
* [AssertionDLModel (Assertion Status)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/as_fast_dl_en_1.6.1_2_1533855787457.zip)
* [NerCRFModel (NER)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_fast_en_1.6.1_2_1533854463219.zip)
* [LemmatizerModel (Lemmatizer)](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lemma_fast_en_1.6.1_2_1533854538211.zip)
4 changes: 2 additions & 2 deletions build.sbt
@@ -9,7 +9,7 @@ name := "spark-nlp"

organization := "com.johnsnowlabs.nlp"

version := "1.6.1"
version := "1.6.2"

scalaVersion in ThisBuild := scalaVer

@@ -138,7 +138,7 @@ assemblyMergeStrategy in assembly := {
lazy val ocr = (project in file("ocr"))
.settings(
name := "spark-nlp-ocr",
version := "1.6.1",
version := "1.6.2",
libraryDependencies ++= ocrDependencies ++
analyticsDependencies ++
testDependencies,
4 changes: 2 additions & 2 deletions docs/index.html
@@ -78,8 +78,8 @@ <h2 class="title">High Performance NLP with Apache Spark </h2>
</p>
<a class="btn btn-info btn-cta" style="float: center;margin-top: 10px;" href="mailto:[email protected]?subject=SparkNLP%20Slack%20access" target="_blank"> Questions? Join our Slack</a>
<b/><p/><p/>
<p><span class="label label-warning">2018 Aug 9th - Update!</span> 1.6.1 Released! Fixed S3-based clusters support, new CHUNK type annotation and more!
See the changes <a href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/CHANGELOG">HERE</a> and check out the updated documentation below</p>
<p><span class="label label-warning">2018 Aug 20th - Update!</span> 1.6.2 Released! Annotation performance revisited!
See the changes in our changelog <a href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/CHANGELOG">HERE</a> and check out the updated documentation below</p>
</div>
<div id="cards-wrapper" class="cards-wrapper row">
<div class="item item-green col-md-4 col-sm-6 col-xs-6">
18 changes: 9 additions & 9 deletions docs/notebooks.html
@@ -103,7 +103,7 @@ <h4 id="scala-vivekn-notebook" class="section-block"> Vivekn Sentiment Analysis<
Since we are dealing with small amounts of data, we put in practice LightPipelines.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/example/src/TrainViveknSentiment.scala" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/example/src/TrainViveknSentiment.scala" target="_blank"> Take me to notebook!</a>
</p>
</div>
</section>
@@ -135,7 +135,7 @@ <h4 id="vivekn-notebook" class="section-block"> Vivekn Sentiment Analysis</h4>
better Sentiment Analysis accuracy
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/vivekn-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/vivekn-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -157,7 +157,7 @@ <h4 id="sentiment-notebook" class="section-block"> Rule-based Sentiment Analysis
Each of these sentences will be used for giving a score to text
</p>
</p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/dictionary-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/dictionary-sentiment/sentiment.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -177,7 +177,7 @@ <h4 id="crfner-notebook" class="section-block"> CRF Named Entity Recognition</h4
approach to use the same pipeline for tagging external resources.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/crf-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/crf-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -196,7 +196,7 @@ <h4 id="dlner-notebook" class="section-block"> CNN Deep Learning NER</h4>
and it will leverage batch-based distributed calls to native TensorFlow libraries during prediction.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/dl-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/dl-ner/ner.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -211,7 +211,7 @@ <h4 id="text-notebook" class="section-block"> Simple Text Matching</h4>
This annotator is an AnnotatorModel and does not require training.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/text-matcher/extractor.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/text-matcher/extractor.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -226,7 +226,7 @@ <h4 id="assertion-notebook" class="section-block"> Assertion Status with LogReg<
dataset will return the appropriate result.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/logreg-assertion/assertion.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/logreg-assertion/assertion.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -241,7 +241,7 @@ <h4 id="dlassertion-notebook" class="section-block"> Deep Learning Assertion Sta
graphs may be redesigned if needed.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/dl-assertion/assertion.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/dl-assertion/assertion.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
<div>
@@ -260,7 +260,7 @@ <h4 id="downloader-notebook" class="section-block"> Retrieving Pretrained models
Such components may then be injected seamlessly into further pipelines, and so on.
</p>
<p>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.1/python/example/model-downloader/ModelDownloaderExample.ipynb" target="_blank"> Take me to notebook!</a>
<a class="btn btn-warning btn-cta" style="float: center;margin-top: 10px;" href="https://github.com/JohnSnowLabs/spark-nlp/blob/1.6.2/python/example/model-downloader/ModelDownloaderExample.ipynb" target="_blank"> Take me to notebook!</a>
</p>
</div>
</section>