Models hub (#14470)
Co-authored-by: ahmedlone127 <[email protected]>
Co-authored-by: Maziyar Panahi <[email protected]>

* 2024-11-01-distilbart_xsum_12_6_en (#14447)

* Add model 2024-11-01-distilbart_xsum_12_6_en

* Add model 2024-11-03-gpt2_en

* Add model 2024-11-08-hubert_ukrainian_uk

* Add model 2024-11-08-hubert_ukrainian_pipeline_uk

* Add model 2024-11-08-unitku_hubert_japanese_asr_ja

* Add model 2024-11-08-unitku_hubert_japanese_asr_pipeline_ja

* Add model 2024-11-08-hubert_large_japanese_asr_ja

* Add model 2024-11-08-hubert_large_japanese_asr_pipeline_ja

---------

Co-authored-by: ahmedlone127 <[email protected]>

* 2024-11-10-rubert_address_elements_ru (#14452)

* Add model 2024-11-11-sent_bowdpr_wiki_en

* Add model 2024-11-11-cc_uffs_ppc_ft_test_multiqa_pipeline_en

* Add model 2024-11-11-unified_skill_ner_echo_en

* Add model 2024-11-11-mountain_ner_model_en

* Add model 2024-11-11-mountain_ner_model_pipeline_en

* Add model 2024-11-11-msu_wiki_ner_ru

* Add model 2024-11-11-bert_xomlac_ner_pipeline_zh

* Add model 2024-11-11-bert_base_cased_finetuned_ner_pipeline_en

* Add model 2024-11-11-bert_base_cased_finetuned_ner_en

* Add model 2024-11-11-ner_tokenclassification_persian_pipeline_en

* Add model 2024-11-11-persian_text_ner_bert_v1_fa

* Add model 2024-11-11-sent_flang_spanbert_pipeline_en

* Add model 2024-11-11-sent_gww_pipeline_en

* Add model 2024-11-11-software_ner_prod_en

* Add model 2024-11-11-quote_model_bertm_v1_pipeline_en

* Add model 2024-11-11-classify_bluesky_1000_v2_pipeline_en

* Add model 2024-11-11-msu_wiki_ner_pipeline_ru

* Add model 2024-11-11-hardware_ner_prod_en

* Add model 2024-11-11-auto_adver_pipeline_en

* Add model 2024-11-11-bert_finetuned_ner_viktoryes_pipeline_en

* Add model 2024-11-11-bert_finetuned_ner_viktoryes_en

* Add model 2024-11-11-quote_model_bertm_v1_en

* Add model 2024-11-11-software_ner_prod_pipeline_en

* Add model 2024-11-11-sent_tiny_mlm_glue_qnli_en

* Add model 2024-11-11-sent_cocodr_large_pipeline_en

* Add model 2024-11-11-ner_tokenclassification_persian_en

* Add model 2024-11-11-hardware_ner_prod_pipeline_en

* Add model 2024-11-11-embedded_e5_base_50_pipeline_en

* Add model 2024-11-11-bert_finetuned_tmvar_corpus_pipeline_en

* Add model 2024-11-11-e5_base_pipeline_en

* Add model 2024-11-11-e5_large_en

* Add model 2024-11-11-rupunct_small_ru

* Add model 2024-11-11-spanish_medical_ner_pipeline_es

* Add model 2024-11-11-nepal_bhasa_biored_model_pipeline_en

* Add model 2024-11-11-unified_skill_ner_echo_pipeline_en

* Add model 2024-11-11-e5_large_pipeline_en

* Add model 2024-11-11-e5_small_en

* Add model 2024-11-11-cleaned_e5_base_unsupervised_pipeline_en

* Add model 2024-11-11-keybert_bulgarian_pipeline_bg

* Add model 2024-11-11-bert_xomlac_ner_zh

* Add model 2024-11-11-bert_finetuned_tmvar_corpus_en

* Add model 2024-11-11-cleaned_e5_large_unsupervised_en

* Add model 2024-11-11-sent_tiny_mlm_snli_en

* Add model 2024-11-11-embedded_e5_base_50_en

* Add model 2024-11-11-cleaned_e5_base_unsupervised_en

* Add model 2024-11-11-results_pipeline_en

* Add model 2024-11-11-xlm_cebinary_vmo2_large_3_en

* Add model 2024-11-11-xlm_cebinary_vmo2_large_3_pipeline_en

* Add model 2024-11-11-southern_sotho_mpnet_base_normal_en

* Add model 2024-11-11-persian_text_ner_bert_v1_pipeline_fa

* Add model 2024-11-11-results_en

* Add model 2024-11-11-autotrain_nzog3_ca819_pipeline_en

* Add model 2024-11-11-sentence_similarity_finetuned_mpnet_adrta_pipeline_en

* Add model 2024-11-11-sentence_similarity_finetuned_mpnet_adrta_en

* Add model 2024-11-11-sentencetransformer_mpnet_base_on_chemical_dataset_en

* Add model 2024-11-11-keybert_bulgarian_bg

* Add model 2024-11-11-southern_sotho_mpnet_base10_en

* Add model 2024-11-11-sentencetransformer_mpnet_base_on_chemical_dataset_pipeline_en

* Add model 2024-11-11-e5_base_en

* Add model 2024-11-11-southern_sotho_mpnet_base_normal_pipeline_en

* Add model 2024-11-11-finetuned_sentence_similarity_en

* Add model 2024-11-11-nepal_bhasa_biored_model_en

* Add model 2024-11-11-whisper_tiny_amharic_en

* Add model 2024-11-11-cleaned_e5_large_unsupervised_pipeline_en

* Add model 2024-11-11-sent_bert_base_english_french_arabic_cased_pipeline_en

* Add model 2024-11-11-fund_embedder_en

* Add model 2024-11-11-whisper_tiny_v2_2_romanian_pipeline_en

* Add model 2024-11-11-southern_sotho_mpnet_base20_pipeline_en

* Add model 2024-11-11-auto_adver_en

* Add model 2024-11-11-whisper_small_arabic_augmentation_en

* Add model 2024-11-11-linshoufanfork_whisper_small_nan_twi_pinyin_pipeline_en

* Add model 2024-11-11-whisper_small_arabic_augmentation_pipeline_en

* Add model 2024-11-11-whisper_tiny_amharic_pipeline_en

* Add model 2024-11-11-whisper_tiny_arabic_pipeline_ar

* Add model 2024-11-11-linshoufanfork_whisper_small_nan_twi_pinyin_en

* Add model 2024-11-11-checkpoints_almino_pipeline_en

* Add model 2024-11-11-whisper_tiny_v2_2_romanian_en

* Add model 2024-11-11-autotrain_nzog3_ca819_en

* Add model 2024-11-11-whisper_omg_hi

* Add model 2024-11-11-whisper_omg_pipeline_hi

* Add model 2024-11-11-checkpoints_almino_en

* Add model 2024-11-11-whisper_small_western_frisian_dutch_transfer_from_english_fy

* Add model 2024-11-11-whisper_tiny_nob_en

* Add model 2024-11-11-whisper_tiny_nob_pipeline_en

* Add model 2024-11-11-whisper_small_western_frisian_dutch_transfer_from_english_pipeline_fy

* Add model 2024-11-11-whisper_tiny_arabic_ar

* Add model 2024-11-11-e5_small_pipeline_en

* Add model 2024-11-11-whisper_small_english_crossdelenna_en

* Add model 2024-11-11-finetuned_sentence_similarity_pipeline_en

* Add model 2024-11-11-whisper_small_malay_pipeline_my

* Add model 2024-11-11-whisper_small_malay_my

* Add model 2024-11-11-rupunct_small_pipeline_ru

* Add model 2024-11-11-southern_sotho_mpnet_base20_en

* Add model 2024-11-11-whisper_small_english_crossdelenna_pipeline_en

* Add model 2024-11-11-whisper_small_russian_f_ru

* Add model 2024-11-11-whisper_small_yt_en

* Add model 2024-11-11-whisper_small_russian_f_pipeline_ru

* Add model 2024-11-11-whisper_small_yt_pipeline_en

* Add model 2024-11-11-whisper_base_common_voice_arabic11_0_en

* Add model 2024-11-11-southern_sotho_mpnet_base10_pipeline_en

* Add model 2024-11-11-spanish_medical_ner_es

* Add model 2024-11-11-whisper_base_common_voice_arabic11_0_pipeline_en

* Add model 2024-11-11-whisper_base_hungarian_v1_hu

* Add model 2024-11-11-whisper_base_hungarian_v1_pipeline_hu

* Add model 2024-11-11-whisper_finetuned_atcosim_en

* Add model 2024-11-11-whisper_finetuned_atcosim_pipeline_en

* Add model 2024-11-11-whisper_medium_latvian_ver2_lv

* Add model 2024-11-11-whisper_medium_latvian_ver2_pipeline_lv

* Add model 2024-11-11-whisper_small_french_uncased_fr

* Add model 2024-11-11-whisper_small_french_uncased_pipeline_fr

* Add model 2024-11-11-whisper_tiny_chinese_antares28_en

* Add model 2024-11-11-whisper_tiny_chinese_antares28_pipeline_en

* Add model 2024-11-11-malaysian_whisper_tiny_ms

* Add model 2024-11-11-malaysian_whisper_tiny_pipeline_ms

* Add model 2024-11-11-whisper_medium_luluw_en

* Add model 2024-11-11-whisper_small_dutch_en

* Add model 2024-11-11-whisper_small_greek_modern_finetune_el

* Add model 2024-11-11-whisper_small_dutch_pipeline_en

* Add model 2024-11-11-whisper_small_greek_modern_finetune_pipeline_el

* Add model 2024-11-11-deberta_v3_large_lemon_spell_5k_en

* Add model 2024-11-11-deberta_v3_large_lemon_spell_5k_pipeline_en

* Add model 2024-11-11-bert_finetuned_squad_dokyoungkim_en

* Add model 2024-11-11-bert_finetuned_squad_dokyoungkim_pipeline_en

* Add model 2024-11-11-bert_large_uncased_whole_word_masking_finetuned_squad_dev_i_en

* Add model 2024-11-11-bert_large_uncased_whole_word_masking_finetuned_squad_dev_i_pipeline_en

* Add model 2024-11-11-banglabert_qa_en

* Add model 2024-11-11-mi_chatbotv3_en

* Add model 2024-11-11-mi_chatbotv3_pipeline_en

* Add model 2024-11-11-bert_sliding_window_epoch_3_en

* Add model 2024-11-11-hebert_finetuned_precedents_he

* Add model 2024-11-11-bert_sliding_window_epoch_3_pipeline_en

* Add model 2024-11-11-bert_base_uncased_finetuned_triviaqa_en

* Add model 2024-11-11-mbert_finetuned_mlqa_dev_spanish_chinese_hindi_en

* Add model 2024-11-11-bert_base_uncased_figurative_language_en

* Add model 2024-11-11-bert_base_uncased_finetuned_triviaqa_pipeline_en

* Add model 2024-11-11-bert_finetuned_squad_accelerate_3_en

* Add model 2024-11-11-banglabert_qa_pipeline_en

* Add model 2024-11-11-bert_base_uncased_figurative_language_pipeline_en

* Add model 2024-11-11-mbert_finetuned_mlqa_dev_spanish_chinese_hindi_pipeline_en

* Add model 2024-11-11-hebert_finetuned_precedents_pipeline_he

* Add model 2024-11-11-bert_finetuned_squad_accelerate_3_pipeline_en

* Add model 2024-11-11-beto_sentiment_analysis_finetuned_en

* Add model 2024-11-11-beto_sentiment_analysis_finetuned_pipeline_en

* Add model 2024-11-11-personalinfoclassifier_en

* Add model 2024-11-11-fine_tuned_metaphor_detection_en

* Add model 2024-11-11-personalinfoclassifier_pipeline_en

* Add model 2024-11-11-hs_arabic_translate_syn_4class_for_tool_en

* Add model 2024-11-11-fine_tuned_metaphor_detection_pipeline_en

* Add model 2024-11-11-clinical_trial_termination_en

* Add model 2024-11-11-factuality_model_pipeline_en

* Add model 2024-11-11-factuality_model_en

* Add model 2024-11-11-bert_classifier_spanish_news_classification_headlines_pipeline_es

* Add model 2024-11-11-kaggle_detect_generated_text_pipeline_en

* Add model 2024-11-11-bert_base_uncased_sba_clf_pipeline_en

* Add model 2024-11-11-e5_small_lora_ai_generated_detector_en

* Add model 2024-11-11-bert_340m_ft_first_1000_pref_en

* Add model 2024-11-11-kaggle_detect_generated_text_en

* Add model 2024-11-11-bert_news_class_en

* Add model 2024-11-11-politeness_model_pipeline_en

* Add model 2024-11-11-politeness_model_en

* Add model 2024-11-11-biomednlp_pubmedbert_base_uncased_abstract_fulltext_finetuned_pubmedqa_pipeline_en

* Add model 2024-11-11-scenario_nepal_bhasa_pipeline_en

* Add model 2024-11-11-bio_clinicalbert_medical_en

* Add model 2024-11-11-bert_classifier_spanish_news_classification_headlines_es

* Add model 2024-11-11-bert_base_cased_mnli_en

* Add model 2024-11-11-bert_large_finetuned_phishing_junginkim_en

* Add model 2024-11-11-popbert_pipeline_de

* Add model 2024-11-11-aspect_based_sentiment_analyzer_using_bert_en

* Add model 2024-11-11-bert_base_cased_mnli_pipeline_en

* Add model 2024-11-11-workprocess_24_10_01_en

* Add model 2024-11-11-bert_model_news_aggregator_pipeline_en

* Add model 2024-11-11-bert_base_uncased_emotion_prikshit7766_en

* Add model 2024-11-11-clinical_trial_termination_pipeline_en

* Add model 2024-11-11-nasa_smd_ibm_v0_1_uat_labeler_en

* Add model 2024-11-11-hs_arabic_translate_syn_4class_for_tool_pipeline_en

* Add model 2024-11-11-flash_italian_ns_classifier_fpt_en

* Add model 2024-11-11-bert_large_finetuned_phishing_junginkim_pipeline_en

* Add model 2024-11-11-e5_small_lora_ai_generated_detector_pipeline_en

* Add model 2024-11-11-biomednlp_pubmedbert_base_uncased_abstract_fulltext_finetuned_pubmedqa_en

* Add model 2024-11-11-climateattention_ctw_pipeline_en

* Add model 2024-11-11-climateattention_ctw_en

* Add model 2024-11-11-bio_clinicalbert_medical_pipeline_en

* Add model 2024-11-11-bert_340m_ft_first_1000_pref_pipeline_en

* Add model 2024-11-11-sst2_benign_bert_uncased_pipeline_en

* Add model 2024-11-11-roberta_base_finetuned_ner_cadec_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_7_en

* Add model 2024-11-11-roberta_base_ainu_sayula_popoluca_en

* Add model 2024-11-11-roberta_large_lemon_spell_5k_pipeline_en

* Add model 2024-11-11-roberta_test_training_pipeline_en

* Add model 2024-11-11-roberta_test_training_en

* Add model 2024-11-11-securebert_finetuned_ner_pipeline_en

* Add model 2024-11-11-bert_base_uncased_sba_clf_en

* Add model 2024-11-11-sst2_benign_bert_uncased_en

* Add model 2024-11-11-biomed_roberta_all_deep_en

* Add model 2024-11-11-bert_model_news_aggregator_en

* Add model 2024-11-11-indonesian_roberta_base_nerp_tagger_pipeline_en

* Add model 2024-11-11-indonesian_roberta_base_nerp_tagger_en

* Add model 2024-11-11-flash_italian_ns_classifier_fpt_pipeline_en

* Add model 2024-11-11-popbert_de

* Add model 2024-11-11-roberta_base_ainu_sayula_popoluca_pipeline_en

* Add model 2024-11-11-roberta_base_finetuned_ner_cadec_en

* Add model 2024-11-11-nasa_smd_ibm_v0_1_uat_labeler_pipeline_en

* Add model 2024-11-11-scenario_nepal_bhasa_en

* Add model 2024-11-11-affilgood_ner_en

* Add model 2024-11-11-bge_large_zhtw_v1_5_en

* Add model 2024-11-11-bge_small_english_v1_5_ft_orc_0930_dates_en

* Add model 2024-11-11-bge_base_legal_matryoshka_v1_pipeline_en

* Add model 2024-11-11-bsc_bio_ehr_spanish_distemist_es

* Add model 2024-11-11-finetuned_baai_bge_base_english_pipeline_en

* Add model 2024-11-11-bge_micro_smiles_pipeline_en

* Add model 2024-11-11-bge_micro_smiles_en

* Add model 2024-11-11-securebert_finetuned_ner_en

* Add model 2024-11-11-bsc_bio_ehr_spanish_distemist_pipeline_es

* Add model 2024-11-11-bge_tuned_en

* Add model 2024-11-11-bge_base_english_v1_5_course_recommender_v2_en

* Add model 2024-11-11-bge_base_legal_matryoshka_v1_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_8_en

* Add model 2024-11-11-bge_small_english_v1_5_ft_orc_0930_dates_pipeline_en

* Add model 2024-11-11-roberta_base_bne_capitel_ner_bsc_lt_pipeline_es

* Add model 2024-11-11-fine_tuned_bge_large_en

* Add model 2024-11-11-bge_99gpt_v1_en

* Add model 2024-11-11-affilgood_ner_pipeline_en

* Add model 2024-11-11-roberta_large_finetuned_abbr_filtered_plod_en

* Add model 2024-11-11-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es

* Add model 2024-11-11-bge_tuned_pipeline_en

* Add model 2024-11-11-roberta_base_absa_ate_sentiment_en

* Add model 2024-11-11-bsc_bio_ehr_spanish_medprocner_pipeline_es

* Add model 2024-11-11-lettuce_sayula_popoluca_dutch_mono_en

* Add model 2024-11-11-ruroberta_large_ner_pipeline_en

* Add model 2024-11-11-bge_base_english_v1_5_course_recommender_v2_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_epoch_7_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_epoch_7_en

* Add model 2024-11-11-bge_small_english_v1_5_rirag_obliqa_en

* Add model 2024-11-11-bge_99gpt_v1_pipeline_en

* Add model 2024-11-11-bert_base_uncased_emotion_prikshit7766_pipeline_en

* Add model 2024-11-11-roberta_large_finetuned_ner_finetuned_ner_en

* Add model 2024-11-11-lettuce_sayula_popoluca_dutch_mono_pipeline_en

* Add model 2024-11-11-roberta_large_finetuned_ner_finetuned_ner_pipeline_en

* Add model 2024-11-11-bge_base_english_v1_5_finetuned_osllmai_v1_pipeline_en

* Add model 2024-11-11-bert_finetuned_semantic_augmentation_ner_en

* Add model 2024-11-11-bge_large_zhtw_v1_5_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_8_pipeline_en

* Add model 2024-11-11-ruroberta_large_ner_en

* Add model 2024-11-11-roberta_spanish_clinical_trials_neg_spec_ner_en

* Add model 2024-11-11-bert_news_class_pipeline_en

* Add model 2024-11-11-roberta_base_absa_ate_sentiment_pipeline_en

* Add model 2024-11-11-finetuned_bge_base_english_pipeline_en

* Add model 2024-11-11-roberta_combined_generated_v1_1_epoch_7_pipeline_en

* Add model 2024-11-11-fine_tuned_bge_large_pipeline_en

* Add model 2024-11-11-workprocess_24_10_01_pipeline_en

---------

Co-authored-by: ahmedlone127 <[email protected]>

* Add model 2024-11-13-roberta_embeddings_legal_roberta_base_en (#14456)

Co-authored-by: gadde5300 <[email protected]>

* Add model 2024-11-20-bert_embeddings_sec_bert_base_en (#14460)

Co-authored-by: gadde5300 <[email protected]>

* 2024-11-26-mini_cpm_2b_8bit_xx (#14466)

* Add model 2024-11-26-mini_cpm_2b_8bit_xx

* Add model 2024-11-26-mini_cpm_2b_8bit_xx

* Add model 2024-11-27-nllb_distilled_600M_8int_xx

* Add model 2024-11-27-nomic_embed_v1_en

* Add model 2024-11-29-nomic_embed_v1_en

* Delete docs/_posts/ahmedlone127/2024-11-29-nomic_embed_v1_en.md

* Update 2024-11-27-nllb_distilled_600M_8int_xx.md

* Update 2024-11-27-nllb_distilled_600M_8int_xx.md

* Add model 2024-11-29-phi_3_mini_128k_instruct_en

* Update 2024-11-29-phi_3_mini_128k_instruct_en.md

* Add model 2024-11-29-qwen_7.5b_chat_en

---------

Co-authored-by: ahmedlone127 <[email protected]>

---------

Co-authored-by: jsl-models <[email protected]>
Co-authored-by: ahmedlone127 <[email protected]>
Co-authored-by: DevinTDHa <[email protected]>
Co-authored-by: Devin Ha <[email protected]>
5 people authored Dec 2, 2024
1 parent 6653546 commit 180a3de
Showing 6 changed files with 535 additions and 0 deletions.
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-26-mini_cpm_2b_8bit_xx.md
@@ -0,0 +1,86 @@
---
layout: model
title: mini_cpm_2b_8bit model from openbmb
author: John Snow Labs
name: mini_cpm_2b_8bit
date: 2024-11-26
tags: [en, open_source, pipeline, openvino, xx]
task: Text Generation
language: xx
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: CPMTransformer
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained CPMTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mini_cpm_2b_8bit` is a multilingual model originally trained by openbmb.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
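
The download file name appears to encode the model name, language, Spark NLP version, Spark version, and a millisecond timestamp. The sketch below parses it under that assumption; the field layout is inferred from the URLs in these model cards rather than from a documented specification, and `ModelArtifact` and `parse_artifact` are hypothetical names used only for illustration.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed layout (inferred, not documented):
#   <name>_<lang>_<spark-nlp version>_<spark version>_<epoch millis>.zip
_ARTIFACT_RE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<nlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)"
    r"_(?P<millis>\d{13})\.zip$"
)


@dataclass(frozen=True)
class ModelArtifact:
    """Hypothetical container for the fields encoded in the file name."""
    name: str
    language: str
    spark_nlp_version: str
    spark_version: str
    timestamp: datetime  # assumed to be epoch milliseconds, shown in UTC


def parse_artifact(url: str) -> ModelArtifact:
    """Split a model download URL (HTTPS or s3://) into its assumed parts."""
    filename = url.rsplit("/", 1)[-1]
    match = _ARTIFACT_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected artifact name: {filename}")
    seconds = int(match["millis"]) // 1000
    return ModelArtifact(
        name=match["name"],
        language=match["lang"],
        spark_nlp_version=match["nlp"],
        spark_version=match["spark"],
        timestamp=datetime.fromtimestamp(seconds, tz=timezone.utc),
    )


artifact = parse_artifact(
    "https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/"
    "mini_cpm_2b_8bit_xx_5.5.1_3.0_1732658809236.zip"
)
print(artifact.name, artifact.language, artifact.spark_nlp_version)
```

The name and language recovered this way are the two arguments passed to `pretrained(...)` in the usage section, which gives a quick consistency check between a card's download link and its example code.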

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
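```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import CPMTransformer
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Convert raw text into Spark NLP documents
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# The input column must match the DocumentAssembler output column
seq2seq = CPMTransformer.pretrained("mini_cpm_2b_8bit","xx") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```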
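```scala

import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.CPMTransformer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // for toDF; assumes an active SparkSession named `spark`

// Convert raw text into Spark NLP documents
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// The input column must match the DocumentAssembler output column
val seq2seq = CPMTransformer.pretrained("mini_cpm_2b_8bit","xx")
  .setInputCols(Array("document"))
  .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```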
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|mini_cpm_2b_8bit|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|xx|
|Size:|3.0 GB|

## References

https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-27-nllb_distilled_600M_8int_xx.md
@@ -0,0 +1,86 @@
---
layout: model
title: nllb_distilled_600M_8int model from Facebook
author: John Snow Labs
name: nllb_distilled_600M_8int
date: 2024-11-27
tags: [en, open_source, pipeline, openvino, xx]
task: Text Generation
language: xx
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: NLLBTransformer
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained NLLBTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nllb_distilled_600M_8int` is a multilingual model originally trained by facebook.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nllb_distilled_600M_8int_xx_5.5.1_3.0_1732741416718.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nllb_distilled_600M_8int_xx_5.5.1_3.0_1732741416718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
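```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import NLLBTransformer
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Convert raw text into Spark NLP documents
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Load this card's model; the input column must match the DocumentAssembler output column
seq2seq = NLLBTransformer.pretrained("nllb_distilled_600M_8int","xx") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```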
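```scala

import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.NLLBTransformer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // for toDF; assumes an active SparkSession named `spark`

// Convert raw text into Spark NLP documents
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Load this card's model; the input column must match the DocumentAssembler output column
val seq2seq = NLLBTransformer.pretrained("nllb_distilled_600M_8int","xx")
  .setInputCols(Array("document"))
  .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```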
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|nllb_distilled_600M_8int|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|xx|
|Size:|842.9 MB|

## References

https://huggingface.co/facebook/nllb-200-distilled-600M
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-27-nomic_embed_v1_en.md
@@ -0,0 +1,86 @@
---
layout: model
title: nomic_embed_v1 model from nomic-ai
author: John Snow Labs
name: nomic_embed_v1
date: 2024-11-27
tags: [en, open_source, openvino]
task: Embeddings
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: NomicEmbeddings
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained NomicEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nomic_embed_v1` is an English model originally trained by nomic-ai.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nomic_embed_v1_en_5.5.1_3.0_1732743647389.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nomic_embed_v1_en_5.5.1_3.0_1732743647389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
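```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import NomicEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Convert raw text into Spark NLP documents
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Compute one embedding per document
embeddings = NomicEmbeddings.pretrained("nomic_embed_v1","en") \
    .setInputCols(["document"]) \
    .setOutputCol("embeddings")

pipeline = Pipeline().setStages([documentAssembler, embeddings])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.embeddings.NomicEmbeddings
import org.apache.spark.ml.Pipeline
import spark.implicits._ // for toDF; assumes an active SparkSession named `spark`

// Convert raw text into Spark NLP documents
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Compute one embedding per document
val embeddings = NomicEmbeddings.pretrained("nomic_embed_v1","en")
  .setInputCols(Array("document"))
  .setOutputCol("embeddings")

val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```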
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|nomic_embed_v1|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|en|
|Size:|255.0 MB|

## References

https://huggingface.co/nomic-ai/nomic-embed-text-v1
86 changes: 86 additions & 0 deletions docs/_posts/ahmedlone127/2024-11-29-phi_3_mini_128k_instruct_en.md
@@ -0,0 +1,86 @@
---
layout: model
title: phi_3_mini_128k_instruct model from microsoft
author: John Snow Labs
name: phi_3_mini_128k_instruct
date: 2024-11-29
tags: [en, open_source, openvino]
task: Text Generation
language: en
edition: Spark NLP 5.5.1
spark_version: 3.0
supported: true
engine: openvino
annotator: Phi3Transformer
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

Pretrained Phi3Transformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `phi_3_mini_128k_instruct` is an English model originally trained by microsoft.

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phi_3_mini_128k_instruct_en_5.5.1_3.0_1732897700551.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phi_3_mini_128k_instruct_en_5.5.1_3.0_1732897700551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
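```python

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import Phi3Transformer
from pyspark.ml import Pipeline

spark = sparknlp.start()

# Convert raw text into Spark NLP documents
documentAssembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Generate text from each document
seq2seq = Phi3Transformer.pretrained("phi_3_mini_128k_instruct","en") \
    .setInputCols(["document"]) \
    .setOutputCol("generation")

pipeline = Pipeline().setStages([documentAssembler, seq2seq])
data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
pipelineModel = pipeline.fit(data)
pipelineDF = pipelineModel.transform(data)

```
```scala

import com.johnsnowlabs.nlp.base.DocumentAssembler
import com.johnsnowlabs.nlp.annotators.seq2seq.Phi3Transformer
import org.apache.spark.ml.Pipeline
import spark.implicits._ // for toDF; assumes an active SparkSession named `spark`

// Convert raw text into Spark NLP documents
val documentAssembler = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

// Generate text from each document
val seq2seq = Phi3Transformer.pretrained("phi_3_mini_128k_instruct","en")
  .setInputCols(Array("document"))
  .setOutputCol("generation")

val pipeline = new Pipeline().setStages(Array(documentAssembler, seq2seq))
val data = Seq("I love spark-nlp").toDF("text")
val pipelineModel = pipeline.fit(data)
val pipelineDF = pipelineModel.transform(data)

```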
</div>

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|phi_3_mini_128k_instruct|
|Compatibility:|Spark NLP 5.5.1+|
|License:|Open Source|
|Edition:|Official|
|Input Labels:|[documents]|
|Output Labels:|[generation]|
|Language:|en|
|Size:|3.5 GB|

## References

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct