diff --git a/data/datasets/medium_articles_posts/README.md b/data/datasets/medium_articles_posts/README.md new file mode 100644 index 0000000000..1b915cc296 --- /dev/null +++ b/data/datasets/medium_articles_posts/README.md @@ -0,0 +1,45 @@ +# Medium Articles Posts Dataset + +## Description + +The Medium Articles Posts dataset contains a collection of articles published on +the Medium platform. Each article entry includes information such as the +article's title, main content or text, associated URL or link, authors' names, +timestamps, and tags or categories. + +## Dataset Info + +The dataset consists of the following features: + +- **title**: _(string)_ The title of the Medium article. +- **text**: _(string)_ The main content or text of the Medium article. +- **url**: _(string)_ The URL or link to the Medium article. +- **authors**: _(string)_ The authors or contributors of the Medium article. +- **timestamp**: _(string)_ The timestamp or date when the Medium article was + published. +- **tags**: _(string)_ Tags or categories associated with the Medium article. 
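Since every feature above is typed as a string, multi-valued fields such as `tags` arrive unparsed; a minimal sketch of working with one record (the record below is made up for illustration, not drawn from the dataset):

```python
# Illustrative record using the documented feature names; the values are
# invented and not taken from the actual dataset.
record = {
    "title": "Understanding Attention",
    "text": "Attention mechanisms let models weigh parts of the input...",
    "url": "https://medium.com/some-author/understanding-attention",
    "authors": "Jane Doe",
    "timestamp": "2023-01-15 08:30:00",
    "tags": "machine-learning, nlp",
}

# Because every feature is a plain string, comma-separated fields
# typically need splitting before use.
tags = [tag.strip() for tag in record["tags"].split(",")]
print(tags)  # ['machine-learning', 'nlp']
```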
+
+## Dataset Size
+
+- **Total Dataset Size**: 1,044,746,687 bytes (approximately 1 GB)
+
+## Splits
+
+The dataset contains a single split:
+
+- **Train**:
+  - Number of examples: 192,368
+  - Size: 1,044,746,687 bytes (approximately 1 GB)
+
+## Download Size
+
+- **Compressed Download Size**: 601,519,297 bytes (approximately 600 MB)
+
+### Usage example
+
+```python
+from datasets import load_dataset
+
+# Load the dataset
+dataset = load_dataset("Falah/medium_articles_posts")
+```
diff --git a/data/datasets/medium_articles_posts/__init__.py b/data/datasets/medium_articles_posts/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/data/datasets/medium_articles_posts/load_dataset.py b/data/datasets/medium_articles_posts/load_dataset.py
new file mode 100644
index 0000000000..d8b750a3b8
--- /dev/null
+++ b/data/datasets/medium_articles_posts/load_dataset.py
@@ -0,0 +1,4 @@
+from datasets import load_dataset
+
+# Load the dataset
+dataset = load_dataset("Falah/medium_articles_posts")
diff --git a/data/datasets/medium_articles_posts/requirements.txt b/data/datasets/medium_articles_posts/requirements.txt
new file mode 100644
index 0000000000..76de43c3ed
--- /dev/null
+++ b/data/datasets/medium_articles_posts/requirements.txt
@@ -0,0 +1 @@
+datasets==2.9.0
diff --git a/data/datasets/research_papers_dataset/ReadME.md b/data/datasets/research_papers_dataset/ReadME.md
new file mode 100644
index 0000000000..6dd05432a2
--- /dev/null
+++ b/data/datasets/research_papers_dataset/ReadME.md
@@ -0,0 +1,139 @@
+---
+dataset_info:
+  features:
+    - name: title
+      dtype: string
+    - name: abstract
+      dtype: string
+  splits:
+    - name: train
+      num_bytes: 2363569633
+      num_examples: 2311491
+  download_size: 1423881564
+  dataset_size: 2363569633
+---
+
+## Research Paper Dataset 2023
+
+[View the dataset on Hugging Face](https://huggingface.co/datasets/Falah/research_paper2023)
+
+### Dataset Information
+
+The "Research Paper Dataset 2023" contains information related to research
+papers. It includes the following features:
+
+- Title (dtype: string): The title of the research paper.
+- Abstract (dtype: string): The abstract of the research paper.
+
+### Dataset Splits
+
+The dataset is divided into one split:
+
+- Train Split:
+  - Name: train
+  - Number of Bytes: 2,363,569,633
+  - Number of Examples: 2,311,491
+
+### Download Information
+
+- Download Size: 1,423,881,564 bytes
+- Dataset Size: 2,363,569,633 bytes
+
+### Dataset Citation
+
+If you use this dataset in your research or project, please cite it as follows:
+
+```
+@dataset{research_paper_2023,
+  author = {Falah.G.Salieh},
+  title = {Research Paper Dataset 2023},
+  year = {2023},
+  publisher = {Hugging Face},
+  version = {1.0},
+  location = {Online},
+  url = {https://huggingface.co/datasets/Falah/research_paper2023}
+}
+```
+
+### Apache License
+
+The "Research Paper Dataset 2023" is distributed under the Apache License 2.0;
+a copy of the license is included in the LICENSE file of the dataset
+repository. Please review and comply with the license terms before downloading
+and using the dataset.
+
+### Example Usage
+
+To load the "Research Paper Dataset 2023" with the Hugging Face Datasets
+library in Python, you can use the following code:
+
+```python
+from datasets import load_dataset
+
+dataset = load_dataset("Falah/research_paper2023")
+```
+
+### Application of "Research Paper Dataset 2023" for NLP Text Classification and Chatbot Models
+
+The "Research Paper Dataset 2023" can be a valuable resource for various Natural
+Language Processing (NLP) tasks, including text classification and generating
+book titles in the context of chatbot models. Here are some ways this dataset
+can be utilized for these applications:
+
+1. **Text Classification**: The dataset's features, such as the title and
+   abstract of research papers, can be used to train a text classification
+   model. By assigning appropriate labels to the research papers based on their
+   topics or fields of study, the model can learn to classify new research
+   papers into different categories, for example, whether a paper belongs to
+   computer science, biology, or physics. This text classification model can
+   then be adapted for other applications that require categorizing text.
+
+2. **Book Title Generation for Chatbot Models**: Using the research paper
+   titles in the dataset, a natural language generation model, such as a
+   sequence-to-sequence or transformer-based model, can be trained to generate
+   book titles. The model can be fine-tuned on the research paper titles to
+   learn the patterns and structures of relevant, meaningful book titles. This
+   can be a useful feature for chatbot models that recommend books based on
+   specific research topics or areas of interest.
+
+### Potential Benefits
+
+- Improved Chatbot Recommendations: With the ability to generate book titles
+  related to specific research topics, chatbot models can provide more relevant
+  and personalized book recommendations to users.
+- Enhanced User Engagement: By incorporating the text classification model, the
+  chatbot can better understand user queries and respond more accurately,
+  leading to a more engaging user experience.
+- Knowledge Discovery: Researchers and students can use the text classification
+  model to efficiently categorize large collections of research papers,
+  enabling quicker access to relevant information in specific domains.
+
+### Considerations
+
+- Data Preprocessing: Before training the NLP models, appropriate data
+  preprocessing steps may be required, such as text cleaning, tokenization, and
+  encoding, to prepare the dataset for model input.
+- Model Selection and Fine-Tuning: Choosing the right NLP model architecture and
+  hyperparameters, and fine-tuning the model on the specific tasks, can
+  significantly impact the model's performance and generalization ability.
+- Ethical Use: Ensure that the generated book titles and text classification
+  predictions are used responsibly and ethically, respecting copyright and
+  intellectual property rights.
+
+### Conclusion
+
+The "Research Paper Dataset 2023" holds great potential for enhancing NLP text
+classification models and chatbot systems. By leveraging the dataset's features
+and information, NLP applications can be developed to aid researchers, students,
+and readers in finding relevant research papers and generating meaningful book
+titles for their specific interests. Proper utilization of this dataset can lead
+to more efficient information retrieval and improved user experiences in the
+domain of research and academic literature exploration.
diff --git a/data/datasets/research_papers_dataset/__init__.py b/data/datasets/research_papers_dataset/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/data/datasets/research_papers_dataset/load_dataset.py b/data/datasets/research_papers_dataset/load_dataset.py
new file mode 100644
index 0000000000..4602f0d253
--- /dev/null
+++ b/data/datasets/research_papers_dataset/load_dataset.py
@@ -0,0 +1,3 @@
+from datasets import load_dataset
+
+dataset = load_dataset("Falah/research_paper2023")
diff --git a/data/datasets/research_papers_dataset/requirements.txt b/data/datasets/research_papers_dataset/requirements.txt
new file mode 100644
index 0000000000..76de43c3ed
--- /dev/null
+++ b/data/datasets/research_papers_dataset/requirements.txt
@@ -0,0 +1 @@
+datasets==2.9.0
diff --git a/data/datasets/sentiments-dataset-381-classes/README.md b/data/datasets/sentiments-dataset-381-classes/README.md
new file mode 100644
index 0000000000..6d4983e27a
--- /dev/null
+++ b/data/datasets/sentiments-dataset-381-classes/README.md
@@ -0,0 +1,363 @@
+---
+dataset_info:
+  features:
+    - name: text
+      dtype: string
+    - name: sentiment
+      dtype: string
+  splits:
+    - name: train
+      num_bytes: 104602
+      num_examples: 1061
+  download_size: 48213
+  dataset_size: 104602
+license: apache-2.0
+task_categories:
+  - text-classification
+language:
+  - en
+pretty_name: sentiments-dataset-381-classes
+size_categories:
+  - 1K<n<10K
+---
+
+# Sentiments Dataset (381 Classes)
+
+## Dataset Description
+
+This dataset contains a collection of labeled sentences categorized into 381
+different sentiment classes. The dataset provides a wide range of sentiment
+labels to facilitate fine-grained sentiment analysis tasks. Each sentence is
+associated with a sentiment class name.
+
+## Dataset Information
+
+- Number of classes: 381
+- Features: `text` (string), `sentiment` (string)
+- Number of examples: 1,061
+
+## Class Names
+
+The dataset includes the following sentiment class names as examples:
+
+- Positive
+- Negative
+- Neutral
+- Joyful
+- Disappointed
+- Worried
+- Surprised
+- Grateful
+- Indifferent
+- Sad
+- Angry
+- Relieved
+- Sentiment
+- Excited
+- Hopeful
+- Anxious
+- Satisfied
+- Happy
+- Nostalgic
+- Inspired
+- Impressed
+- Amazed
+- Touched
+- Proud
+- Intrigued
+- Relaxed
+- Content
+- Comforted
+- Motivated
+- Frustrated
+- Delighted
+- Moved
+- Curious
+- Fascinated
+- Engrossed
+- Addicted
+- Eager
+- Provoked
+- Energized
+- Controversial
+- Significant
+- Revolutionary
+- Optimistic
+- Impactful
+- Compelling
+- Enchanted
+- Peaceful
+- Disillusioned
+- Thrilled
+- Consumed
+- Engaged
+- Trendy
+- Informative
+- Appreciative
+- Enthralled
+- Enthusiastic
+- Influenced
+- Validated
+- Reflective
+- Emotional
+- Concerned
+- Promising
+- Empowered
+- Memorable
+- Transformative
+- Inclusive
+- Groundbreaking
+- Evocative
+- Respectful
+- Outraged
+- Unity
+- Enlightening
+- Artistic
+- Cultural
+- Diverse
+- Vibrant
+- Prideful
+- Captivated
+- Revealing
+- Inspiring
+- Admiring
+- Empowering
+- Connecting
+- Challenging
+- Symbolic
+- Immersed
+- Evolving
+- Insightful
+- Reformative
+- Celebratory
+- Validating
+- Diversity
+- Eclectic
+- Comprehensive
+- Uniting
+- Influential
+- Honoring
+- Transporting
+- Resonating
+- Chronicle
+- Preserving
+- Replicated
+- Impressive
+- Fascinating
+- Tributary
+- Momentum
+- Awe-inspiring
+- Unearthing
+- Exploratory
+- Immersive
+- Transportive
+- Personal
+- Resilient
+- Mesmerized
+- Legendary
+- Awareness
+- Evidence-based
+- Contemporary
+- Connected
+- Valuable
+- Referencing
+- Camaraderie
+- Inspirational
+- Evoke
+- Emotive
+- Chronicling
+- Educational
+- Serene
+- Colorful
+- Melodious
+- Dramatic
+- Enlivened
+- Wonderstruck
+- Enchanting
+- Grandiose
+- Abundant
+- Harmonious
+- Captivating
+- Mesmerizing
+- Dedicated
+- Powerful
+- Mystical
+- Picturesque
+- Opulent
+- Revitalizing
+- Fragrant
+- Spellbinding
+- Lush
+- Breathtaking
+- Passionate
+- Melodic
+- Wonderland
+- Invigorating
+- Dappled
+- Flourishing
+- Ethereal
+- Elaborate
+- Kaleidoscope
+- Harmonizing
+- Tragic
+- Transforming
+- Marveling
+- Enveloped
+- Reverberating
+- Sanctuary
+- Graceful
+- Spectacular
+- Golden
+- Melancholic
+- Transcendent
+- Delicate
+- Awakening
+- Intertwined
+- Indelible
+- Verdant
+- Heartrending
+- Fiery
+- Inviting
+- Majestic
+- Lullaby-like
+- Kissed
+- Behold
+- Soulful
+- Splendid
+- Whispering
+- Masterpiece
+- Moving
+- Crystalline
+- Tapestry
+- Haunting
+- Renewal
+- Wisdom-filled
+- Stunning
+- Sun-kissed
+- Symphony
+- Awestruck
+- Dancing
+- Heart-wrenching
+- Magical
+- Gentle
+- Emotion-evoking
+- Embracing
+- Floating
+- Tranquil
+- Celestial
+- Breathless
+- Symphonic
+- Stillness
+- Delightful
+- Flawless
+- Commanding
+- Embraced
+- Heartfelt
+- Precise
+- Adorned
+- Beautiful
+- Scattering
+- Timeless
+- Radiant
+- Regal
+- Sparkling
+- Resilience
+- Recognized
+- Echoing
+- Rebirth
+- Cradled
+- Tirelessly
+- Glowing
+- Icy
+- Brilliant
+- Anticipation
+- Awakened
+- Blossoming
+- Enthralling
+- Excitement
+- Vivid
+- Spellbound
+- Mellifluous
+- Intricate
+- Silent
+- Contrasting
+- Poignant
+- Perfumed
+- Pure
+- Magnificent
+- Exquisite
+- Anguished
+- Harmonic
+- Kaleidoscopic
+- Gripping
+- Soothing
+- Intense
+- Poetic
+- Fragile
+- Unwavering
+- Intriguing
+- Fairy-tale
+- Ephemeral
+- Joyous
+- Resplendent
+- Elegant
+- Coaxing
+- Illuminating
+- Thunderous
+- Cool
+- Exciting
+- Teeming
+- Blissful
+- Enduring
+- Raw
+- Adventurous
+- Mysterious
+- Enrapturing
+- Marvelous
+- Swirling
+- Resonant
+- Careful
+- Whimsical
+- Intertwining
+- ... and more
+
+## Usage example
+
+```python
+import pandas as pd
+from datasets import load_dataset
+
+# Load the dataset
+dataset = load_dataset("Falah/sentiments-dataset-381-classes")
+
+# Convert the train split to a pandas DataFrame
+df = pd.DataFrame(dataset["train"])
+
+# Get the unique class names from the "sentiment" column
+class_names = df["sentiment"].unique()
+
+# Print the unique class names
+for name in class_names:
+    print(f"Class Name: {name}")
+```
+
+## Application
+
+The Sentiments Dataset (381 Classes) can be applied in various NLP applications,
+such as sentiment analysis and text classification.
+
+## Citation
+
+For more information or inquiries about the dataset, please contact the dataset
+author(s) mentioned in the citation.
+
+If you use this dataset in your research or publication, please cite it as
+follows:
+
+```
+@dataset{sentiments_dataset_381_classes,
+  author = {Falah.G.Salieh},
+  title = {Sentiments Dataset (381 Classes)},
+  year = {2023},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/datasets/Falah/sentiments-dataset-381-classes},
+}
+```
diff --git a/data/datasets/sentiments-dataset-381-classes/__init__.py b/data/datasets/sentiments-dataset-381-classes/__init__.py
new file mode 100644
index 0000000000..e69de29bb2
diff --git a/data/datasets/sentiments-dataset-381-classes/load_dataset.py b/data/datasets/sentiments-dataset-381-classes/load_dataset.py
new file mode 100644
index 0000000000..f7200c1656
--- /dev/null
+++ b/data/datasets/sentiments-dataset-381-classes/load_dataset.py
@@ -0,0 +1,3 @@
+from datasets import load_dataset
+
+dataset = load_dataset("Falah/sentiments-dataset-381-classes")
diff --git a/data/datasets/sentiments-dataset-381-classes/requirements.txt b/data/datasets/sentiments-dataset-381-classes/requirements.txt
new file mode 100644
index 0000000000..76de43c3ed
--- /dev/null
+++ b/data/datasets/sentiments-dataset-381-classes/requirements.txt
@@ -0,0 +1 @@
+datasets==2.9.0
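As a follow-on to the sentiments usage example above, per-class example counts can also be tallied without pandas, which is useful for spotting class imbalance in a 381-class dataset before training a classifier. The rows below are invented for illustration, not drawn from the dataset:

```python
from collections import Counter

# Illustrative (text, sentiment) rows; the real dataset has 1,061 rows
# spread over 381 sentiment classes.
rows = [
    ("The concert was wonderful.", "Joyful"),
    ("I waited two hours for nothing.", "Disappointed"),
    ("The report is due tomorrow.", "Neutral"),
    ("What a pleasant surprise!", "Joyful"),
]

# Count how many examples each sentiment class has.
counts = Counter(sentiment for _, sentiment in rows)

# Print classes from most to least frequent.
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

With the real data, the same tally can be built by iterating over `dataset["train"]` and counting the `sentiment` field of each row.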