ChatGPT_Chemistry_Assistant

ChatGPT Chemistry Assistant

Step-by-step illustrations of setting up Processes 1, 2, and 3 were shown in the Supporting Information file of this article in the cookbook style.

Thank you!

Contents

· Text Mining: PDF Text Processing and Analysis with OpenAI's gpt-3.5-turbo API or gpt-4 API

· MOF Chatbot: a chatbot answers question based on post text mining data

· Predictive Model: A RF classfifier trained on post text mining data

Features

This text mining assistant includes the following main functions:

· Extraction of text from PDF files and its division into smaller chunks.

· Classfication of text segments.

· Processing and summarization of the extracted text data.

· Conversion of summarized data into a tabular format.

· Calculation of text embeddings using the OpenAI API.

· Selection of top similarity sections and their neighbors in the data.

· Calculation of text token count using the tiktoken library.

This MOF Synthesis Assistant tool provides the following core functionalities:

· Extraction of synthesis information and embeddings from a CSV file.

· Calculation of similarity scores.

· Sorting of text segments based on their similarity scores.

· Selection of top similar synthesis conditions from the sorted data.

· Processing of multiple user questions to maintain a conversational context.

· Use of the OpenAI API to generate text embeddings for user's questions based on the selected synthesis conditions.

· Maintenance of a conversation history for better contextually accurate responses in a conversational interface.

· A user-friendly conversational interface for asking questions related to MOF synthesis conditions.

This machine learning tool includes the following primary functions:

· Data Preprocessing: Reads, processes, and drops unused data columns from CSV file.

· Feature Selection: Applies RFECV for robust feature selection.

· Data Splitting: Splits data into training and testing sets with various sizes.

· Hyperparameter Tuning: Performs tuning via RandomizedSearchCV for RandomForestClassifier.

· Model Evaluation: Computes several performance metrics for each model configuration.

· Optimal Model Selection: Selects the best performing model based on balanced accuracy.

· Random Splits: Supports multiple random states for data splitting.

· Reporting: Records all performance metrics in an organized format for model comparison.

Dependencies

· This project is built on Python and requires the following libraries:

openai

requests

PyPDF2

pandas

tiktoken

sklearn

numpy

mendeleev

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.vscode		.vscode
chatbot		chatbot
random_forest		random_forest
text_mining		text_mining
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ChatGPT_Chemistry_Assistant

About

Releases

Packages

Languages

hsz0403/Electrolyte_mining

Folders and files

Latest commit

History

Repository files navigation

ChatGPT_Chemistry_Assistant

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages