Skip to content

hsz0403/Electrolyte_mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ChatGPT_Chemistry_Assistant

ChatGPT Chemistry Assistant

Step-by-step illustrations of setting up Processes 1, 2, and 3 were shown in the Supporting Information file of this article in the cookbook style.

Thank you!

Contents

· Text Mining: PDF Text Processing and Analysis with OpenAI's gpt-3.5-turbo API or gpt-4 API

· MOF Chatbot: a chatbot answers question based on post text mining data

· Predictive Model: A RF classfifier trained on post text mining data

Features

This text mining assistant includes the following main functions:

· Extraction of text from PDF files and its division into smaller chunks.

· Classfication of text segments.

· Processing and summarization of the extracted text data.

· Conversion of summarized data into a tabular format.

· Calculation of text embeddings using the OpenAI API.

· Selection of top similarity sections and their neighbors in the data.

· Calculation of text token count using the tiktoken library.

This MOF Synthesis Assistant tool provides the following core functionalities:

· Extraction of synthesis information and embeddings from a CSV file.

· Calculation of similarity scores.

· Sorting of text segments based on their similarity scores.

· Selection of top similar synthesis conditions from the sorted data.

· Processing of multiple user questions to maintain a conversational context.

· Use of the OpenAI API to generate text embeddings for user's questions based on the selected synthesis conditions.

· Maintenance of a conversation history for better contextually accurate responses in a conversational interface.

· A user-friendly conversational interface for asking questions related to MOF synthesis conditions.

This machine learning tool includes the following primary functions:

· Data Preprocessing: Reads, processes, and drops unused data columns from CSV file.

· Feature Selection: Applies RFECV for robust feature selection.

· Data Splitting: Splits data into training and testing sets with various sizes.

· Hyperparameter Tuning: Performs tuning via RandomizedSearchCV for RandomForestClassifier.

· Model Evaluation: Computes several performance metrics for each model configuration.

· Optimal Model Selection: Selects the best performing model based on balanced accuracy.

· Random Splits: Supports multiple random states for data splitting.

· Reporting: Records all performance metrics in an organized format for model comparison.

Dependencies

· This project is built on Python and requires the following libraries:

openai

requests

PyPDF2

pandas

tiktoken

sklearn

numpy

mendeleev

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published