This repo contains the code and data associated with an early pilot experiment on chatbot personalization from the project Generative Social Choice (paper, general audience report). This experiment was conducted in November 2023 as part of OpenAI's Democratic Inputs to AI program. We've since updated and improved our entire experimental pipeline, and conducted a follow-up experiment. If you want to build on our framework, we strongly recommend you use our new code and data (public link forthcoming). This repo only contains the necessary code to replicate the early pilot experiment on chatbot personalization.
Authors of Generative Social Choice: Sara Fish, Paul Gölz, David Parkes, Ariel Procaccia, Gili Rusak, Itai Shapira, and Manuel Wüthrich.
To set up the repo:

- In the folder where this `README.md` file is located, run `pip install -e .`
- Install dependencies: `pipenv install`
- Create a file `OPENAI_API_KEY` in `utils/`, and write your (personal) API key in it.
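For reference, here is a minimal sketch of how such a key file might be read and handed to the OpenAI SDK; this is an illustrative assumption, not the repo's actual loading code (which lives in `utils/`):

```python
# Minimal sketch (not the repo's actual loader): read the key file created above
# and register it with the OpenAI SDK.
from pathlib import Path

import openai

openai.api_key = Path("utils/OPENAI_API_KEY").read_text().strip()
```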
The repo is organized as follows:

- `data/` has all cleaned and anonymized data associated with the experiments in the paper:
  - `chatbot_personalization_data.csv`: our cleaned and anonymized survey data, collected on Prolific. Also available at the dedicated repo.
  - `validate_disc_query_logs.csv`: logs from our discriminative query validation experiment (Figure 1; replicate with `paper_replication/validate_discriminative_query.py`)
  - `gen_query_eval/`: logs from our generative query evaluation experiment (Figure 2; replicate with `paper_replication/gen_query_eval.py`)
  - `user_summaries_generation.csv` and `user_summaries_generation_raw_output.csv`: the user summaries (and logs) used in our slate generation (replicate with `paper_replication/generate_summaries.py`)
  - `ratings_and_matching.csv`: assignments of validation users to statements (Figures 4-5; replicate with `paper_replication/compute_matching.py`)
- `paper_replication/` has scripts for replicating the experiments in the paper:
  - `validate_discriminative_query.py`: validating discriminative queries (Figure 1)
  - `gen_query_eval.py`: evaluating generative queries (Figure 2)
  - `generate_summaries.py`: generating user summaries
  - `generate_slate.py`: generating the slate
  - `compute_matching.py`: computing the assignment of users to statements in the slate (Figures 4-5)
- `plots/` has code for generating each of the plots in the paper, as well as the plots themselves
- `queries/` has the implementation of the queries:
  - `query_chatbot_personalization.py` contains all of the chatbot-personalization-specific implementation
  - `query_interface.py` describes the interface for agents and generators. Anything that implements this interface should automatically work with our slate generation code (a hypothetical sketch of such an interface appears after this list).
- `slates/` has our implementation of the slate generation algorithm in `generate_slate_ensemble_greedy.py`
- `test/` has unit tests
- `utils/` has miscellaneous tools:
  - `gpt_wrapper.py` contains code for making LLM calls
  - `helper_functions.py` has `get_base_dir_path()` and `get_time_string()`
  - `dataframe_completion.py` contains code for dataframe-completion-style LLM calls, used for our summary generation and generative query
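For orientation, here is a hypothetical sketch of what an agent/generator interface in this spirit could look like. The class and method names below are illustrative assumptions; the actual interface is defined in `queries/query_interface.py` and may differ.

```python
# Illustrative sketch only: the real interface lives in queries/query_interface.py,
# and its class and method names may differ from this sketch.
from abc import ABC, abstractmethod


class Agent(ABC):
    """A survey participant whose opinion can be queried (discriminative query)."""

    @abstractmethod
    def get_approval(self, statement: str) -> float:
        """Return this agent's predicted approval/rating of `statement`."""


class Generator(ABC):
    """Proposes new candidate statements for a group of agents (generative query)."""

    @abstractmethod
    def generate(self, agents: list[Agent]) -> list[str]:
        """Return candidate statements tailored to the given agents."""
```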
Each figure in the paper can be generated using a dedicated notebook:

- Figure 1: `plots/fig1_disc_query_eval.ipynb`
- Figure 2: `plots/fig2_slate_composition.ipynb`
- Figure 3: N/A
- Figure 4: `plots/fig4_assigned_utilities_pie_chart.ipynb`
- Figure 5: `plots/fig5_assigned_utilities_histogram.ipynb`
To run unit tests with `gpt-4o-mini`, run the following command:

```
python -m unittest -k fast -v
```

To run unit tests using the exact LLMs used in the paper (for replication purposes), run the following command. This requires access to `gpt-4-base` and `gpt-4-32k-0613`.

```
python -m unittest -k replication -v
```

To run all unit tests, run the following command. This requires access to `gpt-4-base` and `gpt-4-32k-0613`.

```
python -m unittest -v
```
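The `-k` option selects tests whose names match the given pattern, so the "fast" and "replication" subsets are distinguished by test names. A hypothetical example (not taken from the repo's `test/` directory) of a test that `-k fast` would pick up:

```python
# Hypothetical example, not from the repo: unittest's -k option matches on
# (sub)strings of test case and method names, so "fast" in the name selects it.
import unittest
from datetime import datetime


class TestHelpersFast(unittest.TestCase):
    def test_fast_time_string_format(self):
        # Illustrative check of a timestamp format; the repo's real fast tests
        # exercise the pipeline with gpt-4o-mini.
        stamp = datetime(2023, 11, 1, 12, 0, 0).strftime("%Y-%m-%d_%H-%M-%S")
        self.assertEqual(stamp, "2023-11-01_12-00-00")


if __name__ == "__main__":
    unittest.main()
```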
The quickest and cheapest way to rerun our experiments is to use a more modern LLM such as `gpt-4o`. The commands below run the exact experiments from our paper, except that `gpt-4o` is used in place of `gpt-4-base` and `gpt-4-32k-0613`.
Generate summaries of all users:

```
python paper_replication/generate_summaries.py --model gpt-4o
```

Generate a summary for a single user (for testing):

```
python paper_replication/generate_summaries.py --model gpt-4o --num_agents 1
```

To run the full experiment empirically validating the discriminative query (600 LLM calls):

```
python paper_replication/validate_discriminative_query.py --model gpt-4o
```

To empirically validate a single discriminative query (for testing):

```
python paper_replication/validate_discriminative_query.py --model gpt-4o --num_samples 1
```

To run the full experiment empirically evaluating the generative query:

```
python paper_replication/gen_query_eval.py --model gpt-4o
```

To evaluate a single ensemble round (for testing):

```
python paper_replication/gen_query_eval.py --model gpt-4o --num_rounds 1
```

To generate a slate for all users:

```
python paper_replication/generate_slate.py --model gpt-4o
```

To generate a slate for only 10 users (for testing):

```
python paper_replication/generate_slate.py --model gpt-4o --num_agents 10
```
To "exactly" (subject to inherent LLM stochasticity) reproduce our experiments, run the below commands. These require access to gpt-4-base
and gpt-4-32k-0613
. These will write logs to data/chatbot_personalization/demo_data/
. To test on smaller sample sizes, use the --num_agents
and --num_samples
arguments (usage demonstrated above).
python paper_replication/generate_summaries.py --model default
python paper_replication/validate_discriminative_query.py --model default
python paper_replication/gen_query_eval.py --model default
python paper_replication/generate_slate.py --model default
Finally, compute the assignment of users to statements. This step uses Gurobi and makes no LLM calls.

```
python paper_replication/compute_matching.py
```
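As a rough illustration of the kind of optimization involved, here is a minimal assignment ILP in `gurobipy`. The utilities, the balanced-capacity constraint, and all names below are toy assumptions for illustration, not the formulation actually used in `compute_matching.py`:

```python
# Toy sketch of a user-to-statement assignment ILP in gurobipy.
# The data, constraints, and objective are illustrative assumptions only.
import gurobipy as gp
from gurobipy import GRB

# utilities[i][j]: rating of user i for statement j (toy numbers)
utilities = [
    [4, 1, 3],
    [2, 5, 2],
    [5, 2, 4],
    [1, 4, 5],
    [3, 3, 1],
    [2, 4, 4],
]
n_users, n_statements = len(utilities), len(utilities[0])
capacity = n_users // n_statements  # balanced assignment: equal users per statement

m = gp.Model("user_statement_matching")
x = m.addVars(n_users, n_statements, vtype=GRB.BINARY, name="assign")

# each user is assigned to exactly one statement
m.addConstrs((x.sum(i, "*") == 1 for i in range(n_users)), name="one_statement")
# each statement receives the same number of users
m.addConstrs((x.sum("*", j) == capacity for j in range(n_statements)), name="balanced")

# maximize total assigned utility
m.setObjective(
    gp.quicksum(utilities[i][j] * x[i, j] for i in range(n_users) for j in range(n_statements)),
    GRB.MAXIMIZE,
)
m.optimize()

for i in range(n_users):
    for j in range(n_statements):
        if x[i, j].X > 0.5:
            print(f"user {i} -> statement {j} (utility {utilities[i][j]})")
```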