Simplify adapting pystats #708

Open
mdboom opened this issue Dec 5, 2024 · 1 comment

@mdboom
Contributor

mdboom commented Dec 5, 2024

Adding a new pystat currently involves:

  1. Adding a new field to the PyStats struct (or one of its substructs)
  2. Outputting it from print_stats (or one of its subfunctions)
  3. Actually collecting the statistic by adding a call to STAT_INC (or friend)
  4. Adding code to add it to an output table in summarize_stats.py

Step (3) is inherently required, but the other steps could probably be merged into one through macros and/or code generation. That would make it much easier to add new stats, to see which stats we have, and to understand how they flow through the system.
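A minimal sketch of the codegen option, assuming a hypothetical Python generator and a hypothetical list `STAT_NAMES` as the single source of truth (none of these names exist in CPython today). Each stat is declared once by its dotted name; the generator emits both the C structs (step 1) and the printing code (step 2). Because the generated C struct members mirror the dotted name, that name doubles as the C access path and as the key written to the output file.

```python
from collections import defaultdict

# Hypothetical single source of truth: each stat is declared once by dotted name.
STAT_NAMES = [
    "rare_event.set_class",
    "rare_event.set_bases",
    "object.allocations",
]


def group_fields(names):
    """Map each top-level group to its field names, preserving declaration order."""
    groups = defaultdict(list)
    for name in names:
        group, field = name.split(".", 1)
        groups[group].append(field)
    return groups


def gen_structs(names):
    """Emit C struct definitions (replaces hand-editing the structs, step 1).

    The emitted C assumes <stdint.h> is included where it is used.
    """
    groups = group_fields(names)
    out = []
    for group, fields in groups.items():
        out.append("typedef struct {")
        out.extend(f"    uint64_t {field};" for field in fields)
        out.append(f"}} {group}_stats;")
    # Top-level struct whose member names match the first dotted component.
    out.append("typedef struct {")
    out.extend(f"    {group}_stats {group};" for group in groups)
    out.append("} generated_stats;")
    return "\n".join(out)


def gen_printer(names):
    """Emit fprintf calls writing each stat under its dotted name (step 2).

    The emitted C assumes <inttypes.h> for PRIu64 and variables `out` and `stats`.
    """
    return "\n".join(
        f'fprintf(out, "{name}: %" PRIu64 "\\n", stats->{name});' for name in names
    )


if __name__ == "__main__":
    print(gen_structs(STAT_NAMES))
    print(gen_printer(STAT_NAMES))
```

With this shape, adding a stat means appending one name and adding the STAT_INC call; the summarizer can then read keys generically instead of needing a hand-written table entry.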

The main thing that will be hard to solve is preserving the "English phrases" used in the output file. For example, the `stats.rare_event.set_class` statistic is written to the file as `Rare event (set_class):`. I think it would be better to use the same dot notation everywhere, converting to friendly English phrasing only at the very last step, in summarize_stats.py.
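A minimal sketch of that last step, assuming a hypothetical output format of one `dotted.name: value` line per stat. The friendly label is derived mechanically from the dotted name, with a small override table for names that don't prettify well; `LABEL_OVERRIDES` and its `gc` entry are illustrative assumptions, not existing code.

```python
# Hypothetical overrides for group names that can't be prettified mechanically.
LABEL_OVERRIDES = {"gc": "GC"}


def parse_stats(lines):
    """Parse 'dotted.name: value' lines into {dotted.name: int}."""
    stats = {}
    for line in lines:
        key, sep, value = line.rpartition(":")
        if sep:  # skip blank or malformed lines that contain no colon
            stats[key.strip()] = int(value)
    return stats


def friendly_name(key):
    """Turn 'rare_event.set_class' into 'Rare event (set_class)' for display only."""
    group, _, rest = key.partition(".")
    label = LABEL_OVERRIDES.get(group, group.replace("_", " ").capitalize())
    return f"{label} ({rest})" if rest else label
```

The file itself then only ever contains dotted names, and the English phrasing lives in one place in the summarizer.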

I think it's fair to say that pystats is an implementation detail that only interpreter hackers care about, and we are free to change the file format at will. Such a change would create a hard break between stats collected before and after it, but comparing stats across long time periods is cumbersome anyway. Still, perhaps comparing main against 3.13.0 is something we care about.

Thoughts, @brandtbucher, @markshannon (others...)?

mdboom self-assigned this Dec 5, 2024
@markshannon
Member

I wouldn't worry about breakage. The main use of stats is to guide improvements and tell us about individual changes. We don't care about long term trends (at least, I don't).

Using the same dot notation everywhere is fine, but some stats are tables, like execution counts, and some have tables within fields and/or fields within tables. Instead of just `a.b.c`, a name can be `a[b].c` or `a.b[c]`. How would that work?
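One possible answer, sketched under assumptions: keep the brackets in the flat key written to the file (e.g. a hypothetical `opcode[LOAD_ATTR].execution_count`) and have summarize_stats.py split each key into path components and rebuild the nesting. In this sketch a bracketed index and a dotted field both become plain path components, so `a[b].c` and `a.b[c]` both map to `["a", "b", "c"]`; the brackets only document which component is a table key. The key format and helper names are illustrative, not an existing API.

```python
import re

# A component is an identifier (optionally preceded by '.') or a bracketed index.
_COMPONENT = re.compile(r"\.?([A-Za-z_][A-Za-z0-9_]*)|\[([^\]]+)\]")


def split_key(key):
    """Split 'a[b].c' or 'a.b[c]' into ['a', 'b', 'c']."""
    parts = []
    pos = 0
    while pos < len(key):
        m = _COMPONENT.match(key, pos)
        if m is None:
            raise ValueError(f"malformed stat key: {key!r}")
        parts.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
    return parts


def nest(flat):
    """Rebuild nested dicts from {key: value} pairs.

    Assumes no key is both a leaf and a prefix of another key.
    """
    root = {}
    for key, value in flat.items():
        *parents, leaf = split_key(key)
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root
```

The C side would then only need to print table entries with the index in brackets, and the summarizer works on the rebuilt tree rather than on hand-maintained phrases.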
