forked from scarybeasts/beebjit
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathTODO
105 lines (83 loc) · 3.62 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
Bugs and improvements
=====================
IMPORTANT
=========
Ideally, important items get fixed before the next release.
The includes: bugs and deficiencies that are egregious. Feature requests that
are currently prioritized.
JIT improvements
================
- ARM64 partial zero page caching in host registers (turbocharge or flop?)
- Investigate mode REL for dynamic operand (see Castle Quest)
- Re-add mode IDX address optimization (Galaforce?)
- x64 and ARM64: page crossing check is ripe for optimization (Galaforce sprite
loop)
Fix later
=========
Bugs and issues not serious enough to warrant fixing before the next release.
- Update BCD for 65c12.
- "back in time" support in the debugger via fast replay.
- JIT block timing code improvements. Currently, there's a non-trivial
instruction sequence after every conditional branch in a JIT block. This
sequence can be improved a lot. For example, there only needs to be a branch
at the start of the block. Also, the timing subtractions do not need to be
dependent on one another.
- Tape loading noises.
- Disc loading noises.
- BCD support in JIT. It gives accurate results but slowly because it uses the
interpreter. No evidence yet of intense BCD usage in anything that needs to go
fast.
FSD investigations
==================
Peformance tuning
=================
Software running slower than it might for various reasons. Fun to knock these
improvements out one by one on a rainy day.
These speeds are for JIT mode. In some cases here, interp mode is faster
because of some very severe JIT interaction.
(all: -mode jit -opt sound:off)
(5th gen i5, v0.9.4 or so)
- Elite.ssd. Slams hardware registers in a tight loop reading keyboard.
- Tarzan.ssd. Very slow (120MHz). interp is faster!! Looks to be a tight
wait-for-vsync loop at $24D9 causing a lot of thrash. Surely this is common.
- Tricky's Frogger. 150Mhz only. 126k CRTC/s due to CRTC register abuse.
- Crazee Rider (swram version). 27MHz, terrible! Taking a lot of recompiles
and faults.
- Uridium. 200MHz. 1 million+ faults a second?
(11th gen i7, v0.9.5+)
- Elite.ssd. Slams hardware registers. 3.3GHz.
- Tarzan.ssd. Tight vsync wait loop. 620MHz. 32.6M reg hits/s!
- Uridium.ssd. 725MHz. 830k CRTC/s.
- Firetrack.ssd. Recompiles(!), CRTC abuse. 500MHz. 400k CRTC/s.
Ideas
=====
Ideas: ideas sufficiently interesting that they are likely to be investigated
and implemented.
- Record and playback. Since the "accurate" mode has deterministic execution
behavior, we can "record" a machine session by just logging and playing back
the sources of non-determinism -- i.e. key presses and releases.
A recording can be exited at any time, fast forwarded, etc. One great use
would be to record run-throughs of tricky games (i.e. Exile), which can be
"taken over" at any point.
[half done]
- Terminal (and headless terminal) modes. Rewires serial I/O to stdio.
[half done]
- Rewind.
Possibly linked to record and playback above, rewind would go back in time a
little bit when a key is hit.
Backburner
==========
Backburner: ideas or bugs that aren't particularly important to fix.
- JIT vectorization optimization. The BBC operates a byte at a time, including
instances where its updating a multi-byte quantity, i.e. 4-byte numbers in
BASIC.
These code patterns are fairly apparent so it would be possible to optimize to
single 4-byte operations on the host CPU.
However, initial experimentation didn't reveal any obvious gains worth the
complexity. The CLOCKSP Trig/Log test does a lot of rotating of 4-byte values
and the improvement of doing that in one 4-byte operating was surprisingly
low.
- Save state / load state.
- Mouse support.
- Joystick support.
- NULA support.