-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Format preprocessed token stream in multiple passes #1898
base: master
Are you sure you want to change the base?
Conversation
@hzeller Pinging in case you missed it |
0ece5aa
to
10487a4
Compare
@hzeller I looked at the other perprocessor PRs you mentioned, and I think most of it is in But I did decide to reuse |
10487a4
to
ffba497
Compare
I think the other pull request has been mostly added to Verible in separate sub-pull requests, so they are probably mostly added; mostly my remark was to figure out if there are things still missing. Sorry, I did not have a look at this PR yet - I was busy, then vacation and then forgot about it. I'll have a look later today. |
af3df1c
to
da412ef
Compare
da412ef
to
598c025
Compare
The biggest challenges right now, from looking at the tests, are:
|
I can imagine that filtering out branches can result in the potential problems you mentioned. Maybe we can mostly mitigate that by trying to not include preprocessing branches that don't do any damage, and don't include them in the finding of The following (intentionally unaligned) input for instance will be properly aligned in the current state of the formatter: module foo(
input a,
`ifdef WITH_B
input [9999:0] b,
`endif
input wire [99:0] c
);
endmodule So here, here we don't need to do anything special in the brancing of So maybe we need to not identify all the macros that create variants but drill down to only the preprocessing branches that cause invalid syntax trees if parsed without extra treatment ? Detection of these might be a bit more complicated. Maybe we can have a very simple manual parser just looking at the token stream that e.g. counts if a block contains a balanced set of begin/end in each preprocessing branch (counter pluse/minus) ? There might be other common pairs, e.g. begin/end, module/endmodule ... so this might get a bit unwieldy if there are too many of these potentially nesting pairs to worry about... If we have such a thing, it could also reduce the number of separate passes we need to do through the file and less chance of coming into a situation where we mess up alignment due to that. Maybe we should have a video conference to discuss some ideas. And yes, having the feature behind a flag can help (and if we're confident enough and it works well, we can switch it on by default, but allow to switch off). (Sorry it took so long - I was distracted with other projects I am part of) |
@@ -115,7 +118,9 @@ class VerilogPreprocess { | |||
|
|||
// Expand macro definition bodies, this will relexes the macro body. | |||
bool expand_macros = false; | |||
// TODO(hzeller): Provide a map of command-line provided +define+'s | |||
|
|||
MacroDefinitionRegistry macro_definitions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might not be the right place for this registry, but I suspect this is here currently mostly for the experimenting.
The FileList struct has such information as well, which should probably be consolidated with what we have here.
The final location might probably be yet different; there are a bunch of TODO's around FileList anyway.
I suspect you just have it here for now to make quick progress, but once that is nearing completion, we probably need to address that 'where to put things'.
@@ -199,10 +213,10 @@ static Status ReformatVerilog(absl::string_view original_text, | |||
} | |||
|
|||
static absl::StatusOr<std::unique_ptr<VerilogAnalyzer>> ParseWithStatus( | |||
absl::string_view text, absl::string_view filename) { | |||
absl::string_view text, absl::string_view filename, | |||
const verilog::VerilogPreprocess::Config& preprocess_config = {}) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use the filter_branches
config from the preprocess_config to enable the 'formatting with preprocessing' feature (so that we can test it with a flag)
verilog/formatting/formatter.cc
Outdated
} else if (IsPreprocessorControlFlow(verilog_tokentype( | ||
uwline.TokensRange().begin()->TokenEnum()))) { | ||
UnwrappedLine formatted_line = uwline; | ||
formatted_line.SetIndentationSpaces(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This zero is the indentation of the macro-line ? Maybe this can be made a constexpr
somewhere so that it is easy to find in case someone wants to indent them in a particular way and wants to find the place
(not sure how common that is in verilog, but it could be some feature that people might request later)
verilog/formatting/formatter.cc
Outdated
if (!formatted_lines_.back().CompletedFormatting()) { | ||
// Copy over any lines that did not finish wrap searching. | ||
partially_formatted_lines.push_back(&uwline); | ||
if (!uwline.TokensRange().front().token->visible) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point we should worry about the cyclomatic complexity of this function, but good enough for this first work.
verilog/formatting/tree_unwrapper.cc
Outdated
State new_state = old_state; | ||
switch (old_state) { | ||
case kPreprocessorControlFlow: { | ||
if (token_type != PP_Identifier) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that so that we don't start a new partition between the ifdef and the identifier ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
598c025
to
c90b681
Compare
7669ef9
to
2324a37
Compare
Codecov ReportPatch coverage:
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## master #1898 +/- ##
==========================================
- Coverage 92.84% 92.84% -0.01%
==========================================
Files 355 355
Lines 26249 26538 +289
==========================================
+ Hits 24372 24640 +268
- Misses 1877 1898 +21
☔ View full report in Codecov by Sentry. |
457da90
to
ffe1581
Compare
Interesting that there are issues on Mac and Windows. I suspect that either newline confusion (on Windows) or some uninitialized value that randomly defaults to something else on these platforms. |
Currently, the formatter doesn't handle many scenarios involving preprocessor `ifdef`s/`endif`s interleaved with `begin`s, module headers, etc. This patch attempts to solve this problem by performing multiple passes of the formatting on preprocessed variants of the source. Each of these variants has a different set of preprocessor branches enabled. Together, they should cover the entire source (though that doesn't work in all cases yet). After several formatting passes for different variants of the AST, a correct and properly formatted file is produced. Signed-off-by: Krzysztof Bieganski <[email protected]>
ffe1581
to
1319cd3
Compare
The Windows and MacOS failures were basically due to some incorrect/UB pointer arithmetic. It works now. There's also the Read the Docs failure, not sure what that's about. The multi-pass formatter is able to deal with all unit tests now, with the exception of two cases. First one: module pd (
`ifdef FAA
input baaaz_t foo,
`else
output reg bar
`endif
);
endmodule : pd (there's one more similar to this) Second case: `J(D(`ifdef e)) The multi-pass formatter doesn't change this, but according to the test, it should be formatted into: `J(D(
`ifdef e)) I don't really understand this test case, so I can't comment on the validity, but I think it's okay to not support this for now. The multi-pass formatter can also format all of VeeR, except for two files where Verible's parser cannot deal with some preprocessor identifiers (but that doesn't work with the default formatter anyway). I haven't checked the result thoroughly, as it's a lot of code, but at least it doesn't crash and there is no divergence. The PR is basically ready for review; the one architectural thing I don't like is that I had to put a new field in |
Just looking through pull requests that didn't finish up to the last development cycle. This one certainly should not get lost. Since this needs rebase, I might rebase it into a new pull request. Probably collecting more comments here, then addressing them in the new PR. I am doing this in my free time, so probably only progress in evenings and weekends. The use-after-move situations (clang-tidy 17 points them out nicely) is something I'd address, because we really can't assume that it would be safe to access afterwards (even though it typically works). |
Currently, the formatter doesn't handle many scenarios involving preprocessor
ifdef
s/endif
s interleaved withbegin
s, module headers, etc. (#228, #241, #267) This patch attempts to solve this problem by performing multiple passes of the formatter on preprocessed variants of the source. Each of these variants has a different set of preprocessor branches enabled. Together, they should cover the entire source (though that doesn't work in all cases yet). After several formatting passes for different variants of the AST, a correct and properly formatted file is produced.This is still work in progress, so not everything works, and the code isn't very clean. I'd love to get some early feedback on this.