-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Add "forbid-implicit-declarations" lint rule #244
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest we build up a full suite of test cases first before diving into implementation.
Good start so far.
|
||
TEST(ForbidImplicitDeclarationsRule, FunctionFailures) { | ||
auto kToken = SymbolIdentifier; | ||
const std::initializer_list<LintTestCase> ForbidImplicitDeclarationsTestCases = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially focusing on test cases first:
- [question for Verilog experts] can implicit nets also be created by merely referencing them, say on the RHS of an assign, or inside an expression?
assign foo = ~can_this_be_an_implicit_net;
. Years ago, I found such a case while debugging someone else's netlist, and root causing a chip failure as being attributed to something like a typo.
Suggest test cases:
- where LHS includes identifiers inside concatenation grouping like
{x, y, z} = ...
- with multiple net assignments, e.g.
assign a = b, c = d
. - net implicit-decls/references inside generate blocks (including conditional, and loop)
- LHS with array indices like
a[j]
- where net is explicitly declared after its first use (out-of-order)
The answers to the above may influence choice of implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, an implicit net cannot be created by referencing them on the RHS. Of course, this is according to the LRM, vendor's implementation can differ.
Using implicit declaration with array (assign a[0] = 1'b1
) is also illegal. I believe the array indices must be resolved before the assignment, therefore if a
does not exist, this would fail.
Another missing test case is implicit net declared in a terminal/port connection list:
module m;
wire a;
not U1(implicit_net_0, a);
my_m U2(implicit_net_1);
endmodule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great thanks for the clarification, and thanks for suggesting more test cases!
Your test case is exactly the situation I was thinking of (not usage in an expression but rather in a port connection). To that, I would add:
module m;
my_m U3(.foo(implicit_net_2));
my_m U4(.nonexistent_net_in_this_scope); // assuming .x is equiv. to .x(x)
endmodule
This test case, I would expect to be clean, because The authoritative text I'm reading and interpreting is LRM 6.10, this passage: If an identifier appears on the left-hand side of a continuous assignment statement, and that identifier has not been declared previously in the scope where the continuous assignment statement appears or in any scope whose declarations can be directly referenced from the scope where the continuous assignment statement appears (see 23.9), then an implicit scalar net of default net type shall be assumed. See 10.3 for a discussion of continuous assignment statements. |
You're right. It's not an implicit net declaration and it shouldn't be reported as violation.
I'll write tests for that and fix code. Thanks for reply. |
" endgenerate\n" | ||
" assign a1 = 1'b0;\n" | ||
"endmodule"}, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More test case ideas:
- assignments/connections inside loop and conditional generate constructs
// TODO: module scope | ||
|
||
// TODO: nets declared inside terminal/port connection list | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even more test case ideas based on this passage in LRM 6.10:
The implicit net declaration shall belong to the scope in which the net reference appears. For example, if the implicit net is declared by a reference in a generate block, then the net is implicitly declared only in that generate block. Subsequent references to the net from outside the generate block or in another generate block within the same module either would be illegal or would create another implicit declaration of a different net (depending on whether the reference meets the preceding criteria). See Clause 27 for information about generate blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This lint rule is very close to detecting that all variables are declared in the scope they are used. One about implicit net declaration which yields valid, yet undesirable, Verilog, and the other one is about "no such variable", which is always invalid Verilog.
I would add a test case to verify the scope is proper
module m;
wire a1;
generate
wire a2;
assign a1 = 1'b1; // OK, a1 is in scope
endgenerate
assign a2 =1'b1; // Implicit net declaration
endmodule
Note that my simulator rejected @ldalek-antmicro's proposed test case because begin/end block is not allowed in a module. I'm trying to find the proper LRM section, but I believe block statement is allowed in processes only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@corco You are correct that bare begin/end blocks were removed from SV grammar (but were in Verilog-2001). We even have a lint rule for it: https://github.com/google/verible/blob/master/verilog/analysis/checkers/v2001_generate_begin_rule.h
I found that bare begin/end generate blocks were so common in practice, that I supported it in the yacc grammar of verilog.y (and more easily accept legacy verilog), only to provide that lint rule, intended for SV code. (I don't have dialect selection of rules implemented yet, though.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I added test case which @corco suggested. Also I replaced begin-end
blocks with submodules. I got used to them too much and forgot to replace.
@@ -17,7 +17,6 @@ | |||
|
|||
#include <set> | |||
#include <string> | |||
#include <stack> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <vector>
bool ContainsAncestor(const verible::Symbol* const symbol, | ||
const verible::SyntaxTreeContext& context) const { | ||
CHECK_NOTNULL(symbol); | ||
|
||
// check for common ancestors | ||
for (const auto& iter : context) { | ||
if (iter == symbol) { | ||
return true; | ||
} | ||
} | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks suitable for SyntaxTreeContext, can we move it there?
I would also define it using std::find
or absl::c_find
.
while ((scopes_.size() > 0) && | ||
(!ContainsAncestor(scopes_.back().symbol_, context))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks quadratic: a linear search inside a while loop.
Is there a reason why a linear search is needed?
This seems like a job for automatic balancing, like:
namespace analysis { | ||
|
||
// ForbidImplicitDeclarationsRule detect implicitly declared nets | ||
class ForbidImplicitDeclarationsRule : public verible::SyntaxTreeLintRule { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is my first serious look at the implementation of this analysis, after we've amassed a healthy collection of test cases.
I'll start with high-level design thoughts. @hzeller This might be of interest to you, since you're looking at lint rule design issues these days.
I see that you've chosen to base this on SyntaxTreeLintRule
which seems intuitive, and a combination of matchers.
- Keep in mind that matchers, though powerful and expressive, run on every node that is visited (
HandleNode
), so every node is starting a new pattern search. - The limitation of a using
SyntaxTreeLintRule
is that it lacks control over the visitation of nodes, which is being done inSyntaxTreeLinter::Visit
. Why was this done?SyntaxTreeLinter
collects a bunch of rules, and executes them all in a single CST traversal that builds up and tears down SyntaxTreeContext once. The implication is that each instantiated rule object must maintain enough state to have its actions interleaved among other "concurrently" executing rules. Had we let each rule manage its own traversal, they could simply look like functions of CST structures that return collections of lint violations. However, we are not forced to share traversals with aSyntaxTreeLintRule
, we have the option of owning and customizing traversal by using the more generalTextStructureLintRule
.
Looking over the current implementation, I see effort in maintaining a stack that resembles the SyntaxTreeContext
maintained by SyntaxTreeLinter
and its base TreeContextVisitor
. (Note that verible::TreeUnwrapper
is another subclass of TreeContextVisitor
.) Only instead of growing the stack on every node, you only grow the stack on nodes bearing lexical scope. If you extended TreeContextVisitor
to include a scope-stack (Verilog-specific, no longer language-agnostic) in addition to the normal SyntaxTreeContext
stack, that could be the basis for all scoped analyses that use symbol tables (like your DeclType
).
What about matching? In past PRs I've often offered the alternative of using direct substructure access through CST/ functions, which is much faster, because it is not based on any tree-searching/predicate-matching. The alternative to grabbing .Bind()
symbols from BoundSymbolManager
is direct access.
- From every net declaration (or element declaration) you can directly reach the identifier(s) it declares precisely.
- From every assignment you can isolate the LHS from the RHS:
- You could set state variables to distinguish when you're in the left vs. right instead of trying to infer from upward inspection.
- Based on left vs. right, you would know whether you are looking at lvalues vs. rvalues (but LHS references can have rvalues in array indices, so be cautious there.)
- Most of the logic would be in the form of a sparse switch-case statement on node types (much like the formatter's
TreeUnwrapper::SetIndentationsAndCreatePartitions
) whose interesting cases include declarations of several types, assignments, instantiations, loop/conditional generate blocks, references, and non-interesting cases would just be default traversal.
Does this sound simpler than your current implementation?
Okay. I've modified my implementation. I'm using At this moment I'm curious what do you think about that approach (apart from missing tests for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress on this.
Suggest continuing to expand test cases (noted as TODOs), and hoping that might lead to implementation simplification.
CHECK_NOTNULL(syntax_tree); | ||
syntax_tree->Accept(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can also write: ABSL_DIE_IF_NULL(syntax_tree)->Accept(this);
const auto& leaf = *ABSL_DIE_IF_NULL( | ||
dynamic_cast<const verible::SyntaxTreeLeaf*>(&symbol)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use SymbolCastToLeaf
from common/text/tree_utils.h (for consistency)
const auto identifier = leaf.get().text; | ||
|
||
// Search upward stack and for declared nets. | ||
for (auto itr = scope.rbegin() ; itr != scope.rend() ; ++itr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for (const auto& s : verible::reversed_view(scope)) { ...
// Search upward stack and for declared nets. | ||
for (auto itr = scope.rbegin() ; itr != scope.rend() ; ++itr) { | ||
const auto& declared_nets = (*itr)->declared_nets_; | ||
const auto& net = declared_nets.find(identifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const auto& net_iter = ...
|
||
std::set<verible::LintViolation> violations_; | ||
|
||
ScopeTreeVisitor visitor_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScopeTreeVisitor is used as both a base class (inheritance) and a member (containership). Design-wise, we probably want to use only one of these.
const auto tag = verilog_tokentype(leaf.Tag().tag); | ||
|
||
switch (tag) { | ||
case verilog_tokentype::SymbolIdentifier: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to register declarations coming from module ports? Also keep in mind there are two style of ports: ANSI, and non-ANSI style.
} else if (context.DirectParentsAre({NodeEnum::kUnqualifiedId, | ||
NodeEnum::kLocalRoot, | ||
NodeEnum::kReference, | ||
NodeEnum::kReferenceCallBase, | ||
|
||
NodeEnum::kLPValue, | ||
NodeEnum::kNetVariableAssignment, | ||
NodeEnum::kAssignmentList, | ||
NodeEnum::kContinuousAssignmentStatement}) || | ||
|
||
context.DirectParentsAre({NodeEnum::kUnqualifiedId, | ||
NodeEnum::kLocalRoot, | ||
NodeEnum::kReference, | ||
NodeEnum::kReferenceCallBase, | ||
|
||
NodeEnum::kExpression, | ||
NodeEnum::kOpenRangeList, | ||
NodeEnum::kBraceGroup, | ||
NodeEnum::kLPValue, | ||
|
||
NodeEnum::kNetVariableAssignment, | ||
NodeEnum::kAssignmentList, | ||
NodeEnum::kContinuousAssignmentStatement})) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above: can you track some state as you traverse the tree without having to query the context?
Suggest some comments here declaring intent (examples might help). It looks like (?) you are looking for lvalues, based on kLPValue
.
What about rvalues? From the other thread about test cases:
"The implicit net declaration shall belong to the scope in which the net reference appears."
I think net references covers both lvalues and rvalues. If that is the case, your logic for entering this condition may actually be simpler.
namespace analysis { | ||
|
||
// This tree visitor traverse CST tree and mantains scope stack | ||
class ScopeTreeVisitor : public verible::TreeContextVisitor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks sufficiently specialized that we might just call this class something like NetSymbolResolver
.
// TODO: nets declared inside terminal/port connection list | ||
// TODO: assignments/connections inside loop and conditional generate constructs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest we keep expanding on some of these test cases sooner, as handling these may actually simplify some of the tracking logic.
common/util/auto_pop_stack.h
Outdated
return *ABSL_DIE_IF_NULL(stack_.back()); | ||
} | ||
|
||
typedef std::vector<value_type*> stack_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typedef std::vector<value_type> stack_type;
By removing the pointer from this implementation, you make this a suitable base class for SyntaxTreeContext
.
class SyntaxTreeContext : public AutoPopStack<const SyntaxTreeNode*> {
...
public:
using AutoPop = base_type::AutoPop;
...
};
The only adjustment you might have to make is that in the SyntaxTreeContext subclass top()
(or renamed) should return a const T&
(or const SyntaxTreeNode*&
) to be agnostic to whether T is a pointer type or object type.
From a mutability perspective, SyntaxTreeContext
was designed to forbid any mutation to the stack elements, even the top element -- everything is const, and AutoPop
is the only interface for mutation. But I'm aware you intend to use it in this PR with a mutable top/current scope, because you have to register declared symbols. So as far as class design is concerned, the common base class may support mutable elements, but the subclasses may choose to further restrict the interfaces for what is mutable (where private inheritance can help).
Let's say your scope-stack subclass wants to ensure that only the top-most scope can be modified, but all others can be looked-up with const accesses. You could lock down most of your interface by exposing only const iterators/methods (hiding the mutable ones), except for T& top()
as the only mutable entry point. The base class could provide both const and non-const top(), and the subclass could present only the const one if it chose to.
Keep this in mind should you continue to refactor this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did write this class in mind that it will be used in SyntaxTreeContext
. Thanks for noticing that. I think that it will be easier to work on this class in separate PR #289
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at the test cases, the only problem I see is that module in module do not have visibility on their parent scope,
{"module m;\n" | ||
" wire a1;\n" | ||
" module foo;\n" | ||
" assign a1 = 1'b0;\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable a1 is not visible in module foo, linter should fail here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to LRM 23.4 Nested modules
:
A module can be declared within another module. The outer name space is visible
to the inner module so that any name declared there can be used
example (modelsim):
module m;
wire a1;
assign a1 = 1'b0;
module foo;
assign a1 = 1'b1;
endmodule;
initial $monitor("a1 = %b", a1);
endmodule
output is an invalid 'X' value:
# a1 = x
So a1
is visible in foo
module and it's not an implicit net.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argh, you are right, sorry. SystemVerilog is so ugly sometimes. I was used to nested modules with ports, which needs to be instantiated manually and not implicitly like your example.
I didn't know they had access to the parent namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks both of you for your scrutiny over the test cases.
I also just learned this about implicit instantiation (LRM 23.4):
"Nested modules with no ports that are not explicitly instantiated shall be implicitly instantiated once with an instance name identical to the module name. Otherwise, if they have ports and are not explicitly instantiated, they are ignored."
" assign a1 = 1'b0;\n" | ||
" endmodule\n" | ||
" assign a1 = 1'b1;\n" | ||
"endmodule"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last one is the same as the one I commented earlier. One of the two should be removed.
" assign ", {kToken, "x3"}, " = 1'b0;\n" | ||
" assign ", {kToken, "x2"}, " = 1'b0;\n" | ||
" assign x1 = 1'b1;\n" | ||
"endmodule"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here, the module declared in module should not have access to the parent variables
A failing case of the current implementation is hierarchical references:
A test case that I believe should be added (and passes) is using assign/force in procedural assignment:
|
What happened to this pull request ? |
I would actually recommend re-implementing this entirely using the current symbol table's data, which does most of the heavy lifting of symbol resolution, but doesn't check for the ordering of definition/use (which could be done by comparing string_view positions). |
I agree with @fangism. I'll keep the testsuite and re-implement it using current scope/symbol resolution code. |
8d78003
to
33421a6
Compare
Signed-off-by: Lukasz Dalek <[email protected]>
Signed-off-by: Lukasz Dalek <[email protected]>
Signed-off-by: Lukasz Dalek <[email protected]>
Fixes #217