The test execution engine (TEE) will read the XML format generated by the CPN Tools test generator. This format should be as general as possible in order to support as many use cases as possible, but our main focus here is on distributed systems.
In the first incarnation of the tool, we will limit ourselves to testing
from a client's perspective. That is, we consider that the TEE simulates
one or more clients invoking RPC calls on a server or a group of servers
in the case of Gorums. For now, we don't use multiple client processes,
but simulate multiple clients by making RPC calls from different goroutines.
As such we provide the <Concurrent> tag to allow the TEE runtime to start
a goroutine for each <Call> tag specified within a <Concurrent> tag.
Further, if one or more <Call> tags are specified outside a <Concurrent>
tag, the TEE runtime invokes these calls sequentially. In addition,
we can also specify a <Sequential> tag to accomplish the same. This is
mainly useful for constructing a sequence of calls that runs concurrently
with another sequence of calls.
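The execution rule described above can be sketched in Go roughly as follows. This is only a minimal sketch under stated assumptions: the Call and Step types and the runSteps and collectOrder functions are hypothetical names introduced for illustration, not part of the TEE. It also assumes that the runtime waits for all calls of a <Concurrent> group to return before moving on to the next call, which the text above does not state explicitly.

```go
package main

import (
	"fmt"
	"sync"
)

// Call is a hypothetical in-memory form of a <Call> tag.
type Call struct {
	Name string
}

// Step is a hypothetical top-level entry of a test case: either calls that
// run one after another, or the calls of one <Concurrent> group.
type Step struct {
	Concurrent bool
	Calls      []Call
}

// runSteps processes the steps in document order. Calls of a concurrent
// step each get their own goroutine; the step is finished only when all
// of them have returned (an assumption, see the text above).
func runSteps(steps []Step, invoke func(Call)) {
	for _, s := range steps {
		if !s.Concurrent {
			for _, c := range s.Calls {
				invoke(c)
			}
			continue
		}
		var wg sync.WaitGroup
		for _, c := range s.Calls {
			wg.Add(1)
			go func(c Call) {
				defer wg.Done()
				invoke(c)
			}(c)
		}
		wg.Wait()
	}
}

// collectOrder runs the steps with an invoke function that only records
// the call names in the order in which they were invoked.
func collectOrder(steps []Step) []string {
	var mu sync.Mutex
	var order []string
	runSteps(steps, func(c Call) {
		mu.Lock()
		defer mu.Unlock()
		order = append(order, c.Name)
	})
	return order
}

func main() {
	steps := []Step{
		{Concurrent: true, Calls: []Call{{Name: "Failure"}, {Name: "Prepare"}, {Name: "Prepare"}}},
		{Calls: []Call{{Name: "Accept"}}},
		{Calls: []Call{{Name: "Commit"}}},
	}
	// The first three names may appear in any order; Accept and Commit
	// always come last, in that order.
	fmt.Println(collectOrder(steps))
}
```

A <Sequential> group nested inside a <Concurrent> group is not modeled here; it could be handled by letting a goroutine run a slice of calls instead of a single call.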
Here is an example XML (an alternative example will be given below):
<Test Name="CrashFailurePaxos" Type="systemtest">
  <TypeAssignments>
    <SystemParameterTypes>
      <Field Name="SystemSize" Type="int"></Field>
      <Field Name="QuorumSize" Type="int"></Field>
      <Field Name="ServerIDs" Type="*[]int"></Field>
    </SystemParameterTypes>
    <InputType Name="Server" Type="">
      <Field Name="IDs" Type="*[]string"></Field>
    </InputType>
    <OracleType Name="LegalResponses" Type="*[]string"></OracleType>
    <OracleType Name="ExpectedLeaders" Type="*[]string"></OracleType>
  </TypeAssignments>
  <SystemParameters>
  </SystemParameters>
  <TestCase ID="1">
    <Concurrent>
      <Call Name="Failure">
        <InputValues>
          <Server IDs="8080,8081"/>
        </InputValues>
      </Call>
      <Call Name="Prepare">
        <InputValues>
          <PrepareMsg Rnd="42"/>
        </InputValues>
      </Call>
      <Call Name="Prepare" Fail="0.5">
        <InputValues>
          <PrepareMsg Rnd="43"/>
        </InputValues>
      </Call>
    </Concurrent>
    <Call Name="Accept">
      <InputValues>
        <AcceptMsg Rnd="42" Val="paxos"/>
      </InputValues>
    </Call>
    <Call Name="Commit">
      <InputValues>
        <LearnMsg Rnd="42" Val="paxos"/>
      </InputValues>
    </Call>
    <Oracles>
      <ExpectedLeaders>8082</ExpectedLeaders>
      <LegalResponses>paxos</LegalResponses>
    </Oracles>
  </TestCase>
</Test>
Types used in the <InputValues> and <Oracles> must either be defined in the
<TypeAssignments> part, or there must already exist a similarly named Go
type, so that the tool can find and match against those types.
For this to work, the fields of the Go struct must be annotated with
xml:"..." tags, as shown in this example for the <AcceptMsg> type:
type AcceptMsg struct {
	Rnd uint32 `xml:"Rnd"`
	Val string `xml:"Val"`
}
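To make the matching concrete, here is a small sketch of how an <AcceptMsg> entry from a test case could be decoded with Go's standard encoding/xml package. One caveat that is an observation about encoding/xml rather than something stated in this document: the example test case writes Rnd and Val as XML attributes, and encoding/xml only maps attributes onto fields whose tag carries the ",attr" option, so the sketch uses that form. The parseAcceptMsg helper is a hypothetical name.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// AcceptMsg mirrors the struct above, with ",attr" added because the
// example test case encodes Rnd and Val as attributes, not child elements.
type AcceptMsg struct {
	Rnd uint32 `xml:"Rnd,attr"`
	Val string `xml:"Val,attr"`
}

// parseAcceptMsg (hypothetical helper) decodes one <AcceptMsg .../> element.
func parseAcceptMsg(data string) (AcceptMsg, error) {
	var m AcceptMsg
	err := xml.Unmarshal([]byte(data), &m)
	return m, err
}

func main() {
	m, err := parseAcceptMsg(`<AcceptMsg Rnd="42" Val="paxos"/>`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", m) // prints {Rnd:42 Val:paxos}
}
```

If the test generator instead emitted Rnd and Val as child elements, the plain xml:"Rnd" form of the tag would be the matching one.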
The <InputValues> of the different <Call> tags should match up with the
expected input arguments to those calls. To help ensure that these fields
match up, we require that every method name mentioned in a <Call> is defined
in a user-defined TestAdapter interface, of which an example is shown below.
// TestAdapter specifies the methods that can be invoked by the
// Test Execution Engine to drive the execution of a test.
type TestAdapter interface {
	Prepare(PrepareMsg) PromiseMsg
	Accept(AcceptMsg) LearnMsg
	Commit(LearnMsg)
	Failure(string)
}
The user must both define this interface and provide an implementation of each of its methods. Often these implementations are simple forwarding methods, but they can also provide custom functionality; for example, one can imagine failure injection methods that produce behaviors that cannot easily be simulated by other means.
Moreover, each of the <InputValues> used in the XML format must also
correspond to the types used in those methods. Typically, these
input types are defined as message types in the proto file and generated
by the protoc compiler, and so we do not need to specify these details in
the <TypeAssignments> part. In addition to the proto message types, input
values may assume any of the datatypes supported by default in the Go language.
However, the exact data type representation of the proto messages must also be known to the CPN Tools test generator. To assist with this, we could implement a translation function that converts proto message types and Go types to a format that can be used by CPN Tools.
Note that the <Call> tag can take a Fail attribute, which specifies the
probability with which the given call fails. In the example above,
Fail="0.5" means that the second Prepare call fails half of the time.
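One possible way for the runtime to honor the Fail attribute is sketched below. This is an assumption about the intended semantics, not a description of the TEE: the sketch decides before each invocation whether to inject a failure, and the names errInjected, newRNG and maybeFail are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// errInjected is a hypothetical error marking a failure injected by the
// runtime rather than returned by the system under test.
var errInjected = errors.New("injected call failure")

// newRNG returns a seeded random source so that a test run is repeatable.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// maybeFail returns errInjected with probability failProb, where failProb
// is the value of the Fail attribute (0 means never, 1 means always).
func maybeFail(rng *rand.Rand, failProb float64) error {
	// Float64 returns a value in [0.0, 1.0), so the comparison below is
	// never true for failProb 0 and always true for failProb 1.
	if rng.Float64() < failProb {
		return errInjected
	}
	return nil
}

func main() {
	rng := newRNG(1)
	failures := 0
	for i := 0; i < 1000; i++ {
		if maybeFail(rng, 0.5) != nil {
			failures++
		}
	}
	fmt.Printf("injected %d failures in 1000 calls\n", failures)
}
```

Seeding the random source explicitly is a deliberate choice in this sketch: it lets a failing test case be replayed with the same sequence of injected failures.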
We also specify several <Oracles>, such as <LegalResponses> and
<ExpectedLeaders>, which are checked after the execution.
To allow the TEE to obtain the results of an execution, the user must
also define the following interface to be called after an execution.
// TestOracles specifies the methods that can be invoked by the
// Test Execution Engine to obtain the results of a test execution.
type TestOracles interface {
	ExpectedLeaders() string
	LegalResponses() string
}
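The comparison between the values in the <Oracles> part and the results reported through this interface could look roughly like the sketch below. It assumes that the expected values have already been parsed into a map keyed by oracle name and that a plain string comparison is sufficient; checkOracles and fakeOracles are hypothetical names used only for illustration.

```go
package main

import "fmt"

// TestOracles is the interface from the text above.
type TestOracles interface {
	ExpectedLeaders() string
	LegalResponses() string
}

// fakeOracles is a hypothetical implementation returning fixed results.
type fakeOracles struct {
	leaders, responses string
}

func (f fakeOracles) ExpectedLeaders() string { return f.leaders }
func (f fakeOracles) LegalResponses() string  { return f.responses }

// checkOracles compares the results reported by o with the expected values
// from the <Oracles> part and returns one message per mismatch. Oracles
// that are absent from expected are not checked.
func checkOracles(o TestOracles, expected map[string]string) []string {
	actual := map[string]string{
		"ExpectedLeaders": o.ExpectedLeaders(),
		"LegalResponses":  o.LegalResponses(),
	}
	var failures []string
	// Iterate over a fixed slice so the messages come in a stable order.
	for _, name := range []string{"ExpectedLeaders", "LegalResponses"} {
		want, ok := expected[name]
		if !ok {
			continue
		}
		if got := actual[name]; got != want {
			failures = append(failures,
				fmt.Sprintf("%s: got %q, want %q", name, got, want))
		}
	}
	return failures
}

func main() {
	expected := map[string]string{"ExpectedLeaders": "8082", "LegalResponses": "paxos"}
	o := fakeOracles{leaders: "8082", responses: "lamport"}
	for _, f := range checkOracles(o, expected) {
		fmt.Println("FAIL", f)
	}
}
```

A test case would then pass exactly when the returned slice is empty.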
The above example is perhaps too fine grained for some test cases.
The following, coarser-grained example simulates two proposers that both
believe they are the leader and propose different values. However, one of
the proposers fails in the second phase, and so the legal response is
lamport.
<Test Name="CrashFailurePaxos" Type="systemtest">
  <TypeAssignments>
    <SystemParameterTypes>
      <Field Name="SystemSize" Type="int"></Field>
      <Field Name="QuorumSize" Type="int"></Field>
      <Field Name="ServerIDs" Type="*[]int"></Field>
    </SystemParameterTypes>
    <InputType Name="Server" Type="">
      <Field Name="IDs" Type="*[]string"></Field>
    </InputType>
    <OracleType Name="LegalResponses" Type="*[]string"></OracleType>
    <OracleType Name="ExpectedLeaders" Type="*[]string"></OracleType>
  </TypeAssignments>
  <SystemParameters>
  </SystemParameters>
  <TestCase ID="1">
    <Concurrent>
      <Call Name="Failure">
        <InputValues>
          <Server IDs="8080,8081"/>
        </InputValues>
      </Call>
      <Call Name="RunPaxosPhases">
        <InputValues>
          <Msg string="paxos"/>
          <FailurePhase string="2"/>
        </InputValues>
      </Call>
      <Call Name="RunPaxosPhases">
        <InputValues>
          <Msg string="lamport"/>
        </InputValues>
      </Call>
    </Concurrent>
    <Oracles>
      <ExpectedLeaders>8082</ExpectedLeaders>
      <LegalResponses>lamport</LegalResponses>
    </Oracles>
  </TestCase>
</Test>
We should note that the implementation of the RunPaxosPhases method
will be a custom implementation of a Paxos proposer, similar to the
one currently implemented in proposer.go#runPaxosPhases, except that
it can trigger the failure of individual Paxos phases based on the
input provided in the test case. Below is an excerpt of the code needed.
This code more or less replicates the proposer's code, augmented with
checks for whether one of the Paxos phases should fail, as per the
input provided by the test case.
func (ta *PaxosTestAdapter) RunPaxosPhases(msg string, failurePhase string) error {
	p := ta.GetProposer()
	// ctx was not defined in the original excerpt; a background context is
	// assumed here (the real code may use one with a deadline).
	ctx := context.Background()
	// access proposer state in mutual exclusion for use below; avoid holding lock during quorum calls
	p.m.RLock()
	crnd, cval := p.crnd, p.cval
	p.m.RUnlock()

	// ******************************************************
	// PHASE ONE: send Prepare to obtain quorum of Promises
	preMsg := &PrepareMsg{Rnd: crnd}
	p.logf("Sending Phase 1a msg: %v\n", preMsg)
	prmMsg, err := p.config.Prepare(ctx, preMsg)
	err = checkFailurePhase(err, "1", failurePhase)
	if err != nil {
		return err
	}
	p.logf("Received Phase 1b msg: %v\n", prmMsg)

	// ******************************************************
	// PHASE TWO: send Accept to obtain quorum of Learns
	if prmMsg.GetVrnd() != Ignore {
		// promise msg has a locked-in value
		cval = prmMsg.GetVval()
		// update proposer state in mutual exclusion
		p.m.Lock()
		p.cval = cval
		p.m.Unlock()
	}
	// use local proposer's cval or locked-in value from promise msg, if any.
	accMsg := &AcceptMsg{Rnd: crnd, Val: cval}
	p.logf("Sending Phase 2a msg: %v\n", accMsg)
	lrnMsg, err := p.config.Accept(ctx, accMsg)
	err = checkFailurePhase(err, "2", failurePhase)
	if err != nil {
		return err
	}
	p.logf("Received Phase 2b msg: %v\n", lrnMsg)

	// ******************************************************
	// PHASE THREE: send Commit to obtain a quorum of Acks
	p.logf("Sending Phase 3a msg: %v\n", lrnMsg)
	ackMsg, err := p.config.Commit(ctx, lrnMsg)
	err = checkFailurePhase(err, "3", failurePhase)
	if err != nil {
		return err
	}
	p.logf("Received Phase 3b msg: %v\n", ackMsg)
	return nil
}

// checkFailurePhase passes through a real error unchanged; otherwise it
// returns an injected error if phase is the phase selected to fail.
func checkFailurePhase(err error, phase, failurePhase string) error {
	if err != nil {
		// if there was an actual error; always return that first
		return err
	}
	if phase == failurePhase {
		// phase is a string, so it is formatted with %s rather than %d
		return fmt.Errorf("Paxos test adapter failed phase %s", phase)
	}
	return nil
}
When running the TEE tool we first parse the XML into instances of the
relevant datatypes that can then be used to drive the test execution.
The main test function will be similar to the TestSystem function
in system_tests.go, but much simpler; it should first start the servers:
for _, id := range serverIDs {
	go ServerStart(id, addrs, quorumSize)
}
Side note: This initialization of the system should ideally be specified in and derived from the XML file, but we suggest postponing that for now.
We should also create an instance of the TestAdapter.
The test adapter is implemented by the user and will typically take care
of establishing the gRPC connections necessary for the different <Call>
methods used. An example of the setup needed is in the newPaxosConfig
function in paxos_gorums_helper.go, but here the setup must also store
state so that it is accessible to the test adapter.
The TestAdapter implementation will thus serve as a state object for the
TEE, so that the methods invoked on it have access to the server references
and quorum call references needed to invoke the gRPC and quorum calls on
those servers and configurations.
Once the main test function has access to the test adapter, processing the different parts of the test execution should be straightforward (though perhaps not trivial). Basically, for each test case, invoke each specified call with the specified parameters obtained by parsing the XML into the relevant datatypes (see the Q&A below).
Here is pseudocode for a sequential execution:
for each testcase t {
	for each call c in t {
		m = get method name of c
		inputValues = get input values of c
		result = invoke m using reflection with inputValues
		// not sure if we should check each individual invocation??
		if result != c.expected {
			fail
		}
	}
	// here we check if the test passed/failed.
}
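The "invoke m using reflection" step of the pseudocode can be sketched with Go's reflect package as shown below. This is a minimal sketch, not the TEE implementation: the invoke function and the demoAdapter type are hypothetical, the message types are reduced to a single field, and variadic adapter methods are simply rejected to keep the argument check short.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins for the proto-generated message types.
type PrepareMsg struct{ Rnd uint32 }
type PromiseMsg struct{ Rnd uint32 }

// demoAdapter is a hypothetical test adapter with a single method.
type demoAdapter struct{}

func (demoAdapter) Prepare(m PrepareMsg) PromiseMsg { return PromiseMsg{Rnd: m.Rnd} }

// invoke looks up the exported method name on adapter, checks that args
// match its parameter list, calls it, and returns the results.
func invoke(adapter interface{}, name string, args ...interface{}) ([]interface{}, error) {
	m := reflect.ValueOf(adapter).MethodByName(name)
	if !m.IsValid() {
		return nil, fmt.Errorf("adapter has no method %q", name)
	}
	t := m.Type()
	if t.IsVariadic() || t.NumIn() != len(args) {
		return nil, fmt.Errorf("%s: expected %d non-variadic arguments, got %d",
			name, t.NumIn(), len(args))
	}
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		v := reflect.ValueOf(a)
		// Checking assignability up front avoids a panic inside Call.
		if !v.IsValid() || !v.Type().AssignableTo(t.In(i)) {
			return nil, fmt.Errorf("%s: argument %d is not assignable to %v",
				name, i, t.In(i))
		}
		in[i] = v
	}
	out := m.Call(in)
	results := make([]interface{}, len(out))
	for i, o := range out {
		results[i] = o.Interface()
	}
	return results, nil
}

func main() {
	res, err := invoke(demoAdapter{}, "Prepare", PrepareMsg{Rnd: 42})
	if err != nil {
		fmt.Println("invoke failed:", err)
		return
	}
	fmt.Printf("%+v\n", res[0]) // prints {Rnd:42}
}
```

Returning an error for an unknown method or a mismatched argument, instead of letting reflect panic, allows the TEE to report a malformed test case without aborting the remaining ones.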
-
Q: Can we use XML tags for proto-defined message types, such as PrepareMsg and AcceptMsg, and parse the XML file directly into instances of those datatypes, to avoid defining the datatypes twice?
A: Yes. One approach is to ensure that we have access to the datatypes produced by the protoc compiler in the same folder from which the TEE tool is running, i.e. those in the .pb.go file. However, to allow XML parsing into those datatypes, we need to add the necessary Go XML tags to the relevant Go file. This Go file is generated, and so we shouldn't change it manually. Instead we can use an extension, which is currently not supported by the golang/protobuf package, but is supported by gogo/protobuf and its moretags extension. Since Gorums is still using gogo/protobuf we can easily leverage moretags to support XML-based message datatypes. In the future, hopefully they will implement go_tag in golang/protobuf. See Proposal for adding go_tag.
Another option to consider is to replace the XML format with JSON, which is already supported by golang/protobuf.
If we stick with XML, the proto file needs to be modified as follows:
import "github.com/gogo/protobuf/gogoproto/gogo.proto";
message PrepareMsg {
  uint32 rnd = 1 [(gogoproto.moretags) = "xml:\"Rnd\""];
}
-
Q: How to determine if a test passed/failed? Where do we do the check? See the pseudocode above.
-
Q: If the pass/fail check is done after the execution of a sequence of RPC calls, how does the oracle learn the result of the execution?
A: We could add a Query() method to the TestAdapter that must be implemented by the user, which should compute the final result of an execution based on the state of the system.
I think I found a reasonable solution by requiring that a TestOracles interface be implemented. See above.
A future version of the tool could start separate processes on the same machine or on different machines, using e.g. ssh, but this is not a priority now. We also limit ourselves to gRPC-based frameworks for now, although we aim to make the tool flexible enough to support other distributed computing technologies.