- Problem
- Background
- Proposal
- Rationale based on Carbon's goals
- Caveats
- Alternatives considered
The var
statement is noted in the
language overview,
but is provisional — no justification has been provided. Variable declarations
are fundamental, and it should be clear to what degree the current syntax is
adopted.
It's expected that after the adoption of this proposal, var
syntax will still
not be finalized: the proposal is an experiment.
Although constants are naturally related to variables, this proposal does not include any syntax for constants. This is expected to be revisited later.
In this proposal, "variable" is defined as an identifier referring to a mutable value.
Questions have come up about:
- The type system
- Type checking
- Scoping
All of these are important features. However, in the interest of small proposals, they are out of scope of this proposal.
Variables are standard in many languages. Some various forms to consider are:
-
C++:
int x; int y = 0; bool a = true, *b = nullptr, c;
-
Python:
x = None y = 0 z: int = 7 # Added by PEP 526.
-
Swift:
var x = 0 var y: Int = 0 var z: Int
-
TypeScript
let y: Number = 0; var x = 0; # Legacy from JavaScript.
-
Rust
let mut x = 0; let mut y: i32 = 0; let mut z: i32;
-
Go
var x = 0 y := 0 var z int var a, b = true, false
-
Visual Basic
Dim x As Integer = 3
Carbon should adopt var <type> <identifier> [ = <value> ];
syntax for variable
statements.
Considerations for this syntax are:
- Type and identifier ordering: The ordering of
<type>
before<identifier>
reflects the typical C++ ordering syntax.- Some C++ syntax can put type information after the identifier, such as
int x[6];
. Carbon should be expected to place that as part of the type.
- Some C++ syntax can put type information after the identifier, such as
var
introducer keyword: The use ofvar
makes it clearer for readers to skim and see where variables are being declared. It also reduces complexity and potential ambiguity in language parsing.- One variable: In C++, multiple variables can be declared in a single
statement. An equivalent Carbon syntax may end up looking like
var (Int x, String y) = (0, "foo");
, so limiting to one declaration is not fundamentally restrictive. However, by breaking with C++ and requiring the full type to be specified with each identifier, we achieve two important things:- It's clear what the full type is, preventing difficult-to-read statements with a mix of stack variables, pointers, and similar.
- As the language grows, a function returning a tuple may be assigned to distinctly named and typed variables.
Experiment: The ordering of type and identifier will be researched. For more information, see the alternative.
Example bison syntax for executable semantics is:
statement:
"var" expression identifier optional_assignment;
| /* preexisting statements elided */
;
optional_assignment:
/* empty */
| "=" expression
;
Carbon needs variables in order to be writable by developers. That functionality needs a syntax.
Relevant goals are:
-
3. Code that is easy to read, understand, and write:
- Adding a keyword makes it easy for developers to visually identify functions.
-
5. Fast and scalable development:
- The addition of a keyword should make parsing easy.
-
7. Interoperability with and migration from existing C++ code:
- Keeping syntax close to C++ will make it easier for developers to transition.
The name var
could still change. However, it's used with similar meaning in
other languages including Swift, Go, and TypeScript, and so it's reasonable to
expect it will not.
The idea that var
may change includes the possibility that var
may become
something like let mut
in Rust. However, this is not assumed by this proposal:
- This proposal omits constant syntax.
- It would need to be considered whether the syntax tax of
let mut
appropriately focuses on encouraging appropriate usage of features rather than restricting misuse). - Lower verbosity syntax for variables is more
consistent with C++, even if constants are made
less verbose by way of
let
.
Although var (Int x, String y) = (0, "foo");
syntax is mentioned, this
proposal is not intended to propose such a syntax. It's noted primarily to
explain the likely path, that this does not rule out abbreviated syntax such
as that. That should probably be covered as part of tuples.
Pattern matching syntax in the overview uses syntax
similar to Int: x
. As part of removing the
colon between type and identifier from the
provisional var
syntax, that syntax should be changed to remove the :
.
Details should be resolved as part of the eventual pattern matching proposal,
but if changes are needed to add a separator, the var
syntax should be updated
to remain consistent. The precise form of that implementation will be part of
normal Carbon evolution.
For example, replacing fn Sum(Int: a, Int: b) -> Int;
with
fn Sum(Int a, Int b) -> Int;
and
case (Int: p, (Float: x, Float: _)) if (p < 13) => {
with
case (Int p, (Float x, Float _)) if (p < 13) => {
.
Variables using Type:$
and similar should drop the :
, as in Type$
.
Noted alternatives are key differences from C++ variable declaration syntax.
The intent of the var
statement is to improve readability and parsability, and
it's related to fn
for functions. Although code is more succinct without
introducers, the noted benefits are expected to be significant. Most other
modern languages use similar introducers, and so this break from C++ is adopting
a different norm.
var
is used with a similar meaning in several other languages, including
Swift, Go, and JavaScript. let
is used by TypeScript. let mut
is used by
Rust, with let
used for constants (this use of let
alone is consistent with
other languages). In general var
appears to be a more common choice.
The use of a colon (:
) between the type and identifier is intended to reduce
potential parsing ambiguity, and to make reading code easier. As proposed, there
is no colon between the type and identifier.
Using a colon or other separator could make it easier to avoid certain kinds of
ambiguities. For example, suppose we decided to use a postfix *
operator to
form pointer types, as in C++. In such a setup, we could have code like the
following:
var T * x = 3;
var T * x = 3 y;
In the first statement, *
is a unary operator and so T*
is the type and x
is the identifier. However, in the second statement, *
is a binary operator
and so T * x = 3
is the type, and y
is the identifier; the resulting
compiler errors may be confusing to users. Furthermore, the place of 3
could
be taken by an an arbitrarily complex expression; this could cause resolving the
ambiguity between unary and binary *
to require unbounded look-ahead,
adversely impacting
code compilation time goals.
Consider instead the code:
var T *: x = 3;
var T * x = 3: y;
The colon makes it unambiguous whether the *
in each case is unary or binary
with only one token of look-ahead. More importantly, this syntax immediately
calls the reader's attention to the fact that the second declaration has a
highly unusual type.
There are other ways of resolving ambiguities like this. For example, we could
avoid allowing the same operator to have both postfix and infix forms, or we
could distinguish them by the presence or absence of whitespace. However, even
if we avoid formal ambiguity by such means, a separator like :
may be useful
for reducing visual ambiguity for human readers.
One of the disadvantages of :
is that with var Int: x
, ordering is
inconsistent with other languages using :
, such as Rust and Swift, which would
say var x: Int
.
It may be worth considering other syntax options. A few to consider are:
var(Int) x;
var Int# x;
var Int @x;
var Int -> x;
These aren't part of the proposed syntax mainly because it's not clear any would
gain as much support as :
. However, this is an opportunity to make suggestions
and see if there's a good compromise.
The
old draft pattern matching proposal
used :
as a separator. In pattern matching, the :
may be particularly
important to distinguish between value matching and type name matching. However,
the pattern matching proposal should examine these choices and alternatives
before we reach a conclusion that :
is necessary for pattern matching. Per
syntax ambiguity, it is expected that :
has some
advantages, but may not turn out to make a compiler difference due to prevailing
constraints on type expression syntax.
This proposal suggests we
update provisional pattern matching syntax
to match the proposed var
syntax.
Advantages:
- Reduces syntax ambiguity.
- This should improve readability and parsability.
- It should make it easier to debug issues during development.
Disadvantages:
- Deviates from the common syntax used by most languages with the type before
the identifier, including C, C++, Java, and C#.
- Changing from C++ is especially significant because of Carbon's goals for interoperability and migration which will mean an especially large portion of Carbon developers will be actively reading both Carbon and C++ code.
- Other notable languages that us
:
in variable statements, including Swift, put the type after the identifier.- It may be worth considering
alternatives to
:
. - If the alternative of type after identifier is
adopted, it's likely a
:
separator will be adopted.
- It may be worth considering
alternatives to
Right now the proposal is to not have anything between the type and identifier in order to avoid cross-language ambiguity, and to retain syntax that is closer to C++. However, the ultimate decision may hinge on type and identifier ordering, as well as related future evolution.
There are many languages that put the type after the identifier. A common format
used by Swift and Rust is var x: Int
.
It's worth considering the sentence-like readings:
var x Int
(orvar x: Int
) may be read as "declare x as an int" or "make a variable x and give it int storage".var Int x
may be read as "declare an int called x" or "make a variable with int storage called x".
These readings might be of similar quality, and are presented to offer different perspectives on how to read the possible statement orderings.
Ordering is essentially a question of pairing identifiers and types. This can be cast as asking which question developers consider more important when reading code:
- What is the type of variable
x
? - What is the identifier of the
Int
variable?
We assert the first question is the more important one: developers will see an identifier in later code, and want to know its type. However, how do we determine which order is better for this purpose?
Unfortunately, little research has been done on this. All we're aware of right
now is a study from an unpublished undergraduate project from Germany. The study
was done in Java with 50 students. Its data indicates that it's faster to answer
question 1 if the type comes first, and faster to answer question 2 if the
identifier comes first. We do not want to make decisions based on the study
because it isn't published, studied a small group, and doesn't directly compare
possible var
syntaxes; however, it still influences our thoughts.
When considering what to use for now, we can consider the popularity of various languages. The top 10 on several sources (with percentages noted by sources that have them) are:
TIOBE | Pct | GitHut | Pct | PYPL | Pct | Octoverse |
---|---|---|---|---|---|---|
C | 16% | JavaScript | 19% | Python | 30% | JavaScript |
Java | 11% | Python | 16% | Java | 17% | Python |
Python | 11% | Java | 11% | JavaScript | 8% | Java |
C++ | 7% | Go | 8% | C# | 7% | TypeScript |
C# | 4% | C++ | 7% | C/C++ | 7% | C# |
Visual Basic | 4% | Ruby | 7% | PHP | 6% | PHP |
JavaScript | 2% | TypeScript | 7% | R | 4% | C++ |
PHP | 2% | PHP | 6% | Objective-C | 4% | C |
SQL | 2% | C# | 4% | Swift | 2% | Shell |
Assembly | 2% | C | 3% | TypeScript | 2% | Ruby |
Sources:
For these languages:
- C, C++, C#, Objective-C, and Java put the identifier after the type.
- These use
<type> <identifier>
, with no keyword.
- These use
- Python, Go, TypeScript, Visual Basic, SQL, and Swift put the type after the
identifier.
- Python uses
<identifier>: <type>
, with no keyword. This was added in Python 3.6, and reflects language evolution. - Go uses
var <identifier> <type>
, with no colon. - TypeScript and Swift use
var <identifier>: <type>
. - Visual Basic uses
Dim <identifier> as <type>
. - SQL uses
DECLARE @<identifier> AS <type>
, whereAS
is optional.
- Python uses
- JavaScript, R, Ruby, PHP, Shell, and Assembly language do not specify types in the same ways as other languages.
First, with Carbon it's been discussed to require use of auto
(or a similar
explicit syntax marker) instead of allowing developers to elide the type
entirely from a var
statement. In other words, while var Int x = 0;
is
valid, and var auto x = 0;
is equivalent, there is no form such as
var x = 0;
which removes the type entirely.
Most languages that write var x: Int
also allow eliding the type when
assigning a value. For example, Go allows x := 0
and Swift allows
var x = 0;
. As a result, there is no need for an auto
keyword.
This would be more surprising with var Int x
syntax because removing the Int
now places the identifier immediately after var
, where the type normally is.
This may be subtly confusing to developers. However, if auto
is required for
explicitness, the issue is moot.
Retaining auto
does not eliminate the need to consider type elision as part of
advantages and disadvantages: if the type is put after the identifier and
var x: auto
syntax is used, it now becomes an inconsistency with other
languages. This inconsistency would be a Carbon innovation that may confuse
developers, leading to long-term pressure to remove auto
for consistency with
similar languages, and thus a disadvantage.
Advantages:
- Consistent with languages that put a
:
in the type declaration.- Notable languages using
x: Int
syntax include Swift, Rust, Kotlin and TypeScript. Go is similar but does not include:
. - Swift puts argument labels before variable declaration. Keeping the identifier first allows consistency with Swift's argument label syntax while also keeping names adjacent.
- Notable languages using
- More opportunity to unify a concept of putting the identifier immediately
after the keyword.
fn
,import
, andpackage
will all have the identifier immediately after the keyword, with non-identifier content following.- It's expected that a typical function declaration will look like
fn Foo(Int bar)
, withvar
only added when a storage for a copy is required. Thus, this advantage is primarily about the function identifierFoo
, not parameter identifiers.
- It's expected that a typical function declaration will look like
- Other cases, such as
alias
, are likely flexible and could followvar
in concept by putting the resulting name at the end. That is,alias To = From;
for consistency withvar x: Int;
versusalias From as To;
similar tovar Int x;
ordering.
Disadvantages:
- If Carbon doesn't support type elision, it creates an
inconsistency with other notable languages using
x: Int
syntax.- It may be worth considering
alternatives to
:
.
- It may be worth considering
alternatives to
- The ordering of
x: Int
is inconsistent with C++ variable syntax.- Negatively affects Familiarity for experienced C++ developers with a gentle learning curve.
- It is consistent with other parts of C++ syntax, particularly
using To = From;
, although nottypedef From To;
.
- Early signs are that putting the identifier first makes it slower for
developers to answer the question, "what is the type of variable
x
?"- This indicates worse readability in spite of the sentence ordering, the essential part of Code that is easy to read, understand, and write.
- Popular languages tend to use
int x
syntax, including C, Java, C++, and C#. Other notable languages include Groovy and Dart.
We should conduct a larger study on the topic of type and identifier ordering and syntax. Until then, we should adopt C++-like syntax. This meets the migration sub-goal
Familiarity for experienced C++ developers with a gentle learning curve, and allows applying the higher-priority goal Code that is easy to read, understand, and write if supporting evidence is found.
Experiment: The ordering of type and identifier will be researched.