
Generation modifiers

SimGus edited this page Jun 3, 2020 · 14 revisions

Both unit declarations and sub-rules can take generation modifiers, which change the way the generator produces a string from them. These modifiers are usually denoted by a special character prepended or appended to the unit identifier or to the sub-rule.

Some modifiers can only be used with unit declarations, others only with specific types of sub-rules, and some with both. The next sections describe every existing modifier and what it is used for.

Modifiers for sub-rules only

This section describes the modifiers that can be used with sub-rules only. More precisely, these modifiers can be used with any type of sub-rule except simple words (which cannot take any modifier). The sub-rules that can take modifiers are thus:

  • unit references
  • word groups and choices, all of which consist of some string surrounded by square brackets.

Here is an exhaustive list of the modifiers applicable to those sub-rules.

Random generation

This modifier tells the generator that the sub-rule is allowed to generate nothing. In other words, when generating a string for a sub-rule with this modifier, the generator randomly decides either to keep the string generated by the sub-rule or to discard it, as if nothing had been generated. By default, there is a 50% chance that the sub-rule is generated and a 50% chance that it is not.

This modifier is denoted by a question mark ? appended to the contents of the sub-rule (inside the brackets).

For example, the choice [please?] will generate either please or nothing (an empty string); the choice [hello|hi?] will generate hello, hi or nothing. The rule Yes [please?] will generate Yes 50% of the time and Yes please the rest of the time.
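To make this behavior concrete, here is a minimal Python sketch of the ? modifier. It is not Chatette's implementation: the function names (generate_random_subrule, generate_yes_please, generate_hello_hi) are hypothetical, and we assume that a choice picks each of its options with equal probability before the modifier is applied to the whole choice.

```python
import random


def generate_random_subrule(content: str, rng: random.Random) -> str:
    # `?` modifier: keep the generated content 50% of the time, else emit nothing.
    return content if rng.random() < 0.5 else ""


def generate_yes_please(rng: random.Random) -> str:
    # Sketch of the rule `Yes [please?]`. Empty parts are dropped so that no
    # trailing space is left when `please` is not generated.
    parts = ["Yes", generate_random_subrule("please", rng)]
    return " ".join(part for part in parts if part)


def generate_hello_hi(rng: random.Random) -> str:
    # Sketch of the choice `[hello|hi?]`: pick an option (uniformly, an
    # assumption), then apply the `?` modifier to the whole choice.
    return generate_random_subrule(rng.choice(["hello", "hi"]), rng)


rng = random.Random(0)
samples = [generate_yes_please(rng) for _ in range(10_000)]
print(samples.count("Yes"), samples.count("Yes please"))  # each close to 5000
```

Under these assumptions, generate_hello_hi returns the empty string half of the time and hello or hi a quarter of the time each.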

Named random generation

If several sub-rules should be randomly generated, but always together (either all of them are generated or none of them is), you will need to use named random generation modifiers. A random generation modifier is named by simply appending the name to the question mark. Giving the same name to the random generation modifiers of several sub-rules makes them all generate together 50% of the time and all not generate the rest of the time: one of these sub-rules is never generated while another one with the same modifier name is not.

For example, the rule I don't like [world?WWII] war[ II?WWII]. generates I don't like war. 50% of the time and I don't like world war II. the rest of the time. The name of the two random generation modifiers here is WWII. If those modifiers weren't given names (or if their names were different), the sentences I don't like world war. and I don't like war II. could have been generated, which is not what the template intended.

The constraints on which characters can be used within a modifier name are the same as for a unit identifier.
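One way to picture named random generation is to remember, for each generated example, the outcome of the first random draw made for every modifier name. The Python sketch below does exactly that; the class RandomDecisions and the function generate_wwii are hypothetical names, not part of Chatette, and the space around [world?WWII] is folded into the string for simplicity.

```python
import random
from typing import Dict, Optional


class RandomDecisions:
    """Hypothetical helper, created afresh for every generated example."""

    def __init__(self, rng: random.Random) -> None:
        self.rng = rng
        self.named: Dict[str, bool] = {}  # modifier name -> generated or not

    def should_generate(self, name: Optional[str]) -> bool:
        if name is None:
            # Unnamed `?`: an independent 50/50 draw each time.
            return self.rng.random() < 0.5
        if name not in self.named:
            # First sub-rule with this name: draw once and remember the outcome.
            self.named[name] = self.rng.random() < 0.5
        return self.named[name]


def generate_wwii(rng: random.Random, name: Optional[str] = "WWII") -> str:
    # Sketch of `I don't like [world?WWII] war[ II?WWII].`
    # (pass name=None to mimic the unnamed `[world?] war[ II?]`).
    decisions = RandomDecisions(rng)
    world = "world " if decisions.should_generate(name) else ""
    ii = " II" if decisions.should_generate(name) else ""
    return f"I don't like {world}war{ii}."
```

With the shared name, only the two intended sentences can come out; with name=None, the two unwanted sentences become possible too.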

Opposite named random modifiers

Starting with v1.6.1, adding an exclamation point ! before the name of a random modifier turns it into an opposite named random modifier. Sub-rules with that modifier are generated exactly when the sub-rules with the corresponding "non-opposite" modifier are not, and are not generated when those sub-rules are.

For example, the rule I [do?name] [don't?!name] like this [at all?!name] will generate either I do like this or I don't like this at all. (Note that it will never generate I like this; if you want this behavior, you will need to use choices in the rule.)

Of course, if the modifier is unnamed (e.g. [test?!]), this will behave exactly as a random modifier (e.g. [test?]).
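Opposite modifiers can be pictured as storing one outcome per name and negating it for sub-rules marked with !. In the hypothetical Python sketch below, modifier is the text written after the question mark (e.g. "name", "!name", "" or "!"); should_generate and generate_do_dont are illustrative names only, not Chatette's API.

```python
import random
from typing import Dict


def should_generate(decisions: Dict[str, bool], modifier: str,
                    rng: random.Random) -> bool:
    opposite = modifier.startswith("!")
    name = modifier[1:] if opposite else modifier
    if name == "":
        # Unnamed modifier: `?` and `?!` both behave like a plain 50/50 draw.
        return rng.random() < 0.5
    if name not in decisions:
        # Store the outcome for the non-opposite modifier, whichever comes first.
        decisions[name] = rng.random() < 0.5
    # Opposite sub-rules generate exactly when the non-opposite ones do not.
    return decisions[name] != opposite


def generate_do_dont(rng: random.Random) -> str:
    # Sketch of `I [do?name] [don't?!name] like this [at all?!name]`.
    decisions: Dict[str, bool] = {}
    parts = [
        "I",
        "do" if should_generate(decisions, "name", rng) else "",
        "don't" if should_generate(decisions, "!name", rng) else "",
        "like this",
        "at all" if should_generate(decisions, "!name", rng) else "",
    ]
    return " ".join(part for part in parts if part)
```

Storing the outcome of the non-opposite modifier, rather than of whichever sub-rule comes first, keeps the result correct even when an opposite modifier appears earlier in the rule.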

Custom probability of random generation

A last thing that can be customized with these modifiers is the probability that a sub-rule is generated (instead of the default 50/50 chance). This is done by appending a slash / after the random generation modifier (and after its name, if it has one), followed by a number representing the probability (in percent) that the sub-rule gets generated. For example, [test?/80] will generate test 80% of the time and nothing 20% of the time. A percent sign % may be added after the number for legibility (hence, [test?/80] and [test?/80%] are considered to be the same thing).

If you customize this for named random generation modifiers, only the probability given for the very first sub-rule with that named modifier in the rule is taken into account. For example, the rule I don't like [world?WWII/70] war[ II?WWII/90]. will generate I don't like world war II. 70% of the time and I don't like war. the rest of the time.
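The following Python sketch shows one way the text after the question mark could be parsed and how the "first sub-rule wins" rule plays out. It is an illustration, not Chatette's parser: MODIFIER_RE, parse_random_modifier, decide and generate_wwii_weighted are hypothetical names, opposite modifiers are left out, and the regular expression only accepts whole-number percentages, which is an assumption.

```python
import random
import re
from typing import Dict, Optional, Tuple

# Text written after `?`: an optional name, then optionally `/` and a
# percentage with an optional `%` sign (e.g. "", "WWII", "/80", "WWII/70%").
MODIFIER_RE = re.compile(r"^(?P<name>[^/]*)(?:/(?P<percent>\d+)%?)?$")


def parse_random_modifier(text: str) -> Tuple[Optional[str], int]:
    match = MODIFIER_RE.match(text)
    if match is None:
        raise ValueError(f"invalid random generation modifier: ?{text}")
    name = match.group("name") or None
    percent = int(match.group("percent")) if match.group("percent") else 50
    return name, percent


def decide(decisions: Dict[str, bool], text: str, rng: random.Random) -> bool:
    name, percent = parse_random_modifier(text)
    if name is None:
        return rng.random() * 100 < percent
    if name not in decisions:
        # Only the first sub-rule with this name sets the probability;
        # the percentages of later ones are ignored.
        decisions[name] = rng.random() * 100 < percent
    return decisions[name]


def generate_wwii_weighted(rng: random.Random) -> str:
    # Sketch of `I don't like [world?WWII/70] war[ II?WWII/90].`
    decisions: Dict[str, bool] = {}
    world = "world " if decide(decisions, "WWII/70", rng) else ""
    ii = " II" if decide(decisions, "WWII/90", rng) else ""
    return f"I don't like {world}war{ii}."
```

Because the outcome for WWII is fixed by [world?WWII/70], the /90 on the second sub-rule never influences the result.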

Modifiers for unit declarations only

In this section, we describe modifiers that are applicable to unit declarations. We use special characters prepended or appended to the unit declaration identifier in order to make the generation of the current unit different from what it would be by default. More precisely, the modifiers change the generation behavior of a unit when it is asked to generate something, that is, when it is referenced (in a rule) or when the generator handles an intent.

We will make an exhaustive list of the modifiers that can be used with unit declarations only.

Variation name

It is sometimes useful to have different units that represent the same concept but with slight differences, and yet still be able to reference the whole concept or one of the slightly different ones. For example, it can be useful to have a singular and a plural variation of an alias, and reference the singular or plural variation where relevant, but there are still times you would not care whether the reference should refer to the singular or plural version.

To make 2 variations of the same unit, make 2 different unit declarations with the same name, and append a hashtag # followed by the name of the respective variation to each unit name. The same naming rules apply to variation names as to unit names: special characters should be escaped if you want to use them.

For instance, you could define the following aliases:

~[phone#singular]
   [tele?]phone
~[phone#plural]
   [tele?]phones

In other rules, you can refer to the first one using ~[phone#singular] (which will generate phone or telephone) and to the second one using ~[phone#plural] (which will generate phones or telephones). You can also use ~[phone] inside rules; this will generate phone, phones, telephone or telephones at random.

If you declare intents with variations, the generator will handle each of them as a normal intent declaration, but the examples generated from those declarations will all be labeled with the intent name alone (without the variation name) in the final output.
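As a rough model of how a reference picks its rules, the Python sketch below stores each variation of the phone alias as a list of possible outputs. Everything here is hypothetical: PHONE and generate_reference are illustrative names, each rule [tele?]phone(s) is pre-expanded into its two possible outputs to keep the sketch short, and pooling the rules of all variations uniformly for ~[phone] is an assumption about how the generator chooses.

```python
import random
from typing import Dict, List, Optional

# Hypothetical in-memory view of the alias `phone` and its two variations.
PHONE: Dict[str, List[str]] = {
    "singular": ["phone", "telephone"],  # ~[phone#singular]
    "plural": ["phones", "telephones"],  # ~[phone#plural]
}


def generate_reference(unit: Dict[str, List[str]], variation: Optional[str],
                       rng: random.Random) -> str:
    if variation is not None:
        # ~[phone#singular] only draws from that variation's rules.
        return rng.choice(unit[variation])
    # ~[phone] draws from the rules of every variation of the unit
    # (assumption: all rules are pooled and picked uniformly).
    all_rules = [rule for rules in unit.values() for rule in rules]
    return rng.choice(all_rules)
```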

Argument

It is sometimes very useful to leave some parts of a template to be filled in later. Specifically, we sometimes need to reference a certain unit while having a specific string inserted into the string generated by this reference.

This modifier is denoted using a dollar sign $ which is put at the end of the unit declaration initiator (right before the closing square bracket), followed by the argument's name. You can then put the same string (i.e. a dollar sign $ followed by the argument's name) in some or all of the rules of this unit declaration.

When this unit is referenced, adding a dollar sign $ followed by the value this argument should take (right before the closing square bracket of the reference) puts this value in place of the argument in the rules.

For example, if the following alias is declared:

~[greetings$NAME]
   Hi $NAME
   Hello $NAME!

using the reference ~[greetings$John] in another rule will generate Hi John or Hello John!, while using the reference ~[greetings$Elvis] will generate Hi Elvis or Hello Elvis!. Note that using the reference ~[greetings] alone will generate Hi $NAME or Hello $NAME!, without any warning.

The naming rules for argument names are the same as those for unit identifiers.
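Argument substitution boils down to replacing the placeholder in the chosen rule. The Python sketch below is a simplified model, not Chatette's code: GREETINGS_ARG, GREETINGS_RULES and generate_with_argument are hypothetical names, and the value is passed in directly instead of being parsed from a reference such as ~[greetings$John].

```python
import random
from typing import List, Optional

# Hypothetical representation of:
# ~[greetings$NAME]
#    Hi $NAME
#    Hello $NAME!
GREETINGS_ARG = "NAME"
GREETINGS_RULES: List[str] = ["Hi $NAME", "Hello $NAME!"]


def generate_with_argument(rules: List[str], arg_name: str,
                           value: Optional[str], rng: random.Random) -> str:
    rule = rng.choice(rules)
    if value is None:
        # ~[greetings]: no value given, so `$NAME` is left in place as-is.
        return rule
    # ~[greetings$John]: every `$NAME` in the chosen rule becomes "John".
    return rule.replace("$" + arg_name, value)
```

A plain string replacement like this would also rewrite part of $NAMES if another argument were called NAMES; a real parser would need to match argument names more strictly.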

Modifiers for both

This section describes the modifiers that can be used both on unit declarations and on sub-rules (except simple words). On a unit declaration, such a modifier changes how a reference to that unit is generated, or how the generator handles the intent if the unit is an intent; on a sub-rule, it changes how that sub-rule is generated.

Here is an exhaustive list of those modifiers.

Case generation

This modifier tells the generator that a certain sub-rule or unit can be generated with either a lowercase or an uppercase leading letter, chosen at random. When this modifier is used, the case of the leading letter as written in the sub-rule or unit declaration does not matter.

This modifier is denoted by putting an ampersand & right after the opening bracket of the sub-rule or the unit declaration initiator.

For example, the word group [&hello] will generate hello 50% of the time and Hello the rest of the time. If we define the following slot:

@[&doctor]
   doctor
   Dr.

using the reference @[doctor] will generate doctor 25% of the time, Doctor 25% of the time, dr. 25% of the time and Dr. 25% of the time (because of the ampersand in the first line of the declaration). Note that if we didn't have this modifier for this slot (hence, if the ampersand wasn't there), we could have the same strings generated using the reference @[&doctor], while @[doctor] would have generated doctor 50% of the time and Dr. the rest of the time.
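The 25% figures follow from choosing one of the two rules and then, independently, the case of its first letter. The Python sketch below models that; randomize_leading_case, DOCTOR_RULES and generate_doctor are hypothetical names, and picking each rule with equal probability is an assumption consistent with the percentages given above.

```python
import random

DOCTOR_RULES = ["doctor", "Dr."]  # hypothetical view of the slot @[&doctor]


def randomize_leading_case(text: str, rng: random.Random) -> str:
    # `&` modifier: upper- or lower-case the first letter with equal chance.
    if not text:
        return text
    first = text[0].upper() if rng.random() < 0.5 else text[0].lower()
    return first + text[1:]


def generate_doctor(rng: random.Random, case_modifier: bool = True) -> str:
    # Pick a rule (uniformly, an assumption), then apply `&` if present.
    text = rng.choice(DOCTOR_RULES)
    return randomize_leading_case(text, rng) if case_modifier else text
```

With case_modifier=False, the sketch reproduces the slot without the ampersand, which only yields doctor and Dr.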


Now that you know how templates are made and used, you need to know where to put them. See files organization for more information.

You can also find two illustrative examples here, and learn about the command line interface of the program here.