[label] opcode dest, source, source ...
If the opcode returns a result, it is stored in the first argument. Sometimes the first register is both a source value and the destination of the operation. The arguments are either registers or constants, though only source arguments can be constants:
LABEL: print "The answer is: " print 42 print "\n" end # halt the interpreter
Integer constants are signed integers.
Integer constants can have a positive (+) or negative (-) sign in front. Binary integers are preceded by 0b or 0B, and hexadecimal integers are preceded by 0x or 0X:
print 42 # integer constant print -0b101 # binary integer constant with sign print 0Xdeadbeef # hex integer constant
Floating-point constants can also be positive or negative. Scientific notation provides an exponent, marked with e or E (the sign of the exponent is optional):
print 3.14159 # floating point constant print -1.23e+45 # in scientific notation
String constants are wrapped in single or double quotation marks. Quotation marks and other nonprintable characters inside the string have to be escaped by a backslash. The escape sequences for special characters are the same as for Perl 5's qq( ) operator.
print "string\n" # string constant with escaped newline print 'that\'s it' # escaped single quote print "\\" # a literal backslash
6.2.2 Working with Registers
Parrot is a register-based virtual machine. It has 4 typed register sets with 32 registers in each set. The types are integers, floating-point numbers, strings, and Parrot objects. Register names consist of a capital letter indicating the register set and the number of the register, between 0 and 31. For example:
I0 integer register #0 N11 number or floating point register #11 S2 string register #2 P31 PMC register #31
Integer and number registers hold values, while string and PMC registers contain pointers to allocated memory for a string header or a Parrot object.
The length of strings is limited only by your system's virtual memory and by the size of integers on the particular platform. Parrot can work with strings of different character types and encodings. It automatically converts string operands with mixed characteristics to Unicode.
Parrot Magic Cookies (PMCs) are Parrot's low-level objects. They can represent data of any arbitrary type. The operations (methods) for each PMC class are defined in a fixed vtable, which is a structure containing function pointers that implement each operation.
18.104.22.168 Register assignment
set I0, 42 # set integer register #0 to the integer value 42 set N3, 3.14159 # set number register #3 to the value of
PASM uses registers where a high-level language would use variables. The exchange opcode swaps the contents of two registers of the same type:
exchange I1, I0 # set register I1 to what I0 contains # and set register I0 to what I1 contains
As we mentioned before, string and PMC registers are slightly different because they hold a pointer instead of directly holding a value. Assigning one string register to another:
set S0, "Ford" set S1, S0 set S0, "Zaphod" print S1 # prints "Ford" end
doesn't make a copy of the string; it makes a copy of the pointer. Just after set S1, S0, both S0 and S1 point to the same string. But assigning a constant string to a string register allocates a new string. When "Zaphod" is assigned to S0, the pointer changes to point to the location of the new string, leaving the old string untouched. So strings act like simple values on the user level, even though they're implemented as pointers.
Unlike strings, assignment to a PMC doesn't automatically create a new object; it only calls the PMC's vtable method for assignment. So, rewriting the same example using a PMC has a completely different result:
new P0, .PerlString set P0, "Ford" set P1, P0 set P0, "Zaphod" print P1 # prints "Zaphod" end
The new opcode creates an instance of the .PerlString class. The class's vtable methods define how the PMC in P0 operates. The first set statement calls P0's vtable method set_string_native, which assigns the string "Ford" to the PMC. When P0 is assigned to P1:
set P1, P0
it copies the pointer, so P1 are P0 both aliases to the same PMC. Then, assigning the string "Zaphod" to P0 changes the underlying PMC, so printing P1 or P0 prints "Zaphod".
22.214.171.124 PMC object types
Internally, PMC types are represented by positive integers, and built-in types by negative integers. PASM provides two opcodes to deal with types. typeof returns the name corresponding to a numeric type or the type of a PMC. find_type takes a type name and returns the integer value that represents that type.
When the source argument is a PMC and the destination is a string register, typeof returns the name of the type:
new P0, .PerlString typeof S0, P0 # S0 is "PerlString" print S0 print "\n" end
In this example, typeof returns the type name "PerlString".
When the source argument is a PMC and the destination is an integer register, typeof returns the integer representation of the type:
new P0, .PerlString typeof I0, P0 # I0 is 17 print I0 print "\n" end
This example returns the integer representation of PerlString, which is 17.
When typeof's source argument is an integer, it returns the name of the type represented by that integer:
set I1, -100 typeof S0, I1 # S0 is "INTVAL" print S0 print "\n" end
The integer representation of a built-in integer value is -100, so it returns the type name "INTVAL".
The source argument to find_type is always a string containing a type name, and the destination register is always an integer. It returns the integer representation of the type with that name:
find_type I1, "PerlString" # I1 is 17 print I1 print "\n" find_type I2, "INTVAL" # I2 is -100 print I2 print "\n" end
Here, the name "PerlString" returns 17, and the name "INTVAL" returns -100.
All Parrot classes inherit from the class default, which has the type number 0. The default class provides some default functionality, but mainly throws exceptions when the default variant of a method is called (meaning the subclass didn't define the method). Type number 0 returns the type name "illegal", since no object should ever be created from the default class:
find_type I1, "fancy_super_long_double" # I1 is 0 print I1 print "\n" typeof S0, I1 # S0 is "illegal" print S0 print "\n" end
The type numbers are not fixed values. They change whenever a new class is added to Parrot or when the class hierarchy is altered. A header file containing an enumeration of PMC types (include/parrot/core_pmcs.h) is generated during the configuration of the Parrot source tree. The PMC types take their numbers from their order in this file. Internal data types and their names are specified in include/parrot/datatypes.h.
$ perl classes/pmc2c.pl --tree classes/*.pmc
which produces output like:
Array Boolean PerlInt perlscalar scalar Compiler NCI ...
The output traces the class hierarchy for each class: Boolean inherits from PerlInt, which is derived from the abstract perlscalar and scalar classes (abstract classes are listed in lowercase). The actual classnames and their hierarchy may have changed by the time you read this.
126.96.36.199 Type morphing
The classes PerlUndef, PerlInt, PerlNum, and PerlString implement Perl's polymorphic scalar behavior. Assigning a string to a number PMC morphs it into a string PMC. Assigning an integer value morphs it to a PerlInt, and assigning undef morphs it to PerlUndef:
new P0, .PerlString set P0, "Ford\n" print P0 # prints "Ford\n" set P0, 42 print P0 # prints 42 print "\n" typeof S0, P0 print S0 # prints "PerlInt" print "\n" end
6.2.3 Math Operations
PASM has a full set of math instructions. These work with integers, floating-point numbers, and PMCs that implement the vtable methods of a numeric object. Most of the major math opcodes have two- and three-argument forms:
add I0, I1 # I0 += I1 add I10, I11, I2 # I10 = I11 + I2
The three-argument form of add adds the last two numbers and stores the result in the first register. The two-argument form adds the first register to the second and stores the result back in the first register.
The source arguments can be Parrot registers or constants, but they must be compatible with the type of the destination register. Generally, "compatible" means that the source and destination have to be the same type, but there are a few exceptions:
sub I0, I1, 2 # I0 = I1 - 2 sub N0, N1, 1.5 # N0 = N1 - 1.5
If the destination register is an integer register, like I0, the other arguments must be integer registers or integer constants. A floating-point destination, like N0, usually requires floating-point arguments, but many math opcodes also allow the final argument to be an integer. A PMC destination can have an integer or floating-point argument as the last one:
mul P0, P1 # P0 *= P1 mul P0, I1 mul P0, N1 mul P0, P1, P2 # P0 = P1 * P2 mul P0, P1, I2 mul P0, P1, N2
Operations on a PMC are implemented by the vtable method of the destination (in the two-argument form) or the left source argument (in the three argument form). The result of an operation is entirely determined by the PMC. A class implementing imaginary number operations might return an imaginary number, for example.
We won't list every math opcode here, but we'll list some of the most common ones. You can get a complete list in Section 6.9 later in this chapter.
188.8.131.52 Unary math opcodes
The unary opcodes have a single source argument and a single destination argument. Some of the most common unary math opcodes are inc (increment), dec (decrement), abs (absolute value), neg (negate), and fact (factorial):
abs N0, -5.0 # the absolute value of -5.0 is 5.0 fact I1, 5 # the factorial of 5 is 120 inc I1 # 120 incremented by 1 is 121
184.108.40.206 Binary math opcodes
Binary opcodes have two source arguments and a destination argument. As we mentioned before, most binary math opcodes have a two-argument form in which the first argument is both a source and the destination. Parrot provides add (addition), sub (subtraction), mul (multiplication), div (division), and pow (exponent) opcodes, as well as two different modulus operations. mod is Parrot's implementation of modulus, and cmod is the % operator from the C library. It also provides gcd (greatest common divisor) and lcm (least common multiple).
div I0, 12, 5 # I0 = 12 / 5 mod I0, 12, 5 # I0 = 12 % 5
220.127.116.11 Floating-point operations
Although most of the math operations work with both floating-point numbers and integers, a few require floating-point destination registers. Among these are ln (natural log), log2 (log base 2), log10 (log base 10), and exp (ex), as well as a full set of trigonometric opcodes such as sin (sine), cos (cosine), tan (tangent), sec (secant), cosh (hyperbolic cosine), tanh (hyperbolic tangent), sech (hyperbolic secant), asin (arc sine), acos (arc cosine), atan (arc tangent), asec (arc secant), exsec (exsecant), hav (haversine), and vers (versine). All angle arguments for the trigonometric functions are in radians:
sin N1, N0 exp N1, 2
The majority of the floating-point operations have a single source argument and a single destination argument. Even though the destination must be a floating-point register, the source can be either an integer or floating-point number.
atan N0, 1, 1
6.2.4 Working with Strings
At the moment, operations on string registers generate new strings in the destination register. There are plans for an optimized set of string functions that modify an existing string in place. These might be implemented by the time you read this.
String operations on PMC registers require all their string arguments to be PMCs.
18.104.22.168 Concatenating strings
Use the concat opcode to concatenate strings. With string register or string constant arguments, concat has both a two-argument and a three-argument form. The first argument is a source and a destination in the two-argument form:
set S0, "ab" concat S0, "cd" # S0 has "cd" appended print S0 # prints "abcd" print "\n" concat S1, S0, "xy" # S1 is the string S0 with "xy" appended print S1 # prints "abcdxy" print "\n" end
The first concat concatenates the string "cd" onto the string "ab" in S0. It generates a new string "abcd" and changes S0 to point to the new string. The second concat concatenates "xy" onto the string "abcd" in S0 and stores the new string in S1.
new P0, .PerlString new P1, .PerlString new P2, .PerlString set P0, "ab" set P1, "cd" concat P2, P0, P1 print P2 # prints abcd print "\n" end
Here, concat concatenates the strings in P0 and P1 and stores the result in P2.
22.214.171.124 Repeating strings
set S0, "x" repeat S1, S0, 5 # S1 = S0 x 5 print S1 # prints "xxxxx" print "\n" end
In this example, repeat generates a new string with "x" repeated five times and stores a pointer to it in S1.
126.96.36.199 Length of a string
set S0, "abcd" length I0, S0 # the length is 4 print I0 print "\n" end
Currently, length doesn't have an equivalent for PMC strings, but it probably will be implemented in the future.
The simplest version of the substr opcode takes four arguments: a destination register, a string, an offset position, and a length. It returns a substring of the original string, starting from the offset position (0 is the first character) and spanning the length:
substr S0, "abcde", 1, 2 # S0 is "bc"
This example extracts a string from "abcde" at a one-character offset from the beginning of the string (the second character) and spanning two characters. It generates a new string, "bc", in the destination register S0.
When the offset position is negative, it counts backward from the end of the string. So an offset of -1 starts at the last character of the string.
substr also has a five-argument form, where the fifth argument is a string to replace the substring. This modifies the second argument and returns the removed substring in the destination register.
set S1, "abcde" substr S0, S1, 1, 2, "XYZ" print S0 # prints "bc" print "\n" print S1 # prints "aXYZde" print "\n" end
This replaces the substring "bc" in S1 with the string "XYZ", and returns "bc" in S0.
When the offset position in a replacing substr is one character beyond the original string length, substr appends the replacement string just like the concat opcode.
When you don't need the replaced string, there's an optimized version of substr that just does a replace without returning the removed substring.
set S1, "abcde" substr S1, 1, 2, "XYZ" print S1 # prints "aXYZde" print "\n" end
The PMC versions of substr are not yet implemented.
188.8.131.52 Chopping strings
set S0, "abcde" chopn S0, 2 print S0 # prints "abc" print "\n" end
removes two characters from the end of S0. If the count is negative, that many characters are kept in the string:
set S0, "abcde" chopn S0, -2 print S0 # prints "ab" print "\n" end
This keeps the first two characters in S0 and removes the rest. chopn also has a three-argument version that stores the chopped string in a separate destination register, leaving the original string untouched:
set S0, "abcde" chopn S1, S0, 1 print S1 # prints "abcd" print "\n" end
184.108.40.206 Copying strings
new P0, .PerlString set P0, "Ford" clone P1, P0 set P0, "Zaphod" print P1 # prints "Ford" end
This example creates an identical, independent clone of the PMC in P0 and puts a pointer to it in P1. Later changes to P0 have no effect on P1.
With simple strings, the copy created by clone, as well as the results from substr, are copy-on-write (COW). These are rather cheap in terms of memory usage because the new memory location is only created when the copy is assigned a new value. Cloning is rarely needed with ordinary string registers since they always create a new memory location on assignment.
220.127.116.11 Converting characters
chr S0, 65 # S0 is "A" ord I0, S0 # I0 is 65
ord has a three-argument variant that takes a character offset to select a single character from a multicharacter string. The offset must be within the length of the string:
ord I0, "ABC", 2 # I0 is 67
A negative offset counts backward from the end of the string, so -1 is the last character.
ord I0, "ABC", -1 # I0 is 67
18.104.22.168 Formatting strings
The sprintf opcode generates a formatted string from a series of values. It takes three arguments: the destination register, a string specifying the format, and an ordered aggregate PMC (like a PerlArray) containing the values to be formatted. The format string and the destination register can be either strings or PMCs:
sprintf S0, S1, P2 sprintf P0, P1, P2
The format string is similar to the one for C's sprintf function, but with some extensions for Parrot data types. Each format field in the string starts with a % and ends with a character specifying the output format. The output format characters are listed in Table 6-1.
Each format field can be specified with several options: flags, width, precision, and size. The format flags are listed in Table 6-2.
The width is a number defining the minimum width of the output from a field. The precision is the maximum width for strings or integers, and the number of decimal places for floating-point fields. If either width or precision is an asterisk (*), it takes its value from the next argument in the PMC.
The size modifier defines the type of the argument the field takes. The flags are listed in Table 6-3.
The values in the aggregate PMC must have a type compatible with the specified size.
Here's a short illustration of string formats:
new P2, .PerlArray new P1, .PerlNum new P0, .PerlInt set P0, 42 set P1, 10 push P2, P0 push P2, P1 sprintf S0, "int %#Px num %+2.3Pf\n", P2 print S0 # prints "int 0x2a num +10.000" print "\n" end
The first eight lines create a PerlArray with two elements: a PerlInt and a PerlNum. The format string of the sprintf has two format fields. The first, %#Px, takes a PMC argument from the aggregate (P) and formats it as a hexadecimal integer (x), with a leading 0x (#). The second format field, %+2.3Pf, takes a PMC argument (P) and formats it as a floating-point number (f), with a minimum of two whole digits and a maximum of three decimal places (2.3) and a leading sign (+).
22.214.171.124 Testing for substrings
The index opcode searches for a substring within a string. If it finds the substring, it returns the position where the substring was found as a character offset from the beginning of the string. If it fails to find the substring, it returns -1:
index I0, "Beeblebrox", "eb" print I0 # prints 2 print "\n" index I0, "Beeblebrox", "Ford" print I0 # prints -1 print "\n" end
index also has a four-argument version, where the fourth argument defines an offset position for starting the search:
index I0, "Beeblebrox", "eb", 3 print I0 # prints 5 print "\n" end
6.2.5 I/O Operations
The I/O subsystem has at least one set of significant revisions ahead, so you can expect this section to change. It's worth an introduction, though, because the basic set of opcodes is likely to stay the same, even if their arguments and underlying functionality change.
126.96.36.199 Open and close a file
The open opcode opens a file for access. It takes three arguments: a destination register, the name of the file, and a modestring. With a PMC destination, it returns a ParrotIO object on success and a PerlUndef object on failure. With an integer destination, it returns an integer file descriptor on success and -1 on failure:
open P0, "people.txt", "<" open I0, "people.txt"
The modestring specifies whether the file is opened in read-only (<), write-only (>), read-write (+>), or append mode (>>). open takes a modestring argument only when it's creating a ParrotIO object, and not when it's creating a file descriptor.
close P0 # close a PIO close I0 # close a descriptor
188.8.131.52 Output operations
We already saw the print opcode in several examples above. The one argument form prints a register or constant to stdout. It also has a two-argument form: the first argument is the file descriptor or ParrotIO object where the value is printed. The standard file descriptors are 0 for stdin, 1 for stdout, and 2 for stderr.
print 2, S0 # print to stderr printerr S0 # the same print P0, "xxx" # print to PIO in P0
write is similar to print, but it only works with integer file descriptors:
write 2, S0 # write string to stderr
184.108.40.206 Reading from files
read S0, I0 # read from stdin up to I0 bytes into S0 read S0, P0, I0 # read from the PIO in P0
readline S0, I0 # read a line from descriptor I0
seek I0, P0, I1, I2
In this example, the position of P0 is set by an offset (I1) from an origin point (I2). 0 means the offset is from the start of the file, 1 means the offset is from the current position, and 2 means the offset is from the end of the file. The return value (in I0) is 0 when the position is successfully set and -1 when it fails. seek also has a five-argument form that seeks with a 64-bit offset, constructed from two 32-bit arguments.
6.2.6 Logical and Bitwise Operations
and I0, 0, 1 # returns 0 and I0, 1, 2 # returns 2
or I0, 1, 0 # returns 1 or I0, 0, 2 # returns 2 or P0, P1, P2
Both and and or are short-circuiting. If they can determine what value to return from the second argument, they'll never evaluate the third. This is significant only for PMCs, as they might have side effects on evaluation.
xor I0, 1, 0 # returns 1 xor I0, 0, 1 # returns 1 xor I0, 1, 1 # returns 0 xor I0, 0, 0 # returns 0
not I0, I1 not P0, P1
The bitwise opcodes operate on their values a single bit at a time. band, bor, and bxor return a value that is the logical AND, OR, or XOR of each bit in the source arguments. They each take a destination register and two source registers. They also have two-argument forms where the destination is also a source. bnot is the logical NOT of each bit in a single source argument.
bnot I0, I1 band P0, P1 bor I0, I1, I2 bxor P0, P1, I2
The logical and arithmetic shift operations shift their values by a specified number of bits:
shl I0, I1, I2 # shift I1 left by count I2 giving I0 shr I0, I1, I2 # arithmetic shift right lsr P0, P1, P2 # logical shift right