The first object is a 4D cube, often called a hypercube.  You can see a
small cube inside of and connected to a larger cube.  If you look a
little closer, you may notice that in-between the two cubes are some
more cubes.  (When you slice a 3D cube, you get a 2D cube -- a square. 
When you take a slice of the hypercube, you get a 3D cube).  As it
rotates along its fourth coordinate, the cube folds in upon itself.  One
way to look at it is that the cubes start to change positions -- after
180 degrees of rotation the inside cube is on the outside and the
outside cube is on the inside.  The hypercube has literally turned
inside-out.

The program works fine in PAL and NTSC, although PAL folks will get the
tune playing at the wrong speed and transposed into a different key.

Oh yes, one thing I really like is the background on the second object--
on my 1084 it looks like rope.  This is a consequence of the way VIC
generates colors -- extra colors outside of the normal 16 are being
generated, because two hires colors are being placed next to each other.

If you look at it on a black and white monitor, you will just see thick
diagonal lines.  This very much surprised me when I first saw it!  Find
the March 1985 IEEE Spectrum article for more information on why VIC
behaves this way.

Finally, you may notice some little glitches from time to time
in drawing the 4D objects.  That is my safety valve and keeps the
program from literally destroying itself, in sometimes spectacular
fashion.  Oh well.

@(A): A Handy Glossary

   Polygon: A rectilinear closed plane figure of any number of sides.

   Vector: A directed line segment having magnitude and direction.

I do not know how the term "filled vector" came into vogue, but it is
meaningless, not to mention a little silly -- what would an "unfilled
vector" look like, two points with an arrow at one end?  One may as well
talk about filled lines and filled points.

Thus, I plead with the community to not refer to polygons as vectors and
filled polygons as filled vectors.  Polygons need your help, and have
been discriminated against for too long now.  Just one small donation on
your part of a correct mathematical reference can help save the lives of
one, ten, even hundreds of polygons, both abroad and here at home. 
Individuals wanting to contribute more may sponsor individual polygons;
a kit will be sent to you containing the name of the polygon and at
regular intervals a picture of the polygon will be sent to you, so you
may monitor the progress of your particular polygon. Some polygons are
created unclosed, and some do not get the necessary ink or programming
skill to properly fill them, but be it a quadrilateral or decagon,
trapezium or parallelogram, with your help we can eventually make all
polygons closed and full, for a better, more civilized world. Thank you
for your time, and God bless all the little geometrical constructions,
no matter their dimension or configuration.

@(A): The Idea

This program displays a representation of some four-dimensional objects
-- four 4D objects, as a matter of fact, each one of them a 4D analog of
a three-dimensional object.  Each screen contains four symmetry-related
3D objects and one 4D analog of the object, rotated and projected from
4D into 2D.

To describe the four-dimensional objects is not so tough. The 4D cube
(the hypercube) is the first to be displayed, and it is the starting
point for the later objects.  It is also, I think, the easiest to see
what is going on with.  There is nothing really special about four
dimensions -- with a 3D object each point is defined by three
coordinates, say (x,y,z).  A 4D point has four coordinates, say
(w,x,y,z).  The 3D cube has eight vertices at:

   (+/-1, +/-1, +/-1)

Therefore a very natural extension into four dimensions would be:

   (+/-1, +/-1, +/-1, +/-1)

For a total of sixteen vertices.  To look at it another way:

   (1, +/-1, +/-1, +/-1)
   (-1,+/-1, +/-1, +/-1)

That is, at w=1 we get a cube, and at w=-1 we get another cube.  In
fact, if we take a "slice" of our hypercube, we get a 3D cube. Compare
to taking a slice of a 3D cube, where you get a square (a 2D cube, if
you will).
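
As a quick check of the vertex counts above, here is a minimal Python
sketch (not part of dim4; the names vertices and slice_at are made up for
illustration) that enumerates the sixteen vertices and takes the two
"slices" at w=1 and w=-1:

```python
from itertools import product

# Every sign combination of (w, x, y, z): the sixteen hypercube vertices.
vertices = list(product((1, -1), repeat=4))

def slice_at(w):
    """Return the (x, y, z) parts of all vertices with the given w."""
    return [v[1:] for v in vertices if v[0] == w]

cube_plus = slice_at(1)     # the 3D cube sitting at w = 1
cube_minus = slice_at(-1)   # the 3D cube sitting at w = -1
print(len(vertices), len(cube_plus), len(cube_minus))
```

Each slice contains exactly the eight vertices (+/-1, +/-1, +/-1) of an
ordinary cube.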

This is demonstrated when the code first starts up -- the program
"grows" a cube from 0D -> 1D -> 2D -> 3D -> 4D.  At the 4D stage there
is a smaller cube inside of a larger cube, with cubes in-between the
two.  (If you are curious as to how I did the "growing", see the code
description below for a few details).

Next, as the cube begins to rotate, it "folds in" on itself (or, if you
like, it unfolds!).  Rotations are no different than they have always
been.  To do a 3D rotation, recall that the object is rotated in the x-y
plane, the y-z plane, and the x-z plane.  To rotate in the x-y plane by
an angle phi:

   xnew = x*cos(phi) - y*sin(phi)
   ynew = x*sin(phi) + y*cos(phi)

Well, any two coordinates form a plane, so in four dimensions there are
just twice as many planes to rotate in.  In particular, the program does
rotations in the usual planes (x-y, y-z, x-z) and also does a single
rotation in the w-x plane, that is,

   wnew = w*cos(phi) - x*sin(phi)
   xnew = w*sin(phi) + x*cos(phi)

I didn't feel any great need to rotate through extra planes involving
the w-coordinate (the w-y and w-z planes).  Notice that when phi=90
degrees the w and x coordinates trade places (up to a sign), and when
phi=180 degrees each goes to its negative.  This means that as phi is
increased, in essence the inner and outer cubes are going to change
positions, and this then explains the unfolding that is seen on the
screen.
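
To see the trade of coordinates numerically, here is a small Python
sketch of the plane rotation formulas given earlier.  It uses floating
point rather than the table lookups dim4 relies on, and the function name
rotate_plane and the sample point are purely illustrative:

```python
import math

W, X, Y, Z = 0, 1, 2, 3   # positions of the coordinates in (w, x, y, z)

def rotate_plane(p, i, j, phi):
    """Rotate point p by phi radians in the plane of coordinates i and j."""
    q = list(p)
    c, s = math.cos(phi), math.sin(phi)
    q[i] = p[i] * c - p[j] * s
    q[j] = p[i] * s + p[j] * c
    return tuple(q)

p = (1.0, 0.5, 0.0, 0.0)                       # hypothetical sample point
quarter = rotate_plane(p, W, X, math.pi / 2)   # about (-0.5, 1, 0, 0)
half = rotate_plane(p, W, X, math.pi)          # about (-1, -0.5, 0, 0)
print(quarter, half)
```

After a quarter turn the old w value sits in the x slot and the old x
value (negated) sits in the w slot; after a half turn both are negated.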

The R/S key goes into 3D mode by zeroing out the angle increment for the
w-x plane.  In effect, the 4D rotation is frozen. The F4 key zeros out
the x-y, y-z, and x-z angle increments, leaving only the w-x rotation. 
F4 followed by R/S will therefore freeze the image completely -- use D
or 4 to get it going again.

There is still the issue of visualizing a 4D object.  This should not be
surprising -- after all, we have all seen 3D objects drawn on a 2D
computer screen (or a 2D piece of paper).  If we can get from 3D to 2D
then we ought to be able to get from 4D to 3D (and from there into 2D). 
Recall that a 3D projection draws a light ray from the object, through a
little pinhole located at the origin, and finds the intersection with a
piece of film located at z=d, a constant:

   L = t * (x1,y1,z1) is my light ray, so t=d/z1 gives the
                      intersection with the film of a ray from
                      the point (x1,y1,z1) passing through the
                      origin.

So this is very easy to extend into 4D -- simply project from 4D
into 3D through the origin:

   L = t * (w1,x1,y1,z1)  let t=d/w1

   -> L3 = (d, d/w1 * x1, d/w1 * y1, d/w1 * z1)

The x,y,z coordinates are then projected from 3D into 2D, again through
the origin.  This gives a "perspective" view of the 4D object.
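
The two-stage projection can be sketched in a few lines of Python.  This
is an illustration of the formulas above, not the table-driven routine in
dim4; the function names and the film distances d4 and d3 are assumed
parameters, and the point is assumed to have nonzero w and z (i.e. the
object has already been moved away from the pinhole):

```python
def project(point, depth_index, d):
    """Pinhole projection through the origin onto the plane depth = d.

    Scales the point by t = d / depth and drops the depth coordinate.
    """
    t = d / point[depth_index]
    return tuple(t * c for k, c in enumerate(point) if k != depth_index)

def project_4d_to_2d(p, d4, d3):
    """(w, x, y, z) -> (x, y, z) dividing by w, then -> (x, y) dividing by z."""
    xyz = project(p, 0, d4)
    return project(xyz, 2, d3)

print(project_4d_to_2d((4.0, 2.0, 2.0, 2.0), 2.0, 2.0))
```

With the sample values, the first stage gives (1, 1, 1) and the second
stage gives (2, 2).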

Now, what is the 4D analog of a tetrahedron, or an octahedron? I
reasoned them out by trying to think of what 3D objects I could derive
starting from a cube.  That is, taking a cube, and cutting away pieces
of it.  For instance, to do the 14-sided guy, simply take the midpoint
of each line segment on the cube -- this has the effect of cutting off
the corners of the cube.  By defining things in this way, it is fairly
straightforward to extend the objects into four dimensions.  (I was
happiest to realize how to do a tetrahedron). See the file objects.s for
more details on the individual objects. Naturally each has some
similarity to the cube: there is an inner object (e.g. a tetrahedron) and
an outer.  The two are connected, and each set of connections forms
another object, so that, for instance, there are tetrahedrons in-between
the inner and outer tetrahedrons.

Finally, to help in visualizing the objects, I stuck a dotted line
capability in.  The dotted lines in general connect the "inner" and
"outer" 3D objects -- turning them off lets you then see the two objects
interact.  (The third object was mighty impressive-looking before I
added these guys! :)

@(A): The Code

Now, it is my considered opinion that the code is awfully well
documented, so there isn't too much to say, but a few general things are
worth mentioning.

"Growing" the points is really easy -- simply start each coordinate at
zero, and gradually increase it out to its final value.  By doing this
first with the x-coordinates, then the y-coords, then z, then w, the
cube grows a dimension at each step.  I don't do anything fancy with the
other objects -- all coordinates are grown equally, so the objects grow
outwards from the origin (as opposed to some sort of zoom effect).
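
A rough Python sketch of the cube "growing" idea follows.  The number of
steps per dimension and the generator name grow_frames are assumptions
for illustration; the actual step sizes in dim4 are in the source files:

```python
def grow_frames(target, steps):
    """Yield one point per frame, growing x first, then y, z, and w.

    target is the final (w, x, y, z); each coordinate ramps from zero to
    its final value in `steps` equal increments before the next starts.
    """
    current = [0.0, 0.0, 0.0, 0.0]
    for idx in (1, 2, 3, 0):          # positions of x, y, z, w
        for k in range(1, steps + 1):
            current[idx] = target[idx] * k / steps
            yield tuple(current)

frames = list(grow_frames((1, 1, 1, 1), 4))
print(frames[3], frames[-1])
```

Applying this to all sixteen vertices at once takes the figure from a
point to a line, a square, a cube, and finally the hypercube.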

Each 4D character is a 12x12 character grid, which gives a 96x96 pixel
drawing area, and takes up the first 144 characters.  Each 3D character
uses a 5x5 character grid, giving 40x40, and taking up the next 4*25=100
characters, for a total of 244 so far.  In eight of the remaining 12
characters are four patterns and their EOR #$FF complements, which are
used in the background tilings and are used indirectly in the pattern
fills.

Since the final x-y coordinates can range from -48..48, this places a
restriction on the initial values for the coordinates.  For purposes of
accuracy and such, coordinates must of course be scaled, so that while a
coordinate like (1,1,1,1) is convenient for thinking, a coordinate like
(16,16,16,16) is much better suited to the implementation -- that is,
the original coordinate scaled by a factor of sixteen or so.  The table
range restricts this scaling factor: the 4D coordinate with largest
length that I use is (1,1,1,1), which has length 2.  Thus, after
rotation, it is possible that it will lie on an axis with coordinate,
say (2,0,0,0).  Since coordinates must not exceed 48 in the
implementation, this suggests a scaling factor of 24.
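
The arithmetic behind that factor, as a tiny Python check (the variable
names are illustrative only):

```python
import math

# Length of the longest 4D coordinate used, (1,1,1,1).
longest = math.sqrt(sum(c * c for c in (1, 1, 1, 1)))

max_coord = 48                 # largest coordinate the implementation allows
scale = max_coord / longest    # worst case: the point lands on an axis
print(longest, scale)
```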

As a practical point, the points never really hit this maximum, so in
principle a larger scaling factor could be used. Alternatively the
projection routine can pick up the slack, which is what dim4 uses.

The first smart thing I did was to ditch the old method of computing
rotations.  Instead of calculating a big rotation matrix, I calculate
some big tables of f_x(s) = x*sin(s), and let the angle s range from
0..127.  To get a table of cos(s) I simply periodically extend the sine
table by copying the first 32 bytes of the table into the 128-159
positions -- cos(s) is thus sin(s+32). (I take advantage of the fact
that sin(s) and cos(s) differ by a phase shift of pi/2.  Were I smart I
would have taken advantage of the reflection symmetry of sin/cos, and
saved another 64 bytes.  Oh well.)

This then leaves 96 bytes for a projection table, which is just what I
need for the 4D object.  Thus I can mash tables of x*sin(s), x*cos(s),
and my projection table of f_x(z)=d*(z-z0) * x into a single page.  This
page is then extended from $6000 to $C000, i.e. giving 96 tables, for a
total of 24k.  Accessing the tables is now trivial: store x+$60 in the
high byte of a zero page pointer, the low byte contains the offset into
the table (0 for the sine table, 32 for the cosine table, and 160 for
the projection table), and do an LDA (ZP),Y to get the right value.
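
The page layout can be modelled in Python as follows.  This is only a
sketch: the rounding, the treatment of negative values, and the exact
mapping from a coordinate to a page number are assumptions here (see
main4.s for what dim4 really does), and the projection entries are left
out entirely:

```python
import math

SIN_OFFSET, COS_OFFSET, PROJ_OFFSET = 0, 32, 160
TABLE_BASE_PAGE = 0x60          # tables start at $6000

def build_trig_part(x):
    """First 160 bytes of one page: x*sin(2*pi*s/128) for s = 0..159.

    Entries 128..159 repeat entries 0..31, so reading at offset 32
    yields x*cos for every angle 0..127.  Rounding is an assumption.
    """
    return [round(x * math.sin(2 * math.pi * s / 128)) for s in range(160)]

def table_address(x, offset, y):
    """Address read by LDA (ZP),Y with pointer high byte x+$60, low byte offset."""
    return ((TABLE_BASE_PAGE + x) << 8) + offset + y

page = build_trig_part(24)
print(page[SIN_OFFSET + 32], page[COS_OFFSET + 0], page[COS_OFFSET + 64])
print(hex(table_address(0, SIN_OFFSET, 0)), hex(table_address(95, PROJ_OFFSET, 95)))
```

The last line shows the two extremes of the 24k block: the first byte of
the first page and the last byte of the 96th page.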

Thus rotations and projections are now very fast and very compact.  Note
that it isn't really necessary to generate a complete table of sines and
cosines.  For instance, 12k of tables (or 6k or whatever) could be used,
and the final result simply multiplied by two, or four.  Even though the
final coordinates might range from -48..48, calculations don't need to
be done using the full range.

The line routine is the good ol' chunky line routine from the last
cube3d program.  It of course had to be modified to work with the two
buffers and such. I removed a bunch of really redundant code that was in
there (REALLY redundant), especially in the actual drawing part (macros
XSTEP and YSTEP -- lines are commented out with a '*'). I also added a
dotted-line capability (it only takes a few extra instructions), to make
things easier to see.

Only a single 3D object is actually drawn -- the others are generated
via symmetry (reflections through x=0 and y=0).  Since the 3D objects
are drawn on a much smaller grid, they need to be scaled down a bit.
Instead of writing separate routines to deal with the 3D and 4D objects,
I simply set the 4D coordinate of each point in the 3D object to some
appropriate number.  Recall that in a 3D projection, the farther away
from you the object is, the smaller it gets.  This is the same idea --
the object is pushed down the 4D axis, and this has the effect of
shrinking the object upon projection.

You may have noticed that the 3D objects tend to avoid the center of the
screen -- this is a consequence of the random number generator I coded
up (and did not test for spectral properties or anything like that :). 
Originally I was going to place things in a random row and column, but
then things just clumped along a diagonal line :).  I will also say that
the SPLAT routine caused me many days of headaches -- whose idea was it
to put color memory so close to a CIA? :)

One thing I had to prune out was a routine which draws circles as the
sine/cosine tables are being set up.  It is kind of neat and gave me
something to watch while the code was setting up, and also was a check
that the trig tables were being set up correctly.  Anyway, all it does
is to draw concentric circles of progressively larger radii, for a sort
of tunnelish-looking thing I suppose.

There is a little "failsafe" in the projection routine.  If coordinates
are out of range (greater than 96 or 40) after projection, they are set
to the origin.  At least one of the objects screws up from time to time
(the octahedron is the main culprit I think), and I think what happens
is that the line routine thinks it needs to draw a lot more points than
it really needs to.  So it happily moves along sticking bytes into the
trig/projection tables, and even makes its way up to VIC, SID and the
CIAs!  Once, it actually started pegging the SID volume register or
something, because there would be a periodic loud ticking from the
speaker.  Eventually the code just grinds to a halt or else completely
hoses the system -- hence, the failsafe :).

Finally, the very first lines of the code redirect the BASIC vector at
$0302/$0303 and JMPs to the NMI RS/RESTORE routine (although a BRK would
probably have sufficed).  This is the only way I could get the code to
work with the cruncher -- without it, the program goes into "IRQ lock". 
Crossbow of Crest suggested that ABCruncher does not put a CLI at the
end of its crunching routine, and that this can cause problems, most
notably with the CIAs.

It took 10-15 hours to get things to crunch and work correctly. In
hindsight, I can think of a bunch of things that could have been easily
done to make it work, but at the time I was sure relieved when it
finally got down to 4095 bytes.  Moral: A little thinking early on saves
massive time and effort down the road.

@(A): The Music

Finally, a word about the music.  Originally I was going to construct a
series of chords which I could modulate between in a fairly flexible
way.  I was then going to break up the chords in a nice way and move
between them randomly.  But then it occurred to me that I already knew a
piece of music which was a series of broken chords and sounded
infinitely more cool than anything I was going to accidentally write, so
I used it instead.  Even better, they are four-note "chords", broken
into four groups of four notes each -- too good to pass up.  Notes are
looked up in a frequency table, thus on my PAL 64 the music gets
transposed to a different key (in addition to playing at the wrong speed
:).

I do not necessarily recommend using the routine as a model for doing
IRQ interrupts -- I had many problems with "IRQ lock", where an IRQ is
continuously latched, and consequently is constantly running the
routine.  I still do not understand what is happening, nor do I have a
solution.

@(A): Memory Map

   $0F00-$0FFF   Starting sine+projection table
   $1000-$2257   Code
   $3000-$4000   Character set
   $6000-$C0FF   Sine, cosine, and projection tables
   $C100-$CFFF   Misc. variables and tables
     
@(A): Contents of dim4.lnx

Note: the code is available in this issue of Commodore Hacking
(Reference: code, SubRef: democode), on the Commodore Hacking MAILSERV
server (Reference: code), and at http://www.msen.com/~brain/pub/dim4.lnx
  
   dim4             Submitted entry for 4k demo contest
   dim4.text        This file, in PETSCII format
   dim4readme-runme Obvious
   dim4.names       Linker name file to use with Merlin 128
   main4.s          Main code for dim4
   objects.s        Code to define/set up objects
   graphics.s       Various graphics routines (lines, fills, etc.)
   music.s          Init and main IRQ music routine
  
============================================================================

@(#)cpu: Exploiting the 65C816S CPU
         by Jim Brain (j.brain@ieee.org)

@(A): Introduction

For a CPU architecture that can trace its roots to the mid 1970's, the 
65XX line has proved very successful.  Finding its way into flagship
systems such as Commodore, Apple, Atari, and other lesser known units,
the CPU has toiled away for years in the single digit megahertz speeds.
Programmers across the world have analyzed the CPU to death and 
documented every last one of its "undocumented" opcodes.  Ask a "coder",
and he or she will rattle off the cycles it takes to do an immediate
load or an absolute store.  In short, the CPU is road tested and well
known.  

However, how much do you know about its "children"?  Yes, in the 1980's,
while Commodore was busy tinkering with the NMOS version of the CPU
designed by Chuck Peddle, Bill Mensch, and the ex-Motorola 6800 design
crew, Bill Mensch started a new company, Western Design Center, and
redesigned the 6502 to use the newer and faster CMOS fabrication
process.  In addition to the new 65C02, Mensch designed an upwardly
compatible 16 bit brother, the 65C816.  Although both were offered to
Commodore, only the 65C02 was used and only in the never produced CBM
Laptop computer.  Apple, however, used the 'C02 in later models of the
Apple II line and placed the 65C816 at the heart of the Apple IIGS
system.

Although Commodore never took advantage of the WDC CPUs, third party
products have offered their speeds to the Commodore community.  Early
models like the TurboMaster and TurboProcess offered 4 MHz speeds to the
Commodore 64 owner, while newer products like the FLASH8 offered 8 MHz
speeds.  The fastest offering thus far is the CMD SuperCPU, offering
speeds of 20 MHz to the Commodore owner.  Of these, the TurboProcess,
the FLASH8, and the CMD SuperCPU all use the 16 bit CPU, the 'C816.

Since the 'C816 is available now to the Commodore user, and with the
SuperCPU poised to provide software compatibility never before achieved,
it is likely that more and more Commodore applications will run on 'C816
equipped machines.  So, why should the Commodore software developer
care?  Sure, the 65C816 will run 6502 based applications in 6502
emulation mode at substantial speed increases, so developers can opt to
continue writing 6502 based applications.  While I encourage developers
to always provide 6502 based versions of applications when possible, 
there are useful features available only in the Native mode of the
65C816.  This article describes some of these features and how to
utilize them.

@(A): Disclaimer

The following information is based on the following resources:

o  Data Sheets on the 65C816S, Western Design Center
o  _Programming the 65816_, by David Eyes and Ron Lichty, 1985, Western
   Design Center.
o  A beta version of the SuperCPU 20 MHz accelerator from CMD.
o  A beta version of the Super Assembler (SAS) 65C816 Assembler, by Jim
   Brain, Distributed by CMD.

Most of the following information is system independent, but any
information specific to the CMD SuperCPU is preliminary and subject to
change.

It is not the intention of this article to detail all the possible
65C816S opcodes nor their addressing modes.  It is also not the
intention of the article to describe the operation of the SAS assembler.
For more information on both of these products, please consult the
manuals listed above.

@(A): Diving Right In

As this article is geared toward the programmer, we're going to
dive right into the new features.  Commodore World issue #12 has an
overview of the CPU for those just arriving on the scene.  For those who
know an index register from an accumulator, read on.

@(A): Overview of Registers

One of the features of operating in Native mode of the CPU is the
enhanced set of registers available to the programmer.  They are also
key to explaining the other features of the CPU.  So, let us go over the
new register set:

        8 bits                   8 bits                  8 bits
------------------------------------------------------------------------
[  Data Bank Register   ][   X Register High    ][   X Register Low*   ]
[  Data Bank Register   ][   Y Register High    ][   Y Register Low*   ]
[          00           ][ Stack Register High  ][ Stack Register Low* ]
                         [   Accumulator High   ][   Accumulator Low*  ]
[ Program Bank Register ][ Program Counter High*][ Program Counter Low*]
[          00           ][ Direct Register High ][ Direct Register Low ]
------------------------------------------------------------------------

* Original NMOS 65XX register set

These registers are referred to in the remainder of the article by their
acronyms, as follows:

Data Bank Register    (DBR)
Program Bank Register (PBR)
X Register High       (XH)
X Register Low        (XL)
Stack Register High   (SH)
Stack Register Low    (SL)
Y Register High       (YH)
Y Register Low        (YL)
Accumulator High      (B)
Accumulator Low       (A)
Program Counter High  (PCH)
Program Counter Low   (PCL)
Direct Register High  (DH)
Direct Register Low   (DL)

In addition, the 16 bit combination of B:A is called C, the 16 bit X and
Y registers are called simply X and Y, the 16 bit Direct Register is
called simply D, and the 16 bit Stack Register is called S.

One more register requires discussion before we can delve into
programming the '816: the Status Register (P).

Bit:Description
 
7     N flag
6     V flag
5     1 in Emulation mode
      M flag in Native mode (memory select bit)
         0 = 16 bit accumulator
         1 = 8 bit accumulator
4     B flag in Emulation mode
      X flag in Native mode (Index Register Select)
         0 = 16 bit X and Y registers
         1 = 8 bit X and Y registers
3     D flag
2     I flag
1     Z flag
0     C flag
      E flag (Emulation flag)  (Can not be accessed directly)
         0 = Native mode
         1 = Emulation mode

It is important to note that there are 3 more flags available in the
Native mode version of the status register.  Since there were 7 flags
used before, how did WDC squeeze in the extra flags?  Well, the E flag
cannot be accessed or seen in the status register.  The only way to
change it is to set up the C flag to the intended state of the E flag
and issue the eXchange Carry and Emulation flags (XCE) opcode.  Another
flag, M, takes the place of the static 1 state in the old status
register.  M controls the length of the accumulator.  The last flag, the
X flag, controls the length of both index registers.  Note that this
flag takes the place of the B flag.  Thus, the B flag is unavailable in
Native mode.  Since the B flag is used to determine whether a hardware
IRQ or a software BRK opcode caused an IRQ interrupt, the Native mode
provides separate interrupt vectors for BRK and hardware IRQs.
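
The carry/emulation exchange is easy to model.  The following Python toy
(the Flags class is invented for illustration and models nothing but the
C and E bits) mimics the usual CLC / XCE sequence for entering Native
mode and SEC / XCE for returning to Emulation mode:

```python
class Flags:
    """Toy model of the C and E flags only."""

    def __init__(self):
        self.c = 0   # carry flag
        self.e = 1   # emulation flag; the '816 starts in Emulation mode

    def clc(self):
        self.c = 0

    def sec(self):
        self.c = 1

    def xce(self):
        # eXchange Carry and Emulation flags
        self.c, self.e = self.e, self.c

f = Flags()
f.clc()
f.xce()                  # now Native mode; C holds the old E value
native = (f.e, f.c)
f.sec()
f.xce()                  # back to Emulation mode
emulation = (f.e, f.c)
print(native, emulation)
```

Because XCE is an exchange, the carry flag afterwards tells you which
mode the CPU was in before the switch.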

The X and M flags are especially important in Native mode, so much so
that each programmer will become intimately familiar with these flags. 
When a register is selected to be 8 bits wide, it emulates the operation
of the register in Emulation mode.  However, when the register is
flipped into 16 bit operation, its length doubles everywhere.  For
instance, a push of the accumulator with the M flag reset causes 2 bytes
to appear on the stack. Likewise, an immediate load of the accumulator
will require a 3 byte instruction: one for the opcode, and a 2 byte
operand.  This opens up one of the nastiest gotchas on the chip, but
we'll detail this later in the article.

@(A): More Memory

As you may be aware, the Native mode of the '816 allows the programmer
contiguous access to up to 16 megabytes of RAM.  This access doesn't
involve tricks such as DMA, page flipping, or RAM "windows".  At any
given point in time, an application can access a memory location and
request a memory location more than 64 kB higher in the next
instruction.  In order to access the new memory locations using standard
6502 addressing modes, the new DBR and PBR registers have been added.
The PBR serves as the 3rd byte of the PC, allowing code to run at any
location in memory.  The DBR register functions as the 3rd byte for
memory accesses in addressing modes like absolute mode.  Of course,
there are restrictions, like the inability to execute code that crosses
a 64kB boundary, but these restrictions can be overcome, as you'll see
below.  

For clarity, we will refer to the 3rd bytes of an address as the "bank",
and refer to the 2 lowest bytes of an address as the offset.  Alternate
names include "segment" and offset, but that naming scheme was
previously used with the Intel 80X86 CPU line and carries with it many
bad connotations.

Since memory addresses can now be 3 bytes wide and contain 6 hexadecimal
digits, an obvious representation would be $xxxxxx.  However, many '816
references write the address as a two part quantity, with the bank
register and the 16 bit offset separated by a colon, ":".  Therefore,
$xxyyyy and $xx:yyyy are equivalent.  In this article, the former
notation is used for emphasis and because ":" notation also brings up
bad connotations from Intel 80X86 CPU line.

@(A): Increased Stack

As the S register is now 16 bits wide, the stack can now reside in all
of bank 0, giving the programmer 64 kB of stack area.  As well, the S
register can be set to any location in bank 0.  This allows one to start
the stack at any address in bank 0, page-aligned or not.

@(A): Enhancements to Old Addressing Modes

Even though the '816 supports the traditional 14 addressing modes of the
6502, it extends some of them to handle the extra features in the '816. 
Note that the opcodes and parameters have not changed for these
addressing modes; rather the way the CPU treats them differs slightly. 
Of special note is the term "zero-page", which has been expanded into
"Direct Mode".  Let's take a look at what changes you can expect.

@(A): Absolute Modes

In the 65XX CPU, modes such as absolute and its indexed siblings each
could access a memory location in the 64 kB memory map.  In the '816,
these modes are now capable of accessing memory above and beyond 64 kB. 
When accessing memory, the DBR register is prepended to the address
being accessed, thus forming a 24 bit effective address.  When
transferring control, the PBR register is prepended.  Thus, if the DBR
contains a $05, the following:

   ad ff ff     lda $ffff

would load a value into the .A register from $05ffff.  If the M flag is
set to 16 bit mode, the 16 bit value in $05ffff and $060000 will be
loaded.  If the M flag is set to 8 bit, only $05ffff will be loaded. 
Notice that this example also shows "temporary-bank-incrementing". 
While loading a 16 bit value with the instruction above, the DBR is
"temporarily" incremented to allow access of data from bank $06.  The
actual DBR is left unchanged, so the next instruction will find the DBR
back at $05.

You'll rarely see such bank changes when accessing data as above, but it
is common when using indexing modes.  With the DBR at $05, executing:

   a2 ff ff     ldx #$ffff
   bd ff ff     lda $ffff,x

will load values from $06fffe and possibly $06ffff, depending on the
size of the accumulator.  

When using absolute mode on opcodes like JMP and JSR, the PBR register
is used to form the 24 bit effective address.  Unlike the DBR, the PBR
does not exhibit "temporary-bank-incrementing".  It simply rolls over
within the same bank.  Keep that in mind.
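
The address arithmetic described above can be written out as a short
Python sketch.  The function names are made up for illustration, and the
model covers only address formation, not timing or the actual loads:

```python
def absolute_data_address(dbr, operand, index=0):
    """24 bit effective address for absolute and absolute indexed data access.

    The sum is allowed to carry into the next bank (the "temporary
    bank incrementing" behavior).
    """
    return ((dbr << 16) + operand + index) & 0xFFFFFF

def absolute_jump_address(pbr, operand):
    """24 bit target for absolute JMP/JSR: the offset stays in the PBR bank."""
    return (pbr << 16) | (operand & 0xFFFF)

first = absolute_data_address(0x05, 0xFFFF)            # lda $ffff
second = (first + 1) & 0xFFFFFF                        # high byte of a 16 bit load
indexed = absolute_data_address(0x05, 0xFFFF, 0xFFFF)  # lda $ffff,x with X=$ffff
print(f"{first:06x} {second:06x} {indexed:06x}")
```

This prints 05ffff 060000 06fffe, matching the examples in the text.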

@(A): Direct Modes

To enhance the capabilities of the '816, the CPU offers Direct Mode,
which is a superset of "zero-page mode".  Basically, all z-page opcode
operands are added to the D register to form a 16 bit effective address.
This allows using the entire bank 0 as effective z-page memory.  With
the D register set to $0200, executing:

   a5 10        lda $10

would load the accumulator from $000210 (and possibly $000211).  Direct
mode is not allowed to increment into bank 1.  If the above instruction
is executed while D = $ffff, the accumulator would start accessing data
from $00000f.  This highlights an important yet subtle change.  No
longer is lda $10 guaranteed to access data from $000010.  It will
access data from D + $10.

Indexing changes little with respect to Direct mode.  After the D
register is added to the 8-bit offset, the appropriate register is
added, and the effective address is normalized to fall within bank 0. 
There is no way to reference outside bank 0 in Direct mode.  Even if
index registers are set to 16 bit mode and hold $ffff, the instruction
will access bank 0.
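
A one-function Python sketch of Direct mode address formation in Native
mode, with the wrap into bank 0 made explicit (the function name and the
sample D values are illustrative):

```python
def direct_address(d, offset, index=0):
    """Effective address for Direct and Direct indexed modes.

    D, the 8 bit operand, and the optional index register are summed and
    the result is truncated to 16 bits, so it always lands in bank 0.
    """
    return (d + offset + index) & 0xFFFF

print(f"{direct_address(0x0200, 0x10):06x}")          # lda $10 with D=$0200
print(f"{direct_address(0xFFF8, 0x10):06x}")          # wraps around in bank 0
print(f"{direct_address(0x0200, 0x10, 0xFFFF):06x}")  # lda $10,x with X=$ffff
```

The three lines print 000210, 000008, and 00020f.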

@(A): Direct Indexed Indirect Mode

Most programmers forget, but this mode executes in two parts.  Now, it
becomes important.  In the first part, the 8-bit offset is added to the
D register and then the X register.  The result is normalized to 16
bits, and two values are accessed from bank 0.  The second part takes
those two bytes as the effective address, and PREPENDS the DBR register
to form a final address.  In this way, you can access memory outside
bank 0 with this mode, but you must store the address to access in bank
0.  Read that sentence again.

@(A): Direct Indirect Indexed Mode

Like its relation above, this mode works in two parts.  In part 1, the 8
bit offset is added to the D register and normalized to 16 bits.  Two
bytes are accessed from bank 0, and then the 16 bit value returned is
appended to the DBR to form a 24 bit effective address.  In part 2, the
Y register is added to this effective address to form the final address
for access.  As above, part 1 cannot access outside bank 0, but part 2
can.
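
Both two-part modes can be traced with a small Python model.  Memory is a
plain dictionary here, and the function names and sample values are
invented for illustration:

```python
def read16_bank0(mem, addr):
    """Read a 16 bit pointer from bank 0, wrapping at the bank boundary."""
    lo = mem.get(addr & 0xFFFF, 0)
    hi = mem.get((addr + 1) & 0xFFFF, 0)
    return lo | (hi << 8)

def indexed_indirect(mem, d, dbr, offset, x):
    """Direct Indexed Indirect: pointer at D+offset+X, DBR prepended."""
    pointer = read16_bank0(mem, d + offset + x)
    return (dbr << 16) | pointer

def indirect_indexed(mem, d, dbr, offset, y):
    """Direct Indirect Indexed: pointer at D+offset, DBR prepended, Y added."""
    pointer = read16_bank0(mem, d + offset)
    return ((dbr << 16) + pointer + y) & 0xFFFFFF

mem = {0x0210: 0x00, 0x0211: 0xC0}      # pointer $c000 stored in bank 0
a = indexed_indirect(mem, 0x0200, 0x05, 0x0C, 0x04)
b = indirect_indexed(mem, 0x0200, 0x05, 0x10, 0x4000)
print(f"{a:06x} {b:06x}")
```

With D=$0200 and DBR=$05 the first form reaches $05c000, and the second
form, with Y=$4000, carries on into bank $06 at $060000.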

@(A): Stack Mode (Implied)

Usually lumped in with the Implied addressing mode by most 6502
developers, stack mode has changed to accommodate the new widths of the
registers.  Depending on the width of the register, stack operations
will push and pull either 1 or 2 bytes.  This can cause problems if you
push a 16 bit register and try to pull it off as an 8-bit register. 
Caveat Emptor.
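
The mismatch is easy to demonstrate with a toy stack in Python.  The
Stack class below is invented for illustration and assumes the usual
convention that S points at the next free byte and the high byte of a 16
bit value is pushed first:

```python
class Stack:
    """Toy model of the '816 stack in bank 0."""

    def __init__(self, s=0x01FF):
        self.mem = {}
        self.s = s

    def _push_byte(self, b):
        self.mem[self.s] = b & 0xFF
        self.s = (self.s - 1) & 0xFFFF

    def _pull_byte(self):
        self.s = (self.s + 1) & 0xFFFF
        return self.mem[self.s]

    def push(self, value, bits):
        if bits == 16:
            self._push_byte(value >> 8)      # high byte goes on first
        self._push_byte(value)

    def pull(self, bits):
        value = self._pull_byte()            # low byte comes off first
        if bits == 16:
            value |= self._pull_byte() << 8
        return value

st = Stack()
st.push(0x1234, 16)        # pushed with a 16 bit register
got = st.pull(8)           # pulled after switching to 8 bits
leftover = 0x01FF - st.s   # bytes still sitting on the stack
print(hex(got), leftover)
```

Only the low byte $34 comes back, and one stray byte stays on the stack
to confuse the next pull or RTS.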

@(A): Immediate Mode

In Emulation mode, immediate mode was simple.  You specified an 8 bit
immediate value to be loaded into a register.  In Native mode, however,
registers can be 16 bits.  Everyone knows the opcode can do an immediate
8-bit load, but what opcode performs a 16-bit immediate load?  Answer:
the same opcode!  If the register is set to 8 bits via the X or M flags,
the immediate load on that register will pull in 8-bits.  If the
register is set to 16 bits, the instruction will load a 16 bit value. 
The effects of this change are monumental.  An 8-bit immediate load
requires 2 bytes, while a 16 bit load requires 3.  

This presents some problems.  Since neither the opcode nor the mnemonic
differs between the two forms, the assembler cannot tell which form is
required from context.  The developer must tell the assembler which form
to use by use of assembler directives.  However, this doesn't guarantee
success.  The developer must ensure that the flags are set correctly
before executing an immediate load of any register.  Improper settings
will either cause the instruction to pull the next opcode into the high
byte of the register or cause the high byte of the intended register
value to be executed as an opcode.  In my biased opinion, this is
severely shortsighted.  I would rank this as the number one bug that
'816 developers will face.  However, simple macros employed in your
assembler can help minimize this problem.
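
To make the problem concrete, here is a toy Python decoder that knows
only two opcodes, LDA immediate ($A9) and NOP ($EA), and is told the
accumulator width the CPU will have at run time.  It is a sketch for
illustration, not a real disassembler:

```python
def decode(code, acc_bits):
    """Decode a byte string, sizing LDA # operands by the accumulator width."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == 0xA9:                              # LDA immediate
            n = acc_bits // 8                       # operand bytes: 1 or 2
            operand = int.from_bytes(code[pc + 1:pc + 1 + n], "little")
            width = 2 * n
            out.append(f"lda #${operand:0{width}x}")
            pc += 1 + n
        elif op == 0xEA:                            # NOP
            out.append("nop")
            pc += 1
        else:                                       # not modelled here
            out.append(f"??? ${op:02x}")
            pc += 1
    return out

code = bytes([0xA9, 0x34, 0x12, 0xEA])   # assembled as a 16 bit lda #$1234
print(decode(code, 16))
print(decode(code, 8))
```

With a 16 bit accumulator the bytes decode as lda #$1234 followed by nop.
With an 8 bit accumulator the same bytes decode as lda #$34, and the $12
that was meant to be data is executed as an opcode.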


