The first object is a 4D cube, often called a hypercube. You can see a small cube inside of and connected to a larger cube. If you look a little closer, you may notice that in-between the two cubes are some more cubes. (When you slice a 3D cube, you get a 2D cube -- a square. When you take a slice of the hypercube, you get a 3D cube.) As it rotates along its fourth coordinate, the cube folds in upon itself. One way to look at it is that the cubes start to change positions -- after 180 degrees of rotation the inside cube is on the outside and the outside cube is on the inside. The hypercube has literally turned inside-out.

The program works fine in PAL and NTSC, although PAL folks will get the tune playing at the wrong speed and transposed into a different key.

Oh yes, one thing I really like is the background on the second object -- on my 1084 it looks like rope. This is a consequence of the way VIC generates colors -- extra colors outside of the normal 16 are being generated, because two hires colors are being placed next to each other. If you look at it on a black and white monitor, you will just see thick diagonal lines. This very much surprised me when I first saw it! Find the March 1985 IEEE Spectrum article for more information on why VIC behaves this way.

Finally, you may notice some little glitches from time to time in drawing the 4D objects. That is my safety valve and keeps the program from literally destroying itself, in sometimes spectacular fashion. Oh well.

@(A): A Handy Glossary

Polygon: A rectilinear closed plane figure of any number of sides.

Vector: A directed line segment having magnitude and direction.

I do not know how the term "filled vector" came into vogue, but it is meaningless, not to mention a little silly -- what would an "unfilled vector" look like, two points with an arrow at one end? One may as well talk about filled lines and filled points.
Thus, I plead with the community to not refer to polygons as vectors and filled polygons as filled vectors. Polygons need your help, and have been discriminated against for too long now. Just one small donation on your part of a correct mathematical reference can help save the lives of one, ten, even hundreds of polygons, both abroad and here at home. Individuals wanting to contribute more may sponsor individual polygons; a kit will be sent to you containing the name of the polygon and at regular intervals a picture of the polygon will be sent to you, so you may monitor the progress of your particular polygon. Some polygons are created unclosed, and some do not get the necessary ink or programming skill to properly fill them, but be it a quadrilateral or decagon, trapezium or parallelogram, with your help we can eventually make all polygons closed and full, for a better, more civilized world. Thank you for your time, and God bless all the little geometrical constructions, no matter their dimension or configuration.

@(A): The Idea

This program displays a representation of some four-dimensional objects -- four 4D objects, as a matter of fact, each one of them a 4D analog of a three-dimensional object. Each screen contains four symmetry-related 3D objects and one 4D analog of the object, rotated and projected from 4D into 2D.

To describe the four-dimensional objects is not so tough. The 4D cube (the hypercube) is the first to be displayed, and it is the starting point for the later objects. It is also, I think, the easiest to see what is going on with. There is nothing really special about four dimensions -- with a 3D object each point is defined by three coordinates, say (x,y,z). A 4D point has four coordinates, say (w,x,y,z). The 3D cube has eight vertices at:

        (+/-1, +/-1, +/-1)

Therefore a very natural extension into four dimensions would be:

        (+/-1, +/-1, +/-1, +/-1)

For a total of sixteen vertices.
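As a quick check of the vertex counts, the sign patterns can be enumerated directly. This is a minimal Python sketch and not code from dim4; the function name cube_vertices is made up for illustration.

```python
from itertools import product

def cube_vertices(dim):
    """Return every point whose `dim` coordinates are each +1 or -1."""
    return list(product((1, -1), repeat=dim))

print(len(cube_vertices(3)))   # 8 vertices for the 3D cube
print(len(cube_vertices(4)))   # 16 vertices for the hypercube
```

Each extra dimension doubles the count, since every existing vertex appears once with the new coordinate at +1 and once at -1.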
To look at it another way:

        ( 1, +/-1, +/-1, +/-1)
        (-1, +/-1, +/-1, +/-1)

That is, at w=1 we get a cube, and at w=-1 we get another cube. In fact, if we take a "slice" of our hypercube, we get a 3D cube. Compare to taking a slice of a 3D cube, where you get a square (a 2D cube, if you will). This is demonstrated when the code first starts up -- the program "grows" a cube from 0D -> 1D -> 2D -> 3D -> 4D. At the 4D stage there is a smaller cube inside of a larger cube, with cubes in-between the two. (If you are curious as to how I did the "growing", see the code description below for a few details.) Next, as the cube begins to rotate, it "folds in" on itself (or, if you like, it unfolds!).

Rotations are no different than they have always been. To do a 3D rotation, recall that the object is rotated in the x-y plane, the y-z plane, and the x-z plane. To rotate in the x-y plane by an angle phi:

        xnew = x*cos(phi) - y*sin(phi)
        ynew = x*sin(phi) + y*cos(phi)

Well, any two coordinates form a plane, so in four dimensions there are just twice as many planes to rotate in. In particular, the program does rotations in the usual planes (x-y, y-z, x-z) and also does a single rotation in the w-x plane, that is,

        wnew = w*cos(phi) - x*sin(phi)
        xnew = w*sin(phi) + x*cos(phi)

I didn't feel any great need to rotate through extra planes involving the w-coordinate (the w-y and w-z planes). Notice that at phi=90 degrees the w and x coordinates trade places (up to a sign), and at phi=180 degrees they go to their negatives. This means that as phi is increased, in essence the inner and outer cubes are going to change positions, and this then explains the unfolding that is seen on the screen.

The R/S key goes into 3D mode by zeroing out the angle increment for the w-x plane. In effect, the 4D rotation is frozen. The F4 key zeros out the x-y, y-z, and x-z angle increments, leaving only the w-x rotation. F4 followed by R/S will therefore freeze the image completely -- use D or 4 to get it going again.
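The w-x rotation formulas above can be tried out numerically. The following is a small Python sketch using floating point, not the table-driven method dim4 itself uses; the function name rotate_wx and the sample point are illustrative assumptions.

```python
import math

def rotate_wx(point, phi_degrees):
    """Rotate a (w, x, y, z) point in the w-x plane; y and z are untouched."""
    w, x, y, z = point
    phi = math.radians(phi_degrees)
    wnew = w * math.cos(phi) - x * math.sin(phi)
    xnew = w * math.sin(phi) + x * math.cos(phi)
    return (wnew, xnew, y, z)

p = (1.0, 2.0, 3.0, 4.0)          # arbitrary sample point
print(rotate_wx(p, 90))           # w and x trade places: roughly (-2, 1, 3, 4)
print(rotate_wx(p, 180))          # w and x are negated:  roughly (-1, -2, 3, 4)
```

Because the 180 degree rotation sends w to -w, a vertex that started on the w=1 cube ends up on the w=-1 cube, which is the inner/outer swap seen on screen.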
There is still the issue of visualizing a 4D object. This should not be surprising -- after all, we have all seen 3D objects drawn on a 2D computer screen (or a 2D piece of paper). If we can get from 3D to 2D then we ought to be able to get from 4D to 3D (and from there into 2D). Recall that a 3D projection draws a light ray from the object, through a little pinhole located at the origin, and finds the intersection with a piece of film located at z=d, a constant:

        L = t * (x1,y1,z1)

is my light ray, so t=d/z1 gives the intersection with the film of a ray from the point (x1,y1,z1) passing through the origin. So this is very easy to extend into 4D -- simply project from 4D into 3D through the origin:

        L = t * (w1,x1,y1,z1)

        let t=d/w1 -> L3 = (d, d/w1 * x1, d/w1 * y1, d/w1 * z1)

The x,y,z coordinates are then projected from 3D into 2D, again through the origin. This gives a "perspective" view of the 4D object.

Now, what is the 4D analog of a tetrahedron, or an octahedron? I reasoned them out by trying to think of what 3D objects I could derive starting from a cube. That is, taking a cube, and cutting away pieces of it. For instance, to do the 14-sided guy, simply take the midpoint of each line segment on the cube -- this has the effect of cutting off the corners of the cube. By defining things in this way, it is fairly straightforward to extend the objects into four dimensions. (I was happiest to realize how to do a tetrahedron.) See the file objects.s for more details on the individual objects.

Naturally each has some similarity to the cube: there is an inner object (e.g. a tetrahedron) and an outer. The two are connected, and each set of connections forms another object, so that, for instance, there are tetrahedrons in-between the inner and outer tetrahedrons.

Finally, to help in visualizing the objects, I stuck a dotted line capability in.
The dotted lines in general connect the "inner" and "outer" 3D objects -- turning them off lets you then see the two objects interact. (The third object was mighty impressive-looking before I added these guys! :)

@(A): The Code

Now, it is my considered opinion that the code is awfully well documented, so there isn't too much to say, but a few general things are worth mentioning.

"Growing" the points is really easy -- simply start each coordinate at zero, and gradually increase it out to its final value. By doing this first with the x-coordinates, then the y-coords, then z, then w, the cube grows a dimension at each step. I don't do anything fancy with the other objects -- all coordinates are grown equally, so the objects grow outwards from the origin (as opposed to some sort of zoom effect).

Each 4D object is drawn in a 12x12 character grid, which gives a 96x96 pixel drawing area, and takes up the first 144 characters. Each 3D object uses a 5x5 character grid, giving 40x40 pixels, and taking up the next 4*25=100 characters, for a total of 244 so far. In eight of the remaining 12 characters are four patterns and their EOR #$FF complements, which are used in the background tilings and are used indirectly in the pattern fills.

Since the final x-y coordinates can range from -48..48, this places a restriction on the initial values for the coordinates. For purposes of accuracy and such, coordinates must of course be scaled, so that while a coordinate like (1,1,1,1) is convenient for thinking, a coordinate like (16,16,16,16) is much better suited to the implementation -- that is, the original coordinate scaled by a factor of sixteen or so. The table range restricts this scaling factor: the 4D coordinate with largest length that I use is (1,1,1,1), which has length 2. Thus, after rotation, it is possible that it will lie on an axis with coordinate, say (2,0,0,0). Since coordinates must not exceed 48 in the implementation, this suggests a scaling factor of 24.
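The scaling bound and the two-stage projection described earlier can be checked with a few lines of arithmetic. This is a hedged Python sketch in floating point, not the table lookup dim4 performs; the function names, the constant names, and the sample point are assumptions for illustration, and the sketch assumes the depth coordinates w and z are nonzero.

```python
import math

SCALE = 24   # scaling factor suggested in the text
LIMIT = 48   # largest coordinate magnitude allowed in the implementation

def length(point):
    """Euclidean length of a point in any number of dimensions."""
    return math.sqrt(sum(c * c for c in point))

def project_4d_to_3d(point, d):
    """Project (w, x, y, z) through the origin onto the 'film' at w = d."""
    w, x, y, z = point
    t = d / w
    return (t * x, t * y, t * z)

def project_3d_to_2d(point, d):
    """Project (x, y, z) through the origin onto the film at z = d."""
    x, y, z = point
    t = d / z
    return (t * x, t * y)

# A rotation never changes length, so no coordinate can exceed the length.
print(SCALE * length((1, 1, 1, 1)))          # 48.0, exactly at the limit

p3 = project_4d_to_3d((4.0, 1.0, 1.0, 2.0), d=2.0)
print(p3)                                    # (0.5, 0.5, 1.0)
print(project_3d_to_2d(p3, d=2.0))           # (1.0, 1.0)
```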
As a practical point, the points never really hit this maximum, so in principle a larger scaling factor could be used. Alternatively the projection routine can pick up the slack, which is what dim4 uses.

The first smart thing I did was to ditch the old method of computing rotations. Instead of calculating a big rotation matrix, I calculate some big tables of f_x(s) = x*sin(s), and let the angle s range from 0..127. To get a table of cos(s) I simply periodically extend the sine table by copying the first 32 bytes of the table into the 128-159 positions -- cos(s) is thus sin(s+32). (I take advantage of the fact that sin(s) and cos(s) differ by a phase shift of pi/2, which is 32 steps when a full circle is 128 steps. Were I smart I would have taken advantage of the reflection symmetry of sin/cos, and saved another 64 bytes. Oh well.) This then leaves 96 bytes for a projection table, which is just what I need for the 4D object. Thus I can mash tables of x*sin(s), x*cos(s), and my projection table of f_x(z) = d/(z-z0) * x into a single page. This page is then extended from $6000 to $C000, i.e. giving 96 tables, for a total of 24k.

Accessing the tables is now trivial: store x+$60 in the high byte of a zero page pointer, the low byte contains the offset into the table (0 for the sine table, 32 for the cosine table, and 160 for the projection table), and do an LDA (ZP),Y to get the right value. Thus rotations and projections are now very fast and very compact.

Note that it isn't really necessary to generate a complete table of sines and cosines. For instance, 12k of tables (or 6k or whatever) could be used, and the final result simply multiplied by two, or four. Even though the final coordinates might range from -48..48, calculations don't need to be done using the full range.

The line routine is the good 'ol chunky line routine from the last cube3d program. It of course had to be modified to work with the two buffers and such.
I removed a bunch of really redundant code that was in there (REALLY redundant), especially in the actual drawing part (macros XSTEP and YSTEP -- lines are commented out with a '*'). I also added a dotted-line capability (it only takes a few extra instructions), to make things easier to see.

Only a single 3D object is actually drawn -- the others are generated via symmetry (reflections through x=0 and y=0). Since the 3D objects are drawn on a much smaller grid, they need to be scaled down a bit. Instead of writing separate routines to deal with the 3D and 4D objects, I simply set the 4D coordinate of each point in the 3D object to some appropriate number. Recall that in a 3D projection, the farther away from you the object is, the smaller it gets. This is the same idea -- the object is pushed down the 4D axis, and this has the effect of shrinking the object upon projection.

You may have noticed that the 3D objects tend to avoid the center of the screen -- this is a consequence of the random number generator I coded up (and did not test for spectral properties or anything like that :). Originally I was going to place things in a random row and column, but then things just clumped along a diagonal line :). I will also say that the SPLAT routine caused me many days of headaches -- whose idea was it to put color memory so close to a CIA? :)

One thing I had to prune out was a routine which draws circles as the sine/cosine tables are being set up. It is kind of neat and gave me something to watch while the code was setting up, and also was a check that the trig tables were being set up correctly. Anyway, all it does is to draw concentric circles of progressively larger radii, for a sort of tunnelish-looking thing I suppose.

There is a little "failsafe" in the projection routine. If coordinates are out of range (greater than 96 or 40) after projection, they are set to the origin.
At least one of the objects screws up from time to time (the octahedron is the main culprit I think), and I think what happens is that the line routine thinks it needs to draw a lot more points than it really needs to. So it happily moves along sticking bytes into the trig/projection tables, and even makes its way up to VIC, SID and the CIAs! Once, it actually started pegging the SID volume register or something, because there would be a periodic loud ticking from the speaker. Eventually the code just grinds to a halt or else completely hoses the system -- hence, the failsafe :).

Finally, the very first lines of the code redirect the BASIC vector at $0302/$0303 and JMPs to the NMI RS/RESTORE routine (although a BRK would probably have sufficed). This is the only way I could get the code to work with the cruncher -- without it, the program goes into "IRQ lock". Crossbow of Crest suggested that ABCruncher does not put a CLI at the end of its crunching routine, and that this can cause problems, most notably with the CIAs.

It took 10-15 hours to get things to crunch and work correctly. In hindsight, I can think of a bunch of things that could have been easily done to make it work, but at the time I was sure relieved when it finally got down to 4095 bytes. Moral: A little thinking early on saves massive time and effort down the road.

@(A): The Music

Finally, a word about the music. Originally I was going to construct a series of chords which I could modulate between in a fairly flexible way. I was then going to break up the chords in a nice way and move between them randomly. But then it occurred to me that I already knew a piece of music which was a series of broken chords and sounded infinitely more cool than anything I was going to accidentally write, so I used it instead. Even better, they are four-note "chords", broken into four groups of four notes each -- too good to pass up.
Notes are looked up in a frequency table, thus on my PAL 64 the music gets transposed to a different key (in addition to playing at the wrong speed :).

I do not necessarily recommend using the routine as a model for doing IRQ interrupts -- I had many problems with "IRQ lock", where an IRQ is continuously latched, and consequently is constantly running the routine. I still do not understand what is happening, nor do I have a solution.

@(A): Memory Map

        $0F00-$0FFF   Starting sine+projection table
        $1000-$2257   Code
        $3000-$4000   Character set
        $6000-$C0FF   Sine, cosine, and projection tables
        $C100-$CFFF   Misc. variables and tables

@(A): Contents of dim4.lnx

Note: the code is available in this issue of Commodore Hacking (Reference: code, SubRef: democode), on the Commodore Hacking MAILSERV server (Reference: code), and at http://www.msen.com/~brain/pub/dim4.lnx

        dim4               Submitted entry for 4k demo contest
        dim4.text          This file, in PETSCII format
        dim4readme-runme   Obvious
        dim4.names         Linker name file to use with Merlin 128
        main4.s            Main code for dim4
        objects.s          Code to define/set up objects
        graphics.s         Various graphics routines (lines, fills, etc.)
        music.s            Init and main IRQ music routine

============================================================================

@(#)cpu: Exploiting the 65C816S CPU
by Jim Brain (j.brain@ieee.org)

@(A): Introduction

For a CPU architecture that can trace its roots to the mid 1970's, the 65XX line has proved very successful. Finding its way into flagship systems such as Commodore, Apple, Atari, and other lesser known units, the CPU has toiled away for years in the single digit megahertz speeds. Programmers across the world have analyzed the CPU to death and documented every last one of its "undocumented" opcodes.
Ask a "coder", and he or she will rattle off the cycles it takes to do an immediate load or an absolute store. In short, the CPU is road tested and well known.

However, how much do you know about its "children"? Yes, in the 1980's, while Commodore was busy tinkering with the NMOS version of the CPU designed by Chuck Peddle, Bill Mensch, and the ex-Motorola 6800 design crew, Bill Mensch started a new company, Western Design Center, and redesigned the 6502 to use the newer and faster CMOS fabrication process. In addition to the new 65C02, Mensch designed an upwardly compatible 16 bit brother, the 65C816. Although both were offered to Commodore, only the 65C02 was used and only in the never-produced CBM Laptop computer. Apple, however, used the 'C02 in later models of the Apple II line and placed the 65C816 at the heart of the Apple IIGS system.

Although Commodore never took advantage of the WDC CPUs, third party products have offered their speeds to the Commodore community. Early models like the TurboMaster and TurboProcess offered 4 MHz speeds to the Commodore 64 owner, while newer products like the FLASH8 offered 8 MHz speeds. The fastest offering thus far is the CMD SuperCPU, offering speeds of 20 MHz to the Commodore owner. Of these, the TurboProcess, the FLASH8, and the CMD SuperCPU all use the 16 bit CPU, the 'C816.

Since the 'C816 is available now to the Commodore user, and with the SuperCPU poised to provide software compatibility never before achieved, it is likely that more and more Commodore applications will run on 'C816 equipped machines. So, why should the Commodore software developer care? Sure, the 65C816 will run 6502 based applications in 6502 emulation mode at substantial speed increases, so developers can opt to continue writing 6502 based applications. While I encourage developers to always provide 6502 based versions of applications when possible, there are useful features available only in the Native mode of the 65C816.
This article describes some of these features and how to utilize them.

@(A): Disclaimer

The following information is based on the following resources:

o  Data Sheets on the 65C816S, Western Design Center
o  _Programming the 65816_, by David Eyes and Ron Lichty, 1985, Western Design Center.
o  A beta version of the SuperCPU 20 MHz accelerator from CMD.
o  A beta version of the Super Assembler (SAS) 65C816 Assembler, by Jim Brain, Distributed by CMD.

Most of the following information is system independent, but any information specific to the CMD SuperCPU is preliminary and subject to change.

It is not the intention of this article to detail all the possible 65C816S opcodes nor their addressing modes. It is also not the intention of the article to describe the operation of the SAS assembler. For more information on both of these products, please consult the manuals listed above.

@(A): Diving Right In

As this article is geared toward the programmer, we're going to dive right into the new features. Commodore World issue #12 has an overview of the CPU for those just arriving on the scene. For those who know an index register from an accumulator, read on.

@(A): Overview of Registers

One of the features of operating in Native mode of the CPU is the enhanced set of registers available to the programmer. They are also key to explaining the other features of the CPU.
So, let us go over the new register set:

         8 bits                   8 bits                   8 bits
------------------------------------------------------------------------
[ Data Bank Register    ][ X Register High       ][ X Register Low*      ]
[ Data Bank Register    ][ Y Register High       ][ Y Register Low*      ]
[          00           ][ Stack Register High   ][ Stack Register Low*  ]
                         [ Accumulator High      ][ Accumulator Low*     ]
[ Program Bank Register ][ Program Counter High* ][ Program Counter Low* ]
[          00           ][ Direct Register High  ][ Direct Register Low  ]
------------------------------------------------------------------------
* Original NMOS 65XX register set

These registers are referred to in the remainder of the article by their acronyms, as follows:

        Data Bank Register (DBR)       Program Bank Register (PBR)
        X Register High (XH)           X Register Low (XL)
        Stack Register High (SH)       Stack Register Low (SL)
        Y Register High (YH)           Y Register Low (YL)
        Accumulator High (B)           Accumulator Low (A)
        Program Counter High (PCH)     Program Counter Low (PCL)
        Direct Register High (DH)      Direct Register Low (DL)

In addition, the 16 bit combination of B:A is called C, the 16 bit X and Y registers are called simply X and Y, the 16 bit Direct Register is called simply D, and the 16 bit Stack Register is called S.

One more register requires discussion before we can delve into programming the '816: the Status Register (P)

        Bit:  Description

         7    N flag
         6    V flag
         5    1 in Emulation mode
              M flag in Native mode (memory select bit)
                0 = 16 bit accumulator
                1 = 8 bit accumulator
         4    B flag in Emulation mode
              X flag in Native mode (Index Register Select)
                0 = 16 bit X and Y registers
                1 = 8 bit X and Y registers
         3    D flag
         2    I flag
         1    Z flag
         0    C flag
              E flag (Emulation flag) (cannot be accessed directly)
                0 = Native mode
                1 = Emulation mode

It is important to note that there are 3 more flags available in the Native mode version of the status register. Since there were 7 flags used before, how did WDC squeeze in the extra flags? Well, the E flag cannot be accessed or seen in the status register.
The only way to change it is to set the C flag to the intended state of the E flag and issue the eXchange Carry and Emulation flags (XCE) opcode. Another flag, M, takes the place of the static 1 state in the old status register. M controls the length of the accumulator. The last flag, the X flag, controls the length of both index registers. Note that this flag takes the place of the B flag. Thus, the B flag is unavailable in Native mode. Since the B flag is used to determine whether a hardware IRQ or a software BRK opcode caused an IRQ interrupt, the Native mode provides separate interrupt vectors for BRK and hardware IRQs.

The X and M flags are especially important in Native mode, so much so that each programmer will become intimately familiar with these flags. When a register is selected to be 8 bits wide, it emulates the operation of the register in Emulation mode. However, when the register is flipped into 16 bit operation, its length doubles everywhere. For instance, a push of the accumulator with the M flag reset causes 2 bytes to appear on the stack. Likewise, an immediate load of the accumulator will require a 3 byte instruction: one for the opcode, and a 2 byte operand. This opens up one of the nastiest gotchas on the chip, but we'll detail this later in the article.

@(A): More Memory

As you may be aware, the Native mode of the '816 allows the programmer contiguous access to up to 16 megabytes of RAM. This access doesn't involve tricks such as DMA, page flipping, or RAM "windows". At any given point in time, an application can access a memory location and request a memory location more than 64 kB higher in the next instruction.

In order to access the new memory locations using standard 6502 addressing modes, the new DBR and PBR registers have been added. The PBR serves as the 3rd byte of the PC, allowing code to run at any location in memory. The DBR register functions as the 3rd byte for memory accesses in addressing modes like absolute mode.
Of course, there are restrictions, like the inability to execute code that crosses a 64kB boundary, but these restrictions can be overcome, as you'll see below.

For clarity, we will refer to the 3rd byte of an address as the "bank", and refer to the 2 lowest bytes of an address as the offset. Alternate names include "segment" and offset, but that naming scheme was previously used with the Intel 80X86 CPU line and carries with it many bad connotations.

Since memory addresses can now be 3 bytes wide and contain 6 hexadecimal digits, an obvious representation would be $xxxxxx. However, many '816 references write the address as a two part quantity, with the bank register and the 16 bit offset separated by a colon, ":". Therefore, $xxyyyy and $xx:yyyy are equivalent. In this article, the former notation is used for emphasis and because ":" notation also brings up bad connotations from the Intel 80X86 CPU line.

@(A): Increased Stack

As the S register is now 16 bits wide, the stack can now reside anywhere in bank 0, giving the programmer up to 64 kB of stack area. As well, the S register can be set to any location in bank 0, so the stack need not start on a page boundary.

@(A): Enhancements to Old Addressing Modes

Even though the '816 supports the traditional 14 addressing modes of the 6502, it extends some of them to handle the extra features in the '816. Note that the opcodes and parameters have not changed for these addressing modes; rather the way the CPU treats them differs slightly. Of special note is the term "zero-page", which has been expanded into "Direct Mode". Let's take a look at what changes you can expect.

@(A): Absolute Modes

In the 65XX CPU, modes such as absolute and its indexed siblings each could access a memory location in the 64 kB memory map. In the '816, these modes are now capable of accessing memory above and beyond 64 kB.
When accessing memory, the DBR register is prepended to the address being accessed, thus forming a 24 bit effective address. When transferring control, the PBR register is prepended. Thus, if the DBR contains a $05, the following:

        ad ff ff        lda $ffff

would load a value into the .A register from $05ffff. If the M flag is set to 16 bit mode, the 16 bit value in $05ffff and $060000 will be loaded. If the M flag is set to 8 bit, only $05ffff will be loaded.

Notice that this example also shows "temporary-bank-incrementing". While loading a 16 bit value with the instruction above, the DBR is "temporarily" incremented to allow access of data from bank $06. The actual DBR is left unchanged, so the next instruction will find the DBR back at $05. You'll rarely see such bank changes when accessing data as above, but it is common when using indexing modes. With the DBR at $05, executing:

        a2 ff ff        ldx #$ffff
        bd ff ff        lda $ffff,x

will load values from $06fffe and possibly $06ffff, depending on the size of the accumulator.

When using absolute mode on opcodes like JMP and JSR, the PBR register is used to form the 24 bit effective address. Unlike the DBR, the PBR does not exhibit "temporary-bank-incrementing". It simply rolls over within the same bank. Keep that in mind.

@(A): Direct Modes

To enhance the capabilities of the '816, the CPU offers Direct Mode, which is a superset of "zero-page mode". Basically, all z-page opcode operands are added to the D register to form a 16 bit effective address. This allows using the entire bank 0 as effective z-page memory. With the D register set to $0200, executing:

        a5 10           lda $10

would load the accumulator from $000210 (and possibly $000211). Direct mode is not allowed to increment into bank 1. If the above instruction is executed while D = $ffff, the sum wraps around within bank 0 and the accumulator would start accessing data from $00000f. This highlights an important yet subtle change. No longer is lda $10 guaranteed to access data from $000010.
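The address arithmetic for the absolute and direct modes described so far can be modeled in a few lines. This is an illustrative Python sketch of the rules as stated in the text, not an emulator; the function names are made up for the example.

```python
def absolute_address(dbr, operand, index=0):
    """24 bit address for absolute and absolute-indexed data accesses.

    The DBR supplies the bank; the sum is allowed to carry into the
    next bank ("temporary bank incrementing").
    """
    return ((dbr << 16) + operand + index) & 0xFFFFFF

def direct_address(d, operand):
    """Bank 0 address for direct mode: D + operand, wrapped to 16 bits."""
    return (d + operand) & 0xFFFF

def bytes_touched(address, width_bits):
    """Addresses read by an 8 or 16 bit data access starting at `address`."""
    return [(address + i) & 0xFFFFFF for i in range(width_bits // 8)]

# DBR = $05: lda $ffff with a 16 bit accumulator
print([hex(a) for a in bytes_touched(absolute_address(0x05, 0xFFFF), 16)])
# DBR = $05, X = $ffff: lda $ffff,x
print(hex(absolute_address(0x05, 0xFFFF, index=0xFFFF)))
# D = $0200: lda $10
print(hex(direct_address(0x0200, 0x10)))
```

The only difference between the two helpers is where the carry goes: absolute modes let it spill into the bank byte, while direct mode throws it away so the result always stays in bank 0 (for example, D = $fff0 with an operand of $20 lands on $000010).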
On the '816, lda $10 accesses data from D + $10.

Indexing changes little with respect to Direct mode. After the D register is added to the 8-bit offset, the appropriate register is added, and the effective address is normalized to fall within bank 0. There is no way to reference outside bank 0 in Direct mode. Even if index registers are set to 16 bit mode and hold $ffff, the instruction will access bank 0.

@(A): Direct Indexed Indirect Mode

Most programmers forget, but this mode executes in two parts. Now, it becomes important. In the first part, the 8-bit offset is added to the D register and then the X register. The result is normalized to 16 bits, and two values are accessed from bank 0. The second part takes those two bytes as the effective address, and PREPENDS the DBR register to form a final address. In this way, you can access memory outside bank 0 with this mode, but you must store the address to access in bank 0. Read that sentence again.

@(A): Direct Indirect Indexed Mode

Like its relation above, this mode works in two parts. In part 1, the 8 bit offset is added to the D register and normalized to 16 bits. Two bytes are accessed from bank 0, and then the 16 bit value returned is appended to the DBR to form a 24 bit effective address. In part 2, the Y register is added to this effective address to form the final address for access. As above, part 1 cannot access outside bank 0, but part 2 can.

@(A): Stack Mode (Implied)

Usually lumped in with the Implied addressing mode by most 6502 developers, stack mode has changed to accommodate the new widths of the registers. Depending on the width of the register, stack operations will push and pull either 1 or 2 bytes. This can cause problems if you push a 16 bit register and try to pull it off as an 8-bit register. Caveat emptor.

@(A): Immediate Mode

In Emulation mode, immediate mode was simple. You specified an 8 bit immediate value to be loaded into a register. In Native mode, however, registers can be 16 bits.
Everyone knows the opcode can do an immediate 8-bit load, but what opcode performs a 16-bit immediate load? Answer: the same opcode! If the register is set to 8 bits via the X or M flags, the immediate load on that register will pull in 8 bits. If the register is set to 16 bits, the instruction will load a 16 bit value.

The effects of this change are monumental. An 8-bit immediate load requires 2 bytes, while a 16 bit load requires 3. This presents some problems. Since neither the opcode nor the mnemonic differs between the two forms, the assembler cannot tell which form is required from context. The developer must tell the assembler which form to use by use of assembler directives. However, this doesn't guarantee success. The developer must ensure that the flags are set correctly before executing an immediate load of any register. Improper settings will either cause the instruction to pull the next opcode into the high byte of the register, or cause the high byte of the intended register value to be executed as an opcode. In my biased opinion, this is severely shortsighted. I would rank this as the number one bug that '816 developers will face. However, simple macros employed in your assembler can help minimize this problem.
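To see why a mismatched flag is so dangerous, it helps to decode the same three bytes under both settings. The following Python sketch models only the immediate forms of lda, ldx, and ldy; the function and table names are illustrative assumptions and do not correspond to any assembler or emulator.

```python
# Immediate-load opcodes, mapped to the flag that selects the operand width.
IMMEDIATE_LOADS = {
    0xA9: ("lda", "M"),   # accumulator width follows the M flag
    0xA2: ("ldx", "X"),   # index register width follows the X flag
    0xA0: ("ldy", "X"),
}

def decode_immediate(code, m_flag, x_flag):
    """Decode one immediate load at the start of `code` (a bytes object).

    Returns (mnemonic, value, instruction_length). A flag value of 1
    selects 8 bit operation, 0 selects 16 bit operation.
    """
    mnemonic, flag = IMMEDIATE_LOADS[code[0]]
    eight_bit = m_flag if flag == "M" else x_flag
    if eight_bit:
        return mnemonic, code[1], 2
    return mnemonic, code[1] | (code[2] << 8), 3

code = bytes([0xA9, 0x34, 0x12])                    # assembled as lda #$1234
print(decode_immediate(code, m_flag=0, x_flag=0))   # ('lda', 4660, 3)
print(decode_immediate(code, m_flag=1, x_flag=0))   # ('lda', 52, 2)
```

In the second call the CPU consumes only two bytes, so the leftover $12 would be fetched as the next opcode, which is exactly the failure the text warns about.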