######## ################## ###### ###### ##### ##### #### #### ## ##### #### #### #### #### #### ##### ##### ## ## #### ## ## ## ### ## #### ## ## ## ##### ######## ## ## ## ##### ## ## ## ## ## ##### ## ## ######## ## ## ## ### ## ## #### ## ## ##### #### #### #### #### ##### #### #### #### #### #### ###### ##### ## ###### ###### Issue #21 ################## Feb 5, 2002 ######## Special Focus on Minigames ............................................................................... "Do not meddle in the affairs of wizards, for they are subtle and quick to anger." -- Ancient Elvish saying ............................................................................... BSOUT (Why yes, I _am_ re-reading Lord Of The Rings, how did you know? Sure, the movie is pretty good, not the same as the book, but probably the best you could do, although why does every Hollywood evil creature essentially come from Night of the Living Dead? But hey, the elves actually looked like elves. Black Hawk Down is really good too, and... oh, sorry!) Hoo-ah! Welcome to another issue of C=Hacking! There's lots of nifty stuff to get to so this will be brief. First, the "Hacking Exchange" is now up at the Official Unofficial C=Hacking Homepage: http://www.ffd2.com/fridge/chacking/ It's just a simple message board, with the idea that C=Hacking types can make comments, ask questions, and otherwise talk to one another. Check it out! Second, if you have any projects you're working on... please contact me, and consider writing them up for C=Hacking! And finally, something I think you'll enjoy: http://groups.google.com/groups?hl=en&th=87c2a2ced7e32ce1&rnum=42 From: duck@clumsy.pembroke.edu (duck@clumsy.pembroke.edu) Subject: C= Hacker's Net-Mag Newsgroups: comp.sys.cbm Date: 1992-02-08 14:52:20 PST Due to the recent influx of "Tech-Know" (People who actually understand / hack te C=64 into doing stuff that was previously unknown) I am going to tentativly start a C= Hackers Net-Mag (Hacking in the 90's since of the word, not the 80's). So - please send me your netmail address if you're interested in receiving it. Or if you are interested in contatcting me concerning an article etc that you'd like to see distributed please also contact me. For the first issue I'm tentatively planning on: - Line Drawing on 8563 VDC (640 x 200 hi-res graphics) - Beginning ML column - Raster Article . . . - Craig Taylor duck@pembvax1.pembroke.edu P.S. - Not sure when the first is gonna go out but hopefully soon. From: duck@clumsy.pembroke.edu (duck@clumsy.pembroke.edu) Subject: Issue 1 - C= Hacking Available Newsgroups: comp.sys.cbm Date: 1992-02-27 17:22:44 PST Issue 1 of C= Hackers is now available via NETMAIL and is a compilation of several articles on the tehnical side of the Commodore 64 and 128. For those of you who missed the first posting, please reply via email and ask to be put on the list. The first issue had programming in ml, documented and undocumented 6502 opcodes, and a line-drawing package in machine language for the C=128 hi-res screen. Issue 2 will be coming out in a month or so. Many thanks for the bandwith. - Craig Taylor duck@pembvax1.pembroke.edu ....... .... .. . C=H 20 ::::::::::::::::::::::::::::::::::: Contents :::::::::::::::::::::::::::::::::: BSOUT o Voluminous ruminations from your unfettered editor. Jiffies o News, rumours, and stuff. Side Hacking o "Pulse Width Modulation, continued" by various. Tying up some loose ends from last issue's digi article. o "Introducing Full-Screen IFLI mode with a SuperCPU", by Todd Elliot (Hey, what's with all this MSN crapola? :) Using a SuperCPU, it is possible to use the first three columns of an (I)FLI picture, and Todd shows how. Main Articles o "VIC-20 Kernel ROM Disassembly Project, part IV", by Richard Cini And now it's time to start on that most frightening of creations: the tape drive code! o "The Art of the Minigame" -- an article in six parts: Introduction, by the editor Part 1: Codebreaker, by David Holz Part 2: TinyPlay, by S. Judd Part 3: MagerTris, by Per Olofsson Part 4: Compressing Tiny Programs, by S. Judd Part 5: TinyRinth, by Mark Seelye Part 6: Tetrattack!, by Stephen Judd .................................. Credits ................................... Editor, The Big Kahuna, The Car'a'carn..... Stephen L. Judd C=Hacking logo by.......................... Mark Lawrence Special thanks to the cbm-hackers for many otherwise unacknowledged contributions. Legal disclaimer: 1) If you screw it up it's your own fault! 2) If you use someone's stuff without permission you're a dork! For information on the mailing list, ftp and web sites, send some email to chacking-info@jbrain.com. ................................... Jiffies .................................. $01 Jochen Adler has made a program that reads the second side of a 1541 disk in a 1571 - without turning the disk over. It reads the blocks from end to beginning. Because of the mechanical bump however, it can only read tracks 5 to 35. If anybody wants this program please e-mail Jochen (NLQ@gmx.de) $02 Soci/Singular has been working on a commented C128 ROM listing. Check out this great effort at http://singularcrew.hu/c128rom/ $03 64net/2 has been updated: For those that do not know: 64net/2 is yet another PC-to-C64/128 User<->LPT parallel cable software. It supports d64/d71/d81/t64/lnx/dhd disk images, raw files and own Internet partition. It is possible to enter disk images like any other directory. Client programs are provided for C64 and C128. The next goal will be to patch ROM instead of loading client. A small BASIC example program is included that is able to send e-mails through 64net/2 host or any other machine if its IP is known. How many e-mail agents are there for C=? How many of them are written in plain BASIC 2.0? :) http://sourceforge.net/projects/c64net/ contains the latest version, and http://venus.wmid.amu.edu.pl/~ytm/64net2win.tar.gz contains a Windows binary. $04 CC65 is up to version 2.7.0, with many new improvements. Check it out at http://www.cc65.org/ $05 Moreover, Ullrich has set up his C64 as a web server at http://c64.cc65.org/ The web server runs on a stock 64, using a Swiftlink for communications, and uses the uIP TCP/IP stack written (with cc65) by Adam Dunkels http://dunkels.com/adam/uip Pretty cool, eh? $06 A new IDE interface has become available: Elysium is proud to announce a new software and hardware solution for your Commodore mass data storage needs. The CIA-IDE is yet another approach to connect an IDE hard drive to C64/128. It differs from previous similar projects in these areas: - it is the simpliest one (only two chips required (or just one in case of C128)) - it is free (documentation and software), - the software is available in source code form under GNU GPL, - there _exists_ a ready to use GEOS 2.0 driver. Documentation, source codes, and binaries are at: ftp://ftp.elysium.pl/groups/Elysium/Projects/ciaide/ $07 Marko Makela has developed a tape drive emulator with an RS-232 port, allowing transfers between any computer with a tape port and any computer with an RS-232 interface (e.g. a PC or Swiftlink). The hardware and software is at http://www.funet.fi/pub/cbm/crossplatform/transfer/C2N232/ $08 GoDot -- the C64 image processing program -- is now public domain. Arndt will still be working on it, but it's now available at http://members.aol.com/howtogodot/godnews.htm $09 The C64 is now listed in the Guiness Book of World Records; a scan of the page (from Robert Bernardo, posted by Frank Michlick) is at www.cupid.de/upload/famous.jpg (love that line about "16K sound"). Also, I highly recommend taking Robert's advice and checking out the infinitely cool "Logo-Matic" on the main www.cupid.de site! $0A JOS just keeps cruising along, with lots of changes. Among the biggest changes, of course, is the announcement that JOS will be merged with Clips, with the new system to be called "Wings" (with some of the letters capitalized and some not; I never remember that stuff). For the latest JOS news, check out http://www.jolz64.cjb.net/ $0B Frank Kontros has made a commented disassembly of the C64 ROM, with the BASIC ROM on the way: http://c64.uz.ua/sources/C64_Kernal_Disassembly.zip $0C And finally, Aleksi Eeben, author of two minigames, has now written a VIC-20 game: http://www.student.oulu.fi/~aeeben/download/dragonwing.zip more screenshots: http://www.student.oulu.fi/~aeeben/screen1.png - screen4.png Aleksi's homepage is at http://www.cncd.fi/aeeben Neat! ................................ Side Hacking ................................ Pulse Width Modulation, continued --------------------------------- from various The digi article in issue #20 of C=Hacking left a few loose ends, and generated some followups. First, Otto Jarvinen (sounddemon) emailed to say that the SID detection routine occasionally reported incorrect results for him, and suggested that a workaround was to do the detect several times. YMMV! Second, a day or two after issue #20 was released, Levente discovered a brilliant way to play 6-bit PWM digis on a stock machine: -- I couldn't resist, and tried something out (see attachment). It works!!! :-) In fact, when I wrote the last letter I didn't know that I found something useable, just had some ideas - I felt that I'm at the right place. When I read C=H 20 this morning and read your comment about the Test bit (from the PRG), I knew that it must work. All I had to do is then to put this idea into code. The whole idea is about starting the pulse by software, and then having the SID turn it back to 0 after a time. Is it possible? ...The keys are the Test bit (the SID wave counter can be reseted anytime), the pulse width register, the wave counter and the SIDs way of generating pulse wave. (Ie. the pulse wave is high, as long as the wave counter is less than the value in the pulse width register). Check this algorithm: - Init: volume at max, voice 1 sustain level max, start attack. Freq is selected well (=$4000), so the wave counter is incremented by 4 every processor clock cycles. Loop: - load next sample value, and put it to the pulse width low register ($d402; ensure that $d403 is 0). - Set test bit, and clear test bit (counter reset). - Increase sample pointer, some delay, then loop. The delay must be 64 clock cycles + the time while the Test bit is kept set (4 cycles if using STA $d404 : STX $d404 immediately with pre-loaded values). What will happen? The 8-bit sample value is put directly to the pulse width register (MSBs of the pulse width register are cleared!...). The wave counter is started (release test bit), and it increases 4 by every CPU cycles (= counts 256 in 64 cycles). After some time, the counter will reach the value in the pulse width register. This happens in exactly after (8-bit sample value / 4) cycles, because of the above. In this cycle (or the next?...) the SID turns its pulse output to 0. Voilá! One must just make sure that the loop length in cycles matches the above conditions, and then it runs like hell... Since it does exactly the same on the SID as the other (bit-banging) way, it just does it with some hardware help, there's also no problem with the 4khz maximum barrier (since the oscillator is reset every loop). With little enhancement, it's possible to write an about 7.5 bits player for a stock C64 by this method. This is what you find in the attachment... The idea is using all the 3 channels simultaneously. A slightly increased sample value is written to the three pulse width registers, so the oscillators will finish the duty cycle one processor cycle later, when there's a carry between bits(0,1) to the MSBs. The replay freq is the CPU clk / 68 (~15khz). 64 cycles (variable duty cycle) + 4 cycles (constant duty cycle because of the reset time - no problems with that, it doesn't change (just gives a small constant DC...)). By similar methods, it should be possible to write a sample player with higher PWM freq (with less resolution of course, but eliminating this still audible whistling). (I tried using the filter to reduce it, but it sounded so bad that I left it out. It clicked like hell. The FETs got saturated.) [Richard Atkinson suggested turning down the sustain volumes to avoid this] See the attachment, and the binary. I think the sample sounds pretty good :-). (The cut is from 'Greece 2000' by Three drives on a vinyl). (Another idea that popped up in my mind: since the TED sound generator can also be reset, I could probably translate this idea to the Plus/4 :-O ). Best regards, Levente -- The binary is available at http://www.ffd2.com/fridge/chacking/ towards the bottom of the page. Third, I received a very interesting email from an Apple-II guy, which I'd like to pass on: -- Hi! I found your page as I was searching for something else 6502-related, and was very interested. Although I have always been aware of the C64, I have never really been a user--I have used Apple II's since 1980. I was particularly interested in the article on playing "digis" on the C64. I became interested in playing digitized sounds on the Apple II in 1993, after hearing a 3-bit, 11.025 KHz PWM player. At 3 bits, you can imagine how noisy speech samples were, but the overall effect for a 1 MHz machine with a 1-bit speaker "toggle" was amazing. It made me wonder how far this PWM technique could be pushed on a stock, 1 MHz Apple II (not the somewhat faster, 65816-based IIgs). The short answer is, much farther than I expected! Robin and Stephen accurately describe the theoretical PWM limit as 6 bit samples at about 16 KHz for a stock 1 MHz machine, but, as they point out, that is not practically realizable for a number of reasons, unless the play loop is completely unrolled! Furthermore, in the Apple II world, sampled sounds have acquired a few standardized sampling rates--mostly as a result of Mac influence, which was in turn influenced by CD's. The most common rate in the Apple II world is 11.025 KHz, or one-fourth of the audio CD sampling rate. This is commonly considered to be "AM radio quality", with a Nyquist bandwidth of about 5.5 KHz and a practical bandwidth of 4+ KHz, given practical anti-aliasing filters (at the sampling end, not the playback end). A frequency of 11.025 KHz is, though high, still painfully audible to people whose ears are not zonked--a piercing "squeal" running through every sound. So even though it is possible to write a practical 6-bit 11.025 KHz PWM player (usually called a SoftDAC in the Apple II world), the resulting listening experience is disappointing. So I went to work on a way to do 2x oversampling, and built a 5-bit 22.050 KHz PWM player. It was sad to lose a bit, but the absence of any audible "carrier" more than compensated for it! If you have access to an 8-bit Apple II (preferably with lower case, like a //e), and also preferably with a way of attaching an external speaker or headphones in place of the miserable 2.75" internal speaker, then you can easily give it a try and judge for yourself. I'm pretty proud of the novel design of the code, which I would characterize as "vectored" unrolled loops, one for every two pulse duty cycles, which I wrote a BASIC program to write for me--much less painful for counting cycles! The package is available on the web at: http://members.aol.com/MJMahon/index.html and is called Sound Editor v2.2, since I had to "dress up" the player into something fun to play with. ;-) An earlier version of Sound Editor was published on SoftDisk in 1994, IIRC, but this one is a little more evolved. It also introduced 2:1 ADPCM compression of 8-bit sampled sounds, to save disk space. It is a lossy compression, but not very noticeably. The editor package also includes those routines, in 6502 assembly code. All of this should be trivially adaptable to the stock, 1 MHz C64, with very good results. By using the filters, you could probably filter out the 11.025 KHz carrier and return to 6-bit accuracy! I should note that in the Apple world, sampled sounds are usually represented as "excess-128" codes, which means that the sign bit is inverted. This actually simplifies things, since the sample value is within a few shifts of being the pulse width in cycles. Let me know what you think! -michael -- (Always great to hear from Atari and Apple ][ folks!) And finally, I have a little mathematical analysis of PWM and how it compares to a "straight" digi. Basically, I found some of the PWM explanations a little unconvincing in issue #20 (even though I wrote them!). For example, the idea of "average voltage" seems a little funny, since every two samples has an "average voltage", as does every four, etc. but that set of average voltages would give a different sounding signal than the original (or more dramatically, there is an average voltage over a full second of digi playback, but that's not what you hear!). So I wanted to know how a PWM signal _really_ compares to a straight digi playback. Another issue is changing the amplitude of a PWM digi, i.e. using two pulse waveforms, with one 1/16 the value of the other, to get higher resolution. If you recall the discussion of digis, the resolution of a PWM digi depends on the number of pulse widths available, not the amplitude. Adding two PWM waveforms together does not change the number of pulse widths available, so I wanted to figure out what changing the amplitude _really_ does to a PWM digi, and if it can really be exploited. And finally, I wanted to know about the carrier wave (that is so piercing at lower playback frequencies) -- and once again, how it compares with a standard digi (which, after all, is stair-stepping the voltages at the playback rate). Since the rest of this article is some Fourier analysis that 99% of people will have zero interest in, I'll put the conclusions here. The first is: PWM digis and standard digis are essentially identical except at higher frequencies (except for a phase shift, which doesn't make any difference to your ear). The second is: changing the amplitude of a PWM changes the resolution. More specifically, the amplitude of the pulse multiplies the digi sample value. If two pulses can be synced close enough, it should indeed be possible to use two pulses to get a higher resolution. Moreover, by modulating the amplitude of a single PWM digi, using the $d418 volume register -- that is, using PWM _and_ $d418 -- it should be possible to get a higher dynamic range, something that should be a little more achievable using SID (but maybe not that useful, so I didn't try it out). And finally, a standard digi has zero amplitude at the carrier frequency. In other words, after a lot of effort I was able to demonstrate what everyone already knows. The analysis doesn't change anything from the previous articles (except possibly the idea for changing the PWM amplitude to get more dynamic range). And now, some Fourier analysis. A standard digi just sets the voltage to the sample value s_j, for a length of time dt (dt = 1/sample rate). The Fourier transform of a single sample s_j (occuring at time t_j) is s_j [e^(-iw dt) - 1] * [e^(-iw t_j) / -iw] where w = angular frequency. Since the above is a little hard to read, I'll say it in words. The first term is the sample value s_j, which scales amplitudes at all frequencies. The second term is due to the finite length of the pulse (evaluating the Fourier integral at the boundaries), and basically changes the phase of the transform. The third term is like sin(w)/w -- a sinusoid with decreasing amplitude as frequency increases. So: the transform goes like sin(w)/w times the sample value, with some phase effects thrown in (we'll get back to these in a moment). A PWM digi sets the duty cycle of a pulse to the sample value s_j, giving a Fourier transform of [e^(-iw s_j dt) - 1] * [e^(-iw t_j) / -iw] Compare this with the earlier expression, and you'll see that the sample value s_j has moved up in to the exponent of the "phase term" but that they're otherwise the same. The first thing to do is to show that both expressions, PWM and standard, reduce to the same thing -- that is, that a PWM and a standard digi sound the same! The expressions both decrease as 1/frequency, due to the sin(w)/w term. This means that at large frequencies the values become negligible. (How large? For example, if the sample frequency is just 1KHz, then sin(w)/w is .001 times smaller near w=1KHz (i.e. the sample frequency, which is twice the Nyquist limit) than it is near w=0). So now consider the phase terms for small w. The Taylor expansion for e^x is 1 + x + x^2/2 + ... We can therefore expand the "phase terms" as regular: e^(-iw dt) - 1 = (1 - iw*dt + w^2 dt^2/2 + ...) - 1 = -iw*dt + O(w^2 dt^2) pwm: e^(-iw s_j dt) - 1 = -iw*s_j*dt + O(w^2 dt^2) where O(w^2 dt^2) is considered very small since w and dt are both small. Substituting the above into the original expressions gives s_j*iw*dt [e^(-iw t_j) / iw] in both cases. That is, we have shown that for "small" frequencies -- more specifically, for frequencies where (w^2*dt^2) is much smaller than (w*dt), which is where w*dt<1, which is frequencies less than the sample frequency, which is all frequencies of interest! -- PWM and standard digis are the same. The explanation lies in the phase terms. Those "phase terms" [e^(iw dt) - 1] (regular) and [e^(iw s_j dt) - 1] (PWM) do more than just change the phase. When they multiply the sin(w)/w signal, they take the sin(w)/w signal, change the phase, and then subtract the sin(w)/w signal again. It's this difference of signals that makes things work out at the frequencies we care about. PWM and standard digis are _not_ the same, but the main differences are at higher frequencies, where the amplitudes are in general much smaller. But... but... what about the PWM carrier frequency? If we take a constant digi, say with sample values = 1/2, the standard digi gives a constant voltage, whereas a PWM digi gives a square wave at the sample frequency. The answer comes from the "phase terms" above. The sample frequency is w = 2*pi/dt. Substituting this into the phase terms gives [e^(i*2*pi) - 1] (regular) and [e^(i s_j 2*pi) - 1] (PWM) The regular expression is exactly zero -- there is _nothing_ at the sample frequency of a regular digi. But that's not the case for the PWM term, because of the s_j up in the exponent. PWM digis have a _finite_ amplitude at the carrier frequency. Note that because of the sin(w)/w term it gets smaller as the sample frequency increases -- but it isn't zero. Finally, the phase term expansions give some insight into what happens when both the pulse width _and_ height are varied. If the pulse width is s_j, and the height is set to h_j, then the Fourier transform becomes h_j*s_j *iw*dt [e^(-iw t_j) / iw] That is, the amplitude multiples the width. For the case of adding two PWM waves together, then, the amplitude really does effectively scale the sample value, and it should be possible to add one PWM value at 1/16 the amplitude of another to get an effective 8-bit value. What about _varying_ the amplitude of a single PWM sequence? For a 6-bit PWM digi, say, the sample values s_j can go from 0 to 63. If this is then multiplied by h_j=2 say, then the values become 0 2 4 ... 126 -- a 7-bit number where the lowest bit is always 0. What use is that? Well, we still have the h_j=1 values of 0..63, which do include the lowest bit. So we can effectively change the dynamic range from 0..63 to 0..126 using just two amplitude values. As a practical matter, then, it might be possible to use all 15 $d018 values available to get a big dynamic range, and hence a better sounding digi, using fewer CPU cycles. Well, ok, we're only _sort of_ changing the dynamic range, so I pretty much doubt the usefulness of it. But maybe someone out there would like to give it a shot. All right, let's hope this closes the book on pulse width modulation for digi playback! ....... .... .. . C=H 20 .............................................................................. Introducing Full-Screen FLI mode for the SuperCPU Copyright (C) 2002 By Todd S. Elliott The 'FLI Bug', where the first three columns of a FLI screen are essentially unusable, can be squashed with the help of a SuperCPU. I won't go into great detail on IFLI, as it has been well-documented elsewhere, but I'll begin with a short summary to get us all up to speed. I refer you to Albert 'Pasi' Ojala's excellent coverage of the FLI mode in C=Hacking #4. Pasi also proofread this article. A Three-Minute Summary of the FLI mode The VIC-II chip asserts a badline when it needs to access the databus and fetch character data or videomatrix data. It was discovered that the VIC-II chip can be manipulated by its vertical scroll register at $d011 (SCROLY) to induce a badline at any given rasterline. By having a badline at every visible rasterline, the program can manipulate $d018 (VMCSB) to point at the right videomatrix to achieve the maximum flexibility of colors given to a multi-color screen. Unfortunately, when a program forces a badline via SCROLY, the BA (Bus Available) line in the computer goes high, and for three cycles the 6510/8510 processor has to finish its write operations or halt its read operations before the BA line is released to the VIC-II chip. The maximum number of successive write operations is three, hence the 3-cycle delay. It is in those three cycles that the VIC-II does not fetch video matrix data to fill in the first three columns and causes the 'FLI Bug'. I wish to stress that in those first three cycles, when the BA line is high, the 6510/8510 processor is still active and can complete write operations. It isn't fully shut down. After the badline retrigger at STA SCROLY, the code following it is fetched on the databus and is ready to be executed by the 6510/8510 processor. When BA is high, the VIC-II will reference the value on the databus as videomatrix data and display it in the first three columns of the screen. The actual instructions that follow the STA SCROLY in the FLI loop constitutes the video matrix data for the first three columns of the screen. Enter the SuperCPU! Normally, a VIC-II chip access is only possible every 4 cycles. The SuperCPU can access the VIC-II chip in 1 cycle (1MHz) intervals, making cycle to cycle changes possible within the VIC-II chip. More importantly, the SuperCPU tristates the 6510/8510 processor inside the host Commodore computer (which is a fancy way of saying that you can disconnect the processor from the system without physically removing it). When a forced badline retrigger occurs with a STA SCROLY in a FLI loop under the SuperCPU, the BA signal inside the host Commodore computer goes high. But, the SuperCPU runs asynchronously and really doesn't have to pay attention to the host Commodore as it runs code after the STA SCROLY. In fact, the SuperCPU will execute code even if the VIC-II badline is in full swing inside the host Commodore computer. I knew that the instruction opcodes left on the databus after the STA SCROLY made up the video matrix data for the VIC-II chip for those first three columns of the screen. But I wondered how this was possible in a SuperCPU configuration because there would be no instruction opcodes left hanging on the databus inside the host Commodore computer. After some discussions with Per Olofsson ("MagerValp"), he suggested that writes/reads to the i/o area will force a value to be put on the databus. This is where the magic begins, when the FLI loop forces the SuperCPU to write to the i/o area of the host Commodore after the forced badline retrigger at STA SCROLY. The SuperCPU will note that the BA signal is still high, so it can still access the databus and stash values there via DMA. This BA high signal will last for 3 cycles, enough for the SuperCPU to stash three values onto the databus. The 6510/8510 is still tristated by the SuperCPU, and there's nothing on the databus after the forced badline retrigger at STA SCROLY. Normally, the 6510/8510 CPU shares the databus with the VIC-II for each machine cycle. With the 6510/8510 CPU out of the equation, the SuperCPU can stash a value onto this shared bus on the CPU half of this machine cycle and the VIC-II chip will see it in its other half of the machine cycle. However, the databus is only eight bits wide. The VIC-II chip fetches video matrix data and color ram data 12 bits at a time. The SuperCPU can force values onto the databus during the first three cycles after the forced badline retrigger, but on each cycle the last four bits belonging to Color RAM would not be fed to the VIC-II chip. Only pixel values of %10 and %01 can be individually selected in multicolor FLI mode, while %11 pixel values cannot be individually set for those first three columns of the screen. The high resolution FLI mode does not suffer from this problem because it does not use color RAM for color attribute information. Full-Screen FLI in practice Let's get down to the nitty gritty. The Write I/O Approach requires three 200-byte tables, corresponding to each column. Each value on those tables correspond to each visible rasterline. For example, the first byte of each table corresponds to rasterline 50, the second byte of each table corresponds to rasterline 51, etc. The first table contains values needed for the first column of the screen, the second table contains values needed for the second column of the screen, and the third table values for the third column. In the FLI display loop prior to the STA SCROLY command, the current rasterline is used as an index to all three tables. The values are then fetched from the tables and inserted into the code that follows the STA SCROLY command using self-modifying code techniques. When the STA SCROLY happens, the code that immediately follows it starts writing the values onto the databus, all three in a row to complete the first three columns of the screen. There is a disadvantage with this approach. It requires that three 200-byte tables be specially constructed and stored somewhere in memory that is not mirrored by the SuperCPU. A routine would have to read in a FLI graphics file, extract information from the first three columns and store it into their respective 200-byte tables. Pasi Ojala came up with a graph depicting the SuperCPU interacting with the VIC-II in action, showing what happens after the forced DMA retrigger at STA SCROLY. The 'LDA #$xx' would have been modified earlier in the FLI routine (before the STA SCROLY) using self-modifying code. Here is the relevant source code which takes up 4 machine cycles inside the host Commodore computer. sta scroly : abcd, d = write Y to SCROLY on 1MHz bus CPU half - Mach. Cycle #1 lda #$00 : ef sta $d022 : ghij, j = write 1 to $D022 on 1MHz bus CPU half - Mach. Cycle #2 lda #$00 : kl sta $d022 : mnop, p = write 2 to $D022 on 1MHz bus CPU half - Mach. Cycle #3 lda #$00 : qr sta $d022 : stuv, v = write 3 to $D022 on 1MHz bus CPU half - Mach. Cycle #4 There are the two shared halves consisting of a machine cycle inside the host Commodore bus, and by stashing values onto the databus, this value is carried over to the VIC-II half and is read as videomatrix data during the first three columns of the FLI screen. ________ = VIC-II half of the 1MHz cycle . = SCPU synchronizes to 1MHz bus Each char position is equivalent to a 1 (20MHz) cycle. Mach. Cycle #1 Mach. Cycle #2 Mach. Cycle #3 Mach. Cycle #4 +-------------------+-------------------+-------------------+------------------- __________YYYYYYYYYY___DMA____1111111111___col0___2222222222___col1___3333333333___col2___ 1MHz abc.......ddddddddddefghi.....jjjjjjjjjjklmno.....ppppppppppqrstu.....vvvvvvvvvv SCPU Values on the databus which is carried over onto the VIC-II half of the databus: DMA: DMA condition detected by VIC-II col0: colors for column 0 read, gets the value 1 put into the bus by SCPU col1: colors for column 1 read, gets the value 2 put into the bus by SCPU col2: colors for column 2 read, gets the value 3 put into the bus by SCPU An alternative approach bites the dust The SuperCPU can also fetch values onto the databus by reading from the I/O region. If a coder were so inclined to use a 'Read I/O Approach', where is a program going to find 600 free bytes in the i/o region at $d000-$dfff? The idea is to force the SuperCPU to do a read on the databus via DMA and this can't be done with mirrored locations similar to the ones used in those VIC optimization modes. When a SuperCPU reads a value from mirrored memory, it does so from its local RAM and not the RAM that is inside the host Commodore computer. However, if the SuperCPU reads from the I/O block at $d000-$dfff, it will read a value from inside the host Commodore computer using DMA. Unfortunately, this approach did not work when the BA line went high inside the host Commodore computer and is unworkable for a full-screen FLI mode. The SuperCPU stops for reads if BA is high, just like its 1MHz 6510/8510 counterpart. Other Considerations. There were some interesting observations while debugging the full-screen FLI routines. The full-screen FLI routines were originally inspired by Robin Harbron's IRQ-based IFLI routines. Because they are driven by an IRQ, the CPU is still available for normal computational tasks. When all three videomatrix values are written to after the STA SCROLY in the line-interruptible FLI routine, the IRQ must then exit quickly with the restoration of the registers. It's a good idea to avoid writing to any mirrored location or read/write to any I/O region ($D000-$DFFF), since the SuperCPU will have to wait for VIC to finish with the data bus. Using a raster IRQ will naturally lead to trouble, since cycle-exact timing is needed, so a CIA timer is used. The timer may be set to synchronize a PAL or NTSC machine. Then in the FLI routine the timer can be checked and indexed into a table of preset timing values so that the forced badline retrigger at STA SCROLY will always happen at the right time on the screen, no matter what the SuperCPU is doing when the VIC-II interrupted it with an raster IRQ. Thanks goes to Ninja/The-Dreams (aka Wolfram Sang) for tips on how to create a stable line-interruptible FLI routine using timers. Source code Without further ado, here is the source code. This source code was used in the Santa Claus FLI Demo for Wheels OS. This code will run in either Commodore 64 or 128 computers and in either PAL or NTSC systems. It did take a lot of tweaking at Points #1, #2, #3, #4, & #5 as I tried to perfect the routines as closely as possible. The full source code for the Santa Claus FLI demo can be supplied via email upon request. It is in Concept+ (geoProgrammer) format. ; Wedges the full-screen FLI interrupt handler in Wheels systems. ; Thanks goes to Robin Harbron for the idea of a line-interruptible FLI routine. InstallFLI: ; Installs the FLI routine jsr ClearMouseMode ; Turn off the mouse. sei lda CPU_DATA ; Save 6510/8510 Location #$01. sta r6510 lda screenMode ; Check computer. bne 2$ ; Take branch in 128 mode. lda #IO_IN .byte $2c 2$: lda #%00110111 ; for 128 mode only sta CPU_DATA lda vmcsb ; Save original video bank info for Wheels. sta oVMBmp lda scroly ; save screen Y axis sta yaxis MoveW $fffe, oldVector ; Saves the old Wheels IRQ vector. lda #fli sta $ffff ; Point #1 lda #$31 ; Trigger the IRQ request sta raster ; At rasterline 49. lda scroly and #$7f ; Clear bit 7 of raster register. sta scroly lda #1 sta vicirq ; Ack raster ints. cli rts RemoveFLI: ; Removes the FLI routine sei MoveW oldVector, $fffe ; Restores the old Wheels IRQ vector. lda #$fb ; Trigger the IRQ request sta raster ; At rasterline 251. lda yaxis ; restore screen Y axis and #$7f ; Clear bit 7 of raster register. sta scroly lda #1 sta vicirq ; Ack raster ints. lda oVMBmp ; Restore original video bank info for Wheels. sta vmcsb lda r6510 sta CPU_DATA ; Restores 6510 Port #$01 cli jmp StartMouseMode ; Start the mouse on. fli: ; The actual FLI interrupt routine lies here. pha .byte $da ;phx .byte $5a ;phy php ; Save processor flags. ; Point #2 ldx #$03 ; #$0f for PAL SuperCPU systems. 3$ dex bpl 3$ lda raster tax ldy colOneClrs,x ; Get colors for the first three columns. sty mark4+1 ldy colTwoClrs,x sty mark5+1 ldy colThreeClrs,x sty mark6+1 ; ldy backgndTable,x ; Get background color for scanline. ; stx $d021 inx ; Point #3 cpx #$f9 ; Have we reached scanline 249? bne 1$ ; Point #1 ldx #$31 ; Restart the IRQ at rasterline 49. 1$: stx raster ; By this time, the raster interrupt register is ; incremented by one, and will re-trigger the ; same fli routine. ; This way, it is fully line-interruptible ; and frees up SuperCPU time. ldy #$01 sty vicirq ; Ack raster ints. and #$07 ; Mask out lower three bits. tax ldy tabd018,x ; Use preset values for vmcsb. lda d011tab,x ; Use preset values for scroly. sty vmcsb ; Select video matrix. sta scroly ; Forces the badline. mark4: lda #$00 ; Stores a video matrix value onto the first column. sta $d022 mark5: lda #$00 ; Stores a video matrix value onto the second column. sta $d022 mark6: lda #$00 ; Stores a video matrix value onto the third column. sta $d022 plp ; Restore processor flags. .byte $7a ;ply .byte $fa ;plx pla ; Do NOT use any memory accesses to the host rti ; Commodore databus in this part because it will ; be blocked by the VIC-II badline. ; Point #4 tabd018: ; Preset video matrix values. .byte $78,$08,$18,$28,$38,$48,$58,$68; NTSC systems ;.byte $08,$18,$28,$38,$48,$58,$68,$78 - PAL systems d011tab: ; Preset VIC DMA retrigger values. .byte $38,$39,$3a,$3b,$3c,$3d,$3e,$3f ChkAbortKey: ; Checks the RUN/STOP keypress. LoadB $dc00, #$7f ; check for the STOP key 3$: asl $dc01 ; Check for the RUN/STOP keypress. ; This also synchronizes the line-interruptible FLI routine. bcs 3$ ; Branch if it isn't pressed. rts prep3Cols: ; Prepares the first three columns of the FLI screen ; Ideally, a FLI file would be loaded in and this ; routine would then be called to set up the three ; 200-byte tables corresponding to each column, ; covering the first three columns of the screen. lda #$40 sta mark1+2 ; Prepare the marks. sta mark2+2 sta mark3+2 ldy #$00 sty mark1+1 iny sty mark2+1 iny sty mark3+1 php sei lda screenMode beq 1$ ; take branch in 64 mode. lda $ff00 pha ; save 128 configuration. lda #%01111110 ; select RAM at $4000 sta $ff00 1$: ldy #$00 ldx #$07 ; Use self-modifying code to create three ; Point #5 mark1: lda $4000 ; 200-byte tables for each column of the sta colOneClrs+49,y ; FLI screen and each value is indexed by mark2: lda $4001 ; the scanline in the FLI routine. sta colTwoClrs+49,y mark3: lda $4002 sta colThreeClrs+49,y ; use +48 for the column offset in PAL clc ; systems. lda mark1+2 adc #$04 sta mark1+2 sta mark2+2 sta mark3+2 iny dex bpl mark1 sec lda mark1+2 sbc #$20 sta mark1+2 clc lda mark1+1 adc #$28 sta mark1+1 tax inx stx mark2+1 inx stx mark3+1 lda mark1+2 adc #$00 sta mark1+2 sta mark2+2 sta mark3+2 cpy #200 bne mark1-2 lda screenMode beq 2$ ; take branch in 64 mode. pla sta $ff00 ; restore 128 configuration. 2$: plp rts .ramsect $1000 ; All column colors are referenced by scanline. colOneClrs: ; Column one colors of the FLI screen. .block 256 colTwoClrs: ; Column two colors of the FLI screen. .block 256 colThreeClrs: ; Column three colors of the FLI screen. .block 256 Hopefully the full-screen FLI possibilities that the SuperCPU can now unlock will bring forth cool software for our SuperCPU's and tons of 'eye candy'. Enjoy. ....... .... .. . C=H 20 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: Main Articles :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: VIC KERNAL Disassembly Project - Part V Richard Cini February, 2002 Introduction ============ In Part 4 of this series, we discovered that of the 39 ROM routines available to be called by user code, 26 of them related to device I/O. The path to using a device from machine code was to first "open" it. So, we looked at the routines required to open and begin using a logical device. When we concluded the discussion on the OPEN command, we left out seven routines dealing with the tape device. This is some of the most complex code I've ever seen. Unlike the IEEE serial peripherals, the tape deck is a "dumb" device, meaning that the VIC Kernal has to do the work of moving the data to/from the tape. With the serial peripherals, the Kernal just sends a character to the device that then acts on it independently of the VIC. After opening the device, there are nine non-tape routines to deal with getting data into and out of the VIC, including talking and untalking, listening and unlistening, moving the characters in and out, and some secondary address stuff. In this installment, we'll conclude our discussion of the OPEN procedure by looking at the tape routines that are called from OPEN. Later articles will discuss the actual movement of bits to and from the cassette and the remaining IEEE serial routines, as well as the loading and saving of files. Tape Routines ============= In the last installment, we examined the beginnings of opening a device for use by a program. The reader will find that the routines are convoluted and hard to follow, with jumps and branches into apparently unrelated subroutines. Overall it appears to be ugly but highly functional code. Here's a "calling tree" outline to the routines called by OPEN: IOPEN---- |-FIND (analyzed in Part IV of this series) |-SENDSA (ultimately handles IEEE serial stuff) |-SEROPN (handles RS232 stuff) | |(all tape-related from here down) |-GETBFA (get address of tape buffer) |-PLAYMS (prompt user to press PLAY on tape deck) | |-TPSTAT (tape key status) | |-TPSTOP (check for STOP keyboard key during tape ops) | |-SRCHMS (print "Searching" or "Searching for..." messages) |-LOSCPH (locate a tape header with filename matching one in | OPEN) |-LOCTPH (locate first/next header; no filename specified) | |-SETBST (sets tape buffer start/end pointers) | |-PLAYMS (see above) | |-TPCODE (main tape code-moves bits in and out on IRQ) | |-SBIDLE (serial buss idle check) | |-STOIRQ1 (put key tape vectors into table) | |-NCHAR (sets bit counter for char input operations) | |-TPSTOP (see above) | |-IUDTIM (update jiffy clock; previous installment) | |-RECDMS (prompt user to press PLAY & RECORD on tape deck) |-WRTPHD (write a tape header to tape) | |-SETSBT (see above) | |-TPWRIT1 (precedes TPCODE by 12 bytes) When dealing with the tape, it helps to understand that Commodore built the tape protocol with an eye towards readability under adverse conditions, including tape quality and motor speed. This made Commodore tapes probably one of the most reliable data systems when compared to TI, Apple, and Tandy, among others. Data headers and data blocks are repeated on the tape and a comparison is made between the two reads to ensure integrity. Additionally, the recording method is self-clocking so the effects of varying tape speed are minimized. One of the first routines used in opening the tape device is determining the address of the tape buffer and making sure that it's in the $02xx page (or higher) of the system RAM. A test at IOPEN_S5 bails out with an "illegal device" error if the tape buffer isn't just so. F84D ;============================================================= F84D ; GETBFA - Get start of tape buffer F84D ; Returns buffer address in .X (LSB) and .Y (MSB) F84D ; F84D GETBFA F84D A6 B2 LDX TAPE1 F84F A4 B3 LDY TAPE1+1 F851 C0 02 CPY #$02 ;is buffer at $02xx? F853 60 RTS PLAYMS is called by IOPEN at IOPEN_S6. A test there determines if we're trying to read/load or write/save from/to a tape. Read mode results in the "Press Play..." message, the "Searching for..." message and two routines that search for the appropriate tape header. F894 ;=============================================================== F894 ; PLAYMS - Wait for tape key on read F894 ; F894 PLAYMS F894 20 AB F8 JSR TPSTAT ;get tape key status F897 F0 1C BEQ TPSTEX ;$F8B5 pressed? yes, exit. F899 F899 A0 1B LDY #KIM_PLAY ;offset for "Press Play..." message F89B PLAYMS1 F89B 20 E6 F1 JSR MSG ;print it F89E WTPLAY F89E 20 4B F9 JSR TPSTOP ; check for STOP key F8A1 20 AB F8 JSR TPSTAT ;get key status F8A4 D0 F8 BNE WTPLAY ;$F89E wait for PLAY switch F8A6 A0 6A LDY #KIM_OK ;offset for "OK" message F8A8 4C E6 F1 JMP MSG ;print it and return to caller This simple routine prints the "Searching for..." message only if in direct mode, and if appropriate, the filename that is being searched for. If no filename is present, the message is changed to "Searching..." (without the "for") before it's output to the screen. F647 ;============================================================== F647 ; SRCHMS - Print "Searching for [filename]" F647 ; F647 SRCHMS F647 A5 9D LDA CMDMOD ;direct mode? F649 10 1E BPL SRCHEX ;$F669 no (programmed mode), exit F64B F64B A0 0C LDY #KIM_SRCH ;"Searching" message F64D 20 E6 F1 JSR MSG ;output message F650 A5 B7 LDA FNMLEN ;get filename length F652 F0 15 BEQ SRCHEX ;$F669 no filename, skip "for" F654 A0 17 LDY #$17 ;point to "FOR" in "Searching For" F656 20 E6 F1 JSR MSG ;print it F659 ; F659 ; FLNMMS - Print filename F659 ; F659 FLNMMS F659 A4 B7 LDY FNMLEN ;get filename length F65B F0 0C BEQ SRCHEX ;$F669 no filename, exit F65D A0 00 LDY #$00 F65F FLNMLP ; loop to print filename F65F B1 BB LDA (FNPTR),Y ;get character F661 20 D2 FF JSR CHROUT ; and print it F664 C8 INY F665 C4 B7 CPY FNMLEN F667 D0 F6 BNE FLNMLP ;$F65F loop F669 SRCHEX F669 60 RTS ;exit If the user is attempting to open a tape file with a specific filename, the IOPEN code makes a call to LOCSPH in IOPEN_S6 to find the file header associated with the filename. If the specific header is not found, the routine emits the "File not Found" error message. LOCSPH is a loop wrapper around the LOCTPH routine that searches for the "next" file header regardless of name. The secondary address parameter of the OPEN command defines whether the action is to (0) read a tape file and relocate it in memory, (1) read a tape file without relocation (a machine program), or (2) write a tape file and put both EOF an EOT markers after it. Once a header is found, the filename from the header is compared to the filename in the OPEN command to see if there's a match. F867 ;============================================================ F867 ; LOCSPH- Find specific tape header F867 ; F867 LOCSPH F867 20 AF F7 JSR LOCTPH ;search for next header F86A B0 1D BCS LCSPEXC+1 ;$F889 returned EOT? Go to ready F86C A0 05 LDY #$05 ;filename offset in header F86E 84 9F STY TPTR2 ;save offset F870 A0 00 LDY #$00 ;loop counter F872 84 9E STY TPTR1 ;save it F874 LCSPHLP F874 C4 B7 CPY FNMLEN ;compare filename length F876 F0 10 BEQ LCSPEXC ;$F888 counter 0, exit F878 F878 B1 BB LDA (FNPTR),Y ;get filename letter F87A A4 9F LDY TPTR2 ; offset to name in header F87C D1 B2 CMP (TAPE1),Y ;compare to tape header F87E D0 E7 BNE LOCSPH ;f867 different, get next header F880 E6 9E INC TPTR1 ;increment counters F882 E6 9F INC TPTR2 F884 A4 9E LDY TPTR1 F886 D0 EC BNE LCSPHLP ;$F874 compare next character F888 LCSPEXC F888 18 CLC ; exit success F889 60 RTS Tape searches are performed linearly, so the LOCTPH routine is used to search for the next file header on the tape starting from the current tape position. This routine is also called by LOAD through IOPEN if no filename is provided as a parameter to the LOAD (or OPEN) call. Additionally, it's called in a loop by the LOCSPH routine when searching for a specific file. LOCTPH reads a tape block to the tape buffer using TPREAD and examines a few important fields in the file header. The fields examined indicate the file type and the filename. If the file header indicates that the file is anything other than a program file or a data file, the routine looks for another file header until it reaches the end of the tape. If BASIC is in "direct mode", LOCTPH prints the "Found" message in addition to the filename. F7AF ;============================================================ F7AF ; LOCTPH - Read any tape header F7AF ; Header type: 1=BASIC program, 2=data block, 3=machine program, F7AF ; 4=data header, 5=end-of-tape marker F7AF ; F7AF LOCTPH F7AF A5 93 LDA IOFLG2 F7B1 48 PHA ;save load/verify flag F7B2 20 C0 F8 JSR TPREAD ;read tape block to buffer F7B5 68 PLA F7B6 85 93 STA IOFLG2 ;restore flag F7B8 B0 2C BCS LOCTPEX ;F7E6 error, end search F7BA A0 00 LDY #$00 ;index reg F7BC B1 B2 LDA (TAPE1),Y ;get header type F7BE C9 05 CMP #$05 ;EOT? F7C0 F0 24 BEQ LOCTPEX ;$F7E6 yes, exit F7C2 C9 01 CMP #$01 ;BASIC program? F7C4 F0 08 BEQ LOCTP1 ;$F7CE yes, branch F7C6 C9 03 CMP #$03 ;ML program? F7C8 F0 04 BEQ LOCTP1 ;$F7CE yes, branch F7CA C9 04 CMP #$04 ;data header? F7CC D0 E1 BNE LOCTPH ;must be data block-skip it F7CE LOCTP1 ;program or data header comes here F7CE AA TAX F7CF 24 9D BIT CMDMOD ;direct mode? F7D1 10 11 BPL LOCTPEX-2 ;$F7E4 no, continue F7D3 A0 63 LDY #KIM_FOUN ;setup for "Found" msg F7D5 20 E6 F1 JSR MSG ;print it F7D8 A0 05 LDY #$05 ;offset to file name F7DA LOCLOOP ; loop to print filename F7DA B1 B2 LDA (TAPE1),Y ;print loop F7DC 20 D2 FF JSR CHROUT F7DF C8 INY F7E0 C0 15 CPY #$15 ;21 characters max F7E2 D0 F6 BNE LOCLOOP ;$F7DA loop F7E4 18 CLC F7E5 88 DEY F7E6 LOCTPEX F7E6 60 RTS The TPREAD routine is responsible for the setup required to read or verify a block from the tape. It prompts the user to press the PLAY button on the tape deck, disables system interrupts and sets some important parameters before execution falls through to the code responsible for starting the tape IRQ routines. In many Commodore machines, tape operations are performed under the operation of a routine installed as a temporary IRQ handler -- sort of a cheap multitasking so that the system doesn't appear to be hung while tape operations are occurring. Execution ultimately comes to code at $F8EF (TPCODE) which is responsible for installing and starting the tape IRQ routine. All of this and we haven't yet reached the bits on the tape :-) F8C0 ;========================================================== F8C0 ; TPREAD - Read tape block F8C0 ; F8C0 TPREAD F8C0 A9 00 LDA #$00 F8C2 85 90 STA CSTAT ;clear status variable... F8C4 85 93 STA IOFLG2 ;and read/verif flag F8C6 TPREAD1 F8C6 20 54 F8 JSR SETBST ;set tape buffer pointers F8C9 ; F8C9 ; load program F8C9 ; F8C9 TPREAD2 F8C9 20 94 F8 JSR PLAYMS ;wait for Play key F8CC B0 1F BCS TPCODE-2 ;$F8ED (in TPWRIT1) F8CE F8CE 78 SEI ;disable interrupts F8CF A9 00 LDA #$00 ;clear work memory for IRQ routines F8D1 85 AA STA RIDATA F8D3 85 B4 STA BITTS F8D5 85 B0 STA TPCON1 F8D7 85 9E STA TPTR1 F8D9 85 9F STA TPTR2 F8DB 85 9C STA BYTINF F8DD A9 82 LDA #$82 ;Timer H constant F8DF A2 0E LDX #$0E ;number for IRQ vector F8E1 D0 11 BNE TPCODE1 ;$F8F4 (TPCODE1 in TPWRIT below) ; falls through to TPWRIT below) SRCHMS also calls this small routine to determine if a key is pressed on the tape deck. TPSTAT looks at PA6 on VIA1 to determine the key state and sets up the Z flag for a compare to be performed in SRCHMS. F8AB ;============================================================ F8AB ; TPSTAT - Check tape key status F8AB ; F8AB TPSTAT F8AB A9 40 LDA #%01000000 ;$40 F8AD 2C 1F 91 BIT D1ORAH ;switch sense F8B0 D0 03 BNE TPSTEX ;$F8B5 not pressed, exit F8B2 2C 1F 91 BIT D1ORAH ;button pressed. Setup for another F8B5 ;compare later. Z=1 if pressed F8B5 TPSTEX F8B5 18 CLC F8B6 60 RTS ;return clear One of the tests performed in IOPEN (also at IOPEN_S6) is to determine if the tape operation is a read or a write. If we're in the write mode, the code jumps to IOPEN2. At IOPEN2, the Kernal prompts for the user to press play and record on the tape deck and then writes a file header by calling WRTPHD with a control ID of $04 (a data header). Then, IOPEN writes a control byte ID of $02 (block is a data block) to the tape buffer and returns. RECDMS is called by IOPEN to determine if a key is pressed on the tape deck and if not, sets the message flag to the "Press Play & Record" message and prints the message by calling into the PLAYMS routine. PLAYMS prints the message and then checks for the key press. F8B7 ;=========================================================== F8B7 ; RECDMS - Wait for tape key on write F8B7 ; F8B7 RECDMS F8B7 20 AB F8 JSR TPSTAT ;get button status F8BA F0 F9 BEQ TPSTEX ;$F8B5 pressed? Yes, continue F8BC A0 2E LDY #KIM_RECP ;no, remind to "Press Play & Record" F8BE D0 DB BNE PLAYMS1 ;exit through $F89B IOPEN calls WRTPHD at IOPEN2 with $04 as the control byte (following block is a data header) to be written into the header. WRTPHD then writes some critical information into zero-page locations in advance of filling the tape buffer with the same information. At the end of the routine, the data is written to the tape in WRTPH1. F7E7 ;========================================================== F7E7 ; WRTPHD - Write tape header F7E7 ; On entry, .A is the header type: 2=data blk; 4=data hdr F7E7 ; F7E7 WRTPHD F7E7 85 9E STA TPTR1 ;save header type F7E9 20 4D F8 JSR GETBFA ;get tape buffer address F7EC 90 5E BCC WRTPEX ;$F84C exit if not right F7EE A5 C2 LDA STAL+1 ; save some program info F7F0 48 PHA ;save...start H F7F1 A5 C1 LDA STAL F7F3 48 PHA ;...start L F7F4 A5 AF LDA EAL+1 F7F6 48 PHA ;...end H F7F7 A5 AE LDA EAL F7F9 48 PHA ;...end L F7FA A0 BF LDY #$BF ;buffer length-1 (191) F7FC A9 20 LDA #$20 ; {space} F7FE WRTPLP1 ; write program data to tape buffer F7FE 91 B2 STA (TAPE1),Y ;clear buffer F800 88 DEY F801 D0 FB BNE WRTPLP1 ;$F7FE F803 A5 9E LDA TPTR1 ;get header type F805 91 B2 STA (TAPE1),Y ;write it F807 C8 INY F808 A5 C1 LDA STAL ;get start L F80A 91 B2 STA (TAPE1),Y ;write it F80C C8 INY F80D A5 C2 LDA STAL+1 ;get start H F80F 91 B2 STA (TAPE1),Y ;write it F811 C8 INY F812 A5 AE LDA EAL ;get end L F814 91 B2 STA (TAPE1),Y ;write it F816 C8 INY F817 A5 AF LDA EAL+1 ;get end H F819 91 B2 STA (TAPE1),Y ;write it F81B C8 INY F81C 84 9F STY TPTR2 ;save buffer offset F81E A0 00 LDY #$00 ;filename loop counter F820 84 9E STY TPTR1 ;save loop counter F822 WRTPLP2 ; write filename to buffer F822 A4 9E LDY TPTR1 ;get loop counter F824 C4 B7 CPY FNMLEN ;compare filename length F826 F0 0C BEQ WRTPH1 ;$F834 done F828 B1 BB LDA (FNPTR),Y ;get filename char F82A A4 9F LDY TPTR2 ;get tape buffer pointer F82C 91 B2 STA (TAPE1),Y ;write char to buffer F82E E6 9E INC TPTR1 ;increment loop counters F830 E6 9F INC TPTR2 F832 D0 EE BNE WRTPLP2 ;$F822 loop F834 WRTPH1 ; flush buffer to tape F834 20 54 F8 JSR SETBST ;set start and end address pointers F837 A9 69 LDA #$69 ;checksum block ID F839 85 AB STA RIPRTY ;save parity character F83B 20 EA F8 JSR TPWRIT1 ;$F8EA write block F83E A8 TAY ;restore start and end addresses F83F 68 PLA F840 85 AE STA EAL F842 68 PLA F843 85 AF STA EAL+1 F845 68 PLA F846 85 C1 STA STAL F848 68 PLA F849 85 C2 STA STAL+1 F84B 98 TYA F84C F84C WRTPEX F84C 60 RTS ;exit SETBST is a helper routine called by LOCTPH and WRTPHD to setup the tape buffer before a tape operation takes place. It sets the starting address of the buffer to the first address of the assigned buffer range and sets the end of the buffer to start + 192 bytes. F854 ;========================================================== F854 ; SETBST - Set buffer start/end pointers F854 ; F854 SETBST F854 20 4D F8 JSR GETBFA ;get buffer address F857 8A TXA F858 85 C1 STA STAL ;start=start of buffer F85A 18 CLC F85B 69 C0 ADC #$C0 ;end=start+192 F85D 85 AE STA EAL F85F 98 TYA F860 85 C2 STA STAL+1 F862 69 00 ADC #$00 F864 85 AF STA EAL+1 F866 60 RTS TPWRIT performs the nuts and bolts of moving cassette data in and out of the VIC. The VIC moves tape data by using a series of interrupt routines that are swapped into the IRQ vector as needed. The benefit here is that the tape IRQ code is then executed 60 times per second, along with normal processing, until the operation is complete, resulting in a cheap form of multitasking. TPWRIT performs some setup chores before changing the IRQ vector, including clearing interrupts, ensuring that the IEEE serial bus is idle, saving the old vector, assigning the new vector and setting-up the variables used by the tape IRQ routine to actually move bits. Finally, interrupts are enabled at $F92E which starts the whole process. When the tape IRQ routine is finished it restores the IRQ vector to the standard $EABF which is detected by TPWRIT at TPCDLP2 (an I/O loop). When completed, TPWRIT exits through the TPSTOP routine. This loop also updates the jiffy clock. F8E3 ;========================================================== F8E3 ; TPWRIT - Initiate tape write F8E3 ; F8E3 TPWRIT F8E3 20 54 F8 JSR SETBST ;get buffer pointers F8E6 A9 14 LDA #$14 ;checksum F8E8 85 AB STA RIPRTY ;save it F8EA ; F8EA ; write buffer to tape F8EA ; F8EA TPWRIT1 F8EA 20 B7 F8 JSR RECDMS ;wait for Record+Play keys F8ED B0 68 BCS TPSTPX1 ;$F957 exit F8EF ; F8EF ; TPCODE - Common tape code F8EF ; F8EF TPCODE F8EF 78 SEI ;disable interrupts F8F0 A9 A0 LDA #%10100000 ;$A0 Timer H constant F8F2 A2 08 LDX #%00001000 ;$08 offset for tape IRQ vector F8F4 TPCODE1 F8F4 A0 7F LDY #%01111111 ;$7F disable interrupts F8F6 8C 2E 91 STY D2IER ;save to interrupt enable reg F8F9 8D 2E 91 STA D2IER ; on VIAs F8FC 20 60 F1 JSR SBIDLE ;wait for timeout F8FF AD 14 03 LDA IRQVP ;save current IRQ Vector F902 8D 9F 02 STA TAPIRQ F905 AD 15 03 LDA IRQVP+1 F908 8D A0 02 STA TAPIRQ+1 F90B 20 FB FC JSR STOIRQ1 ;$FCFB .X=8 set tape IRQ vectors F90E A9 02 LDA #$02 ;read # of blocks F910 85 BE STA FSBLK F912 20 DB FB JSR NCHAR ;set bit counter for serial shifts F915 AD 1C 91 LDA D1PCR F918 29 FD AND #%11111101 ;$FD CA2 manual low F91A 09 0C ORA #%00001100 ;$0C force bits 2,3 high F91C 8D 1C 91 STA D1PCR ;switch on tape drive F91F 85 C0 STA CAS1 ;flag as on F921 A2 FF LDX #$FF ;delay loop for high (outer) F923 A0 FF LDY #$FF ;inner loop F925 TPCDLP1 F925 88 DEY F926 D0 FD BNE TPCDLP1 ;$F925 F928 CA DEX F929 D0 F8 BNE TPCDLP1-2 ;$F923 outside loop F92B 8D 29 91 STA D2TM2H F92E 58 CLI ;allow tape interrupts F92F ; F92F ; wait for I/O-end F92F ; F92F TPCDLP2 F92F AD A0 02 LDA TAPIRQ+1 ;check IRQ direction F932 CD 15 03 CMP IRQVP+1 ;standard vector? F935 18 CLC F936 F0 1F BEQ TPSTPX1 ;$F957 yes, ready F938 20 4B F9 JSR TPSTOP ;no, check STOP key F93B AD 2D 91 LDA D2IFR ;timer 1 IF clear? F93E 29 40 AND #%01000000 ;$40 F940 F0 ED BEQ TPCDLP2 ;$F92F continue F942 AD 14 91 LDA D1TM1L ; get timer 1 loword F945 20 34 F7 JSR IUDTIM ;update RTC F948 4C 2F F9 JMP TPCDLP2 ;$F92F loop TPSTOP is another helper routine. It scans for the keyboard STOP key, and if detected, restores the standard IRQ vector and returns to the caller's caller (the caller to TPWRIT). F94B ;=========================================================== F94B ; TPSTOP - Check for tape stop F94B ; F94B TPSTOP F94B 20 E1 FF JSR STOP ;scan STOP key F94E 18 CLC F94F D0 0B BNE TPSTPX ;$F95C not pressed, return F951 20 CF FC JSR RESIRQ ;pressed, turn drive off and restore IRQ F954 38 SEC ;set error flag F955 68 PLA ;pop return address F956 68 PLA F957 TPSTPX1 F957 A9 00 LDA #$00 ;flag normal IRQ vector F959 8D A0 02 STA TAPIRQ+1 F95C TPSTPX F95C 60 RTS ; return to caller's caller The SBIDLE routine is used in TPWRIT to detect if the RS-232 serial bus is active and if so, will wait until it's idle before returning to continue the tape code. F160 ;=========================================================== F160 ; SBIDLE - Set timer for serial bus timeout F160 ; F160 SBIDLE F160 48 PHA ;save .A F161 AD 1E 91 LDA D1IER ;get IER F164 F0 0C BEQ SBIDLEX ;$F172 no interrupts pending, exit F166 SBIDLLP F166 AD 1E 91 LDA D1IER ;get IER F169 29 60 AND #%01100000 ;$60 T1 & T2 F16B D0 F9 BNE SBIDLLP ;$F166 F16D F16D A9 10 LDA #%00010000 ;$10 kill CB1 RS232 F16F 8D 1E 91 STA D1IER F172 SBIDLEX F172 68 PLA F173 60 RTS RATS3 is at the tail end of the RAMTAS routine - the subroutine that precedes the tape vectors. FCF6 ;=========================================================== FCF6 ; STOIRQ - Set IRQ vector FCF6 ; usually called with .x=$08 or $0e. $08 points to the first FCF6 ; entry in TAPVEC while $0e points to the last entry FCF6 STOIRQ FCF6 20 CF FC JSR RESIRQ ;restore std IRQs FCF9 F0 97 BEQ TPEOI ;$FC92 FCFB STOIRQ1 FCFB BD E9 FD LDA RATS3,X ;$FDE9,X set vectors from table FCFE 8D 14 03 STA IRQVP FD01 BD EA FD LDA RATS3+1,X ;$FDEA,X FD04 8D 15 03 STA IRQVP+1 FD07 60 RTS These are the actual routines responsible for writing bits to the tape. Various calling routines place these vectors into the IRQ vector that then gets executed 60 times per second. In order, these routines are for writing a leader block to the tape, the routine to write data to the tape, the standard IRQ vector, and the routine to read bits from the tape. FDF1 ;=========================================================== FDF1 ; TAPEVC - Tape IRQ vectors FDF1 ; FDF1 TAPEVC FDF1 ; .dw $FCA8, $FC0B, $EABF, $F98E FDF1 A8FC0BFCBFEA .dw WRLDR2, TWRD7, IRQVEC, RDTPBT FDF7 8EF9 The NCHAR routine resets the internal bit counters to their initial state before a calling routine begins to shift the bits over a serial or tape channel. It sets the bit length to 8 and resets intermediate variables to 0. FBDB ;=========================================================== FBDB ; NCHAR - Set bit counter for new character (serial output) FBDB ; FBDB NCHAR FBDB A9 08 LDA #$08 FBDD 85 A3 STA SBITCF FBDF A9 00 LDA #$00 FBE1 85 A4 STA CYCLE FBE3 85 A8 STA BITCI FBE5 85 9B STA TPRTY FBE7 85 A9 STA RINONE FBE9 60 RTS That's all of the room we have for this part of the series. All of this code and we still haven't moved the bits off the tape. So, next time we'll look at the actual routines responsible for moving bits on and off of the tape and we'll begin to look at the IEEE serial routines. ....... .... .. . C=H 20 :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: ----------------------- The Art Of The Minigame ----------------------- Introduction ------------ In summer 2001, an 8-bit minigame contest was held. Voter turnout may have been low, but author turnout was high, with a total of thirty entries for the C64, Spectrum, Amstrad, and Atari 8-bit. The entries and the results are available at http://demo.raww.net/minigame/, and what follows are a series of articles by several minigame authors. The articles are for enjoyment, to stimulate thought, and, hopefully, to motivate people for next year's contest. (Everyone had a great time, btw.) Minigames -- writing tiny programs in general -- present a set of very unique challenges. Whereas many of us are used to optimizing programs for cycle-efficiency, optimizing programs for byte-efficiency turns out to be very different -- challenging, at times aggrivating, but very rewarding. I think you will find the challenges and solutions ahead to be very interesting. The game authors were pioneers, exploring pretty new territory, and I salute all that entered the contest (especially for making such a nice C64 showing). A special mention should be made of MagerValp ("Skinny Puppy" in Swedish, in case you've been wondering), who motivated a number of lazy people (e.g. myself) to get involved with the contest. Before diving in to the articles ahead, which contain lots of tricks to save memory, I thought it might get us in the mood to go over some general ideas and concerns for saving bytes on a C64. It should be obvious that some balance has to be found between cycle- efficiency and memory-efficiency. Routines should be reasonably fast in most cases, but the major portion of that balance is -- has to be -- memory efficiency. Obviously, everything that can be put into zero page should be put there, since zero page instructions are two bytes instead of three. Equally obvious is that every list-type fetch should use the .X register, since there is no zero-page lda xx,y instruction. Moreover, memory is initialized to various values when the computer is reset. Many zero-page locations have specific values -- by careful code design, these can be taken advantage of to avoid any initialization code. For example, many times the code needs a zero-page pointer with the low byte zero (i.e. sta (point),y). If that variable is chosen as a location which is already zero, the code doesn't have to waste four bytes on lda #00 sta point. One-time counters can also be used in this way. Finally, certain locations can be manipulated to have certain values; for example, the Tetrattack load address forces the end address+1 to be $10f0, because $10 and $f0 are the z-coordinates of two object vertices. There is also a fairly substantial kernal to take advantage of -- routines to clear the screen, perform memory manipulations, and so on. Knowing what is available, and what the routines do, is handy. Self-modifying code comes in handy when large portions of code can be used for similar things with just a few changes. For example, the line routine in Tetrattack uses a single routine for lines in both the x- and y-directions, by just interchanging a few INX/INY instructions. And finally, there are things which Mark Seelye terms "injections", which seems like a good term to me. The idea is to reuse known register values whenever possible. That is, instead of having LDA #00 STA zp in a subroutine, you might put an "STX zp" in an earlier subroutine, where you know the value of .X is zero. In other words, the instruction to clear the zp location is "injected" into a different subroutine to save a couple of bytes. Every program below uses this trick in one form or another. You'll see lots of neat tricks in the articles ahead, but the basic design framework is: put stuff in zero-page, take advantage of default zero-page values, take advantage of the kernal, always know the register values, and reuse as much code as possible. With that, let's check out the programs. ....... .... .. . C=H 20 =====================================Part 1=================================== Postmortem: Codebreaker by White Flame (aka David Holz) http://white-flame.com The most difficult part of making an entry for this compo was deciding what game to do that could fit in 512 bytes. The ubiquitous snake clones and scroll/dodge games abound; finding something different is truly a challenge. I wanted to do a Mastermind game a looong time ago, with a full-on "We're getting attacked, solve to code to save the world!" type of setting with lots of animation, but usually found myself in application and library/utility programming instead. But luckily this idea popped back into my head as something different that would fit in the 2-page limit. The concept is pretty simple (as most 512-byte games are). There's a 4-color secret key, you get 10 guesses at it. Two numbers reflect the score for the current guess: a black number shows how many slots have the right color, and a white number shows how many correct colors are in the guess, but in the wrong slot. Experienced players should be able to deduce the correct answer in 4-6 guesses. Random number generation There are 3 obvious ways to get random numbers easily in the 64: SID oscillator with noise waveform, BASIC prng, or the raster counter. I decided to go for the raster solution. Obviously, you need to wait a certain amount of time between reads of the raster location, or they'd be the same or linear. To fix this, every time I wanted a random number, I'd read the raster and call a loop that many times. When it was done, $d012 should have a "random enough" number in its lower 3 bits (to get a color from 0 to 5). randomDelay: ldx $d012 randomLoop: jsr delay dex bne randomLoop; delay: dey bne delay rts Code space saving What's interesting is that it actually took less bytes of code and data to have custom draw loops for the 3 textual messages than setting a pointer and calling a drawText function. ldy #22 titleloop: lda titletext,y sta screenLoc-48,y dey bpl titleloop: These functions assume that you're not running on an ancient kernal, so color memory is initialized to the cursor color (white). I also had all variables (except the secret key) use screen memory, color memory, or register .Y instead of manipulating off-screen variables and then updating the screen to match: - For entering a guess, color memory is INC'd and DEC'd, with .Y holding the current offset from the fixed screen location. JSR $e8ea is used to scroll the screen, and everything is drawn to screen line 23. - To find out if you've already made 10 guesses, a memory location near the top of the screen was polled to see if it was still a space. As guesses are made, the screen was scrolled, and if the title line ever scrolled into that memory location, it's game over. - The current guess score '0 0' is drawn to the screen first, then those screen values are INC'd or DEC'd as matches are found. - Winning constitutes checking to see if the screen has a '4' in the black score location. Score calculation This was a bit of a pain. At first I trying to calculate the white score by the scoring rules (every matching color that was in the wrong slot), INCing the white score on every hit, then looping for exact matches, INCing the black score on every hit. This code ended up being way too big, so I got the idea to simplify the white loop: INC the white score on every color match, whether in the right spot or not, then on every black match, DEC the white score and INC the black score. This gave the same result, but with much simpler code. Conclusion All in all, it was an interesting challenge for me, and had I not gotten stuck in Phoenix for an extra week, I'd have finished my 2k entry. The code is pretty straightforward, and at 415 bytes total, I'm quite happy to claim having the smallest entry, even if I didn't win (why Pacman didn't beat out a low-res scroll/dodge/single-shot game is a mystery to me). Being a hobbyist C64 programmer, and my career history being more on the research end of things, having a deadline is a nice change in terms of getting a project completed. :) Publisher: Gasman/Raww Arse Number of full-time developers: 1 programmer Number of contractors: 0 Estimated budget: 0 USD (0 EUR) Development time: About 5 hours Release date: August 22, 2001 Platforms: Commodore 64 and compatibles Development hardware: pee sea (1.4GHz T-bird, 512MB RAM, 100GB HD, Win98SE) 3rd party tools: xa, Vice, Notepad In-house tools: none Project size: 290 lines of 6502 assembly code, containing 1 line of tokenized BASIC --- cb.asm --- .word $0801 ;Starting address for loader * = $0801 .word nextLine ;Line link .word $0 ;Line number .byte 158 ;SYS .byte 50,48,54,54 ;"2066" .byte 0 nextLine .byte 0,0 ;end of basic screenGameOver = $0400+(4*40)+16 screenLoc = $0400+(23*40)+16 colorLoc = $d800+(23*40)+16 screen = $fd color = $fb peg = 81 cursor = 87 crsr_up = 145 crsr_down = 17 crsr_left = 157 crsr_right = 29 WaitStart: jsr $ffe4 beq WaitStart Start: ;Change to white lda #$05 jsr $ffd2 ;Clear screen lda #147 jsr $ffd2 ;Set bg colors lda #$00 sta $d020 sta colorLoc sta colorLoc+1 sta colorLoc+2 sta colorLoc+3 sta $ff ;for cursor lda #11 ;dark gray sta $d021 ;Init zp pointers lda #screenLoc sta screen+1 lda #>colorLoc sta color+1 ;Draw text ldy #22 titleloop: lda titletext,y sta screenLoc-48,y dey bpl titleloop: ;Randomize values iny newGuess sty $02 guessAgain jsr randomDelay lda $d012 and #$07 cmp #$06 bpl guessAgain ldy $02 checkLoop: ;See if the color's already in the key dey bmi goodGuess cmp secret,y beq guessAgain bne checkLoop goodGuess: ;Save the color and put a peg on the screen to show progress ldy $02 sta secret,y lda #peg sta (screen),y iny cpy #$04 bne newGuess Game_Loop: ;Get cursor pos ldy $ff ;Draw cursor lda #cursor sta (screen),y Key_Loop: ;Wait for key getin: jsr $ffe4 beq getin ldy $ff ;If up, bump color up cmp #crsr_up bne notUp lda (color),y and #$07 adc #$00 ;carry is always set after the cmp cmp #$06 bne colorNotHigh lda #$00 colorNotHigh: sta (color),y notUp: ;If down, bump color down cmp #crsr_down bne notDown lda (color),y and #$07 bne colorNotLow lda #$06 colorNotLow: sbc #$01 ;carry is always set after the cmp sta (color),y notDown: ;If left, cmp #crsr_left bne notLeft ; Erase cursor lda #peg sta (screen),y ; Bump cursor left dey bpl cursorOKLeft ldy #$03 cursorOKLeft: ; Draw cursor lda #cursor sta (screen),y notLeft: ;If right, cmp #crsr_right bne notRight ; Erase cursor lda #peg sta (screen),y iny cpy #$04 bne cursorOKRight ldy #$00 cursorOKRight: lda #cursor sta (screen),y notRight: ;Store cursor pos sty $ff ;If not enter, goto Key_Loop cmp #13 bne Key_Loop ;------------------------ ;Erase cursor lda #peg sta (screen),y ;Check score ;Draw "0 0" lda #'0 sta screenLoc+5 sta screenLoc+7 lda #0 ;black sta colorLoc+5 ;Calculate white score (any matches between the guess & key) ldy #3 yLoop: ldx #3 xLoop: lda (color),y and #$07 cmp secret,x bne noWhiteMatch inc screenLoc+7 noWhiteMatch dex bpl xLoop dey bpl yLoop ;Calculate black score (exact matches between the guess & key) ldy #3 blackLoop: lda (color),y and #$07 cmp secret,y bne noBlackMatch inc screenLoc+5 dec screenLoc+7 noBlackMatch: dey bpl blackLoop ;Check win (black score = "4") lda #'4 cmp screenLoc+5 beq Win ;Check for Game Over (screen's getting full) lda screenGameOver cmp #32 bne Game_Over ;Scroll screen up 2 rows jsr $e8ea jsr $e8ea ;Copy last hand ldy #3 copyLoop: lda colorLoc-80,y sta (color),y lda #peg sta (screen),y dey bpl copyLoop: jmp Game_Loop Game_Over: jsr $e8ea ldy #8 ;Show "YOU SUCK!" loseloop: lda losetext,y sta (screen),y dey bpl loseloop: jsr $e8ea ldy #3 ;Reveal the secret key losecopy: lda secret,y sta (color),y lda #peg sta (screen),y dey bpl losecopy jmp WaitStart Win: jsr $e8ea ldy #15 ;Show Zero Wing reference winloop: lda wintext,y sta screenLoc-4,y dey bpl winloop rts randomDelay: ldx $d012 ;Loop n times, n = current raster location randomLoop: jsr delay dex bne randomLoop; delay: dey bne delay rts wintext .byte 1,32,23,9,14,14,5,18,32,9,19,32,25,15,21,33 losetext .byte 25,15,21,32,19,21,3,11,33 titletext .byte 58,58,58,32,3,54,52,32,3,15,4,5,2,18,5,1,11,5,18,32,58,58,58 secret = * --- end cb.asm --- ....... .... .. . C=H 20 =====================================Part 2=================================== Tinyplay by SLJ -------- Like many people, I was a serious Ultima guy growing up. Not only did I love the games (and still do), but I absolutely loved the music (and still do). In fact, I think the music from III and IV is some of the finest, most musical composing ever done on the C64, and the best example of how appropriate, atmospheric music can add volumes (so to speak) to a game. It also turns out that I had been thinking about some new ideas for a new music system for a while. So when Robin said he was writing an Ultima-like game -- well! I'm certainly no Kenneth W. Arnold, but a set of tiny tunes sure sounded like a neat and fun thing to do, especially in the Ultima style! Robin removed a lot of neat features and text to add in the music, so I feel pretty honored to have been included. The player and tunes were written on pretty short notice; I think we started a week or so before the deadline. Moreover, I had to go out of town the weekend of the deadline, getting back that Sunday, where many of the optimizations took place in frantic coding sessions! So it shouldn't be surprising that the code is not as optimal as it could be -- if you get around to looking at the source code, you'll see an awful lot of weird, duplicate, and otherwise embarrasing lines of code which anyone with a clear head would never have used. But as is, the player and three tunes take up some 428 bytes (I think), which isn't too bad. (Later on I'll describe a music compression routine which would have worked great, but that I didn't have time to implement.) Broadly speaking, the player is a note-duration based player, using two sawtooth voices for that Ultima sound (originally, before reality kicked in, the thinking was to put some sound fx in the third voice). It uses a single play routine, using the .X register to tell which voice to play, and can play multiple tunes. The original version has several effects included like major and minor arpeggios, but memory constraints forced these to be taken out (along with the original tune, alas, alas). The full source code for the player is at the end of this article, and can be divided into four primary parts: a one-time init routine, a routine to select a tune, a routine to play notes, and the music data. But perhaps the best way to understand the player is to begin by looking at how the data for a tune is stored. Tune encoding ------------- Here is the third, "castle" tune, in its entirety: * Tune 3: Castle t2v1 dfb 97 ;gate dfb 24+$80-1 dfb 70,96,72 dfb 6+$80-1 dfb 96,96,70,72 dfb 12+$80-1 dfb 74,74,96,75 dfb 24+$80-1 dfb 77,75,74,96,72 dfb 6+$80-1 dfb 96,96,70,72 dfb 24+$80-1 dfb 70,96,96,96 dfb 00 t2v2 dfb 97 dfb 24+$80-1 dfb 34,41,46,41 dfb 34,41,46,41 dfb 58,62,53,57 dfb 58,53,46,96 dfb 0 end The first chunk of data (t2v1) is the voice 1 data, and the second chunk the voice 2 data (t2v2). There are 12 notes in an octave, and eight octaves total, for 96 possible notes. In the player, data bytes 0-95 represent these notes -- the player just uses the data as an index into a frequency table. This leaves bytes 96 and above free for other uses; also, since the lower note values are rarely used, values like 0, 1, etc. are also free in principle. Byte values >= 128 are used to specify the note duration. When the player encounters these byte values it simply strips the high bit and stores the result. When a new note is read in, this stored duration is placed into a counter which is decremented on each player call. In the castle tune above, the duration for voice 2 is only set once, at the very beginning -- every note has the same duration. In the voice 1 data it is set several times, to change the duration when appropriate. For some reason the code decrements the duration counter until it becomes negative, instead of decrementing it until it becomes zero. I'm sure it was because of something relevant in an earlier version of the player, I just totally don't remember what. But that is why the durations above are encoded as 24+$80-1 -- the -1 is to make it go negative, instead of to zero. Looks weird. Byte values >95 and <128 are available for 'special' features. The player supports one "effect": the gate toggle. The player can either leave the gate bit on all the time or can turn it off right as a note is about to finish (say, when the duration counter decrements down to 2). A byte value of 97 turns the gate toggle on -- in the castle tune above it is the first byte of data. In other tunes, a value of 100 is used to turn off gate toggling. (Historically speaking, values 98 and 99 were used for major and minor arpeggios, and 100 was for no fx at all.) The byte value 96 is used as a "rest" (or hold) -- the player resets the duration counter but does not reset the gate bit or the note value. When there is no gate toggling this provides a way of holding a note for longer than the basic duration. When there is gate toggling, it starts the release cycle, and lets notes fade away. In the castle tune above, the first two notes in voice 1 are "70 96" -- a note, then a rest, during which the note fades away. Finally, the value 0 is used to indicate the end of data. While 0 is technically a note it never gets used as such. When the player hits a zero it simply resets the data pointer. So with all that in mind, let's take a look at the player. The Player ---------- Before playing a tune, a one-time routine is called which sets up the frequency tables; when the player reads a new note, it looks up the SID frequency settings in this table, along the lines of: ldy note,x lda freqlo,y sta $d400 lda freqhi,y sta $d401 Frequencies in different octaves are related by powers of 2. When you go up an octave, the frequency doubles -- for example, the frequency of C-5 is twice the frequency of C-4. The routine starts with a table of twelve frequency values for the highest octaves, and copies those values into the end of the frequency table (e.g. freqlo+95, freqlo+94, ..., freqlo+84). After copying, it divides each value by 2 and repeats the process -- now the frequencies correspond to the next lower octave -- and this is done until the frequency table is full. Piece of cake. Once the frequency table is set up, the main play routine is called with the tune number in .A. If the tune number is different than the current tune, the player simply resets the music data pointers and note durations, and falls through to the main player routine. And the main player routine simply decrements the duration, and when it becomes negative reads in the next data. Because the process -- decrement duration, read next data, store frequencies in SID -- is the same for each voice, a single routine is used for both voices one and two, using .X as the voice index. That is, instead of dec duration the code can use dec duration,x and instead of sta $d400 the code can use something like ldy SidOffset,x sta $d400,x where .X is 0 or 1. And that's about all there is to it -- the code is pretty straightforward. Compressing Music ----------------- Most music has the property that it doesn't jump around all over the place, but rather notes progress in relatively small intervals (seconds, thirds, fifths, etc.). For example, a major chord (as used in a typical arpeggio) is note note+4 note+7 note So the idea is to use a differential scheme, encoding the note _intervals_ instead of the note _values_. That is, the above could instead be encoded as x 4 3 -7 where each value is added to the current note value. Thus, if intervals are restricted to -8..7 then only four bits are required, cutting the data size in half. Well, not quite of course, because an escape code is needed to specify notes outside of the -8..7 range, not to mention durations and fx settings. Luckily, not all interval values are used, so these can be used for escape codes; code 96 -- the rest or "hold" note -- can also use an otherwise unused value. So the decompression code looks something like read next four bits if escape code then read next byte otherwise add to current note value Pretty simple, and leads to quite substantial savings. Ah, but for a little more time... Another possibility arises if not all note values are used. I set up the frequency tables to contain all 12 notes in eight octaves. If not all twelve notes are actually used -- if none of the tunes contain a D# or whatever -- then a few bytes can be shaved off of the frequency table. But this means a lot of renumbering and such... it all depends on just how much you want those extra bytes! And that's about all there is to it. Source code ----------- * * Tiny music player, for the * minigame contest. * * slj 9/23/01 * sjudd@ffd2.com * ; org $2800-491 org $2800-425 ARPDEL = 2 SR = $c9 GATEDEL = 2 curtune = $fe curdur = $fc tunepos = $b0 ;position in note sequence seqpos = $b2 ;position in seq list dur = $b4 ;2 bytes each notefx = $b6 curnote = $b8 newnote = $ba arpdur = $bc arpoff = $bd noteptr = $fa temp = $fa freqtab = $c000 * * Code begins here * start *------------------------------- do 0 jmp InitFreq jmp Play jmp Reset fin *------------------------------- * * InitFreq -- set up note table * InitFreq ldy #192 :l2 ldx #24 :loop lda freq-1,x sta freqtab-1,y dex dey lda freq-1,x sta freqtab-1,y lsr freq,x ror freq-1,x dey beq :xit dex bne :loop beq :l2 :xit rts * * Play routine * tune in .A * Reset ; Force a tune reset sec ror curtune ; ldx #255 ; stx curtune Play ldx #1 cmp curtune beq PlayTune sta curtune ldy #00 sty tunepos ; sty tunepos+1 sty dur sty dur+1 lda #15 sta $d418 lda #SR sta $d406 sta $d406+7 * * Main routine * * voice in .X (0, 1) * SetTune ldy curtune txa beq :v1 :v2 lda v2tunepos,y bne :sta :v1 lda v1tunepos,y :sta sta tunepos,x PlayTune Get lda #00 sta newnote,x dec dur,x bpl DoFx :next ldy tunepos,x inc tunepos,x lda tunes,y beq SetTune bpl :c1 and #$7f ;$80+dur sta curdur,x bne :next :c1 cmp #96 bcc :setfreq beq :hold sta notefx,x bne :next :note ;sta curnote,x ; sta newnote,x ; jsr setfreq :SetFreq asl tay lda Freqtab,y pha lda Freqtab+1,y ldy SidOffset,x sta $d401,y pla sta $d400,y :gate lda #$21 ldy SidOffset,x sta $d404,y :hold lda curdur,x sta dur,x DoFx ldy notefx,x cpy #97 bne AllDone ldy dur,x cpy #GATEDEL bcs :c1 lda #$20 ldy SidOffset,x sta $d404,y :c1 AllDone dex bpl PlayTune RTS rts * * Frequencies * freq da 34334 da 36377 da 38539 da 40831 da 43258 da 45831 da 48557 da 51444 da 54502 da 57743 da 61177 da 64815 ;arptab dfb 0,4,7 * * Tune positions (offset into sequence * list) * v1tunepos dfb t0v1-tunes dfb t1v1-tunes dfb t2v1-tunes v2tunepos dfb t0v2-tunes dfb t1v2-tunes dfb t2v2-tunes tunes ; dfb 00 ;dummy byte * * Sid offset table * SidOffset dfb 00 dfb 07 t0v1 dfb 97 dfb 10+$80-1 ;dur=10 dfb 42,47,51 dfb 42,47,51 dfb 42,47,47,47 dfb 51,51,51 dfb 54,52,51 dfb 52,51,49 dfb 51,51,51 dfb 49,49,49 dfb 51,49,47 dfb 49,47,46 dfb 44,44,44 dfb 46,46,46 dfb 47,51,47 dfb 46,42,46 dfb 47,96,96 dfb 96,96 dfb 00 t0v2 ;v2 dfb 97 dfb 10+$80-1 dfb 35,35,96 dfb 35,35,96 dfb 35,35,96 dfb 35,39,96 dfb 39,42,96 dfb 42,30,96 dfb 30,35,96 dfb 35,34,96 dfb 34,32,96 dfb 32,32,96 dfb 32,32,96 dfb 32,34,96 dfb 34,35,96 dfb 35,30,96 dfb 35,35,96 dfb 35,35,96 dfb 00 * Tune 2: Combat t1v1 dfb 100 ;No gate dfb 11+$80-1 ;dur=11 dfb 51,54,58,59,96 dfb 58,57,54,51,54,58,63,96 dfb 62,61,60,59,56,53 dfb 58,54,51,56,53 dfb 46,50,53,56 dfb 59,58,57,54 dfb 0 t1v2 ;v2 dfb 100 dfb 11+$80-1 dfb 15,27,15,39 dfb 15,27,15,39 dfb 15,27,15,39 dfb 15,27,15,39 dfb 47,44,41,51,46,42,47,44 dfb 22,34 dfb 22,34 dfb 22,34 dfb 22,34 dfb 0 * Tune 3: Castle t2v1 dfb 97 ;gate dfb 24+$80-1 dfb 70,96,72 dfb 6+$80-1 dfb 96,96,70,72 dfb 12+$80-1 dfb 74,74,96,75 dfb 24+$80-1 dfb 77,75,74,96,72 dfb 6+$80-1 dfb 96,96,70,72 dfb 24+$80-1 dfb 70,96,96,96 dfb 00 t2v2 dfb 97 dfb 24+$80-1 dfb 34,41,46,41 dfb 34,41,46,41 dfb 58,62,53,57 dfb 58,53,46,96 dfb 0 end ....... .... .. . C=H 20 =====================================Part 3=================================== MagerTris -- by Per Olofsson Way back in 1990 or thereabouts I wrote a Tetris clone in basic on the Amiga. When the compo was announced I scratched my head for a bit and figured that I should have a good chance of fitting Tetris in 512 bytes. If I couldn't I would at least have a good engine for the 2K compo. The Game This puzzle game was invented in the 80s by the Russian programmer Alexey Pazhitnov. The game has a vertical field where one of seven puzzle pieces appears at the top and falls towards the bottom. The player can rotate the piece 90 degrees or move it left or right. When it reaches the bottom the game field is checked for filled lines that are removed. The player is awarded points for removing lines (with a bonus for clearing more than one in one go) and the game is over when the screen fills up so that new pieces can't enter the game field. Drawing the Screen The screen is filled with inverted space and the game pieces and the border is drawn into the color ram. There aren't any real tricks used here, the only one is that the color of the border is a side effect of the last character printed, ascii 13 (light green). Random Numbers We need a decent source of random numbers to generate a random Tetris piece. The SID chip's noise waveform is probably the best one available on the C64, and fortunately the code to access it is really short. To initialize I used: initrnd subroutine lda #$81 ; enable noise sta $d412 ; voice 3 control register sta $d40f ; $81xx as the frequency and to get an 8-bit random number all you have to do is lda $d41b. The Tetris Pieces Tetris has 7 puzzle pieces that you have to store in a table. As every piece consists of four boxes we need a total of 7*4 = 28 entries. The smallest tetris table is probably the one where each piece is represented by 8 bits, like this: 0010 0000 0011 0110 0100 0010 0110 0111 1111 0110 0011 0111 1110 0110 The table is very compact, a mere 7 bytes, but then you need code to rotate every piece and accessing the table is somewhat clunky on the 6502. I've seen an x86 version that did this though, with a total size of less then 256 bytes. However, on the 6502 I needed four bytes for each piece for the code to be reasonably compact and I also kept all the rotated pieces in the table, for a total of 112 bytes. Fortunately, as every piece has a center box we only need to store 3 which trims the table to 84 bytes. But what do we actually store? As the screen is painted to color ram I used a pointer to the current position. In the table I simply stored the offset in color ram from the center box. With indirect indexed addressing the routine is nice and short. Failure The rest of the program is fairly straight forward. User input is just a cmp loop, there are dec zp timers for most events, and a raster wait to time everything. This is also where I failed to fit everything inside the 512 byte limit, as everything put together took about 600 bytes, even after some serious optimizations. There are two very similar routines: the one that draws a piece on the screen and the one that checks for collisions. Both iterate through the four boxes in a piece, but because of the way I used some zeropage pointers I couldn't merge the two routines into one without rewriting most of the code. Success As rewriting everything meant too much work, I just decided to go for plan B: write a 2K entry. With a 600 byte engine there was plenty of space to add features, and at around 700 bytes or so compressíon starts to make sense -- pucrunch breaks even around there somewhere, and the final binary is actually 6500 bytes uncompressed. I could easily fit a title screen complete with custom graphics, a nice tile set for the tetris pieces, and even a hiscore list that saves to disk. The only trick I used here was that the custom character set was EOR:d with the ROM font, and as most characters are the same or very similar it compresses much better that way. And that's the story of the 2K MagerTris. ....... .... .. . C=H 20 =====================================Part 4=================================== Tuff -- Compressing tiny programs -----------------------------------> Stephen L. Judd Most compression schemes address the problem of taking a large program or data file and compressing it down. But have you given much thought to the problem of trying to compress a _small_ program, even a 512-byte program? I spent some time trying to come up with a compression scheme for tiny programs -- saving even 2 or 3 bytes in tetrattack would have been helpful. The effort was basically a failure, as the best I thought I could do was to break even (on paper, for tetrattack). I never actually implemented the code either -- just worked it out on paper. It is fairly interesting though, and maybe these preliminary efforts will give other people ideas (especially for compressing 2k programs). And after writing this article, I now think it might have really worked, and saved a handful of bytes -- maybe next year? Tiny Compression Basics ----------------------- If you remember your C=Hacking #16, there are two basic approaches to compression: taking a fixed-length input while giving a variable-length output, or taking a variable-length input and producing a fixed-length output. An example of the latter is LZ compression -- looking for repeated strings of length 2, 3, etc. and replacing those with a single byte (or two bytes, or whatever). An example of the former is a Huffman tree, which replaces each byte with a string of bits, using fewer bits for the most common bytes. To decompress the compressed data, of course, a decompression routine is needed. For tiny compression, this routine obviously needs to be as tiny as possible. Most C64 decompression programs copy themselves somewhere before decompressing the data stream to wherever it is supposed to go. Obviously, if we put a few restrictions on the data to be decompressed the decompression routine can be made smaller, and more specific to the task. Finally, it is worth mentioning that a tiny program in general does not use all possible byte values. Now, first consider Huffman encoding. In a typical Huffman decoding scheme, the decoder reads a bit at a time, traversing the Huffman tree with each bit, until a byte is hit. This implies a fairly complicated decoder, and worse the tree must be stored. Finally, the tree can be fairly large, if the number of possible symbols is large. So Huffman doesn't seem like a good approach. So, consider LZ-style encoding: replacing n-byte sequences with a shorter code. In LZ, the n-byte code is replaced with a reference to the previously decompressed bytes -- a code which says how far back to go, and how many bytes to copy. What a drag -- an escape code, the distance backwards, and the number of bytes to copy. For a byte-aligned decompressor, this implies at least two bytes (since the decompression routine must be small, a byte- aligned decompressor is of greater interest). So far so bad. But, since a tiny program doesn't use all possible byte values, perhaps a simple substitution method is possible -- that is, replacing various 2-, 3-, 4- etc. byte sequences with one of those bytes. Alas, the numbers don't work out very well: in the above scheme, let's say an n-byte sequence appears m times, for a total of n*m bytes. We replace each of those sequences with a single byte, giving a savings of n*m - m bytes. But we also need to store the original sequence in a table somewhere, plus the byte corresponding to that sequence. This means another n+1 bytes, so the total savings is n*m - (m+n+1) So, for example, if we have a 2-byte sequence repeated 3 times, the savings is 2*3 - (2+3+1) = 0 bytes! The issue is that replacing a 2-byte sequence with a 1-byte sequence saves one byte, and this must happen at least three times to overcome the three bytes of table storage. With 4 repetitions, one byte is saved. And so on. The net result is that you have to have a _lot_ of sequences repeated a lot of times to get any savings, and somehow this savings needs to be greater than the decompression code. Note that, in general, if there are only a few sequences to replace then some custom code can be shorter, but as the number of sequences increases then generic code, with sequences stored in tables, quickly becomes shorter. The final dagger in the heart of these schemes is the program itself. I wrote a simple program to find all the repeated strings in tetrattack. There were numerous 2-byte sequences, but rarely repeated more than three times. There were a few 3-byte sequences, and even one 4-byte sequence. But overall, there just aren't enough sequences, especially long sequences, to make any sequence-shortening scheme worthwhile. There are, however, lots of repeated bytes. For example, there are an awful lot of STA instructions and zp variables, and usually a lot of zeros. This suggests taking another look at a Huffman-style scheme to replace those bytes with a smaller number of bits. One major problem identified earlier is storing the Huffman tree -- it's just too big. So, the first thing to consider is to just compress the most common bytes -- maybe the top five or eight bytes or whatever -- to make the tree smaller. But that still leaves a substantial decompression code to traverse the tree, and a tree with several empty nodes in it. Once again, if there are only a few nodes then custom code might be shorter, but beyond a few nodes it's more practical to store the tree and have some generic traversal code. But consider the following alternative to a Huffman tree: what if we had a tree which looked like 0/\1 / \ lit 0/\1 / \ b1 0/\1 / \ b2 ... where "lit" is a literal byte and b1, b2, etc. are encoded bytes. This would encode the most common byte with two bits (10), the next-most common byte with three bits (110), and so on, and encode literal bytes with nine bits (0byte). To see if this is a viable scheme, check out the numbers. Let's say that we encoded just the top three most common bytes in a program; in an early version of tetratack those bytes were 00 $2F occurances 03 $1A occurances (ZP variable) 8D $18 occurances (STA opcode) In a real implementation, I'd choose zp variables to be common opcodes like $8D, which would double the number of occurances of that byte, but let's just consider the above numbers in a 512-byte program. Those top three bytes total $5A bytes, almost 1/8 of the total bytes (the top seven bytes take a little less than 1/3 of the total, by contrast). The number of bits required is 9*($200-$5A) + 2*$2F + 3*$1A + 4*$18 = $FE2 bits = $1FD bytes. so, a whopping 3 byte savings -- just enough to store the three "tree" values. For the top seven bytes, a similar calculation suggests $01D8 bytes of compressed data, again for code not optimized for compression. For a more optimized code, with zp variables the same as common opcodes and such, the total bytes could conceivably be on the order of $1C0 bytes -- maybe 64 bytes of savings (but again, I never actually tried coding a more optimized version so I don't know for sure). Of course, we haven't talked about decompression yet. One important feature of this tree is that the "treeness" of it is unimportant -- it's really just a substitution code. Remember that the data is encoded as 0+8 bits literal byte 10 byte 1 110 byte 2 1110 byte 3 and so on. A decompression algorithm simply reads bits until a 0 is found, and then looks up the relevant value in a table. The exception is if the first bit read is 0, in which case the code needs to read the next eight bits and output the byte. There is no tree traversal, in the sense of looking up nodes and moving left/right, to speak of. Reading bits is trickier than reading bytes. After every eight bits read some sort of memory pointer has to be incremented to the next eight bits. Moreover, when reading codes like 110 or 1110 the number of 1's read has to be counted too, to be used as an index into the lookup table. And if there's a literal byte then the next eight bits need to be read, regardless of their value. When reading the bitstream, after every eight bits the pointer into that bitstream must be incremented. To count eight "global" bits at a time, I use a zp variable "count" and just rotate a bit through the different bit positions. Every time it turns zero, the data stream pointer is incremented. The advantage of this method is that it frees up a register (and winds up saving a byte overall). To count the number of "local" bits read, I use the .Y register. Initially it is set to zero, and increments as 1s are read. For a literal byte, I set .Y to -8, and put a little check in the bit reader for negative .Y; that way, it simply increments to .Y=0. When bit=0 is finally found, the code simply does an lda data,y to get the coded data. Since .Y counts the number of bits read, .Y will be at least 1 for any coded data, but .Y=0 for a literal byte as described above. Therefore, for a literal byte, the code can simply rotate bits into the location "data", and use the same lda data,y instruction to fetch the literal byte. With all that in mind, here's the code I finally came up with (on paper!): Tuff+1 [data lookup table] Source [compressed data bitstream] :lit ldy #-8 :getbit lsr count ;Increment pointer every eight bits bne :skip ror count inx bne :skip inc :src+2 :skip :src asl Source,x ;Get next bit bcs :next tya ;.y=0 means first bit is 0 beq :lit ;so read next 8 bits bpl :found ;If .y<0 then literal :next rol Tuff ;Save bit iny bne :getbit ;until .y=0 :found lda Tuff,y ;Fetch code/literal byte ldy #00 sta (zp),y ;Output data inc zp bne :getbit inc zp+1 bpl :getbit [Uncompressed code -- (zp) points here!] for a total of 45 bytes, plus 7 bytes for the "Tuff" byte table -- total of 52 bytes, compared with a possible 64 byte savings (maybe I should have tried to implement this after all!). Hopefully someone can improve on this further. (And "Tuff", btw, is a contraction of "TinyHuff"). Now, there may be a few raised eyebrows here which I hope I'll lower. This code makes several assumptions. It can either be entered in at ":src", or at ":getbit" depending on the value of "count", the global bit counter. I assume count is already initialized (e.g. default kernal/restart value), either to 1 (in which case :getbit is the entry point) or $80, in which case :src is the entry point, with the "Source" stream modified accordingly in each case. I further assume that .axy have their default values of 0 when the routine is called from BASIC. I also assume "zp" is initialized to the correct value already. One easy way to do this is to use $ae/$af, the end load address+1, but other options are certainly available. Finally, you may have noticed that the thing just keeps outputting data until zp=$8000 -- correct. Unless you really really care about garbage in memory above the program, there's no reason to waste bytes on an "end of data" check. Another variant --------------- I spent a little time looking at a variant of this scheme: using a fixed length bit sequence, for example, assigning four bits to each substitution byte, i.e. 0+ literal byte 1000 byte 1 1001 byte 2 1010 byte 3 ... 1111 byte 8 The numbers work out pretty well, compression-wise, but at the time I felt the decompression code was more complicated and that it wouldn't break even. Looking at it now, though, I'm starting to think otherwise -- that loading three bits at a time isn't any more complicated that fetching eight bits so code can be reused, and there isn't any issue about checking for a 1 bit, so the decompression code ought to be even simpler. So this may actually be the better scheme. Sigh... I don't know about you, but every time I revisit a problem I wind up feeling much stupider. Final thoughts -------------- So, that's it: assign shorter bit codes to the most common bytes, and use a highly optimized decompression routine. This method is very specialized, obviously, but might, just might, make it possible to save a few bytes on a tiny program that has been optimized for compression -- using zp locations the same value as common opcodes, reusing vars as often as possible, and so forth. The reason this works is that, in general, repeated long strings in a small program just don't happen, whereas repeated bytes are common and make up a substantial fraction of the total. Pelle (magervalp@cling.gu.se) has suggested the possibility using the first eight bytes of the compressed program for the 'Tuffman' table -- to structure the code to make these common bytes, and thereby save eight bytes by not having to store the 'tree'. Might save a few more bytes with some programs. Finally, while writing this article, I started to wonder about byte sequences with "holes", for example, for things like STA zp1 STA zp2. I wonder if there's some way of taking advantage of these repeated "mask" patterns. But I leave that thought for another person, for another day. ....... .... .. . C=H 20 =====================================Part 5=================================== Tinyrinth by Mark Seelye (aka Burning Horizon) mseelye@yahoo.com http://www,burninghorizon.com Introduction When the mini-game competition was first announced I was originally not interested because of time considerations -- I spend most of my time at work, or watching the kids. But once I saw that nearly everyone on EFNet #c-64 was doing one, I thought it would be fun if I could just keep it simple. I had no idea what game to do. I had a few other ideas but I settled on a maze game because I thought it'd be an interesting challenge doing a maze algorithm and have game code in under 512 bytes. As it turned out, I ended up getting the code down to like 475 bytes, but the version I turned in was 509 bytes. (I ended up shaving an additional 34 bytes just before the due date.) The other reason for the maze game is I have this weird thing I do where in each language I know I code this little maze generation algorithm that I made a long time ago. I had used the algorithm in ASP, VB, C, and a few others but for some reason I had never done it in 6502! Imagine that! It was going to be a bit of a challenge because I had never done it in so few instructions, and never in ASM. It would have to lose some weight and fluff - but I knew I could do it. So obviously there are MANY better ways to generate a maze, but in light of trying to make it small this routine ended up being such a strange mutation of my idea that I'm surprised it worked at all. Setup Before I do anything I must setup some things. This is also used when a player dies OR enters a new level. The only difference between a reset of the game and moving to the next level is the ldx #$00: sta $53. The $53 is also used to set the color of the maze so it rotates from a level one of white, to a level 16 of BLACK (which is evil because then you can't see the maze except for the small hint of the corners on a turn.) ; Initialization setup = * gameover = * ;Setup game variables ldx #$00 stx $53 ; Start at level 1 (to be inc'd) init = * ;Setup level inc $53 ; next level ldx $53 stx $54 ;Counter for drawing Keys (next level) stx $0286 ;Character Color to use on clear (e544) ;Set Render Cursor Start Pos / Player Color lda #$05 sta EBCM1 ;Set ebcm color PLAYER to GREEN ($d022) sta $45 ; Cursor/Player Position X (0-39) sta $46 ; Cursor/Player Position Y (0-24) ;Clear Screen jsr $E544 ;clear screen set to char color ($0286) lda #$5b sta $d011 ;turn on EBCM lda #$18 sta $d018 ;Activate Cset As you can see that I mention extended color background mode, I'll talk about that later. You also might notice I use a character set, but that I will talk about a little later as well, first I want to jump right into the maze generation. Maze Algorithm The easiest way to describe how the algorithm works is to compare it to a worm. The worm eats through an area until it gets blocked by one of its former meals. While eating the worm is counting how many meals he has, once he gets to a certain number he knows he can eat no more. With that said, I'm sure you're completely confused and probably think (know?) I'm crazy. So now for the technical explanation. Starting with a 2-dimensional grid of some size, initialized with 0's, set a start point for the formation of the maze within the grid somewhere. After that has been taken care of the rest follows simple logic: Have we reached every part of the grid? if so: done; if not: Can we grow the maze into an adjacent area from the current position? if so: Grow in one of those directions at random, count it, continue; if not: Find a place in the maze that can grow, continue I had to change the order of the logic around to get it to better fit into a smaller number of bytes. So I moved the "find a place to grow from" routine, and the "Have we reached every part of the grid" check after the check if we can grow in any direction from the current position. To check in all directions from the current position I used a subroutine. The subroutine has 5 entry points which translate to: cangrow up/right/down/ left, and the 5th entry point is actually the first entry point - it checks to see what is in .a and checks the direction that value represents. To check the direction I use a kernal routine to set some zp so I can easily check the value on the screen to the left, right, below, or above the current position. This is also where the boundaries are set: 40 columns wide and 25 characters high. popgridloop = * ldy $45 ;xpos ldx $46 ;ypos cangrowxy = * ;Can grow in any direction? jsr cgup ;check up beq _cgxy ;if 0 then we can grow inx ;offset up check jsr cgright ;check right beq _cgxy ;if 0 then we can grow dey ;offset right check jsr cgdown ;check down beq _cgxy ;if 0 then we can grow dex ;offset down check jsr cgleft ;check left _cgxy beq growloop ;if 0 then we can grow ....The subroutine!... There is a weird "injection" in the bottom of this routine to save a couple bytes.. it is explained in the comments to the code. ;Check if a cell can grow a direction ;1-up 2-right 3-down 4-left ; (y xpos, x ypos, a=dir) x/y switched for indirect lda($xx),y below ; return: a == 0 : true (can move) ; a <> 0 : false (cannot move) cangrow = * cmp #$01 beq cgup cmp #$02 beq cgright cmp #$03 beq cgdown ;cmp #$04 ;beq cgleft *** not needed falling through cgleft = * dey ;set xpos - 1 cpy #$ff ;check xpos beq cgno bne cgloadcell cgright = * iny ;set xpos + 1 cpy #$28 ;check xpos (<40) beq cgno bne cgloadcell cgup = * dex ;set ypos - 1 cpx #$ff ;check xpos beq cgno bne cgloadcell cgdown = * inx ;set ypos + 1 cpx #$19 ;check ypos (<25) beq cgno ;*** fallthrough, bne cgloadcell not needed cgloadcell = * lda #$1f loadcell = * ;x = ypos, y = xpos, a = and value sta $50 jsr $e9f0 ; * sets $d1 $d2 lda ($d1),y ;load byte (x pos in y reg!) cgno = * and $50 ;#$1f = use only low 5 bits! ;rts see below! ; This is mixed with the rts because the first byte would ; be wasted growfrom = * rts .byte 1 ; this is part growfrom part growto 48 is used for both ; again first byte would be wasted so we overlap with the previous growinto = * .byte 2, 4, 8, 1, 2 ;explanation of above ; 0 1 2 4 8 ; 0 4 8 1 2 ;rts 1 2 4 8 ; 4 8 1 2 *From Mapping: ;59888 $E9F0 ;Set Pointer to Screen Address of Start of Line ;This subroutine puts the address of the first byte of the ;screen line designated by the .X register into locations ;209-210 ($D1-$D2). Well, if the code at "cangrowxy" discovers that it cannot grow in any direction from the current position, it has to find a place to grow from. To do this it falls into the "findgrow" routine; the findgrow routine is interesting because it is a "stateful" routine, meaning that upon reentry it resumes what it was doing before. The reason for this is I didn't want it restarting at the top of the maze each time it entered this find routine - I wanted it to continue to search from where it left off. This routine works by "walking" the current x and y positions through the grid. To walk it through it keeps track of where it left off and it just keeps setting the x and y pointers to the next place and "returns" to the start of the generation portion of the code. The other advantage to doing it this way is I can reset this routine to place a "key" at a dead end in the maze. This reset is done elsewhere in the maze by setting a zp value to 0. When the reset is done the findgrow routine runs a small piece of code to reset itself, but before it does this it places a key if it has not exceeded the max number of keys. Here is the find grow routine, each line of code is documented to hint at what it is doing: ; *** fall into findgrow findgrow = * lda $4b ; Check byte 0 != resume findgrow bne _fgresume sta $49 ;Reset Findgrow Xpos sta $4a ;Reset Findgrow Ypos inc $4b ;Set findgrow flag to resume (<>0) ;Place keys in corners (injected here for ease of placement, d1/d2 is ;pointed at a dead end) iny ; offset left check beq _fgresume ;Do not try when column is 0, it freaks out lda $54 beq _fgresume ;if 0 then keys are done dec $54 ;dec # of keys left to place inc $51 ;actual num keys left lda ($d1),y ;load byte ora #$c0 ;EBCM value for key! sta ($d1),y ;store new value ;(end of injection) _fgresume = * _fgx ldx $4a ;Findgrow ypos _fgy ldy $49 ;Findgrow xpos inc $49 ;Next xpos (next round) cpy #$28 ; < 40 beq _fgcr ; next line if >= 40 jsr cgloadcell ; load cell byte beq _fgy ; if 0 then get next xpos/byte sty $45 ;Set Current xpos stx $46 ;Set Current ypos jmp cangrowxy ;Check if this can grow _fgcr lda #$00 ;Reset Findgrow xpos sta $49 ; 0->xpos inc $4a ;Next Findgrow ypos lda $4a cmp #$19 ;check ypos (<25) bne _fgx ;If we're at x40y25 we are ready to play! beq gameinit ;Start game logic That code also checks to see if we have completed generating the maze, at which point it enters the game logic. But before we jump into that I have to explain what happens when the "cangrowxy" routine discovers that it CAN GROW, instead of falling into the "findgrow" routine. In the "cangrowxy" routine described a bit above there was the final line of: _cgxy beq growloop ;if 0 then we can grow This "growloop" routine is the place where the actual "growing" or "eating" occurs. This routine works by choosing a direction at random, and attempting to grow in that direction. It reuses the same "cgup" etc. routines that the "cangrowxy" uses because although we know we can grow in a direction, we don't know which direction. In a larger version of this algorithm I usually link all of this stuff together, but this is how it worked out in tinyrinth. growloop = * randdir = * ;jsr getrand; not a func, not reused yet getrand = * lda #$80 sta $028a ; Key Repeat; (injected here for #$80) just using #$80 for ;smaller code sta $d412 ;sta $d404 d412 is V3, d404 is V1!! sta $d40f ;set v3 random # gen bits lda $d41b ; read random number from V3 and #$03 ; Force Random number to 0-3 clc adc #$01 ; Add 1 to get 1-4 sta $4c ; store rand direction ldy $45 ; Current Xpos ldx $46 ; Current Ypos jsr cangrow ; Check if we can grow in that direction bne randdir ; if <> 0 then Try again sta $4b ; reset findgrow flag (injected here for .a==0) grow = * ldx $4c ;Get saved rand direction lda growinto,x ; 1-4 (4, 8, 1, 2) Get bit set for new cell sta ($d1),y ; write new location lda growfrom,x ; 1-4 (1, 2, 4, 8) Get bit to set for old sta $4d ; Save growfrom bit ldy $45 ; Reload Current xpos ldx $46 ; Reload Current ypos jsr cgloadcell ; Load base cell again ora $4d ; Combine with growfrom bit sta ($d1),y ;Modify old cell to connect to new loc ;Change current position lda $4c ; Get saved rand direction jsr cangrow ; Get new x y again - (this will only perform next x/y ;adj, returns <>0) sty $45 ; xpos set to new location stx $46 ; ypos jmp popgridloop ; Return to populate grid loop As you can see above I use the random number generator from V3 to get the random direction. A HUGE bonus here is I can use the value returned to directly call that main entry point in "cangrow" so it will figure out which direction to check by itself. If the direction it checks is bad it just loops and tries again. In a larger version of the program I'd make this much more efficient, but efficiency costs bytes. Many coders would be afraid of using a loop like this but since I pre-checked that I can grow before entering this loop, unless my random number generator never returns that direction, I should be fine! Once we have a good direction I use the already set $d1 zp to read and update the screen data. Nice bonus of using the routine that sets it! The code that updates the data must update the cell you are growing to, but ALSO it must update the cell you are growing from, which is why I have to reload the original data up there. When it is finished doing the growing it moves the x/y positions ($45, $46) to the place it grew to; and resumes the maze generation from there. As I said before, the findgrow routine will jump into the game logic once it knows it is finished, but before we talk about that I should describe another problem that I needed to solve. In a 2 dimensional maze there are 16 possible pieces that can be formed. Starting with completely closed ending with completely open. I represent these pieces with numbers 0-15; each bit represents a side of a cell that is open/closed. So when I finish with the maze generation I have a grid of numbers 0-15. (Actually 1-14; in this game I never end up with 0's or 15's as all pieces are used.) In Tinyrinth the "grid" I used was the screen memory (40x25), so I ended up with a screen full of characters A-N. Navigating a maze of characters A-N would be rather difficult so I had to decide whether to use the built in character set and transform what was on the screen or somehow use a character set. Well, another hitch was I wanted this to be a game and losing the simplicity of 0-15 for detecting which way a player could move would be bad - so I opted for a character set of 16 characters. Hmmm 8 bytes per character time 16 characters.. 128 bytes! Yikes! Character Set Generation So I was going to need a 128 byte character set, but I didn't want to have to lose 128 bytes. So what did I do? I coded about 70 bytes of code to generate the character set. This too pained me because I was sure I could make it smaller somehow, but once I got it working and a decent percentage smaller than the size of what it produced I stopped worrying about it. Here is how it works: we know we want to represent the 16 pieces with 0-15 so the bits must give some hint to the shape of the piece. What I did was create a loop to check the current counter value and either set or skip the top, sides, bottom of the piece. Here is how it works, I have commented each line of code to describe what it is doing. ; Generate Cset! lda #$20 ; write hi sta $48 ; use zp lda #$00 ; write lo ;Initialize Screen, variables (injected here to save bytes - using ;lda #$00) sta $51 ; Clear actual num keys placed counter (see findgrow) sta $d020 sta EBCM0; Set BG Color ($d021) ;(end of injection) tax ; counter = 0 _again sta $47 ; use zp ldy #$00 ; index txa ; counter to a and #$01 ; check for top beq _ytop ; yes top eor #%01111111 ; 00000001 -> 011111110 -> 10000001 _ytop eor #%11111111 ; 00000000 -> 111111111 _6sides sta ($47),y ; store top/sides to cset iny ; next mem location txa ; counter to a and #$02 ; check for right eor #%00000010 ; flip lsr ; 00000010 -> 00000001 || 0->0 sta $49 ; store for right side txa ; counter to a and #$08 ; check for left side bne _noleft ; no left eor #%10001000 ; 00000000 -> 10001000 -> 10000000 _noleft eor #%00001000 ; 00001000 -> 00000000 ora $49 ; merge with right cpy #$07 ; 7->15->23->... bne _6sides ; total of 6 side pieces txa ; counter to a and #$04 ; check for bottom beq _ybot ; no bottom eor #%01111010 ; 00000100 -> 01111110 -> 10000001 _ybot eor #%11111111 ; 00000000 -> 11111111 sta ($47),y ; store bottom to cset inx ; next counter clc ; clear carry lda $47 ; inc zp adc #$08 ; by 8 bne _again ; do it again inc $48 ldy $48 cpy #$28 ;repeat through cset 2000-2800 bne _again sty EBCM2 ;Set Death color ($d023) (using result of cset gen for ;color value!) I still think I could have made that smaller; either way though it was fun making a little routine to generate the cest. With that all said, we now know how the maze is generated. But now I needed a way to play it, as suggesting players use dry erase markers on their screens was not a great idea. Game Loop Ok, once "findgrow" reaches the end of its maximum size it branches to "gameinit". Game init sets up some quick things and gets going on the game play. The game loop loops until the player either gets all the keys or dies. When the player gets all the keys the level number ($53) is increased which is used to increase the maximum number of keys and the color of the maze! Here is the game loop, I have commented much of the code and separated it into sections: *** Flashes the keys, sets speed of "minitaur" based on the level ; Game Initialization and Game Loop gameinit = * gameloop=* inc EBCM3 ; Flash Keys ($d024) inc $58 ; Increase Speed counter #1 (0-255) bne moveplayer ; Skip move inc $57 ; Increase Speed counter #2 ($57|#$f0 - 255) bne moveplayer ; Skip Move ;set death speed lda $53 ;Use level for Speed value cmp #%11111000 ;If more than this use default speed bmi _dsp lda #%11111000 ;Default speed _dsp ora #%11110000 ;Set high nibble so counter counts up to 255 sta $57 ; Set Speed counter #2 *** Moves the minitaur to the next position if it is time. (Checks for player *** hits) ;move death movedeath = * ldy $55 ;Baddy Xpos ldx $56 ;Baddy Ypos jsr cgloadcell ; load the cell/point the zps (ANDs by #$1f) sta ($d1),y ;store cleared value _newy iny ;increase xpos cpy #$28 ;less than 40? bmi _go ;don't reset ldx $46 ;ypos of player stx $56 ;ypos of death ldy #$00 ;clear xpos counter _go sty $55 ;Set baddy xpos lda #$ff ;Get all bits! (see loadcell) jsr loadcell ;load the cell/point the zps sta $59 ;Save cell value (withh all possible bits) and #%11000000 ;and by EBCM bits cmp #