Sunday, August 31, 2014

VCSMC - Design and Philosophy

This is the third article of my series of postings on vcsmc. You can find the first and second ones by following the links.

When approaching the idea of a media player for the Atari 2600 I knew that I would want to create a custom cartridge that could run on an unmodified console. The homebrew community for Atari has done this before; see the Harmony Flash Cartridge for example. This cart allows for running a variety of 2600 ROMs off of an SD memory card. There's an ARM chip inside of the Harmony cart which provides an on-console UI program as well as handling the loading of the selected ROM for play.

My original vision for the custom cartridge was that it could be a Raspberry Pi or other Linux SoC board that could process an input video signal in real time and output 6502 bytecode at the Atari's CPU clock speed of roughly 1.19 MHz for rendering by the Atari. However, as I've dug into the challenge of getting the Atari to render video with any kind of recognizable image fidelity, I've come to understand that this may not be tractable, or at minimum that I should come up with some intermediate goals that are more in the realm of the achievable.

My current thinking and designs are centered around software that generates 6502 bytecode streams to render individual frames of video at 60 Hz with no constraints on overall program size. In order to maximize render quality, every available CPU clock cycle should be used to manipulate the TIA state machine to create output. This creates very large programs. With load/store combinations at 4 bytes each and 15 per scanline, 180 vertical scanlines, plus overhead and audio data during the vertical blank, a reasonable upper bound is 16 KB per frame, which is four times the original maximum size (without bank switching) of an entire 2600 program!
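As a rough check on that budget, here is a minimal sketch of the arithmetic. The constant and function names are made up for illustration; the 76-cycle scanline and 5-cycle load/store pair are the figures from the palette and screen dimensions post, and the 2 + 3 cycle split assumes an immediate load followed by a zero-page store.

```python
# Rough frame-size budget using the figures quoted in these posts.
# All names here are hypothetical, chosen only for this illustration.
CYCLES_PER_SCANLINE = 76   # CPU cycles per scanline
CYCLES_PER_PAIR = 5        # LDA #imm (2 cycles) + STA zero page (3 cycles)
BYTES_PER_PAIR = 4         # 2 bytes for each of the two instructions
VISIBLE_SCANLINES = 180    # picture height chosen for vcsmc

FRAME_UPPER_BOUND = 16 * 1024   # the 16 KB bound quoted above
ORIGINAL_ROM_LIMIT = 4 * 1024   # 4 KB, no bank switching


def visible_bytes_per_frame() -> int:
    """Bytes of load/store pairs needed to fill every visible scanline."""
    pairs_per_line = CYCLES_PER_SCANLINE // CYCLES_PER_PAIR  # 15 pairs
    return pairs_per_line * BYTES_PER_PAIR * VISIBLE_SCANLINES


print(visible_bytes_per_frame())                      # 10800
print(FRAME_UPPER_BOUND - visible_bytes_per_frame())  # 5584 left for overhead and audio
print(FRAME_UPPER_BOUND // ORIGINAL_ROM_LIMIT)        # 4
```

The visible picture alone comes to about 10.5 KB, so the 16 KB figure leaves a few kilobytes of headroom for vertical blank code and audio.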

At 60 frames per second the overall rate of data to the Atari will be a bit under 1 MBps, or 8 Mbps, which is right around the standard quality video bitrate YouTube recommends for uploading 1080p video. For now, I am focused on writing offline programs that generate these rather large blobs of bytecode (an 8-minute video will be about 480 MB), and then on creating hardware that can read the bytecode blobs and drip-feed them to the Atari frame by frame.
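The data rate figures can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes every frame uses the full 16 KB upper bound (taken as 16 × 1024 bytes), and the function names are invented for the example.

```python
# Streaming-rate estimate, assuming every frame hits the 16 KB upper bound.
FRAME_BYTES = 16 * 1024      # assumed worst-case frame size
FRAMES_PER_SECOND = 60


def stream_bytes_per_second(frame_bytes: int = FRAME_BYTES,
                            fps: int = FRAMES_PER_SECOND) -> int:
    """Bytes that must reach the console each second."""
    return frame_bytes * fps


def video_bytes(duration_seconds: int) -> int:
    """Total blob size for a video of the given length."""
    return stream_bytes_per_second() * duration_seconds


rate = stream_bytes_per_second()
print(rate)                        # 983040 bytes per second
print(rate * 8 / 1e6)              # 7.86432 megabits per second
print(video_bytes(8 * 60) / 1e6)   # 471.8592 megabytes for 8 minutes
```

With these assumptions an 8-minute video lands a little under the round 480 MB figure, which corresponds to rounding the rate up to 1 MB per second.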

The two programs in green, aufit and picc, are the offline processing components I am writing. My thinking for these programs is situated somewhere between a compiler/linker and a video compression tool. Furthermore, I have found it useful at times to model this software along the lines of lossy data compression. Although the data rates are not enviable in terms of a modern video compression algorithm, the video is obviously going through a substantial loss of quality as it is made suitable for rendering on the Atari - to wit, 24-bit color down to a 7-bit fixed palette, and 16-bit 48 kHz stereo audio down to 15 or 32 kHz 4-bit. As I will outline in subsequent postings, it has also become obvious that perceptual coding is very important here, as in all lossy schemes, in order to retain any feeling of image quality.
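To give a feel for how much is thrown away on the audio side, here is a deliberately naive sketch that reduces one signed 16-bit sample to a 4-bit level by truncation. This is not how aufit works (its approach is not described here); the function name is hypothetical and resampling and stereo mixdown are ignored entirely.

```python
def quantize_sample(sample_16bit: int) -> int:
    """Map a signed 16-bit PCM sample (-32768..32767) to a 4-bit level (0..15).

    Naive truncation for illustration only: shift the sample into the
    unsigned range, then keep the top 4 of its 16 bits.
    """
    if not -32768 <= sample_16bit <= 32767:
        raise ValueError("sample out of 16-bit range")
    return (sample_16bit + 32768) >> 12


print(quantize_sample(-32768))  # 0
print(quantize_sample(0))       # 8
print(quantize_sample(32767))   # 15
```

Each output level stands in for 4096 distinct input values, which is why a perceptually informed approach matters more than a straight truncation like this one.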

Notice that the overall flow has changed a bit from the written description in my first posting. This is because as I have implemented parts of picc I've come up against the challenge of scheduling state changes on the 6502. In an effort to squeeze every drop of potential image fidelity out of the 2600 I have been working on algorithms to try to reuse registers and allow for scheduling of state transitions earlier than they might be needed, so long as it doesn't impact rendering on the previous scanline. This has turned into quite the rabbit hole, and I'll write more about it in a subsequent posting. Suffice it to say that what values need to be set in the TIA audio hardware, and when, has become an important consideration for picc scheduling.

Sunday, July 13, 2014

VCSMC - Palette and Screen Dimensions

This is the second in a series of posts describing the video player software I'm putting together for the Atari 2600, called vcsmc. You can read the first post here.

There's no graphics buffer on an Atari 2600. That is, there's no area within the memory of this device where a programmer can set pixel colors as they are computed, for display on the next scan out of the screen. Rather, the 6502 CPU marches in lock step with the CRT beam as it scans out each individual row of pixels on the screen, while controlling a simple state machine, called the Television Interface Adaptor or TIA, that ultimately modulates the output beam to the desired color. Timing is very tight, with 76 CPU clock cycles per scan line. A load to register/store to state machine address combination takes 5 clock cycles, and so the programmer is challenged to create interesting graphics by strategic choice of state changes to the TIA during scan out of the individual lines of the image. The ability to express complex imagery on the VCS output screen takes a level of programming mastery and creativity that is part of what makes Atari programming so much fun.
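To make the timing concrete, here is a small sketch that encodes a list of TIA register writes as load/store pairs and tracks the cycle cost against the 76-cycle scanline. It is an illustration rather than picc's actual code generator: the function name is hypothetical, and it assumes the simplest encoding, an immediate load (LDA #value, opcode $A9) followed by a zero-page store (STA address, opcode $85).

```python
# Encode TIA state changes for one scanline as LDA #imm / STA zp pairs.
LDA_IMMEDIATE = 0xA9   # 2 bytes, 2 cycles
STA_ZERO_PAGE = 0x85   # 2 bytes, 3 cycles
PAIR_CYCLES = 5
COLUBK = 0x09          # TIA background color register


def encode_scanline(writes, cycle_budget=76):
    """Return (machine_code, cycles_used) for a list of (address, value) writes.

    Hypothetical helper: raises ValueError if the writes do not fit
    within the cycle budget of a single scanline.
    """
    code = bytearray()
    cycles = 0
    for address, value in writes:
        if cycles + PAIR_CYCLES > cycle_budget:
            raise ValueError("too many state changes for one scanline")
        code += bytes([LDA_IMMEDIATE, value & 0xFF,
                       STA_ZERO_PAGE, address & 0xFF])
        cycles += PAIR_CYCLES
    return bytes(code), cycles


# Fifteen background color changes fill the line almost exactly.
code, cycles = encode_scanline([(COLUBK, 2 * i) for i in range(15)])
print(len(code), cycles)  # 60 75
```

Fifteen pairs use 75 of the 76 cycles, and a sixteenth does not fit. The one leftover cycle cannot be filled with a single instruction, since the shortest 6502 instructions take 2 cycles, so a real scheduler has to account for it some other way.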

On NTSC machines the output color modulator supports 128 unique colors:

There are some advanced techniques, like ChronoColor, which may allow the Atari to represent a broader palette, but for now I wanted to focus on the original palette. I think it is beautiful and colorful, and I'm excited to see how video content will look rendered in these colors.

In terms of logical screen dimension the VCS renders 160 horizontal pixels on each output scanline. Documentation on vertical dimension seems more inconsistent. Most tutorial documents indicate 192 vertical scanlines as a standard, but if you look at comments in emulator code like Stella it seems that legacy games used a variety of different vertical scanline counts based largely on timing considerations, with values varying from 80 to 230 vertical scanlines! In most games the primary (if not sole) concern of the code while the CRT beam is on is rendering the graphics. This means that vertical refresh and vertical blank, when the beam is off, are really the only time remaining for the code to update I/O, sound, AI, physics, or any other concerns the game software may have. There are also, it should be noted, generous affordances within the TIA to allow for halving vertical resolution by repeating each scanline twice.

Modern aspect ratios seem a long distance from 160:192, roughly 16:19! Since vcsmc output is most likely to be rendered on 16:9 aspect ratio displays we will try to approximate 320 horizontal pixels with our 160 and cut vertical height to 180. Hopefully rendering a 16:9 picture on about 94% of the vertical space on an actual 16:9 display won't look too funky. Thankfully the screen size constants are centrally located in the code so tweaking them later won't be too painful.
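The ratios above are easy to verify with exact fractions. This is a minimal sketch with made-up constant names; it assumes each Atari pixel is treated as two square pixels wide, which is how 160 columns stand in for 320.

```python
from fractions import Fraction

# Hypothetical names for the screen constants discussed above.
NATIVE_WIDTH = 160
NATIVE_HEIGHT = 192
TARGET_HEIGHT = 180
PIXEL_WIDTH_FACTOR = 2   # assumption: one Atari pixel counts as two square pixels

native_ratio = Fraction(NATIVE_WIDTH, NATIVE_HEIGHT)
target_ratio = Fraction(NATIVE_WIDTH * PIXEL_WIDTH_FACTOR, TARGET_HEIGHT)
height_used = Fraction(TARGET_HEIGHT, NATIVE_HEIGHT)

print(native_ratio)        # 5/6, i.e. 16:19.2
print(target_ratio)        # 16/9
print(float(height_used))  # 0.9375
```

So 180 lines is exactly the height that makes the doubled 160-pixel width come out at 16:9, and it uses a little under 94% of the nominal 192 lines.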

Up next, a discussion of an overall design approach for vcsmc.

Saturday, May 24, 2014

VCSMC - A Media Player for the Atari 2600

I recently resigned from my position as Technical Lead of the Game Consoles team at YouTube. I loved my time there. I spent 3 years in that role and built the team from 2 people to 12 while working on various game consoles, old and new.

When my wife and I purchased a home in the Santa Cruz mountains, however, I knew that my days of commuting to San Bruno were numbered. So I had to resign that position and move to Mountain View to work on Google Chrome for Windows.

As a parting gift my manager gave me an Atari 2600 console which he had customized with the YouTube logo and colors. A long-standing pipe dream on my team while developing on other game consoles had been that we would eventually port our work to the Atari. The painstaking care that he had taken in building this console was really touching. It made me think, why not try to develop a media player for it? And thus the dream was born. I've been working hard on the project for over a month now so I figured it was time to (briefly) lift my head up from the code and talk a little about my design, plans, progress, and next steps.

Please understand that I am doing this in my spare time and on my own personal computer hardware, outside of the context of Google Inc and YouTube. As with all content on this blog, my actions, thoughts, and opinions expressed here are mine alone and in no way represent the policy or plans of my employer. All trademarks and copyrights are owned by their respective owners.


When studying the Atari 2600 Video Computer System (VCS) hardware it quickly became apparent that my original vision of porting enough of a web browser to render the YouTube HTML5 TV client was probably beyond the scope of work that I could accomplish in the space of a couple of months in my free time. Additionally, there's a rich homebrew scene for the Atari 2600 and one of the primary aesthetics of it is a real respect for the capabilities of the hardware and not trying to exceed that or expand it, but rather to use it creatively to create the best experience possible within those constraints.

My plan now is to develop an Atari 2600 program that can render arbitrarily long video content. This content is preprocessed by a few host computer programs that convert input still images and sound into 6502 bytecode, largely instructions to the Atari to manipulate the TIA state machine to generate the resultant output. I call this suite of host computer programs, together with the Atari cartridge rendering software, VCSMC, in joking homage to the excellent XBMC.

I preprocess a video using ffmpeg to demultiplex the audio and individual image frames. Each frame is processed by a program I am calling picc, short for picture compiler. Most of my development effort has been focused here so far; I am saving sound work for later. The sound compiler program would logically be called sndc. The output of picc is assembly code or 6502 binary bytecode that, when executed on a VCS, renders an individual frame. A final program called ldav will link the individual frame codes with the audio code into a single binary blob usable by the host console program.
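For readers who want to try the demultiplexing step, here is a sketch of one way to drive ffmpeg from a script. The exact invocation I use is not shown here, so treat the helper name, file names, and output pattern as placeholders; only the `-i` and `-vn` options and the numbered image output pattern are standard ffmpeg usage.

```python
def ffmpeg_commands(source,
                    frame_pattern="frames/frame-%07d.png",
                    audio_path="audio.wav"):
    """Build two ffmpeg argument lists: one for frames, one for audio.

    Hypothetical helper; the paths are placeholders and the frames
    directory must already exist before the commands are run.
    """
    # Write every decoded video frame as a numbered PNG image.
    extract_frames = ["ffmpeg", "-i", source, frame_pattern]
    # -vn drops the video stream, leaving only the audio track.
    extract_audio = ["ffmpeg", "-i", source, "-vn", audio_path]
    return extract_frames, extract_audio


frames_cmd, audio_cmd = ffmpeg_commands("input.mp4")
print(" ".join(frames_cmd))  # ffmpeg -i input.mp4 frames/frame-%07d.png
print(" ".join(audio_cmd))   # ffmpeg -i input.mp4 -vn audio.wav
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with ffmpeg installed. The resulting images would then be fed to picc one at a time.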

This is where I have allowed one concession to myself for modernity - I am placing no limits whatsoever on code size. Original VCS programs were limited to 4 kilobytes of program size. Several bank-switching tricks emerged during the product life span of the VCS which pushed that number as high as 8 or even 16K. Homebrew developers have subsequently devised new bank-switching strategies that can push ROM size to 32K (with DPC+) or even 64K on simulators (like 4A50).

While I could allow some restrictions on code size, this would ultimately mean making additional sacrifices in the quality of the image rendered. As I am already losing a lot of capability when trying to render complex imagery at 60 frames per second on the Atari, I didn't want to have to worry about losing even more. Plus, even with the most extreme compression and loss of quality imaginable, I calculate that making the entire program fit into ROM size restrictions would only allow for a few seconds of video.

Since I want to push the graphics capability of the VCS to its very limit, I want to be able to use every instruction available on every scan line. Assuming an average storage cost of 1 byte per machine cycle, that means a single frame is over 13K! My plan is to modify an existing cartridge, or build something similar to the Harmony Cartridge, that will allow for simple programs of arbitrary size. There is one catch, of course, which is that these programs can't jump or call subroutines without great difficulty. This is not a problem for my design, as it is essentially streaming video and audio and so doesn't need to do any branching. This means that ldav will most likely output a simple blob format that a runtime program will be able to provide to the appropriate banks for 6502 execution.

That pretty much sums up my overall approach to the problem. Feel free to check out the code on GitHub. I'll talk more about my progress rendering video content in subsequent posts. Happy hacking!