During the Adobe Max Conference I had the chance to sit in on the “Flash Player Internals” session put on by Jim Corbett and Lee Thomason. Of all the sessions I attended at Max, this was by far the best and most informative for me. It was deep, fast, and packed with little nuggets that I had heard in passing but never had fully explained. I am going to try to recap the session as best I can, and if you see any incorrect data please let me know!
Update: Rick Reitmaier (Sr. Scientist at Adobe) has provided some feedback on this article.
A Word Of Caution
One of the topics that keeps coming up when people talk about the new Flash Player is that it provides Just In Time (JIT) compilation. So what does this mean and how does it work? That, my friends, is the big one. What I am about to do is take a single paragraph of notes, taken from a single slide at the presentation, and try to decipher what was said into something semi-understandable (at least to me). I make no promises, but since the presentation I have done a bit of research on this topic and I feel like I am starting to make sense of it all. If you have questions or see issues with my analysis, please feel free to post comments (I want to try them out!). Okay, here we go…
What Is This JIT?!
The concept of JIT’ing code is to take byte code, compile it into native machine code for the chipset it is running on, and reuse the result. Wha? Yeah, that is what I thought when I first read into it. Before I jump whole hog into this, let me take a split second to go over (or refresh) your knowledge of machine code.
Machine code is the compiled code that can be run directly by the CPU. When we compile an application written in a language like C++, we take our code and compile it down to a set of machine code instructions that can be passed to the CPU. The benefit of compiling an application is that it is already in machine code form when it gets delivered to the user. This compiled version is built only for the specific OS/hardware combination you are delivering it to, and it runs very, very fast. The downside is that if you want to take your app from OS X to Linux you need to recompile your application for the new OS/hardware and then give the Linux user that version (this example ignores any OS-specific code, of course).
This is where interpreted languages come into play. The benefit of interpreted languages is that you can take your code raw (uncompiled) and, in essence, compile it line by line in real time. This means that you only have to write and compile your code once. From there you can run your code on any OS/hardware that has a supported interpreter. This is where the Java slogan Write Once, Run Anywhere starts to shine.
The Flash Player is really just an interpreter, but instead of the interpreter compiling raw source line by line, we pre-compile the code into Byte Codes and then pass these Byte Codes to the interpreter. Typically, the AS3 Byte Codes are handed one by one to the VM, which then leverages the specific platform's interpreter (aka the Flash Player) to turn the passed data into the proper machine code. Byte Code is optimized for performance, but you still have to compile each one at a time as they are passed into the interpreter. This becomes a performance bottleneck if your code makes a call to a previously processed Byte Code (e.g., a function). If this is the case (and it happens ALL the time) the same Byte Code has to be re-compiled when it is queued up. Not exactly fast nor efficient.
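To make the "one Byte Code at a time" idea concrete, here is a toy dispatch loop in the spirit of a stack-based bytecode interpreter. The opcodes and stack machine are made up for illustration; this is not the real AVM2 instruction set.

```python
def interpret(bytecode):
    """Decode and execute one byte code at a time, every single run."""
    stack = []
    for op, arg in bytecode:
        if op == "push":          # push a constant onto the stack
            stack.append(arg)
        elif op == "add":         # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":         # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# (2 + 3) * 4 — call interpret() again and every opcode is decoded
# all over again, which is exactly the bottleneck described above.
program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
print(interpret(program))  # 20
```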
So now we get to just-in-time. JIT is the idea that we take the commonly used Byte Codes, compile them once and then store the compiled version in memory. When the Byte Code is required again the compiled code is pulled from memory and handed to the CPU, allowing the Player to skip the compile step and therefore massively increasing the application's execution speed.
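The compile-once-and-cache idea can be sketched in a few lines. This toy "JIT" translates a bytecode sequence into a Python closure instead of real machine code (the cache-and-reuse mechanics are the point, not the code generation):

```python
_code_cache = {}  # compiled code, keyed by the bytecode sequence

def jit(bytecode):
    """Compile a bytecode program once; reuse the compiled form after."""
    key = tuple(bytecode)
    if key in _code_cache:        # already compiled: skip the compile step
        return _code_cache[key]
    # "Compile": fold the stack program into one Python expression.
    stack = []
    for op, arg in bytecode:
        if op == "push":
            stack.append(repr(arg))
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sym = "+" if op == "add" else "*"
            stack.append(f"({a} {sym} {b})")
    compiled = eval(f"lambda: {stack.pop()}")
    _code_cache[key] = compiled
    return compiled

program = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
fn = jit(program)
print(fn())                  # 20
assert jit(program) is fn    # second request reuses the cached compile
```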
The Flash Player 9 JIT engine is a two-pass system. The first pass uses the Macromedia Intermediate Representation (MIR) process to generate a map-like data structure of the code. The MIR map is sorta like a flow chart that diagrams the code. The MIR allows the JIT engine to analyze and begin optimizing the code before the second pass creates the machine code. Once the MIR is generated, there are a few types of optimizations that are also applied during the first pass of the JIT engine:
- Early Binding From what I can determine, this means that the JIT resolves the types of references at the start of the JIT vs. late binding the types when the value is actually in hand. This is a tough one to verify since so little is out there as of now, but a comment by Edwin Smith (inventor of Tamarin) kind of led me to this conclusion. One thing to note is that for this optimization to occur you need to type your AS3 code. You will constantly hear people say that typing your code increases performance, and that advice makes a lot more sense once you know about early binding.
- Constant Folding This is the process of finding values that never change, such as constants, integer literals and calculations whose inputs never change, and computing them up front. The JIT engine then replaces the expression in the MIR with the calculated value vs. keeping a reference to the method or instance. A little more info on constant folding.
- Copy and Constant Propagation This ties into Constant Folding by tracking copies and known constants through the code and replacing their uses with the actual values, but it doesn't stop there. This process also looks to simplify conditional branches by determining whether the outcome always resolves the same way. If so, the branch is replaced with the result versus leaving in the conditional calculation. Here are examples of how constant propagation works.
- Common Sub-Expression Elimination (CSE) This is the process that finds calculations that are used in multiple places and determines if it makes sense to replace the calculation with a variable that holds the calculated value. By doing this it can greatly simplify how much data has to be calculated. Here are a few examples of how Common Sub-Expression Elimination can optimize calculations.
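To see what these last three optimizations actually do, here they are applied by hand to a made-up function. A real JIT does these rewrites on the MIR; the "before" and "after" below are just equivalent Python so the transformations are easy to eyeball.

```python
def before(n):
    width = 2 * 60            # constant expression, recomputed every call
    copy = width              # a plain copy of another value
    if copy > 0:              # branch whose outcome never changes
        a = (n + 1) * width   # (n + 1) appears twice...
        b = (n + 1) * 2       # ...a common sub-expression
        return a + b
    return 0

def after(n):
    # Constant folding: 2 * 60 becomes the literal 120.
    # Copy/constant propagation: uses of `copy` become 120, which makes
    # the `if 120 > 0` branch always-taken, so the branch disappears.
    # CSE: (n + 1) is computed once and reused.
    t = n + 1
    return t * 120 + t * 2

# Same results, far less work per call.
assert all(before(n) == after(n) for n in range(-5, 6))
```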
Once the MIR is completed and optimized, the second pass begins. During this pass the native machine code for the OS/hardware is generated: the instruction selection algorithm is run over the MIR to produce the most optimal instruction set. The goal of instruction selection is to generate the least amount of code possible while still covering all the required functionality; less code means less processing time for the CPU. Next up is Register Allocation, where the JIT attempts to manage the CPU registers (or slots) in the most optimal way. Lee mentioned that the JIT uses a linear scan algorithm to find the stalest register and uses that for the next slot. This algorithm greatly decreases processing time by providing a much simpler calculation versus the typically computation-heavy Register Allocation process.
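Here is a minimal linear-scan allocator in the spirit of that approach — my own simplified sketch, not Tamarin's actual code. Each value is a made-up (name, start, end) live interval; registers freed by expired intervals get reused, and when none are free we just spill the incoming value (real linear scan is smarter about which interval to spill).

```python
def linear_scan(intervals, num_regs):
    """Assign registers to live intervals in one pass over the code."""
    intervals = sorted(intervals, key=lambda iv: iv[1])  # by start point
    active = []                  # intervals currently holding a register
    free = list(range(num_regs))
    assignment = {}
    for name, start, end in intervals:
        # Expire intervals that ended before this one starts, returning
        # their (stale) registers to the free pool for reuse.
        for iv in [iv for iv in active if iv[2] < start]:
            active.remove(iv)
            free.append(assignment[iv[0]])
        if free:
            assignment[name] = free.pop()
            active.append((name, start, end))
        else:
            assignment[name] = "spill"   # no register left: spill to memory
    return assignment

# Three overlapping values but only two registers: one must spill.
ivs = [("a", 0, 6), ("b", 1, 3), ("c", 2, 5)]
print(linear_scan(ivs, 2))
```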
Finally (and this is in no real order) Dead Code Elimination (DCE) is applied. This is the process of hunting down unreachable or never-executed code and removing it from the instruction set. This helps reduce the overall size of the instruction set and makes it perform quicker by reducing the amount of work. If you feel like it, read up on the theory of DCE.
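DCE is easy to demo on a toy instruction list. Each instruction here is a made-up (dest, op, operands) triple; anything whose result never feeds into the returned value is dead and gets dropped — an illustration of the idea, not the Player's implementation.

```python
def eliminate_dead_code(instrs, returned):
    """Keep only instructions whose results the return value depends on."""
    live = {returned}
    kept = []
    # Walk backwards: an instruction is live only if its result is needed,
    # and a live instruction makes its own operands live in turn.
    for dest, op, operands in reversed(instrs):
        if dest in live:
            live.update(operands)
            kept.append((dest, op, operands))
    kept.reverse()
    return kept

instrs = [
    ("t1", "add", ("x", "y")),
    ("t2", "mul", ("x", "x")),   # t2 is never used anywhere: dead
    ("t3", "add", ("t1", "z")),
]
print(eliminate_dead_code(instrs, "t3"))
# The t2 multiply is gone; only t1 and t3 survive.
```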
Wrap this JIT up, dar’nab it!
Once this is all completed you have a nice set of clean machine code to run. As you can tell, this is not a trivial process, and applying it to everything could potentially take more time than just compiling the byte code directly in the classic interpreter fashion. Yet when done correctly you can see a massive performance increase for your code. Oh, and one little thing Lee mentioned… this is the first JIT for a dynamically typed language.
So there you go, a semi-in-depth look at the world of JIT in Flash Player 9. This has actually been a ton of fun to write since I had to look up 90% of this on the web before I attempted to get my thoughts down. I finally feel like I understand the concepts behind JIT and hopefully you do too after reading this!