Looking at the world of media: from music to RIA.

Flash Internals: Just-in-time (JIT) compilation (part four)

November 12th, 2007 Posted in ActionScript, Flash Player, Rich Internet Applications

rune_flash.gifDuring the Adobe Max Conference I had the chance to sit in on “Flash Player Internals” session put on by Jim Corbett and Lee Thomason. Of all the sessions I attended at Max, this was by far the best and most informative for me. It was deep, fast and packed with little nuggets that I had heard in passing but never had fully explained. I am going to try and recap the session as best as I can and if you see any incorrect data please let me know!

Read Part One: Motto’s and Percentages. Read Part Two: Byte Code, Control Tags, Renderers… oh my! Read Part Three: The ActionScript Virtual Machine

Update: Rick Reitmaier (Sr. Scientist at Adobe) has provided some feedback on this article.

A Word Of Caution

One of the topics that keeps coming up when people talk about the new Flash Player is that is provides Just In Time (JIT) compilation. So what does this mean and how does it work? That, my friends, is the big one. What I am about to do is take a single paragraph of notes, taken from a single slide at the presentation and try to decipher what was said into something semi-understandable (at least to me). I make no promises, but since the presentation I have done a bit of research on this topic and I feel like I am starting to make sense of it all. If you have questions or see issues with my analysis, please feel free to post comments (I want to try them out!) . Okay, here we go…

What Is This JIT?!

The concept of JIT’ing code is to take byte code, compile it into the supported machine language for the proper chipset and enable reuse. Wha? Yeah, that is what I thought when I first read into it. Before I jump whole hog into this, let me take a split second to go over (or refresh) your knowledge on machine code.

Machine code is the compiled code that can be run directly by the CPU. When we compile an application using a language like C++ we are take our code and compile it down to a set of machine code instructions that can be passed to the CPU. The benefits of compiling an application is that it is in the machine code form when it gets delivered to the user. This complied version is built only for the specific OS/hardware match you are delivering it to and it runs very, very fast. The downside is that if you want to take your app from OS X to Linux you need to recompile your application for the new OS/hardware and then give the Linux user that version (this example ignores any OS specific code of course).

This is where interpreted languages come in to play. The benefit of interpreted languages is that you can take your code raw (uncompilable) and in-essence compile it line by line in real time. This means that you only have to write and compile your code once. From there you can run your code on any OS/Hardware that has a supported interpreter. This is where the Java slogan Compile Once, Run Anywhere starts to shine.

The Flash Player is really just an interpreter, but instead of line by line compiling by the interpreter we pre-compile the code into Byte Codes and then pass these Byte Codes to the interpreter. Typically, the AS3 byte code is handed one by one to the VM which then leverages the specific platforms interpreter (aka Flash Player) to compile the passed data to the proper machine code. Byte Code is optimized for performance, but you still have to compile each one at a time as they are passed into the interpreter. This becomes a performance bottleneck if your code makes a call to a previously referred Byte Code (ala function). If this is the case (and it happens ALL the time) the same Byte Code has to be re-complied when it is queued up. Not exactly fast nor efficient.

So now we get to just-in-time. JIT is the idea that we take the commonly used Byte Codes, compile them once and then store the compiled version in memory. When the Byte Code is required again the complied code is pulled from memory and handed to the CPU, allowing the Player to skip the compile step and therefore massively increasing the applications execution speed.

The Flash Player 9’s JIT engine is a two pass system. The first pass uses the Macromedia Intermediate Representation (MIR) process to generate a map-like data structure of the code. The MIR map is sorta like a flow chart that diagrams the code. This MIR allows for the JIT engine to then analyze and begin optimizing the code before the second pass creates the machine code. Once the MIR is generated there are few types of optimizations that also are applied during the first pass of the JIT engine:

  • Early Binding From what I can determine this means that the JIT itself types the values of the references at the start of th JIT vs. late binding the types when the value is actually in hand. This is a tough one to verify since so little is out there as of now but a comment by Edwin Smith (inventor of Tamarin) kind of led me to this conclusion. One thing to note, is for this optimization to occur you need to type your AS3 code. You will constantly hear people say that typing your code increases performance, and it makes a lot more sense due to early binding.
  • Constant Folding This is the process of finding values that never change, such as constants, integer values and calculations that only occur once. The JIT engine then replaces the calculated values in the MIR vs. having references to the method or instance. A little more info on constant folding.
  • Copy and Constant Propagation This ties into Constant Folding by providing the service that looks up the constant and replacing the calculated values, but it doesn’t stop there. This process also looks to simplify conditional branches by determining the outcome and if it always returns in a specific way. If so then it is replaced with the result versus leaving in the conditional calculation. Here are examples of how constant propagation works.
  • Common Sub-Expression Elimination (CSE) This is the process that finds calculations that are used in multiple places and determines if it makes sense to replace the calculation with a variable that holds the calculated value. By doing this it can greatly simplify how much data has to be calculated. Here are a few examples how how Common Sub-Expression Elimination can optimize calculations.
One thing that I am curious about and which I can not answer is; how much of the code is in the MIR? Is it all the code? Some of the code? is it multiple MIRs? One thing that was mentioned during the presentation is that JIT is only applied to some of the code where it makes sense to optimize. For now I am just going to assume that its all there in the MIR and the JIT then figures out what to do.

Once the MIR is completed and optimized, the second pass begins. During this pass the native machine code for the OS/Hardware is generated and the instruction selection algorithm is run upon the MIR to generate the most optimal instruction set. The goal of the instruction selection algorithm is to generate the least amount of code possible but at the same time covering all the required functionality. By limiting the code this requires less processing time for the CPU. Next up is the Register Allocation where the JIT attempts to manage the CPU registers (or slots) in the most optimal way. Lee mentioned that the JIT uses a linear scan algorithm to find the stalest register and uses that for the next slot. This algorithm greatly decreases the processing time for by providing a much simpler calculation versus the typical computational heavy Register Allocation process.

Finally (and this is in no real order) Dead Code Elimination (DCE) is applied. This is the process of hunting down unreachable or un-executable code and removing it from the instruction set. This helps reduce the overall size of the instruction set and making it perform quicker by reducing the amount of data. If you feel like it, read up on the theory of DCE.

Wrap this JIT up, dar’nab it!

Once this is all completed you have a nice set of clean machine code to run. As you can tell this is not a trivial process and to apply this to everything could potentially take more time then just compiling the byte code directly in the classic interpreter fashion. Yet when done correctly you can see a massive performance increase for your code. Oh, and one little thing Lee mention… this is the first JIT with a dynamic enabled language.

So there you go, a semi-in-depth look at the world of JIT in Flash Player 9. This has actually been a ton of fun to write since I had to look up 90% of this on the web before I attempted to get my thoughts down. I finally feel like I understand the concepts behind JIT and hopefully you do too after reading this!

  1. 8 Responses to “Flash Internals: Just-in-time (JIT) compilation (part four)”

  2. By Alan on May 7, 2009

    Wow this is incredible stuff. I just saw Jim talk at FITC in Toronto and I've been thirsting for more since.

    I do have a question that maybe you can answer. Suppose I have a movieclip in my library set to export, lets call the class BlueSquare. In this example nothing is on the stage, and the library has a movieclip set to export. Nested in that class is another movieclip, and it has an instance name of nestedMovieClip.

    Now somewhere in my code, I create an instance of that class BlueSquare:

    var square:BlueSquare = new BlueSquare();

    then I want to modify that nested movieclip:

    square.
    nestedMovieClip.x = 100;

    My question: At compile time, does the player know the existence of that nested movie clip, and therefore can the JIT create a reference to it on the first pass before it is used?

    Whew.

    Again, thanks for this article, I wish there was more stuff on this.

  3. By James on May 7, 2009

    Alan,
    I assume by “compile time” you mean the Flash Player's compilation of the byte code to machine code so that it can be executed, not the Flash compiler you use to create the SWF.

    It depends upon when the construction of the nested clip occurs. By the time the Player gets to the your x value change the nested clip has had to be constructed and therefore the player will know about the instance.

    The bigger question is if the method will be compiled by the JIT or not. JIT'ing doesn't occur on an Object or Class level, but on a bytecode level. Its possible that setting the x value may be JIT'ed but the MovieClip will not be compiled down entirely.

    What is not clear to me, is when the actual JIT and MIR process is applied. Is it a look ahead process examines the entire app stack, or does it interpret every bytecode stack as its processed and on second execution does it apply the JIT? I have a feeling that the MIR is built on the first pass of the bytecode execution and the next time the code is called JIT is applied.

    If that is the case, then your reference to the x value is not JIT'ed the first time you make the call, but the second time you do it will be JIT'ed. This is a really interesting topic, because at DevelopmentArc we have begun looking at performance increases around JIT optimization and knowing this could really help.

    I hope this answered your question, or at least pointed you in the right direction.

    J.

  4. By ultraky on May 10, 2009

    Thanks for taking the time to answer, I'm no engineer but trying hard to understand.

    Before when I said 'Compiler' I was thinking of when Flash converts AS3 into Byte Code, but I was wrong to call it that. When that .swf is created, is it doing anything else besides just converting AS3 into Byte Code? I thought it was also pre-allocating and organizing 'stuff' for when that bytecode is loaded.

    Reading your article again, I can now see there there is a 2nd compilation when the Byte Code is loaded into Virtual Machine, and the VM then allocates 'stuff'. After that does the JIT take over and task out commands? Am I close?

    So far, it seems that the situation I referred to before is an area of interest to those who are looking to optimize. I have been doing lots of benchmarking using gSkinner's testing suite. So far my hypothesis has been correct, and I am seeing massive, massive performance gains when nested clips are declared before functions are performed on them. Also, the more objects that are inside that Movieclip ( either drawing objects, text fields, or symbols ) the more time it takes to access that specific object.

    If I understand how the JIT works, this makes sense because the JIT doesn't know about that nested object before being told to manipulate it, it has to search for it. Does that make sense?

    I will do a proper write up very shortly on my blog when I have really tested everything, when that happens I would love to hear your thoughts.

  5. By James on May 11, 2009

    You where correct in calling the AS3 to ByteCode process, compilation. The issue semantical issue that we face is that there is technically two compilation passes in the new Flash Player world. The first compilation pass is creating the SWF (which wraps the ByteCode) the second is when the AVM+ takes the ByteCode and compiles it to machine code for the specific platform the Player is running on.

    When the first SWF compilation is being run there are optimization being applied, but they are limited to what the ByteCode can represent. I don't know exactly what optimizations are being made during this phase, but essentially all the Classes, objects, functions, properties, etc are being boiled down into commands that are executed by the AVM at run-time. The AVM is then responsible for converting these commands into machine code (second compilation) that the specific OS can understand. The reason we use the ByteCode interim step is that we want the same SWF to run on multiple OSs. This is why the Flash Player is still considered an interpreter because the SWF is not an OS native compiled application.

    I feel your description is really close, what you are describing is the benefit of sealed classes versus dynamic classes. MovieClip is marked as a dynamic class, which means that you can attach new properties to it at run-time. Looking back at your first example, _nestedMovieClip is not a defined property on the MovieClip class, you have added this property. To access this property the AVM has to lookup the property on the instance to first see if it exists and if so then it can access it. This process is similar to using “for in” on an object, you have to loop over all the properties trying to see if it has the one you want. This is a time consuming process as you have found out through your testing.

    More then likely, you are seeing the performance benefit by making a reference to the property and then accessing the reference because the lookup no longer has to occur every time you access the property. This is the same principle of using a pre-calculated length value when looping over an array instead of accessing the “.length” property each time. The act of accessing the property, i.e. “myArray.length” requires the lookup and calculation of the length to occur every time we perform a loop. If we crate a variable before we start the loop and set it to “myArray.length” then the loop will no longer call the length getter during every pass, making our loop run a lot faster. If we apply this to your example, then we can avoid the lookup by storing the instance in a reference variable and therefore the AVM only has to lookup the value once, to set the reference and then can access it faster from there.

    We can actually take this one step further. If you read or see most peoples articles/presentations on Flash performance they always start with “use strong typing”. The reason behind using strong typing is the same as making a reference, it helps mitigate AVM lookup time. If we take your BlueSquare example we want to make sure that BlueSquare has _nestedMovieClip declared as a property of the Class and that it is typed to MovieClip. By having this reference on the BlueSquare Class the AVM will not have to do full scanning lookup (ex: for in) on the BlueSquare instance to see if _nestedMoveiClip is both declared and has a value and that the value “x” also exists on the _nesteMovieClip instance.

    By explicitly declaring _nestedMovieClip we have defined BlueSquare as a sealed class, which will give us a lot of performance gains at run-time. Dynamic Classes are great (Classes that can have properties added at runtime) but they come with a huge cost. I always recommend avoiding dynamic classes unless you absolutely have to and even then, try not to use them.

  6. By Annie on Sep 30, 2009

    Hi, I have a question. In AVM2, both of the interpreter and JIT can compile the AS code.But when to use interpreter,when to use JIt?

  7. By Lorenzo on Nov 3, 2009

    Dear James,

    I really appreciate your blog!!! :) I have a question for you. You talked about SWF Frame and Flash Player Frame. So, could you give me the definition for these terms? SWF Frames are contained into a swf file and are like a logical units that contain assets and as code. Instead, a Flash Player frame is the time period between two consecutive enterframe events (controlled by frame rate). So,what is the relationship between them? In particular, I would know if they are paired each other or not.

    Thanks in advance. Best Regards.

  8. By James Polanco on Nov 3, 2009

    Lorenzo, From what I understand the concept of a SWF frame and a Player Frame are essentially the same. When using Flash Authoring to create animations you are telling the Player to render the changes defined for each frame. When the player runs the SWF, it will attempt to play each SWF frame as defined, i.e. render the content changes.

    The core difference between the frame in the SWF vs. the player frame is how the frame is constructed. I recommend checking out Sean Christmann’s article on how frames are synthesized in the player. http://www.craftymind.com/2008/04/18/updated-elastic-racetrack-for-flash-9-and-avm2/

    Even with this approach, the frames still logically map together from a conceptual perspective. For me, I look at them as one in the same… even though they maybe defined technically different under the hood.

  9. By Lorenzo on Nov 4, 2009

    Dear James,

    thank you for your reply! :)

    Frames can be a tricky topic! They are difficult to understand. I will do research on them!! Thank you again. Best Regards.

You must be logged in to post a comment.