Boosting Bytecode Efficiency: The Power of GCC’s Label as Value
GEMscript and Virtual Machines
If you’ve been using GEMstudio, you’re probably familiar with our programming language, GEMscript. We designed GEMscript to be a user-friendly, C-like language with the intention of enabling a “write once, run anywhere” approach. This means it can be used seamlessly across all our platforms, including GEMplayer on PC and various hardware devices.
GEMscript is a VM (virtual machine) based language, meaning your code gets compiled into “bytecode” and runs in a VM interpreter instead of being compiled down to native machine code. This allows us to achieve our goal of running the same compiled code across multiple platforms. Additionally, by sandboxing GEMscript from our OS in a VM, we avoid some pitfalls of writing in native C, such as unsafe memory access and code execution. However, there’s a big trade-off when using VMs: speed.
It’s no secret that VMs are generally slower than native machine code. Therefore, optimizing VMs is crucial, particularly for limited hardware where every bit of speed counts. So, I rolled up my sleeves and started exploring ways to optimize our code. And guess what? I found a neat and easy trick for speeding up opcode dispatch. This article discusses the performance improvements I achieved by optimizing opcode dispatch in GEMscript using GCC’s “Label as Value” feature, demonstrating a significant speed increase.
Enhancing VM Performance: Speeding Up Opcode Dispatch
When I started looking at our VM, I realized that focusing on opcode dispatch could yield significant performance gains. Efficient opcode dispatch is key to faster execution because it reduces the overhead of interpreting each instruction in the VM. Let me put this in simpler terms:
Imagine you’ve got a list of vocabulary words to study. One way to do this is by using the index at the back of a dictionary. For each word, you:
Look up the page number in the index.
Flip to the correct page to read about the word.
Return to the index for the next word.
Doing this for each word is straightforward but slow and tedious.
Now, imagine using index cards with all the vocabulary words printed in order. For each word, you:
Read the information on the top index card.
Move instantly to the next card.
No more flipping back and forth! This method is much faster and more efficient, even though it takes a bit of effort upfront to set up the index cards. Similarly, by optimizing our opcode dispatch, we can make our VM run instructions more quickly and efficiently.
The Basics of VM Interpreting
Let’s start with a basic interpreter loop for a VM, which is similar to using the dictionary index method. Instead of a list of vocabulary words, we have bytecode, which is a list of opcodes in memory. Instead of searching an index for the correct page, we use a switch statement, with each opcode case representing a different operation. This switch case is in a loop, so we repeatedly look for our current opcode and execute it until we run out of opcodes. Here’s a simple example in C:
void runVM(){
    uint32_t pc = 0; // program counter
    while (1) {
        uint32_t opcode = memory[pc++];
        switch (opcode) {
            case OP_ADD:
                // do addition operations here
                break;
            case OP_SUB:
                // do subtraction operations here
                break;
            ...
        }
    }
}
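To make the loop above concrete, here is a minimal, self-contained sketch of a switch-dispatched interpreter. It assumes a tiny stack machine with four made-up opcodes (`OP_PUSH`, `OP_ADD`, `OP_SUB`, `OP_HALT`) and a 16-entry stack. None of these names or sizes come from GEMscript; they are only there so the example can run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration; not GEMscript's real opcodes. */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_HALT };

/* Runs bytecode with switch dispatch and returns the top of the stack. */
int32_t run_switch(const uint32_t *memory) {
    int32_t stack[16];
    int sp = 0;       /* number of values currently on the stack */
    uint32_t pc = 0;  /* program counter */

    while (1) {
        uint32_t opcode = memory[pc++];
        switch (opcode) {
        case OP_PUSH: /* the operand is stored right after the opcode */
            assert(sp < 16);
            stack[sp++] = (int32_t)memory[pc++];
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_SUB:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] -= stack[sp];
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Demo program (an assumption for this sketch): (10 + 5) - 8 */
int32_t demo_switch(void) {
    static const uint32_t program[] = {
        OP_PUSH, 10, OP_PUSH, 5, OP_ADD, OP_PUSH, 8, OP_SUB, OP_HALT
    };
    return run_switch(program);
}
```

Calling `demo_switch()` returns 7. Every instruction goes back through the top of the `while` loop and the `switch` before the next one starts, which is the overhead the rest of this article tries to remove.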
Threaded Code Execution: A More Efficient Approach
We can do better! Remember the index card approach? The programming concept that mirrors this is called threaded code. It lets us execute an opcode and then move directly (or indirectly) to the next opcode, creating one continuous thread of execution. The previous example isn’t a continuous thread because it must jump back to the start of the loop and search again for every opcode.
Using labels in C, we can replicate this approach by using `goto` to jump between instructions instead of searching for them in a switch statement. However, there’s a challenge: how can we know which label to go to from a list of opcode values? We need to map the opcodes to the corresponding labels in our function so we can easily jump to the right place.
void runVM(){
    uint32_t pc = 0; // program counter
    uint32_t opcode = memory[pc++];
    // FIX: find first label from opcode?
    goto FIRST_LABEL;
OP_ADD:
    // do addition operations here
    // FIX: find next label from opcode?
    goto NEXT_LABEL;
OP_SUB:
    // do subtraction operations here
    // FIX: find next label from opcode?
    goto NEXT_LABEL;
    ...
}
Introducing GCC Label Values: A Game Changer
While digging around for ways to speed up our VM, I discovered that a very common way to implement threaded code in C is with a specific feature in GCC called “Label as Value”. Even after eight years of working with C and C++, I didn’t know about this! The official docs say that by using the `&&` operator, you can reference the address of a label within a function. The GCC docs also describe the exact scenario we are in:
“Another use of label values is in an interpreter for threaded code. The labels within the interpreter function can be stored in the threaded code for super-fast dispatching.”
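Before applying this to a VM, it may help to see the feature in isolation. The sketch below is a made-up function (`pick_label` is not from GEMscript or the GCC docs) that stores two label addresses in a table with `&&` and jumps to one of them with `goto *`. It relies on the GNU C extension, so it needs GCC or a compatible compiler such as Clang.

```c
#include <assert.h>

/* Returns 1 or 2 depending on which stored label address is selected. */
int pick_label(int which) {
    /* &&label yields the label's address as a void * (GNU C extension). */
    static const void *const targets[] = { &&first, &&second };

    assert(which == 0 || which == 1);
    goto *targets[which]; /* computed goto: jump to the stored address */

first:
    return 1;
second:
    return 2;
}
```

`pick_label(0)` returns 1 and `pick_label(1)` returns 2. Label addresses are only meaningful inside the function that defines the labels, which is why the label table in the examples that follow lives inside the interpreter function.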
Optimizing Our VM with GCC Label Values
Armed with this new knowledge, I set out to optimize our VM by storing label addresses and mapping opcodes to these labels. Here’s a revised example where our bytecode is stored in `memory[]`, and we have a macro called `NEXT()` to jump to the next label based on the current opcode.
#define NEXT(pc) // explained later...
void runVM(){
    // pointer to first opcode loaded into memory
    uint32_t* opcode = &memory[0];
    // array of opcode labels, indexed by opcode value
    // (GCC Label as Value)
    static const void * const opcodelist[] = {
        &&OP_ADD,
        &&OP_SUB,
        ...
    };
    // find first label and jump
    NEXT(opcode);
OP_ADD:
    // do addition operations here
    NEXT(opcode);
OP_SUB:
    // do subtraction operations here
    NEXT(opcode);
    ...
}
Now, we have labels for each operation and a table of addresses to these labels. The `NEXT()` macro lets us jump to the next opcode efficiently. Depending on our choice of indirect or direct threading, this macro can be implemented in different ways.
Indirect Threading: Taking an Extra Step
First, I tried using indirect threading. This approach sticks to the logic we talked about earlier: when the interpreter jumps to a new opcode, it first looks up the mapped opcode-to-label address indirectly. We store our label addresses in `opcodelist`, so by using the opcode as an index into `opcodelist`, then dereferencing and jumping to the address, we’re using indirect threading.
Here’s how the `NEXT()` macro looks with indirect threading:
#define NEXT(pc) goto *opcodelist[*(pc)++]
// Example: opcode points to a value that is an index into opcodelist
uint32_t* opcode = &memory[0];
NEXT(opcode); // expands to `goto *opcodelist[*(opcode)++]`
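Putting the label table and the indirect `NEXT()` macro together, a complete interpreter could look like the sketch below. It assumes a tiny stack machine with hypothetical opcodes (`OP_PUSH`, `OP_ADD`, `OP_SUB`, `OP_HALT`) and labels prefixed with `L_` to keep them visually distinct from the opcode constants. These names are for illustration only and are not GEMscript's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration; not GEMscript's real opcodes. */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_HALT };

/* Look up the label for the current opcode, advance, and jump to it. */
#define NEXT(pc) goto *opcodelist[*(pc)++]

/* Runs bytecode with indirect threading; returns the top of the stack. */
int32_t run_indirect(const uint32_t *memory) {
    /* Order must match the enum so each opcode value indexes its label. */
    static const void *const opcodelist[] = {
        &&L_PUSH, &&L_ADD, &&L_SUB, &&L_HALT
    };
    int32_t stack[16];
    int sp = 0;                      /* number of values on the stack */
    const uint32_t *opcode = memory; /* points at the next opcode */

    NEXT(opcode); /* find the first label and jump */

L_PUSH: /* the operand is stored right after the opcode */
    assert(sp < 16);
    stack[sp++] = (int32_t)*opcode++;
    NEXT(opcode);
L_ADD:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT(opcode);
L_SUB:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] -= stack[sp];
    NEXT(opcode);
L_HALT:
    assert(sp >= 1);
    return stack[sp - 1];
}

/* Demo program (an assumption for this sketch): (10 + 5) - 8 */
int32_t demo_indirect(void) {
    static const uint32_t program[] = {
        OP_PUSH, 10, OP_PUSH, 5, OP_ADD, OP_PUSH, 8, OP_SUB, OP_HALT
    };
    return run_indirect(program);
}
```

`demo_indirect()` returns 7. There is no loop and no `switch`: each handler ends with its own table lookup and jump. Note that this sketch does not range-check opcode values before indexing `opcodelist`, so it should only be fed trusted bytecode.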
Direct Threading: Streamlining Execution
While indirect threading is simple and fast, it’s not quite like our index card analogy. With the index cards, they were pre-made and set up in order. You just had to move to the next card, which already had the vocab word on it. We can achieve something similar with direct threading. If we can fix up our opcodes in memory before executing them, then we don’t need to use the opcodes as a map to label addresses; the opcodes themselves can be the addresses.
Here’s how the `NEXT()` macro looks with direct threading:
#define NEXT(pc) goto *(void *)(*(pc)++)
// Example: after fixup, each opcode in memory holds a label address;
// the cast turns the stored integer back into a pointer for goto
uint32_t* opcode = &memory[0];
NEXT(opcode); // expands to `goto *(void *)(*(opcode)++)`
Of course, this requires the opcode size to match the address size of your platform (32-bit, 64-bit, etc.), but for us it works perfectly. It also requires our VM to do the fixup when the bytecode is first loaded, but that is a one-time cost that barely affects overall execution speed. Let’s assume in our examples that this fixup is already applied when using direct threading.
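The fixup step can be sketched as follows. This is a minimal illustration under several assumptions: the bytecode is stored in pointer-sized `uintptr_t` cells so it works on both 32-bit and 64-bit hosts, the opcode set is the same hypothetical one used for illustration elsewhere (`OP_PUSH`, `OP_ADD`, `OP_SUB`, `OP_HALT`), and the fixup runs inside the interpreter function because label addresses are only available there. A real VM might organize the load-time pass differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode set for illustration; not GEMscript's real opcodes. */
enum { OP_PUSH, OP_ADD, OP_SUB, OP_HALT };

/* Each cell already holds a label address: load it, advance, and jump. */
#define NEXT(pc) goto *(void *)(*(pc)++)

/* Fixes up `cells` in place, then runs them with direct threading. */
intptr_t run_direct(uintptr_t *cells, size_t count) {
    static const void *const opcodelist[] = {
        &&L_PUSH, &&L_ADD, &&L_SUB, &&L_HALT
    };
    intptr_t stack[16];
    int sp = 0;            /* number of values on the stack */
    uintptr_t *pc = cells; /* points at the next cell to execute */

    /* One-time fixup: replace each opcode with its label address. */
    for (size_t i = 0; i < count; i++) {
        uintptr_t op = cells[i];
        assert(op <= OP_HALT);
        cells[i] = (uintptr_t)opcodelist[op];
        if (op == OP_PUSH)
            i++; /* skip the operand; it is data, not an opcode */
    }

    NEXT(pc);

L_PUSH: /* the operand is stored right after the opcode */
    assert(sp < 16);
    stack[sp++] = (intptr_t)*pc++;
    NEXT(pc);
L_ADD:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT(pc);
L_SUB:
    assert(sp >= 2);
    sp--;
    stack[sp - 1] -= stack[sp];
    NEXT(pc);
L_HALT:
    assert(sp >= 1);
    return stack[sp - 1];
}

/* Demo program (an assumption for this sketch): (10 + 5) - 8 */
int32_t demo_direct(void) {
    /* A fresh, writable copy per call, because the fixup rewrites it. */
    uintptr_t program[] = {
        OP_PUSH, 10, OP_PUSH, 5, OP_ADD, OP_PUSH, 8, OP_SUB, OP_HALT
    };
    return (int32_t)run_direct(program, sizeof program / sizeof program[0]);
}
```

`demo_direct()` returns 7. Compared with indirect threading, each dispatch saves the table lookup: the cell's value is the jump target. The trade-off is that the bytecode must be writable and must not be fixed up twice, since the second pass would treat label addresses as opcodes.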
Measuring the Speed Boost
To test these optimizations, I wrote a new, simple VM specifically for benchmarking purposes, separate from GEMscript. This test VM can handle basic arithmetic and conditional jumps. To evaluate its performance, I used a prime number algorithm to find the 65,535th prime number and tested four scenarios:
Native C
VM with a Switch Statement
VM with Indirect Threading (GCC labels as value)
VM with Direct Threading (GCC labels as value)
All options were compiled with GCC and -O3 optimizations. Here are the results:
Native: 70ms
Switch Statement: 1140ms
Indirect Threading: 400ms
Direct Threading: 380ms
Normalized to the switch statement speed:
Native: 16.29x faster
Switch Statement: 1x (reference speed)
Indirect Threading: 2.85x faster
Direct Threading: 3x faster
Wrapping Up
These optimizations didn’t match the speed of native C execution, but that was expected. However, we did achieve a 2.85x to 3x performance boost by adjusting our interpreter loop to threaded code using GCC’s labels as values! For such a simple change, that’s a very welcome speed increase.
We’re excited to roll out these optimizations in an upcoming release of GEMstudio. These changes will make a difference in the performance of any code-heavy applications that may have slowed down in the past. To stay updated on the latest features and enhancements to GEMstudio (like this one), sign up for our newsletter.
Author Bio:
Ian Klask is a firmware engineer passionate about C and C++ programming. He began his journey through video game development and DIY electronics projects. Ian now channels his creativity and technical skills into optimizing embedded systems. Follow his posts for insights and projects in software and embedded firmware engineering.