Time synchronization is the process of aligning the system clocks of computers with a common time source and verifying that they stay in sync. Large numbers of computers now span many locations and perform time-critical operations, so it is essential that their clocks are synchronized and accurate, often to within a few tens of nanoseconds. Use cases include timestamping events, coordinating media broadcasts, phase correction in small cell base stations, power generation, air traffic control and timestamping stock trades. One of the simplest proven mechanisms is to use the GPS/GNSS satellite constellations to create a globally accepted, highly accurate clock.

The key components of a time synchronization network are the Grandmaster, Master and slave clocks.

  • The Grandmaster clock is the primary time source in a multi-clock network, sending time downstream to other master clocks. It must provide exceptionally accurate time.
  • Optional Master clocks can act as distributors between the Grandmaster and the slave clocks.
  • A slave clock is a device or clock that synchronizes with the master clock but does not provide timing to other clocks.

With the advent of powerful embedded systems, this blog discusses the GPS Grandmaster in detail, how to realize one with a low-cost ARM-based Embedded Linux system, and the associated technologies.

GPS Grandmaster

As mentioned, the Grandmaster clock is the primary time source in the network. The major features expected of a Grandmaster include:

  • Accuracy – This is the most important feature of a Grandmaster. It is determined by the system design, the timestamping accuracy, algorithms such as the BMCA (Best Master Clock Algorithm), and the processes that run in the system (e.g., filtering, servo).
  • Scalability – The overall number of physical interfaces a Grandmaster can have, as well as the number of clock instances it can handle.
  • Resiliency – The capacity to handle multiple timing inputs that act as alternate time sources.
  • Portability – In some applications the Grandmaster must be mobile.


Today, practically every part of the world is covered by Global Navigation Satellite Systems (GNSS) such as the USA’s NAVSTAR Global Positioning System (GPS), Europe’s Galileo, Russia’s Global’naya Navigatsionnaya Sputnikovaya Sistema (GLONASS), the Indian Regional Navigation Satellite System (IRNSS) and China’s BeiDou Navigation Satellite System. These satellites not only provide navigation data but also act as time-transfer systems. Even a low-cost GPS receiver can provide accurate time with a stability very close to one part in 10^14 over one day (about 1 ns/day).
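As a quick sanity check of that figure: a constant fractional frequency offset accumulates time error linearly, so 1 part in 10^14 over one day (86,400 s) gives roughly 0.86 ns, which is where the "about 1 ns/day" rule of thumb comes from. A minimal Python sketch of the arithmetic (the function name is illustrative):

```python
SECONDS_PER_DAY = 86_400


def time_error_ns(fractional_stability: float, interval_s: float) -> float:
    """Time error accumulated by a constant fractional frequency offset."""
    # error [s] = fractional offset [s/s] * elapsed time [s]; scale to ns
    return fractional_stability * interval_s * 1e9


print(f"{time_error_ns(1e-14, SECONDS_PER_DAY):.2f} ns/day")
```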

Grandmasters can be built around such GNSS/GPS receivers. A typical GPS Grandmaster architecture is shown in the diagram below.

GPS Grandmaster – Block Diagram

GPS-PPS Synchronization 

Typically, GPS receivers provide the time of day (ToD) as NMEA text over a serial interface such as RS-232 or USB serial. Because the arrival time of this text is not precise enough for synchronization, GPS receivers also provide a pulse per second (PPS) signal. The rising edge of this pulse is aligned with the GPS second to high accuracy and can be used to discipline local clocks, keeping them in sync with Coordinated Universal Time (UTC).
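The ToD half of this scheme can be illustrated with a minimal Python sketch that validates and decodes a ZDA sentence, one of the standard NMEA 0183 time-and-date sentences (`$--ZDA,hhmmss.ss,dd,mm,yyyy,zz,zz*CS`). Which sentences a receiver emits, and whether a sentence labels the preceding or the following PPS edge, differs between receivers, so treat both as assumptions to check against the receiver's datasheet. Reading from the serial port is left out.

```python
from datetime import datetime, timezone


def nmea_checksum(body: str) -> int:
    """NMEA 0183 checksum: XOR of every character between '$' and '*'."""
    csum = 0
    for ch in body:
        csum ^= ord(ch)
    return csum


def parse_zda(sentence: str) -> datetime:
    """Decode a $--ZDA sentence into a UTC datetime (whole seconds only)."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        raise ValueError("not an NMEA sentence")
    body, _, given = sentence[1:].partition("*")
    if nmea_checksum(body) != int(given[:2], 16):
        raise ValueError("checksum mismatch")
    fields = body.split(",")
    if len(fields) < 5 or not fields[0].endswith("ZDA"):
        raise ValueError("not a ZDA sentence")
    hhmmss, day, month, year = fields[1], fields[2], fields[3], fields[4]
    # Fractional seconds are dropped: the PPS edge, not the text,
    # marks the exact start of the second.
    return datetime(int(year), int(month), int(day),
                    int(hhmmss[0:2]), int(hhmmss[2:4]), int(hhmmss[4:6]),
                    tzinfo=timezone.utc)


if __name__ == "__main__":
    body = "GPZDA,201530.00,04,07,2002,00,00"
    sentence = f"${body}*{nmea_checksum(body):02X}"
    print(sentence, "->", parse_zda(sentence).isoformat())
```

The decoded second is only a label; the Grandmaster still relies on the PPS edge to know exactly when that second begins.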

With a capable timing system in the embedded Linux platform, the system clock can be maintained within a few tens of nanoseconds of UTC. Once the system clock is synchronized, its time has to be distributed to the slaves via a standard mechanism such as PTP or NTP.
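To make "disciplining" concrete: once each PPS edge is timestamped against the local clock (for example through a PHC's external-timestamp input), the local clock's offset and frequency error follow directly, and a servo turns them into frequency corrections. The following Python sketch assumes edge timestamps in nanoseconds sampled once per second; the PiServo class and its gains are illustrative, not tuned values from any particular implementation.

```python
NS_PER_S = 1_000_000_000


def pps_offset_ns(edge_ts_ns: int) -> int:
    """Local clock error at a PPS edge, folded into [-0.5 s, +0.5 s).

    The rising edge marks the start of a true second, so an ideal local
    timestamp of the edge would be a whole number of seconds.
    """
    rem = edge_ts_ns % NS_PER_S
    return rem - NS_PER_S if rem >= NS_PER_S // 2 else rem


def freq_error_ppb(prev_edge_ns: int, curr_edge_ns: int) -> int:
    """Frequency error from two consecutive edges one true second apart.

    Extra nanoseconds counted per second equal parts per billion.
    """
    return (curr_edge_ns - prev_edge_ns) - NS_PER_S


class PiServo:
    """Toy proportional-integral servo with illustrative gains."""

    def __init__(self, kp: float = 0.7, ki: float = 0.3):
        self.kp, self.ki = kp, ki
        self.integral = 0.0

    def sample(self, offset_ns: float) -> float:
        """Return the frequency correction (ppb) for one offset sample."""
        self.integral += self.ki * offset_ns
        # A clock that is ahead (positive offset) must be slowed down.
        return -(self.kp * offset_ns + self.integral)
```

For example, an edge timestamped at 5,000,000,120 ns means the local clock is 120 ns ahead; the servo output would then be applied as a frequency adjustment to the clock being disciplined.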

NTP Server 

One of the earliest and most widely used time synchronization protocols is NTP (Network Time Protocol). NTP's hierarchical architecture is divided into strata. At the very top are Stratum 0 devices, such as atomic clocks (like those in GNSS satellites) and GPS receivers. Stratum 1 servers, also known as primary time servers, are directly linked to a Stratum 0 clock, can achieve microsecond-level synchronization with it, and can connect to other Stratum 1 servers for quick sanity checks and backup. Stratum 2 servers can link to multiple primary time servers for tighter synchronization and increased accuracy. NTP supports up to 15 strata, although each additional stratum slightly reduces accuracy relative to Stratum 0.
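The stratum numbering can be sketched in a few lines of Python. This is a deliberate simplification that only shows the numbering rule: a server advertises one more than the stratum of the source it follows, and NTP uses stratum 16 to mean "unsynchronized". Real NTP chooses that source through its selection and clustering algorithms, not simply by lowest stratum; the function name is hypothetical.

```python
from typing import Iterable

MAX_STRATUM = 15     # highest valid NTP stratum
UNSYNCHRONIZED = 16  # stratum value NTP uses for "not synchronized"


def own_stratum(upstream_strata: Iterable[int]) -> int:
    """Stratum a server would advertise, given its usable sources' strata.

    Simplification: follows the lowest-stratum source.
    """
    usable = [s for s in upstream_strata if 0 <= s <= MAX_STRATUM]
    if not usable:
        return UNSYNCHRONIZED
    stratum = min(usable) + 1
    return stratum if stratum <= MAX_STRATUM else UNSYNCHRONIZED
```

A GPS-fed Grandmaster (source at stratum 0) therefore serves as stratum 1, and its clients become stratum 2.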

Because NTP is software-based, every timestamp depends on the local operating system, which adds latency and reduces accuracy. NTP's time resolution is precise enough for most enterprises to settle conflicts quickly, but applications requiring a much higher level of synchronization need the more precise PTP.

PTP Server 

PTP, or Precision Time Protocol (IEEE 1588), is a network-based time synchronization standard that aims at nanosecond or even picosecond-level synchronization, rather than the millisecond-level synchronization typical of NTP.

Unlike NTP’s software-based approach, PTP achieves its precision through hardware timestamping.

Every PTP sequence exchanges a total of four messages between the master and the slave:

  • Sync: sent by the master to the slave
  • Follow_Up: sent by the master to the slave (by two-step clocks), carrying the precise transmit time of the Sync message
  • Delay_Req: sent by the slave to the master to request a delay measurement
  • Delay_Resp: sent by the master to the slave, carrying the time the Delay_Req was received

This sequence produces four timestamps:

  • T1 is the time when the master sends the Sync message
  • T2 is the time when the slave receives the Sync message
  • T3 is the time when the slave sends the Delay_Req message
  • T4 is the time when the master receives the Delay_Req message

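By the end of the exchange the slave knows all four timestamps: T2 and T3 from its own clock, T1 from the Sync or Follow_Up message, and T4 from the Delay_Resp message. From these it calculates the network delay between master and slave and its own offset from the master, assuming the delay is the same in both directions.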
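The calculation follows from the two one-way measurements: T2 − T1 equals the delay plus the slave's offset, while T4 − T3 equals the delay minus that offset. Adding and subtracting them isolates each term. A minimal Python sketch, assuming nanosecond timestamps and a symmetric path (any asymmetry shows up directly as offset error):

```python
from dataclasses import dataclass


@dataclass
class PtpTimestamps:
    t1: int  # master sends Sync (master clock, ns)
    t2: int  # slave receives Sync (slave clock, ns)
    t3: int  # slave sends Delay_Req (slave clock, ns)
    t4: int  # master receives Delay_Req (master clock, ns)


def mean_path_delay(ts: PtpTimestamps) -> float:
    # Sum of the two one-way measurements: the offset cancels out.
    return ((ts.t2 - ts.t1) + (ts.t4 - ts.t3)) / 2


def offset_from_master(ts: PtpTimestamps) -> float:
    # Difference of the two one-way measurements: the delay cancels out,
    # but only if the path is symmetric.
    return ((ts.t2 - ts.t1) - (ts.t4 - ts.t3)) / 2
```

For instance, a slave running 50 ns ahead over a 100 ns link would record t1=0, t2=150, t3=1000 and t4=1050, giving an offset of 50 ns and a mean path delay of 100 ns; the slave then corrects its clock by the offset.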

IEEE 1588 enabled Ethernet PHY 

As mentioned earlier, PTP needs a dedicated hardware timestamping mechanism to reach its full precision. This is possible with single-chip Ethernet Physical Layer Transceivers (PHYs) that provide IEEE 1588-based timestamping. While these are very similar to conventional Ethernet PHYs, they include a high-precision timer that can timestamp transmitted and received packets with picosecond resolution. Examples of 1588-enabled PHYs include the Renesas UPD60611, Microchip KSZ8441, TI DP83869HM and Broadcom BCM81384.
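Whichever component performs the timestamping, Linux drivers typically expose its timer as a PTP hardware clock (PHC), visible as /dev/ptpN and under /sys/class/ptp. A small Python sketch to list the PHCs present on a board, assuming the standard sysfs `clock_name` attribute; the `sysfs_root` parameter exists only so the function can be pointed at another directory for testing.

```python
from pathlib import Path
from typing import Dict


def list_ptp_clocks(sysfs_root: str = "/sys/class/ptp") -> Dict[str, str]:
    """Map each PTP hardware clock (ptp0, ptp1, ...) to its clock_name."""
    clocks: Dict[str, str] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return clocks  # no PTP support, or not running on Linux
    for dev in sorted(root.glob("ptp*")):
        name_file = dev / "clock_name"
        if name_file.is_file():
            clocks[dev.name] = name_file.read_text().strip()
    return clocks


if __name__ == "__main__":
    for dev, name in list_ptp_clocks().items():
        print(f"/dev/{dev}: {name}")
```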

Embedded Linux based Grandmaster  

In earlier days, time synchronization called for very powerful dedicated systems. Nowadays, even low-cost systems have enough processing power to act as Grandmasters, and high precision is achievable with some hardware support such as IEEE 1588 timestamping. A plethora of open-source projects address these needs, making it quite easy to build Embedded Linux based Grandmaster systems.

Some of the utilities that can be used are listed below; ptp4l, phc2sys and ts2phc are part of the open-source linuxptp project.

Ptp4l 

ptp4l is a Linux implementation of PTP according to IEEE 1588. It implements both master and slave clocks. For a Grandmaster, the master functionality is used, and the port's clock (the PHC with hardware timestamping, or the system clock with software timestamping) serves as the reference clock. Typical output of ptp4l running as master on port eth0 is as follows:

ptp4l[1760.714]: port 1 (eth0): LISTENING to MASTER on ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES 

ptp4l[1760.715]: selected local clock 0e8a76.fffe.6b8917 as best master 

ptp4l[1760.715]: port 1 (eth0): assuming the grand master role 
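For an unattended Grandmaster it is useful to watch ptp4l's log for port state changes. A minimal Python sketch that extracts transitions from lines in the format shown above; the regular expression is written against these sample lines only, so other messages or linuxptp versions may need adjustments.

```python
import re
from typing import NamedTuple, Optional

# Matches state-change lines such as
# "ptp4l[1760.714]: port 1 (eth0): LISTENING to MASTER on ANNOUNCE_..."
STATE_RE = re.compile(
    r"ptp4l\[(?P<ts>[\d.]+)\]: port (?P<port>\d+)(?: \((?P<iface>[^)]+)\))?: "
    r"(?P<old>\w+) to (?P<new>\w+) on (?P<event>\w+)"
)


class PortTransition(NamedTuple):
    ts: float
    port: int
    iface: Optional[str]
    old: str
    new: str
    event: str


def parse_transition(line: str) -> Optional[PortTransition]:
    """Return the port state change in a log line, or None if there is none."""
    m = STATE_RE.search(line)
    if m is None:
        return None
    return PortTransition(float(m["ts"]), int(m["port"]), m["iface"],
                          m["old"], m["new"], m["event"])
```

A monitoring script could, for example, raise an alarm whenever a Grandmaster port leaves the MASTER state.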

Phc2sys 

Phc2sys is an application that synchronizes the system clock with a PTP hardware clock (PHC). 

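In hardware timestamping mode the PHC keeps PTP time (on the TAI timescale), while the system clock keeps UTC. phc2sys accounts for the offset between the two timescales and continuously adjusts one clock to follow the other, reporting the remaining time difference in nanoseconds.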
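To illustrate what is being measured, the following Python sketch estimates the offset between a PHC and CLOCK_REALTIME by bracketing a PHC read between two system clock reads and keeping the tightest bracket, similar in spirit to the bracketed samples phc2sys relies on. It assumes the PHC is /dev/ptp0, uses Linux's FD_TO_CLOCKID() convention to turn the open file descriptor into a clock id, and assumes the PHC keeps TAI, so the TAI−UTC difference (37 s since 1 January 2017) is subtracted. It only measures; it does not adjust either clock.

```python
import os
import time

NS_PER_S = 1_000_000_000
CLOCKFD = 3
TAI_UTC_OFFSET_S = 37  # TAI - UTC since 1 January 2017; changes with leap seconds


def fd_to_clockid(fd: int) -> int:
    """Linux FD_TO_CLOCKID(): dynamic POSIX clock id for an open /dev/ptpN."""
    return ((~fd) << 3) | CLOCKFD


def phc_minus_sys_ns(phc_clockid: int, samples: int = 5) -> int:
    """Estimate PHC - CLOCK_REALTIME in ns.

    Each sample brackets one PHC read between two system clock reads; the
    sample with the shortest bracket best locates the moment of the PHC read.
    """
    best = None
    for _ in range(samples):
        t1 = time.clock_gettime_ns(time.CLOCK_REALTIME)
        phc = time.clock_gettime_ns(phc_clockid)
        t2 = time.clock_gettime_ns(time.CLOCK_REALTIME)
        if best is None or t2 - t1 < best[0]:
            best = (t2 - t1, phc - (t1 + t2) // 2)
    return best[1]


if __name__ == "__main__":
    try:
        fd = os.open("/dev/ptp0", os.O_RDONLY)  # assumed PHC device node
        try:
            diff = phc_minus_sys_ns(fd_to_clockid(fd))
        finally:
            os.close(fd)
        # Assumes the PHC keeps TAI; subtract TAI - UTC to compare with UTC.
        print("PHC - CLOCK_REALTIME:", diff - TAI_UTC_OFFSET_S * NS_PER_S, "ns")
    except OSError as exc:
        print("could not read /dev/ptp0:", exc)
```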

Ts2PHC 

ts2phc synchronizes PTP Hardware Clocks (PHCs) to external time stamp signals, such as the PPS output of a GPS receiver. A single source may be used to distribute time to one or more PHC devices.

In addition to the above tools, the testptp utility can be used to perform various operations such as generating an external PPS signal and setting or getting the PTP clock time and date.

With the GPS ToD and PPS inputs, the system real-time clock (CLOCK_REALTIME) can be synchronized to globally accepted, accurate time, and the PTP server can then serve this time to the slaves.

Running phc2sys produces output such as the following:

CLOCK_REALTIME phc offset 1635162324159518479 s0 freq +0 delay 160546

CLOCK_REALTIME phc offset 1635162324159518096 s1 freq -375 delay 160606

CLOCK_REALTIME phc offset 0 s2 freq -375 delay 160606

CLOCK_REALTIME phc offset 8 s2 freq -367 delay 160545

CLOCK_REALTIME phc offset 12 s2 freq -368 delay 160540

CLOCK_REALTIME phc offset -25 s2 freq -365 delay 160541

As can be seen, the offset field shows the time offset between the PHC and the system clock. The system clock can be considered synchronized when the offset stays consistently below 30 ns.

Once locked, the offset is kept within ±50 ns. The clock servo states are indicated by the s0, s1 and s2 strings:

  • s0 – unlocked
  • s1 – clock step
  • s2 – locked

Once the servo is locked (s2), the clock is no longer stepped but adjusted slowly. The freq value is the frequency adjustment of the clock in parts per billion (ppb).
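The lock criterion can be checked automatically. A Python sketch that parses phc2sys lines in the format shown above and treats the clock as synchronized when the most recent samples are all in state s2 with |offset| below 30 ns; the four-sample window and the threshold are assumptions to tune per system.

```python
import re
from typing import Iterable, List, Sequence, Tuple

LINE_RE = re.compile(
    r"CLOCK_REALTIME phc offset\s+(-?\d+) s(\d) freq\s+([+-]?\d+) delay\s+(\d+)"
)

Sample = Tuple[int, int, int, int]  # (offset_ns, servo_state, freq_ppb, delay_ns)


def parse_phc2sys(lines: Iterable[str]) -> List[Sample]:
    """Extract one sample per phc2sys status line; other lines are skipped."""
    samples: List[Sample] = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            offset, state, freq, delay = (int(g) for g in m.groups())
            samples.append((offset, state, freq, delay))
    return samples


def is_locked(samples: Sequence[Sample], limit_ns: int = 30, window: int = 4) -> bool:
    """True if the last `window` samples are all s2 and within +/-limit_ns."""
    recent = samples[-window:]
    return len(recent) == window and all(
        state == 2 and abs(offset) < limit_ns for offset, state, _, _ in recent
    )
```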

Similarly, an NTP server running on the Grandmaster can act as a Stratum 1 server and synchronize NTP clients to its time.

Other management protocols such as SNMP can be used to monitor the various clocks in the system and their characteristics, such as accuracy, precision, resolution and current synchronization state.

About Embien

Embien Technologies is a high-tech services provider in the embedded systems segment, catering to such niche requirements. We have helped customers achieve sub-30 nanosecond compliance for PTP and sub-100 microsecond accuracy for NTP with Linux based embedded systems. Our other credentials include GPS anti-jamming system development and MIR-DIAL (mid-infrared differential absorption LIDAR) systems.

With an understanding of the Android boot process and the profiling covered in Android Boot Time Analysis, we can now start the actual process of Android boot time optimization.

There is no single boot time improvement that applies to all platforms. The boot process depends on every component involved, from DRAM speed, storage device performance and processor speed to the customer requirements and the application area. As Tim Bird describes with the “toothpaste effect”, shortening a process in one place often ends up lengthening something else elsewhere.

Another important point to remember is that the configuration used during optimization must be the same as the final deployment. For example, optimizing boot time on an SD card file system (for ease of programming) and then deploying the result on a NAND file system will lead nowhere, as the data rates of NAND and SD cards differ.

The following sections describe various techniques module by module for Android boot time optimization.

Bootloader

The bootloader performs a critical initialization that affects the whole process: DRAM initialization. Since DRAM hosts the text and data sections, improper initialization will result in significant performance degradation.

The bootloader also copies U-Boot or the Linux kernel image from non-volatile storage such as NAND or an SD card, so the storage interface should be configured to run at the maximum possible frequency. DMA transfers can be used to improve throughput.

The processor should also run at the highest possible frequency, with the I-cache and D-cache enabled.

Unnecessary initializations, such as initializing the display, can be removed from the code. In some applications, however, images must be displayed immediately on boot, for example a company logo or the battery status. In that case, the display configuration cannot be removed.

The number of bootloader stages should also be reduced where the processor implementation allows it.

Kernel

Using an uncompressed kernel image (“Image”) rather than a compressed “zImage” avoids the time spent decompressing it at run time. However, it increases the size of the kernel image to be copied, so whether it helps depends heavily on the target's processor and data transfer speeds.

All unnecessary driver modules should be removed. A minimal set of device drivers can be built in initially, with other necessary drivers configured as modules and loaded later. This requires a detailed study of the device initialization time against the time required to copy the larger kernel image (for statically linked drivers) or the time to load a module from the file system.

File systems

Generally, non-removable media is used for the root file system. Alternatively, a ramdisk image containing selected functionality can be used as the root file system, reducing the time to start the init scripts. This can be achieved by packaging it as a U-Boot image (uImage). Again, this increases the time required to copy the image at the bootloader stage.

An optimized file system format can also be used for faster root file system access: UBIFS is preferred for NAND and ext4 for SD cards.

Android

The following sections describe techniques for reducing boot time inside the Android subsystem.

Init.rc

For faster processing, avoid starting unnecessary services and daemons. Also, since Jelly Bean there have been many changes in file permissions, so the logcat output should be used to find permission discrepancies, which can then be corrected.

Android preload classes and resources

In ZygoteInit, Android preloads all the classes and resources that it will need for further execution. Boot time can be reduced significantly by reducing the number of classes and resources to be preloaded, or even by avoiding preloading altogether.

This is the easiest and most visible step in the optimization process. Preloading can be disabled by setting the variable “PRELOAD_RESOURCES” to false (to disable preloading of resources only) or by commenting out the call to “preload()” in “frameworks/base/core/java/com/android/internal/os/ZygoteInit.java” (to disable preloading of both resources and classes).

However, disabling preloading directly increases the launch time of individual applications after boot. As a compromise, the list of classes to be preloaded can be trimmed manually in the frameworks/base/preloaded-classes file.
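Trimming the list can be scripted so it is repeatable across builds. A minimal Python sketch, assuming the file holds one fully qualified class name per line with '#' comment lines; the helper name and the "android.bluetooth." prefix in the example are hypothetical, chosen for a device without Bluetooth.

```python
from typing import Iterable, List


def trim_preloaded_classes(lines: Iterable[str],
                           drop_prefixes: Iterable[str]) -> List[str]:
    """Return preloaded-classes lines without classes under drop_prefixes.

    Comment lines (starting with '#') and blank lines are kept unchanged.
    """
    prefixes = tuple(drop_prefixes)
    kept: List[str] = []
    for line in lines:
        name = line.strip()
        if name and not name.startswith("#") and name.startswith(prefixes):
            continue  # class will no longer be preloaded by Zygote
        kept.append(line)
    return kept
```

In practice the file would be read, passed through the function and written back before rebuilding, then boot time and application launch times re-measured to confirm the trade-off.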

Android System service

By default, the SystemServer starts all Android system services, but some of them may not be applicable to the current system. For example, a device without a Bluetooth module has no need for the Bluetooth manager service. Analyzing which services are needed and disabling the unnecessary ones will reduce the start time.

Scanning Application Packages

On startup, the package manager scans all the available packages in the app directories, so it can be sped up by removing unnecessary applications from the system. Package scanning can also be deferred through custom changes to the package manager implementation combined with a customized home application.

Precreate directories

In some systems, certain directories are created in the on-init stage. Where possible, they can instead be created once, offline, when the file system is built, significantly reducing boot time.

Bootanimation

For a faster boot, avoid the boot animation or use a shorter one. The bootanimation.zip file can also be placed on the fastest available file system for optimum performance.

Other Techniques

Some other techniques that can be used to speed up boot are:

Storage Device Speed/Processor Execution Speed Tune up

Using bootchart, the resource requirements of major services can be analyzed and their start-up spread over time to avoid overloading the processor and the disk.

Disabling All Debug Options

All debug facilities, such as kernel-level debug output and debug services like logcat, strace and bootchart, can be removed in the final release. The log level in Android's init.rc can also be reduced.

Proper shutdown

If the system is shut down improperly (in a device without battery backup), the file systems will not be unmounted properly, and the next boot takes extra time to recover from file system errors. It is therefore necessary to make sure that the system is always shut down properly.

Conclusion

With the techniques mentioned above, it is possible to reduce the boot time of an Android system. It bears repeating that the effectiveness of each technique is highly dependent on the system. But with some time and a lot of patience, it is possible to make a significant difference and make Android boot faster.

Various technological advances are being made in this area. One of them is captured in this Linaro page. For any queries or requirements on Android porting and/or Android boot time optimization, feel free to contact us.

Manikandan J
03 October 2013 · Categories: Android, Linux, Technology

The previous post, “Android Boot Process”, explained the steps involved in booting Android. Before optimizing the Android boot process, each step has to be profiled and analyzed to identify the critical delays. Various tools and techniques are available to understand resource usage (both time and processing); they are discussed in the sections below.

As the “observer effect” notes, measurements of certain systems cannot be made without affecting them, and using some of these tools will affect the actual performance. For example, the “bootchart” tool increases the system load marginally because it writes the sampled data to the file system. But since we are not dealing with quantum mechanics, we can safely use these tools while improving the process and then disable them in the release. Now, on to the tools.

Kernel Prints with Time stamps

The basic and primary technique is to enable the debug capabilities of the kernel. Once enabled, the kernel outputs debug prints along with timestamp information. These capabilities can be enabled through “make menuconfig” using the following options:

Kernel hacking

  • –> Show timing information on printks
  • –> Kernel debugging
  • –> stacktrace
  • –> Enable dynamic printk() support
  • –> Kernel low-level debugging functions
  • –> Early printk

At run time, the prints are output on the console and can also be read back later with dmesg.
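With timestamps enabled, the saved kernel log can be scanned for the longest pauses between consecutive messages, which usually point to slow driver initializations. A small Python sketch, assuming the usual `[seconds.microseconds] message` prefix; each gap is reported together with the message printed after it.

```python
import re
from typing import Iterable, List, Tuple

# Kernel log lines with printk timestamps look like "[    1.234567] message".
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gaps(lines: Iterable[str], top: int = 5) -> List[Tuple[float, str]]:
    """Return the `top` largest (gap_seconds, message_after_gap) pairs."""
    gaps: List[Tuple[float, str]] = []
    prev = None
    for line in lines:
        m = TS_RE.match(line.strip())
        if not m:
            continue  # line without a printk timestamp
        ts = float(m.group(1))
        if prev is not None:
            gaps.append((ts - prev, m.group(2)))
        prev = ts
    gaps.sort(key=lambda g: g[0], reverse=True)
    return gaps[:top]
```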

Logcat with higher log level

Android provides a logging service, started by the “init” process, to log various debug information. The service can be enabled in the init.rc file as follows:

service logcat /system/bin/logcat -r 1000 -n 10 -v time -f /data/local/logcat.log

It can print log messages at the process and even thread level, along with timestamps. logcat can also be started from the console command line, without running it as a service, with the output printed directly to the console.

The verbosity of the log can be increased by configuring a higher log level, with 8 as the upper limit. Changing the default log level of 3 to 8 in the init.rc file helps reveal some of the internal processes and their execution times.

A detailed description of this tool is available on the Android Developer page.

strace

Another valuable tool for documenting the performance of each process is “strace”, which traces the system calls executed by a process. To use it from the /init.rc file, rather than starting a process directly, “strace” is started with the process and its arguments as its own arguments.

For example, to trace the zygote process, instead of the usual command,

service zygote /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server

the following can be used:

service zygote /system/xbin/strace -tt -o/data/boot.strace /system/bin/app_process -Xzygote /system/bin --zygote --start-system-server

strace generally outputs a lot of information and is mainly useful for advanced debugging.
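To get a first overview of such a trace, a short Python sketch can count the system calls by name and report the time span the trace covers. It assumes the `-tt` line format (`HH:MM:SS.micros name(...)`, no PID prefix since `-f` is not used above), skips signal, exit and resumed-call lines, and ignores traces that cross midnight.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# "strace -tt" prefixes each call with wall-clock time: "HH:MM:SS.micros name(".
CALL_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}\.\d+) (\w+)\(")


def summarize(lines: Iterable[str]) -> Tuple[Counter, float]:
    """Count system calls by name and return the span covered in seconds."""
    counts: Counter = Counter()
    first = last = None
    for line in lines:
        m = CALL_RE.match(line)
        if not m:
            continue  # signals, exit notices, "<... resumed>" fragments
        hh, mm, ss, name = m.groups()
        t = int(hh) * 3600 + int(mm) * 60 + float(ss)
        if first is None:
            first = t
        last = t
        counts[name] += 1
    span = (last - first) if first is not None else 0.0
    return counts, span
```

For example, `summarize(open("boot.strace"))` on a copy pulled from /data, followed by `counts.most_common(10)`, shows which calls dominate zygote's start-up.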

BootChart

Bootchart is a tool that generates a graphical representation of the Android boot sequence. Its greatest advantage is that processor usage and disk usage are logged together, which gives a fairly good understanding of the causes of execution delays. Bootchart support is built into Android and is enabled at compile time by exporting an environment variable as follows:

export INIT_BOOTCHART=true

This is followed by the usual Android build process.

To start the bootchart process, create a file in the /data directory containing the number of seconds to log the boot information, typically with:

echo 60 > /data/bootchart-start

On the next Android boot, bootchart logs all the necessary information in the /data/data/bootchart/ directory.

The contents can then be transferred to the host PC, where a Python-based tool called “bootcharting” creates the graphical representation. The bootchart image can be generated with the script ‘system/core/init/grab-bootchart.sh’.

The following is a typical output of a boot chart.

Bootchart - Capturing Android Boot process

As can be seen, the start time of each process, along with its disk usage and execution pattern, can be inferred.

Other tools

There are other tools available for profiling Android processes. Some of them are:

  • OProfile – Statistical profiling of all code running on the system
  • Perf – Performance counters for Linux
  • Dalvik Method Tracer – Graphical profiling of application performance
  • Stopwatch – Manual measurement of boot-up time

Since Android boot time optimization needs many iterations, it is better to build an ecosystem of scripts and other tooling that makes it easy to capture and process this profiling information.
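As one small piece of such an ecosystem, the stopwatch measurement can be automated. The Python sketch below polls a completion check and returns the elapsed time; the example check queries the `sys.boot_completed` property over adb, which Android sets to "1" once boot completes. Timing starts when `time_until()` is called, so it should be started at power-on; adb only answers once adbd is running, and failed queries simply count as "not yet". The function names are hypothetical.

```python
import subprocess
import time
from typing import Callable


def adb_boot_completed() -> bool:
    """Ask the device over adb whether sys.boot_completed is "1"."""
    out = subprocess.run(
        ["adb", "shell", "getprop", "sys.boot_completed"],
        capture_output=True, text=True, timeout=10,
    )
    return out.stdout.strip() == "1"


def time_until(done: Callable[[], bool], timeout_s: float = 300.0,
               poll_s: float = 0.5) -> float:
    """Poll done() until it returns True; return the elapsed seconds."""
    start = time.monotonic()
    while not done():
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("boot did not complete in time")
        time.sleep(poll_s)
    return time.monotonic() - start
```

Usage: power on the board, immediately run `print(time_until(adb_boot_completed))`, and log the result for each iteration so the effect of every change can be compared.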

Now armed with detailed profiling information, we will look into various optimization techniques in our next post.