Live Review: With Accuracy Raised to 5 Seconds, How Hardcore Is SSAR's Original load5s Metric?

Editor's note: This article is compiled from the Dragon Lizard community SIG Technology Week. The author, Wenmaoquan, is an SRE operations expert in the Alibaba Cloud Computing Platform Division and a core member of the Dragon Lizard community's Tracing and Diagnosis SIG. The article introduces the basic functions and usage of SSAR and gives a first look at using SSAR to diagnose single-machine OS problems. A replay of the live session is available on the Dragon Lizard community official website under Home – Support – Video.

I. Functional Positioning of the System Performance Analysis Tool SSAR

Any discussion of performance analysis has to mention Brendan Gregg's book Systems Performance, a milestone classic in the industry. In Chapter 4, Brendan explains that observability mainly comprises counters, tracing, sampling, and monitoring.

Along two dimensions, how the data is acquired and whether the data is real-time, performance observation tools can be divided into four zones:

1) Zone A, in the upper-left corner, holds tools that read real-time counter data directly. Commands such as top and ps are very familiar; they read the /proc directory, whose data is accounted for us by the kernel.

2) Zone B, in the lower-left corner, is also counter-based, but its tools record historical information. The most commonly used is sar; we call this class system performance monitoring tools. Alibaba also has the tsar tool, and abroad there is the open source software atop. System performance monitoring tools can look back at historical data and can also provide data in real-time mode.

3) Zone C, in the upper-right corner, holds tracing and sampling tools. Tracing uses mechanisms such as tracepoint and kprobe, while perf is mainly a sampling tool; we refer to both kinds collectively as tracing and sampling tools. At present these tools work only in real time. Unless they are combined with other tools, they cannot provide historical data.

4) Zone D, in the lower-right corner, is a goal that we believe can be reached by combining tools. Zone C tools are responsible only for obtaining key kernel data, zone B tools are responsible only for collecting and storing data, and the two interact through a standard data interface.

In our performance analysis engineering practice we focus on building out two quadrants: zone B (lower left) and zone C (upper right). Today's discussion is mainly about building system performance monitoring tools in zone B. Tool building in zone C will be introduced on a later occasion.

In understanding how system performance monitoring tools and tracing and sampling tools relate, we hold the following three views:

1) Kernel counters are cheap to read and add no extra overhead. By contrast, tracing and sampling tools always carry some runtime overhead, and techniques such as kprobe may also introduce stability risks. We therefore prefer to extract as much value as possible from counter information. For example, to learn about direct memory reclaim, the kernel counters already provide metrics for both direct and asynchronous memory reclaim at low cost, so there is no need to monitor direct reclaim with ftrace. On the other hand, fine-grained kernel data that counters cannot cover must still come from tracing and sampling tools. For example, when IOPS is high and we want to know which file each read or write targets, the kernel counters hold no such information. Only when tracing and sampling tools stay focused on this core task can they become stable, reliable, high-quality tools.

2) Besides single-machine monitoring tools, existing system performance monitoring also includes many white-screen (graphical) monitoring platforms. These platforms generally collect data into a central database and aggregate it there. If such a platform collected more detailed counter information, storage cost would inevitably become a problem, so it is not advisable to send too much data to it. For routine metrics such as CPU, memory, and network usage, however, a white-screen platform greatly improves observability. High-frequency routine metrics therefore belong on the white-screen platform, while a richer single-machine system performance monitoring tool is still needed alongside it: at critical moments, problem analysis and localization must rely on this less frequently used data.

3) Traditional black-screen (command-line) system performance monitoring tools have stayed largely fixed for many years; the metrics available in 2010, 2015, and 2020 differ little. For example, newer kernels expose more than 400 network counters, 116 of them TCP extension counters alone, yet conventional system performance monitoring tools use no more than 15 TCP metrics, leaving the value of most counters untapped. These kernel network counters have real practical value when network-related problems occur.
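As a minimal sketch of the first point, the Python snippet below pulls reclaim-related counters out of text in the format of /proc/vmstat. The counter names in the sample (pgscan_direct, allocstall, and so on) are assumptions about one kernel's naming and differ between kernel versions; the function name and the sample values are illustrative only.

```python
def reclaim_counters(vmstat_text):
    """Return reclaim-related counters from /proc/vmstat-style text."""
    # Assumed prefixes; real kernels may use per-zone or renamed counters.
    prefixes = ("pgscan_", "pgsteal_", "allocstall")
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith(prefixes):
            counters[parts[0]] = int(parts[1])
    return counters


# Illustrative sample, not real measurements.
SAMPLE = """nr_free_pages 123456
pgscan_kswapd 5000
pgscan_direct 120
pgsteal_kswapd 4800
pgsteal_direct 100
allocstall 7
"""

if __name__ == "__main__":
    for name, value in reclaim_counters(SAMPLE).items():
        print(name, value)
```

On a live Linux machine the same function could be applied to the contents of /proc/vmstat; sampling twice and subtracting would give the reclaim activity over the interval, with no tracing involved.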

Based on these views, we believe it is necessary to develop a system performance monitoring tool with more metrics, a shorter iteration cycle, and more stable performance, in order to cope with increasingly complex system performance problems.

The SSAR tool introduced today is exactly such a system performance monitoring tool, and it has already been open-sourced in a well-known operating system community. It covers all the functionality of the traditional system performance monitoring tool sar, extends the machine-level metrics further, and adds process-level metrics and distinctive load metrics. Diagnosing high-load problems is a capability unique to this tool. The open source software atop is a system performance monitoring tool with similar functionality. From public channels we have learned that peer companies also deploy atop at large scale, which shows that other Internet companies have likewise recognized the importance of monitoring tools with this functional positioning.

II. Overview of the System Performance Analysis Tool SSAR

The system performance monitoring tool SSAR is open source. The repository contains a fairly detailed Chinese help manual, and its package directory provides RPM and DEB packages; on other operating systems you can compile the tool yourself.

The SSAR tool consists of a collector, an inner general-purpose querier, an outer enhanced querier, and a classic querier.

1) The collector sresar is a resident process implemented in C that records data to local disk. The collected data includes: (a) machine-level data collected file by file, such as meminfo, stat, and vmstat; (b) process-level data covering 24 metrics; (c) the unique load5s metric and detailed data on threads in the R or D state.

2) The general-purpose querier ssar parses the collected file data according to rule logic.

3) The classic querier tsar2 is implemented in Python, wraps the ssar command, and is compatible with the tsar command.

4) The enhanced querier ssar+ is implemented in Python, wraps the ssar command, and is planned as the main core of the SSAR tool in the future. It is the place for more complex data logic that suits the Python layer.

III. SSAR Supports Rapid Development Iteration

Traditional sar tools collect only a fixed set of metrics and parse them in C at collection time. In this model, adding a new metric is not especially difficult in itself, but the development iteration cycle for a new metric is particularly long.

SSAR completely overturns the architecture of traditional sar tools. It makes many changes in product design and programming, which let us add new counter metrics quickly and at low development cost.

1) Suppose we need to watch a new metric that is not yet collected. With the traditional system performance monitoring tool sar, the cycle of modifying and releasing is very long, and a gradual grayscale rollout may add several more months. When we are learning kernel internals or solving a production problem, a new issue cannot wait that long. SSAR, however, collects data file by file, so no code change is needed: modify the configuration file and restart the sresar collection process, and a new data source file is collected. To collect a new file, add one configuration line to the [file] section of sys.conf, where src_path is the data source location, cfile is the name under which the data file is stored, and turn switches collection on.

2) SSAR abstracts the common processing logic into the general-purpose querier, the ssar command. Once file collection is configured, a single ssar command is enough to query and display the data, which makes a development iteration cycle of minutes a reality. In such a query, cfile specifies the stored file name, line specifies line 3, column specifies columns 5 to 15, metric specifies that the original value is output, and alias specifies the metric name.

3) SSAR places more complex data logic in the outer queriers written in Python, where it is easy to adapt to changes in data format. The format of some kernel metrics changes with the kernel version; for example, TCP's TimeWaitOverflow metric sits in a different column in kernel versions 3.10, 4.9, and 4.19. In that case the Python layer can easily work out the column number and pass it to the general-purpose ssar querier. Even unknown formats in future kernel versions can be handled easily in the Python querier. You can debug and upgrade the Python querier at any time, for example by copying it with cp tsar2 /tmp/. Whether you are debugging on a single machine or rolling out in batches by script, this is a lightweight operation.
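The column-resolution idea in point 3 can be sketched as follows. This is a minimal illustration, not tsar2's actual code: it assumes a /proc/net/netstat-style layout in which a header line of field names is followed by a line of values, and the field name TCPTimeWaitOverflow and both sample layouts are assumptions made for the example.

```python
def lookup(netstat_text, prefix, field):
    """Return (1-based column, value) of `field` in a header/value line pair."""
    lines = [line.split() for line in netstat_text.splitlines()]
    # Each header line is immediately followed by its line of values.
    for header, values in zip(lines, lines[1:]):
        if header and header[0] == prefix and field in header:
            col = header.index(field)
            return col + 1, int(values[col])
    raise KeyError(field)


# Two hypothetical layouts standing in for different kernel versions.
LAYOUT_A = "TcpExt: SyncookiesSent TW TCPTimeWaitOverflow\nTcpExt: 0 10 3\n"
LAYOUT_B = "TcpExt: SyncookiesSent TW TWRecycled TCPTimeWaitOverflow\nTcpExt: 0 10 0 3\n"

if __name__ == "__main__":
    for layout in (LAYOUT_A, LAYOUT_B):
        print(lookup(layout, "TcpExt:", "TCPTimeWaitOverflow"))
```

Because the column is found by name rather than hard-coded, the same wrapper keeps working when a kernel release inserts or removes neighbouring fields; the resolved column number is what would then be handed to the general-purpose querier.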

IV. Using SSAR Machine-Level Metrics

The ssar command displays machine-level metrics. Metrics whose names end in /s are incremental metrics: they show the per-second increase between the previous sample and the current one. Metrics without the /s suffix are gauge metrics and show only the instantaneous value at the current time.

Use the help option to display usage information for machine-level metrics. The option parameters have the following characteristics:

1) The -f option specifies the end time of the displayed data, the -b option specifies the start time, and the -r option specifies the length of the time span. Only two of the three need to be specified. -f defaults to the current time and -r to 300 minutes.

2) Most option values accept decimals, and the supported units are days, hours, minutes, and seconds, for example 1.2d, 5.5h, 60m, and 1s. Without a unit suffix the default unit is m (minutes), so 60 means 60 minutes.

3) The -H option hides the header so that the output is easier to process with shell commands, and the --api option outputs JSON so that the result is easier to consume from higher-level languages such as Python.

4) The lowercase -o option specifies the fields of the output columns, with multiple fields separated by commas. Frequently used groups of fields are available as field-combination options such as --cpu and --mem. The uppercase -O option appends additional fields to the output of a field-combination option. The -O option can therefore be used together with --cpu, --mem, and so on, whereas -o is mutually exclusive with -O and with the field-combination options.

SSAR collects machine-level metrics file by file. Collection is configured and switched on or off in the TOML configuration file /etc/ssar/sys.conf:

1) src_path='/proc/stat' means the data source file is /proc/stat;

2) cfile='stat' means the saved data file has the suffix stat;

3) the turn option controls whether this collection is enabled; set it to true to enable collection;

4) the gzip option controls whether the collected file is stored compressed; set it to true to enable compression.
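The time-span rule from the option list above (decimal values, the suffixes d, h, m, and s, and minutes as the default unit) can be sketched in a few lines of Python. This only illustrates the rule as described here and is not SSAR's own parser; the function name is hypothetical, and accepting upper-case suffixes is an assumption of the sketch.

```python
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def span_to_seconds(value):
    """Convert a span such as '1.2d', '5.5h', '60m', '1s' or '60' to seconds."""
    text = value.strip().lower()
    unit = "m"  # a missing suffix means minutes
    if text and text[-1] in UNIT_SECONDS:
        unit, text = text[-1], text[:-1]
    return float(text) * UNIT_SECONDS[unit]


if __name__ == "__main__":
    for span in ("1.2d", "5.5h", "60m", "1s", "60"):
        print(span, span_to_seconds(span))
```

Under this rule the default -r value of 300 corresponds to 18000 seconds, that is, five hours of history.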

SSAR supports two ways of extracting metric data: predefined metrics and custom metrics.

Predefined metrics suit scenarios where the data is consumed directly with the ssar command. Predefined metrics are configured flexibly in the sys.conf file. The following two examples illustrate how a predefined metric is configured:

1) The configuration item user defines a metric named user: the data file suffix is stat, the value is the difference of column 2 on the line that begins with cpu, and the output field width is 10 bytes.

2) The configuration item insegs defines a metric named insegs: the data file suffix is snmp, the value is the difference of column 11 on line 8, and the output field width is 10 bytes.

After configuring predefined metrics, you can go on to configure a view that groups several predefined metrics together. This is where the --cpu option of the ssar --cpu command comes from.

SSAR also supports extracting data through custom metrics, which suits use from Python wrappers. Some examples of custom metrics follow:

1) Take the MemFree value from meminfo and name the field free:

ssar -o 'metric=c|cfile=meminfo|line_begin=MemFree:|column=2|alias=free'

2) Take the difference of column 13 on line 8 of snmp and name the field retranssegs:

ssar -o 'metric=d:cfile=snmp:line=8:column=13:alias=retranssegs'

3) Display the idle values of cpu0 through cpu15 in real-time mode:

ssar -o 'metric=d|cfile=stat|line=2-17|column=5|alias=idle_{line};' -f +100

How to write a custom metric:

1) cfile specifies the suffix of the data file and must match the cfile value in the [file] section of the sys.conf configuration file;

2) line directly specifies the number of the line on which the metric appears; line and line_begin cannot be specified together;

3) line_begin specifies the line by matching a key string at the start of the line; the match must be unique within the whole file;

4) column specifies the column within that line, with columns separated by whitespace;

5) metric specifies whether the value obtained by the rules above is output as-is or as the difference between adjacent samples: c outputs the original value and d outputs the difference;

6) alias specifies the title of the metric in the output, or its key name in JSON format.
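To make these rules concrete, here is a minimal Python sketch of the selection logic: pick one line either by number or by a unique prefix, take a whitespace-separated column, and return either the current value (c) or the difference from the previous snapshot (d). It is an illustration of the rules as described above, not SSAR's implementation; the function names and sample snapshots are hypothetical, and the sketch returns the raw difference without dividing by the sampling interval.

```python
def pick(snapshot, column, line=None, line_begin=None):
    """Select one field (1-based line and column) from a text snapshot."""
    if (line is None) == (line_begin is None):
        raise ValueError("specify exactly one of line and line_begin")
    rows = snapshot.splitlines()
    if line is not None:
        row = rows[line - 1]
    else:
        matches = [r for r in rows if r.startswith(line_begin)]
        if len(matches) != 1:
            raise ValueError("line_begin must match exactly one line")
        row = matches[0]
    return float(row.split()[column - 1])


def metric(kind, previous, current, **rule):
    """kind 'c' returns the current value, 'd' the change since `previous`."""
    now = pick(current, **rule)
    if kind == "c":
        return now
    if kind == "d":
        return now - pick(previous, **rule)
    raise ValueError("metric must be 'c' or 'd'")


# Illustrative snapshots, not real measurements.
MEMINFO = "MemTotal: 1000 kB\nMemFree: 400 kB\n"
SNMP_OLD = "Tcp: 1 2 100\n"
SNMP_NEW = "Tcp: 1 2 130\n"

if __name__ == "__main__":
    print(metric("c", None, MEMINFO, line_begin="MemFree:", column=2))
    print(metric("d", SNMP_OLD, SNMP_NEW, line=1, column=4))
```

Keeping the rule to four small parameters is what lets a new metric be added by configuration or by a one-line query instead of new collector code.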

tsar is Alibaba Group's classic system performance monitoring tool. The tsar2 command, a Python wrapper built on the ssar command, is almost fully compatible with the tsar command and covers the metric sets of its modules. tsar2 is not only powerful but was also cheap to develop:

1) tsar2 took less than 2 weeks to develop;

2) the Python code that makes tsar2 compatible with tsar's functionality is only 600 lines;

3) tsar2 also adds 4 groups of network diagnostic metrics on top of the original basic functions. Leaving aside the preliminary research time, implementing the code for these 4 groups took only 2 hours. The 4 groups are tsar2 --tcpofo, tsar2 --retran, tsar2 --tcpdrop, and tsar2 --tcperr.

Building on SSAR's good extensibility and low barrier to extension, and targeting the common problem of unevenly distributed soft interrupts, tsar2 implements 3 powerful diagnostic functions in Python:

1) First, the cputop subcommand of tsar2 ranks the cores by per-core CPU usage, here soft interrupt usage (sirq). N1 denotes the core with the highest usage, N2 the core ranked second, and so on. For example, the three N1 values at 11:35 show that soft interrupts on core 54 used 21.84% of that core's CPU. In the data shown in the figure, core 54 frequently has high sirq usage.

2) If we are not yet sure that core 54 is really the one at fault, we can also use the CPU view of the tsar2 command to display how sirq changes over time on core 54 specifically.

3) Once the core with abnormal soft interrupt CPU usage has been identified, the irqtop subcommand of tsar2 can find the specific interrupt responsible. Specifying CPU 54 in the irqtop subcommand shows that the top-ranked interrupt has IRQ number 155 and IRQ name virtio3-input.1, and that its interrupt count at 11:50:48 is as high as 1.9K.

By default the irqtop subcommand supports only real-time mode. To use history mode, remove the -l option; you also need to modify the configuration file sys.conf to enable collection of the interrupts data file.

Subcommands as powerful as cputop and irqtop are also written in Python and took only about 400 lines of code. This again shows how flexibly new system performance monitoring functions can be developed on the SSAR architecture.
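The ranking idea behind cputop can be sketched from two /proc/stat-style snapshots. This is a simplified illustration, not tsar2's code: it assumes the usual per-core field order (user, nice, system, idle, iowait, irq, softirq, steal), and the function names and sample numbers are made up for the example.

```python
def softirq_share(before, after):
    """Return {core: softirq percent} between two /proc/stat-style snapshots."""
    def per_core(text):
        table = {}
        for line in text.splitlines():
            cols = line.split()
            # Keep per-core rows such as cpu0; skip the aggregate "cpu" row.
            if cols and cols[0].startswith("cpu") and cols[0] != "cpu":
                table[cols[0]] = [int(v) for v in cols[1:]]
        return table

    old, new = per_core(before), per_core(after)
    shares = {}
    for core, values in new.items():
        deltas = [n - o for n, o in zip(values, old[core])]
        total = sum(deltas[:8])  # user .. steal
        shares[core] = 100.0 * deltas[6] / total if total else 0.0
    return shares


def top_cores(shares, n=3):
    """Cores ordered by softirq share, highest first (N1, N2, ...)."""
    return sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Illustrative snapshots, not real measurements.
BEFORE = """cpu  200 0 100 1600 0 0 100 0
cpu0 100 0 50 800 0 0 50 0
cpu1 100 0 50 800 0 0 50 0
"""
AFTER = """cpu  230 0 115 1688 0 0 117 0
cpu0 110 0 55 870 0 0 65 0
cpu1 120 0 60 818 0 0 52 0
"""

if __name__ == "__main__":
    print(top_cores(softirq_share(BEFORE, AFTER)))
```

Ranking per core rather than averaging over the machine is what exposes a single overloaded core that a whole-machine sirq percentage would hide.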

V. Using SSAR Process-Level Metrics

The procs subcommand of ssar displays information about multiple processes. It is equivalent to a Linux ps command whose output can be shown for any moment in the past.

For the options of the procs subcommand, see the ssar procs -h command. The option parameters have the following characteristics:

1) As before, only two of the -f, -b, and -r options need to be specified. -f defaults to the current time and -r to 5 minutes. The multi-process subcommand reads data at only two moments, the start time and the end time. Instantaneous gauge metrics show only the end-time value.

2) A powerful sorting function supports sorting by multiple fields: rows are sorted by the first field, and rows with equal values are then sorted by the second field. The order in which the sort fields are listed determines their priority, with earlier fields taking precedence. Each field has a built-in default direction: numeric metrics such as rss are usually sorted in descending order, while attribute fields such as pid are sorted in ascending order.

3) After all processes have been sorted, the -l option limits the number of process rows in the output.

4) Two distinctive field combinations, --job and --sched, target process-group problems and scheduling problems respectively.
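The multi-field sorting and row limit described above can be sketched in Python as follows. This is only an illustration of the behaviour as described: the per-field default directions in DESCENDING, the function name, and the sample process records are assumptions for the example rather than SSAR's actual tables.

```python
# Assumed defaults: usage-style fields descending, identifier fields ascending.
DESCENDING = {"rss": True, "nlwp": True, "pid": False}


def sort_procs(procs, keys, limit=None):
    """Sort process records by several fields, earlier fields taking priority."""
    ordered = list(procs)
    # Python's sort is stable, so apply the least significant key first.
    for key in reversed(keys):
        ordered.sort(key=lambda p: p[key], reverse=DESCENDING[key])
    return ordered[:limit]


# Illustrative records, not real measurements.
PROCS = [
    {"pid": 3, "rss": 100, "nlwp": 2},
    {"pid": 1, "rss": 100, "nlwp": 5},
    {"pid": 2, "rss": 300, "nlwp": 1},
]

if __name__ == "__main__":
    for proc in sort_procs(PROCS, ["rss", "pid"], limit=2):
        print(proc)
```

Sorting first and truncating afterwards mirrors the order stated above: the row limit is applied only after every process has been ranked.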

The proc subcommand of ssar displays the history of a single process over time.

1) The -p option is required and specifies the PID of the process to display.

2) The -f option specifies the end time, the -b option specifies the start time, and the -r option specifies the time span; only two of the three need to be specified. -f defaults to the current time and -r to 300 minutes.

3) The -H option hides the header so that the output is easier to parse in shell scripts, and the --api option outputs JSON, which is more convenient for higher-level languages such as Python.

4) The lowercase -o option specifies the fields of the output columns, with multiple fields separated by commas. Frequently used groups of fields are also available as field-combination options such as --cpu and --mem. The uppercase -O option appends additional fields to the output of a field-combination option. The -O option can therefore be used together with --cpu, --mem, and so on, whereas -o is mutually exclusive with -O and with the field-combination options.

5) In the output, a left bracket indicates that the process had already ended at 7:22. This feature helps determine the time range between the start and end of a particular process.

A simple example shows how to use SSAR process-level metrics to diagnose thread-count exhaustion. The Linux kernel has a parameter kernel.pid_max = 131072, which sets the maximum number of threads on the machine.

1) Outside working hours, an abnormal process on a machine created a huge number of threads and exhausted the thread limit. At 3:00 in the morning the machine's thread count soared to 131.1K. The episode was extremely short, and by 3:01 the count had recovered.

2) Traditionally, a scene like this is over long before anyone can log in manually and capture the on-site information. Now that historical information is collected automatically, we can wait until working hours and use the multi-process subcommand of ssar to find the cause afterwards. nlwp (number of light-weight processes) is the number of threads in a process. Sorting by the nlwp field with the -k option shows that the Java process with PID 1045 caused the thread explosion.

3) Readers who are still not convinced can also query the time of the problem, 2021-03-30T03:00:00, where the value 131024 confirms that the machine's total thread count had indeed reached about 131.1K.

VI. Using SSAR Load Metrics

Although the load1 metric in traditional system performance monitoring tools is more precise than load5 and load15, it still cannot pin down the time range precisely enough when troubleshooting. SSAR introduces the load5s metric, original both at home and abroad, which raises the precision of load to 5 seconds. The precision of load5s lies not only in the collection frequency: the load5s value is the number of threads in the R and D states, that is, the active value of the global variable in the kernel's data structures. To understand Linux load and load5s accurately, perform the following experiment:

On a test machine, first compile a program named uninterruptible that can simulate threads in the D state. Then use the stress command to start 100 threads that exit after 30 seconds. About 20 seconds later, start 1000 uninterruptible processes in a batch loop. The result after execution is shown in the following figure.

1) In the green time zone, at 5 min 52 s, both load5s and load1 are at a low level, and the machine load pressure is unquestionably low.

2) In the first red time zone, as the stress command runs, load5s and load1 rise together. At 6 min 07 s the load5s value has already reached 78, while load1 has only begun to rise, to 6.27; this load1 value falls far short of reflecting the number of threads running concurrently on the machine. As time passes, by 6 min 32 s load1 has slowly risen to 39.22.

3) In the first blue time zone, as the stress processes exit, load5s quickly falls back to a low level. At 6 min 37 s the load5s value has dropped all the way to 0: the machine has almost no threads in the R or D state and the system is under no pressure. Yet at this moment load1 is still as high as 36.08. Clearly the traditional load1 metric keeps lagging after the load pressure on the system has disappeared.

The same phenomenon can be observed in the second red zone and the second blue zone. The common pattern is that when a red zone begins, the load5s value representing the number of R and D state threads is already clearly high while load1 at the same moment is still low; when a blue zone begins, load5s has already dropped to 0 while load1 at the same moment is still high. The experiment shows that load5s reflects system load pressure more accurately and that load1 cannot be used to judge the load precisely. We therefore need load5s in place of load1 to determine the time range of machine load accurately. It is worth stressing that the load5s metric is obtained entirely by engineering means, without any dependence on a kernel module.
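The lag of load1 can be reproduced with a small simulation. The sketch below assumes the standard exponentially damped update that Linux applies to its load averages every 5 seconds, written here in floating point rather than the kernel's fixed-point arithmetic; the function name and the input sequence (100 active threads for 30 seconds, then none) are illustrative and only loosely modelled on the experiment above.

```python
import math


def simulate_load1(active_counts, start=0.0, interval=5.0, window=60.0):
    """Feed 5-second samples of R+D thread counts through a 1-minute average."""
    decay = math.exp(-interval / window)
    load, series = start, []
    for active in active_counts:
        # Exponentially damped moving average over the 1-minute window.
        load = load * decay + active * (1.0 - decay)
        series.append(load)
    return series


# 100 runnable threads for 30 seconds (6 samples), then an idle machine.
ACTIVE = [100] * 6 + [0] * 6

if __name__ == "__main__":
    for active, load1 in zip(ACTIVE, simulate_load1(ACTIVE)):
        print(f"active={active:3d}  load1={load1:6.2f}")
```

In this simulation load1 has only climbed to about 39 by the time the 30-second burst ends, and it is still above 36 five seconds after the active count has fallen to 0, which is the same shape as the 39.22 and 36.08 readings in the experiment. The raw active count, which is what load5s reports, has no such inertia.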

Besides load5s, the output above provides a set of metrics for evaluating system load comprehensively. load5s is the total number of threads in the R and D states, and runq is the number of threads in the R state at that moment. threads is the total number of threads, so the threads value is the ceiling of the load5s value, and the maximum of threads is in turn limited by the kernel parameter setting.

SSAR also triggers the collection of load details according to the ratio of load5s to the number of CPU cores. actr is the number of concurrent R-state threads, actd is the number of concurrent D-state threads, and act is the sum of actr and actd. When we need to understand in detail what makes up the load, the load2p subcommand displays the details behind actr and actd.

At sampling times that did not trigger the collection of load details, the act value does not exist and is displayed as a dash. You can filter the output of the load5s subcommand down to the times at which act exists with ssar load5s -z.

Then use the load2p subcommand to display the load details, where the -c option specifies the time whose load details you want to see. The load2p subcommand can output 6 views: loadrd, stack, loadr, loadd, psr, and stackinfo.

1) The loadrd view displays all threads in the R and D states at the specified time. Each entry includes the thread state, thread ID, process ID, CPU core number, thread scheduling priority, and command name. SSAR collects thread information only for the R and D states.

2) The stack view shows the sampled call stacks of D-state threads, up to 100 samples. Each D-state thread call stack entry contains the thread ID, process ID, process name, address of the function at the top of the stack, name of the function at the top of the stack, and the complete call stack.

3) From the information in the two views above, the load2p subcommand derives four aggregated views, loadr, loadd, psr, and stackinfo, which are displayed by default. The loadr view is suited to diagnosing the process-level causes when the R-state load is high.

4) The loadd view aggregates D-state threads by PID and is suited to diagnosing the process-level causes when the D-state load is high.

5) The psr view is suited to diagnosing CPU core binding problems.

6) The stackinfo view aggregates by call stack and is particularly useful for diagnosing why threads are in the D state when the D-state load is high.
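The aggregation behind views such as loadd and stackinfo can be sketched with a counter over thread records. This is a hedged illustration of the idea only: the record layout, the function name, and the sample stack names are assumptions, and SSAR's real views carry more columns.

```python
from collections import Counter


def aggregate_d_threads(threads):
    """Count D-state threads per PID and per call stack, most frequent first."""
    d_threads = [t for t in threads if t["state"] == "D"]
    by_pid = Counter(t["pid"] for t in d_threads)
    by_stack = Counter(t["stack"] for t in d_threads)
    return by_pid.most_common(), by_stack.most_common()


# Illustrative thread records, not real measurements.
THREADS = [
    {"state": "D", "tid": 101, "pid": 100, "stack": "io_schedule"},
    {"state": "D", "tid": 102, "pid": 100, "stack": "io_schedule"},
    {"state": "D", "tid": 201, "pid": 200, "stack": "mutex_lock"},
    {"state": "R", "tid": 301, "pid": 300, "stack": ""},
]

if __name__ == "__main__":
    by_pid, by_stack = aggregate_d_threads(THREADS)
    print("per PID:  ", by_pid)
    print("per stack:", by_stack)
```

Grouping by PID answers which process contributes the D-state load, while grouping by stack answers what those threads are waiting on; the two questions are complementary.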

VII. SSAR Configuration File Description

The main configuration file of SSAR is /etc/ssar/ssar.conf, which is divided into three sections: [main], [load], and [proc].

1) The [main] section holds options that apply to the SSAR tool as a whole, while [load] and [proc] hold the options for load information collection and process information collection respectively. The options for machine-level information are set separately in the /etc/ssar/sys.conf file introduced earlier.

2) The duration_threshold option sets how long data is retained, up to 168 hours. When any one of the inode_use_threshold, disk_use_threshold, and disk_available_threshold conditions is not satisfied, data collection stops and data is deleted step by step from oldest to newest in an attempt to satisfy the condition again, until at most only the data directory for the latest hour remains. With this disk space handling logic, SSAR can be said to occupy no extra disk space.

3) load5s_flag, proc_flag, and sys_flag control the three parts of the collector respectively. On an embedded system you can turn off load5s_flag and proc_flag and then configure sys.conf so that only the data sources you need are collected.

4) The scatter_second option spreads out the collection times of the hosts in a large-scale cluster.

5) load5s_threshold sets the threshold that triggers the collection of load details. Servers in different roles need this threshold tuned to their own characteristics.

VIII. Comprehensive Analysis of CPU Usage with SSAR

The Linux top command shows two kinds of CPU percentage: one in the header and another in the row for each process. Below we use the SSAR tool to briefly explain the relationship between the two.

To understand the Linux CPU usage metrics accurately, the following experiment is performed on a 4-core machine. Terminal A runs the command stress -t 120 -c 3, starting at 23 minutes 20 seconds past the hour and ending 120 seconds later. Terminal B runs the top command at the same time, as shown in the figure.

Understanding the CPU usage metrics:

1) On terminal A, wait 2 minutes for stress to finish and then run the tsar2 command. The user value at 23:25 is 75.60, which means that the average user CPU usage over the 60 seconds from 23:24 to 23:25 was 75.60%. The user value of tsar2 is thus equivalent to the us value of 75.2 in the top command. The difference is that tsar2 reports a 60-second average while top reports the average over the 3 seconds before the refresh; but because the three stress workers run steadily, top's 3-second average is essentially representative of the 60-second average.

2) Next, run the ssar command on terminal A. The user/s value at 23:25 is 301.35. The CPU values of ssar are taken from /proc/stat, and their unit is kernel ticks. On x86_64 there are 100 ticks per second, that is, 100 Hz, so the 4 CPUs together provide 400 ticks per second.

3) Looking at the Python source code of tsar2, we can see that the user value of tsar2 is the ssar user value expressed as a percentage of the CPU usage of all states.

4) Next, run the ssar procs subcommand on terminal A. The pucpu value of each of the three stress processes at 23:25 is 100.0. Each 100.0 here means that the process's average user-space CPU usage from 23:24 to 23:25 was 100%. It is calculated by dividing the CPU time consumed by the process by the elapsed time and multiplying by 100, that is, as a percentage. This algorithm is consistent with the %CPU in the lower half of the top output.

5) Here we can see the data relationship 301.35 ≈ 100 + 100 + 100 + 3.3. The machine-level user value 301.35 is CPU time per second of natural time multiplied by 100 Hz, whereas the process-level CPU value is a ratio multiplied by 100 to give a percentage. With ssar --cpu and ssar procs --cpu, the CPU usage of the whole machine can thus be related to CPU usage at the process level.

6) With that, the relationship between the us value in the header of the top command and the %CPU in the process rows follows naturally. The one small difference is that top goes a step further and converts machine-level CPU usage into a percentage of the whole machine.
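The arithmetic behind these relationships can be written out in a few lines. This is a simplified sketch: it uses the nominal capacity of 4 cores × 100 ticks per second as the denominator, whereas the text says tsar2 divides by the usage of all CPU states actually observed, which explains why the result is close to, but not exactly, the reported 75.60. The function names and the 6000-tick example are illustrative.

```python
HZ = 100    # kernel ticks per second per CPU, as stated for x86_64
CORES = 4   # the experiment machine has 4 cores


def machine_user_percent(user_ticks_per_sec, cores=CORES, hz=HZ):
    """Machine-level user ticks per second as a share of nominal capacity."""
    return 100.0 * user_ticks_per_sec / (cores * hz)


def process_cpu_percent(tick_delta, elapsed_seconds, hz=HZ):
    """Process CPU ticks consumed over an interval, as a percent of one core."""
    return 100.0 * tick_delta / (elapsed_seconds * hz)


if __name__ == "__main__":
    # 301.35 user ticks per second out of 400 available.
    print(round(machine_user_percent(301.35), 2))
    # A fully busy worker consumes 6000 ticks in 60 seconds.
    print(process_cpu_percent(6000, 60))
```

Three workers at 100% of one core each therefore account for roughly 300 of the 400 ticks available per second, which is why the machine-level user value sits near 75%.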

IX. SSAR Memory Reclaim Case

Among the scenarios that lead to high load, one type is high load from a large number of R-state threads caused by high sys CPU usage. SSAR's comprehensive metric system reveals how this scenario unfolds from multiple angles. To explain the problem accurately, it helps to first review the kernel concepts related to memory reclaim. As shown in the figure, when the machine's free memory falls below the low threshold (the yellow line), the kernel's asynchronous memory reclaim thread kswapd is woken up and reclaims memory while other processes continue to allocate. When free memory falls below the min threshold (the red line), direct memory reclaim is triggered: all memory allocations from user space are blocked and the allocating threads switch to the D state. At that point only allocations from kernel space can continue to use the free memory below the min value. Later, once the machine's free memory has gradually recovered to the high threshold (the green line), the kswapd thread stops reclaiming memory. The following is based on SSAR's data metrics. Step by step, ho