jkklee/abs-log

中文 (Chinese version)

This is not a usual PV/UV log analysis tool, but one that offers another perspective on web logs. It provides fine-grained (down to one minute) troubleshooting and performance optimization within a specified time period, based on its flexible and varied summaries.

Dependencies

  • Python 3.4+
  • pymongo-3.7.2+
  • MongoDB-server 3.4+

First, let's clarify a couple of terms:
uri: the current URI in the request (without the args/query string);
request_uri: the full original request URI (with arguments);
args: the arguments in the request line (refer to nginx);
uri_abs and args_abs: the strings obtained after the abstraction of uri and args (done in order to classify requests).
E.g. "/sub/0/100414/4070?channel=ios&version=1.4.5" is converted to uri_abs "/sub/*/*/*" and args_abs "channel=*&version=*" after abstraction.

Functions

  1. Provides a unified log analysis portal through which you can view summary analysis of the logs a site generates across all servers. You can also filter by time period and server name.
  2. Supports three main classes of analysis: request_uri, ip and response_code. Each is examined from three aspects: hits, response_size and response_time. In addition, different sub-items have their own characteristics.
  3. request_uri analysis directly shows which kinds of requests come in large quantities, which kinds are time-consuming, and which kinds take up the most traffic. It can also show how each metric of a given kind of request changes over time at different granularities (minute, ten_min, hour, day), as well as the distribution of the different args_abs of a specific uri_abs.
  4. ip analysis divides all requests into three sources (from_cdn/proxy, from_reverse_proxy, from_client_directly). Within each source, the top N IP addresses by number of requests are displayed. It can also show the time distribution of each metric for a given IP, and analyze the distribution of the different uri_abs generated by a specific IP.

Features

  1. Core idea: the request_uri is abstractly categorized. The variable parts are represented as "*", so that the invariant parts identify a specific type of request. It is actually a different way of looking at logs: the "minimum analysis unit" changes from "a specific log line" to "a certain type of request".
  2. Supports both plaintext and JSON log formats.
  3. Easy to configure: no regular expressions to write, simply copy the log_format defined in nginx into the config file.
  4. Response time and response size are described more accurately through quantiles (the %50/%75/%100 points). In most cases, because of extreme values, the arithmetic mean does not represent the distribution of these values well (see the sketch after this list).
  5. Supports custom abstraction rules, with the flexibility to specify whether and how certain parts of a request should be abstracted.
  6. Since the log_analyse script runs regularly on each web server (in a distributed fashion), its efficiency is important. In my test case (disk: 3×7200 rpm RAID 5, gigabit LAN), the processing speed was between 20,000 and 30,000 lines/s.
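A minimal sketch of the quantile idea mentioned in item 4 (nearest-rank percentiles; not the project's exact implementation):

def percentile_of(values, percent):
    """Nearest-rank percentile: the value below which roughly `percent`% of samples fall."""
    ordered = sorted(values)
    idx = max(0, int(len(ordered) * percent / 100) - 1)
    return ordered[idx]

response_times = [0.01, 0.02, 0.02, 0.03, 0.05, 0.06, 0.3, 2.8]
print(' '.join('%{}<{}'.format(p, percentile_of(response_times, p)) for p in (50, 75, 100)))
# %50<0.03 %75<0.06 %100<2.8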

About implementation:

log_analyse.py is deployed on the web servers and run through crontab. It uses Python's re module to parse every line and extract the values of uri, args, time_local, status, body_bytes_sent, request_time, server_name, etc. These values are then processed into an intermediate result that is stored in MongoDB.
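For illustration, here is a rough sketch of how a copied nginx log_format could be turned into a parsing regex with named groups (the log_format string below is an assumed example; the project's real conversion handles more cases):

import re

# An assumed nginx log_format, copied verbatim into the config file.
log_format = '$remote_addr - [$time_local] "$request" $status $body_bytes_sent $request_time'

# Escape the literal text, then turn each $variable into a named capture group.
escaped = re.escape(log_format).replace(r'\$', '$')
pattern = re.sub(r'\$(\w+)', lambda m: '(?P<{}>.*?)'.format(m.group(1)), escaped)
line_re = re.compile('^' + pattern + '$')

line = '1.2.3.4 - [09/Mar/2018:16:54:01 +0800] "GET /sub/0/100414/4070?channel=ios HTTP/1.1" 200 61 0.030'
m = line_re.match(line)
if m:
    print(m.group('status'), m.group('request_time'))  # 200 0.030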

log_show.py serves as the entry point of all final analysis/views; it further processes the intermediate results.
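As a hedged illustration of that further processing (the database, collection and field names here are assumptions, not the project's actual schema), summing hits per uri_abs with pymongo might look like:

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
coll = client['web_log']['api']          # assumed database/collection names

# Top 3 uri_abs by accumulated hits -- analogous to `log_show api request -l 3`.
pipeline = [
    {'$group': {'_id': '$uri_abs', 'hits': {'$sum': '$hits'}}},
    {'$sort': {'hits': -1}},
    {'$limit': 3},
]
for doc in coll.aggregate(pipeline):
    print(doc['_id'], doc['hits'])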

Handling of abnormal requests
The main problem with splitting fields on spaces or double quotes is that you cannot take every exception into account when dealing with abnormal records (which occur for a variety of reasons and in a variety of styles), so the project uses the re module rather than a simple split(). "Tolerable" abnormal records are handled by some judgment logic in the code; for "intolerable" abnormal records, an empty string is returned and the log line is recorded in a file.
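A minimal sketch of that fallback behaviour (the function name and error-log path are made up for illustration):

def parse_line(line, line_re, error_log='/tmp/log_analyse_invalid.log'):
    """Return the parsed fields, or '' for an "intolerable" record,
    which is also appended to error_log for later inspection."""
    m = line_re.match(line)
    if not m:
        with open(error_log, 'a') as f:
            f.write(line if line.endswith('\n') else line + '\n')
        return ''
    return m.groupdict()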

log_show.py usage:

[ljk@demo ~]$ log_show --help
Usage:
  log_show <site_name> [options] request [distribution [<request>]|detail <uri>]
  log_show <site_name> [options] ip [distribution <ip>|detail <ip>]
  log_show <site_name> [options] error [distribution <error_code>|detail <error_code>]

Options:
  -h --help                   Show this screen.
  -f --from <start_time>      Start time. Format: %y%m%d[%H[%M]], %H and %M are optional
  -t --to <end_time>          End time. Format is same as --from
  -l --limit <num>            Number of lines in output, 0 means no limit. [default: 5]
  -s --server <server>        Web server hostname
  -g --group_by <group_by>    Group by every minute, every ten minutes, every hour or every day,
                              valid values: "minute", "ten_min", "hour", "day". [default: hour]

  distribution                Show distribution(about hits,bytes,time,etc) of:
                              all or specific 'request', the specific 'ip', the specific 'error_code' in every period.
                              Period is specified by --group_by
  detail                      Show details of:
                              detail 'args' analyse of the specific 'uri'(if it has args);
                              detail 'uri' analyse of the specific 'ip' or 'error_code'

  Notice: it's best to put 'request_uri', 'uri' and 'ip' in quotation marks.

All subcommands support the -f, -t and -s options, which filter by start/end time and by a single server.

request subcommand:

Analyze the data of a given web site stored in MongoDB.

[ljk@demo ~]$ log_show api request -l 3
====================
Total_hits:999205 invalid_hits:581
====================
      hits  percent      time_distribution(s)         bytes_distribution(B)            uri_abs
    430210   43.06%  %50<0.03 %75<0.06 %100<2.82   %50<61 %75<63 %100<155              /api/record/getR
    183367   18.35%  %50<0.03 %75<0.06 %100<1.73   %50<196 %75<221 %100<344            /api/getR/com/*/*/*
    102299   10.24%  %50<0.02 %75<0.05 %100<1.77   %50<3862 %75<3982 %100<4512         /view/*/*/*/*.js
====================
Total_bytes:1.91 GB
====================
     bytes  percent      time_distribution(s)         bytes_distribution(B)            uri_abs
   1.23 GB   64.61%  %50<0.04 %75<0.1 %100<1.96    %50<17296 %75<31054 %100<691666     /api/NewCom/list
 319.05 MB   16.32%  %50<0.02 %75<0.05 %100<1.77   %50<3862 %75<3982 %100<4512         /view/*/*/*/*.js
 167.12 MB    8.55%  %50<0.19 %75<0.55 %100<2.93   %50<3078 %75<3213 %100<11327        /api/getR/com/*/*
====================
Total_time:117048s
====================
 cum. time  percent      time_distribution(s)         bytes_distribution(B)            uri_abs
     38747   33.10%  %50<0.03 %75<0.06 %100<2.82   %50<61 %75<63 %100<155              /api/record/getR
     22092   18.87%  %50<0.03 %75<0.06 %100<1.73   %50<196 %75<221 %100<344            /api/getR/com/*/*/*
     17959   15.34%  %50<0.19 %75<0.55 %100<2.93   %50<3078 %75<3213 %100<11327        /api/getRInfo/com/*/*

The example above shows the ranking along the three dimensions of hits/bytes/time, together with the distribution of response time and response size within the specified period (from 00:00:00 to the current time of the day by default). If a uri_abs has relatively few hits but returns larger responses or takes more time, it deserves closer attention.

ip subcommand:

Display IP-based analysis results.

[ljk@demo ~]$ log_show.py api ip -l 2
====================
From_cdn/Proxy:              hits  hits(%)       bytes  bytes(%)  time(%)
====================       199870    99.94   570.51 MB    99.99    99.99
          Last_cdn_ip
       xxx.57.xxx.189        1914     0.96   696.18 KB     0.12     0.68
      xxx.206.xxx.154        1741     0.87     1.56 MB     0.27     0.98
      User_ip_via_cdn
       xxx.249.xxx.56         787     0.39   154.82 KB     0.03     0.23
        xxx.60.xxx.86         183     0.09     1.05 MB     0.18     0.13
====================
From_reverse_proxy:          hits  hits(%)       bytes  bytes(%)  time(%)
====================           66     0.03    68.83 KB     0.01     0.01
    User_ip_via_proxy
       xxx.188.xxx.21           2     0.00     1.53 KB     0.00     0.00
          xxx.5.xxx.4           2     0.00    324.00 B     0.00     0.00
====================
From_client_directly:        hits  hits(%)       bytes  bytes(%)  time(%)
====================           64     0.03     8.32 KB     0.00     0.00
          Remote_addr
        192.168.1.202          29     0.01     58.00 B     0.00     0.00
        192.168.1.200          29     0.01     58.00 B     0.00     0.00

The idea of ip analysis is to classify requests into three categories by source: "From_cdn/Proxy", "From_reverse_proxy" and "From_client_directly". The IP addresses within each category are then sorted by number of requests.

distribution subcommand:

  1. Aggregate statistics for "all requests" or a "specified uri/request_uri" at "minute/ten-minute/hour/day" granularity.
  2. Aggregate statistics for a "specified IP" at "minute/ten-minute/hour/day" granularity.
# Example 1: analyze the distribution of the specified request, specify the grouping aggregation by minute, and display 5 rows by default

[ljk@demo ~]$ python log_show.py api request distribution "/view/*/*.json" -g minute                
====================
uri_abs: /view/*/*.json
Total_hits: 17130    Total_bytes: 23.92 MB
====================
    minute        hits  hits(%)       bytes  bytes(%)      time_distribution(s)          bytes_distribution(B)            
201803091654      1543    9.01%     2.15 MB     8.98%  %50<0.03 %75<0.05 %100<1.07   %50<1593 %75<1645 %100<1982        
201803091655      1527    8.91%     2.13 MB     8.88%  %50<0.04 %75<0.05 %100<1.04   %50<1592 %75<1642 %100<2143        
201803091656      1464    8.55%     2.05 MB     8.57%  %50<0.04 %75<0.05 %100<1.03   %50<1592 %75<1642 %100<1952        
201803091657      1551    9.05%     2.15 MB     8.97%  %50<0.03 %75<0.04 %100<0.89   %50<1594 %75<1639 %100<1977        
201803091658      1458    8.51%     2.06 MB     8.61%  %50<0.03 %75<0.04 %100<2.35   %50<1596 %75<1644 %100<2146

The example above shows the distribution of "/view/*/*.json" within the specified time period, including hits/bytes/time and the proportion of each metric within each interval relative to the total. This subcommand also reveals the "trend" of each metric over time.

Note:
The minute field is the specified aggregation granularity; "201803091654" means "2018-03-09 16:54".
The -g parameter specifies the aggregation granularity (minute/ten_min/hour/day).
The distribution subcommand can be used with a specific "uri/request_uri" (showing how that uri/request_uri is distributed over time at the specified granularity), or without one (showing the distribution of all requests over time at the specified granularity).

# Example 2: analyze the number of requests/bandwidth generated by the specified IP over time. 
# The default aggregation granularity is hour
[ljk@demo ~]$ python log_show.py api ip -t 180314 distribution "140.206.109.174" -l 0
====================
IP: 140.206.109.174
Total_hits: 10999    Total_bytes: 4.83 MB
====================
      hour        hits  hits(%)       bytes  bytes(%)
2018031306        1273   11.57%   765.40 KB    15.47%
2018031307        2133   19.39%  1004.74 KB    20.31%
2018031308        2211   20.10%     1.00 MB    20.74%
2018031309        2334   21.22%     1.05 MB    21.72%
2018031310        2421   22.01%   850.79 KB    17.20%
2018031311         627    5.70%   226.30 KB     4.57%

Note:
-l 0 means do not limit the number of output lines (output all results).

detail subcommand:

  1. Analyze a uri in detail to see the distribution of different args
  2. Analyze a specific IP in detail to see how the requests it generates are distributed across different uri_abs

Applicable scenarios: for example, if a certain uri_abs is found to be abnormal in some aspect (hits/bytes/time), the detail subcommand can be used to analyze it further and locate precisely which args_abs causes the anomaly. Or, if you observe abnormal access from an IP, you can drill down to see whether that IP requests everything or is only interested in certain uris.

# Example 1:
[ljk@demo ~]$ python log_show.py api -f 180201 request detail "/recommend/update" -l 3
====================
uri_abs: /recommend/batchUpdate
Total_hits: 10069    Total_bytes: 7.62 MB
====================
    hits  hits(%)      bytes  bytes(%)  time(%)      time_distribution(s)          bytes_distribution(B)      args_abs
    4568   45.37%    3.46 MB    45.44%   47.96%  %50<0.06 %75<0.07 %100<0.47   %50<795 %75<845 %100<1484      uid=*&category_id=*&channel=*&version=*
    4333   43.03%    3.25 MB    42.64%   42.30%  %50<0.05 %75<0.07 %100<0.48   %50<791 %75<840 %100<1447      category_id=*&channel=*&uid=*&version=*
     389    3.86%  314.15 KB     4.03%    0.88%  %50<0.03 %75<0.04 %100<0.06   %50<802 %75<850 %100<1203      category_id=*&channel=*&version=*

The example above shows each metric broken down by the different parameter combinations of the uri "/recommend/update". An incidental finding is that the same parameter combination is not always written in the same order in the code. Although this does not affect functionality, it causes some trouble for accurate application performance monitoring.

Note:
The detail subcommand is followed by a uri (without arguments; arguments are ignored if they are included).

# Example 2: Analyse how many requests are generated by an IP and the metrics for each (hits/bytes/time)
[ljk@demo ~]$ python log_show.py m -t 180314 ip detail "1.2.3.4"
====================
IP: 140.206.109.174
Total_hits: 10999    Total_bytes: 4.83 MB
====================
    hits  hits(%)      bytes  bytes(%)  time(%)  uri_abs
   10536   95.79%  405.47 KB     8.19%   92.01%  /introduction/watch
     147    1.34%    1.90 MB    39.31%    1.93%  /view/*/*.html
     138    1.25%  407.42 KB     8.23%    2.41%  /chapinfo/*/*.html
      42    0.38%  644.88 KB    13.03%    1.38%  /info/*.html
      30    0.27%  229.98 KB     4.65%    1.14%  /classify/*.json

Deployment:

log_analyse.py is designed to be run from crontab on each web server; the final results are then viewed with log_show.py.

*/15 * * * * export LANG=zh_CN.UTF-8;python3 /home/ljk/log_analyse.py &> /tmp/log_analyse.log

Note

  1. uri_abs and args_abs are the abstractions of uri and args; the default rules are as follows:
    uri: the request_uri is separated into fields by "/" and ".", and a field is abstracted to "*" if it consists entirely of digits.
    args: every value is replaced with "*".
  2. There are a few other interesting functions in common/common.py.
