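I have some complex log files that I need to write some tools to process. I've been playing with awk, but I'm not sure whether awk is the right tool for this.

My log files are printouts of OSPF protocol decodes, containing text logs of the various protocol pkts and their contents, with the various protocol fields identified by their values. I want to process these files and print only certain lines of the log entries that relate to particular pkts. Each pkt log entry can span a different number of lines.

awk seems able to process a single line that matches a pattern. I can locate the pkt I want, but then I need to match patterns in the lines that follow to decide whether it is a pkt I want to print.

Another way to look at the problem: I want to isolate several lines in the log file and print them, the details of a particular pkt, based on pattern matches across those lines.

Since awk seems to be line-based, I'm not sure it is the best tool for this.

If awk can do this, how? If not, any suggestions on which tool to use?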

Latest replies
  • 5 months ago
    1 #

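    Awk can easily detect multi-line combinations of patterns, but you need to build what is called a state machine in your code to recognize the sequence.

    Consider this input: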

    how
    second half #1
    now
    first half
    second half #2
    brown
    second half #3
    cow
    

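    As you've seen, recognizing a single pattern is easy. Now we can write an awk program that recognizes a second half line only when it is directly preceded by a first half line. (With a more elaborate state machine, you could detect an arbitrary sequence of patterns.)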

    /second half/ {
      if(lastLine == "first half") {
        print
      }
    }
    { lastLine = $0 }
    

    If you run this, you will see:

    second half #2
    

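    Now, this example is very simple and barely a state machine. The interesting state lasts only for the duration of the if statement, and the preceding state is implicit, depending on the value of lastLine. In a more canonical state machine, you would keep an explicit state variable and make transitions from state to state that depend on the existing state and the current input. But you may not need that much control mechanism.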
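
    The more canonical form described above could look roughly like this. It is only a sketch against the same sample input; the state names (IDLE, ARMED) and the shell function name match_pair are made up for illustration.

```shell
# match_pair: hypothetical helper name. Reads lines on stdin and prints each
# "second half" line that directly follows a "first half" line.
match_pair() {
  awk '
    BEGIN { state = "IDLE" }   # explicit state variable
    {
      # Act on the current state together with the current input line.
      if (state == "ARMED" && /second half/) print
      # Transition: the next state depends only on the current line here.
      state = ($0 == "first half") ? "ARMED" : "IDLE"
    }
  '
}
```

    Feeding it the sample input (for example `match_pair < input.txt`, where input.txt is an assumed file name) prints the same single line, second half #2. Longer sequences would need more states and more transitions.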

  • 5 months ago
    2 #

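    Awk is actually record-based. By default it treats a line as a record, but you can change that with the RS (record separator) variable.

    One way to approach this is to make a first pass with sed (you could also use awk for this if you prefer) to separate the records with a different character, such as a form feed. Then you can write an awk script that treats each group of lines as a single record.

    For example, if this is your data: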

    animal 0
    name: joe
    type: dog
    animal 1
    name: bill
    type: cat
    animal 2
    name: ed
    type: cat
    

    To separate the records with form feeds:

    $ cat data | sed $'s|^\(animal.*\)|\f\\1|'
    

    Now we take that and pipe it through awk. Here is an example that prints records conditionally:

    $ cat data | sed $'s|^\(animal.*\)|\f\\1|' | awk '
          BEGIN { RS="\f" }                                     
          /type: cat/ { print }'
    

    Output:

    animal 1
    name: bill
    type: cat
    animal 2
    name: ed
    type: cat
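
    If you would rather skip the sed pre-pass, a single awk pass can buffer each group of lines itself. This is a variant, not the sed plus RS method shown above, and only a sketch: it assumes every record starts with a line beginning with "animal", and the names print_cats, emit, rec and keep are made up for illustration.

```shell
# print_cats: hypothetical helper name. Groups lines into records that start
# at each "animal" line, then prints the whole record if any line in it
# matches "type: cat". No separator character is inserted.
print_cats() {
  awk '
    # Print the buffered record if it qualified, then reset the buffer.
    function emit() {
      if (keep) printf "%s", rec
      rec = ""
      keep = 0
    }
    /^animal/   { emit() }             # a new record begins: flush the old one
                { rec = rec $0 "\n" }  # buffer every line of the current record
    /type: cat/ { keep = 1 }           # remember that this record qualifies
    END         { emit() }             # flush the final record
  '
}
```

    Run as `print_cats < data`, this prints the animal 1 and animal 2 records from the sample data.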
    

    Edit: as a bonus, here is how to do it with awk-ward Ruby (-014 means use form feed (octal code 014) as the record separator):

    $ cat data | sed $'s|^\(animal.*\)|\f\\1|' |
          ruby -014 -ne 'print if /type: cat/'
    

  • 5 months ago
    3 #

    awk can process from a start pattern to an end pattern:

    /start-pattern/,/end-pattern/ {
      print
    }
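
    A range pattern always includes the start and end lines themselves. When those delimiter lines are not wanted, a flag variable gives finer control. A minimal sketch, assuming made-up marker lines START and END and a hypothetical helper name between:

```shell
# between: hypothetical helper name. Prints the lines strictly between a line
# that is exactly START and the next line that is exactly END.
between() {
  awk '
    /^END$/   { inside = 0 }   # leave the block before the print rule runs
    inside    { print }        # print only lines inside the block
    /^START$/ { inside = 1 }   # enter the block after the print rule ran
  '
}
```

    The rule order matters: clearing the flag first and setting it last is what keeps both marker lines out of the output.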
    

    I was looking for a way to match

     * Implements hook_entity_info_alter().
     */
    function file_test_entity_type_alter(&$entity_types) {
    

    so I created

    /\* Implements hook_/,/function / {
      print
    }
    

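    which gave me the content I needed. A more complex example skips lines and scrubs off the non-space parts. Note that awk is a record (line) and word (split on whitespace) tool.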

    # start,end pattern match using comma
    / \* Implements hook_.*\./,/function [^ (]+/ {
      # skip PHP multi line comment end
      if ($0 ~ / \*\//) next
      # Only print 3rd word
      if ($0 ~ /Implements/) {
        hook=$3
        # scrub of opening parenthesis and following.
        sub(/\(.*$/, "", hook)
        print hook
      }
      # Only print function name without parenthesis
      if ($0 ~ /function/) {
        name=$2
        # scrub of opening parenthesis and following.
        sub(/\(.*$/, "", name)
        print name
        print ""
      }
    }
    

    Hope this helps too.

    See also ftp://ftp.gnu.org/old-gnu/Manuals/gawk-3.0.3/html_chapter/gawk_toc.html

  • 5 months ago
    4 #

    I do this kind of thing with sendmail logs from time to time.

    Given:

    Jan 15 22:34:39 mail sm-mta[36383]: r0B8xkuT048547: to=<[email protected]>, delay=4+18:34:53, xdelay=00:00:00, mailer=esmtp, pri=21092363, relay=web3., dsn=4.0.0, stat=Deferred: Operation timed out with web3.
    Jan 15 22:34:39 mail sm-mta[36383]: r0B8hpoV047895: to=<[email protected]>, delay=4+18:49:22, xdelay=00:00:00, mailer=esmtp, pri=21092556, relay=web3., dsn=4.0.0, stat=Deferred: Operation timed out with web3.
    Jan 15 22:34:51 mail sm-mta[36719]: r0G3Youh036719: from=<[email protected]>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=IPv4, relay=[50.71.152.178]
    Jan 15 22:35:04 mail sm-mta[36722]: r0G3Z2SF036722: lost input channel from [190.107.98.82] to IPv4 after rcpt
    Jan 15 22:35:04 mail sm-mta[36722]: r0G3Z2SF036722: from=<[email protected]>, size=0, class=0, nrcpts=0, proto=SMTP, daemon=IPv4, relay=[190.107.98.82]
    Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: lost input channel from ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged) to IPv4 after rcpt
    Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: from=<[email protected]>, size=0, class=0, nrcpts=0, proto=SMTP, daemon=IPv4, relay=ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged)
    

    I use a script like this:

    #!/usr/bin/awk -f
    BEGIN {
      search=ARGV[1];  # Grab the first command line option
      delete ARGV[1];  # Delete it so it won't be considered a file
    }
    # First, store every line in an array keyed on the Queue ID.
    # Obviously, this only works for smallish log segments, as it uses up memory.
    {
      line[$6]=sprintf("%s\n%s", line[$6], $0);
    }
    # Next, keep a record of Queue IDs with substrings that match our search string.
    index($0, search) {
      show[$6];
    }
    # Finally, once we've processed all input data, walk through our array of "found"
    # Queue IDs, and print the corresponding records from the storage array.
    END {
      for(qid in show) {
        print line[qid];
      }
    }
    

    to get the following output:

    $ mqsearch airtel /var/log/maillog
    Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: lost input channel from ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged) to IPv4 after rcpt
    Jan 15 22:35:36 mail sm-mta[36728]: r0G3ZXiX036728: from=<[email protected]>, size=0, class=0, nrcpts=0, proto=SMTP, daemon=IPv4, relay=ABTS-TN-dynamic-237.104.174.122.airtelbroadband.in [122.174.104.237] (may be forged)
    

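    The idea here is that I print all lines that share a Sendmail queue ID with any line containing the string I'm searching for. The structure of the code is of course a product of the structure of the log file, so you'll need to customize your solution for the data you're trying to analyze and extract.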

  • 5 months ago
    5 #

    `pcregrep -M` works pretty well for this.
    

    From pcregrep(1):

    -M, --multiline

    Allow patterns to match more than one line. When this option is given, patterns may usefully contain literal newline characters and internal occurrences of ^ and $ characters. The output for a successful match may consist of more than one line, the last of which is the one in which the match ended. If the matched string ends with a newline sequence the output ends at the end of that line.

    When this option is set, the PCRE library is called in “multiline” mode. There is a limit to the number of lines that can be matched, imposed by the way that pcregrep buffers the input file as it scans it. However, pcregrep ensures that at least 8K characters or the rest of the document (whichever is the shorter) are available for forward matching, and similarly the previous 8K characters (or all the previous characters, if fewer than 8K) are guaranteed to be available for lookbehind assertions. This option does not work when input is read line by line (see --line-buffered.)
