Class: Abnf::Tokenizer

Inherits:

Object

Defined in:

../lib/abnf_tokenizer.rb

Overview

Abnf::Tokenizer preprocesses ABNF source text into tokens for subsequent parsing by Abnf::Parser.

Instance Method Summary

- (Tokenizer) initialize constructor
Prepare the Regexp rule table of the tokenizer.
- (Object) tokenize(txt)
Tokenize the input string into an Array of Mapper::Token structures.

Constructor Details

- (`Tokenizer`) initialize

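Prepare the Regexp rule table of the tokenizer. Each entry pairs a pattern anchored at the start of the input with either a token type or a nested rule table that is applied to the captured text.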

# File '../lib/abnf_tokenizer.rb', line 10

def initialize
  @rex = [
    [ /\A\r?\n/m, :newline ],

    [ /\A[ \t\f]+/m, :space ],

    [ /\A(;[^\r\n]*\r?\n)/m, [ 
                         [ /\A;([^\r\n]*)/m, :comment ], 
                         [ /\A\r?\n/m,   :newline ] 
                       ]
    ],

    [ /\AALPHA/m, :_alpha ],
    [ /\ABIT/m, :_bit ],
    [ /\AVCHAR/m, :_vchar ],       
    [ /\ACHAR/m, :_char ],
    [ /\ACRLF/m, :_crlf ],
    [ /\ACR/m, :_cr ],
    [ /\ALF/m, :_lf ],
    [ /\ACTL/m, :_ctl ],
    [ /\ADIGIT/m, :_digit ],
    [ /\ADQUOTE/m, :_dquote ],
    [ /\AHEXDIG/m, :_hexdig ],
    [ /\AHTAB/m, :_htab ],
    [ /\ALWSP/m, :_lwsp ],
    [ /\AWSP/m, :_wsp ],
    [ /\ASP/m, :_sp ],
    [ /\AOCTET/m, :_octet ],

    [ /\A([A-Za-z][\w\-]*)/m, :symbol ],
    [ /\A<([A-Za-z][\w\-]*)>/m, :symbol ],

    [ /\A"([^"]*?)"/m, :literal ],

    [ /\A%b([01][01\-\.]*)/m, [
                               [ /\A([01]+-[01]+)/m, :range_bin ],
                               [ /\A([01]+)/m, :entity_bin ],
                               [ /\A\./m, :dot ]                            
                              ]
    ],

    [ /\A%d([\d][\d\-\.]*)/m, [
                               [ /\A(\d+-\d+)/m, :range_dec ], 
                               [ /\A(\d+)/m, :entity_dec ],
                               [ /\A\./m, :dot ]                            
                              ]
    ],

    [ /\A%x([a-fA-F\d][a-fA-F\d\-\.]*)/m, [
                                           [ /\A([a-fA-F\d]+-[a-fA-F\d]+)/m, :range_hex ],
                                           [ /\A([a-fA-F\d]+)/m, :entity_hex ],
                                           [ /\A\./m, :dot ]                           
                                          ]
    ],
 
    [ /\A(\d+)/m, :number ],

    [ /\A\*/m, :asterisk ],

    [ /\A=\//m, :eq_slash ],
    [ /\A=/m, :equals ],

    [ /\A\(/m, :seq_begin ],
    [ /\A\)/m, :seq_end ],

    [ /\A\[/m, :opt_begin ],
    [ /\A\]/m, :opt_end ],
   
    [ /\A\//m, :slash ]

  ]
end
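The order of the entries in this table looks significant. The matching loop is not shown on this page, so the following is only a sketch under the assumption that rules are tried from top to bottom and the first match wins. The helper `first_match` and the two small tables are hypothetical and exist only for illustration; they show why `CRLF` would have to precede `CR`, and `=/` precede `=`.

```ruby
# Hypothetical helper, not part of Abnf::Tokenizer: returns the tag of the
# first rule whose pattern matches at the start of the text, or nil.
def first_match(rules, text)
  rules.each do |pattern, tag|
    return tag if pattern.match(text)
  end
  nil
end

# Same relative order as in the constructor: longer alternatives first.
ordered = [
  [/\ACRLF/, :_crlf],
  [/\ACR/,   :_cr],
  [/\A=\//,  :eq_slash],
  [/\A=/,    :equals]
]

# Shorter alternatives first: they shadow the longer ones.
reversed = [
  [/\ACR/,   :_cr],
  [/\ACRLF/, :_crlf],
  [/\A=/,    :equals],
  [/\A=\//,  :eq_slash]
]

first_match(ordered,  "CRLF")  # => :_crlf
first_match(reversed, "CRLF")  # => :_cr
first_match(ordered,  "=/ x")  # => :eq_slash
first_match(reversed, "=/ x")  # => :equals
```

Under the same assumption, the pairs `VCHAR`/`CHAR`, `LWSP`/`WSP` and `WSP`/`SP` in the constructor follow the same longest-first ordering.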

Instance Method Details

- (`Object`) tokenize(txt)

Tokenize the input. The string txt is processed into a stream of tokens, and the return value is an Array of Mapper::Token structures terminated by an :eof token.

# File '../lib/abnf_tokenizer.rb', line 85

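def tokenize( txt )
  # Run the rule table built in initialize over the whole input.
  tokens = tokenize_internal( txt, @rex )
  # Append the :eof sentinel; Array#push returns the array itself,
  # so the complete token list is the return value.
  tokens.push Mapper::Token.new( :eof )
end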
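The source of `tokenize_internal` is not shown on this page. The following is a minimal sketch of how such a method might walk a rule table with nested entries, not the library's actual implementation. `Token` stands in for `Mapper::Token`, and `tokenize_sketch` and `RULES` are hypothetical names; whether the real tokenizer keeps or drops `:space` tokens is also an assumption here.

```ruby
# Stand-in for Mapper::Token, whose real definition is not shown here.
Token = Struct.new(:type, :value)

# Hypothetical counterpart of tokenize_internal: first matching rule wins,
# and a nested rule table is applied recursively to the captured text.
def tokenize_sketch(text, rules)
  tokens = []
  until text.empty?
    pattern, action = rules.find { |rex, _| rex.match(text) }
    raise ArgumentError, "no rule matches #{text[0, 10].inspect}" unless pattern

    m = pattern.match(text)
    if action.is_a?(Array)
      # Nested table: re-tokenize only the captured part with the sub-rules.
      tokens.concat(tokenize_sketch(m[1], action))
    else
      # Use the capture group when the pattern has one, else the whole match.
      tokens << Token.new(action, m[1] || m[0])
    end
    text = m.post_match
  end
  tokens
end

# A small excerpt of the rule table from the constructor.
RULES = [
  [/\A[ \t\f]+/, :space],
  [/\A%x([a-fA-F\d][a-fA-F\d\-\.]*)/, [
    [/\A([a-fA-F\d]+-[a-fA-F\d]+)/, :range_hex],
    [/\A([a-fA-F\d]+)/, :entity_hex],
    [/\A\./, :dot]
  ]],
  [/\A=/, :equals]
]

tokens = tokenize_sketch("%x30-39 = %x0D.0A", RULES)
tokens << Token.new(:eof)  # mirrors the :eof sentinel added by tokenize
p tokens.map(&:type)
# => [:range_hex, :space, :equals, :space, :entity_hex, :dot, :entity_hex, :eof]
```

In this sketch the `%x` rule captures the whole numeric value first, and the sub-rules then split it into ranges, single entities, and dots, which matches the shape of the nested tables in the constructor.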