Class: Abnf::Tokenizer

Inherits:
Object
Defined in:
../lib/abnf_tokenizer.rb

Overview

Abnf::Tokenizer preprocesses ABNF grammar text into a stream of tokens for subsequent parsing by Abnf::Parser.

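Instance Method Summary

- (Tokenizer) initialize
  Prepare the regular-expression table of the tokenizer.
- (Object) tokenize(txt)
  Tokenize the string txt into an Array of Mapper::Token structures.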

Constructor Details

- (Tokenizer) initialize

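Prepare the table of regular expressions (@rex) that drives the tokenizer. Each entry pairs a pattern anchored at the start of the remaining input (\A) with either a token type or a nested list of entries, which is used to split the captured text into finer tokens (for comments and for %b, %d and %x numeric values).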

# File '../lib/abnf_tokenizer.rb', line 10

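def initialize
  @rex = [
    # Line structure and whitespace.
    [ /\A\r?\n/m, :newline ],

    [ /\A[ \t\f]+/m, :space ],

    # A ';' comment up to the end of the line; the captured text is
    # split further into :comment and :newline tokens.
    [ /\A(;[^\r\n]*\r?\n)/m, [ 
                         [ /\A;([^\r\n]*)/m, :comment ], 
                         [ /\A\r?\n/m,   :newline ] 
                       ]
    ],

    # RFC 5234 core rules. CRLF is listed before CR so that the longer
    # name wins, and all core rules precede the generic :symbol pattern.
    [ /\AALPHA/m, :_alpha ],
    [ /\ABIT/m, :_bit ],
    [ /\AVCHAR/m, :_vchar ],       
    [ /\ACHAR/m, :_char ],
    [ /\ACRLF/m, :_crlf ],
    [ /\ACR/m, :_cr ],
    [ /\ALF/m, :_lf ],
    [ /\ACTL/m, :_ctl ],
    [ /\ADIGIT/m, :_digit ],
    [ /\ADQUOTE/m, :_dquote ],
    [ /\AHEXDIG/m, :_hexdig ],
    [ /\AHTAB/m, :_htab ],
    [ /\ALWSP/m, :_lwsp ],
    [ /\AWSP/m, :_wsp ],
    [ /\ASP/m, :_sp ],
    [ /\AOCTET/m, :_octet ],

    # Rule names, bare or enclosed in angle brackets.
    [ /\A([A-Za-z][\w\-]*)/m, :symbol ],
    [ /\A<([A-Za-z][\w\-]*)>/m, :symbol ],

    # Quoted literal; the text between the quotes is captured.
    [ /\A"([^"]*?)"/m, :literal ],

    # Numeric values in binary, decimal and hexadecimal. The digits after
    # %b, %d or %x are split into ranges (a-b), single values and the '.'
    # concatenation separator.
    [ /\A%b([01][01\-\.]*)/m, [
                               [ /\A([01]+-[01]+)/m, :range_bin ],
                               [ /\A([01]+)/m, :entity_bin ],
                               [ /\A\./m, :dot ]                            
                              ]
    ],

    [ /\A%d([\d][\d\-\.]*)/m, [
                               [ /\A(\d+-\d+)/m, :range_dec ], 
                               [ /\A(\d+)/m, :entity_dec ],
                               [ /\A\./m, :dot ]                            
                              ]
    ],

    [ /\A%x([a-fA-F\d][a-fA-F\d\-\.]*)/m, [
                                           [ /\A([a-fA-F\d]+-[a-fA-F\d]+)/m, :range_hex ],
                                           [ /\A([a-fA-F\d]+)/m, :entity_hex ],
                                           [ /\A\./m, :dot ]                           
                                          ]
    ],
 
    # Repetition counts.
    [ /\A(\d+)/m, :number ],

    [ /\A\*/m, :asterisk ],

    # '=/' (incremental alternative) must be tried before '='.
    [ /\A=\//m, :eq_slash ],
    [ /\A=/m, :equals ],

    # Grouping, optional elements and alternation.
    [ /\A\(/m, :seq_begin ],
    [ /\A\)/m, :seq_end ],

    [ /\A\[/m, :opt_begin ],
    [ /\A\]/m, :opt_end ],
   
    [ /\A\//m, :slash ]

  ]
end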
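
The table is consumed by the private tokenize_internal method, whose source is not shown on this page. Assuming it behaves like a conventional first-match scanner (try each pattern in table order at the current position, emit the first hit, then continue after the match), the following self-contained sketch shows why the order of @rex matters: CRLF has to precede CR, and =/ has to precede =. Tok, RULES and first_match_tokenize are hypothetical names used only for illustration; Tok stands in for Mapper::Token.

```ruby
# Hypothetical token record; the real Mapper::Token is not shown on this page.
Tok = Struct.new(:type, :value)

# A small subset of entries in the same [pattern, type] shape as @rex.
RULES = [
  [/\ACRLF/, :_crlf],
  [/\ACR/, :_cr],
  [/\A=\//, :eq_slash],
  [/\A=/, :equals],
  [/\A[ \t]+/, :space],
  [/\A([A-Za-z][\w\-]*)/, :symbol]
].freeze

# Hypothetical first-match scanner: at each position the first pattern
# in table order that matches wins, and scanning resumes after the match.
def first_match_tokenize(txt, rules)
  tokens = []
  rest = txt
  until rest.empty?
    pattern, type = rules.find { |re, _| re.match?(rest) }
    raise ArgumentError, "unexpected input: #{rest[0, 10].inspect}" unless pattern

    m = pattern.match(rest)
    # Keep the first capture group if there is one, otherwise the whole match.
    tokens << Tok.new(type, m[1] || m[0])
    rest = m.post_match
  end
  tokens
end

p first_match_tokenize("r =/ CRLF", RULES).map(&:type)
# => [:symbol, :space, :eq_slash, :space, :_crlf]

# With CR listed before CRLF, "CRLF" is split into CR plus a symbol "LF".
swapped = [RULES[1], RULES[0], *RULES[2..-1]]
p first_match_tokenize("CRLF", swapped).map(&:type)
# => [:_cr, :symbol]
```

The same reasoning explains why every core-rule entry sits above the generic :symbol pattern: otherwise a name such as ALPHA would be reported as an ordinary rule name.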

Instance Method Details

- (Object) tokenize(txt)

Tokenize the string txt by matching it against the rule table built in the constructor. Returns an Array of Mapper::Token structures whose last element is an :eof token.

# File '../lib/abnf_tokenizer.rb', line 85

def tokenize( txt )
  tokens = tokenize_internal( txt, @rex )
  tokens.push Mapper::Token.new( :eof )    
end
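
tokenize delegates the real work to tokenize_internal, which is not listed here. The nested entries in @rex (the comment rule and the %b, %d and %x rules) suggest that when an entry's second element is itself a rule list, the captured text is tokenized again with that list. The sketch below assumes exactly that behaviour and then appends :eof the way tokenize does. Tok, HEX_RULES, RULES and tokenize_rules are hypothetical names, and the top-level table is a reduced subset of @rex.

```ruby
# Hypothetical token record standing in for Mapper::Token.
Tok = Struct.new(:type, :value)

# Nested list in the style of the %x entry of @rex.
HEX_RULES = [
  [/\A([a-fA-F\d]+-[a-fA-F\d]+)/, :range_hex],
  [/\A([a-fA-F\d]+)/, :entity_hex],
  [/\A\./, :dot]
].freeze

# Reduced top-level table; each entry's second element is either a
# token type or a nested rule list.
RULES = [
  [/\A[ \t]+/, :space],
  [/\A%x([a-fA-F\d][a-fA-F\d\-\.]*)/, HEX_RULES],
  [/\A([A-Za-z][\w\-]*)/, :symbol],
  [/\A=/, :equals]
].freeze

# Assumed behaviour of tokenize_internal: first match wins; when the
# action is a nested list, the captured text is tokenized again with it.
def tokenize_rules(txt, rules)
  tokens = []
  rest = txt
  until rest.empty?
    pattern, action = rules.find { |re, _| re.match?(rest) }
    raise ArgumentError, "unexpected input: #{rest[0, 10].inspect}" unless pattern

    m = pattern.match(rest)
    text = m[1] || m[0]
    if action.is_a?(Array)
      tokens.concat(tokenize_rules(text, action))
    else
      tokens << Tok.new(action, text)
    end
    rest = m.post_match
  end
  tokens
end

# Mirrors tokenize above: scan the input, then terminate the stream with :eof.
def tokenize(txt)
  tokens = tokenize_rules(txt, RULES)
  tokens.push Tok.new(:eof)
end

p tokenize("v = %x41-5A.61").map(&:type)
# => [:symbol, :space, :equals, :space, :range_hex, :dot, :entity_hex, :eof]
```

Because Array#push returns the receiver, the last expression of tokenize is the complete token array, which is why no explicit return is needed.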