
ABNF Parser - Linphone.org mirror for belr (git://git.linphone.org/belr.git)



BelledonneCommunications/belr


What is Belr

Belr is Belledonne Communications' language recognition library, written in C++11. It parses text inputs formatted according to a language defined by an ABNF grammar, such as the protocols standardized by the IETF.

It drastically simplifies writing a parser, provided that the parsed language is defined by an ABNF grammar[1]. The parser automaton is generated automatically by the belr library, in memory, from the ABNF grammar text. The application developer is responsible for connecting belr's parser to their custom code through callbacks, in order to be notified of recognized language elements.

It is based on finite state machine theory and, from an implementation standpoint, relies heavily on recursion.
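To make the idea of a parser assembled in memory from grammar rules more concrete, here is a minimal sketch of a recursive rule matcher built at runtime. It is a hypothetical illustration, not belr's internal design: Rule, literal, charIf, sequence, alternation, repeat and makeToyUriRule are names invented for this example, and the toy URI rule is far simpler than the real SIP-URI rule.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch, NOT belr's internals: each rule is a function object
// built at runtime. It tries to match `in` starting at `pos` and returns the
// position just after the match, or -1 if the rule does not match there.
using Rule = std::function<long(const std::string &, long)>;

// Matches an exact string, like "sip:" in ABNF.
Rule literal(const std::string &lit) {
    return [lit](const std::string &in, long pos) -> long {
        return in.compare(static_cast<std::size_t>(pos), lit.size(), lit) == 0
                   ? pos + static_cast<long>(lit.size())
                   : -1;
    };
}

// Matches one character accepted by `pred`, like ALPHA or DIGIT.
Rule charIf(std::function<bool(char)> pred) {
    return [pred](const std::string &in, long pos) -> long {
        return pos < static_cast<long>(in.size()) && pred(in[static_cast<std::size_t>(pos)])
                   ? pos + 1
                   : -1;
    };
}

// Concatenation: every part must match, one after the other.
Rule sequence(std::vector<Rule> parts) {
    return [parts](const std::string &in, long pos) -> long {
        for (const Rule &part : parts) {
            pos = part(in, pos); // recursive descent into the sub-rule
            if (pos < 0) return -1;
        }
        return pos;
    };
}

// Alternation ("/"): the first alternative that matches wins.
Rule alternation(std::vector<Rule> alts) {
    return [alts](const std::string &in, long pos) -> long {
        for (const Rule &alt : alts) {
            const long end = alt(in, pos);
            if (end >= 0) return end;
        }
        return -1;
    };
}

// Repetition ("minCount*"): matches `r` as many times as possible (greedy).
Rule repeat(Rule r, int minCount) {
    return [r, minCount](const std::string &in, long pos) -> long {
        int count = 0;
        for (;;) {
            const long end = r(in, pos);
            if (end < 0 || end == pos) break; // stop on failure or empty match
            pos = end;
            ++count;
        }
        return count >= minCount ? pos : -1;
    };
}

// Toy rule set, loosely shaped like a SIP URI (not the RFC 3261 rule):
//   uri = "sip:" 1*alnum "@" 1*(alnum / ".")
Rule makeToyUriRule() {
    const Rule alnum = charIf([](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; });
    const Rule user = repeat(alnum, 1);
    const Rule host = repeat(alternation({alnum, literal(".")}), 1);
    return sequence({literal("sip:"), user, literal("@"), host});
}

// Usage check: a full match consumes the whole input.
void demoToyUri() {
    const Rule uri = makeToyUriRule();
    const std::string input = "sip:alice@example.org";
    assert(uri(input, 0) == static_cast<long>(input.size()));
    assert(uri("sip:@example.org", 0) == -1); // user part needs at least one character
}
```

Note that alternation picks the first alternative that matches and repeat is greedy; this first-match style is also what makes ambiguous rules tricky, as discussed under Limitations.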

The benefits of using belr are:

  • belr is safe: there is no hand-written parsing code, hence no buffer overflows and no mistakes in interpreting the ABNF.
  • belr saves time: a lot of human effort is eliminated because the parser is generated automatically.
  • belr saves space: belr does not generate source code files that you would have to add to your build process; the parser automaton is created at runtime, in memory.
  • belr is fast: it was around 50% faster than antlr/antlr3c at parsing SIP URIs.
  • belr is flexible: you are free to design your parser API as you want, and simply connect the parser automaton with your API.

License

Copyright © Belledonne Communications SARL, all rights reserved.

Belr is dual licensed:

  • under a GNU GPLv3 license for free (see LICENSE.txt file for details)
  • under a proprietary license, for closed source projects. Contact [email protected] for costs and other service information.

How it works

Let's take a very basic example. Your application first needs to create a Grammar object from a text file containing the ABNF grammar description:

ABNFGrammarBuilder builder;
// The grammar is constructed from sipgrammar.txt file, plus an additional built-in grammar called 'CoreRules',
// which is used by almost every grammar.
shared_ptr<Grammar> grammar=builder.createFromAbnfFile("sipgrammar.txt", make_shared<CoreRules>());
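The CoreRules grammar mentioned in the comment corresponds to the basic rules that ABNF itself defines in RFC 5234, Appendix B.1 (ALPHA, DIGIT, HEXDIG, WSP, VCHAR, and so on), which most IETF grammars reference without redefining. As an illustration only, and not how belr implements them, here are a few of those rules written as C++11 constexpr predicates, assuming an ASCII execution character set:

```cpp
// Illustration only: a few RFC 5234 core rules as character predicates.
// Assumes an ASCII execution character set.

// ALPHA = %x41-5A / %x61-7A  ; A-Z / a-z
constexpr bool isAlpha(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

// DIGIT = %x30-39  ; 0-9
constexpr bool isDigit(char c) { return c >= '0' && c <= '9'; }

// HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
// ABNF quoted strings are case-insensitive, so a-f match as well.
constexpr bool isHexDig(char c) {
    return isDigit(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

// WSP = SP / HTAB  ; space or horizontal tab
constexpr bool isWsp(char c) { return c == ' ' || c == '\t'; }

// VCHAR = %x21-7E  ; visible (printing) characters
constexpr bool isVchar(char c) { return c >= 0x21 && c <= 0x7E; }

// Compile-time checks of the definitions above.
static_assert(isHexDig('f') && isHexDig('F') && !isHexDig('g'), "HEXDIG is case-insensitive");
static_assert(isWsp('\t') && !isWsp('\n'), "WSP is SP or HTAB only");
static_assert(!isVchar(' ') && isVchar('~'), "VCHAR excludes the space character");
```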

Then, from the returned grammar object, instantiate a parser object, giving belr, as a template argument, the base class you have defined to represent any element of the language. In the example below, it is called SipElement. The parser object can be used as many times as needed: there is no need to re-instantiate it each time you parse a new input!

Parser<shared_ptr<SipElement>> parser(grammar);

Now, you have to connect the parser with your own classes in order to have language elements filled into your objects.

parser.setHandler("SIP-URI", make_fn(&SipUri::create)) //tells that whenever a SIP-URI is found, a SipUri object must be created.
		->setCollector("user", make_sfn(&SipUri::setUsername)) //tells that when a "user" field is found, SipUri::setUsername() is to be called for assigning the "user"
		->setCollector("host", make_sfn(&SipUri::setHost)) //tells that when host is encountered, use SipUri::setHost() to assign it to our SipUri object.
		->setCollector("port", make_sfn(&SipUri::setPort));

Here, we have instructed our belr parser to invoke SipUri::create() each time it recognizes a SIP-URI. This method must simply return a new SipUri instance. We also told it that each time the user part of a SIP URI is recognized, the SipUri::setUsername(const std::string& user) method must be called to store the recognized user part in the created SipUri instance. Similarly, the host part is assigned with SipUri::setHost() and the port part with SipUri::setPort().
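The handler/collector mechanism can be pictured as a table of callbacks keyed by rule name. The sketch below is a hypothetical, self-contained stand-in, not belr's API: ToyHandler, build() and the minimal SipElement/SipUri structs are invented here, and build() is fed a ready-made map of recognized sub-elements instead of actually parsing text.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Minimal stand-ins for the application's own classes (hypothetical).
struct SipElement { virtual ~SipElement() = default; };
struct SipUri : SipElement {
    std::string user, host;
    int port = 0;
};

// Hypothetical handler: creates an object when its rule is recognized,
// then routes recognized sub-elements to the registered setters.
template <typename T>
class ToyHandler {
public:
    explicit ToyHandler(std::function<std::shared_ptr<T>()> create) : create_(std::move(create)) {}

    // Registers the setter to call when sub-rule `name` is recognized.
    ToyHandler &setCollector(const std::string &name,
                             std::function<void(T &, const std::string &)> setter) {
        collectors_[name] = std::move(setter);
        return *this;
    }

    // Simulates the parser: create the object, then pass each
    // (sub-rule name, matched text) pair to its collector, if any.
    std::shared_ptr<SipElement> build(const std::map<std::string, std::string> &recognized) const {
        std::shared_ptr<T> obj = create_();
        for (const auto &item : recognized) {
            auto it = collectors_.find(item.first);
            if (it != collectors_.end()) it->second(*obj, item.second);
        }
        return obj;
    }

private:
    std::function<std::shared_ptr<T>()> create_;
    std::map<std::string, std::function<void(T &, const std::string &)>> collectors_;
};

ToyHandler<SipUri> makeSipUriHandler() {
    ToyHandler<SipUri> h([] { return std::make_shared<SipUri>(); });
    h.setCollector("user", [](SipUri &u, const std::string &v) { u.user = v; })
        .setCollector("host", [](SipUri &u, const std::string &v) { u.host = v; })
        .setCollector("port", [](SipUri &u, const std::string &v) { u.port = std::stoi(v); });
    return h;
}

// Usage check, mirroring the dynamic_pointer_cast step of the real example.
void demoSipUriHandler() {
    const ToyHandler<SipUri> handler = makeSipUriHandler();
    std::shared_ptr<SipElement> elem =
        handler.build({{"user", "alice"}, {"host", "example.org"}, {"port", "5060"}});
    std::shared_ptr<SipUri> uri = std::dynamic_pointer_cast<SipUri>(elem);
    assert(uri && uri->user == "alice" && uri->host == "example.org" && uri->port == 5060);
}
```

In the belr snippet above, make_fn() and make_sfn() wrap SipUri's own functions; in this sketch, plain lambdas play that role.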

Finally, you can now parse a SIP-URI:

size_t parsedSize;
string inputToParse = "sip:[email protected]";
shared_ptr<SipElement> ret = parser.parseInput("SIP-URI", inputToParse , &parsedSize);
//if the sip uri is recognized, the return value is non null and you can cast it into a SipUri object.
if (ret){
	shared_ptr<SipUri> sipUri = dynamic_pointer_cast<SipUri>(ret);
	// Do what you want with the SipUri object...
}

The full example is in tools/belr-demo.cc.

One last thing to know: creating a Grammar from a text file requires a lot of computation, which can slow down the startup of your application. Fortunately, there is a solution: use the belr-compiler tool to generate a binary representation of the grammar, save it to disk, and include it as a resource in your application. You can think of the binary grammar as a kind of byte-code representing the language automaton. Your application then simply instantiates the Grammar object by loading it from disk, which is about a hundred times faster.
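The idea behind belr-compiler, paying the construction cost once and then loading a ready-made automaton from bytes, can be sketched with a much simpler automaton. This example is hypothetical and does not reproduce belr's binary format or API: Dfa, buildDigitsDfa, saveDfa and loadDfa are invented names, and the automaton is a plain deterministic transition table for 1*DIGIT rather than a grammar compiled from ABNF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, NOT belr's binary grammar format.
constexpr std::uint8_t kDead = 0xFF; // "no transition" marker

// A tiny DFA over bytes: transitions[state][byte] gives the next state.
struct Dfa {
    std::vector<std::vector<std::uint8_t>> transitions; // one 256-entry row per state
    std::vector<bool> accepting;                       // accepting[state]

    bool accepts(const std::string &input) const {
        std::uint8_t state = 0;
        for (unsigned char c : input) {
            state = transitions[state][c];
            if (state == kDead) return false;
        }
        return accepting[state];
    }
};

// Stand-in for the expensive step done once ahead of time: a hand-built
// DFA for 1*DIGIT. State 0 = start, state 1 = at least one digit (accepting).
Dfa buildDigitsDfa() {
    Dfa dfa;
    dfa.transitions.assign(2, std::vector<std::uint8_t>(256, kDead));
    for (int c = '0'; c <= '9'; ++c) {
        dfa.transitions[0][c] = 1;
        dfa.transitions[1][c] = 1;
    }
    dfa.accepting = {false, true};
    return dfa;
}

// Byte layout: state count, then for each state an accepting flag
// followed by its 256 transition bytes.
std::vector<std::uint8_t> saveDfa(const Dfa &dfa) {
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(dfa.accepting.size()));
    for (std::size_t s = 0; s < dfa.accepting.size(); ++s) {
        out.push_back(dfa.accepting[s] ? 1 : 0);
        out.insert(out.end(), dfa.transitions[s].begin(), dfa.transitions[s].end());
    }
    return out;
}

// Loading is one linear pass over the bytes, with basic validation.
Dfa loadDfa(const std::vector<std::uint8_t> &bytes) {
    if (bytes.empty()) throw std::runtime_error("empty DFA image");
    const std::size_t states = bytes[0];
    if (states == 0 || bytes.size() != 1 + states * 257) throw std::runtime_error("bad DFA image size");
    Dfa dfa;
    std::size_t i = 1;
    for (std::size_t s = 0; s < states; ++s) {
        dfa.accepting.push_back(bytes[i++] != 0);
        for (std::size_t k = 0; k < 256; ++k) {
            const std::uint8_t target = bytes[i + k];
            if (target != kDead && target >= states) throw std::runtime_error("bad transition target");
        }
        const auto first = bytes.begin() + static_cast<std::ptrdiff_t>(i);
        dfa.transitions.emplace_back(first, first + 256);
        i += 256;
    }
    return dfa;
}

// Usage check: save once (e.g. at build time), load at startup.
void demoDfaRoundTrip() {
    const std::vector<std::uint8_t> image = saveDfa(buildDigitsDfa());
    const Dfa loaded = loadDfa(image);
    assert(image.size() == 1 + 2 * 257);
    assert(loaded.accepts("5060") && !loaded.accepts("") && !loaded.accepts("50a"));
}
```

Loading involves none of the rule analysis needed to build the automaton in the first place, which illustrates why a precompiled grammar can start up much faster.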

Dependencies

  • bctoolbox[2]: our portability layer

Build Belr

cmake . -DCMAKE_INSTALL_PREFIX=<prefix> -DCMAKE_PREFIX_PATH=<search_prefixes>

make
make install

Limitations

Belr doesn't handle non-deterministic ABNF grammars. For example:

token       =  1*(alphanum / "-" / "." / "!" / "%" / "*")
my-element  =  token "!" 

The problem with this grammar is that "!" is both part of token and the terminating character of my-element. It is non-deterministic: when the automaton finds a "!", it can treat it either as part of token or as the last element of my-element.

Unfortunately, this kind of situation is not rare. Belr's current logic is to always include the "!" in token, because token is the first element of the sequence that matches it.

The solution would be for belr to explore both possibilities; however, this is not implemented as of today (2019-09-17). Most of the time, a workaround exists: rewrite the problematic grammar rule so that the ambiguity disappears.
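The difference between first-match behavior and the "explore both possibilities" strategy can be shown with a small standalone simulation of the example grammar. This is not belr code: isTokenChar, matchMyElementGreedy and matchMyElementBacktracking are hypothetical helpers, alphanum is approximated with std::isalnum in the default C locale, and matching works directly on characters rather than through a generated automaton.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// token = 1*(alphanum / "-" / "." / "!" / "%" / "*")
bool isTokenChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '.' ||
           c == '!' || c == '%' || c == '*';
}

// First-match (greedy) behavior: token swallows every token character,
// including the final "!", so my-element = token "!" cannot find its "!".
bool matchMyElementGreedy(const std::string &in) {
    std::size_t end = 0;
    while (end < in.size() && isTokenChar(in[end])) ++end;
    if (end == 0) return false; // token needs at least one character
    return end < in.size() && in[end] == '!' && end + 1 == in.size();
}

// Backtracking alternative: give characters back from token, one at a time,
// until the rest of the sequence ("!" then end of input) matches.
bool matchMyElementBacktracking(const std::string &in) {
    std::size_t maxEnd = 0;
    while (maxEnd < in.size() && isTokenChar(in[maxEnd])) ++maxEnd;
    for (std::size_t end = maxEnd; end >= 1; --end) {
        if (end < in.size() && in[end] == '!' && end + 1 == in.size()) return true;
    }
    return false;
}

// Usage check: "abc!" should be token = "abc" followed by "!".
void demoAmbiguity() {
    assert(!matchMyElementGreedy("abc!"));      // token swallowed the "!"
    assert(matchMyElementBacktracking("abc!")); // token gives the "!" back
}
```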

Build options

  • CMAKE_INSTALL_PREFIX=<string>: install prefix
  • CMAKE_PREFIX_PATH=<string>: semicolon-separated list of prefixes in which to search for dependencies
  • ENABLE_STRICT=NO: build without strict compilation flags (-Wall -Werror)
  • ENABLE_TOOLS=NO: do not build tools (belr-demo, belr-parse)

Note for packagers

Our CMake scripts may automatically add some paths to the runtime search paths (rpath) of generated binaries. To ensure that the installed binaries are stripped of any rpath, pass -DCMAKE_SKIP_INSTALL_RPATH=ON when invoking cmake.

An RPM package of belr can be generated with cmake3 using the following commands:

mkdir WORK
cd WORK
cmake3 ../
make package_source
rpmbuild -ta --clean --rmsource --rmspec belr--.tar.gz