NAME
    Hailo - A pluggable Markov engine analogous to MegaHAL

SYNOPSIS
    This is the synopsis for using Hailo as a module. See hailo for
    command-line invocation.

        # Hailo requires Perl 5.10
        use 5.010;
        use Any::Moose;
        use Hailo;

        # Construct a new in-memory Hailo using the SQLite backend. See
        # backend documentation for other options.
        my $hailo = Hailo->new;

        # Various ways to learn
        my @train_this = ("I like big butts", "and I can not lie");
        $hailo->learn(\@train_this);
        $hailo->learn($_) for @train_this;

        # Heavy-duty training interface. Backends may drop some safety
        # features like journals or synchronous IO to train faster using
        # this mode.
        $hailo->train("megahal.trn");
        $hailo->train($filehandle);

        # Make the brain babble
        say $hailo->reply("hello good sir.");
        # Just say something at random
        say $hailo->reply();

DESCRIPTION
    Hailo is a fast and lightweight Markov engine intended to replace
    AI::MegaHAL. It has a Mouse (or Moose) based core with pluggable
    storage, tokenizer and engine backends.

    It is similar to MegaHAL in functionality. The main differences (with
    the default backends) are better scalability, drastically less memory
    usage, an improved tokenizer, and tidier output.

    With this distribution, you can create, modify, and query Hailo brains.
    To use Hailo in event-driven POE applications, you can use the
    POE::Component::Hailo wrapper. One example is
    POE::Component::IRC::Plugin::Hailo, which implements an IRC chat bot.

  Etymology
    *Hailo* is a portmanteau of *HAL* (as in MegaHAL) and failo
    <http://identi.ca/failo>.

Backends
    Hailo supports pluggable storage and tokenizer backends. It also
    supports a pluggable UI backend, which is used by the hailo
    command-line utility.

  Storage
    Hailo can currently store its data in a SQLite, PostgreSQL or MySQL
    database. More backends were supported in earlier versions, but they
    were removed as they had no redeeming qualities.

    SQLite is the primary target for Hailo. It's much faster and uses fewer
    resources than the other two. It's highly recommended that you use it.

    This benchmark shows how the backends compare when training on the small
    testsuite dataset as reported by the utils/hailo-benchmark utility
    (found in the distribution):

                             Rate DBD::Pg DBD::mysql DBD::SQLite/file DBD::SQLite/memory
        DBD::Pg            2.22/s      --       -33%             -49%               -56%
        DBD::mysql         3.33/s     50%         --             -23%               -33%
        DBD::SQLite/file   4.35/s     96%        30%               --               -13%
        DBD::SQLite/memory 5.00/s    125%        50%              15%                 --
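
    The percentage columns follow from the rates in the first column: each
    cell expresses the row's rate relative to the column's rate. A minimal
    sketch of that arithmetic, where the helper name relative_percent is
    made up for illustration. It uses the rounded rates printed above, so a
    recomputed cell may differ from the table by a point.

```perl
use strict;
use warnings;

# Rates (iterations per second) copied from the table above.
my %rate = (
    'DBD::Pg'            => 2.22,
    'DBD::mysql'         => 3.33,
    'DBD::SQLite/file'   => 4.35,
    'DBD::SQLite/memory' => 5.00,
);

# Hypothetical helper: how much faster (positive) or slower (negative)
# $row_rate is than $col_rate, as a whole-number percentage.
sub relative_percent {
    my ($row_rate, $col_rate) = @_;
    return sprintf '%.0f', 100 * $row_rate / $col_rate - 100;
}

# Print every backend's rate relative to the slowest one, DBD::Pg.
for my $row (sort { $rate{$a} <=> $rate{$b} } keys %rate) {
    printf "%-18s %s%% relative to DBD::Pg\n",
        $row, relative_percent($rate{$row}, $rate{'DBD::Pg'});
}
```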

    Under real-world workloads SQLite is much faster than these results
    indicate since the time it takes to train/reply is relative to the
    existing database size. Here's how long it took to train on a 214,710
    line IRC log on a Linode 1080 with Hailo 0.18:

    *   SQLite

            real    8m38.285s
            user    8m30.831s
            sys     0m1.175s

    *   MySQL

            real    48m30.334s
            user    8m25.414s
            sys     4m38.175s

    *   PostgreSQL

            real    216m38.906s
            user    11m13.474s
            sys     4m35.509s
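
    To compare these runs directly, the wall-clock ("real") times can be
    converted to seconds and divided by the SQLite time; MySQL and
    PostgreSQL come out roughly 5.6 and 25 times slower. A small sketch of
    that conversion, where the helper name to_seconds is an assumption
    made for illustration:

```perl
use strict;
use warnings;

# Wall-clock ("real") times copied from the listings above.
my %real = (
    SQLite     => '8m38.285s',
    MySQL      => '48m30.334s',
    PostgreSQL => '216m38.906s',
);

# Hypothetical helper: convert time(1)-style "NmS.SSSs" into seconds.
sub to_seconds {
    my ($time) = @_;
    my ($min, $sec) = $time =~ /\A(\d+)m([\d.]+)s\z/
        or die "Unrecognized time format: $time";
    return $min * 60 + $sec;
}

my $lines  = 214_710;                      # size of the IRC log
my $sqlite = to_seconds($real{SQLite});    # baseline for the ratio

for my $backend (qw(SQLite MySQL PostgreSQL)) {
    my $seconds = to_seconds($real{$backend});
    printf "%-10s %9.1fs  %6.1f lines/s  %5.1fx SQLite\n",
        $backend, $seconds, $lines / $seconds, $seconds / $sqlite;
}
```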

    In the case of PostgreSQL it's actually much faster to first train with
    SQLite, dump that database and then import it with psql(1); see failo's
    README <http://github.com/hinrik/failo> for how to do that.

    However, replying with an existing database (benchmarked with
    utils/hailo-benchmark-replies) yields different results. SQLite can
    reply really quickly without being warmed up (which is the typical use
    case for chatbots), but once PostgreSQL and MySQL are warmed up they
    start replying faster.

    Here's a comparison of doing 10 replies:

                            Rate PostgreSQL MySQL SQLite-file SQLite-file-28MB SQLite-memory
        PostgreSQL        71.4/s         --  -14%        -14%             -29%          -50%
        MySQL             83.3/s        17%    --          0%             -17%          -42%
        SQLite-file       83.3/s        17%    0%          --             -17%          -42%
        SQLite-file-28MB 100.0/s        40%   20%         20%               --          -30%
        SQLite-memory      143/s       100%   71%         71%              43%            --

    In this test MySQL uses around 28MB of memory (using Debian's
    my-small.cnf) and PostgreSQL around 34MB. Plain SQLite uses 2MB of cache
    but it's also tested with 28MB of cache as well as with the entire
    database in memory.

    But doing 10,000 replies is very different:

                           Rate SQLite-file PostgreSQL SQLite-file-28MB MySQL SQLite-memory
        SQLite-file      85.1/s          --        -7%             -18%  -27%          -38%
        PostgreSQL       91.4/s          7%         --             -12%  -21%          -33%
        SQLite-file-28MB  103/s         21%        13%               --  -11%          -25%
        MySQL             116/s         37%        27%              13%    --          -15%
        SQLite-memory     137/s         61%        50%              33%   18%            --

    Once MySQL gets more memory (using Debian's my-large.cnf) and a chance
    to warm up, it starts yielding better results (I couldn't find out how
    to make PostgreSQL take as much memory as it wanted):

                       Rate         MySQL SQLite-memory
        MySQL         121/s            --          -12%
        SQLite-memory 138/s           14%            --

  Tokenizer
    By default Hailo will use the word tokenizer to split up input by
    whitespace, taking into account things like quotes, sentence terminators
    and more.

    There's also the character tokenizer. It's not generally useful for a
    conversation bot but can be used to e.g. generate new words given a list
    of existing words.
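
    The difference between the two granularities can be sketched in plain
    Perl. The two subs below are simplified stand-ins written for this
    illustration, not Hailo's tokenizers; the real word tokenizer also
    deals with quotes, sentence terminators and more.

```perl
use strict;
use warnings;

# Simplified stand-ins, NOT Hailo's tokenizers.
sub split_words { my ($line) = @_; return split ' ', $line }   # on whitespace
sub split_chars { my ($line) = @_; return split //,  $line }   # per character

my @words = split_words('hello good sir.');   # tokens are whole words
my @chars = split_chars('hello');             # tokens are single letters

printf "%d word tokens, %d character tokens\n",
    scalar @words, scalar @chars;
```

    A chain built over single characters recombines letters rather than
    words, which is why it can produce new words from a word list.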

UPGRADING
    Hailo makes no promises about brains generated with earlier versions
    being compatible with future versions, and due to the way Hailo works
    there's no practical way to make that promise. Learning in Hailo is
    lossy, so an accurate conversion is impossible.

    If you're maintaining a Hailo brain that you want to keep using, you
    should save the input you trained it on and re-train when you upgrade.
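
    One way to keep that input around is to append every line to a
    plain-text log before learning from it. The sketch below is only an
    illustration: the helper name learn_and_log and the file names are
    made up, and $brain can be any object with a learn method, such as a
    Hailo instance.

```perl
use strict;
use warnings;

# Hypothetical helper: record a line in a UTF-8 text log, then hand it
# to the brain, so the log can later be replayed into a new brain.
sub learn_and_log {
    my ($brain, $log_file, $line) = @_;
    open my $log, '>>:encoding(UTF-8)', $log_file
        or die "Can't append to $log_file: $!";
    print {$log} "$line\n";
    close $log or die "Can't close $log_file: $!";
    $brain->learn($line);
    return;
}
```

    After an upgrade the log can be replayed into a fresh brain with
    train, which accepts a filename and assumes UTF-8:

        my $hailo = Hailo->new(brain => 'new-brain.sqlite');
        $hailo->train('input.log');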

    Hailo is always going to lose information present in the input you give
    it. How input tokens get split up and saved to the storage backend
    depends on the version of the tokenizer being used and how that input
    gets saved to the database.

    For instance if an earlier version of Hailo tokenized "foo+bar" simply
    as "foo+bar" but a later version split that up into "foo", "+", "bar",
    then an input of "foo+bar are my favorite metasyntactic variables"
    wouldn't take into account the existing "foo+bar" string in the
    database.

    Tokenizer changes like this would cause the brains to accumulate garbage
    and would leave other parts in a state they wouldn't otherwise have
    gotten into.
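
    The mismatch can be reproduced with two toy tokenizers. Both subs are
    hypothetical and far simpler than anything Hailo ships; they only
    mimic the "foo+bar" change described above.

```perl
use strict;
use warnings;

# Toy "old" tokenizer: split on whitespace only.
sub tokenize_old { my ($line) = @_; return split ' ', $line }

# Toy "new" tokenizer: additionally split "+" off into its own token.
sub tokenize_new {
    my ($line) = @_;
    my @tokens;
    for my $word (split ' ', $line) {
        push @tokens, grep { length $_ } split /(\+)/, $word;
    }
    return @tokens;
}

# Pretend the brain was trained with the old tokenizer.
my %known = map { $_ => 1 } tokenize_old('foo+bar');

# The same text seen through the new tokenizer no longer matches it.
my @new_tokens =
    tokenize_new('foo+bar are my favorite metasyntactic variables');
my @matched = grep { $known{$_} } @new_tokens;

printf "new tokens: %s\n", join ', ', @new_tokens;
printf "tokens already in the brain: %d\n", scalar @matched;
```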

    There have been more drastic changes to the database format itself in
    the past.

    Having said all that, the database format and the tokenizer are
    relatively stable. At the time of writing 0.33 is the latest release and
    it's compatible with brains down to at least 0.17. If you're upgrading
    and there isn't a big notice about the storage format being incompatible
    in the Changes file, your old brains will probably work just fine.

ATTRIBUTES
  "brain"
    The name of the brain (file name, database name) to use as storage.
    There is no default. Whether this gets used at all depends on the
    storage backend; currently only SQLite uses it.

  "save_on_exit"
    A boolean value indicating whether Hailo should save its state before
    its object gets destroyed. This defaults to true and will simply call
    save at "DEMOLISH" time.

  "order"
    The Markov order (chain length) you want to use for an empty brain. The
    default is 2.
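
    As a generic illustration of what the order means (this is a textbook
    Markov-chain sketch, not Hailo's engine, and next_token_table is a
    made-up name): with order N, each run of N consecutive tokens is a
    context that predicts the token following it.

```perl
use strict;
use warnings;

# Generic illustration, NOT Hailo's engine: map every run of $order
# consecutive tokens to the list of tokens seen right after it.
sub next_token_table {
    my ($order, @tokens) = @_;
    my %next;
    for my $i (0 .. $#tokens - $order) {
        my $context = join ' ', @tokens[$i .. $i + $order - 1];
        push @{ $next{$context} }, $tokens[$i + $order];
    }
    return \%next;
}

my @tokens = split ' ', 'the cat sat on the mat and the cat slept';
my $table  = next_token_table(2, @tokens);

# With order 2 the context "the cat" was followed by "sat" and "slept".
print join(', ', @{ $table->{'the cat'} }), "\n";
```

    A higher order gives longer contexts, which tends to produce more
    coherent but less varied output for the same amount of training data.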

  "engine_class"
  "storage_class"
  "tokenizer_class"
  "ui_class"
    The short name of the class we use for the engine, storage,
    tokenizer or UI backends.

    By default this is Default for the engine, SQLite for storage, Words for
    the tokenizer and ReadLine for the UI. The UI backend is only used by
    the hailo command-line interface.

    You can only specify the short name of one of the packages Hailo itself
    ships with. If you need another class then just prefix the package with
    a plus (Catalyst style), e.g. "+My::Foreign::Tokenizer".
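
    The naming convention can be sketched as follows. The helper
    resolve_backend_class is hypothetical and does not reproduce Hailo's
    internal code; it also assumes that the bundled backends live under
    Hailo::Engine, Hailo::Storage, Hailo::Tokenizer and Hailo::UI.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the convention. $type is one of
# Engine, Storage, Tokenizer or UI (assumed namespaces).
sub resolve_backend_class {
    my ($type, $name) = @_;
    # A leading plus marks a full package name to use as-is.
    return $name if $name =~ s/\A\+//;
    # Otherwise the short name refers to a class shipped with Hailo.
    return "Hailo::${type}::${name}";
}

print resolve_backend_class('Tokenizer', 'Words'), "\n";
print resolve_backend_class('Tokenizer', '+My::Foreign::Tokenizer'), "\n";
```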

  "engine_args"
  "storage_args"
  "tokenizer_args"
  "ui_args"
    A "HashRef" of arguments for engine/storage/tokenizer/ui backends. See
    the documentation for the backends for what sort of arguments they
    accept.

METHODS
  "new"
    This is the constructor. It accepts the attributes specified in
    "ATTRIBUTES".

  "learn"
    Takes a string or an array reference of strings and learns from them.

  "train"
    Takes a filename, filehandle or array reference and learns from all its
    lines. If a filename is passed, the file is assumed to be UTF-8 encoded.
    Unlike "learn", this method sacrifices some safety (disables the
    database journal, fsyncs, etc) for speed while learning.

  "reply"
    Takes an optional line of text and generates a reply that might be
    relevant.

  "learn_reply"
    Takes a string argument, learns from it, and generates a reply that
    might be relevant. This is equivalent to calling learn followed by
    reply.

  "save"
    Tells the underlying storage backend to save its state. Any arguments
    to this method will be passed as-is to the backend.

  "stats"
    Takes no arguments. Returns the number of tokens, expressions, previous
    token links and next token links.
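
    A small sketch of consuming the four counts, assuming stats returns
    them as a list in the order given above. The helper name
    brain_summary is made up, and $brain can be any object with such a
    stats method, for example a Hailo instance.

```perl
use strict;
use warnings;

# Hypothetical helper: format the four counts returned by stats().
sub brain_summary {
    my ($brain) = @_;
    my ($tokens, $expressions, $prev_links, $next_links) = $brain->stats;
    return sprintf
        '%d tokens, %d expressions, %d previous links, %d next links',
        $tokens, $expressions, $prev_links, $next_links;
}
```

    With a real brain this could be combined with learn_reply:

        my $hailo = Hailo->new;
        say $hailo->learn_reply("hello good sir.");
        say brain_summary($hailo);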

SUPPORT
    You can join the IRC channel *#hailo* on FreeNode if you have questions.

BUGS
    Bugs, feature requests and other issues are tracked in Hailo's issue
    tracker on GitHub <http://github.com/hinrik/hailo/issues>.

SEE ALSO
    *   POE::Component::Hailo - A non-blocking POE wrapper around Hailo

    *   POE::Component::IRC::Plugin::Hailo - A Hailo IRC bot plugin

    *   <http://github.com/hinrik/failo> - Failo, an IRC bot that uses Hailo

    *   <http://github.com/bingos/gumbybrain> - GumbyBRAIN, a more famous
        IRC bot that uses Hailo

    *   Hailo::UI::Web - A Catalyst and jQuery powered web interface to
        Hailo available at hailo.nix.is <http://hailo.nix.is> and as
        hailo-ui-web <http://github.com/avar/hailo-ui-web> on GitHub
        <http://github.com>

    *   HALBot - Another Catalyst and Dojo powered web interface to Hailo
        available at bifurcat.es <http://bifurcat.es/> and as
        halbot-on-the-web
        <http://gitorious.org/halbot-on-the-web/halbot-on-the-web> at
        gitorious <http://gitorious.org>

    *   <http://github.com/pteichman/cobe> - cobe, a Python port of MegaHAL
        "inspired by the success of Hailo"

LINKS
    *   hailo.org <http://hailo.org> - Hailo's website

    *   <http://bit.ly/hailo_rewrite_of_megahal> - Hailo: A Perl rewrite of
        MegaHAL, a blog post about the motivation behind Hailo

    *   <http://blogs.perl.org/users/aevar_arnfjor_bjarmason/hailo/> - More
        blog posts about Hailo on Ævar Arnfjörð Bjarmason's blogs.perl.org
        <http://blogs.perl.org> blog

    *   Hailo on freshmeat <http://freshmeat.net/projects/hailo> and ohloh
        <https://www.ohloh.net/p/hailo>

AUTHORS
    Hinrik Örn Sigurðsson <hinrik.sig@gmail.com>

    Ævar Arnfjörð Bjarmason <avar@cpan.org>

LICENSE AND COPYRIGHT
    Copyright 2010 Hinrik Örn Sigurðsson and Ævar Arnfjörð Bjarmason
    <avar@cpan.org>

    This program is free software, you can redistribute it and/or modify it
    under the same terms as Perl itself.