How to parse Avaya CDR into MySQL database with Perl

This is a follow-up to my earlier post about setting up CDR processing with Kiwi Syslog Server. In that post, I discuss how to capture the data to a text file. But of course, if we are capturing CDR, we probably want to analyze it, right?

So this post is dedicated to capturing the data to a MySQL database. It is not for the faint-of-heart though. It requires a little knowledge of MySQL and willingness to read a tiny bit of perl. I am happy to help you set this up, though. Either via this blog as time allows, or I would love to work for you as an hourly contractor – and I love all things telecom!

If you don’t have a Linux server in your network that you can use for your own purposes, I highly recommend one. Any Linux distribution will do, and you probably have a VMWare infrastructure to support it. If you need to justify it, I can help – perhaps even this blog entry will help. Tell your boss you need to perform text analysis of your PBX data. This blog entry assumes you have Linux, but I suppose it will work on Windows with a LAMP stack (but good luck with local firewall rules in Windows).

So first. you must create a MySQL database to store the call records. Let me know if you need help doing this. Since I have to start somewhere with this post, I will assume you have access to a MySQL database. You just need to create a table to store the data. Here is the statement to do that:

drop table if exists cdr_syslog;
create table cdr_syslog (
 id integer auto_increment primary key,
 method varchar(8),
 reported_on datetime,
 deduced_starttime datetime,
 source_ip varchar(32),
 from_number varchar(24),
 to_number varchar(24),
 call_type varchar(12),
 call_time varchar(16),
 direction varchar(3),
 duration_ss integer,
 call_hash varchar(64),
 pbx_time_of_day_hh varchar(2),
 pbx_time_of_day_mm varchar(2),
 pbx_duration_h varchar(1),
 pbx_duration_mm varchar(2),
 pbx_duration_6s varchar(1),
 pbx_condition_code varchar(1),
 pbx_access_code_dialed varchar(4),
 pbx_access_code_used varchar(4),
 pbx_dialed_number varchar(23),
 pbx_calling_number varchar(14),
 pbx_account_code varchar(15),
 pbx_authorization_code varchar(7),
 pbx_frl varchar(1),
 pbx_incoming_ckt varchar(3),
 pbx_outgoing_ckt varchar(3),
 pbx_feature_flag varchar(1),
 pbx_attendant_console varchar(4),
 pbx_incoming_tac varchar(4),
 pbx_node_number varchar(2),
 pbx_ins varchar(5),
 pbx_ixc varchar(3),
 pbx_bcc varchar(1),
 pbx_ma_uui varchar(1),
 pbx_resource_flag varchar(1),
 pbx_packet_count varchar(5),
 source_line varchar(128),
 created_on datetime
);

alter table cdr_syslog add index i_call_hash(call_hash);
alter table cdr_syslog add index i_deduced_starttime(deduced_starttime);
alter table cdr_syslog add index i_from_number(from_number);
alter table cdr_syslog add index i_to_number(to_number);

Those statements create the table and a few indexes to speed things up. The indexes for call_hash and deduced_starttime are used when inserting records. The indexes for from_number and to_number are added since these are probably the most common fields to query. Typically, I will add an index for any field that appears in my ‘where’ clause, but I’m no DBA so there are probably better ways to tune it. However, this works fine for me and should for you as well.

I am using a Kiwi Syslog server to capture the CDR. But the cool thing about Kiwi is that you can “forward” the message along to another server. So I configured mine to send it to my Linux server. This is the screenshot for that forward command within Kiwi:

kiwi-send-logs-out-via-syslog

This simply sends the Syslog message to my server on UDP port 514. This is the typical Syslog port. Then I have a Perl script running on my server. That Perl script has some dependencies

use strict;
use DBI;
use Net::Syslogd;
use Digest::MD5 qw(md5_hex);
use POSIX qw(setsid);

You don’t need to be familiar with Perl to run this script, but you will need to make sure these packages are installed. For example, if I have a script that runs this line:

use Roger::Wilco;

When I try to run the script, I will get this error:

Can't locate Roger/Wilco.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .) at syslog.pl line 2.
BEGIN failed--compilation aborted at syslog.pl line 2.

So if you try to run my CDR script and you get any kind of “Can’t Locate” error, then you need to install that package. To to this, type this at a command line:

Perl -MCPAN -e shell

And that will get you to the CPAN command line where you can install packages.

# perl -MCPAN -e shell
Terminal does not support AddHistory.

cpan shell -- CPAN exploration and modules installation (v1.9402)
Enter 'h' for help.

cpan[1]>

And from here, you can type

install Net::Syslogd

Or whatever other packages you need to install. Note that case matters, and that’s a double colon between the words. When it asks you a yes/no question, say yes. If all goes well, then it should install fine. CPAN installs all dependencies. If you get errors, the most common is you don’t have a c++ compiler on your machine. I never did figure out how to fix that in Windows. In Linux, there are lots of tutorials. With that in mind, here is the Perl script that will catch the CDR forwarded from Kiwi. Be sure to read my comments at the end.

#!/usr/bin/perl

### Read unformatted CDR data from an Avaya server
### This script uses Net::Syslogd to listen to standard syslog messages
### The Avaya phone systems send to TCP ports. Unfortunately, this script is to listen from 
### A 'proxy' instance of CDR. for example, a Kiwi Syslog Server that forwards syslog messages
### out to another server.
### see my blog http://rogerthephoneguy.com/?p=404 for details


### Usage!
### set the $dir folder to where you store text files from your syslog server
### and run this script with 'perl cdrd.pl file' to tell the script to process files
### or run as 'perl cdrd.pl' to just watch the processing fly by
### or run as 'perl cdrd.pl -d' as a daemon process
### to stop the daemon, just create a file matching $killfile
### Or do a 'ps -ef |grep cdrd.pl to find the PID
### and kill -9 that pid
### not the best way to manage daemons, but we are telephone engineers here, not linux admins, right?

use strict;
use DBI;
use Net::Syslogd;
use Digest::MD5 qw(md5_hex);
use POSIX qw(setsid);

my $db;
my $dir = "/home/roger/avaya/cdr/data/";
my $killfile = "/home/roger/avaya/cdr/cdrd.end";


sub db_open {
        my $dsn = "DBI:mysql:dbname_ie_avaya:localhost";
        my $user = "dbuser";
        my $pass = "dbpassword";
        while (!($db)) {
                $db = DBI->connect($dsn, $user, $pass);
        }
}

sub db_close {
        if($db) {
                $db->disconnect();
        }
        undef($db);
}

sub sql { #check
        my ($sql) = (@_);
        my $sth = $db->prepare($sql);
        if(!($sth->execute())) {
                print "WARNING - unable to execute $sql\n";
        }
        $sth->finish();
}

sub generic_create {
        my ($table, $fields, $values) = (@_);
        my $sql = "insert into $table ($fields) values ($values)";
        my $sth = $db->prepare($sql);
        if(!($sth->execute())) {
                print "WARNING - unable to insert with $sql\n";
        }
        $sth->finish();
}

sub generic_read {
        my ($select, $table, $where) = (@_);
        my @datarow;
        my $sql = "select $select from $table where $where limit 1";
        my $sth = $db->prepare($sql);
        if($sth->execute()) {
                @datarow = $sth->fetchrow_array();
        } else {
                print "unable to execute $sql\n";
        }
        $sth->finish();
        return $datarow[0];
}

sub one {
        my ($sql) = (@_);
        my @datarow;
        my $sth = $db->prepare($sql);
        if($sth->execute()) {
                @datarow = $sth->fetchrow_array();
        } else {
                print "unable to execute $sql\n";
        }
        $sth->finish();
        return $datarow[0];
}

sub process_cdr_line {
    my $cdr = shift;
    chomp($cdr);
    my ($logtime,$facsev,$remoteaddr,$message) = split(/\t/,$cdr);
    my $md5message = md5_hex($message);
    #print "012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789\n$message";
    #here are the start and length of each data point within the message of an "unformatted" cdr line
    #Nobody knows what some of these are. It's a mystery today
    #your condition codes may be different, so you should watch your data closely for several days
    my @parses = (
        0,2,     #time of day hours
        2,2,     #time of day minutes
        4,1,     #duration hours
        5,2,     #duration minutes
        7,1,     #duration tenths
        8,1,     #condition code
        9,4,     #access code dialed
        13,4,    #access code used 
        17,15,     #dialed number
        32,10,     #calling number
        42,15,     #account code
        57,7,     #auth code
        66,1,     #frl
        67,3,     #incoming ckt
        70,3,     #outgoung ckt
        73,1,     #feature flag
        70,4,     #attendant console
        76,4,     #incoming tac
        82,2,     #node number
        84,4,     #INS
        88,3,     #IXC
        92,1,     #BCC
        93,1,     #MAUUI
        94,1,    #Resource flag
        95,4    #packet count
    );
    my $values = '';
    my $insert;
    my $i;
    my $calltime;
    my $duration;
    my $direction = 'unk';
    my $dialed;
    my $calling;
    my $node = '01';
    my $condition;
    for($i=0; $i<25; $i++) { my $start = $parses[$i*2]; my $length = $parses[$i*2+1]; my $part = substr($message,$start,$length); #print "this $i from $start with length $length is '$part'\n"; $part =~ s/^ *//g; $part =~ s/ *$//g; if($i == 0) { #time of day hh $calltime = "$part:"; } elsif($i == 1) { #time of day mm $calltime .= $part; } elsif($i == 2) { #duration h $duration = $part * 60 * 60; } elsif($i == 3) { #duration mm $duration += ($part * 60); } elsif($i == 4) { #duration 1/10 min (6 sec) $duration += ($part * 6); } elsif($i == 5) { #condition code $condition = $part; if($part == 9) { $direction = "in"; } elsif($part == 7) { $direction = "out"; } } elsif($i == 8) { #dialed number $dialed = $part; $dialed =~ s/E$//; if(($dialed ne '6226') && ($dialed =~ /^6[234]\d\d$/)) { $node = '44'; } } elsif($i == 9) { #calling number $calling = $part; $calling =~ s/E$//; if(($dialed ne '6226') && ($dialed =~ /^6[234]\d\d$/)) { $node = '44'; } } elsif($i == 17) { #incoming tac if($part =~ /^#0\d\d/) { $node = '44'; } } elsif($i == 18) { #node number $part = $node; } $values .= "'$part',"; } if($direction eq "in") { my $temp = $dialed; $dialed=$calling; $calling=$temp; } if(($dialed =~ /^\d*$/) && ($calling =~ /^\d*$/)) { #dialed_name and calling_name are filled from a 'display_name' table that is not part of this demo. #for more information, contact me through RogerThePhoneGuy.com and we can chat about how this table is filled my $dialed_name=''; #generic_read("name", "avaya_display_name","station='$dialed'"); my $calling_name=''; #generic_read("name", "avaya_display_name","station='$calling'"); $dialed_name =~ s/'/\\'/g; $calling_name =~ s/'/\\'/g; #print "logtime:$logtime\ncondition $condition\ndirection $direction\ndialed $dialed ($dialed_name)\ncalling $calling ($calling_name)\nduration $duration\n"; $insert = "insert into cdr_syslog values (null,'syslog','$logtime',null,'$remoteaddr','$calling','$calling_name','$dialed','$dialed_name','type','$calltime','$direction',$duration,'$md5message',"; $insert .= "$values '$message',now())"; my $check = one("select count(*) from cdr_syslog where call_hash = '$md5message'"); #print "hash check = $check\n"; if($check == 0) { #We insert the md5 of this line, to prevent double-logging. This allows us to re-process the cdr as much as necessary w/o dups in the database #print "$insert\n"; sql($insert); sql("update cdr_syslog set deduced_starttime = DATE_SUB(reported_on,INTERVAL duration_ss SECOND) where call_hash='$md5message'"); } else { #print "duplicate\n"; } } else { $insert = "insert into pbx_syslog values (null,'$message')"; #sql($insert); } } sub daemonize { #check open STDIN, '/dev/null' or die "Can't read /dev/null: $!"; open STDOUT, '>>/dev/null' or die "Can't write to log: $!";
    open STDERR, '>>/dev/null' or die "Can't write to log: $!";
    defined(my $pid = fork)   or die "Can't fork: $!";
    exit if $pid;
    setsid                    or die "Can't start a new session: $!";
    umask 0;
}

if($ARGV[0] eq "-d") {
    daemonize;
}

db_open();

if($ARGV[0] eq "file") {
    opendir(DH, $dir);
    my @files = readdir(DH);
    closedir(DH);
    foreach my $file (@files) {
        open CDR, "$dir$file";
        my $i = 0;
        while (my $line = ) {
            $i++;
            my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday) = localtime(time);$mon++;$year+=1900;
            $line =~ s/[^!-~\s]//g; #strips unprintable
            process_cdr_line($line);
            #die if($i>2);
        }
        print "$dir$file - $i lines\n";
        close CDR;
        unlink "$dir$file";
    }

} else {
    my $syslogd = Net::Syslogd->new()
      or die "Error creating Syslogd listener: ", Net::Syslogd->error;

    while (!(-e $killfile)) {
        my $message = $syslogd->get_message();
        if (!defined($message)) {
            printf "$0: %s\n", Net::Syslogd->error;
            exit 1;
        } elsif ($message == 0) {
            next
        }

        if (!defined($message->process_message())) {
            printf "$0: %s\n", Net::Syslogd->error
        } else {
            my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday) = localtime(time);$mon++;$year+=1900;
            my $logtime = sprintf("%04d-%02d-%02d %02d:%02d:%02d",$year,$mon,$mday,$hour,$min,$sec);
            my $remoteaddr = $message->remoteaddr;
            my $remoteport = $message->remoteport;
            my $facility = $message->facility;
            my $severity = $message->severity;
            my $time = $message->time;
            my $hostname = $message->hostname;
            my $msg = $message->message;
            my $line = "$logtime\t$facility.$severity\t$remoteaddr\t$msg\n";
            print $line;
            #now $line looks like any line from the CDR files
            #so this next part works via syslog or files
            $line =~ s/[^!-~\s]//g; #strips unprintable
            process_cdr_line($line);
        }
    }
    unlink $killfile;
}

db_close();

There’s a lot in that script. Things you must know:

  1. You’ll need to replace the database name, username, and password in the “db_open” subroutine.
  2. The comments at the top explain how to run it to process files, or to capture syslog live, or even how to run as a daemon.
  3. Each CDR line is hashed and included in the record, so don’t worry about processing the CDR files multiple times. It will NOT insert duplicate records.
  4. When processing files, the script will delete the file when done. So always copy the CDR files into a separate folder before running this script.
  5. I kludged this together from a working Perl script. I removed some identifying information, moved some variables around, and added a bunch of comments. If you get any errors, let me know – it’ll probably be really easy to fix.
  6. For any hobbyist Perl developers out there, this script includes some handy database functions that insert records, return rows, and return one record.
  7. This script is for information purposes only. Naturally, I don’t provide any warranties, guarantees, or assertions that it will work with your systems. Run it at your own risk.
  8. If you’d like me to help set this up, I’d love to work for you. I love telecom and I’m always available for hire as a consultant. I would LOVE to be your telephone guy!

If you’re planning on trying this, then I’m really flattered. Let me know how it goes. Mine has been running for about two years and I cannot tell you how handy it is to have that much CDR data in a MySQL table. I have a PHP script that I use to analyze the data. Do you want to see that?

Thanks all,

Roger

 

6 thoughts on “How to parse Avaya CDR into MySQL database with Perl

  1. Ivan

    hi,

    My question is not for this post exactly:
    is there a way to automate data extraction from AVAYA whit pentaho or other software

    Thanks

    Reply
    1. roger Post author

      I don’t know about Pentaho, but I also have a Perl script that extracts data from CMs- specifically stations and registration status. I also have a PHP script that extracts a TON of data including station status, which is quite handy. But the PHP script uses telnet whereas the Perl script uses SSH in case your security policy doesn’t allow telnet.

      It’s not ready for public consumption yet, and it’s a little complicated. There are tons of customizations for my site so I need to clean it, comment it. Are you interested in seeing it? What kind of data did you want to extract? I’m using it to pull data every 15 minutes and send me a change report. This way name changes, IP address changes, registration status changes, call-forwarding, and things like that get emailed to me. The awesome part is I have a history of changes in my PBX, which is nice. Things like “who used to have this extension?” and “when was the last time this extension was logged in?” and “what was former employee Rhoda Morgenstern’s extension that we deleted a year ago?”

      I’m not sure if I’ll be able to squeeze it into one post. Does it sound interesting?

      Reply
  2. Ivan

    Hi,

    Many thanks for response, of course it sound interenting.
    I dont have skills on php or perl, but if you share it i can try to implement it on my system with the help of developer guys.
    I think all of this is whit 513 console and i have all UID for work with them.

    Thanks and greets.

    Reply
  3. John

    Hello Roger,
    First, I have to say GREAT BLOG!!! I know you are busy with your Jolly Roger company now, however if you can squeeze in the PHP script/blog that would be great. I’m looking forward to it. Hope, you can fit it sometime.
    Best of luck on your new venture with the Jolly Roger company. Make that $$$$ and save the world!

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *