Monday, 18 January 2021

Parsing CREATE TABLE statement with Regular Expression

Overview

Usually, we start developing a new system with designing database and then writing codes. If you are not a fan of ORM, you will have to manually writing the code for the data objects and it is very time consuming. To avoid writing the data objects manually, we develop code generator that will generate the basic codes and then we customize it.

To develop code generator that generate data object code, the generator must first understand the table structure. There are two ways to do that.

  1. For MSSQL, we query the INFORMATION_SCHEMA view for gathering the field name, data type and it's attributes.
  2. Parse the CREATE TABLE statement.

The first option is quite easy and straight forward. The catch is that it is database dependant and might not supported in certain database engine. The second option should be more viable because of the CREATE TABLE statement is very much following the ANSI-SQL standard with some minor database dependant feature.

In this article, we will focus in parsing the CREATE TABLE statement with RegEx using JavaScript.

Ways of parsing text

Let's see how many ways of parsing text:

  1. Munching the CREATE TABLE statement character by character

    The naive way of parsing text is to write a program to check one character after another and then look for the "terminator" (i.e., a symbol that mark the end of the current section or statement). Parsing text in this way is error prone and it will take a lot efforts in guessing and testing.

  2. Using RegEx

    With RegEx, it requires to find the pattern in the text and then come out with the expression. Luckily, SQL syntax is well defined and writing the expression should be easier.

    The good thing about this method is that you are focusing on finding a better expression instead of deciding how many characters to be read.

Here it goes

We are going to develop the parser using JavaScript which runs in Node.js and it parses SQLite CREATE TABLE statement. Let's create a .js file, copy the codes and run it in Node.js.


let sql = `   create table if not exists tb_sales (
    id integer primary   key
    , dt date
    , prod_id integer   not   null
    , qty integer default (0) not null
    , unit_price   real
    , amt real 
    , doc_no text null
, cust_code text

, modified_on datetime default current_timestamp,
modified_by text
);
`;

let result;

// The regex for each component:
let rg_tb = /(create\s+table\s+if\s+not\s+exists|create\s+table)\s(?<tb>.*)\s*\(/gim;
let rg_fld = /(\w+\s+text.*|\w+\s+varchar.*|\w+\s+char.*|\w+\s+real.*|\w+\s+float.*|\w+\s+integer.*|\w+\s+int.*|\w+\s+datetime.*|\w+\s+date.*)/gim;
let rg_fld2 = /(?<fname>\w+)\s+(?<ftype>\w+)(?<fattr>.*)/gi;
let rg_not_null = /not\s+null/i
let rg_pk = /primary\s+key/i
let rg_fld_def = /default\s(.+)/gi

// look for table name
result = rg_tb.exec(sql);
console.log('TABLE NAME:', result.groups.tb);

let fld_list = [];

while ((result = rg_fld.exec(sql)) != null) {
    let f = result[0];

    //reset
    rg_fld2.lastIndex = 0;
    let fld_def = rg_fld2.exec(f);

    // remove the field definition terminator.
    let attr = fld_def.groups.fattr.replace(',', '').trim();

    // look for NOT NULL.
    let nullable = !rg_not_null.test(attr);

    // remove NOT NULL.
    let attr2 = attr.replace(rg_not_null, '');

    // look for PRIMARY KEY
    let is_pk = rg_pk.test(attr2);

    // look for DEFAULT
    let def = rg_fld_def.exec(attr2);
    if (def && def.length > 0) {
        def = def[1].trim();
    }
    else {
        def = null;
    }

    // append to the arr
    fld_list.push({
        name: fld_def.groups.fname.trim(),
        type: fld_def.groups.ftype.trim(),
        nullable: nullable,
        pk: is_pk,
        def: def,
        //attr0: attr
    });
}

console.table(fld_list);

Code explains:

  • The CREATE TABLE statement in the above code has a few different formatting, some extra spaces, the comma (field definition terminator) is not always at the beginning of the line and extra blank line. Our target is to make it harder to parse so that we can find out how robust is our expressions.

  • Below the CREATE TABLE statement, we are declaring all the necessary RegEx for each 'section' in the statement. The expression includes,

    • rg_tb - this expression searches for 'CREATE TABLE' and extract the table name.
    • rg_fld - this expression searches for the field definition. The caveat of this expression is that it is able to extract the field definition on per line basis. If any one line contains more than one field, then, it won't work properly.
    • rg_fld2 - this expression is for extracting the field name, data type and other attributes. The most important is that we must set 'rg_fld2.lastIndex=0' before we call rg_fld2.exec(f). This is because after exec(), the lastIndex will be set. The next call to exec() will continue from where it was earlier.
    • rg_not_null - this expression is for detecting NOT NULL keyword. By definition, all fields are nullable (NULL keyword is optional). But for non-nullable field, NOT NULL keyword must exist. This is the reason why we are detecting NOT NULL.
    • rg_pk - this expression is for look for PRIMARY KEY keyword that appear in the field attribute section. In case you have a composite key, this expression will not work properly.
    • rg_fld_def - this expression is to extract the DEFAULT attribute.
  • The result of the above code looks like this:

TABLE NAME: tb_sales
┌─────────┬───────────────┬────────────┬──────────┬───────┬─────────────────────┐
│ (index) │     name      │    type    │ nullable │  pk   │         def         │
├─────────┼───────────────┼────────────┼──────────┼───────┼─────────────────────┤
│    0    │     'id'      │ 'integer'  │   true   │ true  │        null         │
│    1    │     'dt'      │   'date'   │   true   │ false │        null         │
│    2    │   'prod_id'   │ 'integer'  │  false   │ false │        null         │
│    3    │     'qty'     │ 'integer'  │  false   │ false │        '(0)'        │
│    4    │ 'unit_price'  │   'real'   │   true   │ false │        null         │
│    5    │     'amt'     │   'real'   │   true   │ false │        null         │
│    6    │   'doc_no'    │   'text'   │   true   │ false │        null         │
│    7    │  'cust_code'  │   'text'   │   true   │ false │        null         │
│    8    │ 'modified_on' │ 'datetime' │   true   │ false │ 'current_timestamp' │
│    9    │ 'modified_by' │   'text'   │   true   │ false │        null         │
└─────────┴───────────────┴────────────┴──────────┴───────┴─────────────────────┘

So, everything looks good except that the rg_fld is not able to extract multiple fields that appears on a single line. Let's try to solve it by adjusting the expression rg_fld for more than one field definition in the same line. Consider that the paid_amt real appear next to the qty real. The code will look like this:

let sql2 = `unit_price   real
    , amt real 
, qty real, paid_amt real	 
`;

let r2;
while ((r2 = rg_fld.exec(sql2)) != null) {
    console.log('=>', r2[0])
}

And the result shown below is not what we are expecting. The problem is that qty real, paid_amt real was treated as one field instead of two fields.

=> unit_price   real
=> amt real 
=> qty real, paid_amt real

Let's tweak the expression by declaring a new one so that it look for the field definition with terminator (comma symbol) or without the terminator (i.e., the last field definition does not have terminator).

let sql2 = `unit_price   real
    , amt real 
, qty real, paid_amt real	 
`;

let r2;

let t2 = /\w+\s+real\s*,|\w+\s+real.*/gim
while ((r2 = t2.exec(sql2)) != null) {
    console.log('=>', r2[0])
}

And the result is close to what we are looking for and the comma was included in the result which we don't want.

=> unit_price   real
    ,
=> amt real 
,
=> qty real,
=> paid_amt real

Now, let's remove the comma and next line symbol manually.

let sql2 = `unit_price   real
    , amt real 
, qty real, paid_amt real	 
`;

let r2;

let t2 = /\w+\s+real\s*,|\w+\s+real.*/gim
while ((r2 = t2.exec(sql2)) != null) {
    let f = r2[0].replace('\r', '').replace('\n', '').replace(',', '');    
    console.log('=>', f)
}

The following result is clean and ready to apply rg_fld2 expression.

=> unit_price   real    
=> amt real 
=> qty real
=> paid_amt real

By the way, this parser has not been completed. You may have to figure out how to incorporate t2 expression into rg_fld.

Useful links

You will find the CREATE TABLE statement in detail in the following link,

https://sqlite.org/lang_createtable.html

There are many sites in the Internet which explains RegEx in details. Here's two of it which I find very helpful.

https://flaviocopes.com/javascript-regular-expressions/

https://javascript.info/regular-expressions

To test the regular expression that you have written, you may rely on this tool,

https://regexr.com/

Short note for RegEx

  • \w - ASCII alphanumeric.
  • "." - matches any character except newline.
  • \s - white space
  • "+" - one or more occurrence.
  • "*" - zero or more occurrence.
  • "(..)" - the bracket is for creating a 'group'. Please refers to rg_fld2 variable in the above code.

Friday, 13 September 2019

Vertical scrollbar causing column width different between header and cell in spreadsheet

Vertical scrollbar causing column width different between header and cell in spreadsheet. This is a common issue when the HTML page that display on a computer. Whereas, it won't have issue if it is on Android or iOS because the scrollbar is floating on top of the content.

To explain this issue, let says we have a spreadsheet (or "table" view) built with DIV which looks like below:

<style>
.header {
  display:flex;
  flex-direction:row;
}

.item {
  display:flex;
  flex-direction:row;
}

.c1 {
  width:50%;
}

.item-list {
  overflow: hidden;
  overflow-y: scroll;
  height: 80px;
}
</style>

<div class="my-table">

<div class="header">
  <div class="c1">Date</div>
  <div class="c1">Ref #</div> 
</div> 
<div class="item-list">
  <div class="item">
    <div class="c1">1 Sep 2019</div>
    <div class="c1">0001</div> 
  </div>
  <div class="item">
    <div class="c1">1 Sep 2019</div>
    <div class="c1">0002</div> 
  </div>
  <div class="item">
    <div class="c1">1 Sep 2019</div>
    <div class="c1">0003</div> 
  </div>
  <div class="item">
    <div class="c1">1 Sep 2019</div>
    <div class="c1">0004</div>
  </div>
  <div class="item">
    <div class="c1">1 Sep 2019</div>
    <div class="c1">0005</div>
  </div>
</div>
</div>


With the CSS for item-list, a vertical scrollbar will appear once the number of items increased.

The problem with this CSS is that the date and reference number column in the header will have a different width if you compared it with the column in the items.

To solve this, you need to use this style:

(1) you have to use "sticky:

.header{
  position: sticky;
  top: 0;
  left: 0;
  background-color: white;
}

(2) Move the overflow attribute for "item-list" to "my-table":

.my-table {
  position:relative;
  overflow: hidden;
  overflow-y: scroll;
  height: 80px;
}

And that solves the column width issue when there is a vertical scrollbar.


Monday, 27 May 2019

Server-sent event

Server sent event supports the server push from server to browser only.

In Javascript

var sse = null;

function runNow() {
    if (!sse) {
        sse = new EventSource('/sse_stream');

        sse.addEventListener('message',
            function (e) {
                document.getElementById('lbl1').innerText = e.data;
            });

    }
}

function stop() {
    if (sse) {
        sse.close();
        sse = null;
    }
}


Reference:

https://www.html5rocks.com/en/tutorials/eventsource/basics/
https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events

Web worker

Summary of web worker

  • Every web worker is a thread on it's own!! Instantiate too many worker instances might degrade your computer performance.
  • The worker will not be able to access the HTML page.
  • All data exchange/communication must be done through postMessage().
  • With web worker, the communication between the HTML/JS and the worker will be more complex.
  • Good thing is that process in the web worker won't affect the HTML page update/redraw.


The JavaScript to that create a web worker 

This script will be added to the HTML page like a normal JS file.

var w;
var lbl1;

function onStart() {
    if (w) {
        return;
    }

    w = new Worker('demo_worker.js');

    lbl1 = $('#lbl1');

    w.onmessage = function (e) {
        if (e.data.t == 'number') {
            lbl1.text('running value=' + e.data.i);
        }
        else {
            console.log(e.data)
        }
    }
}


function onStop() {
    if (w != null) {
        w.terminate();
        w = null;
    }
}

function onSayHelo() {
    if (w) {
        w.postMessage('helo from tab...' + (new Date()));
    }
}

The web worker Javascript file "demo_worker.js"

  • This file will run in it's own thread. 
  • All data exchange must be done through postMessage().
  • The worker is able to execute AJAX call.
  • The worker will not be able to access the HTML page.

var i = 0;

function timedCount() {
    i = i + 1;

    postMessage({
        t: 'number',
        i: '<b>'+ i + '</b>',
        ts: new Date()
    });

    setTimeout("timedCount()", 1000);   
}

function testAjax() {
    var xhr = new XMLHttpRequest();

    xhr.onreadystatechange = function (e) {
        if (xhr.readyState == 4 && xhr.status == 200) {
            // upon loaded succesfully, pass it to the HTML page.
            postMessage({
                t: 'data',
                data: xhr.response
            });
        }
    }

    xhr.open('get', 'test-data.txt');
    xhr.send();   
}

// message from the HTML page.
self.onmessage = function (e) {
    console.log('worker...', e.data)
}

timedCount();
setTimeout(testAjax, 3000)

Monday, 29 October 2018

Print the call stack in console

Here's the code in printing the call stack in console:

var stack = new Error().stack;   
console.log(stack);

Tuesday, 21 November 2017

CSS fill pattern

To create the fill pattern image, check out the following website:

     http://www.patternify.com/

/*dots*/

.inactive_cell2 {
    background-image: url("data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAQAAAAECAYAAACp8Z5+AAAAEklEQVQImWNgYGD4z0AswK4SAFXuAf8EPy+xAAAAAElFTkSuQmCC");
    background-repeat: repeat;
}

To create strips without using any image data,

    https://css-tricks.com/stripes-css/

/*stripes*/
.inactive_cell {   
    background: repeating-linear-gradient( 48deg, #fefefe, #e0e0e1 5px, #eaeaea 5px, #eaeaea 10px );
}

We are moving

We are moving this blog to our new blog site: https://ciysys.com/blog/nodejs.htm