MATCH_RECOGNIZE - Greedy vs. Reluctant Quantifiers

View All Tutorials Login to Run Tutorial

Tutorial MATCH_RECOGNIZE - Greedy vs. Reluctant Quantifiers
Description In this tutorial we will review the concepts of greedy and reluctant quantifiers. Pattern quantifiers that attempt to match as many instances as possible of the token to which they are applied are referred to as greedy, i.e. they will gobble up as many rows as possible. This is the default mode of operation. The examples in this post use the built-in measures to help show the difference between greedy and reluctant matching.
Tags PATTERN MATCHING, MATCH_RECOGNIZE,
Area SQL Analytics
Contributor Keith Laker (Oracle)
Created Wednesday October 12, 2016
Modules 7

Module 1
Overview
Finding patterns with SQL

Recognizing patterns in a sequence of rows has been a capability that was widely desired, but not really possible with SQL until now. There were many workarounds, but these were difficult to write, hard to understand, and inefficient to execute.

The aim of this tutorial is to explain the difference greedy and reluctant quantifiers. Pattern quantifiers that attempt to match as many instances as possible of the token to which they are applied are referred to as greedy, i.e. they will gobble up as many rows as possible. This is the default mode of operation.
Module 2
Reset my tutorial environment
Just in case...

If at anytime you need to restart this tutorial then simply run the following statement to reset your environment
```
DROP TABLE ticker PURGE;
```

Module 3

Create my dataset

Creating our ticker data set

First step is to setup our data table and then populate it with data

CREATE TABLE ticker (SYMBOL VARCHAR2(10), tstamp DATE, price NUMBER);

BEGIN
INSERT INTO ticker VALUES('ACME', '01-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '02-Apr-11', 17);
INSERT INTO ticker VALUES('ACME', '03-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '04-Apr-11', 21);
INSERT INTO ticker VALUES('ACME', '05-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '06-Apr-11', 12);
INSERT INTO ticker VALUES('ACME', '07-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '08-Apr-11', 20);
INSERT INTO ticker VALUES('ACME', '09-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '10-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '11-Apr-11', 19);
INSERT INTO ticker VALUES('ACME', '12-Apr-11', 15);
INSERT INTO ticker VALUES('ACME', '13-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '14-Apr-11', 25);
INSERT INTO ticker VALUES('ACME', '15-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '16-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '17-Apr-11', 14);
INSERT INTO ticker VALUES('ACME', '18-Apr-11', 24);
INSERT INTO ticker VALUES('ACME', '19-Apr-11', 23);
INSERT INTO ticker VALUES('ACME', '20-Apr-11', 22);

INSERT INTO ticker VALUES('GLOBEX', '01-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '02-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '03-Apr-11', 13);
INSERT INTO ticker VALUES('GLOBEX', '04-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '05-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '06-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '07-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '08-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '09-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '10-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '11-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '12-Apr-11', 9);
INSERT INTO ticker VALUES('GLOBEX', '13-Apr-11', 10);
INSERT INTO ticker VALUES('GLOBEX', '14-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '15-Apr-11', 12);
INSERT INTO ticker VALUES('GLOBEX', '16-Apr-11', 11);
INSERT INTO ticker VALUES('GLOBEX', '17-Apr-11', 8);
INSERT INTO ticker VALUES('GLOBEX', '18-Apr-11', 7);
INSERT INTO ticker VALUES('GLOBEX', '19-Apr-11', 5);
INSERT INTO ticker VALUES('GLOBEX', '20-Apr-11', 3);

INSERT INTO ticker VALUES('OSCORP', '01-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '02-Apr-11', 22);
INSERT INTO ticker VALUES('OSCORP', '03-Apr-11', 19);
INSERT INTO ticker VALUES('OSCORP', '04-Apr-11', 18);
INSERT INTO ticker VALUES('OSCORP', '05-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '06-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '07-Apr-11', 17);
INSERT INTO ticker VALUES('OSCORP', '08-Apr-11', 20);
INSERT INTO ticker VALUES('OSCORP', '09-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '10-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '11-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '12-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '13-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '14-Apr-11', 15);
INSERT INTO ticker VALUES('OSCORP', '15-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '16-Apr-11', 16);
INSERT INTO ticker VALUES('OSCORP', '17-Apr-11', 14);
INSERT INTO ticker VALUES('OSCORP', '18-Apr-11', 12);
INSERT INTO ticker VALUES('OSCORP', '19-Apr-11', 11);
INSERT INTO ticker VALUES('OSCORP', '20-Apr-11', 9);

commit;

END;

Module 4
Viewing my data
Quick review of our ticker data

Now let's check to see how many rows are in our dataset
```
SELECT count(*) FROM ticker;
SELECT symbol, min(tstamp), max(tstamp), count(*) FROM ticker GROUP BY symbol;
```
You should have 60 rows of data spread across three symbols (ACME, GLOBEX, OSCORP) with 20 rows of data for each ticker symbol. Our ticker data for each symbol starts on April 1 and ends on April 20.

You can view the full data set using the following code:
```
SELECT * FROM ticker ORDER BY symbol, tstamp;
```
Compared to previous MATCH_RECOGNIZE tutorials, here there is a small change to the data for ticker symbol ACME. The price on Apr-16 has been changed to 14 so that there are now three consecutive rows (15-Apr, 16-Apr and 17-Apr) with the same value (14) for price.
```
SELECT * FROM ticker 
WHERE symbol = 'ACME' 
ORDER BY tstamp;
```

Module 5

Using Greedy Quantifiers

Now we have ties in our ticker data - watch out for greedy quantifiers

If we now search for V-shaped patterns and define the down and up variables so that there is potential for ties by using <= and >= in their respective definitions then we can explore this idea of greedy quantifiers by using the following code:

SELECT symbol, tstamp, price, mn, pattern, 
first_down, first_price, last_up, last_price
FROM ticker MATCH_RECOGNIZE ( 
  PARTITION BY symbol ORDER BY tstamp 
  MEASURES MATCH_NUMBER() AS mn,
           CLASSIFIER() as pattern,
           FIRST(strt.tstamp) AS first_down,
           FIRST(strt.price) as first_price,
           LAST(up.tstamp) AS last_up,
           LAST(up.price) as last_price
  ALL ROWS PER MATCH 
  PATTERN (strt down+ up+) 
  DEFINE 
        down AS (price <= PREV(price)),
        up AS (price >= PREV(price))
)
WHERE symbol = 'ACME'
ORDER BY symbol, tstamp;

Note that on April 17 we match the row to the down variable because that variable is being greedy. It could have been matched to the 'up' variable because the price is actually equal to the previous row but 'down' took precedence.

Module 6

Using Reluctant Quantifiers

What if we change down and make it reluctant by using the ? quantifier:

SELECT symbol, tstamp, price, mn, pattern, 
first_down, first_price, last_up, last_price
FROM ticker MATCH_RECOGNIZE ( 
  PARTITION BY symbol ORDER BY tstamp 
  MEASURES MATCH_NUMBER() AS mn,
           CLASSIFIER() as pattern,
           FIRST(strt.tstamp) AS first_down,
           FIRST(strt.price) as first_price,
           LAST(up.tstamp) AS last_up,
           LAST(up.price) as last_price
  ALL ROWS PER MATCH 
  PATTERN (strt down+? up+) 
  DEFINE 
        down AS (price <= PREV(price)),
        up AS (price >= PREV(price))
)
WHERE symbol = 'ACME'
ORDER BY symbol, tstamp;

Now you can see that because down is reluctant it matches as few occurrences as possible and where a row could be mapped to either down or up it is up that takes precedence.

Module 7
Why is this important?
Obviously this can impact the way that rows are matched to your pattern. Therefore, you need to think carefully about how you are going to manage the situation where rows could be matched to more than one variable - do you have preference for which variable wins?

If you have measures that are tied to specific variables then these will be impacted by whether the variable is greedy or reluctant. Obvious examples are where you are performing some sort of calculation such as averages, sums or counts.

Therefore, when you are constructing your pattern please think very carefully about how greedy you want your matching process to be as it processes your data set.

Additional Information

Database on OTN SQL and PL/SQL Discussion forums
Oracle Database
Download Oracle Database

MATCH_RECOGNIZE - Greedy vs. Reluctant Quantifiers

Overview

Finding patterns with SQL

Reset my tutorial environment

Just in case...

Create my dataset

Creating our ticker data set

Viewing my data

Quick review of our ticker data

Using Greedy Quantifiers

Now we have ties in our ticker data - watch out for greedy quantifiers

Using Reluctant Quantifiers

Why is this important?

Links

Additional Information