Quantcast
Channel: Grassroots Oracle
Viewing all articles
Browse latest Browse all 405

Visualising SQL Analytics Rolling Count with OracleJET in APEX

$
0
0
Back in around 2005, before the time of smart phones, I had some data.

I can't remember what the data was, but I was told that for it to be valid, it should roughly form a bell curve.

Sure, I'm sure I could have aggregated it, exported it to Excel, and plotted it to a graph, but this was 2005. There was no SQL Developer where I could copy & paste the results directly into Excel. No panel in Excel with a button that plots that data in one click. Back in my day, thing like that were a little bit more work ;p

This was a time when I enjoyed SQL*Plus, and I ended up writing a query to nest STDDEV within RPAD so it would plot asterisks down the page, in a vertical bell. In was a work of art. I'm sure I've still got the test case labelled in bowels of Gmail.

Fast forward to today, where I have data I know starts as a daily bell curve, but I'm applying a rolling 24 hour total over that daily resonance. How does that affect the wave?

I use today's fancy tools to plot the said data, and it doesn't turn out quite how I pictured.

This is my query below plotting real data using OracleJET charts in Oracle APEX.

The underlying data still formed a bell, more detail in a future post, but smoothing this out to a rolling 24 hour total - what should that look like? Why is there a flat line during no activity, instead of a dip?
No doubt there is a math-geek corner of the internet that explores this pattern, but I still wasn't confident with the query.

Real data is awesome. The trouble is, it can be real noisy. If you're trying to reconcile your results, it's often worth building a simple test case. As a regular forum participant, I'm such an advocate for this. If the person asking the question took the time to write the DDL for the test case, I reckon half the questions wouldn't need asking. The other half will have an executable test case ready for any responders.
How's this - in a response to a question I asked (with a test case, of course), Andrew Sayer responded with a LiveSQL demonstration! AskTom questions that have a livesql.oracle.com example are generally prioritised.

So, what I have here is a real simple table, with some "logs" at specific times, and a little variance. It's quite similar to the setup you might see for SQL exercises in the Oracle Dev Gym.
-- create a simple table of dates
create table range_eg (rd date);
-- ready for adjustment
truncate table range_eg;
-- insert sample hits at a regular interval, with more frequency around noon.
insert into range_eg values (timestamp '2019-03-26 09:30:00');
insert into range_eg values (timestamp '2019-03-26 10:30:00');
insert into range_eg values (timestamp '2019-03-26 10:45:00');
insert into range_eg values (timestamp '2019-03-26 11:30:00');
insert into range_eg values (timestamp '2019-03-26 11:45:00');
insert into range_eg values (timestamp '2019-03-26 12:30:00');
insert into range_eg values (timestamp '2019-03-26 12:45:00');
insert into range_eg values (timestamp '2019-03-26 13:30:00');
insert into range_eg values (timestamp '2019-03-26 13:45:00');
insert into range_eg values (timestamp '2019-03-26 14:30:00');
insert into range_eg values (timestamp '2019-03-26 15:30:00');
insert into range_eg values (timestamp '2019-03-26 16:30:00');

-- I took a few hits out the middle, so the bell would not be quite the same shape
insert into range_eg values (timestamp '2019-03-27 09:30:00');
insert into range_eg values (timestamp '2019-03-27 10:30:00');
insert into range_eg values (timestamp '2019-03-27 10:45:00');
insert into range_eg values (timestamp '2019-03-27 11:30:00');
--insert into range_eg values (timestamp '2019-03-27 11:45:00');
--insert into range_eg values (timestamp '2019-03-27 12:30:00');
--insert into range_eg values (timestamp '2019-03-27 12:45:00');
insert into range_eg values (timestamp '2019-03-27 13:30:00');
insert into range_eg values (timestamp '2019-03-27 13:45:00');
insert into range_eg values (timestamp '2019-03-27 14:30:00');
insert into range_eg values (timestamp '2019-03-27 15:30:00');
insert into range_eg values (timestamp '2019-03-27 16:30:00');

-- I added a small day at the end, to see how it may affect rolling total drift to zero
insert into range_eg values (timestamp '2019-03-28 11:30:00');
insert into range_eg values (timestamp '2019-03-28 12:30:00');
insert into range_eg values (timestamp '2019-03-28 13:30:00');
Then used an analytical function to cumulate the running count over a 1 day interval.
The second column, with accompanying union, was an attempt to see what happened if I filled out each hour with a record, to see if that influenced the overnight dip in my real data. It also let me see the cumulative count taper off as the data ended.
select * from (
select rd
,sum(sum(c)) over ( order by rd
range between NUMTODSINTERVAL(1,'day') preceding
and current row) last_24hr_running
,sum(count(*)) over ( order by rd
range between NUMTODSINTERVAL(1,'day') preceding
and current row) last_24hr_cnt
from
(select 1 c,rd from range_eg
union all
-- I even tried to pad out records to see
-- if the line would vary during the lulls
select null c, trunc(sysdate,'hh24')+1-rownum/24
from dual
connect by level< 24*3
)
group by (rd)
-- must ignore first day in inline query
-- as cumulative range will be skewed
) --where request_date > sysdate-2
order by 2;

Data checks out, SQL checks out - does the graph check out?

Test case plotted

I think so, but given the context of the data, I don't think this line graph is appropriate, particularly around the period of no activity. More on that in the next post.


Viewing all articles
Browse latest Browse all 405

Trending Articles