Saving You 15 Minutes: MGMT$JOB_STEP_HISTORY and Security

This is a very handy view for determining what steps occurred in a particular job, which makes it useful when troubleshooting job failures.  More importantly, the step output is a CLOB that is easily readable from a tool like SQL Developer, so if you are running OS or database jobs in OEM that contain usernames and passwords, either against specific targets or across the estate, be aware that these details will remain in this view for some time.  I have job history going back to 7th August last year, and I will keep an eye on this date to see whether the rows are purged on a daily or weekly basis.
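As a rough check, something along these lines can be run against the repository to see whether any step output contains obvious credential strings.  It is only a sketch: the SYSMAN prefix and the column names (OUTPUT, JOB_NAME, JOB_OWNER, STEP_NAME, START_TIME) are assumptions, so describe the view in your own repository first.

-- Hypothetical sketch: scan job step output for obvious credential strings.
-- Schema and column names are assumptions; confirm them with a describe
-- of MGMT$JOB_STEP_HISTORY before relying on this.
select job_name, job_owner, step_name, start_time
from sysman.mgmt$job_step_history
where upper(output) like '%PASSWORD%'
order by start_time;

-- And to see how far back the history currently goes:
select min(start_time) from sysman.mgmt$job_step_history;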

TweetMe: pbedba

Saving You 15 Minutes: SVC SSDs Flash v’s Exadata Gathering The Stats Part 3

Now that we have the statistics from before and after the move to SSDs, we can compare them to the Exadata queries.  I have taken two queries and looked at their execution times.

[Image: EXA_SSDspng4 – execution time comparison]

So the results show that there is an improvement in query performance when you move to SSDs, but what they really demonstrate is Exadata query offloading.  Looking at the detail for these queries in DBA_HIST_SQLSTAT, I can see both of them were reading around 150GB of data to obtain their results.  The same queries on Exadata achieved around 98% to 99% offloading, meaning they returned only around 2 to 3GB to the compute node for processing.  The benefit of query offloading is quite clear.  Looking at the statistics more generally, the move to SSDs provided an overall performance gain of around one fifth, which isn't bad for a seamless bit of storage black magic, i.e. no outage and a simple implementation.  However, what is also clear is that the underlying I/O sub-system is still having to cart around huge amounts of data.  No matter how fast your disks are, there is only so much benefit SSDs can give you, so it is worth quantifying the I/O profile of your system, i.e. slow I/O, lots of I/O or both, to understand what SSDs will buy you.  The query offloading improved performance around 62 times… enough said!
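If you want to quantify the offloading yourself, a rough sketch along these lines can be run against the AWR data.  It assumes the offload columns introduced around 11.2 (IO_OFFLOAD_ELIG_BYTES_TOTAL and IO_INTERCONNECT_BYTES_TOTAL) are present in DBA_HIST_SQLSTAT on your version, so check they exist first.

-- Rough per-statement view of how much was read from storage versus how
-- much actually came back over the interconnect to the compute nodes.
-- Offload column names are assumptions for 11.2 onwards; verify they exist.
select sql_id,
round((physical_read_bytes_total/1024/1024/1024),1) read_gb,
round((io_interconnect_bytes_total/1024/1024/1024),1) returned_gb,
round(100 * (1 - io_interconnect_bytes_total
                 / nullif(physical_read_bytes_total,0)),1) offload_pct
from dba_hist_sqlstat
where io_offload_elig_bytes_total > 0
and executions_total = 1
order by offload_pct desc;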

Tweet Me: pbedba

Saving You 15 Minutes: SVC SSDs Flash v’s Exadata Gathering The Stats Part 2

So although the last query provided the data we required, it was lacking in two areas:

  • It wasn’t possible to compare pre / post times fairly, as the result would depend on which executions we happened to compare, i.e. we needed averages
  • The query returned data we didn’t require, i.e. queries that took over a certain amount of time but were executed either only before the move to SSDs or only after, not both.

So, to resolve these two issues, I am going to use some analytic functions.  Again, I would stress that this is SQL written to obtain some information as a one-off; you could probably re-write it, but as a set of building blocks it is probably easier to understand this way.

This is the original query:

select snap_id, 'POST' "WHEN", SQL_ID,
round((elapsed_time_total/1000000),2) ETL,
rows_processed_total, rows_processed_total/round((elapsed_time_total/1000000),2) rows_per_sec,
round((cpu_time_total/1000000),2) cpu_time_total,
round((iowait_total/1000000),2) iowait_total
from dba_hist_sqlstat where sql_id in
      (
       select sql_id from (
                    select snap_id,
                    sql_id,
                    plan_hash_value, round((elapsed_time_total/1000000),2), rows_processed_total
                    from dba_hist_sqlstat
                    where executions_total=1
                    and snap_id < 59045
                    and rows_processed_total > 0
                    and round((elapsed_time_total/1000000),2) > 320
                          )
      )
and snap_id > 59165
and rows_processed_total > 0
and executions_total=1
union all
select snap_id, 'PRE' "WHEN", SQL_ID,
round((elapsed_time_total/1000000),2) ETL,
rows_processed_total, rows_processed_total/round((elapsed_time_total/1000000),2) rows_per_sec,
round((cpu_time_total/1000000),2) cpu_time_total,
round((iowait_total/1000000),2) iowait_total
from dba_hist_sqlstat
where executions_total=1
and snap_id < 59045
and rows_processed_total > 0
and round((elapsed_time_total/1000000),2) > 320
order by SQL_ID, SNAP_ID

The way we will build this up to get the data we want is to use a number of analytical wraps.

First Wrap: Getting the Averages

select distinct
sql_id,
"WHEN",
avg(ETL) OVER (partition by "WHEN", SQL_ID) as AVG_ELAPSED_TIME,
avg(rows_processed_total) OVER (partition by "WHEN", SQL_ID) as AVG_ROWS_PROCESSED,
avg(rows_per_sec) OVER (partition by "WHEN", SQL_ID) as AVG_ROWS_PSEC,
avg(cpu_time_total) OVER (partition by "WHEN", SQL_ID) as AVG_CPU_TIME,
avg(iowait_total) OVER (partition by "WHEN", SQL_ID) as AVG_IO_TIME
FROM (original query)

We are using the AVG analytic function, partitioning over WHEN and SQL_ID.  To put this in English: for each SQL statement we want its average statistics pre SSDs and post SSDs.  The hard-coded PRE and POST literals in the original query are given the alias WHEN, so when this query selects the data it can average the statistics separately for the PRE SSDs and POST SSDs periods.  The output looks like this:

[Image: EXA_SSDspng1 – first wrap output]
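If the DISTINCT looks odd, a tiny synthetic example (nothing to do with the real AWR output above) shows why it is there: the analytic AVG repeats the same average on every input row, and DISTINCT collapses those duplicates to one row per WHEN / SQL_ID pair.

-- Illustrative only: three made-up executions of one statement.
select distinct
sql_id,
"WHEN",
avg(etl) OVER (partition by "WHEN", sql_id) as AVG_ELAPSED_TIME
from (
select 'abc123' sql_id, 'PRE' "WHEN", 600 etl from dual union all
select 'abc123' sql_id, 'PRE' "WHEN", 400 etl from dual union all
select 'abc123' sql_id, 'POST' "WHEN", 250 etl from dual
);
-- Two rows come back: PRE averaged to 500, POST left at 250.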

Second Wrap: Getting only the required data

Filtering out unwanted data only after it has already been retrieved is BAD, hence the original comment about a re-write.  However, based on what we have got, I want to get rid of any SQL that wasn’t executed in both the PRE and POST phases.

select
sql_id,
"WHEN",
count(SQL_ID) OVER (partition by SQL_ID) SQL_COUNT,
AVG_ELAPSED_TIME,
AVG_ROWS_PROCESSED,
AVG_ROWS_PSEC,
AVG_CPU_TIME,
AVG_IO_TIME
from (firstwrap(original query))

Based on the first wrap query, I know that a SQL_ID executed in both the PRE and POST stages will appear twice.  If I put the COUNT analytic function into the first query it would count all the rows for each SQL_ID from the original query, so it wouldn’t give me the correct output.  By placing the count here, after that aggregation has already been done, I only get a count of ONE or TWO.  I am therefore only interested in queries with a count of TWO, i.e. those that appear twice.

[Image: EXA_SSDspng2 – second wrap output]
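Again, a small synthetic example (nothing to do with the real output) shows the effect of counting after that aggregation: a statement seen in both phases counts as TWO, one seen only before the move counts as ONE.

-- Illustrative only: abc123 ran in both phases, xyz789 only in PRE.
select sql_id,
"WHEN",
count(sql_id) OVER (partition by sql_id) SQL_COUNT
from (
select 'abc123' sql_id, 'PRE' "WHEN" from dual union all
select 'abc123' sql_id, 'POST' "WHEN" from dual union all
select 'xyz789' sql_id, 'PRE' "WHEN" from dual
);
-- The abc123 rows show SQL_COUNT = 2, the xyz789 row shows SQL_COUNT = 1.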

Third Wrap: Final query

select * from
(secondwrap(firstwrap(originalquery)))
where SQL_COUNT=2

So this is what the final output looks like. It provides exactly the information we require.  The next step is to look at the results!

[Image: EXA_SSDspng3 – final output]
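As an aside, and picking up on the earlier comment about a re-write, a single pass along the following lines would do the filtering up front and then average each phase with a plain GROUP BY.  It is only a sketch, re-using the same snap_id boundaries (59045 / 59165) and the 320 second threshold from the original query, so treat it as a starting point rather than a drop-in replacement.

-- Sketch of a one-pass alternative: find SQL_IDs present in both phases,
-- then average each phase with GROUP BY instead of the analytic wraps.
with candidates as (
select sql_id from dba_hist_sqlstat
where snap_id < 59045
and executions_total = 1
and rows_processed_total > 0
and elapsed_time_total/1000000 > 320
intersect
select sql_id from dba_hist_sqlstat
where snap_id > 59165
and executions_total = 1
and rows_processed_total > 0
)
select s.sql_id,
case when s.snap_id < 59045 then 'PRE' else 'POST' end "WHEN",
round(avg(s.elapsed_time_total/1000000),2) AVG_ELAPSED_TIME,
round(avg(s.rows_processed_total),2) AVG_ROWS_PROCESSED,
round(avg(s.cpu_time_total/1000000),2) AVG_CPU_TIME,
round(avg(s.iowait_total/1000000),2) AVG_IO_TIME
from dba_hist_sqlstat s
join candidates c on c.sql_id = s.sql_id
where s.executions_total = 1
and s.rows_processed_total > 0
and ((s.snap_id < 59045 and s.elapsed_time_total/1000000 > 320)
     or s.snap_id > 59165)
group by s.sql_id,
case when s.snap_id < 59045 then 'PRE' else 'POST' end
order by s.sql_id, "WHEN";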

Making my old HP OVM Server go a whole lot faster

I’m a massive believer in getting good use out of old hardware which, although maybe not great for production use, can make a fantastic piece of dev/test or training kit, especially when combined with Oracle VM. We had a couple of HP servers sitting around doing very little, but after combining them into one I had a nice little server with lots of cores, lots of RAM and lots of disks. Not cutting edge, but good enough, and more importantly able to run 6 x 2 node RAC clusters for internal training purposes.

Setting up OVM Server doesn’t take long at all and it recognised the RAID volume I built using the RAID controller in the server for the OVS repository with no issues at all. So I went about creating some templates for the training I had set up with some of the team. For this kind of environment, where having lots of space is more important than performance, I opt for RAID 5. However, the performance was really, really slow; think taking about 2 hours to copy the Oracle home from node 1 to node 2. Even RAID 5 shouldn’t be this slow.

So, I found a working HP Array Configuration Utility, version 9.40.12.0, which installed into OVM Server with a simple ‘rpm -ivh’, and fired it up:

=> ctrl all show detail

Smart Array P410 in Slot 1
Bus Interface: PCI
Slot: 1
RAID 6 (ADG) Status: Disabled
Controller Status: OK
Hardware Revision: C
Firmware Version: 2.74
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 100% Read / 0% Write
Drive Write Cache: Disabled
Total Cache Size: 256 MB
Total Cache Memory Available: 144 MB
No-Battery Write Cache: Disabled
Battery/Capacitor Count: 0
SATA NCQ Supported: True

So I could see that all the cache was allocated to reads, Drive Write Cache was disabled and No-Battery Write Cache was also disabled (probably because I had no batteries installed). Basically, nothing there was going to help my write speed in the slightest. As it’s a server for test and training I want as much performance and as much storage as possible. If I lose power and lose some data then, to be honest, I’m not really bothered. My templates will still be there, static and intact, and if the worst-case scenario came to pass then installing OVM Server again is not much of a hardship.

So, full steam ahead with the following commands to turn on all the caching possible.

ctrl slot=1 modify nbwc=enable
ctrl slot=1 modify dwc=enable forced
ctrl slot=1 modify cacheratio=25/75

Which led to the following:

=> ctrl all show detail

Smart Array P410 in Slot 1
Bus Interface: PCI
Slot: 1
RAID 6 (ADG) Status: Disabled
Controller Status: OK
Hardware Revision: C
Firmware Version: 2.74
Rebuild Priority: Medium
Expand Priority: Medium
Surface Scan Delay: 15 secs
Surface Scan Mode: Idle
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 0 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 25% Read / 75% Write
Drive Write Cache: Enabled
Total Cache Size: 256 MB
Total Cache Memory Available: 144 MB
No-Battery Write Cache: Enabled
Battery/Capacitor Count: 0
SATA NCQ Supported: True

The outcome? Well, copying the DB home from node 1 to node 2 took a total of 9 minutes rather than hours. Quite a massive improvement, and it shows just how important caching is. With a working version of the HP Array Configuration Utility for OVM Server and a little bit of time, I now have a super fast training environment built from a couple of old servers.

Saving You 15 Minutes: SVC SSDs Flash v’s Exadata Gathering The Stats Part 1

So following on from the previous blog I now need to start gathering some stats to compare DB performance pre and post SSDs.  Firstly I’m going to define my requirement:

Look for SQL queries which took over five minutes to execute prior to the shift to SSDs.  Then look for those same queries after the shift to SSDs and compare the times; if you can break down the elapsed time by CPU and I/O, that would be good.  Also double-check the rows processed to ensure we are looking at like-for-like workloads.

Breaking the above paragraph down into its various steps will help us build the SQL statement.  I don’t recommend this as a way to write efficient SQL (although it did run pretty quickly anyway).  The breakdown is in bold below.

Look for sql queries which took over five minutes to execute prior to the shift to SSDs.

select snap_id, sql_id, plan_hash_value, round((elapsed_time_total/1000000),2), rows_processed_total
from dba_hist_sqlstat
where executions_total=1
and snap_id < 59045
and rows_processed_total > 0
and round((elapsed_time_total/1000000),2) > 320

To expand: we are looking at the total elapsed time for a single execution of a query which took over five minutes and executed only once.  This should make it easier for comparison purposes.  This first query is simple and will provide a building block.

Then look for those same queries after the shift to SSDs

select snap_id, 'POST', SQL_ID,
round((elapsed_time_total/1000000),2) ETL,
rows_processed_total, rows_processed_total/round((elapsed_time_total/1000000),2) rows_per_sec, round((cpu_time_total/1000000),2),
round((iowait_total/1000000),2) 
from dba_hist_sqlstat where sql_id in
      (
       select sql_id from (
           select snap_id,
           sql_id,
           plan_hash_value, 
           round((elapsed_time_total/1000000),2), rows_processed_total
             from dba_hist_sqlstat
             where executions_total=1
             and snap_id < 59045
             and rows_processed_total > 0
             and round((elapsed_time_total/1000000),2) > 320
                          )
      )
and snap_id > 59165
and rows_processed_total > 0
and executions_total=1

So the original query is now a sub-query against DBA_HIST_SQLSTAT; it allows the outer query to pull back the same performance stats for those same SQL_IDs after the switch to SSDs, hence snap_id > 59165 in the outer query.

and compare the times 

select snap_id, 'POST', SQL_ID,
round((elapsed_time_total/1000000),2) ETL,
rows_processed_total, rows_processed_total/round((elapsed_time_total/1000000),2) rows_per_sec, round((cpu_time_total/1000000),2),
round((iowait_total/1000000),2) 
from dba_hist_sqlstat where sql_id in
      (
       select sql_id from (
           select snap_id,
           sql_id,
           plan_hash_value, 
           round((elapsed_time_total/1000000),2), rows_processed_total
             from dba_hist_sqlstat
             where executions_total=1
             and snap_id < 59045
             and rows_processed_total > 0
             and round((elapsed_time_total/1000000),2) > 320
                          )
      )
and snap_id > 59165
and rows_processed_total > 0
and executions_total=1
union all
select snap_id, 'PRE', SQL_ID, round((elapsed_time_total/1000000),2) ETL, 
rows_processed_total, 
rows_processed_total/round((elapsed_time_total/1000000),2) rows_per_sec, 
round((cpu_time_total/1000000),2), round((iowait_total/1000000),2)  
from dba_hist_sqlstat
where executions_total=1
and snap_id < 59045
and rows_processed_total > 0
and round((elapsed_time_total/1000000),2) > 320
order by SQL_ID, SNAP_ID

So there could be a cleaner way to write this SQL, but I just took the original query and bolted it on with a UNION ALL.  When I run the query these are my results:

[Image: EXA_SSDspng – query results]

As you can see, we have various PRE execution times for this SQL and a single POST time; depending on which figure we compare against, the percentage improvement will vary.  Also, this is just a subset of what is brought back.  I guess, therefore, we need to apply some analytics to this to start looking at pre and post averages and get a fairer picture.  This will also reduce the amount of data retrieved and minimise any further analysis in tools like Excel.

TweetMe: pbedba