We had quite a strange case recently – Exadata I/O Resource Management (IORM) did not seem to work. This is a quarter rack X6-2 High Capacity (HC) Exadata, storage server software version 12.2 (latest patch level), and a variety of different CDBs. Two CDBs were doing heavy smart scanning on flash (cache), and one CDB seemed to get much more IOPS / throughput than the other, although IORM was configured with equal shares. Queries were doing smart scans, but still transferred massive amounts of data to the database servers.
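For reference, an equal-share inter-database plan of the kind we had in place looks roughly like this (a sketch – CDB1/CDB2 stand in for the real database names, and the share values are illustrative):

# equal shares for both CDBs, plus a catch-all directive for everything else
dcli -g ./cell_group -l root "cellcli -e alter iormplan dbplan=((name=CDB1, share=4), (name=CDB2, share=4), (name=other, share=1))"
# verify the plan on all cells
dcli -g ./cell_group -l root "cellcli -e list iormplan detail"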
To verify and assess this, we prepared a test case. The test queries were of this form:
select /*+ parallel */ a.* bulk collect into b from test_large a where testf(decode(a.object_id,0,1,2)) = 0
test_large is a 1-billion-row table made of copies of dba_objects; testf is a deterministic PL/SQL function whose only purpose is to prevent any row filtering on the storage side (a user-defined PL/SQL function in the predicate cannot be offloaded to the cells). The query returns 0 rows, but transfers essentially the whole table over the network.
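Put together, the harness looked roughly like this (a sketch – the actual collection type and function body may have differed, but this reproduces the behaviour described above):

-- deterministic, but not offloadable to the cells, so no storage-side filtering
create or replace function testf(p number) return number deterministic is
begin
  return p;  -- decode() only ever yields 1 or 2, so the predicate is never true
end;
/

declare
  type t_rows is table of test_large%rowtype;
  b t_rows;
begin
  loop  -- run until the session is killed
    select /*+ parallel */ a.* bulk collect into b
      from test_large a
     where testf(decode(a.object_id, 0, 1, 2)) = 0;
  end loop;
end;
/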
We ran the query in a loop – on CDB2 with 1 session, on CDB1 with 4 sessions in parallel – and observed the cell server metrics:
# dcli -g ./cell_group -l root "cellcli -e LIST METRICCURRENT WHERE name='N_HCA_MB_TRANS_SEC' and metricvalue \>1000 "
cell1: N_HCA_MB_TRANS_SEC cell1 4,090 MB/sec
cell2: N_HCA_MB_TRANS_SEC cell2 4,096 MB/sec
cell3: N_HCA_MB_TRANS_SEC cell3 4,092 MB/sec
# dcli -g ./cell_group -l root "cellcli -e LIST METRICCURRENT WHERE name='PDB_FC_IO_RQ_SEC' and metricvalue \>1000 "
cell1: PDB_FC_IO_RQ_SEC CDB1.PDB1 55,830 IO/sec
cell1: PDB_FC_IO_RQ_SEC CDB2.PDB2 14,689 IO/sec
cell2: PDB_FC_IO_RQ_SEC CDB1.PDB1 56,626 IO/sec
cell2: PDB_FC_IO_RQ_SEC CDB2.PDB2 13,857 IO/sec
cell3: PDB_FC_IO_RQ_SEC CDB1.PDB1 56,000 IO/sec
cell3: PDB_FC_IO_RQ_SEC CDB2.PDB2 14,304 IO/sec
# dcli -g ./cell_group -l root "cellcli -e LIST METRICCURRENT WHERE name='PDB_FC_IO_BY_SEC' and metricvalue \>0 "
cell1: PDB_FC_IO_BY_SEC CDB1.PDB1 3,657 MB/sec
cell1: PDB_FC_IO_BY_SEC CDB2.PDB2 962 MB/sec
cell2: PDB_FC_IO_BY_SEC CDB1.PDB1 3,709 MB/sec
cell2: PDB_FC_IO_BY_SEC CDB2.PDB2 907 MB/sec
cell3: PDB_FC_IO_BY_SEC CDB1.PDB1 3,668 MB/sec
cell3: PDB_FC_IO_BY_SEC CDB2.PDB2 937 MB/sec
So we indeed see that CDB1 gets four times the IOPS and throughput of CDB2 – something IORM should prevent, since both have the same IORM shares. To rule out that CDB2 was simply asking for less, we stopped the CDB1 sessions and saw this:
# dcli -g ./cell_group -l root "cellcli -e LIST METRICCURRENT WHERE name='PDB_FC_IO_BY_SEC' and metricvalue \>0 "
cell1: PDB_FC_IO_BY_SEC CDB2.PDB2 4,140 MB/sec
cell2: PDB_FC_IO_BY_SEC CDB2.PDB2 4,177 MB/sec
cell3: PDB_FC_IO_BY_SEC CDB2.PDB2 4,140 MB/sec
So indeed – with the CDB1 sessions gone, CDB2 alone reached the same ~4,100 MB/sec per cell that both CDBs together had produced before. The CDB2 workload had been substantially throttled, far below what one would expect under IORM with equal shares.
Now what is the reason? An IORM bug? If we look at the total (large) IOPS, it is around 70k. That is much less than what an X6 storage cell can do from flash – I have seen workloads producing 180k IOPS per cell. So maybe the reason is that flash IO is not at capacity, and IORM is therefore simply not kicking in?
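One quick sanity check worth doing first (a suggestion – the exact output will vary): an inactive plan, or an objective of basic, would by itself mean that plan shares are not fully enforced:

# check that the IORM plan is active and which objective is in effect
dcli -g ./cell_group -l root "cellcli -e list iormplan attributes name,objective,status"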
And indeed, IO below capacity appears to be the explanation. The limiting factor instead seems to be network bandwidth. 4,100 MB/sec per storage cell is only about 85 percent of what InfiniBand should be able to do, but it is certainly high. In addition, the two database servers receive a combined 3 × ~4,100 MB/sec ≈ 12.3 GB/sec, which even appears to be above the InfiniBand bandwidth available to two servers. It is not fully clear to me where this comes from.
But anyway – it appears to be a reasonable assumption that network bandwidth is the limiting factor, that there is no resource management on it, and that the CDB with more sessions therefore simply gets more of it.
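One experiment that might be worth trying (we have not – the values below are hypothetical) is adding a utilization limit for the greedy CDB: unlike shares, a limit is enforced even when the cell is not saturated, so it might indirectly cap the bandwidth one CDB can pull from flash:

# hypothetical: cap CDB1 at 50% IO utilization in addition to the shares
dcli -g ./cell_group -l root "cellcli -e alter iormplan dbplan=((name=CDB1, share=4, limit=50), (name=CDB2, share=4), (name=other, share=1))"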
I still have an Oracle Support call open for this – I will update when I get further insights.
There is some network resource management on Exadata; however, it seems only to prioritise between different types of workload, not between the same workload type coming from different databases. This is an extract from the X7 data sheet:
“Exadata also uniquely implements database network resource management to
ensure that network intensive workloads such as reporting, batch, and backups don’t
stall response time sensitive interactive workloads. Latency sensitive network operations
such as RAC Cache Fusion communication and log file writes are automatically moved
to the head of the message queue in server and storage network cards as well as
InfiniBand network switches, bypassing any non-latency sensitive messages. Latency
critical messages even jump ahead of non-latency critical messages that have already
been partially sent across the network, ensuring low response times even in the
presence of large network DMA (Direct Memory Access) operations.”
How relevant is this? Fairly relevant, and becoming more relevant, in my opinion. A network-bound workload on Exadata may have been fairly rare so far, but with increasing flash cache sizes and other optimisation methods (cellmemory …) it is likely to become much more common.
I would be happy to hear any opinions on this point.