DOP (= degree of parallelism) downgrades in Oracle can be a hassle to deal with – your queries are designed for parallel, your application requires parallel performance – and you do not get it – in the worst case your queries run serially and performance is a factor worse. Luckily – standard cases for downgrades are documented well and usually easy to find – like not enough parallel servers.
We had however a mysterious case. Queries which usually ran with DOP 16 or 32 were downgraded – sporadically – to 0. Reason code was 352:
352 DOP downgrade due to insufficient number of processes
First thought – OK, something else must be using up all the parallel servers. Nope. Plenty available. So how to troubleshoot this ? Trace – using _px_trace :
2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@19412:kxfpiinfo(): inst [cpus:mxslv:bc(MB) :pga_target(MB)] 1 [16 :240 :36608 :20480 ] 2 [16 :240 :39936 :20480 ] 2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@19767:kxfpclinfo(): inst(load :user:pct:fact:servtarget:queued:started:granted:active:acti ve(free)): aff 1 (18.00 :0 :0 :0 :128 :0 :0 :0 :4 :0 ) 2 (0.00 :0 :0 :0 :128 :0 :0 :0 :0 :0 ) 2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@19412:kxfpiinfo(): inst [cpus:mxslv:bc(MB) :pga_target(MB)] 1 [16 :240 :36608 :20480 ] 2 [16 :240 :39936 :20480 ] 2016-03-05 14:47:52.959149*:PX_Granule:kxfr.c@8408:kxfrDefaultDOP(): deg:32 tpc:1 #cpus:32 cap:-1 service:SYS$USERS pig: ser:0 locl:0 var=-18 limit=256 use_aff=0 aff_num=0 unit=16 new q=0xe83007928 dp=(nil) pnum=65535 numa#=0 qser=0 done. 2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@11575:kxfpg1sg(): q=0xe83007928 req_threads=8 nthreads=8 unit=16 #inst=2 normal 2016-03-05 14:47:52.959775*:PX_Messaging:kxfp.c@11704:kxfpg1sg(): Got It. 1 so far. 2016-03-05 14:47:52.959966*:PX_Messaging:kxfp.c@11704:kxfpg1sg(): Got It. 2 so far. (....) join failed qref=0xe83013650 server=2.6 pid=86622 reason=remote instance not opened(5) err=0 (.....) Finish: allocated actual 0 slaves for non-GV query 2016-03-05 14:47:52.970512*:PX_Granule:kxfr.c@2828:kxfrialo(end): Finished granules allocation and slave acquisition (qcq:(nil)) 2016-03-05 14:47:52.970512*:PX_Overhead:kxfr.c@2953:kxfralo(): stepid=after_kxfr ialo 2016-03-05 14:47:52.970512*:PX_Scheduler:qerpx.c@7744:qerpx_rowsrc_start(): DOP downgraded to 0(reason_code = 352)
So what do we see here ? As shown in the first part this is a RAC system with 2 instances. Oracle in principle can allocate the parallel slaves on either instance – it takes the decision by the load, and here it decides for instance 2. When it then tries to allocate slaves on this instance, it fails with “remote instance not open” – it can not allocated slaves and therefore DOP 0.
What ? Remote instance not open ? How can that be ? Did Oracle just say it has two instances, both with some loads ? So certainly open ?
Of course there is a little bit background information missing here – this is 12.1.0.2 and more importantly – this is multitenant. Now it becomes clear. These queries are in a pdb – and the pdb is open on only one of the RAC instances (which is a quite common setup for better resource utilization for pdbs not requiring highest availability). Oracle however does not check this – all it checks is the service class which in this case is the default SYS$USERS (in fact this was scheduler jobs) which is pdb independent. So a kind of bug / feature – any way to work around ?
Yes, two ways
- set parallel_force_local = true. Slaves will allocated only locally (but be aware of some bugs in 12.1.0.2 which make the optimizer take this value from the root cdb, see here for example)
- assign the queries to a pdb specific service. Oracle checks where the service is running and allocates slaves only there.