A mysterious DOP downgrade

DOP (degree of parallelism) downgrades in Oracle can be a hassle to deal with: your queries are designed for parallel execution, your application depends on parallel performance – and you do not get it. In the worst case your queries run serially and performance degrades by a large factor. Luckily, the standard causes of downgrades – such as not enough parallel servers – are well documented and usually easy to find.
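A quick way to see whether downgrades are happening at all is the set of cumulative counters in v$sysstat – a minimal check (the exact statistic names may vary slightly between versions):

-- Cumulative parallel downgrade counters since instance start
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'Parallel operations%';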

We had, however, a mysterious case. Queries which usually ran with DOP 16 or 32 were downgraded – sporadically – to 0. The reason code was 352:

352 DOP downgrade due to insufficient number of processes

First thought – OK, something else must be using up all the parallel servers. Nope. Plenty available. So how to troubleshoot this? Trace – using _px_trace.
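The “plenty available” claim is easy to verify cluster-wide – a minimal check, assuming access to gv$px_process:

-- PX server processes per instance and their state (IN USE / AVAILABLE)
SELECT inst_id, status, COUNT(*) AS px_servers
FROM   gv$px_process
GROUP  BY inst_id, status
ORDER  BY inst_id, status;

The tracing itself can be switched on per session (syntax as described in MOS note 444164.1; verbosity and trace areas can be varied):

-- Write PX compilation/execution/messaging trace to the session trace file
ALTER SESSION SET "_px_trace" = "compilation","execution","messaging";

The relevant part of the resulting trace file: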

2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@19412:kxfpiinfo():
        inst [cpus:mxslv:bc(MB)    :pga_target(MB)]
        1    [16  :240  :36608     :20480         ]
        2    [16  :240  :39936     :20480         ]
2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@19767:kxfpclinfo():
        inst(load   :user:pct:fact:servtarget:queued:started:granted:active:active(free)): aff
        1   (18.00  :0   :0  :0   :128       :0     :0      :0      :4     :0  )
        2   (0.00   :0   :0  :0   :128       :0     :0      :0      :0     :0  )
2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@19412:kxfpiinfo():
        inst [cpus:mxslv:bc(MB)    :pga_target(MB)]
        1    [16  :240  :36608     :20480         ]
        2    [16  :240  :39936     :20480         ]
2016-03-05 14:47:52.959149*:PX_Granule:kxfr.c@8408:kxfrDefaultDOP():
        deg:32 tpc:1 #cpus:32 cap:-1 service:SYS$USERS pig: ser:0 locl:0
        var=-18 limit=256 use_aff=0 aff_num=0 unit=16
        new q=0xe83007928 dp=(nil) pnum=65535 numa#=0 qser=0 done.
2016-03-05 14:47:52.959149*:PX_Messaging:kxfp.c@11575:kxfpg1sg():
        q=0xe83007928 req_threads=8 nthreads=8 unit=16 #inst=2 normal
2016-03-05 14:47:52.959775*:PX_Messaging:kxfp.c@11704:kxfpg1sg():       Got It. 1 so far.
2016-03-05 14:47:52.959966*:PX_Messaging:kxfp.c@11704:kxfpg1sg():       Got It. 2 so far.

(....)


   join failed qref=0xe83013650 server=2.6 pid=86622 reason=remote instance not opened(5) err=0

(.....)

Finish: allocated actual 0 slaves for non-GV query
2016-03-05 14:47:52.970512*:PX_Granule:kxfr.c@2828:kxfrialo(end):
        Finished granules allocation and slave acquisition (qcq:(nil))
2016-03-05 14:47:52.970512*:PX_Overhead:kxfr.c@2953:kxfralo(): stepid=after_kxfrialo
2016-03-05 14:47:52.970512*:PX_Scheduler:qerpx.c@7744:qerpx_rowsrc_start(): DOP downgraded to 0(reason_code = 352)

So what do we see here? As shown in the first part of the trace, this is a RAC system with 2 instances. Oracle can in principle allocate the parallel slaves on either instance – it decides based on load, and here it decides for instance 2. When it then tries to allocate slaves on that instance, it fails with “remote instance not opened” – it cannot allocate slaves, and therefore DOP 0.

What? Remote instance not open? How can that be? Didn't Oracle just say it has two instances, both with some load? So certainly open?

Of course, a little bit of background information is missing here: this is 12.1.0.2 and, more importantly, this is multitenant. Now it becomes clear. These queries run in a pdb – and the pdb is open on only one of the RAC instances (quite a common setup for better resource utilization with pdbs that do not require the highest availability). Oracle, however, does not check this – all it checks is the service, which in this case is the default SYS$USERS (these were in fact scheduler jobs) and which is pdb independent. So a kind of bug / feature.
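This is easy to confirm – a minimal check of where the pdb is actually open, using the standard gv$pdbs view (in a setup like this it shows the pdb open on one instance and only mounted on the other):

-- Open mode of each pdb per RAC instance
SELECT inst_id, name, open_mode
FROM   gv$pdbs
ORDER  BY name, inst_id;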

Any way to work around it? Yes, two ways:

  1. set parallel_force_local = true. Slaves will then be allocated only locally (but be aware of some bugs in 12.1.0.2 which make the optimizer take this value from the root cdb, see here for an example)
  2. assign the queries to a pdb-specific service. Oracle checks where the service is running and allocates slaves only there. (Both options are sketched below.)
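A minimal sketch of both workarounds – database, pdb and service names (MYCDB, MYPDB, MYPDB_SVC, MYCDB1) are placeholders:

-- Workaround 1: force all PX slaves onto the local instance
ALTER SYSTEM SET parallel_force_local = TRUE;

-- Workaround 2: create a pdb-specific service pinned to the instance(s)
-- where the pdb is open, then connect the jobs through that service:
--   srvctl add service -db MYCDB -service MYPDB_SVC -pdb MYPDB -preferred MYCDB1
--   srvctl start service -db MYCDB -service MYPDB_SVC

With the service in place, Oracle allocates slaves only on the instances where the service – and therefore the pdb – is running.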
