OCI private cloud shell name resolution

Cloud Shell private access is a great feature, but it has one shortcoming: it should naturally also use the DNS settings of the subnet it runs in – such as custom resolvers. Unfortunately it doesn't. The reason is the missing nameserver entry:

nameserver 169.254.169.254

Hope they add it ….
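To see what Cloud Shell actually resolves with, a quick check (a sketch – whether you can patch /etc/resolv.conf yourself depends on the permissions your Cloud Shell session has):

```shell
# List the resolver entries the Cloud Shell session uses; with private network
# access, the VCN resolver entry 169.254.169.254 is the one that is missing.
grep '^nameserver' /etc/resolv.conf || echo 'no nameserver entries found'
```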

OCI – a difficult VNIC dependency case

The Subnet ocid1.subnet.oc1.xxx references the VNIC ocid1.vnic.oc1.yyy. You must remove the reference to proceed with this operation.

Typically you have an instance in there … or a mount target. Not in our case. The VNIC belonged to a primary private IP … without any further info. No instances.

What was it?

https://blogs.oracle.com/cloud-infrastructure/post/securely-access-private-resources-from-oci-cloud-shell-using-private-network-access

We had a Cloud Shell with private network access in that subnet – and somehow it kept the VNIC. The only solution was to start Cloud Shell again with private access – and then switch the network to public. It took some time … but after 20 minutes or so the VNIC was magically gone.

OCI security zone does not support OKE images

That's not so nice –

Node launch failure: (403, Forbidden, false) Security Zone Violation: You must create a compute instance in a security zone using a platform image. You can't create a compute instance in a security zone from a custom image.

This happens with the default OKE images and the maximum security recipe. Of course you could change the recipe and remove that rule – but Oracle should rather fix this ….

Oracle on Kubernetes

Kubernetes is one of the most exciting technologies I have ever seen. So wouldn’t it be great to do something involving my old love (Oracle) and my new one (Kubernetes)?

And we already have a use case. We run a kind of PDB-as-a-service based on Exadata for our developers. Most of the time they are happy … but sometimes they start to complain that they only have access to the PDB. “We can’t restart our pdb/cdb”, “we can’t see the OS”, “we can’t, we can’t, we can’t” …. Wouldn’t it be great to offer them Oracle in a container as a service on our Kubernetes environment (based on Rancher RKE)? So what do we need?

We set this up – and, again proving the greatness of Kubernetes, everything works beautifully. Some snippets and the associated gotchas:

apiVersion: v1
kind: Service
metadata:
  name: oracledb-2
  namespace: default
spec:
  clusterIP: None
  selector:
    app: oracle2
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: oracle2
  namespace: default
spec:
  podManagementPolicy: OrderedReady
  replicas: 1
  selector:
    matchLabels:
      app: oracle2
  template:
    metadata:
      labels:
        app: oracle2
    spec:
      containers:
      - image: store/oracle/database-enterprise:12.2.0.1-slim
        imagePullPolicy: IfNotPresent
        name: oracle
        volumeMounts:
        - mountPath: /ORCL
          name: oradata2
      initContainers:
      - name: pgsql-data-permission-fix
        image: busybox
        command: ["/bin/chmod","-R","777", "/ORCL"]
        volumeMounts:
        - mountPath: /ORCL
          name: oradata2
      dnsPolicy: ClusterFirst
      restartPolicy: Always
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: oradata2
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 50Gi

Pay attention here to the “initContainers” section. It is required because the Oracle database creation happens as the “oracle” user, which needs to be able to write to the mount. There are probably better ways. Note as well that we are relying on the default storage class here (which is Longhorn).
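One probably cleaner alternative to the chmod init container is a pod-level securityContext with fsGroup, which lets Kubernetes itself fix up group ownership on the volume at mount time. The gid 54321 is an assumption (the oracle user/group id used in Oracle's database images); verify it for the image you run:

```yaml
# Sketch: instead of the initContainers section above, in the StatefulSet's
# pod template. The kubelet chowns the volume to this fsGroup before start.
    spec:
      securityContext:
        fsGroup: 54321   # assumed gid of the oracle user in the image - verify
```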

For the cman, the setup referenced above can be simplified considerably – it was originally written for RAC. All the environment variables can be removed: simply hardwire the hostname to 0.0.0.0 in cman.ora, drop domain, ip and scanip, drop the updating of /etc/hosts – and you can build an image that does not need any environment at all. It does, however, need the right for privilege escalation (not entirely clear to me why). Expose port 1521, and define a ClusterIP service on top. Nothing else, really.
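A minimal cman.ora along those lines could look like the sketch below (the catch-all rule accepts every source, which is fine inside the cluster but should be tightened if the cman is reachable from outside):

```
cman =
  (configuration=
    (address=(protocol=tcp)(host=0.0.0.0)(port=1521))
    (rule_list=
      (rule=(src=*)(dst=*)(srv=*)(act=accept))
    )
  )
```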

As we use Nginx Ingress (on premise) and don’t have L4 load balancers, we additionally needed to make the Ingress forward plain TCP – this basically involves defining a ConfigMap like

apiVersion: v1
kind: ConfigMap
metadata:
  name: tcp-configmap-example
  namespace: default
data:
  "1521": "default/cman:1521"

and an Ingress amendment like this:

      - args:
        - /nginx-ingress-controller
        - --tcp-services-configmap=default/tcp-configmap-example
...
        ports:
        - containerPort: 80
          hostPort: 80
          name: http
          protocol: TCP
        - containerPort: 443
          hostPort: 443
          name: https
          protocol: TCP
        - containerPort: 1521
          hostPort: 1521
          name: oradb
          protocol: TCP

Developers then access the service over SQL*Net like this:

test1=
 (DESCRIPTION=
   (SOURCE_ROUTE=yes)
   (ADDRESS=(PROTOCOL=tcp)(HOST=cman)(PORT=1521))   
   (ADDRESS=(PROTOCOL=tcp)(HOST=oracledb-0)(PORT=1521))   
   (CONNECT_DATA=(SERVICE_NAME=ORCLPDB1.localdomain)))

In addition they can get a shell directly into their pod if they want to restart something, create more pdbs, and so on … even sparse cloning works 🙂

Overall a great setup – of course it’s more of a POC at the moment, but it’s also lots of fun!

Exadata X8 is out

Out for a few days now – https://www.oracle.com/technetwork/database/exadata/exadata-x8-2-ds-5444350.pdf

Update 16 June 2019: now also officially announced, with further links.

There was not much hoopla around it this time – and indeed at first glance one may be a bit disappointed if one is used to the improvements of the last years:

  • There is no increase in CPU cores per database server (24, as on the X7), but the CPUs are newer, with a higher clock speed: 2.3 GHz compared to 2.1 GHz before. The database servers also get larger disk drives.
  • On the storage servers: no increase in flash, but larger spinning disks – 14 TB instead of 10 TB, increasing the usable capacity of the HC storage servers by 40%. More cores per CPU (16 instead of 10) on the storage servers.
  • No changes to InfiniBand.

But then there is a great new feature which one may easily overlook: a new, much cheaper (roughly factor 3) storage option, the Storage Server XT. It is basically the HC storage server without flash, with only one processor, and with offloading disabled by default. If you leave offloading disabled, you do not need to buy the storage software licenses! HCC compression and everything else is still supported with this storage option – so you will still get decent performance for data warehousing workloads, in particular on more infrequently accessed data. Taking typical discounts into account, this option brings the cost per usable terabyte for such data down by a factor of 10 or more compared to the X7.
All on the same machine – without having to resort to workarounds like utilising the ZFS appliance or other offloading approaches.

Scheduler delays in 12.2 RAC

There are quite a number of bugs in different Oracle versions which can lead to delayed starts of scheduler jobs – sometimes only by seconds, sometimes by minutes or more. In our case the problems started after an upgrade from 12.1 to 12.2. One-off scheduler jobs started randomly with delays. Not all of them, but some – some only by seconds, some by up to half an hour.

We first suspected a bug in the resource manager, as the resource manager is called to determine whether enough resources are available to start additional jobs – and there are several bugs and associated changes to what the resource manager actually looks at, see for example bug 27333794 and Doc ID 2322087.1. Oracle, however, does not disclose the actual algorithm applied.

After a while we found out that only a certain type of scheduler job was affected: we run a two-node RAC cluster and have jobs which are bound via job classes to services, and some of those services were bound to a particular RAC instance. Only such jobs were delayed – and only if the job creation (which was done in a way to run immediately) happened on the other instance.

Looking into the scheduler traces (see here), we saw the following in the CJQ0 trace on the instance the job was supposed to run on:

SCHED 03-03 18:57:25.774 2 00 153076 CJQ0 1(jskqJobQRefresh):Breadcast received, going to disk for job 139671, conid 4, total 1 
SCHED 03-03 18:57:35.485 2 00 153076 CJQ0 1(jskqJobQRefresh):Breadcast received, going to disk for job 139672, conid 4, total 1 
SCHED 03-03 18:57:47.058 2 00 153076 CJQ0 1(jskqJobQRefresh):Breadcast received, going to disk for job 139678, conid 4, total 1 
SCHED 03-03 18:57:56.088 2 00 153076 CJQ0 1(jskqJobQRefresh):Breadcast received, going to disk for job 139679, conid 4, total 1 
SCHED 03-03 18:58:06.640 2 00 153076 CJQ0 1(jskqJobQRefresh):Breadcast received, going to disk for job 139680, conid 4, total 1 
SCHED 03-03 18:58:15.605 2 00 153076 CJQ0 1(jsksCheckForBCast):Inside a bcast send, don't try and receive bcast here <=============== 

This is a grep on the trace file for “Breadcast received” (nice typo :)), and one sees the following: the instance is informed via broadcasts about jobs created on the other instance (the sending of the broadcast can be found in the other instance’s trace). Sometimes, however, the reception goes wrong – the last line above – because the broadcast arrives at a moment when the instance is sending a broadcast itself. That is why the broadcast information is lost and the job is not started. One can then see that the instance checks for outstanding jobs on certain events (other jobs starting) and periodically, finds the job and starts it – but much too late.
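For reference, the check itself is nothing more than a grep over the CJQ0 trace; a small sketch (the trace path in the example is an assumption – take the real one from v$diag_info):

```shell
# scan_cjq_trace: print broadcast receptions and the problematic
# "Inside a bcast send" lines from a CJQ0 trace file.
scan_cjq_trace() {
  grep -E 'Breadcast received|Inside a bcast send' "$1"
}
# e.g. scan_cjq_trace $ORACLE_BASE/diag/rdbms/mydb/MYDB1/trace/MYDB1_cjq0_153076.trc
```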

Checking this with Oracle Support, we finally identified a known bug – although its description looks very different. Applying the patch for this bug solved our issue. In 18c the patch is included; for 12.2 there are interim patches, see Doc ID 2463589.1. Bug 27765272, which was never resolved, likely has the same root cause.

So in summary: if you are on 12.2 RAC and scheduler jobs bound to one instance are delayed, look in the CJQ trace for the message above. If you find it, this is likely the case from Doc ID 2463589.1.

ORA-27041 for snapshot copies

Since 12c, Oracle Multitenant offers the nice feature of “snapshot copies”, i.e. the possibility of creating thin clones of pluggable databases within seconds, using the copy-on-write functionality of the underlying storage – see for example here for details (note that this article is about the functionality on “normal” storage using clonedb=true, not the similar functionality on specific ASM sparse disks). This should allow you, for example, to quickly create multiple copies of some “golden image” for testing purposes. An obvious limitation is that the source db must be kept read-only while such thin clones exist, otherwise there can easily be inconsistencies (there seems to have been a bug around this in 12.1 – see the comments in the link above). However, once you have dropped the clones, you should be able to open the source db read-write again, for example to evolve your “golden image”. This, however – even in the (at the time of writing) latest 19.2 – does not work; you get errors like this one:

alter pluggable database ORCLPDB1 open;
alter pluggable database ORCLPDB1 open
*
ERROR at line 1:
ORA-01114: IO error writing block to file 12 (block # 1)
ORA-01110: data file 12: '/opt/oracle/oradata/ORCLCDB/ORCLPDB1/users01.dbf'
ORA-27091: unable to queue I/O
ORA-27041: unable to open file
Linux-x86_64 Error: 13: Permission denied
Additional information: 3

This is in fact the same error you get when trying to open the pdb while clones still exist. The implementation sets the datafiles read-only on cloning, but does not change them back when the last clone is dropped. There has been an Oracle support article on this since August 2018 (Doc ID 2419236.1), and it mentions as a workaround to change the datafiles manually back to writable at OS level (which works) … however, there does not even seem to be a bug filed? Maybe a sign that this functionality is not much used?
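The OS-level workaround from the note boils down to giving the owner the write bit back on the affected datafiles; a sketch (the directory in the example is an assumption – use the paths from the error message, as the oracle OS user):

```shell
# make_writable: restore the owner write bit on all datafiles in a directory.
# The snapshot-copy code sets them read-only and never reverts that.
make_writable() {
  chmod u+w "$1"/*.dbf
}
# e.g. make_writable /opt/oracle/oradata/ORCLCDB/ORCLPDB1
# afterwards, in sqlplus: alter pluggable database ORCLPDB1 open;
```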

Tracing DBMS_SCHEDULER

Nothing exciting today, just some things to remember:
DBMS_SCHEDULER tracing is done via event 27402, for example:

alter system set events '27402 trace name context forever, level 65355';

This gives several interesting traces:

  • trace of the job scheduler itself, in *cjq*.trc files
  • trace of the session creation/running the jobs, in *ora*.trc files
  • trace of the job slaves – in *j0*.trc files

Some things to remember:

  • the event needs to be set at system level; it does not appear to do anything at session level
  • in multitenant, you need to set it at ROOT level, at least to get the cjq traces
  • in RAC – remember to set it at all instances if required
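Two related one-liners worth keeping next to it (a sketch): switching the tracing off again, and finding where the trace files land:

```
alter system set events '27402 trace name context off';

-- trace file directory:
select value from v$diag_info where name = 'Diag Trace';
```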

Database Vault: Protection from rewriting, redacting, translating ?

We use Database Vault for integrity protection – the data in a certain schema must not be tampered with, not even by a DBA or by SYSDBA.

This article is specifically about protection from SYSDBA. Of course Oracle writes that even with Database Vault, SYSDBA remains a powerful privilege, so you should limit access to it as much as possible. Still, there is nowhere a statement like “you shouldn’t even try to protect from SYSDBA”.

And in principle it works: you define a realm, and SYSDBA no longer has access. However, we came across the various ways in which Oracle allows query results to be influenced – these could be used by SYSDBA to influence the query results of realm users and thereby compromise the integrity of the protected information.

For example, take a statement like “insert into a select c from b where …” – if you can influence the result of the subselect, you can tamper with table a.

Now Oracle has (fortunately or unfortunately) several ways of doing such things:

  1. Advanced Query Rewrites (dbms_advanced_rewrite)
  2. SQL translation framework (dbms_sql_translator)
  3. Redaction (dbms_redact)
  4. possibly others ….

It is easy to see that anyone with access to this functionality could compromise the integrity. So does Database Vault protect here? The first thing we saw: by default, no. SYSDBA can freely and happily use all of this functionality.

Now – can we protect against this? First idea: command rules. But unfortunately, no:

BEGIN
 DBMS_MACADM.CREATE_COMMAND_RULE(
  command         => 'EXECUTE',
  rule_set_name   => 'Disabled',
  object_owner    => 'SYS',
  object_name     => 'DBMS_ADVANCED_REWRITE',
  enabled         => DBMS_MACUTL.G_YES,
  scope           => DBMS_MACUTL.G_SCOPE_LOCAL);
END;
/
BEGIN
*
ERROR at line 1:
ORA-47105: invalid object owner SYS for command EXECUTE
ORA-06512: at "DVSYS.DBMS_MACADM", line 1735
ORA-06512: at line 2

This is even documented, here :

"Note that the SELECT, INSERT, UPDATE, DELETE, and EXECUTE statements are not allowed for a selection of all (%) or the SYS and DVSYS schemas."

Next idea: define a mandatory realm protecting access to these packages? This should block access even for SYS:

BEGIN
 DBMS_MACADM.CREATE_REALM(
  realm_name    => 'Test Realm Rewrite', 
  description   => 'Test Realm Rewrite', 
  enabled       => DBMS_MACUTL.G_YES, 
  audit_options => DBMS_MACUTL.G_REALM_AUDIT_FAIL,
  realm_type    => 1);
END; 
/
BEGIN
 DBMS_MACADM.ADD_OBJECT_TO_REALM(
  realm_name   => 'Test Realm Rewrite', 
  object_owner => 'SYS', 
  object_name  => 'DBMS_ADVANCED_REWRITE', 
  object_type  => 'PACKAGE'); 
END;
/

but … unfortunately ….

BEGIN 
DBMS_ADVANCED_REWRITE.declare_rewrite_equivalence ( 
name => 'test_rewrite', 
source_stmt => 'select 1 from testuser.test', 
destination_stmt => 'select 2 from testuser.test', 
validate => FALSE, 
rewrite_mode => 'TEXT_MATCH'); 
END; 
/

succeeds. And this does not appear to be documented anywhere. We then found bug 26825142, which describes exactly this behaviour and which has been classified as “expected behaviour” and a “documentation bug”.

This is of course very unfortunate: access to these packages by SYSDBA cannot be prevented. So what are the possibilities? Luckily, for the query rewrite and the translation framework, parameters or events need to be set – and that can be prevented via command rules! For example:

BEGIN
 DBMS_MACADM.CREATE_COMMAND_RULE(
  command         => 'ALTER SYSTEM',
  rule_set_name   => 'Disabled',
  object_owner    => '%',
  object_name     => '%',
  enabled         => DBMS_MACUTL.G_YES,
  clause_name     => 'SET',
  parameter_name  => 'QUERY_REWRITE_INTEGRITY',
  scope           => DBMS_MACUTL.G_SCOPE_LOCAL);
END;
/

Now not even SYS can set this parameter – and can no longer use advanced query rewrite for tampering.

So there are ways – but it would be much better if Oracle included such protection by default. I have a case open with them and will update if there is news.

ORA-65199: database busy. operation not allowed now

It does not appear to be documented anywhere, so let me document it:

ORA-65107: Error encountered when processing the current task on instance:1
ORA-65199: database busy. operation not allowed now

The reason in our case: trying to put a RAC pdb into upgrade mode while either both instances were running or the other instance was not shut down. You have to shut down one instance and do the upgrade on the remaining one only.
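As a sketch, the sequence that works (database, instance and pdb names below are placeholders):

```
$ srvctl stop instance -db ORCL -instance ORCL2

SQL> alter pluggable database MYPDB close immediate;
SQL> alter pluggable database MYPDB open upgrade;
```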