Oracle JDBC and AES-NI

“Your network is slowing down my queries”

This is how it started , how it often starts and almost never the network is the real issue. Same here – however it made an interesting case.

So what was happening ? Application developers wanted to fetch quite some large amount of information via JDBC  from a 12c database  – around 5 GB. This took several minutes, like 300-400 Mbit/sec, too slow for them. My first thought was – well indeed, this sounds like a long time. Its in the LAN, we have 10GbE and latencies below half a microsecond, so cant really be the network limiting it. Lets test it.

select test,test,test,test,test,test,test,test,test,test from (select test||test||test||test||test||test||test||test||test||test test from ( select rownum ||'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' test from dual connect by level<= 100000))";

This produces about a gigabyte of data. ( NB: I was banging my head for quite a while before introducing the rownum – because without it the query is really fast but transfers only a fraction of 1GB over the network. I was starting to doubt the session stats … until I came across the not so well known feature of SQL*Net deduplication (see here)).

They first main factor determining the speed from jdbc is the fetch size – 10.000 was a good setting in this case. Took 20 seconds. So this is indeed like  50 MB/sec =400Mbit/sec, so network bandwidth can’t be the bottleneck here. First thought was network latency – this is a single session so what can be achieved here in terms of network throughput depends on things like tcp window size and scaling, maybe as well SDU sizes and so … but after looking a bit into it – no this is not the reason. Modern OS and networks seem to autotune this quite well – changing such settings over LANs hardly ever makes a significant positive difference.  What then ? The spoiler is of course already in the subject – we are using SQL*Net encryption. In detail – server side sqlnet.ora has

SQLNET.ENCRYPTION_TYPES_SERVER = (AES256)
SQLNET.ENCRYPTION_SERVER = required
SQLNET.CRYPTO_CHECKSUM_TYPES_SERVER = (MD5,SHA512)
SQLNET.CRYPTO_CHECKSUM_SERVER = required

It is needed in our environment, but for testing purposes, lets remove it, both the encryption and the checksum. Result – 2.5 sec, much better. Testing separately with removing encryption and checksum revealed that the more major contribution for the time was the encryption.  Ok, so it is related to the encryption, but where exactly ? In particular – encryption or decryption ? A first hint was given by the CPU usage on client and server. One client CPU was fully used, the server CPU by far not. Confirmed when trying the same query over a database link (i.e. using the Oracle database server itself as client) – this is very fast, despite encryption.

So the bottleneck appears to be the decryption by the JDBC driver. Lets look a bit more into this encryption matter. AES – modern encryption algorithm, supposed to be quite fast. What means fast ? After looking a bit around we found the following: AES-NI

So modern Intel processors have a specialised instruction set for AES, which promises to speed it up alot. One can test with openssl – with AES-NI and without. Below the results from the client on which we ran the jdbc driver

 

WITHOUT AES-NI
D:\> openssl speed -elapsed aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256 cbc for 3s on 16 size blocks: 14253991 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 64 size blocks: 4010416 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 256 size blocks: 1009130 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 1024 size blocks: 256351 aes-256 cbc's in 3.01s
Doing aes-256 cbc for 3s on 8192 size blocks: 32142 aes-256 cbc's in 3.01s
OpenSSL 1.0.2a 19 Mar 2015
built on: reproducible build, date unspecified
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: gcc -I. -I.. -I../include -D_WINDLL -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -DDSO_DLFCN -DHAVE_DLFCN_H -ggdb
-O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/home/corinna/src/openssl/openssl-1.0.2a/openssl-1.0.2a-1.x
86_64/build=/usr/src/debug/openssl-1.0.2a-1 -fdebug-prefix-map=/home/corinna/src/openssl/openssl-1.0.2a/openssl-1.0.2a-1
.x86_64/src/openssl-1.0.2a=/usr/src/debug/openssl-1.0.2a-1 -DTERMIOS -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_
BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAE
S_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256 cbc 75945.34k 85242.98k 85826.34k 87152.53k 87448.44k

WITH AES-NI

D:\> openssl speed -elapsed -evp aes-256-cbc
You have chosen to measure elapsed time instead of user CPU time.
Doing aes-256-cbc for 3s on 16 size blocks: 79900020 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 21354477 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 256 size blocks: 5424546 aes-256-cbc's in 3.01s
Doing aes-256-cbc for 3s on 1024 size blocks: 1358737 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 170131 aes-256-cbc's in 3.01s
OpenSSL 1.0.2a 19 Mar 2015
built on: reproducible build, date unspecified
options:bn(64,64) md2(int) rc4(16x,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: gcc -I. -I.. -I../include -D_WINDLL -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -DDSO_DLFCN -DHAVE_DLFCN_H -ggdb
-O2 -pipe -Wimplicit-function-declaration -fdebug-prefix-map=/home/corinna/src/openssl/openssl-1.0.2a/openssl-1.0.2a-1.x
86_64/build=/usr/src/debug/openssl-1.0.2a-1 -fdebug-prefix-map=/home/corinna/src/openssl/openssl-1.0.2a/openssl-1.0.2a-1
.x86_64/src/openssl-1.0.2a=/usr/src/debug/openssl-1.0.2a-1 -DTERMIOS -DL_ENDIAN -O3 -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_
BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAE
S_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-256-cbc 425991.44k 453747.19k 461356.74k 463627.69k 462873.85k

So 80 MB/sec without AES-NI, 460 with.

The theory would be therefore that the Oracle JDBC thin driver somehow does not use the AES-NI.  Does JAVA actually support it ? Indeed, it does, although quite badly documented. Few things to be taken into account however

  • Only since JAVA 7.something, better use JAVA 8 where you as well don’t need to set any more any JVM options to enable it
  • IMPORTANT: Only the server VM supports it, use the -server flag (its the default for 64bit JAVA). Use the 64bit version, the latest one.
  • For 256bit AES you in addition need to download and install the unlimited strength policy files

So JAVA should support it, so why not the Oracle (thin) JDBC driver ? Checking with Oracle support helped – indeed until very recently it doesn’t as it has its own AES implementation not relying on the underlying JAVA one. This can be changed in the newest Oracle 12.2 JDBC driver with the following connection property:

oracle.net.useJCEAPI = true (default is false)

Enabled all that, tested again – and voila ! The test case finished in about 9 sec (compared to 20 without AES-NI and 2.5 without encryption/checksum).

This is quite good but where does the remaining difference come from ? Partially the checksum – here the Oracle jdbc implementation seems to be worse than standard benchmarks, and that independently of whether it is MD5 or SHA512 – this deserves further investigation. About 3 sec are still needed for the decryption of 1 GB – 350 MB/sec and thereby at least in the same ballpark as the OpenSSL one.

Few closing remarks:

  • we tried as well the OCI driver. For some reason this shows much worse performance even without encryption. Same with SQLPlus – even when preventing all output with set autotrace = traceonly
  • The testcase is a bit specific in the sense that it has few, wide columns and no nulls. The customer case which started this was actually a bit different, the Gigabytes of data were spread over plenty of columns with many nulls – here another effect comes into play overlaying the encryption one, namely the time the driver needs to process a single cell (even if its null). A ballpark number we saw is 40 million cells per second – so this can play a role. More in a future article.
  • Of course one other way to speed this up would be to use multiple parallel sessions – but it requires to determine appropriate split keys.

 

 

Advertisements

One thought on “Oracle JDBC and AES-NI”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s