Monday, November 28, 2011

Implementing Oracle parallel shared server process in Java inside the Database

Behind the implementation of latest LDI open source project and the OLS products there is a functionality not well know by Oracle Java database developers, I called it Parallel Shared Server process.
The idea is to have an Oracle shared server process running during the  life-time of the instance, which means a process automatically started during database startup and stopped during database shutdown.
So which functionality this process can implement?, on LDI is an RMI server, on OLS is lightweight HTTP server, but basically you can implement anything you need for example getting information from another process and fill some table, getting statistical, consuming web services, etc. etc.
Let see in some example how it works.
We will create a TEST user and creates some Java classes running a simple Hello World RMI server.
SQL> conn / as sysdba
SQL> create user test identified by test
  2  default tablespace users
  3  temporary tablespace temp
  4  quota unlimited on users;
SQL> grant connect,resource,create any job to TEST;
SQL> exec dbms_java.grant_permission( 'TEST', 'SYS:java.net.SocketPermission', 'localhost:1024-', 'listen,resolve');
SQL> exec dbms_java.grant_permission( '
TEST', 'SYS:java.net.SocketPermission', 'localhost:1024-', 'accept, resolve');
SQL> exec dbms_java.grant_permission( '
TEST', 'SYS:java.net.SocketPermission', 'localhost:1024-', 'connect, resolve');
SQL> exec dbms_java.grant_permission( '
TEST', 'SYS:java.lang.RuntimePermission', 'setContextClassLoader', '' );
The RMI interface and server implementation running on TEST user.
SQL> conn test/test
SQL> create or replace and compile java source named "mytest.Hello" as
package mytest;
import java.rmi.Remote;
import java.rmi.RemoteException;
public interface Hello extends Remote {
    String sayHello() throws RemoteException;
    int nextCount() throws RemoteException;
}
/
SQL> create or replace and compile java source named "mytest.HelloImpl" as
package mytest;
import java.rmi.Naming;
import java.rmi.RemoteException;
import java.rmi.RMISecurityManager;
import java.rmi.registry.LocateRegistry;
import java.rmi.server.UnicastRemoteObject;
public class HelloImpl extends UnicastRemoteObject implements Hello {
    static int counter = 0;
   
    public HelloImpl() throws RemoteException {
        super();
    }
    public String sayHello() {
        return "Hello World!";
    }
    public static void main(String[] args) {
        // Create and install a security manager
        if (System.getSecurityManager() == null) {
            System.setSecurityManager(new RMISecurityManager());
        }
        try {
            HelloImpl obj = new HelloImpl();
            LocateRegistry.createRegistry(1099);
            // Bind this object instance to the name "HelloServer"
            Naming.rebind("//localhost/HelloServer", obj);
            System.out.println("HelloServer bound in registry");
        } catch (Exception e) {
            System.out.println("HelloImpl err: " + e.getMessage());
            e.printStackTrace();
        }
    }
    public synchronized int nextCount() {
        return counter++;
    }
}
/
SQL> create or replace procedure HelloServ(srvName IN VARCHAR2) as LANGUAGE JAVA NAME
        'mytest.HelloImpl.main(java.lang.String [])';
/
SQL> begin
  -- Start a Cron like process (DBMS_SCHEDULER)
  DBMS_SCHEDULER.CREATE_JOB(
   job_name          =>  'HelloServJob',
   job_type          =>  'PLSQL_BLOCK',
   job_action        =>  'begin
     HelloServ(''HelloServer'');
     exception when others then
        null;
     end;',
   start_date        =>  SYSDATE,
   enabled           => false,
   auto_drop         => false);
  DBMS_SCHEDULER.SET_ATTRIBUTE_NULL (
   name           =>   'HelloServJob',
   attribute      =>   'MAX_FAILURES');
end;
/
commit;
Now we can register two database instance trigger to automatically start and stop the job.
SQL> conn / as sysdba
SQL> CREATE OR REPLACE TRIGGER start_test_srv
  AFTER STARTUP ON DATABASE
BEGIN
  -- Start a Cron like process (DBMS_SCHEDULER)
  DBMS_SCHEDULER.ENABLE('TEST.HelloServJob');
END;
/
SQL> CREATE OR REPLACE TRIGGER stop_test_srv
  BEFORE SHUTDOWN ON DATABASE
BEGIN
  -- Start a Cron like process (DBMS_SCHEDULER)
  DBMS_SCHEDULER.STOP_JOB('TEST.HelloServJob',force=>true);
EXCEPTION WHEN OTHERS THEN
  null;
END;
/
If we process to do a shutdown/startup sequence the server will up and running, also we can start the server manually by executing:
SQL> conn / as sysdba
SQL> exec DBMS_SCHEDULER.ENABLE('TEST.HelloServJob');
SQL> commit;
after doing that we can see at $ORACLE_BASE/diag/rdbms/orcl/orcl/trace a .trc file associated with the parallel shared server process which is up and running:
-bash-4.2$ cat orcl_j000_10411.trc
Trace file /u01/app/oracle/diag/rdbms/orcl/orcl/trace/orcl_j000_10411.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11_2_0_2_0/dbhome_1
System name:    Linux
Node name:      localhost.localdomain
Release:        2.6.38.7-server-1mnb2
Version:        #1 SMP Sun May 22 22:59:25 UTC 2011
Machine:        i686
Instance name: orcl
Redo thread mounted by this instance: 1
Oracle process number: 25
Unix process pid: 10411, image: oracle@localhost.localdomain (J000)

*** 2011-11-28 18:05:41.091
*** SESSION ID:(151.35) 2011-11-28 18:05:41.091
*** CLIENT ID:() 2011-11-28 18:05:41.091
*** SERVICE NAME:(SYS$USERS) 2011-11-28 18:05:41.091
*** MODULE NAME:(DBMS_SCHEDULER) 2011-11-28 18:05:41.091
*** ACTION NAME:(HELLOSERVJOB) 2011-11-28 18:05:41.091

HelloServer bound in registry
and this process is listening into the default RMI port 1099, we can see that using:
-bash-4.2$ netstat -anp|grep ora_j0
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 :::19189                    :::*                        LISTEN      10411/ora_j000_orcl
tcp        0      0 :::1099                     :::*                        LISTEN      10411/ora_j000_orcl 
and that's all, we can connect using an RMI client from another client session, for example:
SQL> create or replace and compile java source named "mytest.HelloClient" as
package mytest;
import java.rmi.Naming;
import java.rmi.RemoteException;
public class HelloClient {
    Hello obj = null;
    public HelloClient() {
        try {
            obj = (Hello)Naming.lookup("//localhost/HelloServer");
        } catch (Exception e) {
            System.out.println("HelloApplet exception: " + e.getMessage());
            e.printStackTrace();
        }
    }
    public String sayHello() throws RemoteException {
        return obj.sayHello();
    }
    public int nextCount() throws RemoteException {
        return obj.nextCount();
    }
    public static void main(String[] args) throws RemoteException {
        HelloClient helloClient = new HelloClient();
        System.out.println(helloClient.sayHello());
        System.out.println(helloClient.nextCount());
    }
}
/
SQL> create or replace procedure HelloClient(srvName IN VARCHAR2) as LANGUAGE JAVA NAME
'mytest.HelloClient.main(java.lang.String [])';
/
SQL> set define ?
SQL> set serverout on
SQL> exec dbms_java.set_output(32000);
SQL> exec HelloClient('HelloServer');
0
SQL> exec HelloClient('HelloServer');
1
Note that the server is state-full which means that preserve the state across calls, if we exit from above SQLPlus and connect again we will see that:

SQL> set define ?
SQL> set serverout on
SQL> exec dbms_java.set_output(32000);
SQL> exec HelloClient('HelloServer');
2
SQL> exec HelloClient('HelloServer');
3

Concluding this post I would like to remark that this parallel shared server process is running into RDBMS space and is not like starting an RMI server into the middle tier. the big difference is that all SQL access is implemented accessing directly to the RDBMS structures into the SGA because is using the internal JDBC driver.

Friday, November 25, 2011

New release of Lucene Domain Index based on 3.4.0 code

This new release of Lucene Domain Index (LDI) has been in the new SVN for a long time, but due a lot of works with the commercial version Scotas never went public in binary installation :(
Several thing happen during this time:

  • New web site (thanks a lot to Peter Wehner for the conversion from the Google docs)
  • New hosting at SF.net (now is separate SVN from the previous one CVS at DBPrism)

The change log of this version is:


  • Use latest merge policy implementation TieredMergePolicy
  • Use total RAM reported by getJavaPoolSize() when setting MaxBufferedDocs
  • Better error reporting when an Analyzer is not found.
  • Replaced execute immediate with open-fech-close functionality to avoid core dump on 10g when double check for deleted rowid
  • Included a back-port version of JUnit4 to jdk1.4 version for 10g releases
  • Added a parallel updater process, when working in OnLine mode this process do write operations on LDI structure on behalf of the AQ process
  • Delete do not longer required a write exclusive lock on index storage, now deletes are also en-queued as inserts or updates
  • Updated source to Lucene 3.4.0 code, removed some deprecated API

Download latest binary distribution at 3.4.0 directory of SF.net download area (tested with 10g/11gR2).
The addition of a new parallel shared server process is the major change which speed up a lot DML operations, I'll write in a new post on how this parallel shared server technique works.
Please report any issue during the installation or bugs at the Support Area of the project.

Monday, September 26, 2011

Long time no post


I found that my last post was on Sep. 14 2010, that's too bad.
The reason of that is the startup called Scotas I worked on that almost since December of 2010 to engineering a next level of Oracle and Lucene project integration, this is by adding the Solr stack.
Its a natural evolution, Solr provides a lot of common functionality working on top of  Lucene required for the enterprise level solution.
But the integration required a set of implementation challenges to work, once of them is the implementation of long live shared server process running in Java and which implements a new dedicated HTTP listener working as an Oracle slave process, I will explain this topic on another technical post.
Another additions included in a set of products are the push technology spread on the integration with Solr running externally, ElasticSearch, Cassandra and HBase.
Behind this technology we extended the idea of Oracle/Solr integration, the Oracle ODCI API,. which enabled a NRT (Near Real Time) synchronization of the enterprise data with NoSQL layer for example.
Near Real Time means that once you commit the changes on the RDBMS layer thy are propagated automatically to the other side practically with 0 delay, and for the two way replicator such as Solr PC and ElasticSearch deletions are in real-time avoiding the false positive hits of deleted rows.
Near Real Time synchronization is declarative, no programming effort is required and allows multiples columns of one table, or multiples columns of many tables connected by a foreign key.
Well next week is the OOW 2011, I'll Be there and I Hope You Will, Too and for anybody that are coming to the event and are interested on that technology we could meet there, just drop me an email to mochoa-at-scotas.com

Tuesday, September 14, 2010

New Oracle Lucene Domain Index release based on Lucene 3.0.2

Just a few words to announce a new release of Oracle Lucene Domain Index, this zip is valid for 10g and 11g database version (10g using back-ported classes from 1.5 to 1.4)
This release is compiled using Lucene 3.0.2 version and incorporates a set of new features added, here the list:


  • Added a long awaited functionality, a parallel/shared/slave search process used during a start-fetch-close and CountHits function
  • Added lfreqterms ancillary operator returning the freq terms array of rows visited
  • Added lsimilarity ancillary operator returning a computed Levenshtein distance of the row visited
  • Added a ldidyoumean pipeline table function using DidYouMean.indexDictionary storage
  • Added test using SQLUnit

The bigger addition is the Parallel-Shared-Slave search process, this architectural change was in my to-do list for a long time and finally I added in this release :)
The idea behind this is to have a new Oracle process started by the DBMS_SCHEDULER sub-system during the database startup process and stopped immediately before shutdown.
Now this process is responsible for implementing the ODCI methods start-fetch-close/count-hit on behalf of the client process (process associated to an specific user session) which connect to the shared-slave process by using RMI.
With this new architecture we have two principal benefits:

  • Reduce memory consumption
  • Increase Lucene Cache Hits

Less memory consumption because the internal OJVM implementation is attached to a client session, so the Java space used by Lucene structures is isolated and independent from another concurrent session, now all Lucene memory structures used during index scan process are created in a shared process and then not replicated.
Also if one session submits a Lucene search, this search is cached for subsequent queries, all subsequent queries coming from the same client session or any other which are associated to the same index and with the same Query string implies a hit.
I'll explain more in detail this new architecture in another post also showing how many parallel process can work together when using Parallel Indexing and Searching.
On the other hand next week I'll be at the Oracle OpenWorld 2010 in SFO presenting the session:

Schedule: Tuesday: 09:30AM
Session ID: S315660
Title: Database Applications Lifecycle Management
Event: JavaOne and Oracle Develop
Stream(s): ORACLE DEVELOP
Track(s): Database Development
Abstract: Complex applications, such as Java running inside the database, require an application lifecycle management to develop and delivery good code. This session will cover some best practices, tools, and experience managing and delivering code for running inside the database, including tools for debugging, automatic test, packaging, deployment, and release management. Some of the tools presented will include Apache Maven, JUnit, log4j, Oracle JDeveloper, and others integrated into the Oracle Java Virtual Machine (JVM) development environment.

See you there or at any of networking planned events :)

Tuesday, June 29, 2010

Near-shore development in Tandil-Argentina

This post is bit off topic about Oracle, but not at all.
I remember read by a first time the term near-shore development during OOW08 in a Information Week newsletter free at OTN Lounge.
Two years ago I see the increasing on development using this model in my City Tandil, 400Km far from Buenos Aires in Argentina.
Specially, I am working as external consultant with the company Temperies which was founded and growing in Tandil unlike other companies which are part of the Technology Park and came from Buenos Aires.
But why this market still growing in Tandil with well know economy changes?
First Why Tandil?

  • Is a very nice small city (130000 habitants) surrounded by hills where the quality of life makes the difference in term of productivity of the development team, this is one of the bigger difference against other Argentine cities such as Buenos Aires, Cordoba or Rosario. Any worker can get his workplace in less than five minutes walking.
  • Has a recognized University in Computer Science.
  • Is cheap to live here, so salaries are not so high.

Second Why Argentina?
  - Is a country with a big difference in term of US$/Euro exchange rate compared with another countries like Chile for example, and believe me when an near-shore project is evaluated this point is very important because with the same money you can do the project several times :)
  - Most of the computer science graduated and under-graduated students have very good knowledge of English.
  - The time zone of Argentine is in middle of Europe and US West coast, so projects can be perfectly managed around the team synchronization, if we work with San Francisco, we work during our afternoon, if we work with Germany, we work during our morning, and thats all.
Finally I want to remark that there are plenty of offers in term of Software Factories in Tandil, from small companies with 10 to 20 employees to big factories with more than one hundred employees to fit with every need.
As I set at the beginning of this post I am working with Temperies a mid-range Software Factory which, in addition to the list of goodness exposed previously, have an strong commitment with Agile practices, Oracle and a widely experience with success stories in near-shoring projects around the world, and obviously the CIOs are good friends of mine :)
Well if anybody want to go deeper in this area just drop me an email to marcelo.ochoa gmail.com or see you at the Oracle Open World in September.

Tuesday, June 8, 2010

Dealing with JDK1.5 libraries on Oracle 10g

Modern libraries are compiled with JDK 1.5 and the question is How to deal with these libraries on an Oracle 10g OJVM.
Some examples are Lucene 3.x branch or Hadoop. The solution that I tested is using a Java Retro Translator and some complementary libraries.
I have tested this solution in Lucene Domain Index 3.x branch with success.
As you can see on the CVS there is build.xml file which performs all the retro translator steps. Here an step by step explanation of the process:

  1. Load all required libraries provided by Retro translator project which implements features not available on JDK 1.4/1.3 runtime, this is done on the target load-retrotranslator-sys-code.  This target loads many libraries on SYS schema due are immutable, or with low probability of change. It will change if we upgrade a retro-translator version. All libraries are then compiled to assembler using NCOMP utility, target ncomp-runtime-retrotranslator-sys-code.
  2. Then we can convert libraries compiled with JDK1.5, in this build.xml file the Lucene and Lucene Domain Index implementation, to a JDK1.4 target runtime. This is done on the targets backport-code-lucene and backport-code-odi, on first target We converts all Lucene libraries excluding JUnit and Test code, these libraries require as a dependency JUnit and retro-translator jars. Second target converts Lucene Domain Index jar depending on Lucene core and Oracle's libs. The back-port operation generates a file named lucene-odi-all-${version}.jar with Lucene and Lucene Domain Index code ready to run on JDK1.4 runtime.
  3. Once We have the code back-ported to a JDK1.4 runtime We can upload and NCOMP into Oracle 10g, this is done on targets load-lucene-odi-backported-code and ncomp-lucene-all.
And that's all!!, the code works fine on my Oracle 10.2 database - Linux :), finally users of 11g and 10g databases can deploy Lucene Domain Index implementation using one distribution file.

Saturday, May 8, 2010

Auto complete functionality with latest Lucene Domain Index

A few days ago I uploaded two new releases of Oracle Lucene Domain Index, once based on Lucene 2.9.2 core base (10g, 11g) and another based on 3.0.1 release (10g/11g).
The question is why 3.0.1 release only have one installation file?
This is because the code base of Lucene 3.x branch is only compatible with JDK1.5 so to get Lucene 3.x release working on 10g databases which is based on JDK1.4 I included a retro-translator, this library gets code compiled in 1.5 format and converts it to 1.4, I'll explain more in details this process in another post.
The important point is, I want to still supporting Lucene Domain Index for Oracle 10g because the installed base of this release is big even with the end of official support next July.
On the other hand this new release includes another great contribution from Pedro Pinheiro, an auto-complete pipeline table function. This reinforce the goal of Lucene Domain Index that with a few new classes and some PLSQL wrapper you can extend LDI to your need.
Here a simple example:
I am creating and populating a simple table with a english-spanish dictionary lookup:
create table dicc (
   term varchar(256),
   def  varchar2(4000))
/
-- Populate dictionary with 10K terms and definitions
@@dicc-values
then a Lucene Domain Index for auto-complete functionality:
create index dicc_lidx on dicc(term) indextype is lucene.luceneindex
parameters('ExtraCols:def;LogLevel:INFO;FormatCols:TERM(ANALYZED_WITH_POSITIONS_OFFSETS);PerFieldAnalyzer:TERM(org.apache.lucene.analysis.WhitespaceAnalyzer),DEF(org.apache.lucene.analysis.StopAnalyzer)');
Note that TERM column is analyzed storing term positions offset.
With this index created we can query using auto complete pipeline table function as follow:
SQL> select * from table(lautocomplete('DICC_LIDX','TERM','th',15)) t;

TERM    DOCFREQ
there       3
theory     2
thaw       2
then        2
therefore 2
thence     1
their        1
thanks     1
theft        1
theatrical 1
the          1
theme     1
that         1
thermal   1
thank      1
15 rows selected.
Elapsed: 00:00:00.01
First argument of this function is your index name, second argument is the column used for auto complete, third argument is the string used for lookup and last argument is how many terms are returned. By default rows are returned order by docFreq descending. Here other example:
SQL> select * from table(lautocomplete('DICC_LIDX','TERM','spor',10)) t;
TERM      DOCFREQ
sport         3
sportsman 1
sporadic   1
Elapsed: 00:00:00.02
For the example table which includes 10102 rows the execution time of above examples is around 21ms, not bad for a notebook.
Another new feature of this release is parallel index on RAM, which is enable by default with this release, indexing on RAM means that when you are working in OnLine mode a batch of new rows to be added to the index are processed in parallel (ParallelDegree parameter) more information is on Lucene Domain Index documentation on-line, if you have a server with multi-core processor or a RAC installation with sufficient RAM this feature speed up your indexing time by 40% eliminating the BLOB access during a partial index creation.
Well next post will be about how to deal with Libraries compiled using JDK1.5 on Oracle 10g Databases. Stay tunned...