For quite a while now we've had issues with one of our APEX installations. We have two completely parallel systems that independently process the same data; the business can access either one and get the same results (no RAC or clustering, just two separate systems processing the same inputs).
Both run the same versions of APEX, Oracle and the OS, yet the login page on one is massively slower than on the other. Why?
Both use the Embedded PL/SQL Gateway (EPG) to serve the applications (I always find it the simplest setup), so why is one so much slower than the other? A quick comparison of the database parameters found no real differences, so the cause seemed to lie elsewhere. (On other systems we had seen problems where shared_servers was set too low, but that wasn't the issue here.)
We needed to narrow down the issue, and the infrastructure is hugely complex, with large numbers of routers, firewalls and proxies between the client and the application.
The simplest way to start narrowing it down was to take the network 'stuff' out of the picture and retrieve the APEX page directly on the Linux server. So how do you do that?
Well, wget has been around for a while and is perfectly suited to the job, so we set up a very simple wget fetch to retrieve the login page. If it was quick locally, the problem would have to lie somewhere else in the infrastructure.
So let's retrieve the page on both servers by running the following command:
wget http://localhost:7777/apex/apex
--22:54:18-- http://localhost:7777/apex/apex
=> `apex'
Resolving localhost... 127.0.0.1, ::1
Connecting to localhost|127.0.0.1|:7777... connected.
HTTP request sent, awaiting response... 302 Found
Location: f?p=4550:1:3358419253871 [following]
--22:54:18-- http://localhost:7777/apex/f?p=4550:1:3358419253871
=> `f?p=4550:1:3358419253871'
Connecting to localhost|127.0.0.1|:7777... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
[ <=> ] 9,935 --.--K/s
22:55:18 (165.64 B/s) - `f?p=4550:1:3358419253871' saved [9935]
This worked on both servers, but it took 60 seconds in both cases (far longer than accessing either URL from a browser). That muddied the waters even more, as it now looked like a problem local to the boxes themselves.
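Rather than reading the delay off wget's timestamps, the fetch can be wrapped in a small timing helper and run on each server. This is a minimal sketch assuming a POSIX shell with wget and date +%s available; the function name time_fetch is hypothetical, and the URL in the example is the EPG one used above.

```shell
#!/bin/sh
# time_fetch URL [extra wget options...]
# Hypothetical helper: fetches URL quietly with wget, discards the body,
# prints the elapsed time in whole seconds and returns wget's exit status.
time_fetch() {
  url=$1
  shift
  start=$(date +%s)                      # whole-second resolution only
  wget -q -O /dev/null "$@" "$url"       # any extra wget options go before the URL
  status=$?
  end=$(date +%s)
  echo "$((end - start))"
  return "$status"
}

# Example, run locally on each server:
#   time_fetch http://localhost:7777/apex/apex
```

Because the elapsed time is printed on its own, the same call can be compared across both boxes, or repeated with different wget options passed after the URL.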
However, after a lot of trial and error (and I mean a lot: I tried almost every switch that exists in wget), I came across the solution.
With the --ignore-length switch added, the page comes back straight away on both boxes:
wget --ignore-length http://127.0.0.1:7777/apex/apex
--22:58:23-- http://127.0.0.1:7777/apex/apex
=> `apex'
Connecting to 127.0.0.1:7777... connected.
HTTP request sent, awaiting response... 302 Found
Location: f?p=4550:1:22032694640760 [following]
--22:58:23-- http://127.0.0.1:7777/apex/f?p=4550:1:22032694640760
=> `f?p=4550:1:22032694640760'
Connecting to 127.0.0.1:7777... connected.
HTTP request sent, awaiting response... 200 OK
Length: ignored [text/html]
[ <=> ] 9,945 --.--K/s
22:58:23 (474.21 MB/s) - `f?p=4550:1:22032694640760' saved [9945]
The problem seems to be that the reported length and the actual length of the HTML differ (I don't know why this would be), so wget sits there for 60 seconds waiting for extra data that never arrives (as there is no more). It then gives up and completes the call.
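To test the length theory directly, the response headers and the body can be saved separately and compared. This is a minimal sketch assuming a POSIX shell and the header log that wget -S (--server-response) writes to stderr; the function name check_length and the file names headers.txt and body.html are hypothetical. It takes the Content-Length of the last response in the log (so a redirect's header is not mistaken for the page's) and compares it with the number of bytes actually saved. Note that the first run above printed "Length: unspecified" for the final page, which, as far as I can tell, is how wget reports a response with no Content-Length header at all, so the sketch reports that case separately.

```shell
#!/bin/sh
# check_length HEADERS_FILE BODY_FILE
# Hypothetical helper: compares the Content-Length of the LAST response in a
# wget -S header log with the byte size of the saved body.
check_length() {
  # Strip CRs, reset on every "HTTP/x.y" status line, remember the last length seen.
  declared=$(tr -d '\r' < "$1" | awk '
    $1 ~ /^HTTP\// { len = "" }
    tolower($1) == "content-length:" { len = $2 }
    END { print len }')
  actual=$(wc -c < "$2" | tr -d ' ')
  if [ -z "$declared" ]; then
    echo "NO-LENGTH actual=$actual"
  elif [ "$declared" -eq "$actual" ]; then
    echo "OK declared=$declared actual=$actual"
  else
    echo "MISMATCH declared=$declared actual=$actual"
  fi
}

# Example (without --ignore-length this fetch will still take the full 60 seconds):
#   wget -S -O body.html http://localhost:7777/apex/apex 2> headers.txt
#   check_length headers.txt body.html
```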
So, with the extra switch, we know the local box/Oracle/APEX setup is fine and the problem lies somewhere else in the system.
The network/firewall team are now looking into things, and a tcpdump of one of the interfaces seems to show some 'interesting' results, so hopefully we are now on the path to fixing it.