Select Page

How to Increase the Maximum Connections on Apache Tomcat

The maximum number of connections that Apache Tomcat can handle is defined within the Tomcat settings. In essence, this setting determines the number of concurrent connections that can be handled by Apache Tomcat.

Out of the box, Apache Tomcat is configured to handle around 200 simultaneous connections (kind of…), which for most web servers and projects is more than adequate. But when you get into the realms of millions of monthly active users and concurrent connections, you’ll soon need to optimise Tomcat to suit your needs accordingly.

Ultimately though, the number of simultaneous connections that the Apache Tomcat software on your web server is capable is handling is determined by the physical limitations of your web server. The bigger your hardware, the bigger the number of connections that Apache Tomcat is capable of handling. This being said, the 200 maximum connections as a default is merely an arbitrary number, so you’ll need to assess this in relation to the hardware you are running on. The bigger the server hardware you’re running on, the higher the number of maximum connections on Apache Tomcat your server will be able to handle. As always, this can change depending on what you are doing. For example, if you have a process on your really fast server that is calling a third party dog slow server, then this soon becomes the limiting factor in the chain.

Max connections from an Apache Tomcat perspective is calculated based on a number of factors such as the number of simultaneous connections multiplied by the maximum execution time that each thread is capable of handling. Under normal circumstances this can appear to be behaving correctly, so keep in mind exceptional circumstances when things don’t quite perform as you wish. What you’ll soon spot (via the Tomcat Manager) is that the Max Connections soon starts to creep up and eat up all the resources on your server. This can be caused by many factors, from long running processes on your own server, or code on that is calling third party servers which may be running slow and causing problems.

With all this in mind, let’s take a look at how to increase the maximum connections on Apache Tomcat. It’s actually quite simple to do.

Run the command;

 


sudo nano /usr/share/tomcat8/conf/server.xml

 

This command will open the editor to edit the file Server.xml which controls many of the settings required for configuring Apache Tomcat. You’ll note that it is Apache Tomcat 8 that is being used in this example, so edit this accordingly for the version of Apache Tomcat that you are using. Again, depending on your individual configuration of your server, this file may live in a different file path. So to easily find the file if the above command doesn’t work, then run the command below;

 


find / -type f -name server.xml

 

The above command will search for a File (f) called “server.xml”

What you’re looking for within this file is the setting for “maxConnections” which will have a value set. You may not have this variable set so look out for the core settings for the port that your Tomcat instance is listening on, for example;

 


<Connector port=”8080” protocol=”HTTP/1.1” maxConnections=”200”/>

 

The maximum number of connections that the server will accept and process at any given time. When this number has been reached, the server will accept, but not process, one further connection. This additional connection be blocked until the number of connections being processed falls below maxConnections at which point the server will start accepting and processing new connections again.

Depending on your specific requirements, you may also need to increase the maxThreads setting too. Hope that helps getting you started with tuning Apache Tomcat for your specific needs.

How to Make Apache HTTPD Service Start Automatically on Server Reboot and Startup

If you’re looking for a quick answer and you’re running a common version of Linux, just type this when you are SSH’d into your server;

 


sudo chkconfig httpd on

 

Now for a bit more technical details about what this all does.

 

What is chkconfig?

The Chkconfig command in Linux is used to setup, view, or change services that are configured to start automatically during the system startup which abstracts some of the underlying settings that are stored within the /etc/rc.d/init.d/ directory. Chkconfig basically helps you to easily make service start or not when your system reboots.

Some handy chkconfig commands that you’ll find handy;

  • chkconfig –list (dash dash)
    • This will list the details of all services that are running on startup
  • chkconfig –list httpd (dash dash)
    • This allows you to list specifically the details of the service you care about, to avoid you having to trawl through the entire list. Simply change ‘httpd’ to whatever service name you are interested in looking at.

When you start to look through the outputs from the above commands, it may be a little confusing seeing 7 levels, some with a status of on and some with a status of off, but what do they all mean? The different levels are what is known as Linux Runlevels. Generally speaking, you probably don’t need to worry about those at all. But if you do need some fine grain control over what happens and when, when that may be covered in a future blog post.

MySQL Fulltext Search Performance With and Without Table Indexes

I thought it was handy to do a bit of performance testing with MySQL Fulltext Search to quantify how much benefit a MySQL Index has while querying large data sets using MySQL Fulltext Search. The reason this came about was due to the performance of SQL queries running against the dataset powering the search engine behind https://www.tendojobs.com/. While there is a lot of complex technology behind the scenes powering the search functionality of Tendo Jobs, one piece of this technology mix is the MySQL Fulltext Search functionality. It was noticed that for certain queries the time to run the queries was sub-optimal. Hence, I decided to do a bit of performance testing on this.

MySQL Fulltext Indexes are essential when querying large datasets, although MySQL Fulltext Search doesn’t work on all data types, which is rather annoying. According to the official MySQL documentation on MySQL Fulltext Search, https://dev.mysql.com/doc/refman/5.7/en/fulltext-search.html, the only options for using MySQL Fulltext Search are on columns with the datatype of Char, Varchar, or Text. Which is kind of limiting as datatypes such as Blob can be extremely valuable for storing larger data sets, but hey. This the limit, so for MySQL Fulltext Search, that is what you’ve got to work with. So, onto the performance side.

I started with the dataset of searches run by users of Tendo Jobs to gather some real search data. From this I tested the performance of the queries when using both Indexes and No-Indexes. What was clear on both approaches was that the first query always took significantly longer than the subsequent queries for the same search term, so to avoid any potential discrepancies, this is highlighted below.

 

Time to Process Query with MySQL Fulltext Search No-Index

This is when there is no MySQL Fulltext Search Index on the relevant column;

 

 

 

Time to Process Query with MySQL Fulltext Search With-Index

This is when there a MySQL Fulltext Search Index on the relevant columns;

 

 

Comparison for Performance in Seconds

MySQL Fulltext Search with an Index clearly performs significantly better. Between 78% – 84% improvement by using an Index when querying datasets using MySQL Fulltext Search!

 

 

Summary

Just use Indexes, they are awesome. But it’s great to quantify this in terms of performance improvement on larger data sets, along with the limitations of MySQL Fulltext Search. As a final note, MySQL Fulltext Search is not designed to be a search engine, so bear this in mind. It is good up-until a point, so always keep an eye on performance when querying large datasets.

Netbeans with Apache Tomcat Throwing a Port Already In Use Error and How to Kill a Process on Windows

When developing through Netbeans in Java (my personal favourite) you may come across the annoying error now and again which is telling you that Tomcat cannot start due to the port being in use already. Often this is Port 8005 but it could be something else depending on how you have things configured. Either way, this ‘feature’ is rather annoying and requires a couple of steps to kill this off and get you back on track. Unfortunately the traditional IT solution doesn’t work in this scenario, turning Netbeans off and back on again, the issue persists.

This issue is caused when you are trying to run multiple applications simultaneously and something has got in a bit of a mess in the background.

Thankfully, the solution is rather simple when you know where to look on Windows, there are a few steps involved.

 

Find the PID (Process ID) that is hogging the Port 8005 that you need

To do this, either search for “Resource Monitor” in the Windows start menu, or, type at the command prompt, resmon.exe. Either way will open the resource monitor as you can see in the screenshot below. Look through the list of Ports and find the port that is being blocked, then look across to the PID column to find the Process ID, you’ll need this in the next step.

 

 

For those of you more familiar with the command line, just run the following command;

 

Netstat –a –o –n

 

Netstat is a command-line tool for displaying network statistics including displaying connections using the TCP/IP protocol, i.e. your application.

The –a flag displays all connections and listening ports

The –o flag displays the owning process ID associated with each connection

The –n flag displays addresses and port numbers in numerical form

Simple.

 

Next Force Kill the Process

To do this, you need to run Command Prompt as an Administrator. This will not work as a standard Windows user, you need to be an Administrator. Simply type;

 

Taskkill /F /PID 12345 (replace the process ID that you found earlier)

 

Done. You’re good to go again.

How to Use SQL_CALC_FOUND_ROWS and FOUND_ROWS() With LIMIT and OFFSET in a MySQL Query Using Java and JDBC

If you’ve come across this blog post, you’ve likely experienced the requirement that often crops up when paginating results while querying a MySQL database via Java. As a user, the user generally wants to see the first 10 results on one query, i.e. page 1, then the second 10 results on the next query, i.e. page 2, and so on. But what happens in the situation when the part of the application that displays information to the user wants to know the total number of results from the entire results set that without pagination? For example, to create a visual indication of some sort for the total number of results that are available to look through which could be used to display the total number of pages.

Traditionally you would create two completely separate queries against the database, one which includes the LIMIT and OFFSET parameters in the query and another that does not include these parameters. That’s two database connections, running two independent queries such as;

 

-- Get paginated results for query
SELECT * FROM table_name LIMIT 10 OFFSET 30;

 

Then running;

 

-- Get total number of results for query, excluding pagination
SELECT COUNT(*) AS NumberOfResultsFound FROM table_name;

 

And you know what, this is a perfectly good approach to take in a lot of scenarios and especially for simple – medium complexity level MySQL queries. Although where this approach soon falls down is when the SQL queries grow and grow, either due to database structure complexities through many table joins or over time as requirements expand. What you’ll soon find yourself doing is replicating the same complex (and time consuming to create/manage at the coding level…) SQL queries that need updating in two places and keeping in sync with zero discrepancies. And this isn’t ideal, lots of duplicated effort and open to errors with the two queries becoming out of sync.

Thankfully, there is a solution to this within MySQL itself, yet if you’ve come across this blog post you’ve probably realised after much searching around that this isn’t particularly well documented either at the MySQL level or at the Java and JDBC level. Hence the reason for writing this up, partially for others, but mainly so I also don’t forget how to do this in the future…

This is where SQL_CALC_FOUND_ROWS and FOUND_ROWS() parts to the queries come in handy.

For those of you reading this as a traditional database administration type person, you’ll likely be rather familiar with MySQL Workbench for administrating a MySQL database. So if you want to do this within MySQL Workbench, you can simply run the two commands sequentially;

 

-- Option 1
SELECT * FROM table_name; 
SELECT FOUND_ROWS() AS NumberOfRowsFound;
-- OR
-- Option 2
SELECT SQL_CALC_FOUND_ROWS * FROM table_name; -- Yes, no comma, that’s correct
SELECT FOUND_ROWS() AS NumberOfRowsFound;

 

Then this will produce the results you desire;

— Option 1;

150 Rows

— Option 2;

150 Rows

 

Yes, that’s the same information. Great. But you’ll notice that we haven’t added in the pagination aspects to the SQL query yet via the LIMIT and OFFSET query parameters. So let’s take a look at what happens when we do that;

 

-- Option 1
SELECT * FROM table_name LIMIT 10 OFFSET 30; 
SELECT FOUND_ROWS() AS NumberOfRowsFound;
-- OR
-- Option 2
SELECT SQL_CALC_FOUND_ROWS * FROM table_name LIMIT 10 OFFSET 30; -- Yes, no comma, that’s correct
SELECT FOUND_ROWS() AS NumberOfRowsFound;

 

Then this will produce the results you desire;

— Option 1;

40 Rows

— Option 2;

150 Rows

 

What you’ll notice here is the clear difference in the way MySQL handles the data when you add in the SQL_CALC_FOUND_ROWS into the second set of queries that is run. On the first query, when the SQL_CALC_FOUND_ROWS part is not present in the query, the NumberOfRowsFound is the total number of results that takes into account the LIMIT and OFFSET parameters, resulting in 40 rows, i.e. 10 + 30 = 40. Whereas the second query which includes the SQL_CALC_FOUND_ROWS as part of the query, then this completely ignores the LIMIT and OFFSET parameters, resulting in the desired behaviour for calculating the total number of rows within a MySQL query while ignoring the LIMIT and OFFSET parameters within the query. This is nice as this avoids having to run two duplicate queries as mentioned earlier.

But we’ve just done all of the above within MySQL Workbench which is designed specifically to manage MySQL sessions as needed with ease. Now try taking the same approach within your preferred Integrated Development Environment (IDE) via the SQL editor that is in there and you’ll soon see that this no longer works. Why? Well, quite simply IDEs aren’t dedicated MySQL environments, so they have likely cut corners when it comes to implementing the entire functionalities for MySQL within your preferred IDE. If yours works, great, leave a comment letting others know what you use, I’m sure others reading this would also be interested to know what you are using.

Then we move onto the Java level. Taking the traditional approach for a database connection which roughly follows the logic;

 

  • Create JDBC Connection using MySQL Driver
  • Create SQLQuery
  • Create PreparedStatement object based on the SQLQuery
  • Add in the relevant information to the PreparedStatment, replacing the ?s with the actual data
  • Execute the PreparedStatement
  • Read the ResultsSet and do what you need to do
  • Close the ResultsSet
  • Close the JDBC Connection

 

So taking the initial logic from earlier. We first need to run one query, then run a second query that returns the total number of results that doesn’t take into account pagination aspects of the query. Yet if we took the simple approach with Java, which is to run steps 1 – 8 above twice, then you’ll soon notice that the second query returns 0 for the NumberOfFoundRows on the second query, which is not the correct behaviour we are looking for. The reason behind this is because you are running the query as two distinct JDBC Connections, hence, the second query that is run is a new connection to the MySQL database and hence has no reference to what was run on the previous query.

Makes sense? No? Don’t worry. To test this yourself, give it a go. Create 2x pieces of code that replicates the pseudo code for steps 1 – 8 above, with the first query being the SELECT * FROM table_name LIMIT 10 OFFSET 30; and the second query being SELECT FOUND_ROWS(); and you’ll see that the second database query returns 0, which is clearly incorrect.

The reason for this is due to how MySQL handles sessions. And this is where this gets a little bit unclear in the official MySQL documentation, so if anyone has any specific details on this, again, please comment. Based on my own testing, it appears that MySQL has some form of session management, whereby a session is managed when a connection happens to the database. Meaning that we can take advantage of that at the Java level to utilise this.

So instead of steps 1 – 8 above, we take a slightly different approach to exploit MySQL and the SQL_CALC_FOUND_ROWS and FOUND_ROWS() functionality. In a nutshell, we do this by opening a connection, running two SELECT queries, then closing the connection. This allows us to achieve the desired result that we need.

 

  • Create JDBC Connection using MySQL Driver (aka. MySQL Session)
  • Create SQLQuery1 – SELECT SQL_CALC_FOUND_ROWS * FROM table_name LIMIT 10 OFFSET 30;
  • Create PreparedStatement1 object based on the SQLQuery1
  • Add in the relevant information to the PreparedStatment1, replacing the ?s with the actual data
  • Execute the PreparedStatement1
  • Read the ResultsSet1 and do what you need to do
  • Close the ResultsSet1
  • //then do the same again
  • Create SQLQuery2 – SELECT FOUND_ROWS() AS NumberOfRowsFound;
  • Create PreparedStatement2 object based on the SQLQuery2
  • Add in the relevant information to the PreparedStatment2, replacing the ?s with the actual data
  • Execute the PreparedStatement2
  • Read the ResultsSet2 and do what you need to do
  • Close the ResultsSet2
  • Close the JDBC Connection (aka. MySQL Session)

 

What you’ll notice when you take this approach in your Java code is that your database queries to achieve this will return exactly what you are looking for. i.e.;

 

  • Rows X – Y (based on pagination controlled by LIMIT and OFFSET MySQL parameters)
  • NumberOfRowsFound (Total number of rows, ignoring the LIMIT and OFFSET MySQL parameters)

 

Pretty neat really and this can save a hell of a lot of time when managing SQL queries at the Java and JDBC level when dealing with paginated data.

Still confused? I’m not surprised. Re-read again about 5x times and do some testing at the MySQL (via MySQL Workbench and via your preferred Java IDE) and Java levels. Still confused? Leave a comment J This is so poorly documented on the web, the above is simply from what I have found through extensive testing based on extremely minimal information. Hope this helps J

For completeness, here’s the not so useful official MySQL information on the FOUND_ROWS() option, https://dev.mysql.com/doc/refman/5.7/en/information-functions.html#function_found-rows.

And finally. It is not 100% clear how MySQL manages sessions at the moment looking at the official documentation. I’ll update this blog post as I find more information on the topic. i.e. What happens when multiple users do the same thing that is overlapping;

  • User 1 – SELECT * FROM table_name LIMIT 10 OFFSET 20;
  • User 2 – SELECT * FROM table_name LIMIT 2 OFFSET 30;
  • User 1 – SELECT FOUND_ROWS(); — Does this bring back #1 or #2?
  • User 2 – SELECT FOUND_ROWS();– Does this bring back #1 or #2?

For my tests, I have replicated the above scenario by adding in an artificial delay in between the two queries run by User 1, so I could then run the first query against a different table to produce a different number of results. What I found when running this test, is that MySQL is indeed rather smart in this area and manages the data correctly through some form of session management. This means that even when 1 – 4 above are run in this order, the second query for User 1, returns the number of FOUND_ROWS() from the first query for User 1, not the first query for User 2, which is the correct behaviour.

Hope this is of use to others who also come across this challenge.

How to Repeat A Cell Value Multiple Times In New Cell In Excel

Ok, this is one that had been bugging me for a while and I’ve finally got around to spending some time to figure out how to do this. Quite boringly, this is actually for creating a WordCloud from Google Analytics data, but hey. It has nice visuals and is a great way of visualising data. Firstly, hats off to people like Jason Davies for creating awesome tools like this for people to use. What a lot of the better WordCloud tools available online allow you to do is to create WordClouds with phrases, not simply just single words. Single word WordClouds can be handy, but they can often lose the context. So personally I was looking for something that could keep things grouped by phrases. Specifically I was looking to turn Google Analytics data, which is in the format of Data | Number of Times, into a WordCloud. On smaller data sets, this is relatively straight forward as you can simply manipulate the data in Excel and copy and paste data a number of times to get the required result. When data sets get large though, this solution soon becomes difficult. And here comes more Excel magic that allows you to manipulate Excel data into the format you need, in this case….

Data from;

  • Name | Number of Times (i.e. Web Developer Job in Manchester | 50

Data into;

  • Web Developer Job in Manchester
  • Web Developer Job in Manchester
  • Web Developer Job in Manchester
  • Etc.

And the beauty of Excel, this is possible with a carefully crafted formula! Whoop!

The simple answer for how to do this is, with A1 being the ‘Name’ in the example above and B1 being the ‘Number of Times’ in the example above;

 

=REPT(CONCATENATE(A1, CHAR(10)), B1)

 

Then to break this down a little for those who aren’t too familiar with Excel, here’s what all this means;

  • REPT: This function quite simply is for repetition whereby the function is, REPT(text, number_times).
    • Text    Required. The text you want to repeat.
    • Number_times    Required. A positive number specifying the number of times to repeat text.

  • CONCATENATE: This is simply joining two or more things together, i.e. CONCATENATE(“This”, “With”, “This”), which would result in, This With This
  • CHAR(10): This is the formula for inserting a specific character into a formula, in this case a New Line Break character

And that is it. This magical formula formats all of the data into what is required. When you are viewing this data in Excel, it won’t add the data into a new Cell, so simply copy and paste the whole data into another text document and you’ll see the data on new lines as you expect.

Pretty cool! Well I thought so anyhow 🙂