Friday, June 6, 2014

PHP memory consumption with Arrays and Objects (update: HHVM, phpng June-2014)

Lessons learned:
  • PHPng is significantly faster and uses significantly less memory
  • objects need more memory than arrays (+ 5-250 percent)
  • if array values are numeric, don't save them as strings!
  • saving 1M integers takes 33M of memory with PHPng (increase by factor 8)
  • saving 1M integers as strings takes 79M of memory with PHPng (increase by factor 20)
  • using SplFixedArray can reduce memory usage by 20-100 percent
  • avoid big Arrays and Objects in PHP whenever possible
    (don't use file('big_file') or explode("\n", file_get_contents('big_file')), etc.)
  • use streams whenever possible (fopen, fsockopen, etc.)
  • use generators when available with PHP 5.5 (RFC)
  • comparing 32bit to 64bit systems, memory consumption increases by 100-230 percent
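
The advice to prefer streams and generators over big arrays can be sketched as follows. This is a minimal example, assuming PHP 5.5 or later; the function name read_lines() is made up for illustration and is not taken from the article.

```php
<?php
// Hypothetical helper: yields one line at a time instead of building a big array
// with file() or explode().
function read_lines($path) {
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("cannot open $path");
    }
    while (($line = fgets($handle)) !== false) {
        // strip the trailing line break before handing the line to the caller
        yield rtrim($line, "\r\n");
    }
    fclose($handle);
}
```

Used as foreach (read_lines('big_file') as $line) { ... }, only the current line is held in memory.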

Sunday, May 4, 2014

For or Foreach? PHP vs. Javascript, C++, Java, HHVM (update: HHVM v3)

Lessons learned:
  • Foreach is 4-5 times faster than For
  • Nested Foreach is 2-3 times faster than nested For
  • Foreach with key lookup is 2-3 times slower than Foreach without
  • C++ is 5-300 times faster than PHP running For/Foreach on Arrays
  • HHVM is 2-6 times faster than PHP
  • HHVM is currently no alternative to C++
  • Javascript is 2-20 times slower than C++/Java running For on nested Arrays

Friday, March 28, 2014

Recursion with PHP v5.4, HHVM 3.0/3.1, Javascript, Java and C/C++

Lessons learned:
  • HHVM runs recursive function calls 13-20 times faster than PHP
  • HHVM runs recursive function calls 25-50 percent slower than Javascript
  • HHVM runs recursive function calls 4-5 times slower than Java
  • HHVM runs recursive function calls 2-8 times slower than C

strpos() vs. preg_match() vs. stripos()

Lessons learned:
  • strpos() is 3-16 times faster than preg_match()
  • stripos() is 2-30 times slower than strpos()
  • stripos() is 20-100 percent faster than preg_match() with the caseless modifier "//i"
  • using a regular expression in preg_match() is not faster than using a long string
  • using the utf8 modifier "//u" in preg_match() makes it 2 times slower
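
For plain substring checks, the faster functions are drop-in replacements for preg_match(). The helper names below are assumptions for illustration only.

```php
<?php
// Hypothetical helpers: substring checks without the regex engine.
function contains($haystack, $needle) {
    // strpos() returns 0 for a match at the start, so compare with !== false
    return strpos($haystack, $needle) !== false;
}

function contains_caseless($haystack, $needle) {
    return stripos($haystack, $needle) !== false;
}

// Regex equivalent for comparison: preg_quote() makes the needle literal.
function contains_regex($haystack, $needle) {
    return preg_match('/' . preg_quote($needle, '/') . '/', $haystack) === 1;
}
```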

Sunday, February 10, 2013

PHP caching: shm vs. apc vs. memcache vs. mysql vs. file cache (update: fill apc from cron)

Lessons learned:
  • shm/apc are 32-60 times faster than memcached or mysql
  • shm/apc are 2 times faster than php file cache with apc
  • php file cache with apc is 15-24 times faster than memcached or mysql
  • mysql is 2 times faster than memcached when storing more than 400 bytes
  • memcached is 2 times faster than mysql when storing less than 400 bytes
  • php file cache with apc is 2-3 times faster than normal file cache
  • php file cache without apc is 8 times slower than normal file cache
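
A rough sketch of a PHP file cache, assuming the term means that values are written as PHP source and loaded with include, so that an opcode cache such as APC can keep the compiled file in memory. The function names cache_set() and cache_get() are hypothetical.

```php
<?php
// Hypothetical helpers: store a value as PHP source code.
function cache_set($file, $value) {
    $code = '<?php return ' . var_export($value, true) . ';';
    // write to a temporary file first, then rename, so readers never see a partial file
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $file);
}

function cache_get($file) {
    if (!is_file($file)) {
        return null; // cache miss
    }
    // include returns the value of the file's return statement
    return include $file;
}
```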

Wednesday, February 6, 2013

Next generation refactoring with syntactical grep and patch from pfff

Refactoring of PHP methods is often difficult:
  • syntax errors or non-existing methods are only detected during runtime
  • wrong method calls or uninitialized variables are only detected during runtime
  • wrong order of parameters often remains undetected
  • not enough unit tests to validate the changes
  • not many resources for refactoring (time + money)
We can solve most of these issues by using a few tools for syntactic analysis.

Friday, February 1, 2013

Analyze log files from several servers in real-time (update: whois, firewall)

First, we set up a machine to analyze the logs:

# install missing packages (e.g. with Ubuntu 12.10)
apt-get install netcat logtop

# create a ramdisk with 1GB for storing the logs
mkdir /ramdisk
mount -t tmpfs -o nosuid,noexec,noatime,size=1G none /ramdisk

# receive logs on port 8080
ncat --ssl -l -k 8080 > /ramdisk/access.log
# open second terminal
tail -f /ramdisk/access.log | logtop

# clean up the ramdisk from time to time
echo >/ramdisk/access.log

Thursday, January 10, 2013

How to implement really small and fast ORM with PHP (Part 7: IDE)

Queries are gaining more and more complexity, data is getting bigger and bigger. Most optimizations in database technology are done in the database server. This is an approach to optimize queries on the client side.

With this ORM, queries ...
  • don't select more data than needed
  • contain fewer joins when data is expected to be consistent
  • can be written manually in pure SQL
  • are not written in a new query language

Performance of integer casting

Lessons learned:
  • $var+0 is as fast as (int)$var
  • intval($var) is 2.5 times slower than (int)$var
  • is_numeric($var) is 3 times slower than (int)$var
  • is_numeric($var) is 7 percent slower than intval($var)
  • settype($var) is 40 percent slower than is_numeric($var)
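
For reference, the compared variants look like this for a numeric string. The sample value is an assumption; note that $var + 0 yields a float for input like '4.2', while the other conversions always yield an integer.

```php
<?php
$var = '42';

$a = (int)$var;      // cast
$b = $var + 0;       // arithmetic conversion
$c = intval($var);   // function call

// is_numeric() only validates, it does not convert
$ok  = is_numeric($var);
$bad = is_numeric('42abc');

// settype() converts the variable in place and needs the target type
$d = $var;
settype($d, 'integer');
```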

Wednesday, January 2, 2013

Make PDFs searchable

I searched for a good way to make scanned documents searchable. Most newer scanning software already has some OCR built in, but what about all the old documents? Using pdfsandwich and Tesseract, we recover the text from each page of a PDF and put it behind each page as an invisible layer. That way, we can search the PDF with a normal PDF reader or upload it to Google Translate to get a translated version. To get a text-only version, pdftotext can be used.

Friday, December 28, 2012

Top 10 articles 2012

Article | Published | Views
Using V8 Javascript engine as a PHP extension | Jul 22 | 8458
Create PDF invoices with HTML5 and PhantomJS | Dec 8 | 2006
How to write a really small and fast controller with PHP | Jul 17 | 1320
MySQL or MySQLi or PDO | Jul 21 | 957
PHP Framework Comparison | Oct 19 | 916
Mass inserts, updates: SQLite vs MySQL | Sep 13 | 585
How to implement really small and fast ORM with PHP | Oct 16 | 524
PHP memory consumption with Arrays and Objects | Jun 23 | 465
For or Foreach? PHP vs. Javascript, C++, Java, HipHop | Jun 27 | 444
How to implement a real-time chat server in PHP using Server-Sent Events | Oct 29 | 386

Views from 6/19/12 to 12/28/12: 23464

Top 5 countries:
Country | Views
United States | 4680
Germany | 4109
China | 1268
United Kingdom | 1224
Russia | 982

Wednesday, December 26, 2012

Building PHP extensions with C++, the easy way (update: MySQL, threads)

Here is an easy way to build a PHP extension with C++ and default PHP packages on an Ubuntu system. I use SWIG to wrap C++ code to the Zend API. When using loops and recursion intensively, porting a few functions to C++ can give you some extra power.

Saturday, December 8, 2012

Create PDF invoices with HTML5 and PhantomJS

Creating invoices in PDF is always a bit tricky: there are many libraries to create PDF documents with PHP, but most can't handle complex layouts and require a lot of memory and CPU time. Things like Unicode characters and line/page breaks are often difficult to program and the source of many bugs and memory problems. Using logos as vector graphics or embedding TrueType fonts is often required, but not possible with most libraries.

Sunday, December 2, 2012

The power of column stores

  • using column stores instead of row based stores can reduce access logs from 10 GB to 130 MB of disk space
  • reading compressed log files is 4 times faster than reading uncompressed files from hard disk
  • column stores can speed up analytical queries by a factor of 18-58

Tuesday, November 27, 2012

Runtime vs. memory usage

Oftentimes, better runtime can result in higher memory usage. Here is an example to create some strings to test bulk inserts on Redis:

Tuesday, November 20, 2012

Developers' time of permanency is 15 years?

This article is not about performance of PHP, but more about performance of developers in general.
A nice article from golem.de claims that a developer's time of permanency is exactly 15 years.

Tuesday, November 6, 2012

Applying Scrum to legacy code and maintenance tasks

There are some problems with Scrum that mainly occur when dealing with third party components or legacy systems:
  • wrong estimations (time, impact, risk, complexity)
  • bad requirements (inconsistent, incomplete, untestable, conflicting, faulty)
  • development involved in operations (bug analysis, data correction, deployment, hot-fixes)
  • delays in development (bugs in legacy system, missing documentation)
  • testing (quality/quantity of test cases, un-mockable interfaces, long running offline processes, performance issues, live and test system differ)

Monday, October 29, 2012

How to implement a real-time chat server in PHP using Server-Sent Events (update: added C benchmark)

Lessons learned:
  • A web server written in PHP can give more than 10000 req/s on small hardware
  • A web server written in PHP is not slower than one written in Java (without threads)
  • A web server written in PHP is 30 percent slower than one written in C (without threads)
  • Realtime applications can be developed in PHP without problems

Friday, October 19, 2012

PHP Framework Comparison (update: opcode caching)

Reading about PHP frameworks, you'll get a lot about Symfony and Zend Framework. It is true that these frameworks have a lot of features and do great marketing. But what about performance and scalability?

Friday, September 14, 2012

Members, __set, __get, ArrayAccess and Iterator

Lessons learned:
  • __set() is 5 times slower than setting a member
  • __get() + __set() is 13 times slower than incrementing a member
  • Iterator is 4 times slower than using a member
  • ArrayAccess is 3 times slower than setting an element in a member array
  • ArrayAccess is 6 times slower than incrementing an element in a member array
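
The difference between a plain member and magic accessors can be sketched like this. The class names are invented for illustration; the point is that every magic read or write is a method call.

```php
<?php
// Plain member: direct property access.
class PlainCounter {
    public $count = 0;
}

// Magic accessors: every read and write goes through a method call.
class MagicCounter {
    private $data = array('count' => 0);

    public function __get($name) {
        return $this->data[$name];
    }

    public function __set($name, $value) {
        $this->data[$name] = $value;
    }
}

$plain = new PlainCounter();
$plain->count = $plain->count + 1;   // direct read and write

$magic = new MagicCounter();
$magic->count = $magic->count + 1;   // calls __get(), then __set()
```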

Thursday, September 13, 2012

Mass inserts, updates: SQLite vs MySQL (update: delayed inserts)

Lessons learned:
  • SQLite performs inserts 8-14 times faster than InnoDB / MyISAM
  • SQLite performs updates 4-8 times faster than InnoDB and as fast as MyISAM
  • SQLite performs selects 2 times faster than InnoDB and 3 times slower than MyISAM
  • SQLite requires 2.6 times less disk space than InnoDB and 1.7 times more than MyISAM
  • Allowing null values or using synchronous=NORMAL makes inserts 5-10 percent faster in SQLite

Sunday, August 12, 2012

How to implement a real life benchmark with PHP

To determine the maximum capacity of a web page, Apache ab is often used in the first step. Fetching one URL very often is optimal for caching and gives a best case. To get the worst case for caching, it is necessary to fetch different URLs in a random order.

Monday, August 6, 2012

MySQLi prepared statements

Lessons learned:
  • Prepared statements are 13 percent faster than normal statements with escaping
  • Prepared statements are 8 percent faster than normal statements without escaping
  • To get improvements, you need at least 10000 inserts for 1 statement
  • Using insert...set is 0.5-1 percent faster than insert...values

Thursday, August 2, 2012

ircmaxell: Framework Fixation - An Anti Pattern

ircmaxell: Framework Fixation - An Anti Pattern, short summary:
  • delegation of architecture decisions to frameworks may not be optimal or even wrong
  • only use frameworks when doing prototypes or projects you don't need to maintain
  • frameworks don't save time/money in the long term
  • frameworks don't make it easier to hire good programmers
  • using a framework prevents developers from understanding what happens underneath
  • not all framework developers are super heroes (look at the bug trackers ...)
  • favor libraries over frameworks

Sunday, July 22, 2012

Using V8 Javascript engine as a PHP extension (update: write PHP session)

"We Are Borg PHP. We Will Assimilate You. Resistance Is Futile!"
Just got to something described as: This extension embeds the V8 Javascript Engine into PHP.
It is called v8js and the documentation is already available on php.net, examples and the sources are here. V8 is known to work well in browsers and webservers like node.js, but does it work inside PHP? YES!

Saturday, July 21, 2012

MySQL or MySQLi or PDO

Lessons learned:
  • MySQLi is 3-4 times slower than MySQL when fetching fewer than 500 datasets
  • MySQLi is 2-4 times faster than MySQL when fetching more than 500 datasets
  • PDO is 2-5 times slower than MySQL/MySQLi
  • Unbuffered queries are 15-40 percent faster than buffered queries in MySQLi
  • Unbuffered queries are 10-25 percent faster than buffered queries in MySQL for less than 10000 datasets
  • Unbuffered queries are 3-7 percent slower than buffered queries in MySQL for more than 10000 datasets
  • Unbuffered queries are 0-5 percent faster than buffered queries in PDO
  • Non thread safe versions of PHP on win32 are 50 percent faster than thread safe versions

Friday, July 20, 2012

ircmaxell: Is Autoloading A Good Solution?

ircmaxell: Is Autoloading A Good Solution?: at a 75% class usage tradeoff point, it doesn't really make sense not to autoload, especially given all of the other benefits. So in the end, it looks like autoloading is indeed a good solution...
From the comments: Simply enabling APC sped up the fixed requires by 82%, and the autoloading by an amazing 91%.

Wednesday, July 18, 2012

Decorator or Subclassing?

Using anonymous functions in PHP is very nice to implement a decorator, but what about performance? Results:
  • Subclassing is 40 percent faster than using a decorator
  • Subclassing might require a bit more code
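
The two approaches can be put side by side in a small sketch. The class and variable names are hypothetical and only show the shape of each variant, not the benchmarked code.

```php
<?php
class Greeter {
    public function greet($name) {
        return "Hello $name";
    }
}

// Subclassing: the behaviour is extended at class definition time.
class LoudGreeter extends Greeter {
    public function greet($name) {
        return strtoupper(parent::greet($name));
    }
}

// Decorator with an anonymous function: wraps an existing object at runtime.
$greeter = new Greeter();
$loud = function ($name) use ($greeter) {
    return strtoupper($greeter->greet($name));
};

$sub = new LoudGreeter();
$fromSubclass  = $sub->greet('php');
$fromDecorator = $loud('php');
```

Both variants produce the same string; the decorator adds one closure call per invocation.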

Tuesday, July 17, 2012

How to write a really small and fast controller with PHP (update: benchmark Slim, Silex, Zend Framework, Symfony2)

To handle a lot of traffic, we need a fast controller with very little memory overhead. First, we implement a dynamic controller.

The design is based on the micro frameworks Slim and Silex. The first example maps the URL "http://server/index.php/blog/2012/03/02" to a function with the parameters $year, $month and $day:
// index.php, handle /blog/2012/03/02
$app = new App();
$app->get('/blog/:year/:month/:day', function($year, $month, $day) {
  printf('%d-%02d-%02d', $year, $month, $day);
});
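
One way such an App class could look is sketched below. This is not the article's implementation: the run() method and the regex-based matching are assumptions, and route patterns are expected to contain no regex special characters apart from the ':name' placeholders.

```php
<?php
// Hypothetical minimal App: maps '/blog/:year/:month/:day' style routes to callbacks.
class App {
    private $routes = array();

    public function get($pattern, $callback) {
        // turn each ':name' placeholder into a capture group
        $regex = preg_replace('/:\w+/', '([^/]+)', $pattern);
        $this->routes['#^' . $regex . '$#'] = $callback;
    }

    // Assumed entry point: dispatches a path, returns false when no route matches.
    public function run($path) {
        foreach ($this->routes as $regex => $callback) {
            if (preg_match($regex, $path, $matches)) {
                array_shift($matches); // drop the full match, keep the parameters
                return call_user_func_array($callback, $matches);
            }
        }
        return false;
    }
}
```

With a route registered as in the example above, $app->run('/blog/2012/03/02') would call the callback with '2012', '03' and '02'.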

Thursday, July 12, 2012

How to implement i18n without performance overhead

i18n is always difficult to implement and costs a lot of performance. Normally, implementations use gettext() or a custom t() function to translate a string. t() searches an INI or XML file for a translation key and returns the value. For example, t('setting', 'de') returns the German translation 'Einstellung'.

Typical optimizations use associative arrays (hashmaps) loaded into APC or Memcached. This requires a lot of memory for the array and burns a lot of CPU cycles for calling t() all the time. So the question is: can we do this better?

Yes! We use a just-in-time compiler for our PHP files and write the compiled PHP files to disk, so APC can cache them like regular PHP files.
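
The compile step could be sketched roughly as follows. This is a simplified illustration, not the article's compiler: it assumes one compiled file per language, a one-argument t('key') call in the source, and a hypothetical function name compile_i18n().

```php
<?php
// Hypothetical compile step: replace t('key') calls with the translated string literal,
// so the compiled file contains no lookups at runtime.
function compile_i18n($source, array $translations) {
    return preg_replace_callback(
        "/\\bt\\('([^']+)'\\)/",
        function ($m) use ($translations) {
            // fall back to the key itself when no translation exists
            $text = isset($translations[$m[1]]) ? $translations[$m[1]] : $m[1];
            // var_export() produces a correctly quoted PHP string literal
            return var_export($text, true);
        },
        $source
    );
}
```

The result would be written to disk once per language and included like any other PHP file.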

Array key lookup: isset() or array_key_exists() or @ ?

Lessons learned:
  • isset() is faster than array_key_exists()
  • array_key_exists() is faster than @
  • @ is slower than ignoring notices with error_reporting()
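
Apart from speed, the two functions differ for null values, which matters when swapping one for the other. A short illustration with made-up data:

```php
<?php
$row = array('id' => 1, 'note' => null);

$a = isset($row['id']);                // true
$b = isset($row['note']);              // false: the key exists, but the value is null
$c = array_key_exists('note', $row);   // true
$d = isset($row['missing']);           // false

// Combination: cheap isset() first, array_key_exists() only as a fallback for nulls
$e = isset($row['note']) || array_key_exists('note', $row);   // true
```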

Tuesday, July 10, 2012

PHP Deployment with Dropbox

Deploying files over sFTP or scp is boring and takes a lot of time. Using git with commit-hooks to initiate the deployment process is comfortable, but there is a solution that's even faster:

Deploy your server(s) with a Dropbox!


Here we go, first create a new Dropbox account:

Tuesday, July 3, 2012

Development Principles - Think first programming

A professor once told me:
Remember, a string goes into the server, a string goes out. Not more, not less.

So let's ask the following questions:
  • do you need a seven tier architecture?
  • do you need to write a servlet container in PHP?
  • do you need to derive a class more than twice?
  • where do you need OOP?
  • do you need design patterns?
  • how to choose the right technology?

Saturday, June 30, 2012

Disadvantages of ORM

ORM has attracted a lot of attention in recent years. So let's dig a bit deeper into it.
The biggest advantage of ORM is also the biggest disadvantage: queries are generated automatically
  • queries can't be optimized
  • queries select more data than needed, things get slower, more latency
    (some ORMs fetch all datasets of all relations of an object even though only 1 attribute is read)
  • compiling queries from ORM code is slow (ORM compiler written in PHP)
  • SQL is more powerful than ORM query languages
  • database abstraction forbids vendor specific optimizations

Tuesday, June 26, 2012

Things you should not do in PHP (update: references)

Here is a list of things you should not do in PHP. Most of them are pretty obvious, but over the years I've seen a lot of them. In most cases, these problems remain hidden until data grows above 10000 entries. So on a development system, things are always fast and there are no problems with memory limits :-)

Suppose we have a table with 100k entries:
$db->query('create table stats (c1 int(11) primary key, c2 varchar(255))');
$db->query('begin');
for ($i=0; $i<100000; $i++) {
  $db->query('insert into stats values ('.$i.','.($i*2).')');
}
$db->query('commit');
Populate a big array instead of streaming results:
$result = $db->query('select * from stats');
$array = $result->fetch_all(); // 35M
// or
while ($row = $result->fetch_assoc()) $array[] = $row; // 35M
// or
while ($row = $result->fetch_array()) $array[] = $row; // 44.5M
// process $array ...

// instead of:
while ($row = $result->fetch_assoc()) { // 0.5M
  // process $row
}

Wednesday, June 20, 2012

Replace Smarty with PHP templates

Many performance guides recommend removing Smarty to speed things up. But oftentimes it's not Smarty causing performance problems, but rather big modifier chains not being cached. To point this out, we need to profile our template, which is quite difficult when Smarty compiles it into something unreadable. So we need a quick and easy way to replace the Smarty template engine with pure PHP code. Since Smarty can't do more than PHP, let's replace Smarty with simple PHP templates.

So here is a small guide to replacing Smarty with simple PHP-based templates. These can also be cached by APC without any compiler.

First thing: Smarty configuration files
e.g. core.conf
foo = bar

[core] 
logo = public/img/logo.png 
link = http://www.simple-groupware.de 
notice = Photo from xy 
bg_grey = #F5F5F5

Now let's convert it to PHP:
core_conf.php
<?php
  // global settings (outside any section)
  $config = array(
    "foo" => "bar"
  );
  // settings from the [core] section
  $config["core"] = array(
    "logo" => "public/img/logo.png",
    "link" => "http://www.simple-groupware.de",
    "notice" => "Photo from xy",
    "bg_grey" => "#F5F5F5"
  );
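
A plain PHP file can then act as the template itself. The render() helper below is a hypothetical sketch of the usual output-buffering approach, not code from the guide; values such as the $config array would be passed in as $vars.

```php
<?php
// Hypothetical render helper: a plain PHP file acts as the template.
function render($template, array $vars) {
    // expose array keys as local variables; EXTR_SKIP protects $template and $vars
    extract($vars, EXTR_SKIP);
    ob_start();
    include $template;
    return ob_get_clean();
}
```

A template file would contain ordinary markup with small PHP snippets, for example echo htmlspecialchars($title); inside a paragraph tag.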

Tuesday, June 19, 2012

Agenda

Welcome to my first blog!


Yes, I'm a bit late in 2012, but I've collected some important things I'd like to write down and analyze in more detail.

In the daily IT business, it is often unclear whether software should be optimized or better hardware should be ordered. This blog should make this decision easier by looking at the benchmarks of possible software optimizations.

Here is a small agenda of the things I'll write in the next weeks:
  • The impact of APC/Optimizer+
  • APC vs Memcached (local vs distributed cache, refresh, serialization)
  • The power of cache warming
  • How to write a really small and fast AJAX controller with PHP
  • How to write a really small and fast O/R-mapper with PHP
  • How to write a really small and fast CMS
  • Why entity–attribute–value is bad for performance
  • How to benchmark a PHP framework
  • How to find the right PHP framework (or why to write your own)
  • How to implement client side session storage
  • Iteration vs Recursion
  • ...