Sunday, February 10, 2013

PHP caching: shm vs. apc vs. memcache vs. mysql vs. file cache (update: fill apc from cron)

Lessons learned:
  • shm/apc are 32-60 times faster than memcached or mysql
  • shm/apc are 2 times faster than php file cache with apc
  • php file cache with apc is 15-24 times faster than memcached or mysql
  • mysql is 2 times faster than memcached when storing more than 400 bytes
  • memcached is 2 times faster than mysql when storing less than 400 bytes
  • php file cache with apc is 2-3 times faster than normal file cache
  • php file cache without apc is 8 times slower than normal file cache

Wednesday, February 6, 2013

Next generation refactoring with syntactical grep and patch from pfff

Refactoring of PHP methods is often difficult:
  • syntax errors or non-existing methods are only detected during runtime
  • wrong method calls or uninitialized variables are only detected during runtime
  • wrong order of parameters often remains undetected
  • not enough unit tests to validate the changes
  • not many resources for refactoring (time + money)
We can solve most of these issues by using a few tools for syntactic analysis.

Friday, February 1, 2013

Analyze log files from several servers in real-time (update: whois, firewall)

First, we setup a machine to analyze the logs:

# install missing packages (e.g. with Ubuntu 12.10)
apt-get install netcat logtop

# create a ramdisk with 1GB for storing the logs
mkdir /ramdisk
mount -t tmpfs -o nosuid,noexec,noatime,size=1G none /ramdisk

# receive logs on port 8080
ncat --ssl -l -k 8080 > /ramdisk/access.log
# open second terminal
tail -f /ramdisk/access.log | logtop

# clean up the ramdisk from time to time
echo >/ramdisk/access.log

Thursday, January 10, 2013

How to implement really small and fast ORM with PHP (Part 7: IDE)

Fork me on GitHub

Queries are gaining more and more complexity, data is getting bigger and bigger. Most optimizations in database technology are done in the database server. This is an approach to optimize queries on the client side.

With this ORM, queries ...
  • don't select more data than needed
  • contain less joins when data is expected to be consistent
  • can be written manually in pure SQL
  • are not written in a new query language

Performance of integer casting

Lessons learned:
  • $var+0 is as fast as (int)$var
  • intval($var) is 2 times slower than (int)$var
  • is_numeric($var) is 2 times slower than (int)$var
  • is_numeric($var) is 7 percent faster than intval($var)
  • settype($var) is 3 times slower than (int)$var

Wednesday, January 2, 2013

Make PDFs searchable

I searched for a good way to make scanned documents searchable. Most newer scanning software already has some OCR built-in, but what about all the old documents? Using pdfsandwich and Tesseract, we recover the text from each page of a PDF and put it behind each page as an invisible layer. That way, we can search the PDF with a normal PDF reader or upload it to Google translate to get a translated version. To get a text-only version, pdftotext can be used.

Labels

performance (23) benchmark (6) MySQL (5) architecture (5) coding style (5) memory usage (5) HHVM (4) C++ (3) Java (3) Javascript (3) MVC (3) SQL (3) abstraction layer (3) framework (3) maintenance (3) Go (2) Golang (2) HTML5 (2) ORM (2) PDF (2) Slim (2) Symfony (2) Zend Framework (2) Zephir (2) firewall (2) log files (2) loops (2) quality (2) real-time (2) scrum (2) streaming (2) AOP (1) Apache (1) Arrays (1) C (1) DDoS (1) Deployment (1) DoS (1) Dropbox (1) HTML to PDF (1) HipHop (1) OCR (1) OOP (1) Objects (1) PDO (1) PHP extension (1) PhantomJS (1) SPL (1) SQLite (1) Server-Sent Events (1) Silex (1) Smarty (1) SplFixedArray (1) Unicode (1) V8 (1) analytics (1) annotations (1) apc (1) archiving (1) autoloading (1) awk (1) caching (1) code quality (1) column store (1) common mistakes (1) configuration (1) controller (1) decisions (1) design patterns (1) disk space (1) dynamic routing (1) file cache (1) garbage collector (1) good developer (1) html2pdf (1) internationalization (1) invoice (1) just-in-time compiler (1) kiss (1) knockd (1) legacy code (1) legacy systems (1) logtop (1) memcache (1) memcached (1) micro framework (1) ncat (1) node.js (1) openssh (1) pfff (1) php7 (1) phpng (1) procedure models (1) ramdisk (1) recursion (1) refactoring (1) references (1) regular expressions (1) search (1) security (1) sgrep (1) shm (1) sorting (1) spatch (1) ssh (1) strange behavior (1) swig (1) template engine (1) threads (1) translation (1) ubuntu (1) ufw (1) web server (1) whois (1)