

Smart Cache Manual
Chapter 7 - Advanced topics


7.1 Choosing a good cachesize

How do you choose a good cache size? It depends on how you want to use the cache and how much daily traffic you have. Nowadays disk space is a cheap resource, and neither performance nor runtime memory use depends on cache size.

100 MB is the starting point: I never recommend a cache smaller than 100 MB. If you want a smaller cache, you do not need Smart Cache; simply use the cache in your browser.

The next sections have some hints about choosing a cache size.


7.1.1 Dial-up and offline browsing

Case #1: You do not browse much at home (you have a T1 line at work), you just want to see some breaking news now and then, and you are not mainly interested in offline browsing. A 200 MB cache is enough.

Case #2: If you like offline browsing, you need a bigger cache. Start with 400 MB if you do not browse very much. If you have free disk space or you browse a lot, you can go up to 1 GB. In any case use Data compression support, Section 7.10. It is also good to fine-tune GC in gc.cnf. My recommended settings:

     #two years in cache are enough
     reference_age 2y
     large_object_penalty 200000 *2.5 50000 +5
     negative_cached_penalty *5
     #disable faster expiration
     expired_penalty *1
     expired_but_checkable_penalty *1
     not_checkable_penalty *1

7.1.2 Know your daily traffic

You need to determine your average daily traffic, that is, how fast your cache grows per day without running -gc. The -fakegc operation is the preferred way to compute cache size.

If you are starting with an empty cache, do not use numbers from the first days. An empty cache grows fast in the first days and then more slowly, because most people visit the same set of WWW sites. Let it run for 14 days and write down the cache size; wait one week, then compute the difference and divide it by 7.
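
The arithmetic above can be sketched in Java as follows. This is only an illustration: the class and method names are hypothetical and are not part of Smart Cache, and the two readings are assumed to be cache sizes in bytes that you wrote down yourself.

```java
// Hypothetical helper, not part of Smart Cache: estimates average daily cache growth
// from two cache-size readings taken some days apart.
final class DailyTraffic {
    /** Average growth per day, in bytes, between two readings taken `days` apart. */
    static long averageDailyGrowth(long firstBytes, long secondBytes, int days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        return (secondBytes - firstBytes) / days;
    }
}
```

For example, a cache that grew from 700,000,000 to 1,050,000,000 bytes in one week gives averageDailyGrowth(700000000L, 1050000000L, 7), which is 50,000,000 bytes per day.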


7.1.3 Standard network http-proxy

Standard mode of operation: leased-line Internet connection, no plans for offline browsing, proxy cache used to save bandwidth, about 50 computers connected to it.

In this case your cache should hold about 1 month of traffic, with a 1 GB minimum. I do not recommend making it much bigger than 1 month of traffic, because in my tests this did not significantly improve the cache hit rate. You can make it smaller if you have a heavy-traffic site, but do not go under 14 days of traffic unless you really need to. Never go under 7 days.
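
As a rough sketch of this rule of thumb, the following Java helper turns a daily traffic figure into a suggested size. The names are hypothetical, and treating "1 month" as 30 days and "1 GB" as 1024^3 bytes are my assumptions, not something Smart Cache defines.

```java
// Hypothetical helper, not part of Smart Cache: applies the sizing rule of thumb
// for a standard network proxy.
final class ProxyCacheSizing {
    static final long ONE_GB = 1024L * 1024L * 1024L; // assumption: binary gigabyte

    /** Suggested size: about 30 days of traffic (assumed month length), never below 1 GB. */
    static long recommendedBytes(long dailyTrafficBytes) {
        return Math.max(ONE_GB, 30L * dailyTrafficBytes);
    }

    /** Absolute floor mentioned in the text: 7 days of traffic. */
    static long hardMinimumBytes(long dailyTrafficBytes) {
        return 7L * dailyTrafficBytes;
    }
}
```

With 100 MB of daily traffic the suggestion is 30 x 100 MB; with only 10 MB per day the 1 GB minimum applies.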


7.1.4 Home network, sharing connection

Start with a cache size of 500 MB. You should also read the standard http-proxy section (Section 7.1.3). Data compression support, Section 7.10, is your friend.


7.2 Configuring refresh patterns

Smart Cache can be configured for how often it checks whether a new version of a page is available. This is done with the keywords default_refresh_pattern and refresh_pattern in the configuration file. The only difference between them is that refresh_pattern takes a URL mask; the other arguments are the same.

If you want to experiment with them, setting trace_refresh yes and trace_url yes will give you more information about why a page is or is not being loaded. If you can read Java, look at the function needRefresh() in the file cacheobject.java.

Arguments of these commands are:

[default_]refresh_pattern [URL mask] /Reload_age/ /Min_age/ /Lastmod_factor/ /Max_age/ /Expire_age/ /Redir_age/

These values are floating-point times in minutes, except /Lastmod_factor/, which is a fraction less than 1.

Smart Cache's refresh algorithm in English. Page age is the difference between the date when the page was loaded and today.

  1. If the browser requests a forced page reload and the page is older than /Reload_age/, reload it; otherwise return the old copy.

  2. If the page is older than /Max_age/, load it.

  3. If the page has an expire date, has expired, and its age is bigger than /Expire_age/, load it.

  4. If the page is a redirect to another page and its age is bigger than /Redir_age/, load it.

  5. If the page is younger than /Min_age/, return the cached copy.

  6. If the page does not have a last modified date, load it.

  7. Compute the last mod factor: lmf = page_age / (page_date - page_last_modified)

  8. If lmf > /Lastmod_factor/, load it; otherwise return the cached copy.
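
The steps above can be sketched in Java as shown below. This is my reading of the list, not the actual needRefresh() code from cacheobject.java; the class, method and parameter names are hypothetical, and all ages are assumed to be already converted to minutes.

```java
// A sketch of the refresh decision described above; NOT the real needRefresh().
final class RefreshSketch {
    // Pattern values in minutes, except lastmodFactor (a fraction below 1).
    final double reloadAge, minAge, lastmodFactor, maxAge, expireAge, redirAge;

    RefreshSketch(double reloadAge, double minAge, double lastmodFactor,
                  double maxAge, double expireAge, double redirAge) {
        this.reloadAge = reloadAge;
        this.minAge = minAge;
        this.lastmodFactor = lastmodFactor;
        this.maxAge = maxAge;
        this.expireAge = expireAge;
        this.redirAge = redirAge;
    }

    /**
     * @param pageAge            minutes since the page was loaded
     * @param forced             browser asked for a forced reload
     * @param expired            page has an expire date and that date has passed
     * @param redirect           page is a redirect to another page
     * @param modifiedBeforeLoad page_date - page_last_modified in minutes,
     *                           or null if there is no last modified date
     * @return true if the page should be loaded from the server, false for the cached copy
     */
    boolean needsLoad(double pageAge, boolean forced, boolean expired,
                      boolean redirect, Double modifiedBeforeLoad) {
        if (forced) return pageAge > reloadAge;            // step 1
        if (pageAge > maxAge) return true;                 // step 2
        if (expired && pageAge > expireAge) return true;   // step 3
        if (redirect && pageAge > redirAge) return true;   // step 4
        if (pageAge < minAge) return false;                // step 5
        if (modifiedBeforeLoad == null) return true;       // step 6
        double lmf = pageAge / modifiedBeforeLoad;         // step 7
        return lmf > lastmodFactor;                        // step 8
    }
}
```

With a /Lastmod_factor/ of 0.2, a page that had not been modified for 1000 minutes when it was loaded is considered fresh until it has been in the cache for more than 200 minutes, provided none of the earlier steps applies.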


7.3 How garbage collection works

Smart Cache uses real LRU-based garbage collection. It remembers the last access time of every object in the cache. When GC runs, the last access time of every object is converted to an age in days.

The object's age is then modified by various attributes of the object (for example size or expiration age): size rules and the best-matching penalty rule are applied. The priority of penalty rules follows their order in the sample configuration file. The last step is to apply the first matching urlmask rule (if any). Rules can be fine-tuned in great detail in gc.cnf.

If the age is bigger than reference_age, or the object is bigger than maximum_object_size or smaller than minimum_object_size, the object is removed from the cache immediately, regardless of cache size.

After all objects in the cache have been scanned, they are sorted by age. When the cache size is bigger than the cache high mark, GC starts cleaning, removing objects with the highest score first, until the cache size drops between the high and low marks. GC prefers to clean as much as possible without needing another cache scan, but never deletes files below the low mark.
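
A simplified Java sketch of the selection phase follows. It is not the code from garbage.java: the names are hypothetical, the penalty and urlmask rules are left out (ageDays is assumed to be the already adjusted score), and stopping at the first object whose removal would cross the low mark is my assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified sketch of GC selection; NOT the real Smart Cache implementation.
final class GcSketch {
    static final class Entry {
        final String name;
        final double ageDays; // assumed: age after all penalty rules were applied
        final long size;      // bytes
        Entry(String name, double ageDays, long size) {
            this.name = name;
            this.ageDays = ageDays;
            this.size = size;
        }
    }

    /** Returns the entries chosen for deletion, in deletion order. */
    static List<Entry> select(List<Entry> cache, double referenceAgeDays,
                              long minSize, long maxSize, long lowMark, long highMark) {
        List<Entry> doomed = new ArrayList<>();
        List<Entry> kept = new ArrayList<>();
        long total = 0;
        // Pass 1: too old or out-of-range objects go regardless of cache size.
        for (Entry e : cache) {
            if (e.ageDays > referenceAgeDays || e.size > maxSize || e.size < minSize) {
                doomed.add(e);
            } else {
                kept.add(e);
                total += e.size;
            }
        }
        // Pass 2: above the high mark, remove highest age first, never below the low mark.
        if (total > highMark) {
            kept.sort(Comparator.comparingDouble((Entry e) -> e.ageDays).reversed());
            for (Entry e : kept) {
                if (total - e.size < lowMark) break;
                doomed.add(e);
                total -= e.size;
            }
        }
        return doomed;
    }
}
```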


7.4 Cookies filtering

Some Web sites use so-called "cookies". These "cookies" are tags sent from the server to the browser which enable the server to keep track of the sites that the user visits, and thus compromise the user's privacy.

As requested by many users, Smart Cache now has a built-in cookie filter. The filter has two working modes, incoming and outgoing. The mode is selected with allow_all_session_cookies.

In both modes the browser still detects incoming cookies, so it will continue to display warnings if it is set up to warn about them. Just turn these warnings off. If you use Netscape 3+, you can disable confirmation messages for any cookies sent to your browser by going to Options->Network_Preferences->Protocols and unchecking the box for Show an Alert before Accepting a cookie.


7.4.1 Outgoing cookie filter

This mode is set by allow_all_session_cookies false. In this mode all cookies sent by your browser are filtered out unless the domain name is allowed in cookies.cnf.

Benefits of this solution:

DANGER: The cookie filter itself does no harm, but when you use the fake_cookie option you can CRASH a remote WWW site by sending back very long cookies (a buffer overflow attack) or cookies with a known name but an unexpected value (for example text instead of numeric input). Some versions of Microsoft Internet Information Server will crash entirely (instead of just one HTTP child process, as in Apache), so no new users can access the server until IIS is restarted.


7.4.2 Incoming cookie filter

This mode is set by allow_all_session_cookies true. In this mode session cookies are allowed from all sites, and persistent cookies are allowed only from sites listed in cookies.cnf. If a persistent cookie is not allowed, it is changed to a session cookie. This is enough to keep cookies-only WWW servers happy.

It is not a good idea to switch the filter from outgoing to incoming mode without first deleting all cookies stored in the browser.

Benefits of this solution:

Drawbacks:


7.5 Setting up logfiles

Smart Cache produces log files in common or combined format. The log type is switched with log_common.

Logs are selected by wildcard mask, so you can write to multiple log files depending on the URL requested. This was written especially for Web forwarding with Smart Cache, Section 9.7, but logs can be produced even if no forwarding is set up.

Usage: access_log mask filename


7.6 Importing files to Smart Cache

If you are using SC for offline browsing, you may sometimes find it useful to import files from outside sources (for example, some CDs contain offline copies of some servers).

Place these files in a directory structure that has exactly the same names as the original URLs of the files. For example, make the directory /tmp/www.javaworld.com/javaworld/jw-10-1999/ and copy the necessary files into it. You can create any number of directories (even for different servers).

After that, run java scache -import /tmp and the files will be imported.

Smart Cache checks that a newer version of the imported file is not already in the cache. If it is not, the file is moved (preserving its timestamp) into the cache. If the move fails because the file and the cache directory are on different filesystems (disks), SC copies the file instead. It is wise to place the files on the same filesystem as the SC main data directory and to clear any READ-ONLY attributes.
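
If you prepare the import directory with a script, the mapping from URL to file name can be sketched like this. The helper is hypothetical and only mirrors the /tmp/www.javaworld.com/... example above; how Smart Cache expects ports or query strings to be laid out is not covered here, and the file name index.html in the usage example is made up.

```java
import java.net.URI;
import java.nio.file.Path;

// Hypothetical helper, not part of Smart Cache: computes where a file for a given
// URL would be placed under the import root (host directory, then the URL path).
final class ImportLayout {
    static Path importPath(Path root, String url) {
        URI uri = URI.create(url);
        String path = uri.getPath() == null ? "" : uri.getPath();
        // Drop the leading slash so the URL path resolves below the host directory.
        String relative = path.startsWith("/") ? path.substring(1) : path;
        return root.resolve(uri.getHost()).resolve(relative);
    }
}
```

For example, importPath(Paths.get("/tmp"), "http://www.javaworld.com/javaworld/jw-10-1999/index.html") yields /tmp/www.javaworld.com/javaworld/jw-10-1999/index.html.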


7.7 Exporting data from Smart Cache

Cached data can easily be exported from Smart Cache. Data are exported in a format that can be used in Importing files to Smart Cache, Section 7.6.

Smart Cache can export the most recently cached data in 3 modes:

  1. -export uses the file's last modified date.

  2. -fullexport uses the date when the file was last checked against the original server.

  3. -lruexport uses the file's last access date.

Command line syntax is -[lru|full]export <Directory> <Timedelta>

The time delta has the format number followed by unit, with no space between them (for example, "1w" is one week). Supported time units are: d/D days, w/W weeks, m minutes, M months, y/Y years, h/H hours, s/S seconds.


7.8 Using Smart Cache for non-HTTP protocols

When Smart Cache gets a request for an unsupported protocol (for example FTP), it forwards the request to ftp_proxy for the FTP protocol or to http_proxy for everything else. The result obtained from the proxy is cached.


7.9 Parent proxy authentication

A parent proxy login and password may be specified in the "http_proxy" or "ftp_proxy" configuration statements, after the parent proxy port, in the form login:password.

     Example:
     http_proxy my.cache.net 3128 mylogin:mypass

7.10 Data compression support

Smart Cache can compress text data while saving them to disk. This saves a significant amount (about 50%) of disk space. To enable it, set auto_compress 1 in scache.cnf, which compresses files larger than 512 bytes. You can also set a file size limit instead of 1; for example, auto_compress 20000 compresses files bigger than 20k. It is recommended to use your filesystem block size as the limit for compressing files.

These data are sent to your browser in gzip-compressed form, so your browser must know how to decompress them. Existing data can be compressed with the Smart Cache repair utility, Chapter 11.

If you see garbage on the screen, your browser cannot handle compressed data. If you still want to use data compression, set auto_decompress 1. Smart Cache will then decompress data if your browser does not send an Accept-Encoding: gzip header. Some browsers do not send this header but do accept compressed data, so use auto_decompress only if necessary. If you want to ALWAYS decompress outgoing data, use the value 2.
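
The effect of the auto_decompress values can be summarised with a small Java sketch. It is an illustration only: the class and method are hypothetical, the check is a plain substring test on the header, and treating 0 as "never decompress" is my assumption, since only the values 1 and 2 are described above.

```java
import java.util.Locale;

// Hypothetical sketch of the auto_decompress decision; NOT Smart Cache source code.
final class DecompressPolicy {
    /**
     * @param autoDecompress configured value: 1 = decompress when the browser does not
     *                       announce gzip, 2 = always decompress, 0 = assumed "off"
     * @param acceptEncoding value of the Accept-Encoding request header, or null if absent
     * @return true if stored gzip data should be decompressed before sending
     */
    static boolean shouldDecompress(int autoDecompress, String acceptEncoding) {
        if (autoDecompress >= 2) {
            return true;
        }
        if (autoDecompress == 1) {
            return acceptEncoding == null
                    || !acceptEncoding.toLowerCase(Locale.ROOT).contains("gzip");
        }
        return false;
    }
}
```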

Any modern browser supports compressed HTML pages. Browsers reported to support compressed HTML pages:

Browsers which DO NOT SUPPORT compressed pages:

I have no information about browsers on other OSes (Win, Mac). If you want, you can send me this information.


7.11 Request remap/redirecting

Smart Cache can rewrite or redirect requests. The difference between rewriting and redirecting is that with redirecting the browser is able to see the new destination.

If you are using web forwarding, you must use the rewrite feature; if you are using fast redirects, you must use the redirect feature. In all other cases you may use either.

Syntax of configuration file: <Source mask> <Target url> <star number>

     http://opensource.org/* http://www.opensource.org/ 1
     */redir.php?link=* * 2

The destination URL is the target URL with the string matched by the n-th star of the source mask appended. If n is 0, nothing is appended. If you need a zero-length target URL, use * instead.
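
To make the rule concrete, here is a Java sketch that applies one remap line to a request URL. It is not the Smart Cache implementation: the names are hypothetical, each star is assumed to match greedily, and the mask is assumed to have to match the whole URL. The host names in the second usage example are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of applying "<Source mask> <Target url> <star number>"; NOT Smart Cache code.
final class RemapSketch {
    /** Returns the destination URL, or null if the source mask does not match. */
    static String remap(String sourceMask, String target, int star, String url) {
        // Turn the mask into a regex: literal pieces are quoted, each * becomes a group.
        String[] parts = sourceMask.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) regex.append("(.*)");
            regex.append(Pattern.quote(parts[i]));
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(url);
        if (!m.matches()) {
            return null;
        }
        if (star < 0 || star > m.groupCount()) {
            throw new IllegalArgumentException("mask has no star number " + star);
        }
        String base = target.equals("*") ? "" : target; // "*" stands for an empty target
        String tail = star == 0 ? "" : m.group(star);   // 0 means append nothing
        return base + tail;
    }
}
```

With the first sample line, remap("http://opensource.org/*", "http://www.opensource.org/", 1, "http://opensource.org/licenses") gives "http://www.opensource.org/licenses". With the second, a request for "http://a.example/redir.php?link=http://b.example/x" gives "http://b.example/x".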


7.12 Timestrings

Smart Cache uses a data type called 'timestring' to represent time in the configuration file. Timestring syntax is: <Floating point number><type>. For example: 1h, 40m, 2d. A timestring defaults to minutes if no type is given.

Supported types:

  1. s or S - seconds

  2. m - minutes

  3. d or D - days

  4. w or W - weeks

  5. f or F - fortnights

  6. M - months

  7. y or Y - years

  8. h or H - hours

You can also use the full unit name, e.g. "day". See the function timestring() in garbage.java for details.
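
For illustration, a minimal timestring parser might look like the Java sketch below. It is not the timestring() function from garbage.java: the names are hypothetical, full unit names are not handled, and the lengths of a month (30 days) and a year (365 days) are my assumptions.

```java
// Minimal timestring sketch; NOT the real timestring() from garbage.java.
final class TimeStringSketch {
    /** Converts a timestring such as "1h", "40m" or "2d" to minutes. */
    static double toMinutes(String text) {
        String s = text.trim();
        // Split into the numeric part and the trailing letters (the type).
        int i = s.length();
        while (i > 0 && Character.isLetter(s.charAt(i - 1))) {
            i--;
        }
        double value = Double.parseDouble(s.substring(0, i));
        String unit = s.substring(i);
        if (unit.isEmpty()) {
            return value; // no type given: minutes
        }
        switch (unit) {
            case "s": case "S": return value / 60.0;
            case "m":           return value;
            case "h": case "H": return value * 60.0;
            case "d": case "D": return value * 1440.0;
            case "w": case "W": return value * 7 * 1440.0;
            case "f": case "F": return value * 14 * 1440.0;
            case "M":           return value * 30 * 1440.0;  // assumption: 30-day month
            case "y": case "Y": return value * 365 * 1440.0; // assumption: 365-day year
            default:
                throw new IllegalArgumentException("unknown time type: " + unit);
        }
    }
}
```

Note that m and M differ only in case, so the type has to be compared case-sensitively.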


7.13 Sizestrings

Smart Cache uses a data type called 'sizestring' to represent size in the configuration file. Sizestring syntax is: <Floating point number><type>. For example: 1MB, 40KB, 2GB. A sizestring defaults to bytes if no type is given.

Supported types:

  1. B - bytes

  2. K - 1024 Bytes

  3. M - 1024 KBytes

  4. G - 1024 MBytes

You can also use the full unit name, e.g. "kilobytes". See the function sizestring() in garbage.java for details.
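
A sizestring can be parsed in the same spirit. The Java sketch below is not the sizestring() function from garbage.java: the names are hypothetical, and looking only at the first letter of the type (so that K, KB and kilobytes all mean 1024 bytes) and ignoring its case are my assumptions.

```java
// Minimal sizestring sketch; NOT the real sizestring() from garbage.java.
final class SizeStringSketch {
    /** Converts a sizestring such as "1MB", "40KB" or "2GB" to bytes. */
    static long toBytes(String text) {
        String s = text.trim();
        // Split into the numeric part and the trailing letters (the type).
        int i = s.length();
        while (i > 0 && Character.isLetter(s.charAt(i - 1))) {
            i--;
        }
        double value = Double.parseDouble(s.substring(0, i));
        String unit = s.substring(i);
        long factor;
        if (unit.isEmpty()) {
            factor = 1L; // no type given: bytes
        } else {
            // Assumption: only the first letter of the type matters, case-insensitively.
            switch (Character.toUpperCase(unit.charAt(0))) {
                case 'B': factor = 1L; break;
                case 'K': factor = 1024L; break;
                case 'M': factor = 1024L * 1024L; break;
                case 'G': factor = 1024L * 1024L * 1024L; break;
                default:
                    throw new IllegalArgumentException("unknown size type: " + unit);
            }
        }
        return Math.round(value * factor);
    }
}
```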




Smart Cache Manual

0.94

Radim Kolar hsn_nospam_at.sendmail.dot.cz