How do you choose a good cache size? It depends on how you want to use the cache and how much daily traffic you have. Disk space is a cheap resource nowadays, and neither performance nor runtime memory usage depends on the cache size.
100 MB is the starting point; I never recommend a cache smaller than 100 MB. If you want a smaller cache, you do not need Smart Cache at all: simply use the cache in your browser.
The following sections give some hints on choosing a cache size.
Case #1: You do not browse much at home (you have a T1 line at work), you just want to see breaking news now and then, and you are not really interested in offline browsing. A 200 MB cache is enough.
Case #2: If you like offline browsing, you need a bigger cache. Start with 400 MB if you do not browse very much. If you have free disk space or you browse a lot, you can go up to 1 GB. In any case, use Data compression support, Section 7.10. It is also a good idea to fine-tune GC in gc.cnf. My recommended settings:
#two years in cache are enough
reference_age 2y
large_object_penalty 200000 *2.5 50000 +5
negative_cached_penalty *5
#disable faster expiration
expired_penalty *1
expired_but_checkable_penalty *1
not_checkable_penalty *1
You need to determine your average daily traffic, that is, how much your cache grows per day when -gc is not run. The -fakegc operation is the preferred way to compute the cache size.
If you are starting with an empty cache, do not use the numbers from the first days. An empty cache grows fast at first and more slowly later, because most people keep visiting the same set of WWW sites. Let it run for 14 days and write down the cache size; wait one more week; then compute the difference and divide it by 7.
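The measurement described above is a single subtraction and division. A minimal sketch follows; the class name, method name and the two size readings are hypothetical and only illustrate the arithmetic.

```java
final class CacheGrowth {
    /** Average daily growth in bytes between two cache size readings. */
    static long averageDailyGrowth(long earlierSizeBytes, long laterSizeBytes, int daysBetween) {
        if (daysBetween <= 0) {
            throw new IllegalArgumentException("daysBetween must be positive");
        }
        return (laterSizeBytes - earlierSizeBytes) / daysBetween;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Hypothetical readings: 310 MB after 14 days, 380 MB one week later.
        long daily = averageDailyGrowth(310 * mb, 380 * mb, 7);
        System.out.println("Average daily growth: " + (daily / mb) + " MB");
    }
}
```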
Standard mode of operation: leased-line Internet connection, no plans for offline browsing, proxy cache used to save bandwidth, about 50 computers connected to it.
In this case the cache should hold about one month of traffic, with a 1 GB minimum. I do not recommend making it much bigger than one month of traffic, because in my tests that did not significantly improve the cache hit rate. You can make it smaller if you have a heavy-traffic site, but do not go under 14 days of traffic unless you really need to. Never go under 7 days.
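As a rough illustration, the sizing rule above can be written down as three small functions of the measured daily traffic. This is only a sketch: the class and method names are made up, and "one month" is assumed to mean 30 days.

```java
final class CacheSizing {
    static final long GB = 1024L * 1024L * 1024L;

    /** About one month of traffic (assumed: 30 days), but never less than 1 GB. */
    static long recommendedSize(long dailyTrafficBytes) {
        return Math.max(GB, 30L * dailyTrafficBytes);
    }

    /** 14 days of traffic: go below this only if you really need to. */
    static long softMinimumSize(long dailyTrafficBytes) {
        return 14L * dailyTrafficBytes;
    }

    /** 7 days of traffic: the size the text says never to go under. */
    static long hardMinimumSize(long dailyTrafficBytes) {
        return 7L * dailyTrafficBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long daily = 100 * mb; // hypothetical daily traffic
        System.out.println("recommended:  " + recommendedSize(daily) / mb + " MB");
        System.out.println("soft minimum: " + softMinimumSize(daily) / mb + " MB");
        System.out.println("hard minimum: " + hardMinimumSize(daily) / mb + " MB");
    }
}
```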
Start with a cache size of 500 MB. You should also read the standard http-proxy section. Data compression support, Section 7.10, is your friend.
You can configure how often Smart Cache checks whether a new version of a page is available. This is done with the default_refresh_pattern and refresh_pattern keywords in the configuration file. The only difference between them is that refresh_pattern takes a URL mask; the other arguments are the same.
If you want to experiment with them, setting trace_refresh yes and trace_url yes gives you more information about why a page is or is not being loaded. If you can read Java, look at the function needRefresh() in the file cacheobject.java.
Arguments of these commands are:
[default_]refresh_pattern [URL mask] /Reload_age/ /Min_age/ /Lastmod_factor/ /Max_age/ /Expire_age/ /Redirect_Age/
All of these are floating-point times in minutes, except /Lastmod_factor/, which is a fraction smaller than 1.
Smart Cache's refresh algorithm in plain English. Page age is the difference between the date when the page was loaded and now.
If the browser requests a forced page reload and the page is older than /Reload_age/, reload it; otherwise return the old copy.
If the page is older than /Max_age/, load it.
If the page has an expiry date, has expired, and its age is greater than /Expire_age/, load it.
If the page is a redirect to another page and is older than /Redirect_Age/, load it.
If the page is younger than /Min_age/, return the cached copy.
If the page does not have a last-modified date, load it.
Compute the last-modified factor: lmf = page_age / (page_date - page_last_modified)
If lmf > /Lastmod_factor/, load it; otherwise return the cached copy.
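The steps above can be sketched as one function. This is not the needRefresh() code from cacheobject.java, only a reading of the prose. All class, field and method names are hypothetical, page_date is assumed to be the time the page was loaded, a forced reload is assumed to be decided by the first rule alone, and a non-positive denominator in the lmf formula is assumed to mean "load".

```java
final class RefreshSketch {
    /** The six arguments of [default_]refresh_pattern; ages are in minutes. */
    static final class RefreshPattern {
        final double reloadAge, minAge, lastmodFactor, maxAge, expireAge, redirectAge;

        RefreshPattern(double reloadAge, double minAge, double lastmodFactor,
                       double maxAge, double expireAge, double redirectAge) {
            this.reloadAge = reloadAge;
            this.minAge = minAge;
            this.lastmodFactor = lastmodFactor;
            this.maxAge = maxAge;
            this.expireAge = expireAge;
            this.redirectAge = redirectAge;
        }
    }

    /** What the sketch knows about a cached page; times are minutes on a common clock. */
    static final class Page {
        final double loadedAt;
        final Double lastModifiedAt; // null when the page has no last-modified date
        final Double expiresAt;      // null when the page has no expiry date
        final boolean redirect;

        Page(double loadedAt, Double lastModifiedAt, Double expiresAt, boolean redirect) {
            this.loadedAt = loadedAt;
            this.lastModifiedAt = lastModifiedAt;
            this.expiresAt = expiresAt;
            this.redirect = redirect;
        }
    }

    /** Returns true when the page should be loaded again, false to serve the cached copy. */
    static boolean needsRefresh(RefreshPattern p, Page page, double now, boolean forcedReload) {
        double age = now - page.loadedAt;
        if (forcedReload) {
            return age > p.reloadAge; // assumption: a forced reload is decided here alone
        }
        if (age > p.maxAge) return true;
        if (page.expiresAt != null && now > page.expiresAt && age > p.expireAge) return true;
        if (page.redirect && age > p.redirectAge) return true;
        if (age < p.minAge) return false;
        if (page.lastModifiedAt == null) return true;
        double unchangedFor = page.loadedAt - page.lastModifiedAt;
        if (unchangedFor <= 0) return true; // assumption: avoid dividing by zero, just load
        double lmf = age / unchangedFor;
        return lmf > p.lastmodFactor;
    }
}
```

For example, with a hypothetical /Lastmod_factor/ of 0.2, a page that had been unchanged for 1000 minutes when it was loaded is served from cache until its age exceeds 200 minutes, provided none of the earlier rules fires first.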
Smart Cache uses real LRU-based garbage collection. It remembers the last access time of every object in the cache. When GC runs, the last access time of every object is converted to an age in days.
The object's age is then modified according to various attributes of the object (for example size or expiration age): size rules and the best matching penalty rule are applied. The priority of penalty rules follows their order in the sample configuration file. The last step is to apply the first matching urlmask rule (if any). The rules can be fine-tuned in great detail in gc.cnf.
If the age is greater than reference_age, or the object is bigger than maximum_object_size or smaller than minimum_object_size, the object is removed from the cache immediately, regardless of cache size.
After all objects in the cache have been scanned, they are sorted by age. When the cache size is above the cache high mark, GC starts cleaning, removing the objects with the highest score first, until the cache size drops between the high and low marks. GC prefers to clean as much as possible so that another cache scan is not needed soon, but it never deletes files below the low mark.
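The two phases described above (unconditional removal, then oldest-first cleaning between the marks) might look roughly like the sketch below. It is not the code from garbage.java. The names are hypothetical, the age passed in is assumed to already include all penalties, and stopping at the first object whose removal would cross the low mark is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class GcSketch {
    static final class CachedObject {
        final String name;
        final long size;      // bytes
        final double ageDays; // age after all penalty rules (assumed precomputed)

        CachedObject(String name, long size, double ageDays) {
            this.name = name;
            this.size = size;
            this.ageDays = ageDays;
        }
    }

    /** Returns the objects this sketch would delete, in deletion order. */
    static List<CachedObject> collect(List<CachedObject> cache, double referenceAgeDays,
                                      long minSize, long maxSize, long lowMark, long highMark) {
        List<CachedObject> removed = new ArrayList<>();
        List<CachedObject> kept = new ArrayList<>();
        long total = 0;
        // Phase 1: too old, too big or too small objects go regardless of cache size.
        for (CachedObject o : cache) {
            if (o.ageDays > referenceAgeDays || o.size > maxSize || o.size < minSize) {
                removed.add(o);
            } else {
                kept.add(o);
                total += o.size;
            }
        }
        // Phase 2: above the high mark, delete oldest first but never go below the low mark.
        if (total > highMark) {
            kept.sort(Comparator.comparingDouble((CachedObject o) -> o.ageDays).reversed());
            for (CachedObject o : kept) {
                if (total - o.size < lowMark) {
                    break; // assumption: stop at the first object that would cross the low mark
                }
                removed.add(o);
                total -= o.size;
            }
        }
        return removed;
    }
}
```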
Some Web sites use so-called "cookies". Cookies are tags sent from the server to the browser which enable the server to keep track of the sites that the user visits, and thus compromise the user's privacy.
At the request of many users, Smart Cache now has a built-in cookie filter. The filter has two working modes, incoming and outgoing, which are switched using allow_all_session_cookies.
In both modes browsers still detect incoming cookies, so a browser set up to warn about cookies will continue to display warnings. Just turn these warnings off. If you use Netscape 3+, you can disable the confirmation messages for cookies sent to your browser by going into Options->Network_Preferences->Protocols and clearing the checkbox for Show an Alert before Accepting a cookie.
The outgoing mode is set by allow_all_session_cookies false. In this mode all cookies sent by your browser are filtered out unless the domain name is allowed in cookies.cnf.
Benefits of this solution:
The outgoing filter (unlike the incoming one) is 100% safe; no cookies get out.
Cookies which are set by HTML or by JavaScript are not a problem.
Incoming cookies can be useful in some cases, because JavaScript-enhanced forms can use a cookie to store user input between sessions.
Storing cookies on your hard drive does not reduce your privacy; it just takes a little space. Privacy is compromised only when cookies are sent back.
The fake_cookie option is available in this mode.
DANGER: The cookie filter itself does no harm, but with the fake_cookie option you can CRASH a remote WWW site by sending back very long cookies (a buffer overflow attack) or cookies with a known name but an unexpected value (for example text instead of numeric input). Some versions of Microsoft Internet Information Server crash entirely (whereas Apache loses just one HTTP child process), so no new users can access the server until IIS is restarted.
The incoming mode is set by allow_all_session_cookies true. In this mode session cookies are allowed from all sites and persistent cookies are allowed only from sites listed in cookies.cnf. If a persistent cookie is not allowed, it is changed to a session cookie. This is enough to keep cookies-only WWW servers happy.
It is not a good idea to switch the filter from outgoing to incoming mode without first deleting all cookies stored in the browser.
Benefits of this solution:
You do not need to configure a list of sites which are cookie-only.
Non-persistent cookies are not a security threat.
Cookies which are not allowed will not be stored in the browser.
Drawbacks:
It is not possible to be 100% safe from persistent cookies.
Cookies which are set by HTML or by JavaScript are a problem. A proxy server cannot detect them without HTML parsing, which is a very slow process.
The fake_cookie option is ignored.
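In HTTP terms, a persistent cookie is a Set-Cookie header that carries an Expires or Max-Age attribute; without them the browser keeps the cookie only for the session. The sketch below shows one way the incoming mode's "change it to a session cookie" step could work. It is not Smart Cache's own code: the names are hypothetical, and the allowed sites are modelled as a plain set of lower-case host names, whereas cookies.cnf may use a different matching scheme.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class IncomingCookieSketch {
    /**
     * Returns the Set-Cookie value to pass on to the browser.
     * Hosts in allowedHosts (hypothetical stand-in for cookies.cnf) keep their cookies unchanged.
     */
    static String filterSetCookie(String setCookieValue, String host, Set<String> allowedHosts) {
        if (allowedHosts.contains(host.toLowerCase(Locale.ROOT))) {
            return setCookieValue;
        }
        String[] parts = setCookieValue.split(";");
        List<String> kept = new ArrayList<>();
        kept.add(parts[0].trim()); // the name=value pair is always kept
        for (int i = 1; i < parts.length; i++) {
            String attribute = parts[i].trim();
            String lower = attribute.toLowerCase(Locale.ROOT);
            // Dropping these two attributes turns a persistent cookie into a session cookie.
            if (lower.startsWith("expires=") || lower.startsWith("max-age=")) {
                continue;
            }
            if (!attribute.isEmpty()) {
                kept.add(attribute);
            }
        }
        return String.join("; ", kept);
    }
}
```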
Smart Cache produces log files in the common or combined format. The log type is selected with log_common.
Logs are selected by wildcard masks, so you can write to multiple logs depending on the requested URL. This was written especially for Web forwarding with Smart Cache, Section 9.7, but logs can be produced even if no forwarding is set.
Usage: access_log mask filename
If you are using SC for offline browsing, you may sometimes find it useful to import files from outside sources (for example, some CDs contain offline copies of some servers).
Place these files in a directory structure that has exactly the same names as the original URLs of the files. For example, in /tmp make the directory /tmp/www.javaworld.com/javaworld/jw-10-1999/ and copy the necessary files into it. You can create any number of directories (even for different servers).
After that, run java scache -import /tmp and the files will be imported.
Smart Cache checks whether a newer version of an imported file is already in the cache. If not, the file is moved (preserving its timestamp) into the cache; if the move fails because the file and the cache directory are on different filesystems (disks), SC copies the file instead. It is wise to place the files on the same filesystem as the SC main data directory and to clear any READ-ONLY attributes.
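To make the directory layout concrete, the sketch below maps a URL to the place where its file would go under the import directory, following the /tmp/www.javaworld.com/javaworld/jw-10-1999/ example. The class name, method name and the file name index.html are hypothetical, and how Smart Cache treats ports or query strings is not covered by the text, so the sketch ignores them.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ImportLayout {
    /** Builds importRoot/host/path-segments for the given URL. */
    static Path importPathFor(Path importRoot, String url) {
        URI uri = URI.create(url);
        if (uri.getHost() == null || uri.getPath() == null) {
            throw new IllegalArgumentException("URL without host or path: " + url);
        }
        Path result = importRoot.resolve(uri.getHost());
        for (String segment : uri.getPath().split("/")) {
            if (!segment.isEmpty()) {
                result = result.resolve(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // index.html is a made-up file name used only for illustration.
        System.out.println(importPathFor(Paths.get("/tmp"),
                "http://www.javaworld.com/javaworld/jw-10-1999/index.html"));
    }
}
```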
Cached data can easily be exported from Smart Cache. The data is exported in a format that can be used by Importing files to Smart Cache, Section 7.6.
Smart Cache can export recently cached data in 3 modes:
-export uses the file's last modified date.
-fullexport uses the date when the file was last checked against the original server.
-lruexport uses the file's last access date.
Command line syntax is -[lru|full]export <Directory> <Timedelta>
The time delta has the format number followed by unit, with no space between them (for example, "1w" is one week). Supported time units are: d/D days, w/W weeks, m minutes, M months, y/Y years, h/H hours, s/S seconds.
When Smart Cache gets a request for an unsupported protocol (for example ftp), it forwards the request to ftp_proxy for the FTP protocol or to http_proxy for everything else. The result obtained from the parent proxy is cached.
A parent proxy login and password may be specified in the "http_proxy" or "ftp_proxy" configuration statements, after the parent proxy port, in the form login:password.
Example: http_proxy my.cache.net 3128 mylogin:mypass
Smart Cache can compress text data while saving it to disk. This saves a significant amount of disk space (about 50%). To enable it, set auto_compress 1 in scache.cnf, which compresses files larger than 512 bytes. Instead of 1 you can also give a file size limit; for example, auto_compress 20000 compresses files bigger than 20k. Using your filesystem block size as the limit is recommended.
This data is sent to your browser in gzip-compressed form, so your browser must know how to decompress it. Existing data can be compressed with the Smart Cache repair utility, Chapter 11.
If you see garbage on the screen, your browser cannot handle compressed data. If you still want to use data compression, set auto_decompress 1. Smart Cache will then decompress data if your browser does not send an accept-encoding: gzip header. Some browsers do not send this header but do accept compressed data, so use auto_decompress only if necessary. If you want to ALWAYS decompress outgoing data, use the value 2.
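The auto_decompress values described above amount to a small decision table. Here is a sketch of that decision; the class and method names are hypothetical, any value other than 1 or 2 is assumed to mean "never decompress", and quality values such as gzip;q=0 are ignored for simplicity.

```java
import java.util.Locale;

final class DecompressDecision {
    /** Decides whether stored gzip data must be decompressed before it is sent to the browser. */
    static boolean mustDecompress(int autoDecompress, String acceptEncodingHeader) {
        if (autoDecompress == 2) {
            return true; // always decompress outgoing data
        }
        if (autoDecompress == 1) {
            return !acceptsGzip(acceptEncodingHeader);
        }
        return false; // assumption: any other value leaves the data compressed
    }

    /** True when the Accept-Encoding header value lists gzip; null means the header is absent. */
    static boolean acceptsGzip(String acceptEncodingHeader) {
        if (acceptEncodingHeader == null) {
            return false;
        }
        for (String token : acceptEncodingHeader.toLowerCase(Locale.ROOT).split(",")) {
            String coding = token.split(";", 2)[0].trim(); // drop parameters such as ";q=0.8"
            if (coding.equals("gzip")) {
                return true;
            }
        }
        return false;
    }
}
```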
Any modern browser supports compressed HTML pages. Browsers reported to support compressed HTML pages:
Netscape 3.0/Linux (requires installed gzip)
Lynx 2.8/Linux (requires installed gzip)
Netscape 4.61/OS2
Opera 5
StarOffice 5.1 built-in browser
w3m (nice text mode browser with table rendering support!)
Browsers which DO NOT SUPPORT compressed pages:
Netscape 2.02/OS2
Mozilla 5.0 beta/OS2
I have no information about browsers on other operating systems (Windows, Mac). If you have any, please send it to me.
Smart Cache can perform request rewriting or redirecting. The difference is that with redirecting the browser sees the new destination.
If you are using web forwarding, you must use the rewrite feature; if you are using fast redirects, you must use the redirect feature. In all other cases you may use either.
Syntax of configuration file: <Source mask> <Target url> <star number>
http://opensource.org/* http://www.opensource.org/ 1
*/redir.php?link=* * 2
The destination URL is the target URL with the string matched by the n-th star of the source mask appended. If n is 0, nothing is appended. If you need a zero-length target URL, use * instead.
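One way to read the rule above is sketched below: each star in the source mask becomes a capture group, and the text captured by the chosen star is appended to the target. This is an illustration rather than Smart Cache's implementation. The names are hypothetical, and greedy matching of the stars is an assumption the manual does not confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RewriteRuleSketch {
    /**
     * Applies one rule "<Source mask> <Target url> <star number>" to a URL.
     * Returns the destination URL, or null when the mask does not match.
     */
    static String apply(String sourceMask, String target, int star, String url) {
        String[] literals = sourceMask.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("(.*)"); // one capture group per star
            }
            regex.append(Pattern.quote(literals[i]));
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(url);
        if (star < 0 || star > m.groupCount()) {
            throw new IllegalArgumentException("mask has no star number " + star);
        }
        if (!m.matches()) {
            return null;
        }
        String prefix = target.equals("*") ? "" : target; // "*" stands for a zero-length target
        return star == 0 ? prefix : prefix + m.group(star);
    }

    public static void main(String[] args) {
        System.out.println(apply("http://opensource.org/*", "http://www.opensource.org/", 1,
                "http://opensource.org/docs/index.html"));
        // a.example and b.example are made-up hosts for the second sample rule.
        System.out.println(apply("*/redir.php?link=*", "*", 2,
                "http://a.example/redir.php?link=http://b.example/page"));
    }
}
```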
In its configuration files Smart Cache represents times with a data type called 'timestring'. The syntax is <Floating point number><type>, for example 1h, 40m, 2d. A timestring defaults to minutes if no type is given.
Supported types:
s or S - seconds
m - minutes
d or D - day
w or W - week
f or F - fortnight
M - months
y or Y - years
h or H - hours
You can also use the full unit name, e.g. "day". See the function timestring() in garbage.java for details.
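For readers who want to check a value by hand, here is a small timestring parser built from the table above. It is not the timestring() function from garbage.java. The names are hypothetical, a month is assumed to be 30 days and a year 365 days, and only the listed full unit names (singular or plural) are recognised.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class TimestringSketch {
    private static final Map<String, Double> MINUTES_PER_UNIT = new HashMap<>();

    static {
        register(1.0 / 60.0, "s", "S", "second", "seconds");
        register(1.0, "m", "minute", "minutes");
        register(60.0, "h", "H", "hour", "hours");
        register(1440.0, "d", "D", "day", "days");
        register(7 * 1440.0, "w", "W", "week", "weeks");
        register(14 * 1440.0, "f", "F", "fortnight", "fortnights");
        register(30 * 1440.0, "M", "month", "months");   // assumption: 30-day month
        register(365 * 1440.0, "y", "Y", "year", "years"); // assumption: 365-day year
    }

    private static void register(double minutes, String... names) {
        for (String name : names) {
            MINUTES_PER_UNIT.put(name, minutes);
        }
    }

    /** Converts a timestring such as "1h", "40m" or "2d" to minutes. */
    static double toMinutes(String timestring) {
        String s = timestring.trim();
        int i = 0;
        while (i < s.length() && ((s.charAt(i) >= '0' && s.charAt(i) <= '9') || s.charAt(i) == '.')) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("no number in: " + timestring);
        }
        double value = Double.parseDouble(s.substring(0, i));
        String unit = s.substring(i).trim();
        if (unit.isEmpty()) {
            return value; // no type given: minutes
        }
        // Single letters are case-sensitive (m = minutes, M = months); full names are not.
        String key = unit.length() == 1 ? unit : unit.toLowerCase(Locale.ROOT);
        Double factor = MINUTES_PER_UNIT.get(key);
        if (factor == null) {
            throw new IllegalArgumentException("unknown unit: " + unit);
        }
        return value * factor;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"1h", "40m", "2d", "15", "1.5day"}) {
            System.out.println(t + " = " + toMinutes(t) + " minutes");
        }
    }
}
```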
In its configuration files Smart Cache represents sizes with a data type called 'sizestring'. The syntax is <Floating point number><type>, for example 1MB, 40KB, 2GB. A sizestring defaults to bytes if no type is given.
Supported types:
B - bytes
K - 1024 Bytes
M - 1024 KBytes
G - 1024 MBytes
You can also use the full unit name, e.g. "kilobytes". See the function sizestring() in garbage.java for details.
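A sizestring can be parsed in much the same way. The sketch below is not the sizestring() function from garbage.java. The names are hypothetical, and the unit is assumed to be identified by its first letter regardless of case, which covers forms such as 1MB, 40KB and "kilobytes".

```java
final class SizestringSketch {
    /** Converts a sizestring such as "1MB", "40KB" or "2GB" to bytes. */
    static long toBytes(String sizestring) {
        String s = sizestring.trim();
        int i = 0;
        while (i < s.length() && ((s.charAt(i) >= '0' && s.charAt(i) <= '9') || s.charAt(i) == '.')) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("no number in: " + sizestring);
        }
        double value = Double.parseDouble(s.substring(0, i));
        String unit = s.substring(i).trim();
        if (unit.isEmpty()) {
            return Math.round(value); // no type given: bytes
        }
        long factor;
        // Assumption: only the first letter of the unit matters, in either case.
        switch (Character.toUpperCase(unit.charAt(0))) {
            case 'B': factor = 1L; break;
            case 'K': factor = 1024L; break;
            case 'M': factor = 1024L * 1024L; break;
            case 'G': factor = 1024L * 1024L * 1024L; break;
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
        return Math.round(value * factor);
    }

    public static void main(String[] args) {
        for (String t : new String[] {"1MB", "40KB", "2GB", "512", "1.5kilobytes"}) {
            System.out.println(t + " = " + toBytes(t) + " bytes");
        }
    }
}
```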
Smart Cache Manual
0.94
hsn_nospam_at.sendmail.dot.cz