Semidynamic sites on static servers

Intention

You can't use dynamic creation of HTML pages (SSI, JSP, PHP, Perl, ...) because your hosting contract doesn't include such features?

But you want to

  • generate your site from a database
  • be flexible
  • not pay for a CMS system
  • prepare for hosting with dynamic features
  • generate your site dynamically, but serve static pages for better performance and lower resource use
  • ...

Here I explain an approach for sites that are not too big, or whose content doesn't change too often. I use it for ~200 pages; more are certainly feasible.

You only need FTP access to your hoster's server.

A Solution

Install a local webserver; there you can do whatever you want.
Mirror the whole (local) site with a web-mirror tool.
Find the new or changed pages and FTP them to your hoster.

How to

Synchronizing is a two-step process:

  1. Generating (e.g. with wget)
  2. Uploading. I just found sitecopy, which does its job really well.
    My older method needed two steps: 2a) 'Pick the changed pages' and 2b) 'Upload by FTP'.

What you need

Standard software, folder structure & some 'scripts'.

Webserver
A local webserver: Apache, Tomcat, ...

Folder structure
Some folders on your computer:

  • One for your raw data / CGI scripts / PHP / Perl / JSP ... Let's call it 'data'.
  • A second one, where the web-mirror tool puts the generated files; we name it 'localhost'.
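For example (only a sketch; the parent folder is up to you, and wget will actually (re)create 'localhost' itself on every run):

mkdir -p ~/mySite/data         # raw data, CGI scripts, templates, ...
mkdir -p ~/mySite/localhost    # generated static pages from the mirror tool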

1. Generating with a web-mirror tool

I use wget (1.8.2), but you can use your favourite web-mirror tool as well. Here I describe how to use wget. If you are running Windows you may try Cygwin, which I use; it's great. Now you need a 'script' to fetch your whole site from the local server, generating the pages on the fly. It is only one line:

wget -r --level=0 -k --proxy=off -a logfile http://localhost/
   Parameters
   '-r --level=0'    gets the whole site recursively
   '-k'              converts non-relative links to relative ones
   '--proxy=off'     I normally use a proxy; this switches it off
   '-a <filename>'   appends to a logfile; search it for 404 to find internal dead links

Wget creates a folder named 'localhost'; if your server is not running on port 80 it will use 'localhost_portNr' instead.

If you remove pages between runs of the site generation, be sure to empty the folder 'localhost' before a new generation, so that wget (or whatever you use) always produces fresh content. Later I found that it is necessary to delete the folder 'localhost' every time, or wget doesn't convert all the links correctly; search for 'http://localhost/' in the generated site to check. I will review this later.
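Putting the delete and the wget call together, the generation step can be wrapped in a tiny script (just a sketch, assuming the folder names used above):

#!/bin/bash
# Regenerate the static copy of the site from the local webserver.
rm -rf localhost               # always start from scratch, so wget converts all links
rm -f logfile
wget -r --level=0 -k --proxy=off -a logfile http://localhost/
grep -n 404 logfile            # quick check for internal dead links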

Now your whole site lies freshly generated in the folder 'localhost', ready for upload. But all timestamps are from 'now', so you have to pick only the changed files, to minimize traffic and to let the caches out there on the internet do their job.

2. Upload

Since I read about sitecopy, I use this nice tool, something like the opposite of wget. You get sitecopy here and, if you are a Windows user, need Cygwin.
In the mode we use, this tool calculates a checksum for every file uploaded to the server. When you refresh your site, it recalculates the checksums of your local files and uploads only the changed ones.

Here is my configuration '.sitecopyrc'. The line 'state checksum' is very important; without it the whole site is uploaded every time, not just the changed files.

site mySite
server myServer
protocol ftp
username me
password dontKnow
local /cygdrive/r/mySite/homepage/mySite_8001
remote /
nodelete
state checksum
permissions ignore
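Once the configuration is in place, a session looks roughly like this (a sketch; check the sitecopy manual of your version for the exact options):

sitecopy --init mySite       # run once: tell sitecopy the remote site starts out empty
sitecopy --update mySite     # afterwards: upload only the new/changed files

If the files are already on the server, 'sitecopy --catchup mySite' marks everything as up to date instead of re-uploading it.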

Old synchronisation, handmade

This is the above-mentioned synchronisation I used before I knew about sitecopy. You don't need it; it is just documented in case... If you want to use this, create a third folder. This one will hold the files in exactly the same state (binary contents, timestamp) as they lie at your hoster's place. Let's call it 'destination'.

2.a. Pick the changed pages

For this I collect all files from the folder 'localhost' which either

  • don't exist in 'destination', or
  • differ from the copy there by at least one byte

and copy them into the 'destination' folder.

For this job I use a really poorly performing script (it starts lots of sub-processes; it needs a rewrite in Perl or Java). For now it is a bash script (download; Windows users want to visit Cygwin). You need bash, find and cmp, so install: base, bash and fileutils. A rough sketch of what the script does is shown below.
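This is not the original script, just a minimal sketch of the idea, assuming 'localhost' and 'destination' lie side by side:

#!/bin/bash
# Copy every file that is new or differs into 'destination', keeping timestamps.
cd localhost || exit 1
find . -type f | while read -r f; do
    if [ ! -f "../destination/$f" ] || ! cmp -s "$f" "../destination/$f"; then
        mkdir -p "../destination/$(dirname "$f")"
        cp -p "$f" "../destination/$f"
    fi
done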

2.b. Upload by ftp

I use WS_FTP (update only newer files) and upload the whole folder 'destination'.

Other FTP clients should also be fine. I tried ncftp, which should work nicely from the command line, but I didn't get it to work with my provider. It works like a charm with a local FTP server, so the malfunction may be caused by my router/firewall. Give this a try:

ncftpput -u <user> -p <pass> -R -z -y -d log.txt <server> / ./destination/*

Get ncftp from Cygwin or find a native port.

© July 2003 Peter Büttner