Ethernet TCP/IP Source Code Driver Project
Section 06. Generating HTTP Web Content

a) Creating your web pages

Standard HTML web sites are really very simple. You design one or more HTML pages, which are simply ASCII text files with the extension .htm or .html (you can open them in Windows Notepad). Each HTML file is divided into sections that contain instructions to the browser receiving the file. The <BODY> section holds the text that is displayed on the web page, surrounded by special tags that tell the browser how to format it, or by style references that tell the browser to apply styles specified elsewhere. Images are included in a web page using special tags which specify the URL of the image file. As a browser encounters these it sends a separate request to the HTTP server to download each image file and then adds the image to the displayed page. Other files required by the web page (for instance style sheets) are also requested separately by the browser as it encounters references to them.

If a web page has special functionality it will be accomplished using either ‘client side’ or ‘server side’ functions. As ‘client side’ functionality is provided by the user’s computer (for instance Java or Flash), the code to use this functionality can be included in your HTML pages if desired. However, ‘server side’ functionality is provided by a server (for instance PHP) and you therefore can’t include this in your HTML pages, as this driver provides an HTTP server, not an HTTP server with PHP, CGI, etc. plug-ins.

The HTML specification is available from the World Wide Web Consortium (W3C) at www.w3.org. The specifics of HTML creation are outside the scope of this manual and there are many resources available to teach you this if required. Although HTML can be created using a text editor it can be a painful task, and there are many GUI applications available to design web pages. Applications such as Adobe Dreamweaver provide very powerful HTML tools, but with a steeper learning curve. Many simpler applications are also available, although if you are a new user it is worth hunting around for general comments about an application you are considering, as some create quite sloppy or bloated (large) HTML code. This driver doesn’t care which program you use to create your web pages. The only requirements are that you cannot use sub directories more than 1 level deep (only files from the main directory and any sub directories directly off that directory can be used) and that the only ‘server side’ processing you may use is the dynamic data function provided by the driver.

b) Including Dynamic Content In Your Web Pages

To allow you to display text or items in a web page that are selected as the page is served to a browser, the driver includes a powerful dynamic content feature. As the driver reads each .htm HTML file from memory and outputs it into a TCP packet, it checks every byte to see if it is the tilde ‘~’ character. When a tilde character is found it is not output as part of the HTTP page; instead the driver continues reading and storing subsequent characters as a variable name until it finds the hyphen ‘-’ character. As soon as the hyphen character is found the driver calls the function defined by HTTP_DYNAMIC_DATA_FUNCTION in the eth-http.h header file.

The function is called by the driver with the variable name found, together with the TCP socket ID of the client computer the page is being sent to (in case it’s of use, for example to identify a user by their unique MAC or IP address). Your function can then compare the variable name against a list of possible variable names you included in your web pages and return a string containing whatever you wish (maximum 100 characters), which the driver will output before continuing to send the rest of the page.

There are no restrictions or requirements on this. Anywhere you place the following:

~my_variable_name-

in your .htm files will cause your function to be called each time, with the driver replacing the .htm file content from the tilde character to the hyphen character (inclusive) with the string you return from your function (excluding the 0x00 null termination). This means you can include anything from dynamically generated text to dynamically selected images or links.
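The substitution described above can be sketched in a few lines of C. This is only an illustration of the technique, not the driver's own code: the names expand_page, demo_lookup and MAX_NAME_LEN are assumptions made for the example, and the real driver streams bytes into TCP packets rather than into a buffer.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME_LEN 32  /* assumed limit on the variable name length */

/* Hypothetical lookup type: returns the replacement text for a variable name */
typedef const char *(*dynamic_fn)(const char *name);

/* Hypothetical lookup used for demonstration only */
const char *demo_lookup(const char *name)
{
    if (strcmp(name, "status") == 0)
        return "OK";
    return "";  /* unknown names produce no output */
}

/*
 * Copy src to dst, replacing every ~name- sequence with the string
 * returned by lookup(name). Output is truncated to fit dst_size.
 * Returns the number of characters written (excluding the null).
 */
size_t expand_page(const char *src, char *dst, size_t dst_size, dynamic_fn lookup)
{
    size_t out = 0;

    if (dst_size == 0)
        return 0;

    while (*src != '\0') {
        if (*src == '~') {
            char name[MAX_NAME_LEN + 1];
            size_t n = 0;

            src++;  /* the tilde itself is never output */
            while (*src != '\0' && *src != '-') {
                if (n < MAX_NAME_LEN)
                    name[n++] = *src;
                src++;
            }
            name[n] = '\0';
            if (*src == '-')
                src++;  /* the closing hyphen is not output either */

            /* Append the replacement text without its null terminator */
            for (const char *r = lookup(name); *r != '\0' && out + 1 < dst_size; r++)
                dst[out++] = *r;
        } else {
            if (out + 1 < dst_size)
                dst[out++] = *src;
            src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Calling expand_page on the text <p>Status: ~status-</p> with demo_lookup yields <p>Status: OK</p>. Note that because the hyphen ends the name, a variable name cannot itself contain a hyphen; underscores are fine.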

The tilde character was selected as it’s a character that is very rarely used in web pages. The use of HTML comments was considered (<!-- -->) but these can cause strange preview results in many web page creation programs when used inside certain special tags. Should you need to include the tilde character in your web page to be displayed as a tilde character then you can simply use the HTML character code for it instead in the HTML source:

&#126; in HTML source will display the tilde character ~
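As a rough guide to the application side of this feature, a dynamic data function might look like the sketch below. The function name http_dynamic_data, its parameter types, and the variable names ‘temperature’ and ‘device_name’ are all assumptions for illustration; the actual prototype is whatever eth-http.h expects for HTTP_DYNAMIC_DATA_FUNCTION.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DYNAMIC_LEN 100  /* maximum returned characters, as stated in this section */

/* Hypothetical application values to be shown on the web page */
static int temperature_c = 23;
static const char *device_name = "Boiler controller";

/*
 * Hypothetical dynamic data handler. Compares the variable name found
 * in the page against the names this application knows about and
 * returns the text to output. The buffer is static so that it is still
 * valid after the function returns.
 */
const char *http_dynamic_data(const char *variable_name, unsigned char tcp_socket_id)
{
    static char output[MAX_DYNAMIC_LEN + 1];

    (void)tcp_socket_id;  /* not needed in this sketch */

    if (strcmp(variable_name, "temperature") == 0)
        snprintf(output, sizeof(output), "%d", temperature_c);
    else if (strcmp(variable_name, "device_name") == 0)
        snprintf(output, sizeof(output), "%s", device_name);
    else
        output[0] = '\0';  /* unknown name: output nothing */

    return output;
}
```

With this in place, a page containing ~temperature- would show the current value each time it is served. Using snprintf with the buffer size keeps the returned string within the 100 character limit.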

c) Sending Data To Your Embedded HTTP Server

Being able to provide web pages that include dynamically generated content is very useful but it doesn’t allow a user to send data to your embedded device using the web interface. You may want to provide the means for users to turn on and off simple options, adjust numeric or textual parameters or even upload files. This driver provides the common HTTP server form input methods, optimised for embedded use. Typically these can be implemented on your web page by inserting a standard form and the individual form components using your web design application.

GET Requests

The simplest of all input methods is to use the standard GET request. This is exactly the same request that is used by a browser to retrieve web pages and resources such as images, but with input data being sent back by the browser tagged onto the end of the requested filename. You’ve probably noticed this input method before when using all sorts of internet sites. All that has to be done is to follow the filename the browser is GET’ing with a ‘?’ character and then one or more input names and values. For example the following is a GET request with inputs:

/index.htm?variable1=25&variable2=1280&string1=Hello+World

The filename being requested is immediately followed by the ‘?’ character, which signifies that inputs follow. Each input is then added to the end formatted as follows:

input_name=input_value

with a ‘&’ character between each input. As spaces are not permitted, if an input value contains spaces they are converted to the ‘+’ character by the browser before being added, and converted back to spaces by the HTTP server as they are received. If a special character needs to be sent (for instance a ‘?’, ‘=’, ‘&’, ‘+’, ‘%’) it may be sent by using the ‘%’ character followed by the 2 character hexadecimal code which represents the character. As this special 3 character sequence is received it is decoded into the required character by the driver.

The first space character found after the start of the filename marks the end of the inputs.
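The decoding rules above can be illustrated with a short C sketch. It is not the driver's implementation: the names decode_token, parse_get_inputs and print_input, and the buffer sizes, are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback type: called once per decoded input */
typedef void (*input_fn)(const char *name, const char *value);

/* Example callback that simply prints each input */
void print_input(const char *name, const char *value)
{
    printf("%s = \"%s\"\n", name, value);
}

/* Convert one hexadecimal digit to its value, or -1 if it is not a hex digit */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode characters from src into dst until a space, '&', the given
 * stop character or the end of the string. '+' becomes a space and
 * %XX becomes the character with hexadecimal code XX.
 * Returns a pointer to the character that ended the token.
 */
static const char *decode_token(const char *src, char stop, char *dst, size_t dst_size)
{
    size_t n = 0;

    while (*src != '\0' && *src != ' ' && *src != '&' && *src != stop) {
        char c = *src++;

        if (c == '+') {
            c = ' ';
        } else if (c == '%' && hex_value(src[0]) >= 0 && hex_value(src[1]) >= 0) {
            c = (char)(hex_value(src[0]) * 16 + hex_value(src[1]));
            src += 2;
        }
        if (n + 1 < dst_size)
            dst[n++] = c;
    }
    dst[n] = '\0';
    return src;
}

/* Find the '?' in the request and report every name=value pair after it.
   Returns the number of inputs found. */
int parse_get_inputs(const char *request, input_fn on_input)
{
    char name[32], value[64];  /* assumed maximum lengths */
    int count = 0;
    const char *p = request;

    while (*p != '\0' && *p != ' ' && *p != '?')
        p++;
    if (*p != '?')
        return 0;  /* no inputs follow the filename */
    p++;

    while (*p != '\0' && *p != ' ') {  /* the first space ends the inputs */
        p = decode_token(p, '=', name, sizeof(name));
        if (*p == '=')
            p = decode_token(p + 1, '&', value, sizeof(value));
        else
            value[0] = '\0';
        on_input(name, value);
        count++;
        if (*p == '&')
            p++;
    }
    return count;
}
```

Passing the example request above (followed by a space and the HTTP version, as a browser would send it) to parse_get_inputs with print_input reports three inputs, with string1 decoded to Hello World.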

When your browser sends this type of GET request, which is typically generated when a form is submitted but can also simply be the URL of a hyperlink, the data being submitted is displayed in the browser’s address bar exactly as it is sent to the HTTP server. The typical way to cause a browser to include inputs with a GET request is to include a form (using GET) on your HTML page with one or more inputs. When the form submit button is pressed the GET request is sent together with all of the form controls as input values.

This driver provides a very simple and memory efficient method of passing received input values to your main application. You define a function in the eth-http.h file to be called when an input value is received and the HTTP server will automatically call it every time it encounters an input value. The call includes the input name string that was sent by the browser (which will typically be the name you gave the form object when designing the HTML form), the input value string (fully decoded to match what the user entered or selected), plus the filename being requested and the TCP socket ID (in case these are required by your application to uniquely identify input values against particular files or clients). How you process the input data is entirely up to you, with the HTTP server simply acting as a relay of the values to your application’s function. As a pointer to the requested filename is passed to your function, you can also change the filename that will be returned to the user once the inputs have been processed, by modifying it. This can be useful if you are, say, checking for a valid password entry to re-direct the user from a default ‘bad password’ page to a ‘you have logged in’ page.

Any input data is provided to your application’s function before the filename the browser requested is returned by the HTTP server, allowing a new web page to include dynamic content retrieved from your application based on the data just submitted (useful when you want the next web page to confirm the values just entered).
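The password re-direct idea might be sketched as follows. The function name process_http_input, its parameter types, the filenames, the input name ‘password’ and the size of the filename buffer are all assumptions for illustration; check eth-http.h for the prototype the driver actually expects.

```c
#include <stdio.h>
#include <string.h>

#define FILENAME_BUF_SIZE 32  /* assumed size of the driver's filename buffer */

/*
 * Hypothetical input handler, called once for each input value received.
 * The form's action names the 'bad password' page, so that page is
 * returned unless this function changes the filename.
 */
void process_http_input(const char *input_name, const char *input_value,
                        char *requested_filename, unsigned char tcp_socket_id)
{
    (void)tcp_socket_id;  /* not needed in this sketch */

    /* The hard coded password is for demonstration only */
    if (strcmp(input_name, "password") == 0 &&
        strcmp(input_value, "letmein") == 0) {
        /* Correct password: return the 'logged in' page instead */
        snprintf(requested_filename, FILENAME_BUF_SIZE, "%s", "welcome.htm");
    }
}
```

Because the handler runs before the page is returned, the page that is finally served can also use dynamic content to confirm whatever was just submitted.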

The GET input method is an excellent way of passing small amounts of input data whilst at the same time requesting a new HTML page. Its only significant drawbacks are that the input data is visible in the browser’s address bar (and may be cached by some applications or hardware, which could be a concern for sensitive information) and that it only supports ASCII text.

The included sample project web page ‘Setup POP3’ section is a working example of a GET form.

POST Requests – application/x-www-form-urlencoded

The basic POST request with encoding type “application/x-www-form-urlencoded” is very similar to the GET request method of providing input values, except that the input values are contained within the message body of the TCP transmission and not after the filename. This means they can only be viewed by a network analyser and they can also be longer if required. When posting an html form the input values are formatted and passed in exactly the same way as the GET method, so the message data area of a POST request might contain the following:

variable1=25&variable2=1280&string1=Hello+World

The driver passes POST request input values to your application by calling the same function as is used for GET input values (defined in eth-http.h). This will be called in exactly the same way as for a GET request, with a separate call for each input value and with the call including the input name string, the input value string, the filename being requested and the TCP socket ID. How you process the input data is entirely up to you, with the HTTP server simply acting as a relay of the values to your application’s function. As a pointer to the requested filename is passed to your function, you can also change the filename that will be returned to the user once the inputs have been processed, by modifying it. This can be useful if you are, say, checking for a valid password entry to re-direct the user from a default ‘bad password’ page to a ‘you have logged in’ page.

The POST input method is an excellent way to pass input values that you don’t want a user to see or larger input values.

The included sample project web page ‘Setup SMTP’ section is a working example of a POST “application/x-www-form-urlencoded” form.

Driver Limitations

To avoid the need for large RAM buffers the driver supports the POST application/x-www-form-urlencoded method using a single packet for the input data. The input data may be located in the message body of the first packet, after the HTTP headers, or in a second packet that immediately follows the HTTP headers (browsers may do either). This restriction avoids the need for the driver to deal with inputs that span packet boundaries, which can happen in TCP transfers with a large block of data. In normal use this will not be an issue: as long as the total quantity of input names and data bytes does not exceed approximately 1000 bytes, it will fit within a single TCP packet even if the browser includes it with the HTTP headers. If this is an issue, use the POST multipart/form-data method below.

POST Requests – multipart/form-data

The content type “application/x-www-form-urlencoded” is inefficient for sending large quantities of binary data or text containing non-ASCII characters. It is not possible to label the enclosed data with a content type, apply a charset, or use other encoding mechanisms. The content type “multipart/form-data” should be used for submitting forms that contain non-ASCII data and is also the standard method used to allow files to be uploaded from a browser.

The driver provides simple but effective handling of this type of POST request, and decodes the multipart data before passing it to the user application, so that the application gets the data exactly as submitted by a user. Instead of providing the data as complete strings, the driver passes it byte by byte, allowing binary data and data spanning many TCP packets to be handled.
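To show why byte by byte delivery suits an embedded application, here is a sketch of a receiver that never needs to buffer the upload. The structure and function names (upload_state_t, upload_begin, upload_byte, upload_end) are hypothetical and do not reflect the driver's actual interface; they only illustrate processing data one byte at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical running state for one uploaded file */
typedef struct {
    size_t   byte_count;  /* number of bytes received so far */
    uint16_t checksum;    /* simple 16 bit additive checksum */
    int      complete;    /* set once the last byte has arrived */
} upload_state_t;

/* Called when the first byte of a new upload is about to arrive */
void upload_begin(upload_state_t *s)
{
    s->byte_count = 0;
    s->checksum = 0;
    s->complete = 0;
}

/* Called for every decoded data byte. Binary values, including 0x00,
   are handled because nothing here treats the data as a string. */
void upload_byte(upload_state_t *s, uint8_t data)
{
    s->byte_count++;
    s->checksum = (uint16_t)(s->checksum + data);
    /* A real application might write the byte to flash memory here */
}

/* Called after the final byte of the upload */
void upload_end(upload_state_t *s)
{
    s->complete = 1;
}
```

The same pattern works however many TCP packets the upload spans, as the application only ever holds a few bytes of state.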

The included sample project web page ‘Upload File’ section is a working example of a POST “multipart/form-data” form.

Limitations

The driver supports uploading single files per input using a POST request. Multiple selected files in 1 form input (“multipart/mixed”) are not supported (and are not generally required).

To avoid the need for large RAM buffers the driver supports only a single HTTP client actively using the POST method at a time. As a POST request will usually take only one or a very few TCP packets to upload its input data, this is rarely an issue: two HTTP clients would have to POST at exactly the same time for one of them to be rejected by the driver. However, uploading a large file will block all other HTTP clients from using the POST method until the upload is complete. Should a client be blocked from using POST it will receive a ‘503 Service Unavailable’ response, which is a general server busy message advising the browser to try again after a delay.
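The single POST client rule can be pictured as a simple ownership flag. The sketch below is an illustration of the idea only, not the driver's code: the names post_begin and post_end are hypothetical, and it assumes socket IDs are small numbers so that 0xFF can mean ‘no client’.

```c
#define NO_POST_CLIENT 0xFF  /* assumed value that is never a real socket ID */

/* Socket ID of the client currently using POST, if any */
static unsigned char post_owner = NO_POST_CLIENT;

/*
 * Called when a client starts (or continues) a POST.
 * Returns the HTTP status to use: 200 if this client may proceed,
 * 503 if another client is already part way through a POST.
 */
int post_begin(unsigned char socket_id)
{
    if (post_owner != NO_POST_CLIENT && post_owner != socket_id)
        return 503;  /* Service Unavailable: try again after a delay */
    post_owner = socket_id;
    return 200;
}

/* Called when a client's POST has been fully received */
void post_end(unsigned char socket_id)
{
    if (post_owner == socket_id)
        post_owner = NO_POST_CLIENT;
}
```

A second client is only refused during the window between another client's post_begin and post_end, which is why short form submissions rarely collide but a long file upload can hold other clients off.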

PUT Requests

PUT requests provide the simplest method of uploading files to an HTTP server. Although part of the HTTP specification, PUT is not generally supported by the main browsers. To utilise a PUT request you have to use script in your web page. However, due to constant security concerns and vulnerabilities, script that is capable of accessing client side files is often blocked by browsers and operating systems. Therefore, whilst more complex to handle, POST requests are implemented by this driver to handle file upload instead of PUT requests. Due to the widespread adoption of POST as the method to use for file uploads you can be confident that it will just work with the various mainstream browsers and operating systems available.

HTML Forms For Input Data

The typical method of sending input data with a GET or POST request is by using one or more forms on your web page. A form always has three elements: <form></form> tags that define the start and end of the form, one or more controls that allow the user to provide data to the server and a submit button. An example of the HTML code for a form:

<form action="index.htm" method="get">
<input type="text" name="max_level" maxlength="3" value="50">
<input type="submit" value="Save">
</form>

action

Names the URL where the browser will submit the form data to when the submit button is clicked. This is simply the filename that will be requested when the form data is sent. Once all of the form input data has been sent to your function this is the file that will be returned by the HTTP driver. The driver will pass the filename to your function that processes each of the input values in case it is helpful. If you wish you may change the filename at any time while processing the input data in your function allowing, for instance, a log in success page to be returned if a user enters the correct log in password.

method

Specifies whether the browser will use an HTTP GET or POST request to send form data to the server (see above for details of each type).

enctype

Specified for POST forms to select either the “application/x-www-form-urlencoded” or “multipart/form-data” method.

input

Start of a user input control

type

The type of user input control. Text displays a single line text box. Also available are check boxes, radio buttons, passwords, multiline text boxes, hidden fields, etc.

name

Unique within the form to identify the control.

maxlength

Maximum number of characters the user may enter

value

The default value to display (for a user adjustable embedded parameter this would typically be dynamically retrieved from the embedded application as the web page is transmitted so the user is shown the current value).

Tips for creating forms

Where your site will have several forms it’s often a good idea to give form elements unique names, so that when processing an input value in your application’s function you can check just the value name without also having to check the filename to know what the input value is for.

Keeping input value names short will reduce your program memory usage and reduce the time it takes to compare received value names with all of the possible names.

Where you have several input values that are related consider using a common name followed by an index number when naming them. For instance when providing an option to set a MAC address you might have 6 input boxes which you could call:

mac1, mac2, mac3, mac4, mac5 & mac6

Your function which handles the input values could then simply check to see if the input name contains ‘mac’ and, if it does, check which indexed value of the MAC address (1 – 6) it is.
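A minimal sketch of this indexed name check is shown below. The function name handle_mac_input and the decision to treat each value as a two digit hexadecimal byte are assumptions for the example; the sketch also checks for ‘mac’ at the start of the name rather than anywhere within it.

```c
#include <stdlib.h>
#include <string.h>

/* The 6 bytes of the MAC address being set */
static unsigned char mac_address[6];

/*
 * Hypothetical helper for the input handler. Returns 1 if the input
 * was one of mac1..mac6 with a valid value and was stored, 0 otherwise.
 */
int handle_mac_input(const char *input_name, const char *input_value)
{
    char index_char;
    char *end;
    unsigned long byte_value;

    /* One comparison covers all six inputs */
    if (strncmp(input_name, "mac", 3) != 0)
        return 0;

    /* The name must be exactly "mac" followed by one digit 1 to 6 */
    index_char = input_name[3];
    if (index_char < '1' || index_char > '6' || input_name[4] != '\0')
        return 0;

    /* Assumed format: each box holds one byte written in hexadecimal */
    byte_value = strtoul(input_value, &end, 16);
    if (end == input_value || *end != '\0' || byte_value > 0xFF)
        return 0;

    mac_address[index_char - '1'] = (unsigned char)byte_value;
    return 1;
}
```

The benefit is that one short string comparison replaces six separate ones, and the index digit selects the array element directly.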

d) Useful HTTP Design Notes

Refreshing a page automatically

Once loaded an HTML page will not be re-loaded unless you click the browser’s refresh button. You can cause a page to automatically re-load periodically using the following HTML code in a web page’s <head></head> section:

<meta http-equiv="Refresh" content="30">

This will cause the page to be re-loaded every 30 seconds in most browsers.

Limited Processing Power

If your embedded microcontroller or processor is not particularly fast then you can improve web server response times by limiting the number of individual files to be served, limiting the length of filenames and any sub directories, and keeping file sizes small where possible.

Included resources

Remember that a single image file can be included in many different pages. Where memory space is tight (and to improve page loading speed) try to use generic image files that may be included in different places and which the browser will only need to request once.

Resources that are included in a web page, such as images, don’t have to be stored on the same device as the HTML web page. If you are designing with an embedded device that has limited memory there is nothing to stop you including resources such as images that are located on a different web server. When a client requests the web page it will request any included resources from whatever address is specified in the HTML code. Remember of course that the client must be able to connect to the other web server to be able to get the resources.

File Extensions

The driver and web pages converter application process all file extensions as 3 characters, but will correctly convert file extensions of .html to .htm.

Caching

A client browser will often cache retrieved resources to avoid re-loading them unnecessarily. If this causes your application problems there are approaches to avoid it which you can search for online. Also bear in mind that if your browser is not displaying an updated image file, for instance, it may be because it’s not actually requesting it. In this instance waiting a period of time or changing the filename will force it to get the newly updated file. If you are testing across the internet, remember that internet service providers (ISPs) also cache to reduce bandwidth and demands on their networks. You may know that you’ve changed the content of a file such as an image but be baffled as to why the browser isn’t showing it, even though in this case you may see the file actually being downloaded in Wireshark! Waiting or changing the filename will again solve this.