CGI text filtering

  • Text processing tools are in the news. Censorship is being automated.
  • These two examples of computer-assisted "meaning management" are both, in my view, rather hostile to meaning. (In the case of the Microsoft program, I think the article was rather sympathetic, downplaying the tendency of such software to reduce legitimate prose style to a 5th-grade level.) With them in mind, we might have fun altering form input ourselves, essentially writing our own text manipulators. The first example of this type of thing is written as a standard CGI script, which strips naughty words out of all submitted text and sends it back purified:

    The HTML for the form looks like this:

    <form method=post action="cgi-bin/pure.cgi">
    <textarea name="foo" rows=10 cols=50></textarea>
    <input type="submit" value="clean it">
    </form>

    and the script looks like this


  • So that's one way of filtering content. It is pretty unintelligent, but the authors of CYBERsitter and the NYTimes would have you believe that this is nearly "artificial" intelligence, machine smarts.

    Java as a text manipulator

  • As you've seen, the C-shell is a bit awkward for doing these manipulations, because the right character in the wrong context can do terrible things. It's just confusing. Java, though not normally used for CGI, has some neat text-manipulating features, which will serve as a good introduction to the language.
  • If we change our hello world program to take its input from the command line, we can then manipulate that input.
  • Command-line arguments arrive in main as an array of Strings, one element per word. (Strictly speaking, this is not "standard input", which a Java program reads from System.in.)

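    class helloworld {
        // The words typed after "java helloworld" arrive in input[]
        public static void main(String input[]) {
            int i = 0;
            // Print each argument on its own line
            while (i < input.length) {
                System.out.println(input[i]);
                i++;
            }
        }
    }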

    So, now, after compiling this helloworld.java at the command line with javac, we can execute it as follows:

    java helloworld How did I do?

    The result should look like:
    How
    did
    I
    do?

    Inside the loop, we could replace the println line with the following:
    if(input[i].equals("shit")) System.out.println("poop");
    else if(input[i].equals("socialism")) System.out.println("communism");
    else System.out.println(input[i]);
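
Putting the substitution together with the loop from helloworld gives a complete program. This is only a minimal sketch: the class name PureWords and the helper method substitute are made up for illustration, and the word list is the two-entry one from the lines above.

```java
// PureWords and substitute() are hypothetical names, not part of the original notes.
class PureWords {
    // Returns the replacement for a listed word, or the word itself otherwise.
    static String substitute(String word) {
        if (word.equals("shit")) return "poop";
        else if (word.equals("socialism")) return "communism";
        else return word;
    }

    public static void main(String input[]) {
        int i = 0;
        // Print each command-line word, substituted where it is on the list
        while (i < input.length) {
            System.out.println(substitute(input[i]));
            i++;
        }
    }
}
```

Run as "java PureWords I like socialism", this prints I, like and communism on separate lines. Note that equals() is an exact, case-sensitive comparison, so "Socialism" or "socialism!" would pass through unchanged.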

    There are lots of neat utilities in the String class, so you can really manipulate the input without worrying about many of the problems you encounter in csh.
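
As one illustration of those utilities, the sketch below uses length(), charAt(), substring() and equalsIgnoreCase() to match a word regardless of capitalization or trailing punctuation. The class StringTricks and both of its helper methods are hypothetical names chosen for this example, and "trailing punctuation" is assumed to mean any run of non-letter, non-digit characters at the end of the word.

```java
// StringTricks and its helpers are hypothetical, for illustration only.
class StringTricks {
    // Splits trailing punctuation off a word: "do?" becomes {"do", "?"}.
    static String[] splitTrailingPunctuation(String word) {
        int end = word.length();
        while (end > 0 && !Character.isLetterOrDigit(word.charAt(end - 1))) {
            end--;
        }
        return new String[] { word.substring(0, end), word.substring(end) };
    }

    // True if the word, minus trailing punctuation, equals target in any letter case.
    static boolean matches(String word, String target) {
        return splitTrailingPunctuation(word)[0].equalsIgnoreCase(target);
    }

    public static void main(String[] input) {
        // Report which command-line words match "socialism"
        for (int i = 0; i < input.length; i++) {
            System.out.println(input[i] + ": " + matches(input[i], "socialism"));
        }
    }
}
```

With these helpers, "Socialism!" and "SOCIALISM" both count as matches, which the plain equals() test would miss.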


  • Compile this little program. And make it work.
  • Attempt to use some methods of the String class, e.g.:
    	// Strings are immutable, so keep the copy that toUpperCase() returns
    	input[i] = input[i].toUpperCase();
    	System.out.println(input[i]);
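
One thing to watch for when trying String methods: Java Strings are immutable, so a method like toUpperCase() returns a new String and leaves the one it was called on alone. The sketch below shows the difference; the class name UpperDemo and the helper shout are made up for this example.

```java
// UpperDemo and shout() are hypothetical names used only for this sketch.
class UpperDemo {
    // Returns an upper-cased copy; the argument itself is never modified.
    static String shout(String word) {
        return word.toUpperCase();
    }

    public static void main(String[] input) {
        for (int i = 0; i < input.length; i++) {
            input[i].toUpperCase();           // result thrown away: input[i] is unchanged
            System.out.println(input[i]);     // still prints the word as typed
            input[i] = shout(input[i]);       // assigning the result keeps it
            System.out.println(input[i]);     // now prints the upper-cased word
        }
    }
}
```

The same pattern applies to the other String methods that appear to "change" a string, such as trim() and replace(): use the value they return.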