Natural language is human language. For the most part, linguistics is the study of natural language. One of the ways linguistics has approached this task is to identify various levels of language and give them names.
At the lowest level, we have the sounds that come out of our mouths. Somehow very early on in our lives, we manage to distinguish a fairly limited number of sounds which make up the language we hear around us. Each individual language is made up of a set of sound categories called phonemes. The study of this mapping from raw sound to phonemes is called phonetics.
In the course of studying language, linguists have also noticed that you can't describe the way a language sounds just by assigning some sequence of phonemes to each unit of meaning. For example, in English, when we want to pluralize something, we add an '-s' sound. But consider the contrast between the words 'cats' and 'dogs'. We spell both of them with an '-s', but careful listening will reveal that 'dogs' is pronounced [dogz]. As it turns out, this variation is predictable. This pluralizer is pronounced as a 'z' whenever the sound that precedes it is voiced, i.e. whenever producing it involves engaging our vocal cords. This clever observation comes to us courtesy of those involved in the field of phonology.
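The voicing rule for the pluralizer can be sketched in a few lines of Python. This is only an illustration: the set of voiceless sounds, the function names, and the spelling of sounds with ordinary letters are all assumptions made for the example, and the sketch ignores cases the text does not discuss (such as 'buses').

```python
# Assumption: a small, illustrative set of voiceless sounds, written with
# ordinary letters rather than a real phonetic alphabet.
VOICELESS_SOUNDS = {"p", "t", "k", "f"}


def plural_suffix_sound(final_sound):
    """Return how the plural '-s' is pronounced after the given final sound."""
    if final_sound in VOICELESS_SOUNDS:
        return "s"
    return "z"  # the preceding sound is treated as voiced


def pronounce_plural(sounds):
    """Append the plural sound to the list of sounds of a singular noun."""
    return sounds + [plural_suffix_sound(sounds[-1])]


print("".join(pronounce_plural(["k", "a", "t"])))  # kats
print("".join(pronounce_plural(["d", "o", "g"])))  # dogz
```

The point of the sketch is that the choice between 's' and 'z' is computed from the preceding sound rather than stored separately for each word.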
Consider again this pluralizing '-s'. Clearly this conveys meaning, but it doesn't qualify as a word. What it qualifies as is a morpheme, along with 'cat', 'dog', and '-ing'. The study of how morphemes can combine in a given language to form words falls under the rubric of morphology.
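As a rough illustration of morphemes combining into words, the following Python sketch splits a word into a stem and a suffix. The tiny lexicon and the function name are hypothetical, chosen only to cover the examples in the text; a real morphological analyzer would need to handle far more.

```python
# Hypothetical toy lexicon: a few stems and suffixes from the examples above.
STEMS = {"cat", "dog", "walk"}
SUFFIXES = {"s", "ing"}


def split_morphemes(word):
    """Split a word into [stem] or [stem, '-suffix']; return None if unknown."""
    if word in STEMS:
        return [word]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in STEMS:
            return [stem, "-" + suffix]
    return None


print(split_morphemes("cats"))     # ['cat', '-s']
print(split_morphemes("walking"))  # ['walk', '-ing']
```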
Once we've resolved the morphological issues, we have a sequence of words. These words group themselves together into phrases, and these phrases in turn combine into sentences. This is the level of syntax.
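To make the idea of words grouping into phrases and phrases into sentences concrete, here is a minimal Python sketch of a parser for a toy grammar. The grammar (a sentence is a noun phrase followed by a verb phrase), the lexicon, and all function names are assumptions invented for this example, not a description of any particular linguistic theory.

```python
# Hypothetical toy lexicon mapping words to parts of speech.
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "chased": "V", "slept": "V",
}


def parse_np(words, i):
    """Parse 'Det N' at position i; return (tree, next position) or None."""
    if (i + 1 < len(words)
            and LEXICON.get(words[i]) == "Det"
            and LEXICON.get(words[i + 1]) == "N"):
        return ("NP", words[i], words[i + 1]), i + 2
    return None


def parse_vp(words, i):
    """Parse a verb, optionally followed by a noun phrase."""
    if i < len(words) and LEXICON.get(words[i]) == "V":
        obj = parse_np(words, i + 1)
        if obj:
            return ("VP", words[i], obj[0]), obj[1]
        return ("VP", words[i]), i + 1
    return None


def parse_sentence(text):
    """Return a nested tuple for 'NP VP', or None if the words do not fit."""
    words = text.lower().split()
    np = parse_np(words, 0)
    if not np:
        return None
    vp = parse_vp(words, np[1])
    if vp and vp[1] == len(words):
        return ("S", np[0], vp[0])
    return None


print(parse_sentence("the dog chased a cat"))
```

The nested tuples mirror the grouping described above: words sit inside phrases, and phrases sit inside the sentence.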
So now we've taken a stream of noises (or characters) and distilled from it a sequence of sentences. The problem of how to represent the meaning of sentences is addressed at the level of semantics.
Even after extracting a set of literal meanings from sentences, there is still the level of pragmatics. Language is usually a cooperative process between at least two people, and conversations involve a subtle interplay of assumptions, requests, and expectations on the part of each speaker. Having said something, how can you be sure the other person understands? What should you say first? How do you keep from repeating yourself? When is it the other person's turn to talk? Does this person really want to know if I have a watch, or does she really want to know what time it is? These kinds of problems are hard enough for humans to work out, and to date no computer program even approaches human capabilities at this level.
Identifying these levels of language makes for a convenient shorthand, and there are influential schools of linguistics which hold that they correspond to real aspects of the human language faculty. However, there is controversy on this issue.
How integrated are all these language processes? Is language itself a function of a separate module in the brain? Or is it a complex skill which emerges out of a collection of other skills? This is a hotly debated question.
Computational linguistics itself has to subscribe partially to both views. Certainly there are a number of generally applicable skills which can be brought to bear on the problem of language. Pattern recognition, planning, prediction, complex process modeling, and analogy are all applicable to fields outside of natural language. All of these areas are active fields of research. However, in order to make use of these methodologies and apply them to the domain of natural language, sheer engineering necessity requires that we break the problem down into modules.
One thing that seems certain is that natural language is implemented on massively parallel hardware. The brain is a complex organ consisting of billions of interconnected neurons. Clearly many of the brain's processes are carried out in parallel, but there's this thing we call our attention which for some reason only focuses itself very narrowly during any given second.
Recall our discussion in the last section on the CPU. Here we have a very limited computing capacity of the few registers available at any one time. Each CPU is required to undertake its processing sequentially. Most traditional computer programs use the stack we discussed earlier as a way of sequentially dealing with each part of a given problem. Most traditional approaches to linguistics have also relied heavily on stacks to determine the sequences of the various components of language.
However, there is such a thing as a parallel algorithm. Parallelism can be achieved by linking up a complex network of CPUs, or by having a single CPU model numerous processes concurrently, systematically shifting its 'attention' amongst the processes. Then there is the field of computer science dedicated to neural networks, which are able to demonstrate very promising results in the field of pattern recognition. It is fairly straightforward to construct networked data structures, the nodes of which can be visited in turn by the processor.
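The idea of a single processor shifting its 'attention' among several processes can be sketched with Python generators. This is a simplified illustration under stated assumptions: the process function, its names, and the round-robin policy are all hypothetical, and real schedulers are considerably more involved.

```python
from collections import deque


def process(name, steps):
    """A hypothetical process that pauses after each unit of work."""
    for step in range(1, steps + 1):
        yield f"{name}:{step}"


def round_robin(processes):
    """Run each process one step at a time, in turn, until all have finished."""
    queue = deque(processes)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))  # give this process one step
            queue.append(current)        # then send it to the back of the line
        except StopIteration:
            pass                         # finished processes are dropped
    return trace


print(round_robin([process("A", 2), process("B", 3)]))
# ['A:1', 'B:1', 'A:2', 'B:2', 'B:3']
```

Only one process advances at any moment, yet the interleaved trace shows both making progress, which is the sense in which one CPU can model concurrent processes.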
And then there's the question of whether a computer will ever achieve the language capabilities of a human. There are those who argue that language is firmly grounded in the wealth of our sensory/motor experiences. It may be that until computers get bodies with the sensory apparatus capable of encoding experiences similar to our own, they will never be able to make the inferences we take for granted with expressions such as 'put the past behind you' and 'drowning in sorrow'. This might be a good thing. It might be that computers will never be more than powerful tools for human language users, rather than our replacements.