
1. Are the names of all element types and attributes in some XML namespace?

No.
   If an element type or attribute name is not specifically declared to be in an XML
   namespace -- that is, it is unprefixed and (in the case of element type names) there is no
   default XML namespace -- then that name is not in any XML namespace. If you want,
   you can think of it as having a null URI as its name, although no "null" XML namespace
   actually exists. For example, in the following, the element type name B and the attribute
   names C and E are not in any XML namespace:
   <google:A xmlns:google="http://www.google.org/">
     <B C="bar"/>
     <google:D E="bar"/>
   </google:A>
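
The same point can be checked with a namespace-aware parser. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (the choice of language is this example's, not the FAQ's). ElementTree reports a name that is in a namespace as {URI}local-name, and a name in no namespace as the bare local name.

```python
import xml.etree.ElementTree as ET

DOC = """<google:A xmlns:google="http://www.google.org/">
  <B C="bar"/>
  <google:D E="bar"/>
</google:A>"""

root = ET.fromstring(DOC)

# Collect every element type name and attribute name as the parser reports them.
element_names = [el.tag for el in root.iter()]
attribute_names = [name for el in root.iter() for name in el.attrib]

print(element_names)    # A and D carry the namespace URI, B does not
print(attribute_names)  # C and E are unprefixed, so they are in no namespace
```

Note that E stays outside any namespace even though its parent element D is in one: a prefix on an element never applies to that element's unprefixed attributes.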

2. Aren't XML, SGML, and HTML all the same thing?

Not quite; SGML is the mother tongue, and has been used for describing thousands of
  different document types in many fields of human activity, from transcriptions of ancient
  Irish manuscripts to the technical documentation for stealth bombers, and from patients'
  clinical records to musical notation. SGML is very large and complex, however, and
  probably overkill for most common office desktop applications.
  XML is an abbreviated version of SGML, to make it easier to use over the Web, easier
  for you to define your own document types, and easier for programmers to write
  programs to handle them. It omits all the complex and less-used options of SGML in
  return for the benefits of being easier to write applications for, easier to understand, and
  more suited to delivery and interoperability over the Web. But it is still SGML, and XML
  files may still be processed in the same way as any other SGML file (see the question on
  XML software).
  HTML is just one of many SGML or XML applications—the one most frequently used
  on the Web.
  Technical readers may find it more useful to think of XML as being SGML-- rather than
  HTML++.

3. Can I use Java to create or manage XML files?

Yes, any programming language can be used to output data from any source in XML
  format. There is a growing number of front-ends and back-ends for programming
  environments and data management environments to automate this. Java is just the most
  popular one at the moment.
  There is a large body of middleware (APIs) written in Java and other languages for
  managing data either in XML or with XML input or output.

4. Can I use JavaScript, ActiveX, etc in XML files?

This will depend on what facilities your users' browsers implement. XML is about
  describing information; scripting languages and languages for embedded functionality are
  software which enables the information to be manipulated at the user's end, so these
  languages do not normally have any place in an XML file itself, but in stylesheets like
  XSL and CSS where they can be added to generated HTML.
  XML itself provides a way to define the markup needed to implement scripting
  languages: as a neutral standard it neither encourages nor discourages their use, and does
  not favour one language over another, so it is possible to use XML markup to store the
  program code, from where it can be retrieved by (for example) XSLT and re-expressed in
  an HTML script element.
  Server-side script embedding, like PHP or ASP, can be used with the relevant server to
  modify the XML code on the fly, as the document is served, just as they can with HTML.
  Authors should be aware, however, that embedding server-side scripting may mean the
  file as stored is not valid XML: it only becomes valid when processed and served, so care
  must be taken when using validating editors or other software to handle or manage such
  files. A better solution may be to use an XML serving solution like Cocoon, AxKit, or
  PropelX.

5. Can XML use non-Latin characters?

Yes, the XML Specification explicitly says XML uses ISO 10646, the international
  standard character repertoire which covers most known languages. Unicode is an
  identical repertoire, and the two standards track each other. The spec says (2.2): ‘All
  XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646…’. There
  is a Unicode FAQ at http://www.unicode.org/faq/FAQ.
  UTF-8 is an encoding of Unicode into 8-bit units: the first 128 are the same as
  ASCII, and higher-order bytes are used to encode anything else from Unicode into
  sequences of between 2 and 4 bytes (the original design allowed up to 6, but RFC 3629
  limits UTF-8 to the range that Unicode actually defines). UTF-8 in its single-octet form is
  therefore the same as ISO 646 IRV (ASCII), so you can continue to use ASCII for English
  or other languages using the Latin alphabet without diacritics. Note that UTF-8 is
  incompatible with ISO 8859-1 (ISO Latin-1) after code point 127 decimal (the end of ASCII).
  UTF-16 is an encoding of Unicode into 16-bit units, which lets it represent all 17
  planes of Unicode. UTF-16 is incompatible with ASCII because it uses two 8-bit bytes per
  character (four bytes above U+FFFF).
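
The byte counts described above are easy to verify. The following is a small sketch in Python (chosen for this example only); the helper name byte_lengths and the sample characters are illustrative, not taken from the XML Specification.

```python
def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    # "utf-16-be" is used so that no byte-order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

samples = {
    "LATIN CAPITAL A": "A",             # ASCII range
    "E WITH ACUTE": "\u00e9",           # Latin-1 range
    "EURO SIGN": "\u20ac",              # elsewhere in the Basic Multilingual Plane
    "MUSICAL G CLEF": "\U0001d11e",     # above U+FFFF
}

for label, ch in samples.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X} {label}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")

# ASCII text is byte-for-byte identical in UTF-8, but Latin-1 is not.
ascii_matches = "XML".encode("utf-8") == "XML".encode("ascii")
latin1_matches = "\u00e9".encode("utf-8") == "\u00e9".encode("latin-1")
```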

6. Can you walk us through the steps necessary to parse XML documents?

Superficially, this is a fairly basic question. However, the point is not to determine
  whether candidates understand the concept of a parser but rather have them walk through
  the process of parsing XML documents step-by-step. Determining whether a nonvalidating
  or validating parser is needed, choosing the appropriate parser, and handling
  errors are all important aspects to this process that should be included in the candidate's
  response.
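
As one possible shape for such an answer, here is a minimal sketch in Python. It assumes a non-validating parser is enough, so it uses the standard xml.etree.ElementTree module, which checks well-formedness only; the function name parse_document is illustrative. A candidate who needs DTD or Schema validation would have to choose a validating parser instead.

```python
import xml.etree.ElementTree as ET

def parse_document(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        # ElementTree is built on the non-validating Expat parser, so this
        # checks well-formedness only; DTD or Schema validity is not checked.
        return ET.fromstring(text)
    except ET.ParseError as err:
        # Report the error with its position instead of guessing at a repair.
        line, column = err.position
        print(f"not well-formed at line {line}, column {column}: {err}")
        return None

good = parse_document("<order><item>Tea</item></order>")
bad = parse_document("<order><item>Tea</order>")  # mismatched end-tag
```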

7. Describe the role that XSL can play when dynamically generating HTML pages from a relational database?

Even if candidates have never participated in a project involving this type of architecture,
  they should recognize it as one of the common uses of XML. Querying a database and
  then formatting the result set so that it can be validated as an XML document allows
  developers to translate the data into an HTML table using XSLT rules. Consequently, the
  format of the resulting HTML table can be modified without changing the database query
  or application code since the document rendering logic is isolated to the XSLT rules.

8. Do I have to change any of my server software to work with XML?

The only changes needed are to make sure your server serves up .xml, .css, .dtd, .xsl, and
  whatever other file types you will use as the correct MIME content (media) types.
  The details of the settings are specified in RFC 3023. Most new versions of Web server
  software come preset.
  If not, all that is needed is to edit the mime-types file (or its equivalent: as a server
  operator you already know where to do this, right?) and add or edit the relevant lines for
  the right media types. In some servers (eg Apache), individual content providers or
  directory owners may also be able to change the MIME types for specific file types from
  within their own directories by using directives in a .htaccess file. The media types
  required are:
  * text/xml for XML documents which are ‘readable by casual users’;
  * application/xml for XML documents which are ‘unreadable by casual users’;
  * text/xml-external-parsed-entity for external parsed entities such as document fragments
  (eg separate chapters which make up a book) subject to the readability distinction of
  text/xml;
  * application/xml-external-parsed-entity for external parsed entities subject to the
  readability distinction of application/xml;
  * application/xml-dtd for DTD files and modules, including character entity sets.
  The RFC has further suggestions for the use of the +xml media type suffix for identifying
  ancillary files such as XSLT (application/xslt+xml).
  If you run scripts generating XHTML which you wish to be treated as XML rather than
  HTML, they may need to be modified to produce the relevant Document Type
  Declaration as well as the right media type if your application requires them to be
  validated.
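
Where the server is a script rather than a configured daemon, the same mapping can be kept in code. This is only a sketch, assuming Python's standard mimetypes module; the file extensions and the helper name media_type_for are this example's assumptions, and the media types are the ones listed above.

```python
import mimetypes

# Extension-to-media-type mapping taken from the list above.
XML_MEDIA_TYPES = {
    ".xml": "application/xml",
    ".dtd": "application/xml-dtd",
    ".xsl": "application/xslt+xml",
}

# A private registry, so the process-wide defaults are left untouched.
registry = mimetypes.MimeTypes()
for extension, media_type in XML_MEDIA_TYPES.items():
    registry.add_type(media_type, extension)

def media_type_for(filename):
    """Return the media type a server using this registry would send."""
    return registry.guess_type(filename)[0]

print(media_type_for("shoplist.dtd"))
```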

9. Give a few examples of types of applications that can benefit from using XML?

There are literally thousands of applications that can benefit from XML technologies.
  The point of this question is not to have the candidate rattle off a laundry list of projects
  that they have worked on, but, rather, to allow the candidate to explain the rationale for
  choosing XML by citing a few real world examples. For instance, one appropriate answer
  is that XML allows content management systems to store documents independently of
  their format, which thereby reduces data redundancy. Another answer relates to B2B
  exchanges or supply chain management systems. In these instances, XML provides a
  mechanism for multiple companies to exchange data according to an agreed upon set of
  rules. A third common response involves wireless applications that require WML to
  render data on hand held devices.

10. Give some examples of XML DTDs or schemas that you have worked with?

Although XML does not require data to be validated against a DTD, many of the benefits
  of using the technology are derived from being able to validate XML documents against
  business or technical architecture rules. Polling for the list of DTDs that developers have
  worked with provides insight into their general exposure to the technology. The ideal
  candidate will have knowledge of several of the commonly used DTDs such as FpML,
  DocBook, HRML, and RDF, as well as experience designing a custom DTD for a
  particular project where no standard existed.

11. How can I handle embedded HTML in my XML?

Apart from using CDATA Sections, there are two common occasions when people want
  to handle embedded HTML inside an XML element:
  1. when they have received (possibly poorly-designed) XML from somewhere else which
  they must find a way to handle;
  2. when they have an application which has been explicitly designed to store a string of
  characters containing &lt; and &amp; character entity references with the objective of turning
  them back into markup in a later process (eg FreeMind, Atom).
  Generally, you want to avoid this kind of trick, as it usually indicates that the document
  structure and design has been insufficiently thought out. However, there are occasions
  when it becomes unavoidable, so if you really need or want to use embedded HTML
  markup inside XML, and have it processable later as markup, there are a couple of
  techniques you may be able to use:
  * Provide templates for the handling of that markup in your XSLT transformation or
  whatever software you use which simply replicates what was there, eg
  <xsl:template match="b">
    <b>
      <xsl:apply-templates/>
    </b>
  </xsl:template>
  * Use XSLT's ‘deep copy’ instruction, which outputs nested well-formed markup
  verbatim, eg
  <xsl:template match="ol">
    <xsl:copy-of select="."/>
  </xsl:template>
  * As a last resort, use the disable-output-escaping attribute on the xsl:text element of
  XSL[T] which is available in some processors, eg
  <xsl:text disable-output-escaping="yes"><![CDATA[<b>Now!</b>]]></xsl:text>
  * Some processors (eg JX) are now providing their own equivalents for disabling output
  escaping. Their proponents claim it is ‘highly desirable’ or ‘what most people want’, but
  it still needs to be treated with care to prevent unwanted (possibly dangerous) arbitrary
  code from being passed untouched through your system. It also adds another dependency
  to your software.

12. How can I make my existing HTML files work in XML?

Either convert them to conform to some new document type (with or without a DTD or
  Schema) and write a style sheet to go with them; or edit them to conform to XHTML.
  It is necessary to convert existing HTML files because XML does not permit end-tag
  minimization (missing </p>, etc), unquoted attribute values, and a number of other SGML
  shortcuts which have been normal in most HTML DTDs. However, many HTML authoring
  tools already produce almost (but not quite) well-formed XML.
  You may be able to convert HTML to XHTML using Dave Raggett's HTML Tidy
  program, which can clean up some of the formatting mess left behind by inadequate
  HTML editors, and even separate out some of the formatting to a stylesheet, but there is
  usually still some hand-editing to do.

13. How do I create my own document type?

Document types usually need a formal description, either a DTD or a Schema. Whilst it is
  possible to process well-formed XML documents without any such description, trying to
  create them without one is asking for trouble. A DTD or Schema is used with an XML
  editor or API to guide and control the construction of the document, making
  sure the right elements go in the right places.
  Creating your own document type therefore begins with an analysis of the class of
  documents you want to describe: reports, invoices, letters, configuration files, credit-card
  verification requests, or whatever. Once you have the structure correct, you write code to
  express this formally, using DTD or Schema syntax.

14. How do I declare an XML namespace in an XML document?

To declare an XML namespace, you use an attribute whose name has the form:
  xmlns:prefix
  --OR--
  xmlns
  These attributes are often called xmlns attributes and their value is the name of the XML
  namespace being declared; this is a URI. The first form of the attribute (xmlns:prefix)
  declares a prefix to be associated with the XML namespace. The second form (xmlns)
  declares that the specified namespace is the default XML namespace.
  For example, the following declares two XML namespaces, named
  http://www.google.com/ito/addresses and http://www.google.com/ito/servers. The first
  declaration associates the addr prefix with the http://www.google.com/ito/addresses
  namespace and the second declaration states that the http://www.google.com/ito/servers
  namespace is the default XML namespace.
  <Department
  xmlns:addr="http://www.google.com/ito/addresses"
  xmlns="http://www.google.com/ito/servers">
  NOTE: Technically, xmlns attributes are not attributes at all -- they are XML namespace
  declarations that just happen to look like attributes. Unfortunately, they are not treated
  consistently by the various XML recommendations, which means that you must be
  careful when writing an XML application.
  For example, in the XML Information Set (http://www.w3.org/TR/xml-infoset), xmlns
  "attributes" do not appear as attribute information items. Instead, they appear as
  namespace declaration information items. On the other hand, both DOM level 2 and SAX
  2.0 treat namespace attributes somewhat ambiguously. In SAX 2.0, an application can
  instruct the parser to return xmlns "attributes" along with other attributes, or omit them
  from the list of attributes. Similarly, while DOM level 2 sets namespace information
  based on xmlns "attributes", it also forces applications to manually add namespace
  declarations using the same mechanism the application would use to set any other
  attributes.
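
The following sketch shows how one API treats the declarations above. It assumes Python's standard xml.etree.ElementTree module; the child elements Name and Address and their content are hypothetical additions made only so that there is something to look up.

```python
import xml.etree.ElementTree as ET

DOC = """<Department
    xmlns:addr="http://www.google.com/ito/addresses"
    xmlns="http://www.google.com/ito/servers">
  <Name>DVS1</Name>
  <addr:Address>123 Main St.</addr:Address>
</Department>"""

root = ET.fromstring(DOC)

# The prefixes used for lookup are local to this code; only the URIs have to
# match the namespace names declared in the document.
NS = {
    "srv": "http://www.google.com/ito/servers",
    "a": "http://www.google.com/ito/addresses",
}

name = root.find("srv:Name", NS).text        # unprefixed, so in the default namespace
address = root.find("a:Address", NS).text    # prefixed with addr in the document

# ElementTree consumes the xmlns declarations: they do not show up as attributes.
declared_as_attributes = dict(root.attrib)
```

Here the xmlns declarations are dropped from the attribute list entirely, which is one of the inconsistent treatments the note above warns about.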

15. How do I execute or run an XML file?

You can't and you don't. XML itself is not a programming language, so XML files don't
  ‘run’ or ‘execute’. XML is a markup specification language and XML files are just data:
  they sit there until you run a program which displays them (like a browser) or does some
  work with them (like a converter which writes the data in another format, or a database
  which reads the data), or modifies them (like an editor).
  If you want to view or display an XML file, open it with an XML editor or an XML
  browser.
  The water is muddied by XSL (both XSLT and XSL-FO), which uses XML syntax to
  implement a declarative programming language. In these cases it is arguable that you can
  ‘execute’ XML code, by running a processing application like Saxon, which compiles the
  directives specified in XSLT files into Java bytecode to process XML.

16. How do I get XML into or out of a database?

Ask your database manufacturer: they all provide XML import and export modules to
  connect XML applications with databases. In some trivial cases there will be a 1:1 match
  between field names in the database table and element type names in the XML Schema or
  DTD, but in most cases some programming will be required to establish the desired
  match. This can usually be stored as a procedure so that subsequent uses are simply
  commands or calls with the relevant parameters.
  In less trivial, but still simple, cases, you could export by writing a report routine that
  formats the output as an XML document, and you could import by writing an XSLT
  transformation that formatted the XML data as a load file.
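
For the trivial 1:1 case, an export routine can be very short. This is a sketch only, assuming Python with its standard sqlite3 and xml.etree.ElementTree modules; the items table, its columns, the row wrapper element, and the function name export_table are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_table(connection, table):
    """Export every row of `table` as <table><row><column>value</column></row></table>."""
    # The table name is interpolated directly, so it must come from trusted code.
    cursor = connection.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            # 1:1 match: each column name becomes an element type name.
            ET.SubElement(row_element, column).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (name TEXT, qty INTEGER)")
connection.execute("INSERT INTO items VALUES ('Chocolate', 2)")
xml_text = export_table(connection, "items")
print(xml_text)
```

Column names that are not legal XML names (for example names containing spaces) would need a mapping step, which is where the "some programming will be required" caveat above comes in.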

17. How do I write my own DTD?

You need to use the XML Declaration Syntax (very simple: declaration keywords begin
  with <! rather than just the open angle bracket). Here is an example:
  <!ELEMENT Shopping-List (Item)+>
  <!ELEMENT Item (#PCDATA)>
  It says that there shall be an element called Shopping-List and that it shall contain
  elements called Item: there must be at least one Item (that's the plus sign) but there may
  be more than one. It also says that the Item element may contain only parsed character
  data (PCDATA, ie text: no further markup).
  Because there is no other element which contains Shopping-List, that element is assumed
  to be the ‘root’ element, which encloses everything else in the document. You can now
  use it to create an XML file: give your editor the declarations:
  <?xml version="1.0"?>
  <!DOCTYPE Shopping-List SYSTEM "shoplist.dtd">
  (assuming you put the DTD in that file). Now your editor will let you create files
  according to the pattern:
  <Shopping-List>
    <Item>Chocolate</Item>
    <Item>Sugar</Item>
    <Item>Butter</Item>
  </Shopping-List>
  It is possible to develop complex and powerful DTDs of great subtlety, but for any
  significant use you should learn more about document systems analysis and document
  type design. See for example Developing SGML DTDs: From Text to Model to Markup
  (Maler and el Andaloussi, 1995): this was written for SGML but perhaps 95% of it
  applies to XML as well, as XML is much simpler than full SGML—see the list of
  restrictions which shows what has been cut out.
  Warning
  Incidentally, a DTD file never has a DOCTYPE Declaration in it: that only occurs in an
  XML document instance (it's what references the DTD). And a DTD file also never has
  an XML Declaration at the top either. Unfortunately there is still software around which
  inserts one or both of these.

18. How does XML handle metadata?

Because XML lets you define your own markup languages, you can make full use of the
  extended hypertext features of XML (see the question on Links) to store or link to
  metadata in any format (eg using ISO 11179, as a Topic Maps Published Subject, with
  Dublin Core, Warwick Framework, or with Resource Description Framework (RDF), or
  even Platform for Internet Content Selection (PICS)).
  There are no predefined elements in XML, because it is an architecture, not an
  application, so it is not part of XML's job to specify how or if authors should or should
  not implement metadata. You are therefore free to use any suitable method. Browser
  makers may also have their own architectural recommendations or methods to propose.

19. How does XML handle white-space in my documents?

All white-space, including line breaks, TAB characters, and normal spaces, even between
  ‘structural’ elements where no text can ever appear, is passed by the parser unchanged to
  the application (browser, formatter, viewer, converter, etc), identifying the context in
  which the white-space was found (element content, data content, or mixed content, if this
  information is available to the parser, eg from a DTD or Schema). This means it is the
  application's responsibility to decide what to do with such space, not the parser's:
  * insignificant white-space between structural elements (space which occurs where only
  element content is allowed, i.e. between other elements, where text data never occurs)
  will get passed to the application (in SGML this white-space gets suppressed, which is
  why you can put all that extra space in HTML documents and not worry about it)
  * significant white-space (space which occurs within elements which can contain text and
  markup mixed together, usually mixed content or PCDATA) will still get passed to the
  application exactly as under SGML. It is the application's responsibility to handle it
  correctly.
  The parser must inform the application that white-space has occurred in element content,
  if it can detect it. (Users of SGML will recognize that this information is not in the ESIS,
  but it is in the Grove.)
  <chapter>
    <title>
      My title for
      Chapter 1.
    </title>
    <para>
      text
    </para>
  </chapter>
  In the example above, the application will receive all the pretty-printing linebreaks,
  TABs, and spaces between the elements as well as those embedded in the chapter title. It
  is the function of the application, not the parser, to decide which type of white-space to
  discard and which to retain. Many XML applications have configurable options to allow
  programmers or users to control how such white-space is handled.
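
A short sketch makes the division of labour visible. It assumes Python's standard xml.etree.ElementTree module as the "application"; collapsing runs of white-space to a single space is just one possible policy, chosen here for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<chapter>
  <title>
    My title for
    Chapter 1.
  </title>
  <para>
    text
  </para>
</chapter>"""

chapter = ET.fromstring(DOC)
title = chapter.find("title")

raw_title = title.text                      # exactly as the parser delivered it
clean_title = " ".join(raw_title.split())   # the application's choice: collapse runs
between_elements = chapter.text             # white-space between <chapter> and <title>

print(repr(raw_title))
print(repr(clean_title))
print(repr(between_elements))
```

The parser hands over the line breaks and indentation untouched, both inside the title and between the structural elements; it is the last two lines of application code that decide what to keep.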

20. How will XML affect my document links?

The linking abilities of XML systems are potentially much more powerful than those of
  HTML, so you'll be able to do much more with them. Existing href-style links will
  remain usable, but the new linking technology is based on the lessons learned in the
  development of other standards involving hypertext, such as TEI and HyTime, which let
  you manage bidirectional and multi-way links, as well as links to a whole element or span
  of text (within your own or other documents) rather than to a single point. These features
  have been available to SGML users for many years, so there is considerable experience
  and expertise available in using them. Currently only Mozilla Firefox implements XLink.
  The XML Linking Specification (XLink) and the XML Extended Pointer Specification
  (XPointer) documents contain the details. An XLink can be either a URI or a TEI-style
  Extended Pointer (XPointer), or both. A URI on its own is assumed to be a resource; if an
  XPointer follows it, it is assumed to be a sub-resource of that URI; an XPointer on its
  own is assumed to apply to the current document (all exactly as with HTML).
  An XLink may use one of #, ?, or |. The # and ? mean the same as in HTML applications;
  the | means the sub-resource can be found by applying the link to the resource, but the
  method of doing this is left to the application. An XPointer can only follow a #.
  The TEI Extended Pointer Notation (EPN) is much more powerful than the fragment
  address on the end of some URIs, as it allows you to specify the location of a link end
  using the structure of the document as well as (or in addition to) known, fixed points like
  IDs. For example, the linked second occurrence of the word ‘XPointer’ two paragraphs
  back could be referred to with the URI (shown here with linebreaks and spaces for
  clarity: in practice it would of course be all one long string):
  http://xml.silmaril.ie/faq.xml#ID(hypertext)
  .child(1,#element,'answer')
  .child(2,#element,'para')
  .child(1,#element,'link')
  This means the first link element within the second paragraph within the answer in the
  element whose ID is hypertext (this question). Count the objects from the start of this
  question (which has the ID hypertext) in the XML source:
  1. the first child object is the element containing the question;
  2. the second child object is the answer (the answer element);
  3. within this element go to the second paragraph;
  4. find the first link element.
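
The walk described in these steps can be imitated with an ordinary tree API. The sketch below assumes Python's standard xml.etree.ElementTree module and a hypothetical document whose shape only loosely follows the description above (the qandaentry element name and all text content are invented). It is not an XPointer implementation, just the same child-by-child navigation.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment shaped like the FAQ source described above.
DOC = """<faq>
  <qandaentry id="hypertext">
    <question><para>How will XML affect my document links?</para></question>
    <answer>
      <para>First paragraph.</para>
      <para>The <link>XPointer</link> document has the details.</para>
    </answer>
  </qandaentry>
</faq>"""

def nth_child(parent, n, name):
    """Return the n-th (1-based) direct child element called `name`."""
    return parent.findall(name)[n - 1]

root = ET.fromstring(DOC)
start = root.find(".//*[@id='hypertext']")   # ID(hypertext)
answer = nth_child(start, 1, "answer")       # .child(1,#element,'answer')
para = nth_child(answer, 2, "para")          # .child(2,#element,'para')
link = nth_child(para, 1, "link")            # .child(1,#element,'link')

print(link.text)
```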
  Eve Maler explained the relationship of XLink and XPointer as follows:
  XLink governs how you insert links into your XML document, where the link might
  point to anything (eg a GIF file); XPointer governs the fragment identifier that can go on
  a URL when you're linking to an XML document, from anywhere (eg from an HTML
  file).
  [Or indeed from an XML file, a URI in a mail message, etc…Ed.]
  David Megginson has produced an xpointer function for Emacs/psgml which will deduce
  an XPointer for any location in an XML document. XML Spy has a similar function.

21. How would you build a search engine for large volumes of XML data?

The way candidates answer this question may provide insight into their view of XML
  data. For those who view XML primarily as a way to denote structure for text files, a
  common answer is to build a full-text search and handle the data similarly to the way
  Internet portals handle HTML pages. Others consider XML as a standard way of
  transferring structured data between disparate systems. These candidates often describe
  some scheme of importing XML into a relational or object database and relying on the
  database's engine for searching. Lastly, candidates that have worked with vendors
  specializing in this area often say that the best way to handle this situation is to use a
  third party software package optimized for XML data.

22. I keep hearing about alternatives to DTDs. What's a Schema?

The W3C XML Schema recommendation provides a means of specifying formal data
  typing and validation of element content in terms of data types, so that document type
  designers can provide criteria for checking the data content of elements as well as the
  markup itself. Schemas are written in XML Document Syntax, like XML documents are,
  avoiding the need for processing software to be able to read XML Declaration Syntax
  (used for DTDs).
  There is a separate Schema FAQ at http://www.schemavalid.comFAQ. The term
  ‘vocabulary’ is sometimes used to refer to DTDs and Schemas together. Schemas are
  aimed at e-commerce, data control, and database-style applications where character data
  content requires validation and where stricter data control is needed than is possible with
  DTDs; or where strong data typing is required. They are usually unnecessary for
  traditional text document publishing applications.
  Unlike DTDs, Schemas cannot be specified in an XML Document Type Declaration.
  They can be specified in a Namespace, where Schema-aware software should pick it up,
  but this is optional:
  <invoice id="abc123"
    xmlns="http://example.org/ns/books/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://example.org/ns/books/
                        http://acme.wilycoyote.org/xsd/invoice.xsd">
    ...
  </invoice>
  More commonly, you specify the Schema in your processing software, which should
  record separately which Schema is used by which XML document instance.
  In contrast to the complexity of the W3C Schema model, Relax NG is a lightweight,
  easy-to-use XML schema language devised by James Clark (see http://relaxng.org/) with
  development hosted by OASIS. It allows similar richness of expression and the use of
  XML as its syntax, but it provides an additional, simplified, syntax which is easier to use
  for those accustomed to DTDs.

23. If XML is just a subset of SGML, can I use XML files directly with existing SGML tools?

Yes, provided you use up-to-date SGML software which knows about the WebSGML
  Adaptations TC to ISO 8879 (the features needed to support XML, such as the variant
  form for EMPTY elements; some aspects of the SGML Declaration such as NAMECASE
  GENERAL NO; multiple attribute token list declarations, etc).
  An alternative is to use an SGML DTD to let you create a fully-normalised SGML file,
  but one which does not use empty elements; and then remove the DocType Declaration
  so it becomes a well-formed DTDless XML file. Most SGML tools now handle XML
  files well, and provide an option switch between the two standards.

24. Is there an XML version of HTML?

Yes, the W3C recommends using XHTML which is ‘a reformulation of HTML 4 in
  XML 1.0’. This specification defines HTML as an XML application, and provides three
  DTDs corresponding to the ones defined by HTML 4.* (Strict, Transitional, and
  Frameset).
  The semantics of the elements and their attributes are as defined in the W3C
  Recommendation for HTML 4. These semantics provide the foundation for future
  extensibility of XHTML. Compatibility with existing HTML browsers is possible by
  following a small set of guidelines (see the W3C site).

25. Using XSLT, how would you extract a specific attribute from an element in an XML document?

Successful candidates should recognize this as one of the most basic applications of
  XSLT. If they are not able to construct a reply similar to the example below, they should
  at least be able to identify the components necessary for this operation: xsl:template to
  match the appropriate XML element, xsl:value-of to select the attribute value, and the
  optional xsl:apply-templates to continue processing the document.
  Extract Attributes from XML Data
  Example 1.
  <xsl:template match="element-name">
    Attribute Value:
    <xsl:value-of select="@attribute"/>
    <xsl:apply-templates/>
  </xsl:template>

26. What are the special characters in XML?

For normal text (not markup), there are no special characters: just make sure your
  document refers to the correct encoding scheme for the language and/or writing system
  you want to use, and that your computer correctly stores the file using that encoding
  scheme. See the question on non-Latin characters for a longer explanation.
  If your keyboard will not allow you to type the characters you want, or if you want to use
  characters outside the limits of the encoding scheme you have chosen, you can use a
  symbolic notation called ‘entity referencing’. Entity references can either be numeric,
  using the decimal or hexadecimal Unicode code point for the character (eg if your
  keyboard has no Euro symbol (€) you can type &#8364;); or they can be character, using an
  established name which you declare in your DTD (eg <!ENTITY euro "&#8364;">) and then
  use as &euro; in your document. If you are using a Schema, you must use the numeric form
  for all except the five below because Schemas have no way to make character entity
  declarations.
  If you use XML with no DTD, then these five character entities are assumed to be
  predeclared, and you can use them without declaring them:
  &lt;
  The less-than character (<) starts element markup (the first character of a start-tag or an
  end-tag).
  &amp;
  The ampersand character (&) starts entity markup (the first character of a character entity
  reference).
  &gt;
  The greater-than character (>) ends a start-tag or an end-tag.
  &quot;
  The double-quote character (") can be symbolised with this character entity reference
  when you need to embed a double-quote inside a string which is already double-quoted.
  &apos;
  The apostrophe or single-quote character (') can be symbolised with this character entity
  reference when you need to embed a single-quote or apostrophe inside a string which is
  already single-quoted.
  If you are using a DTD, you must declare any other character entities you need to use (if
  any). The five above remain recognized by every XML processor even when a DTD is
  present, although the XML Specification recommends that valid documents declare them
  anyway for interoperability.
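
Both directions can be tried out in a few lines. This sketch assumes Python's standard library (xml.etree.ElementTree for parsing, xml.sax.saxutils for escaping); the sample strings are invented.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Numeric references (decimal and hexadecimal) and the predeclared entities
# are all resolved by the parser before the application sees the text.
price = ET.fromstring("<price>&#8364;5 &lt; &#x20AC;6 &amp; rising</price>")
decoded = price.text

# Going the other way, escape() protects <, > and & in character data, and
# quoteattr() picks a quote character and escapes an attribute value.
encoded_text = escape("AT&T <rocks>")
encoded_attr = quoteattr('say "hi"')

print(decoded)
print(encoded_text)
print(encoded_attr)
```

quoteattr() avoids &quot; here by wrapping the value in single quotes instead, which is the alternative to the &quot; and &apos; references described above.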

27. What does an XML document actually look like (inside)?

The basic structure of XML is similar to other applications of SGML, including HTML.
  The basic components can be seen in the following examples. An XML document starts
  with a Prolog:
  1. The XML Declaration
  which specifies that this is an XML document;
  2. Optionally a Document Type Declaration
  which identifies the type of document and says where the Document Type Definition
  (DTD) is stored;
  The Prolog is followed by the document instance:
  1. A root element, which is the outermost (top level) element (start-tag plus end-tag)
  which encloses everything else: in the examples below the root elements are conversation
  and titlepage;
  2. A structured mix of descriptive or prescriptive elements enclosing the character data
  content (text), and optionally any attributes (‘name=value’ pairs) inside some start-tags.
  XML documents can be very simple, with straightforward nested markup of your own
  design:
  <?xml version="1.0" standalone="yes"?>
  <conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get
  off!</response>
  </conversation>
  Or they can be more complicated, with a Schema or Document Type Definition (DTD)
  or internal subset (local DTD changes in [square brackets]), and an
  arbitrarily complex nested structure:
  <?xml version="1.0" encoding="iso-8859-1"?>
  <!DOCTYPE titlepage
  SYSTEM "http://www.google.bar/dtds/typo.dtd"
  [<!ENTITY % active.links "INCLUDE">]>
  <titlepage id="BG12273624">
  <white-space type="vertical" amount="36"/>
  <title font="Baskerville" alignment="centered"
  size="24/30">Hello, world!</title>
  <white-space type="vertical" amount="12"/>
  <!-- In some copies the following
  decoration is hand-colored, presumably
  by the author -->
  <image location="http://www.google.bar/fleuron.eps"
  type="URI" alignment="centered"/>
  <white-space type="vertical" amount="24"/>
  <author font="Baskerville" size="18/22"
  style="italic">Vitam capias</author>
  <white-space type="vertical" role="filler"/>
  </titlepage>
  Or they can be anywhere between: a lot will depend on how you want to define your
  document type (or whose you use) and what it will be used for. Database-generated or
  program-generated XML documents used in e-commerce are usually unformatted (not for
  human reading) and may use very long names or values, with multiple redundancy and
  sometimes no character data content at all, just values in attributes:
  <?xml version="1.0"?>
  <ORDER-UPDATE
  AUTHMD5="4baf7d7cff5faa3ce67acf66ccda8248"
  ORDER-UPDATE-ISSUE="193E22C2-EAF3-11D9-9736-CAFC705A30B3"
  ORDER-UPDATE-DATE="2005-07-01T15:34:22.46"
  ORDER-UPDATE-DESTINATION="6B197E02-EAF3-11D9-85D5-997710D9978F"
  ORDER-UPDATE-ORDERNO="8316ADEA-EAF3-11D9-9955-D289ECBC99F3">
  <ORDER-UPDATE-DELTA-MODIFICATION-DETAIL ORDER-UPDATE-ID="BAC352437484">
  <ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56"
  ORDER-UPDATE-QUANTITY="2000"/>
  </ORDER-UPDATE-DELTA-MODIFICATION-DETAIL>
  </ORDER-UPDATE>
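The parts described above (root element, child elements, character data, attributes) can be inspected with a few lines of Python. This is only a sketch using the standard library's ElementTree; the sample strings are shortened from the examples in this answer.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" standalone="yes"?>
<conversation>
  <greeting>Hello, world!</greeting>
  <response>Stop the planet, I want to get off!</response>
</conversation>"""

root = ET.fromstring(doc)                      # the root element
children = [(el.tag, el.text) for el in root]  # elements and their character data

# A data-oriented element may carry everything in attributes and have no text.
value = ET.fromstring(
    '<ORDER-UPDATE-DELTA-MODIFICATION-VALUE ORDER-UPDATE-ITEM="56" '
    'ORDER-UPDATE-QUANTITY="2000"/>'
)
quantity = int(value.get("ORDER-UPDATE-QUANTITY"))

print(root.tag, children, value.text, quantity)
```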

28. What is a markup language?

A markup language is a set of words and symbols for describing the identity of pieces of
  a document (for example ‘this is a paragraph’, ‘this is a heading’, ‘this is a list’, ‘this is
  the caption of this figure’, etc). Programs can use this with a style sheet to create output
  for screen, print, audio, video, Braille, etc.
  Some markup languages (e.g. those used in word processors) only describe appearances
  (‘this is italics’, ‘this is bold’), but this method can only be used for display, and is not
  normally re-usable for anything else.

29. What is DOM and how does it relate to XML?

The Document Object Model (DOM) is an interface specification maintained by the W3C
  DOM Working Group that defines an application-independent mechanism to access, parse, or
  update XML data. In simple terms it is a hierarchical model that allows developers to
  manipulate XML documents easily. Any developer who has worked extensively with
  XML should be able to discuss the concept and use of DOM objects freely. Additionally,
  it is not unreasonable to expect advanced candidates to thoroughly understand its internal
  workings and be able to explain how DOM differs from an event-based interface like
  SAX.
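A candidate might illustrate the difference along these lines. The sketch below uses Python's standard xml.dom.minidom and xml.sax modules; the sample document and the NameCollector class are made up for the illustration.

```python
import xml.dom.minidom
import xml.sax

doc = ("<conversation><greeting>Hello, world!</greeting>"
       "<response>Stop!</response></conversation>")

# DOM: the whole document is loaded into a tree that can be navigated and updated.
dom = xml.dom.minidom.parseString(doc)
greeting = dom.getElementsByTagName("greeting")[0]
greeting.firstChild.data = "Hello again!"        # change the text node in place
dom_result = dom.documentElement.toxml()

# SAX: the parser reports events to a handler as it reads; no tree is built.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

handler = NameCollector()
xml.sax.parseString(doc.encode("utf-8"), handler)

print(dom_result)
print(handler.names)
```

The DOM version can modify the document because it holds all of it in memory; the SAX version only sees each start-tag once as it goes past, which is why it suits very large inputs.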

30. What is SGML?

SGML is the Standard Generalized Markup Language (ISO 8879:1986), the international
  standard for defining descriptions of the structure of different types of electronic
  document. There is an SGML FAQ from David Megginson at
  http://math.albany.edu:8800/hm/sgml/cts-faq.html; and Robin Cover's SGML Web
  pages are at http://www.oasis-open.org/cover/general.html. For a little light relief, try Joe
  English's ‘Not the SGML FAQ’ at http://www.flightlab.com/~joe/sgml/faq-not.txt.
  SGML is very large, powerful, and complex. It has been in heavy industrial and
  commercial use for nearly two decades, and there is a significant body of expertise and
  software to go with it.
  XML is a lightweight cut-down version of SGML which keeps enough of its
  functionality to make it useful but removes all the optional features which made SGML
  too complex to program for in a Web environment.

31. What is SOAP and how does it relate to XML?

The Simple Object Access Protocol (SOAP) uses XML to define a protocol for the
  exchange of information in distributed computing environments. SOAP consists of three
  components: an envelope, a set of encoding rules, and a convention for representing
  remote procedure calls. Unless experience with SOAP is a direct requirement for the
  open position, knowing the specifics of the protocol, or how it can be used in conjunction
  with HTTP, is not as important as identifying it as a natural application of XML.
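For orientation only, a SOAP message is an ordinary XML document whose root is an Envelope element. The sketch below builds one with Python's ElementTree, assuming the SOAP 1.1 envelope namespace; the application namespace, the GetPrice call and the Symbol parameter are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/stock"                     # hypothetical application namespace

ET.register_namespace("soap", SOAP_ENV)

envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
call = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")     # hypothetical remote procedure call
ET.SubElement(call, f"{{{APP_NS}}}Symbol").text = "XYZ"

message = ET.tostring(envelope, encoding="unicode")
print(message)
```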

32. What is the difference between XML and C or C++ or Java?

C and C++ (and other languages like FORTRAN, or Pascal, or Visual Basic, or Java or
  hundreds more) are programming languages with which you specify calculations, actions,
  and decisions to be carried out in order:
  mod curconfig[if left(date,6) = "01-Apr",
  t.put "April Fool!",
  f.put days('31102005','DDMMYYYY') -
  days(sdate,'DDMMYYYY')
  " more shopping days to Samhain"];
  XML is a markup specification language with which you can design ways of describing
  information (text or data), usually for storage, transmission, or processing by a program.
  It says nothing about what you should do with the data (although your choice of element
  names may hint at what they are for):
  <part num="DA42" models="LS AR DF HG KJ"
  update="2001-11-22">
  <name>Camshaft end bearing retention circlip</name>
  <image drawing="RR98-dh37" type="SVG" x="476"
  y="226"/>
  <maker id="RQ778">Ringtown Fasteners Ltd</maker>
  <notes>Angle-nosed insertion tool <tool
  id="GH25"/> is required for the removal
  and replacement of this part.</notes>
  </part>
  On its own, an SGML or XML file (including HTML) doesn't do anything. It's a data
  format which just sits there until you run a program which does something with it.
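To make the division of labour concrete, here is a small Python sketch in which the program supplies the actions and the XML supplies only the data. The part record is shortened from the example above; the summary format is an arbitrary choice made for the illustration.

```python
import xml.etree.ElementTree as ET

# The XML only describes the part; it does nothing by itself.
part_xml = """<part num="DA42" models="LS AR DF HG KJ" update="2001-11-22">
  <name>Camshaft end bearing retention circlip</name>
  <maker id="RQ778">Ringtown Fasteners Ltd</maker>
</part>"""

# The program decides what to do with it.
part = ET.fromstring(part_xml)
models = part.get("models").split()
summary = (f'{part.get("num")}: {part.findtext("name")} '
           f'({part.findtext("maker")}), fits {len(models)} models')
print(summary)
```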

33. What is the relationship between XML namespaces and the XML 1.0 recommendation?

Although the XML 1.0 recommendation anticipated the need for XML namespaces by
  noting that element type and attribute names should not include colons, it did not actually
  support XML namespaces. Thus, XML namespaces are layered on top of XML 1.0. In
  particular, any XML document that uses XML namespaces is a legal XML 1.0 document
  and can be interpreted as such in the absence of XML namespaces. For example, consider
  the following document:
  <google:A xmlns:google="http://www.google.org/">
  <google:B google:C="bar"/>
  </google:A>
  If this document is processed by a namespace-unaware processor, that processor will see
  two elements whose names are google:A and google:B. The google:A element has an
  attribute named xmlns:google and the google:B element has an attribute named google:C.
  On the other hand, a namespace-aware processor will see two elements with universal
  names {http://www.google.org/}A and {http://www.google.org/}B. The
  {http://www.google.org/}A element does not have any attributes; instead, it has a namespace
  declaration that maps the google prefix to the URI http://www.google.org/. The
  {http://www.google.org/}B element has an attribute named {http://www.google.org/}C.
  Needless to say, this has led to a certain amount of confusion. One area of confusion is
  the relationship between XML namespaces and validating XML documents against
  DTDs. This occurs because the XML namespaces recommendation did not describe how
  to use XML namespaces with DTDs. Fortunately, a similar situation does not occur with
  XML schema languages, as all of these support XML namespaces.
  The other main area of confusion is in recommendations and specifications such as DOM
  and SAX whose first version predates the XML namespaces recommendation. Although
  these have since been updated to include XML namespace support, the solutions have not
  always been pretty due to backwards compatibility requirements. All recommendations in
  the XML family now support XML namespaces.
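Both views of the example document can be reproduced with Python's standard library. In this sketch ElementTree plays the namespace-aware processor, and xml.sax in its default mode (namespace processing switched off) plays the namespace-unaware one; the RawNames class is a made-up helper.

```python
import xml.etree.ElementTree as ET
import xml.sax

doc = ('<google:A xmlns:google="http://www.google.org/">'
       '<google:B google:C="bar"/></google:A>')

# Namespace-aware view: universal names, and no attribute for the declaration.
root = ET.fromstring(doc)
child = root[0]
aware = (root.tag, dict(root.attrib), child.tag, dict(child.attrib))

# Namespace-unaware view: colons are just characters in the names.
class RawNames(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        self.seen.append((name, sorted(attrs.getNames())))

raw = RawNames()
xml.sax.parseString(doc.encode("utf-8"), raw)

print(aware)
print(raw.seen)
```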

34. What is XML?

XML is the Extensible Markup Language. It improves the functionality of the Web by
  letting you identify your information in a more accurate, flexible, and adaptable way.
  It is extensible because it is not a fixed format like HTML (which is a single, predefined
  markup language). Instead, XML is actually a metalanguage (a language for describing
  other languages) which lets you design your own markup languages for limitless
  different types of documents. XML can do this because it's written in SGML, the
  international standard metalanguage for text document markup (ISO 8879).

35. What's a Document Type Definition (DTD) and where do I get one?

A DTD is a description in XML Declaration Syntax of a particular type or class of
  document. It sets out what names are to be used for the different types of element, where
  they may occur, and how they all fit together. (A Schema does the same
  thing in XML Document Syntax, and allows more extensive data-checking.)

For example, if you want a document type to be able to describe Lists which contain
  Items, the relevant part of your DTD might contain something like this:
  <!ELEMENT List (Item)+>
  <!ELEMENT Item (#PCDATA)>
  This defines a list as an element type containing one or more items (that's the plus sign);
  and it defines items as element types containing just plain text (Parsed Character Data or
  PCDATA). Validators read the DTD before they read your document so that they can
  identify where every element type ought to come and how each relates to the other, so
  that applications which need to know this in advance (most editors, search engines,
  navigators, and databases) can set themselves up correctly. The example above lets you
  create lists like:
  <List>
  <Item>Chocolate</Item>
  <Item>Music</Item>
  <Item>Surfing</Item>
  </List>
  (The indentation in the example is just for legibility while editing: it is not required by
  XML.)
  A DTD provides applications with advance notice of what names and structures can be
  used in a particular document type. Using a DTD and a validating editor means you can
  be certain that all documents of that particular type will be constructed and named in a
  consistent and conformant manner.
  DTDs are not required for processing well-formed documents, but
  they are needed if you want to take advantage of XML's special attribute types like the
  built-in ID/IDREF cross-reference mechanism; or the use of default attribute values; or
  references to external non-XML files (‘Notations’); or if you simply want a check on
  document validity before processing.
  There are thousands of DTDs already in existence in all kinds of areas (see the
  SGML/XML Web pages for pointers). Many of them can be downloaded and used freely;
  or you can write your own (see the question on creating your own DTD). Old SGML
  DTDs need to be converted to XML for use with XML systems: read the question on
  converting SGML DTDs to XML, but most popular SGML DTDs are already available
  in XML form.
  The alternatives to a DTD are various forms of Schema. These provide
  more extensive validation features than DTDs, including character data content
  validation.
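The difference between reading a DTD and validating against it can be seen with a non-validating parser. The sketch below uses Python's ElementTree (built on expat, which does not validate) with the List/Item declarations from above in an internal subset; the kind attribute and its default value are hypothetical additions made to show attribute defaulting.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE List [
  <!ELEMENT List (Item)+>
  <!ELEMENT Item (#PCDATA)>
  <!ATTLIST Item kind CDATA "hobby">
]>
<List>
  <Item>Chocolate</Item>
  <Item kind="sport">Surfing</Item>
</List>"""

# The declarations are read, so the default attribute value is supplied
# for the Item that did not specify one.
items = ET.fromstring(doc).findall("Item")
kinds = [item.get("kind") for item in items]

# But nothing is validated: an empty List is accepted even though the
# content model (Item)+ requires at least one Item.
empty = ET.fromstring("<!DOCTYPE List [<!ELEMENT List (Item)+>]><List/>")

print(kinds, len(empty))
```

A validating parser or editor would reject the second document; checking of that kind needs software beyond what this sketch uses.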

36. When should I use a CDATA Marked Section?

You should almost never need to use CDATA Sections. The CDATA mechanism was
  designed to let an author quote fragments of text containing markup characters (the
  open-angle-bracket and the ampersand), for example when documenting XML (this FAQ uses
  CDATA Sections quite a lot, for obvious reasons). A CDATA Section turns off markup
  recognition for the duration of the section (it gets turned on again only by the closing
  sequence of double end-square-brackets and a close-angle-bracket).
  Consequently, nothing in a CDATA section can ever be recognised as anything to do
  with markup: it's just a string of opaque characters, and if you use an XML
  transformation language like XSLT, any markup characters in it will get turned into their
  character entity equivalent.
  If you try, for example, to use:
  some text with <![CDATA[<em>markup</em>]]> in it.
  in the expectation that the embedded markup would remain untouched, it won't: it will
  just output
  some text with &lt;em&gt;markup&lt;/em&gt; in it.
  In other words, CDATA Sections cannot preserve the embedded markup as markup.
  Normally this is exactly what you want because this technique was designed to let people
  do things like write documentation about markup. It was not designed to allow the
  passing of little chunks of (possibly invalid) unparsed HTML embedded inside your own
  XML through to a subsequent process—because that would risk invalidating the output.
  As a result you cannot expect to keep markup untouched simply because it looked as if it
  was safely ‘hidden’ inside a CDATA section: it can't be used as a magic shield to
  preserve HTML markup for future use as markup, only as characters.
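This behaviour is easy to check. The following sketch parses a CDATA Section with Python's ElementTree and writes the result back out; the enclosing p element is an assumption added so that the fragment is a complete document.

```python
import xml.etree.ElementTree as ET

src = "<p>some text with <![CDATA[<em>markup</em>]]> in it.</p>"
p = ET.fromstring(src)

children = list(p)   # no em element was created inside p
text = p.text        # the CDATA content arrives as plain characters

# On output the markup characters are written as character entity references.
out = ET.tostring(p, encoding="unicode")
print(out)
```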

37. Where should I use XML?

The XML specification describes its purpose this way: ‘Its goal is to enable generic
  SGML to be served, received, and processed on the Web in the way that is now possible
  with HTML. XML has been designed for ease of implementation and for interoperability
  with both SGML and HTML.’
  Despite early attempts, browsers never allowed other SGML, only HTML (although there
  were plugins), and they allowed it (even encouraged it) to be corrupted or broken, which
  held development back for over a decade by making it impossible to program for it
  reliably. XML fixes that by making it compulsory to stick to the rules, and by making the
  rules much simpler than SGML.
  But XML is not just for Web pages: in fact it's very rarely used for Web pages on its own
  because browsers still don't provide reliable support for formatting and transforming it.
  Common uses for XML include:
  Information identification
  Because you can define your own markup, you can define meaningful names for all your
  information items.
  Information storage
  Because XML is portable and non-proprietary, it can be used to store textual information
  across any platform. Because it is backed by an international standard, it will remain
  accessible and processable as a data format.
  Information structure
  XML can therefore be used to store and identify any kind of (hierarchical) information
  structure, especially for long, deep, or complex document sets or data sources, making it
  ideal for an information-management back-end to serving the Web. This is its most
  common Web application, with a transformation system to serve it as HTML until such
  time as browsers are able to handle XML consistently.
  Publishing
  The original goal of XML as defined in the quotation at the start of this section.
  Combining the three previous topics (identity, storage, structure) means it is possible to
  get all the benefits of robust document management and control (with XML) and publish
  to the Web (as HTML) as well as to paper (as PDF) and to other formats (e.g. Braille,
  audio, etc) from a single source document by using the appropriate style sheets.
  Messaging and data transfer
  XML is also very heavily used for enclosing or encapsulating information in order to pass
  it between different computing systems which would otherwise be unable to
  communicate. By providing a lingua franca for data identity and structure, it provides a
  common envelope for inter-process communication (messaging).
  Web services
  Building on all of these, as well as its use in browsers, machine-processable data can be
  exchanged between consenting systems, where before it was only comprehensible by
  humans (HTML). Weather services, e-commerce sites, blog newsfeeds, AJAX sites, and
  thousands of other data-exchange services use XML for data management and
  transmission, and the web browser for display and interaction.

38. Which parts of an XML document are case-sensitive?

All of it, both markup and text. This is significantly different from HTML and most other
  SGML applications. It was done to allow markup in non-Latin-alphabet languages, and to
  obviate problems with case-folding in writing systems which are caseless.
  * Element type names are case-sensitive: you must follow whatever combination of
  upper- or lower-case you use to define them (either by first usage or in a DTD or
  Schema). So you can't say <BODY>…</body>: upper- and lower-case must match; thus
  <Img/>, <IMG/>, and <img/> are three different element types;
  * For well-formed XML documents with no DTD, the first occurrence of an element type
  name defines the casing;
  * Attribute names are also case-sensitive, for example the two width attributes in <PIC
  width="7in"/> and <PIC WIDTH="6in"/> (if they occurred in the same file) are separate
  attributes, because of the different case of width and WIDTH;
  * Attribute values are also case-sensitive. CDATA values (eg Url="MyFile.SGML")
  always have been, but NAME types (ID and IDREF attributes, and token list attributes)
  are now case-sensitive as well;
  * All general and parameter entity names (eg &Aacute;), and your data content (text), are
  case-sensitive as always.
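The rules in this list can be confirmed with any XML parser. A minimal sketch with Python's ElementTree follows; the wrapper element r is an assumption added so that the three img variants can sit in one document.

```python
import xml.etree.ElementTree as ET

# Start-tag and end-tag must match exactly, including case.
try:
    ET.fromstring("<BODY>text</body>")
    mismatch_ok = True
except ET.ParseError:
    mismatch_ok = False

# width and WIDTH are two separate attributes, so both may appear on one element.
pic = ET.fromstring('<PIC width="7in" WIDTH="6in"/>')
attrs = sorted(pic.attrib)

# Img, IMG and img are three different element types.
tags = {el.tag for el in ET.fromstring("<r><Img/><IMG/><img/></r>")}

print(mismatch_ok, attrs, len(tags))
```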

39. Who is responsible for XML?

XML is a project of the World Wide Web Consortium (W3C), and the development of
  the specification is supervised by an XML Working Group. A Special Interest Group of
  co-opted contributors and experts from various fields contributed comments and reviews
  by email.
  XML is a public format: it is not a proprietary development of any company, although the
  membership of the WG and the SIG represented companies as well as research and
  academic institutions. The v1.0 specification was accepted by the W3C as a
  Recommendation on Feb 10, 1998.

40. Why is XML such an important development?

It removes two constraints which were holding back Web developments:
  1. dependence on a single, inflexible document type (HTML) which was being much
  abused for tasks it was never designed for;
  2. the complexity of full SGML, whose syntax allows many powerful but
  hard-to-program options.
  XML allows the flexible development of user-defined document types. It provides a
  robust, non-proprietary, persistent, and verifiable file format for the storage and
  transmission of text and data both on and off the Web; and it removes the more complex
  options of SGML, making it easier to program for.