解析 XML 数据

可扩展标记语言 (XML) 是一组用于将文档编码为机器可读格式的规则。XML 是一种在互联网上共享数据的常用格式。

经常更新内容的网站（例如新闻网站或博客）通常会提供 XML feed，以便外部程序可以及时了解内容变化。上传和解析 XML 数据是联网应用的一项常见任务。本文档介绍了如何解析 XML 文档并使用其中的数据。

要详细了解如何在您的 Android 应用中创建基于网页的内容，请参阅基于网页的内容。

选择解析器

我们推荐 XmlPullParser，它是一种在 Android 上解析 XML 的高效且易于维护的方式。Android 提供了此接口的两种实现

KXmlParser，使用 XmlPullParserFactory.newPullParser()
ExpatPullParser，使用 Xml.newPullParser()

两种选择均可。本节中的示例使用 ExpatPullParser 和 Xml.newPullParser()。

分析 Feed

解析 feed 的第一步是确定您感兴趣的字段。解析器将提取这些字段的数据，并忽略其余字段。

请查看示例应用中已解析 feed 的以下摘录。发到 StackOverflow.com 的每篇帖子都以 entry 标签的形式显示在 feed 中，其中包含几个嵌套标签

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ...">
<title type="text">newest questions tagged android - Stack Overflow</title>
...
    <entry>
    ...
    </entry>
    <entry>
        <id>http://stackoverflow.com/q/9439999</id>
        <re:rank scheme="http://stackoverflow.com">0</re:rank>
        <title type="text">Where is my data file?</title>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="android"/>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="file"/>
        <author>
            <name>cliff2310</name>
            <uri>http://stackoverflow.com/users/1128925</uri>
        </author>
        <link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" />
        <published>2012-02-25T00:30:54Z</published>
        <updated>2012-02-25T00:30:54Z</updated>
        <summary type="html">
            <p>I have an Application that requires a data file...</p>

        </summary>
    </entry>
    <entry>
    ...
    </entry>
...
</feed>

示例应用会提取 entry 标签及其嵌套标签 title、link 和 summary 的数据。

实例化解析器

解析 feed 的下一步是实例化解析器并启动解析过程。此代码段初始化一个解析器，使其不处理命名空间，并使用提供的 InputStream 作为输入。它通过调用 nextTag() 启动解析过程，并调用 readFeed() 方法，该方法提取和处理应用感兴趣的数据

Kotlin

// We don't use namespaces.
private val ns: String? = null

class StackOverflowXmlParser {

    @Throws(XmlPullParserException::class, IOException::class)
    fun parse(inputStream: InputStream): List<*> {
        inputStream.use { inputStream ->
            val parser: XmlPullParser = Xml.newPullParser()
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false)
            parser.setInput(inputStream, null)
            parser.nextTag()
            return readFeed(parser)
        }
    }
 ...
}

Java

public class StackOverflowXmlParser {
    // We don't use namespaces.
    private static final String ns = null;

    public List parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
 ...
}

读取 Feed

readFeed() 方法执行处理 feed 的实际工作。它查找标记为“entry”的元素，作为递归处理 feed 的起点。如果某个标签不是 entry 标签，则跳过它。递归处理完整个 feed 后，readFeed() 返回一个包含从 feed 中提取的条目（包括嵌套数据成员）的 List。然后，此 List 由解析器返回。

Kotlin

@Throws(XmlPullParserException::class, IOException::class)
private fun readFeed(parser: XmlPullParser): List<Entry> {
    val entries = mutableListOf<Entry>()

    parser.require(XmlPullParser.START_TAG, ns, "feed")
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        // Starts by looking for the entry tag.
        if (parser.name == "entry") {
            entries.add(readEntry(parser))
        } else {
            skip(parser)
        }
    }
    return entries
}

Java

private List readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List entries = new ArrayList();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag.
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}

解析 XML

解析 XML feed 的步骤如下：

如分析 feed 中所述，确定您想要包含在应用中的标签。此示例提取 entry 标签及其嵌套标签 title、link 和 summary 的数据。
创建以下方法
- 为您要包含的每个标签创建一个“read”方法，例如 readEntry() 和 readTitle()。解析器从输入流中读取标签。当遇到（在此示例中）名为 entry、title、link 或 summary 的标签时，它会调用该标签的相应方法。否则，会跳过该标签。
- 用于提取每种不同类型标签的数据并将解析器推进到下一个标签的方法。在此示例中，相关方法如下：
  - 对于 title 和 summary 标签，解析器会调用 readText()。此方法通过调用 parser.getText() 来提取这些标签的数据。
  - 对于 link 标签，解析器会首先确定链接是否属于它感兴趣的类型，然后提取链接数据。然后使用 parser.getAttributeValue() 提取链接的值。
  - 对于 entry 标签，解析器会调用 readEntry()。此方法解析 entry 的嵌套标签，并返回一个包含 title、link 和 summary 数据成员的 Entry 对象。
- 一个递归的辅助方法 skip()。有关此主题的更多讨论，请参阅跳过您不关心的标签。

此代码段展示了解析器如何解析条目、标题、链接和摘要。

Kotlin

data class Entry(val title: String?, val summary: String?, val link: String?)

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
@Throws(XmlPullParserException::class, IOException::class)
private fun readEntry(parser: XmlPullParser): Entry {
    parser.require(XmlPullParser.START_TAG, ns, "entry")
    var title: String? = null
    var summary: String? = null
    var link: String? = null
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        when (parser.name) {
            "title" -> title = readTitle(parser)
            "summary" -> summary = readSummary(parser)
            "link" -> link = readLink(parser)
            else -> skip(parser)
        }
    }
    return Entry(title, summary, link)
}

// Processes title tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readTitle(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "title")
    val title = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "title")
    return title
}

// Processes link tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readLink(parser: XmlPullParser): String {
    var link = ""
    parser.require(XmlPullParser.START_TAG, ns, "link")
    val tag = parser.name
    val relType = parser.getAttributeValue(null, "rel")
    if (tag == "link") {
        if (relType == "alternate") {
            link = parser.getAttributeValue(null, "href")
            parser.nextTag()
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link")
    return link
}

// Processes summary tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readSummary(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "summary")
    val summary = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "summary")
    return summary
}

// For the tags title and summary, extracts their text values.
@Throws(IOException::class, XmlPullParserException::class)
private fun readText(parser: XmlPullParser): String {
    var result = ""
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.text
        parser.nextTag()
    }
    return result
}
...

Java

public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String tag = parser.getName();
    String relType = parser.getAttributeValue(null, "rel");
    if (tag.equals("link")) {
        if (relType.equals("alternate")){
            link = parser.getAttributeValue(null, "href");
            parser.nextTag();
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}

跳过您不关心的标签

解析器需要跳过它不感兴趣的标签。以下是解析器的 skip() 方法

Kotlin

@Throws(XmlPullParserException::class, IOException::class)
private fun skip(parser: XmlPullParser) {
    if (parser.eventType != XmlPullParser.START_TAG) {
        throw IllegalStateException()
    }
    var depth = 1
    while (depth != 0) {
        when (parser.next()) {
            XmlPullParser.END_TAG -> depth--
            XmlPullParser.START_TAG -> depth++
        }
    }
}

Java

private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
 }

其工作方式如下：

如果当前事件不是 START_TAG，则抛出异常。
它会消耗 START_TAG 以及从 START_TAG 到匹配的 END_TAG（包括 END_TAG）的所有事件。
它会跟踪嵌套深度，以确保在正确的 END_TAG 处停止，而不是在遇到原始 START_TAG 后的第一个标签处停止。

因此，如果当前元素包含嵌套元素，则 depth 的值不会是 0，直到解析器消耗完原始 START_TAG 与其匹配的 END_TAG 之间的所有事件。例如，考虑解析器如何跳过包含两个嵌套元素 <name> 和 <uri> 的 <author> 元素：

在第一个 while 循环中，解析器在 <author> 之后遇到的下一个标签是 <name> 的 START_TAG。depth 的值增加到 2。
在第二个 while 循环中，解析器遇到的下一个标签是 END_TAG </name>。depth 的值减小到 1。
在第三个 while 循环中，解析器遇到的下一个标签是 START_TAG <uri>。depth 的值增加到 2。
在第四个 while 循环中，解析器遇到的下一个标签是 END_TAG </uri>。depth 的值减小到 1。
在第五个也是最后一个 while 循环中，解析器遇到的下一个标签是 END_TAG </author>。depth 的值减小到 0，表明 <author> 元素已成功跳过。

使用 XML 数据

示例应用会异步获取和解析 XML feed。这会将处理过程从主 UI 线程中移除。处理完成后，应用会更新其主 Activity (NetworkActivity) 中的 UI。

在以下摘录中，loadPage() 方法执行以下操作：

使用 XML feed 的 URL 初始化一个字符串变量。
如果用户的设置和网络连接允许，则调用 downloadXml(url) 方法。此方法下载并解析 feed，并返回一个要显示在 UI 中的字符串结果。

Kotlin

class NetworkActivity : Activity() {

    companion object {

        const val WIFI = "Wi-Fi"
        const val ANY = "Any"
        const val SO_URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest"
        // Whether there is a Wi-Fi connection.
        private var wifiConnected = false
        // Whether there is a mobile connection.
        private var mobileConnected = false

        // Whether the display should be refreshed.
        var refreshDisplay = true
        // The user's current network preference setting.
        var sPref: String? = null
    }
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    fun loadPage() {

        if (sPref.equals(ANY) && (wifiConnected || mobileConnected)) {
            downloadXml(SO_URL)
        } else if (sPref.equals(WIFI) && wifiConnected) {
            downloadXml(SO_URL)
        } else {
            // Show error.
        }
    }
    ...
}

Java

public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    public void loadPage() {

        if((sPref.equals(ANY)) && (wifiConnected || mobileConnected)) {
            downloadXml(URL);
        }
        else if ((sPref.equals(WIFI)) && (wifiConnected)) {
            downloadXml(URL);
        } else {
            // Show error.
        }
    }

downloadXml 方法调用 Kotlin 中的以下方法：

lifecycleScope.launch(Dispatchers.IO)，它使用 Kotlin 协程在 IO 线程上启动方法 loadXmlFromNetwork()。它将 feed URL 作为参数传递。方法 loadXmlFromNetwork() 获取并处理 feed。完成后，它会返回一个结果字符串。
withContext(Dispatchers.Main)，它使用 Kotlin 协程返回主线程，获取返回的字符串，并在 UI 中显示。

在 Java 编程语言中，过程如下：

一个 Executor 在后台线程上执行 loadXmlFromNetwork() 方法。它将 feed URL 作为参数传递。方法 loadXmlFromNetwork() 获取并处理 feed。完成后，它会返回一个结果字符串。
一个 Handler 调用 post 返回主线程，获取返回的字符串，并在 UI 中显示。

Kotlin

// Implementation of Kotlin coroutines used to download XML feed from stackoverflow.com.
private fun downloadXml(vararg urls: String) {
    var result: String? = null
    lifecycleScope.launch(Dispatchers.IO) {
        result = try {
            loadXmlFromNetwork(urls[0])
        } catch (e: IOException) {
            resources.getString(R.string.connection_error)
        } catch (e: XmlPullParserException) {
            resources.getString(R.string.xml_error)
        }
        withContext(Dispatchers.Main) {
            setContentView(R.layout.main)
            // Displays the HTML string in the UI via a WebView.
            findViewById<WebView>(R.id.webview)?.apply {
                loadData(result?: "", "text/html", null)
            }
        }
    }
}

Java

// Implementation of Executor and Handler used to download XML feed asynchronously from stackoverflow.com.
private void downloadXml(String... urls) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Handler handler = new Handler(Looper.getMainLooper());
    executor.execute(() -> {
        String result;
            try {
                result = loadXmlFromNetwork(urls[0]);
            } catch (IOException e) {
                result = getResources().getString(R.string.connection_error);
            } catch (XmlPullParserException e) {
                result = getResources().getString(R.string.xml_error);
            }
        String finalResult = result;
        handler.post(() -> {
            setContentView(R.layout.main);
            // Displays the HTML string in the UI via a WebView.
            WebView myWebView = (WebView) findViewById(R.id.webview);
            myWebView.loadData(finalResult, "text/html", null);
        });
    });
}

从 downloadXml 调用的 loadXmlFromNetwork() 方法在下一个代码段中显示。它执行以下操作：

实例化一个 StackOverflowXmlParser。它还创建了用于存储 List 类型的 Entry 对象 (entries) 以及用于存储从 XML feed 中提取的 title、url 和 summary 值的变量。
调用 downloadUrl()，该方法获取 feed 并将其作为 InputStream 返回。
使用 StackOverflowXmlParser 解析 InputStream。StackOverflowXmlParser 使用 feed 中的数据填充 entries List。
处理 entries List 并将 feed 数据与 HTML 标记组合。
返回一个 HTML 字符串，该字符串显示在主 Activity UI 中。

Kotlin

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
@Throws(XmlPullParserException::class, IOException::class)
private fun loadXmlFromNetwork(urlString: String): String {
    // Checks whether the user set the preference to include summary text.
    val pref: Boolean = PreferenceManager.getDefaultSharedPreferences(this)?.run {
        getBoolean("summaryPref", false)
    } ?: false

    val entries: List<Entry> = downloadUrl(urlString)?.use { stream ->
        // Instantiates the parser.
        StackOverflowXmlParser().parse(stream)
    } ?: emptyList()

    return StringBuilder().apply {
        append("<h3>${resources.getString(R.string.page_title)}</h3>")
        append("<em>${resources.getString(R.string.updated)} ")
        append("${formatter.format(rightNow.time)}</em>")
        // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
        // Each Entry object represents a single post in the XML feed.
        // This section processes the entries list to combine each entry with HTML markup.
        // Each entry is displayed in the UI as a link that optionally includes
        // a text summary.
        entries.forEach { entry ->
            append("<p><a href='")
            append(entry.link)
            append("'>" + entry.title + "</a></p>")
            // If the user set the preference to include summary text,
            // adds it to the display.
            if (pref) {
                append(entry.summary)
            }
        }
    }.toString()
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
@Throws(IOException::class)
private fun downloadUrl(urlString: String): InputStream? {
    val url = URL(urlString)
    return (url.openConnection() as? HttpURLConnection)?.run {
        readTimeout = 10000
        connectTimeout = 15000
        requestMethod = "GET"
        doInput = true
        // Starts the query.
        connect()
        inputStream
    }
}

Java

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiates the parser.
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List<Entry> entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text.
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("<h3>" + getResources().getString(R.string.page_title) + "</h3>");
    htmlString.append("<em>" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "</em>");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
     }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("<p><a href='");
        htmlString.append(entry.link);
        htmlString.append("'>" + entry.title + "</a></p>");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query.
    conn.connect();
    return conn.getInputStream();
}

解析 XML 数据 通过收藏集保持井然有序 根据您的偏好保存内容并分类。

选择解析器

分析 Feed

实例化解析器

Kotlin

Java

读取 Feed

Kotlin

Java

解析 XML

Kotlin

Java

跳过您不关心的标签

Kotlin

Java

使用 XML 数据

Kotlin

Java

Kotlin

Java

Kotlin

Java

解析 XML 数据