解析 XML 数据

可扩展标记语言 (XML) 是一组用于以机器可读格式编码文档的规则。XML 是一种在互联网上共享数据的常用格式。

经常更新其内容的网站(例如新闻网站或博客)通常会提供 XML 提要,以便外部程序能够及时了解内容更改。上传和解析 XML 数据是网络连接应用的常见任务。本主题说明如何解析 XML 文档并使用其数据。

要详细了解如何在 Android 应用中创建基于 Web 的内容,请参阅 基于 Web 的内容

选择解析器

我们推荐使用 XmlPullParser,这是一种在 Android 上解析 XML 的高效且易于维护的方法。Android 有两种此接口的实现

两种选择都可以。本节中的示例使用 ExpatPullParserXml.newPullParser()

分析提要

解析提要的第一步是确定您感兴趣的字段。解析器会提取这些字段的数据,并忽略其余字段。

请参阅示例应用中已解析提要的以下摘录。发布到 StackOverflow.com 的每个帖子在提要中都显示为一个 entry 标签,其中包含多个嵌套标签

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ...">
<title type="text">newest questions tagged android - Stack Overflow</title>
...
    <entry>
    ...
    </entry>
    <entry>
        <id>http://stackoverflow.com/q/9439999</id>
        <re:rank scheme="http://stackoverflow.com">0</re:rank>
        <title type="text">Where is my data file?</title>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="android"/>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="file"/>
        <author>
            <name>cliff2310</name>
            <uri>http://stackoverflow.com/users/1128925</uri>
        </author>
        <link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" />
        <published>2012-02-25T00:30:54Z</published>
        <updated>2012-02-25T00:30:54Z</updated>
        <summary type="html">
            <p>I have an Application that requires a data file...</p>

        </summary>
    </entry>
    <entry>
    ...
    </entry>
...
</feed>

示例应用提取 entry 标签及其嵌套标签 titlelinksummary 的数据。

实例化解析器

解析提要的下一步是实例化解析器并启动解析过程。此代码段初始化一个解析器,使其不处理命名空间,并使用提供的 InputStream 作为其输入。它使用对 nextTag() 的调用启动解析过程,并调用 readFeed() 方法,该方法提取并处理应用感兴趣的数据

Kotlin

// We don't use namespaces.
private val ns: String? = null

class StackOverflowXmlParser {

    @Throws(XmlPullParserException::class, IOException::class)
    fun parse(inputStream: InputStream): List<*> {
        inputStream.use { inputStream ->
            val parser: XmlPullParser = Xml.newPullParser()
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false)
            parser.setInput(inputStream, null)
            parser.nextTag()
            return readFeed(parser)
        }
    }
 ...
}

Java

public class StackOverflowXmlParser {
    // We don't use namespaces.
    private static final String ns = null;

    public List parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
 ...
}

读取提要

readFeed() 方法执行处理 Feed 的实际工作。它将带“entry”标签的元素作为递归处理 Feed 的起点。如果某个标签不是 entry 标签,则跳过它。递归处理完整个 Feed 后,readFeed() 返回一个包含从 Feed 中提取的条目(包括嵌套数据成员)的 List。然后,解析器返回此 List

Kotlin

@Throws(XmlPullParserException::class, IOException::class)
private fun readFeed(parser: XmlPullParser): List<Entry> {
    val entries = mutableListOf<Entry>()

    parser.require(XmlPullParser.START_TAG, ns, "feed")
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        // Starts by looking for the entry tag.
        if (parser.name == "entry") {
            entries.add(readEntry(parser))
        } else {
            skip(parser)
        }
    }
    return entries
}

Java

private List readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List entries = new ArrayList();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag.
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}

解析 XML

解析 XML Feed 的步骤如下所示

  1. 分析 Feed 中所述,识别您希望在应用中包含的标签。此示例提取 entry 标签及其嵌套标签(titlelinksummary)的数据。
  2. 创建以下方法

    • 要包含的每个标签的“读取”方法,例如 readEntry()readTitle()。解析器从输入流读取标签。当它遇到名为(在此示例中为)entrytitlelinksummary 的标签时,它会调用该标签的相应方法。否则,它会跳过该标签。
    • 提取每种不同类型标签的数据并使解析器前进到下一个标签的方法。在此示例中,相关方法如下所示
      • 对于 titlesummary 标签,解析器会调用 readText()。此方法通过调用 parser.getText() 提取这些标签的数据。
      • 对于 link 标签,解析器首先确定链接是否是它感兴趣的类型,然后通过调用 parser.getAttributeValue() 提取链接的值来提取链接数据。
      • 对于 entry 标签,解析器会调用 readEntry()。此方法会解析条目的嵌套标签,并返回一个包含数据成员 titlelinksummaryEntry 对象。
    • 一个名为 skip() 的递归辅助方法。有关此主题的更多讨论,请参阅 跳过您不关心的标签

此代码片段显示了解析器如何解析条目、标题、链接和摘要。

Kotlin

data class Entry(val title: String?, val summary: String?, val link: String?)

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
@Throws(XmlPullParserException::class, IOException::class)
private fun readEntry(parser: XmlPullParser): Entry {
    parser.require(XmlPullParser.START_TAG, ns, "entry")
    var title: String? = null
    var summary: String? = null
    var link: String? = null
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        when (parser.name) {
            "title" -> title = readTitle(parser)
            "summary" -> summary = readSummary(parser)
            "link" -> link = readLink(parser)
            else -> skip(parser)
        }
    }
    return Entry(title, summary, link)
}

// Processes title tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readTitle(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "title")
    val title = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "title")
    return title
}

// Processes link tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readLink(parser: XmlPullParser): String {
    var link = ""
    parser.require(XmlPullParser.START_TAG, ns, "link")
    val tag = parser.name
    val relType = parser.getAttributeValue(null, "rel")
    if (tag == "link") {
        if (relType == "alternate") {
            link = parser.getAttributeValue(null, "href")
            parser.nextTag()
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link")
    return link
}

// Processes summary tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readSummary(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "summary")
    val summary = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "summary")
    return summary
}

// For the tags title and summary, extracts their text values.
@Throws(IOException::class, XmlPullParserException::class)
private fun readText(parser: XmlPullParser): String {
    var result = ""
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.text
        parser.nextTag()
    }
    return result
}
...

Java

public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String tag = parser.getName();
    String relType = parser.getAttributeValue(null, "rel");
    if (tag.equals("link")) {
        if (relType.equals("alternate")){
            link = parser.getAttributeValue(null, "href");
            parser.nextTag();
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}

跳过您不关心的标签

解析器需要跳过它不感兴趣的标签。以下是解析器的 skip() 方法

Kotlin

@Throws(XmlPullParserException::class, IOException::class)
private fun skip(parser: XmlPullParser) {
    if (parser.eventType != XmlPullParser.START_TAG) {
        throw IllegalStateException()
    }
    var depth = 1
    while (depth != 0) {
        when (parser.next()) {
            XmlPullParser.END_TAG -> depth--
            XmlPullParser.START_TAG -> depth++
        }
    }
}

Java

private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
 }

其工作原理如下

  • 如果当前事件不是 START_TAG,则引发异常。
  • 它会使用 START_TAG 和所有事件(直至并包括匹配的 END_TAG)。
  • 它会跟踪嵌套深度,以确保它在正确的 END_TAG 处停止,而不是在遇到原始 START_TAG 后的第一个标签处停止。

因此,如果当前元素具有嵌套元素,则在解析器使用原始 START_TAG 和其匹配的 END_TAG 之间的所有事件之前,depth 的值不会为 0。例如,请考虑解析器如何跳过具有 2 个嵌套元素(<name><uri>)的 <author> 元素。

  • 第一次遍历 while 循环时,解析器在 <author> 后遇到的下一个标签是 <name>START_TAGdepth 的值增至 2。
  • 第二次遍历 while 循环时,解析器遇到的下一个标签是 END_TAG </name>depth 的值减至 1。
  • 第三次遍历 while 循环时,解析器遇到的下一个标签是 START_TAG <uri>depth 的值增至 2。
  • 第四次遍历 while 循环时,解析器遇到的下一个标签是 END_TAG </uri>depth 的值减至 1。
  • 第五次(也是最后一次)遍历 while 循环时,解析器遇到的下一个标签是 END_TAG </author>depth 的值减至 0,表示已成功跳过 <author> 元素。

使用 XML 数据

示例应用会异步获取和解析 XML Feed。这会将处理从主 UI 线程中移除。处理完成后,应用会在其主活动 NetworkActivity 中更新 UI。

在以下摘录中,loadPage() 方法执行以下操作

  • 使用 XML Feed 的 URL 初始化字符串变量。
  • 如果用户的设置和网络连接允许,则调用 downloadXml(url) 方法。此方法下载并解析 Feed,并将字符串结果返回以在 UI 中显示。

Kotlin

class NetworkActivity : Activity() {

    companion object {

        const val WIFI = "Wi-Fi"
        const val ANY = "Any"
        const val SO_URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest"
        // Whether there is a Wi-Fi connection.
        private var wifiConnected = false
        // Whether there is a mobile connection.
        private var mobileConnected = false

        // Whether the display should be refreshed.
        var refreshDisplay = true
        // The user's current network preference setting.
        var sPref: String? = null
    }
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    fun loadPage() {

        if (sPref.equals(ANY) && (wifiConnected || mobileConnected)) {
            downloadXml(SO_URL)
        } else if (sPref.equals(WIFI) && wifiConnected) {
            downloadXml(SO_URL)
        } else {
            // Show error.
        }
    }
    ...
}

Java

public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    public void loadPage() {

        if((sPref.equals(ANY)) && (wifiConnected || mobileConnected)) {
            downloadXml(URL);
        }
        else if ((sPref.equals(WIFI)) && (wifiConnected)) {
            downloadXml(URL);
        } else {
            // Show error.
        }
    }

在 Kotlin 中,downloadXml 方法会调用以下方法

  • lifecycleScope.launch(Dispatchers.IO) 使用 Kotlin 协程 在 IO 线程上启动方法 loadXmlFromNetwork()。它将 Feed URL 作为参数传递。方法 loadXmlFromNetwork() 获取并处理 Feed。完成后,它会传回结果字符串。
  • withContext(Dispatchers.Main) 使用 Kotlin 协程返回到主线程,获取返回的字符串,并在 UI 中显示它。

在 Java 编程语言中,流程如下所示

  • Executor 在后台线程上执行方法 loadXmlFromNetwork()。它将 Feed URL 作为参数传递。方法 loadXmlFromNetwork() 获取并处理 Feed。完成后,它会传回结果字符串。
  • Handler 调用 post 返回到主线程,获取返回的字符串,并在 UI 中显示它。

Kotlin

// Implementation of Kotlin coroutines used to download XML feed from stackoverflow.com.
private fun downloadXml(vararg urls: String) {
    var result: String? = null
    lifecycleScope.launch(Dispatchers.IO) {
        result = try {
            loadXmlFromNetwork(urls[0])
        } catch (e: IOException) {
            resources.getString(R.string.connection_error)
        } catch (e: XmlPullParserException) {
            resources.getString(R.string.xml_error)
        }
        withContext(Dispatchers.Main) {
            setContentView(R.layout.main)
            // Displays the HTML string in the UI via a WebView.
            findViewById<WebView>(R.id.webview)?.apply {
                loadData(result?: "", "text/html", null)
            }
        }
    }
}

Java

// Implementation of Executor and Handler used to download XML feed asynchronously from stackoverflow.com.
private void downloadXml(String... urls) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Handler handler = new Handler(Looper.getMainLooper());
    executor.execute(() -> {
        String result;
            try {
                result = loadXmlFromNetwork(urls[0]);
            } catch (IOException e) {
                result = getResources().getString(R.string.connection_error);
            } catch (XmlPullParserException e) {
                result = getResources().getString(R.string.xml_error);
            }
        String finalResult = result;
        handler.post(() -> {
            setContentView(R.layout.main);
            // Displays the HTML string in the UI via a WebView.
            WebView myWebView = (WebView) findViewById(R.id.webview);
            myWebView.loadData(finalResult, "text/html", null);
        });
    });
}

downloadXml 调用 loadXmlFromNetwork() 方法,如下一段代码所示。它执行以下操作

  1. 实例化一个 StackOverflowXmlParser。它还会为 Entry 对象的 Listentries)以及 titleurlsummary 创建变量,以保存从 XML Feed 中为这些字段提取的值。
  2. 调用 downloadUrl(),它获取 Feed 并将其作为 InputStream 返回。
  3. 使用 StackOverflowXmlParser 解析 InputStreamStackOverflowXmlParser 使用 Feed 数据填充 entriesList
  4. 处理 entries List 并将 Feed 数据与 HTML 标记结合。
  5. 返回一个在主活动 UI 中显示的 HTML 字符串。

Kotlin

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
@Throws(XmlPullParserException::class, IOException::class)
private fun loadXmlFromNetwork(urlString: String): String {
    // Checks whether the user set the preference to include summary text.
    val pref: Boolean = PreferenceManager.getDefaultSharedPreferences(this)?.run {
        getBoolean("summaryPref", false)
    } ?: false

    val entries: List<Entry> = downloadUrl(urlString)?.use { stream ->
        // Instantiates the parser.
        StackOverflowXmlParser().parse(stream)
    } ?: emptyList()

    return StringBuilder().apply {
        append("<h3>${resources.getString(R.string.page_title)}</h3>")
        append("<em>${resources.getString(R.string.updated)} ")
        append("${formatter.format(rightNow.time)}</em>")
        // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
        // Each Entry object represents a single post in the XML feed.
        // This section processes the entries list to combine each entry with HTML markup.
        // Each entry is displayed in the UI as a link that optionally includes
        // a text summary.
        entries.forEach { entry ->
            append("<p><a href='")
            append(entry.link)
            append("'>" + entry.title + "</a></p>")
            // If the user set the preference to include summary text,
            // adds it to the display.
            if (pref) {
                append(entry.summary)
            }
        }
    }.toString()
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
@Throws(IOException::class)
private fun downloadUrl(urlString: String): InputStream? {
    val url = URL(urlString)
    return (url.openConnection() as? HttpURLConnection)?.run {
        readTimeout = 10000
        connectTimeout = 15000
        requestMethod = "GET"
        doInput = true
        // Starts the query.
        connect()
        inputStream
    }
}

Java

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiates the parser.
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List<Entry> entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text.
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("<h3>" + getResources().getString(R.string.page_title) + "</h3>");
    htmlString.append("<em>" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "</em>");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
     }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("<p><a href='");
        htmlString.append(entry.link);
        htmlString.append("'>" + entry.title + "</a></p>");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query.
    conn.connect();
    return conn.getInputStream();
}