解析 XML 数据

可扩展标记语言 (XML) 是一组用于以机器可读形式编码文档的规则。XML 是一种在互联网上共享数据的常用格式。

经常更新其内容的网站(例如新闻网站或博客)通常会提供 XML 提要,以便外部程序可以随时了解内容变化。上传和解析 XML 数据是联网应用的常见任务。本主题说明如何解析 XML 文档并使用其数据。

要了解有关在您的 Android 应用中创建基于 Web 的内容的更多信息,请参阅 基于 Web 的内容

选择解析器

我们推荐 XmlPullParser,这是一种在 Android 上解析 XML 的高效且易于维护的方法。Android 有此接口的两个实现

任何选择都可以。本节中的示例使用 ExpatPullParserXml.newPullParser()

分析提要

解析提要的第一步是确定您感兴趣的字段。解析器会提取这些字段的数据并忽略其余数据。

请参阅示例应用中已解析 Feed 的以下摘录。每个发往StackOverflow.com 的帖子都会在 Feed 中显示为包含多个嵌套标签的entry标签。

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ...">
<title type="text">newest questions tagged android - Stack Overflow</title>
...
    <entry>
    ...
    </entry>
    <entry>
        <id>http://stackoverflow.com/q/9439999</id>
        <re:rank scheme="http://stackoverflow.com">0</re:rank>
        <title type="text">Where is my data file?</title>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="android"/>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="file"/>
        <author>
            <name>cliff2310</name>
            <uri>http://stackoverflow.com/users/1128925</uri>
        </author>
        <link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" />
        <published>2012-02-25T00:30:54Z</published>
        <updated>2012-02-25T00:30:54Z</updated>
        <summary type="html">
            <p>I have an Application that requires a data file...</p>

        </summary>
    </entry>
    <entry>
    ...
    </entry>
...
</feed>

示例应用提取entry标签及其嵌套标签titlelinksummary的数据。

实例化解析器

解析 Feed 的下一步是实例化解析器并启动解析过程。此代码段初始化一个解析器,使其不处理命名空间,并使用提供的InputStream作为其输入。它通过调用nextTag()启动解析过程,并调用readFeed()方法,该方法提取并处理应用感兴趣的数据。

Kotlin

// We don't use namespaces.
private val ns: String? = null

class StackOverflowXmlParser {

    @Throws(XmlPullParserException::class, IOException::class)
    fun parse(inputStream: InputStream): List<*> {
        inputStream.use { inputStream ->
            val parser: XmlPullParser = Xml.newPullParser()
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false)
            parser.setInput(inputStream, null)
            parser.nextTag()
            return readFeed(parser)
        }
    }
 ...
}

Java

public class StackOverflowXmlParser {
    // We don't use namespaces.
    private static final String ns = null;

    public List parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
 ...
}

读取 Feed

readFeed()方法执行处理 Feed 的实际工作。它查找标记为“entry”的元素作为递归处理 Feed 的起点。如果某个标签不是entry标签,则会跳过它。一旦整个 Feed 被递归处理,readFeed()就会返回一个包含它从 Feed 中提取的条目(包括嵌套数据成员)的List。然后,解析器返回此List

Kotlin

@Throws(XmlPullParserException::class, IOException::class)
private fun readFeed(parser: XmlPullParser): List<Entry> {
    val entries = mutableListOf<Entry>()

    parser.require(XmlPullParser.START_TAG, ns, "feed")
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        // Starts by looking for the entry tag.
        if (parser.name == "entry") {
            entries.add(readEntry(parser))
        } else {
            skip(parser)
        }
    }
    return entries
}

Java

private List readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List entries = new ArrayList();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag.
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}

解析 XML

解析 XML Feed 的步骤如下:

  1. 分析 Feed中所述,确定要包含在应用中的标签。此示例提取entry标签及其嵌套标签的数据:titlelinksummary
  2. 创建以下方法:

    • 每个要包含的标签的“读取”方法,例如readEntry()readTitle()。解析器从输入流中读取标签。例如,当它遇到名为entrytitlelinksummary的标签时,它会调用该标签的相应方法。否则,它会跳过该标签。
    • 用于提取每种不同类型标签的数据并使解析器前进到下一个标签的方法。在此示例中,相关方法如下:
      • 对于titlesummary标签,解析器会调用readText()。此方法通过调用parser.getText()提取这些标签的数据。
      • 对于link标签,解析器首先确定链接是否为其感兴趣的类型,然后提取链接的数据。然后,它使用parser.getAttributeValue()提取链接的值。
      • 对于entry标签,解析器调用readEntry()。此方法解析条目的嵌套标签,并返回一个包含数据成员titlelinksummaryEntry对象。
    • 一个递归的辅助skip()方法。有关此主题的更多讨论,请参阅跳过您不关心的标签

此代码段显示解析器如何解析条目、标题、链接和摘要。

Kotlin

data class Entry(val title: String?, val summary: String?, val link: String?)

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
@Throws(XmlPullParserException::class, IOException::class)
private fun readEntry(parser: XmlPullParser): Entry {
    parser.require(XmlPullParser.START_TAG, ns, "entry")
    var title: String? = null
    var summary: String? = null
    var link: String? = null
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        when (parser.name) {
            "title" -> title = readTitle(parser)
            "summary" -> summary = readSummary(parser)
            "link" -> link = readLink(parser)
            else -> skip(parser)
        }
    }
    return Entry(title, summary, link)
}

// Processes title tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readTitle(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "title")
    val title = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "title")
    return title
}

// Processes link tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readLink(parser: XmlPullParser): String {
    var link = ""
    parser.require(XmlPullParser.START_TAG, ns, "link")
    val tag = parser.name
    val relType = parser.getAttributeValue(null, "rel")
    if (tag == "link") {
        if (relType == "alternate") {
            link = parser.getAttributeValue(null, "href")
            parser.nextTag()
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link")
    return link
}

// Processes summary tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readSummary(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "summary")
    val summary = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "summary")
    return summary
}

// For the tags title and summary, extracts their text values.
@Throws(IOException::class, XmlPullParserException::class)
private fun readText(parser: XmlPullParser): String {
    var result = ""
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.text
        parser.nextTag()
    }
    return result
}
...

Java

public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String tag = parser.getName();
    String relType = parser.getAttributeValue(null, "rel");
    if (tag.equals("link")) {
        if (relType.equals("alternate")){
            link = parser.getAttributeValue(null, "href");
            parser.nextTag();
        }
    }
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}

跳过您不关心的标签

解析器需要跳过它不感兴趣的标签。以下是解析器的skip()方法:

Kotlin

@Throws(XmlPullParserException::class, IOException::class)
private fun skip(parser: XmlPullParser) {
    if (parser.eventType != XmlPullParser.START_TAG) {
        throw IllegalStateException()
    }
    var depth = 1
    while (depth != 0) {
        when (parser.next()) {
            XmlPullParser.END_TAG -> depth--
            XmlPullParser.START_TAG -> depth++
        }
    }
}

Java

private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
 }

其工作原理如下:

  • 如果当前事件不是START_TAG,则会抛出异常。
  • 它会使用START_TAG和直到且包括匹配的END_TAG的所有事件。
  • 它跟踪嵌套深度,以确保它在正确的END_TAG处停止,而不是在遇到原始START_TAG后的第一个标签处停止。

因此,如果当前元素具有嵌套元素,则depth的值在解析器使用原始START_TAG与其匹配的END_TAG之间的所有事件之前不会为 0。例如,考虑解析器如何跳过具有 2 个嵌套元素<name><uri><author>元素。

  • 第一次遍历while循环时,解析器在<author>之后遇到的下一个标签是<name>START_TAGdepth的值增加到 2。
  • 第二次遍历while循环时,解析器遇到的下一个标签是END_TAG </name>depth的值减少到 1。
  • 第三次遍历while循环时,解析器遇到的下一个标签是START_TAG <uri>depth的值增加到 2。
  • 第四次遍历while循环时,解析器遇到的下一个标签是END_TAG </uri>depth的值减少到 1。
  • 第五次也是最后一次遍历while循环时,解析器遇到的下一个标签是END_TAG </author>depth的值减少到 0,表明<author>元素已成功跳过。

使用 XML 数据

示例应用程序异步地获取并解析 XML Feed。这将处理从主 UI 线程中移除。处理完成后,应用会在其主活动NetworkActivity中更新 UI。

在以下摘录中,loadPage()方法执行以下操作:

  • 使用 XML Feed 的 URL 初始化字符串变量。
  • 如果用户的设置和网络连接允许,则调用downloadXml(url)方法。此方法下载并解析 Feed,并返回一个要在 UI 中显示的字符串结果。

Kotlin

class NetworkActivity : Activity() {

    companion object {

        const val WIFI = "Wi-Fi"
        const val ANY = "Any"
        const val SO_URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest"
        // Whether there is a Wi-Fi connection.
        private var wifiConnected = false
        // Whether there is a mobile connection.
        private var mobileConnected = false

        // Whether the display should be refreshed.
        var refreshDisplay = true
        // The user's current network preference setting.
        var sPref: String? = null
    }
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    fun loadPage() {

        if (sPref.equals(ANY) && (wifiConnected || mobileConnected)) {
            downloadXml(SO_URL)
        } else if (sPref.equals(WIFI) && wifiConnected) {
            downloadXml(SO_URL)
        } else {
            // Show error.
        }
    }
    ...
}

Java

public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    public void loadPage() {

        if((sPref.equals(ANY)) && (wifiConnected || mobileConnected)) {
            downloadXml(URL);
        }
        else if ((sPref.equals(WIFI)) && (wifiConnected)) {
            downloadXml(URL);
        } else {
            // Show error.
        }
    }

在 Kotlin 中,downloadXml方法调用以下方法:

  • lifecycleScope.launch(Dispatchers.IO),它使用 Kotlin 协程在 IO 线程上启动loadXmlFromNetwork()方法。它将 Feed URL 作为参数传递。loadXmlFromNetwork()方法获取并处理 Feed。完成后,它会传回结果字符串。
  • withContext(Dispatchers.Main),它使用 Kotlin 协程返回主线程,获取返回的字符串,并在 UI 中显示它。

在 Java 编程语言中,过程如下:

  • 一个Executor在后台线程上执行loadXmlFromNetwork()方法。它将 Feed URL 作为参数传递。loadXmlFromNetwork()方法获取并处理 Feed。完成后,它会传回结果字符串。
  • 一个Handler调用post返回主线程,获取返回的字符串,并在 UI 中显示它。

Kotlin

// Implementation of Kotlin coroutines used to download XML feed from stackoverflow.com.
private fun downloadXml(vararg urls: String) {
    var result: String? = null
    lifecycleScope.launch(Dispatchers.IO) {
        result = try {
            loadXmlFromNetwork(urls[0])
        } catch (e: IOException) {
            resources.getString(R.string.connection_error)
        } catch (e: XmlPullParserException) {
            resources.getString(R.string.xml_error)
        }
        withContext(Dispatchers.Main) {
            setContentView(R.layout.main)
            // Displays the HTML string in the UI via a WebView.
            findViewById<WebView>(R.id.webview)?.apply {
                loadData(result?: "", "text/html", null)
            }
        }
    }
}

Java

// Implementation of Executor and Handler used to download XML feed asynchronously from stackoverflow.com.
private void downloadXml(String... urls) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Handler handler = new Handler(Looper.getMainLooper());
    executor.execute(() -> {
        String result;
            try {
                result = loadXmlFromNetwork(urls[0]);
            } catch (IOException e) {
                result = getResources().getString(R.string.connection_error);
            } catch (XmlPullParserException e) {
                result = getResources().getString(R.string.xml_error);
            }
        String finalResult = result;
        handler.post(() -> {
            setContentView(R.layout.main);
            // Displays the HTML string in the UI via a WebView.
            WebView myWebView = (WebView) findViewById(R.id.webview);
            myWebView.loadData(finalResult, "text/html", null);
        });
    });
}

downloadXml调用的loadXmlFromNetwork()方法显示在下一个代码段中。它执行以下操作:

  1. 实例化一个StackOverflowXmlParser。它还为Entry对象的Listentries)以及titleurlsummary创建变量,以保存从 XML Feed 中提取的这些字段的值。
  2. 调用downloadUrl(),它获取 Feed 并将其作为InputStream返回。
  3. 使用StackOverflowXmlParser解析InputStreamStackOverflowXmlParser使用Feed中的数据填充entriesList
  4. 处理entries List并将 Feed 数据与 HTML 标记组合。
  5. 返回在主活动 UI 中显示的 HTML 字符串。

Kotlin

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
@Throws(XmlPullParserException::class, IOException::class)
private fun loadXmlFromNetwork(urlString: String): String {
    // Checks whether the user set the preference to include summary text.
    val pref: Boolean = PreferenceManager.getDefaultSharedPreferences(this)?.run {
        getBoolean("summaryPref", false)
    } ?: false

    val entries: List<Entry> = downloadUrl(urlString)?.use { stream ->
        // Instantiates the parser.
        StackOverflowXmlParser().parse(stream)
    } ?: emptyList()

    return StringBuilder().apply {
        append("<h3>${resources.getString(R.string.page_title)}</h3>")
        append("<em>${resources.getString(R.string.updated)} ")
        append("${formatter.format(rightNow.time)}</em>")
        // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
        // Each Entry object represents a single post in the XML feed.
        // This section processes the entries list to combine each entry with HTML markup.
        // Each entry is displayed in the UI as a link that optionally includes
        // a text summary.
        entries.forEach { entry ->
            append("<p><a href='")
            append(entry.link)
            append("'>" + entry.title + "</a></p>")
            // If the user set the preference to include summary text,
            // adds it to the display.
            if (pref) {
                append(entry.summary)
            }
        }
    }.toString()
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
@Throws(IOException::class)
private fun downloadUrl(urlString: String): InputStream? {
    val url = URL(urlString)
    return (url.openConnection() as? HttpURLConnection)?.run {
        readTimeout = 10000
        connectTimeout = 15000
        requestMethod = "GET"
        doInput = true
        // Starts the query.
        connect()
        inputStream
    }
}

Java

// Uploads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiates the parser.
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List<Entry> entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text.
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("<h3>" + getResources().getString(R.string.page_title) + "</h3>");
    htmlString.append("<em>" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "</em>");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
     }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("<p><a href='");
        htmlString.append(entry.link);
        htmlString.append("'>" + entry.title + "</a></p>");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query.
    conn.connect();
    return conn.getInputStream();
}