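Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form. XML is a popular format for sharing data on the internet.
Websites that update their content frequently, such as news sites or blogs, often provide an XML feed so that external programs can keep up with content changes. Downloading and parsing XML data is a common task for network-connected apps. This topic explains how to parse XML documents and use their data.
To learn more about creating web-based content in your Android app, see Web-based content.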
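Choose a parser
We recommend XmlPullParser, which is an efficient and maintainable way to parse XML on Android. Android has two implementations of this interface:
- KXmlParser, using XmlPullParserFactory.newPullParser()
- ExpatPullParser, using Xml.newPullParser()
Either choice is fine. The example in this section uses ExpatPullParser and Xml.newPullParser().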
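Analyze the feed
The first step in parsing a feed is to decide which fields you're interested in. The parser extracts data for those fields and ignores the rest.
See the following excerpt from a parsed feed in the sample app. Each post to StackOverflow.com appears in the feed as an entry tag that contains several nested tags: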
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ...">
<title type="text">newest questions tagged android - Stack Overflow</title>
...
    <entry>
    ...
    </entry>
    <entry>
        <id>http://stackoverflow.com/q/9439999</id>
        <re:rank scheme="http://stackoverflow.com">0</re:rank>
        <title type="text">Where is my data file?</title>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="android"/>
        <category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest/tags" term="file"/>
        <author>
            <name>cliff2310</name>
            <uri>http://stackoverflow.com/users/1128925</uri>
        </author>
        <link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" />
        <published>2012-02-25T00:30:54Z</published>
        <updated>2012-02-25T00:30:54Z</updated>
        <summary type="html">
            <p>I have an Application that requires a data file...</p>
        </summary>
    </entry>
    <entry>
    ...
    </entry>
...
</feed>
The sample app extracts data for the entry tag and its nested tags title, link, and summary.
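Instantiate the parser
The next step in parsing a feed is to instantiate a parser and kick off the parsing process. This snippet initializes a parser so that it doesn't process namespaces and uses the provided InputStream as its input. It starts the parsing process with a call to nextTag() and then invokes the readFeed() method, which extracts and processes the data the app is interested in: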
Kotlin
import android.util.Xml
import org.xmlpull.v1.XmlPullParser
import org.xmlpull.v1.XmlPullParserException
import java.io.IOException
import java.io.InputStream

// We don't use namespaces.
private val ns: String? = null

class StackOverflowXmlParser {

    @Throws(XmlPullParserException::class, IOException::class)
    fun parse(inputStream: InputStream): List<Entry> {
        // use {} closes the stream when parsing finishes or fails.
        inputStream.use { stream ->
            val parser: XmlPullParser = Xml.newPullParser()
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false)
            parser.setInput(stream, null)
            parser.nextTag()
            return readFeed(parser)
        }
    }
    ...
}
Java
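import android.util.Xml;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

public class StackOverflowXmlParser {
    // We don't use namespaces.
    private static final String ns = null;

    public List<Entry> parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            // Closes the stream whether parsing succeeds or fails.
            in.close();
        }
    }
    ...
}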
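Read the feed
The readFeed() method does the actual work of processing the feed. It looks for elements tagged "entry" as a starting point for recursively processing the feed. If a tag isn't an entry tag, it skips it. After the whole feed is processed, readFeed() returns a List containing the entries (including nested data members) that it extracted from the feed. The parser then returns this List.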
Kotlin
@Throws(XmlPullParserException::class, IOException::class)
private fun readFeed(parser: XmlPullParser): List<Entry> {
    val entries = mutableListOf<Entry>()

    parser.require(XmlPullParser.START_TAG, ns, "feed")
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        // Starts by looking for the entry tag.
        if (parser.name == "entry") {
            entries.add(readEntry(parser))
        } else {
            skip(parser)
        }
    }
    return entries
}
Java
private List<Entry> readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List<Entry> entries = new ArrayList<>();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag.
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}
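Parse XML
The steps for parsing an XML feed are as follows:
- As described in Analyze the feed, identify the tags you want to include in your app. This example extracts data for the entry tag and its nested tags: title, link, and summary.
- Create the following methods:
  - A "read" method for each tag you want to include, such as readEntry() and readTitle(). The parser reads tags from the input stream. When it encounters a tag named, in this example, entry, title, link, or summary, it calls the appropriate method for that tag. Otherwise, it skips the tag.
  - Methods to extract data for each different type of tag and to advance the parser to the next tag. In this example, the relevant methods are as follows:
    - For the title and summary tags, the parser calls readText(). This method extracts the data for these tags by calling parser.getText().
    - For the link tag, the parser first determines whether the link is the kind it's interested in and then extracts the link's value by calling parser.getAttributeValue().
    - For the entry tag, the parser calls readEntry(). This method parses the entry's nested tags and returns an Entry object with the data members title, link, and summary.
  - A helper method named skip() that skips a tag together with everything nested inside it. For more discussion of this topic, see Skip tags you don't care about.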
This snippet shows how the parser parses entries, titles, links, and summaries:
Kotlin
data class Entry(val title: String?, val summary: String?, val link: String?)

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
@Throws(XmlPullParserException::class, IOException::class)
private fun readEntry(parser: XmlPullParser): Entry {
    parser.require(XmlPullParser.START_TAG, ns, "entry")
    var title: String? = null
    var summary: String? = null
    var link: String? = null
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.eventType != XmlPullParser.START_TAG) {
            continue
        }
        when (parser.name) {
            "title" -> title = readTitle(parser)
            "summary" -> summary = readSummary(parser)
            "link" -> link = readLink(parser)
            else -> skip(parser)
        }
    }
    return Entry(title, summary, link)
}

// Processes title tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readTitle(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "title")
    val title = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "title")
    return title
}

// Processes link tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readLink(parser: XmlPullParser): String {
    var link = ""
    parser.require(XmlPullParser.START_TAG, ns, "link")
    val relType = parser.getAttributeValue(null, "rel")
    if (relType == "alternate") {
        link = parser.getAttributeValue(null, "href") ?: ""
    }
    // Advances to the end tag even when the link is ignored, so that
    // require() below doesn't fail for other rel types.
    parser.nextTag()
    parser.require(XmlPullParser.END_TAG, ns, "link")
    return link
}

// Processes summary tags in the feed.
@Throws(IOException::class, XmlPullParserException::class)
private fun readSummary(parser: XmlPullParser): String {
    parser.require(XmlPullParser.START_TAG, ns, "summary")
    val summary = readText(parser)
    parser.require(XmlPullParser.END_TAG, ns, "summary")
    return summary
}

// For the tags title and summary, extracts their text values.
@Throws(IOException::class, XmlPullParserException::class)
private fun readText(parser: XmlPullParser): String {
    var result = ""
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.text
        parser.nextTag()
    }
    return result
}
...
Java
public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String relType = parser.getAttributeValue(null, "rel");
    // Constant-first comparison avoids a NullPointerException when rel is absent.
    if ("alternate".equals(relType)) {
        link = parser.getAttributeValue(null, "href");
    }
    // Advances to the end tag even when the link is ignored, so that
    // require() below doesn't fail for other rel types.
    parser.nextTag();
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
...
}
Skip tags you don't care about
The parser needs to skip tags it's not interested in. Here is the parser's skip() method:
Kotlin
@Throws(XmlPullParserException::class, IOException::class)
private fun skip(parser: XmlPullParser) {
    if (parser.eventType != XmlPullParser.START_TAG) {
        throw IllegalStateException()
    }
    var depth = 1
    while (depth != 0) {
        when (parser.next()) {
            XmlPullParser.END_TAG -> depth--
            XmlPullParser.START_TAG -> depth++
        }
    }
}
Java
private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
            case XmlPullParser.END_TAG:
                depth--;
                break;
            case XmlPullParser.START_TAG:
                depth++;
                break;
        }
    }
}
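This is how it works:
- It throws an exception if the current event isn't a START_TAG.
- It consumes the START_TAG and all events up to and including the matching END_TAG.
- It keeps track of the nesting depth to make sure that it stops at the correct END_TAG and not at the first tag it encounters after the original START_TAG.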
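Thus, if the current element has nested elements, the value of depth won't be 0 until the parser has consumed all events between the original START_TAG and its matching END_TAG. For example, consider how the parser skips the <author> element, which has 2 nested elements, <name> and <uri>. The steps count only tag events; TEXT events, such as the content of <name>, also pass through the loop but leave depth unchanged.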
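- The first time through the while loop, the next tag the parser encounters after <author> is the START_TAG for <name>. The value of depth is incremented to 2.
- The second time through the while loop, the next tag the parser encounters is the END_TAG </name>. The value of depth is decremented to 1.
- The third time through the while loop, the next tag the parser encounters is the START_TAG <uri>. The value of depth is incremented to 2.
- The fourth time through the while loop, the next tag the parser encounters is the END_TAG </uri>. The value of depth is decremented to 1.
- The fifth and final time through the while loop, the next tag the parser encounters is the END_TAG </author>. The value of depth is decremented to 0, indicating that the <author> element was successfully skipped.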
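To look at the depth counter in isolation, the walk-through above can be replayed in plain Java without a parser. This is a minimal sketch and not part of the sample app: SkipDepthDemo, Event, and depthTrace() are hypothetical names, and the event list is written by hand to mirror the five steps, with one extra START_TAG at the end to show that the loop stops before reaching it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of the depth counter used by skip().
// All names here are hypothetical and exist only for this sketch.
class SkipDepthDemo {
    enum Event { START_TAG, END_TAG, TEXT }

    // Replays the events that follow the START_TAG being skipped and records
    // the value of depth after each one. Stops once depth reaches 0.
    static List<Integer> depthTrace(List<Event> eventsAfterStart) {
        List<Integer> trace = new ArrayList<>();
        int depth = 1; // the element being skipped is already open
        for (Event event : eventsAfterStart) {
            if (depth == 0) {
                break; // matching END_TAG already consumed
            }
            switch (event) {
                case END_TAG:
                    depth--;
                    break;
                case START_TAG:
                    depth++;
                    break;
                default:
                    break; // TEXT doesn't change depth
            }
            trace.add(depth);
        }
        return trace;
    }

    public static void main(String[] args) {
        // <name>, </name>, <uri>, </uri>, </author>, then the next sibling's start tag.
        List<Event> events = Arrays.asList(
                Event.START_TAG, Event.END_TAG,
                Event.START_TAG, Event.END_TAG,
                Event.END_TAG,
                Event.START_TAG);
        System.out.println(depthTrace(events)); // prints [2, 1, 2, 1, 0]
    }
}
```

The trailing START_TAG never appears in the trace, which is the property that lets the caller resume parsing at the element that follows <author>.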
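Consume XML data
The sample app fetches and parses the XML feed asynchronously. This takes the processing off the main UI thread. When processing is complete, the app updates the UI in its main activity, NetworkActivity.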
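In the following excerpt, the loadPage() method does the following:
- Initializes a string variable with the URL for the XML feed.
- Calls the downloadXml(url) method if the user's settings and the network connection allow it. This method downloads and parses the feed and returns a string result to be displayed in the UI.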
Kotlin
class NetworkActivity : Activity() {

    companion object {
        const val WIFI = "Wi-Fi"
        const val ANY = "Any"
        const val SO_URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest"

        // Whether there is a Wi-Fi connection.
        private var wifiConnected = false
        // Whether there is a mobile connection.
        private var mobileConnected = false

        // Whether the display should be refreshed.
        var refreshDisplay = true
        // The user's current network preference setting.
        var sPref: String? = null
    }
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    fun loadPage() {
        if (sPref == ANY && (wifiConnected || mobileConnected)) {
            downloadXml(SO_URL)
        } else if (sPref == WIFI && wifiConnected) {
            downloadXml(SO_URL)
        } else {
            // Show error.
        }
    }
    ...
}
Java
public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    // The user's current network preference setting.
    public static String sPref = null;
    ...
    // Asynchronously downloads the XML feed from stackoverflow.com.
    public void loadPage() {
        // Constant-first comparisons stay safe while sPref is still null.
        if (ANY.equals(sPref) && (wifiConnected || mobileConnected)) {
            downloadXml(URL);
        } else if (WIFI.equals(sPref) && wifiConnected) {
            downloadXml(URL);
        } else {
            // Show error.
        }
    }
    ...
}
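The connectivity check in loadPage() can also be expressed as a small pure function, which makes the decision easy to unit test off-device. The following is a sketch under that assumption; NetworkPolicy and shouldDownload() are hypothetical names that don't appear in the sample app.

```java
// Hypothetical helper that isolates the decision made in loadPage().
class NetworkPolicy {
    static final String WIFI = "Wi-Fi";
    static final String ANY = "Any";

    // Returns true when the user's preference and the current connectivity
    // together allow the feed to be downloaded.
    static boolean shouldDownload(String pref, boolean wifiConnected, boolean mobileConnected) {
        if (ANY.equals(pref)) {
            return wifiConnected || mobileConnected;
        }
        if (WIFI.equals(pref)) {
            return wifiConnected;
        }
        return false; // unknown or unset (null) preference
    }

    public static void main(String[] args) {
        System.out.println(shouldDownload(ANY, false, true));  // mobile only, any network allowed
        System.out.println(shouldDownload(WIFI, false, true)); // mobile only, Wi-Fi required
        System.out.println(shouldDownload(null, true, true));  // preference not loaded yet
    }
}
```

With a helper like this, loadPage() would reduce to a single if/else around downloadXml().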
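In Kotlin, the downloadXml method calls the following:
- lifecycleScope.launch(Dispatchers.IO), which uses Kotlin coroutines to launch the method loadXmlFromNetwork() on the IO thread. It passes the feed URL as a parameter. The method loadXmlFromNetwork() fetches and processes the feed. When it finishes, it passes back a result string.
- withContext(Dispatchers.Main), which uses Kotlin coroutines to return to the main thread, takes the returned string, and displays it in the UI.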
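In the Java programming language, the process is as follows:
- An Executor executes the method loadXmlFromNetwork() on a background thread. It passes the feed URL as a parameter. The method loadXmlFromNetwork() fetches and processes the feed. When it finishes, it passes back a result string.
- A Handler calls post to return to the main thread, takes the returned string, and displays it in the UI.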
Kotlin
// Implementation of Kotlin coroutines used to download XML feed from stackoverflow.com.
// Note: lifecycleScope is available on a LifecycleOwner, such as ComponentActivity or AppCompatActivity.
private fun downloadXml(vararg urls: String) {
    lifecycleScope.launch(Dispatchers.IO) {
        val result: String = try {
            loadXmlFromNetwork(urls[0])
        } catch (e: IOException) {
            resources.getString(R.string.connection_error)
        } catch (e: XmlPullParserException) {
            resources.getString(R.string.xml_error)
        }
        withContext(Dispatchers.Main) {
            setContentView(R.layout.main)
            // Displays the HTML string in the UI via a WebView.
            findViewById<WebView>(R.id.webview)?.apply {
                loadData(result, "text/html", null)
            }
        }
    }
}
Java
// Implementation of Executor and Handler used to download XML feed asynchronously from stackoverflow.com.
private void downloadXml(String... urls) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    Handler handler = new Handler(Looper.getMainLooper());
    executor.execute(() -> {
        String result;
        try {
            result = loadXmlFromNetwork(urls[0]);
        } catch (IOException e) {
            result = getResources().getString(R.string.connection_error);
        } catch (XmlPullParserException e) {
            result = getResources().getString(R.string.xml_error);
        }
        String finalResult = result;
        handler.post(() -> {
            setContentView(R.layout.main);
            // Displays the HTML string in the UI via a WebView.
            WebView myWebView = (WebView) findViewById(R.id.webview);
            myWebView.loadData(finalResult, "text/html", null);
        });
    });
    // Lets the submitted task finish, then releases the worker thread.
    executor.shutdown();
}
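The background-then-main-thread handoff can be tried outside Android with a small stand-in. The sketch below is an illustration rather than the sample app's code: it assumes that a second single-thread executor plays the role of the main-thread Handler, and BackgroundLoadDemo and loadAsync() are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

// Plain-Java sketch of the pattern used by downloadXml(): do the work on a
// background executor, then deliver the result on a "main thread" executor.
class BackgroundLoadDemo {
    static void loadAsync(Executor background, Executor mainThread,
            Callable<String> work, String fallback, Consumer<String> onResult) {
        background.execute(() -> {
            String result;
            try {
                result = work.call();
            } catch (Exception e) {
                // Mirrors the sample: a failure becomes a displayable message.
                result = fallback;
            }
            String finalResult = result;
            mainThread.execute(() -> onResult.accept(finalResult));
        });
    }

    public static void main(String[] args) {
        ExecutorService background = Executors.newSingleThreadExecutor();
        // Stand-in for new Handler(Looper.getMainLooper()) on Android.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();

        CompletableFuture<String> shown = new CompletableFuture<>();
        loadAsync(background, mainThread, () -> "<h3>feed</h3>", "error", shown::complete);

        // join() blocks until the "UI" callback has run.
        System.out.println(shown.join());
        background.shutdown();
        mainThread.shutdown();
    }
}
```

Passing both executors as parameters keeps the threading policy out of the loading logic, so the same method can be exercised in a test with executors that the test controls.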
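The loadXmlFromNetwork() method is called from downloadXml, as shown in the next snippet. It does the following:
- Instantiates a StackOverflowXmlParser. It also creates variables for a List of Entry objects (entries) and for title, url, and summary, to hold the values extracted from the XML feed for those fields.
- Calls downloadUrl(), which fetches the feed and returns it as an InputStream.
- Uses StackOverflowXmlParser to parse the InputStream. StackOverflowXmlParser populates the List of entries with data from the feed.
- Processes the entries List and combines the feed data with HTML markup.
- Returns an HTML string that is displayed in the main activity UI.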
Kotlin
// Downloads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
@Throws(XmlPullParserException::class, IOException::class)
private fun loadXmlFromNetwork(urlString: String): String {
    // Timestamp shown in the page header, as in the Java version.
    val rightNow = Calendar.getInstance()
    val formatter = SimpleDateFormat("MMM dd h:mmaa")

    // Checks whether the user set the preference to include summary text.
    val pref: Boolean = PreferenceManager.getDefaultSharedPreferences(this)?.run {
        getBoolean("summaryPref", false)
    } ?: false

    val entries: List<Entry> = downloadUrl(urlString)?.use { stream ->
        // Instantiates the parser.
        StackOverflowXmlParser().parse(stream)
    } ?: emptyList()

    return StringBuilder().apply {
        append("<h3>${resources.getString(R.string.page_title)}</h3>")
        append("<em>${resources.getString(R.string.updated)} ")
        append("${formatter.format(rightNow.time)}</em>")
        // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
        // Each Entry object represents a single post in the XML feed.
        // This section processes the entries list to combine each entry with HTML markup.
        // Each entry is displayed in the UI as a link that optionally includes
        // a text summary.
        entries.forEach { entry ->
            append("<p><a href='")
            append(entry.link)
            append("'>" + entry.title + "</a></p>")
            // If the user set the preference to include summary text,
            // adds it to the display.
            if (pref) {
                append(entry.summary)
            }
        }
    }.toString()
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
@Throws(IOException::class)
private fun downloadUrl(urlString: String): InputStream? {
    val url = URL(urlString)
    return (url.openConnection() as? HttpURLConnection)?.run {
        readTimeout = 10000
        connectTimeout = 15000
        requestMethod = "GET"
        doInput = true
        // Starts the query.
        connect()
        inputStream
    }
}
Java
// Downloads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiates the parser.
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List<Entry> entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text.
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("<h3>" + getResources().getString(R.string.page_title) + "</h3>");
    htmlString.append("<em>" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "</em>");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    } finally {
        if (stream != null) {
            stream.close();
        }
    }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("<p><a href='");
        htmlString.append(entry.link);
        htmlString.append("'>" + entry.title + "</a></p>");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query.
    conn.connect();
    return conn.getInputStream();
}
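The step that combines entries with HTML markup can be exercised on its own, without a network connection or an Activity. The following sketch is one possible variation, not the sample app's implementation: FeedHtmlDemo, its simplified Entry class, render(), and escapeHtml() are hypothetical names. It also assumes that titles and links are plain text that should be escaped before they are placed in markup, while the summary is already HTML, as the type="html" attribute in the feed excerpt suggests.

```java
import java.util.Arrays;
import java.util.List;

// Off-device sketch of turning parsed entries into an HTML string.
class FeedHtmlDemo {
    // Minimal stand-in for the sample's Entry class.
    static final class Entry {
        final String title;
        final String summary;
        final String link;

        Entry(String title, String summary, String link) {
            this.title = title;
            this.summary = summary;
            this.link = link;
        }
    }

    // Escapes characters that are significant in HTML text and quoted attributes.
    static String escapeHtml(String text) {
        if (text == null) {
            return "";
        }
        return text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    // Renders each entry as a link, optionally followed by its summary.
    static String render(List<Entry> entries, boolean includeSummary) {
        StringBuilder html = new StringBuilder();
        for (Entry entry : entries) {
            html.append("<p><a href='").append(escapeHtml(entry.link)).append("'>")
                    .append(escapeHtml(entry.title)).append("</a></p>");
            if (includeSummary && entry.summary != null) {
                html.append(entry.summary); // assumed to be HTML already
            }
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("Where is my data file?", "<p>I have an Application...</p>",
                        "http://stackoverflow.com/questions/9439999/where-is-my-data-file"));
        System.out.println(render(entries, true));
    }
}
```

Keeping the rendering in a static method that takes the entries and the preference as parameters is a design choice made for this sketch; it leaves loadXmlFromNetwork() responsible only for downloading and parsing.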