
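This tutorial explains how to parse a multi-level XML file efficiently with the Java DOM parser. It addresses a common pitfall of the document-wide `getElementsByTagName` search by scoping the search to a parent element. It then shows how to store the parsed pieces in structured objects and link them, so the output can be grouped per employee and the processing becomes more accurate and easier to read.
Java's DOM (Document Object Model) parser loads an entire XML document into memory and represents it as a tree of nodes. Developers can reach and modify any part of the document by walking that tree. For XML files with a clear, well-layered structure, the DOM parser offers an intuitive programming interface. With multi-level files, however, careless use, such as over-reliance on document-wide search methods, can produce results that are inaccurate or hard to organize.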
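To make the "tree of nodes" idea concrete, here is a minimal sketch that parses a small XML string and walks the resulting tree depth-first. The class name DomTreeWalk and its helper methods are hypothetical, chosen only for this illustration, and String.repeat assumes Java 11 or later.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DomTreeWalk {

    // Parse an XML string into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Depth-first walk that records each element name, indented by its depth.
    static void collect(Node node, int depth, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add("  ".repeat(depth) + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employee><employee_list><employee ID=\"1\">"
                + "<firstname>Andrei</firstname></employee></employee_list></employee>";
        List<String> lines = new ArrayList<>();
        collect(parse(xml).getDocumentElement(), 0, lines);
        lines.forEach(System.out::println);
    }
}
```

The walk visits text nodes as well as elements; only elements are recorded here, which is why the node type is checked before a name is added.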
Consider the following multi-level XML structure holding employee information:
<?xml version="1.0" encoding="UTF-8"?>
<employee>
    <employee_list>
        <employee ID="1">
            <firstname>Andrei</firstname>
            <lastname>Rus</lastname>
            <age>23</age>
            <position-skill ref="Java"/>
            <detail-ref ref="AndreiR"/>
        </employee>
        <!-- ... more employees ... -->
    </employee_list>
    <position_details>
        <position ID="Java">
            <role>Junior Developer</role>
            <skill_name>Java</skill_name>
            <experience>1</experience>
        </position>
        <!-- ... more position details ... -->
    </position_details>
    <employee_info>
        <detail ID="AndreiR">
            <username>AndreiR</username>
            <residence>Timisoara</residence>
            <yearOfBirth>1999</yearOfBirth>
            <phone>0</phone>
        </detail>
        <!-- ... more employee info ... -->
    </employee_info>
</employee>
This XML file contains three main sections, employee_list, position_details, and employee_info, all nested under the root element <employee>. Calling doc.getElementsByTagName("employee") to fetch the employee nodes also returns the root element <employee>, which leads to parsing errors or unexpected output.
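Document.getElementsByTagName() searches the entire document for elements with the given name. It therefore finds not only the <employee> elements under <employee_list> but also the root <employee> element itself. To avoid this mix-up, limit the search context: look for child elements from within their parent element.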
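Problem analysis: In the XML structure above, the root element is <employee>. Calling doc.getElementsByTagName("employee") returns every node named "employee": the root element plus each <employee> under <employee_list>. The root <employee> has no ID attribute and no firstname or lastname children of its own, so getAttribute("ID") returns an empty string for it, and a descendant search for firstname from the root simply picks up the first employee's value. Treating the root as an employee record therefore produces a spurious entry.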
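The difference is easy to check on a trimmed-down copy of the sample file. The sketch below counts the matches of a document-wide search and of a search scoped to <employee_list>; the class name ScopedSearchDemo and its helper methods are hypothetical and exist only for this illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class ScopedSearchDemo {

    // Trimmed-down version of the sample file: one employee under employee_list.
    static final String XML =
            "<employee>"
          + "<employee_list>"
          + "<employee ID=\"1\"><firstname>Andrei</firstname></employee>"
          + "</employee_list>"
          + "</employee>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Document-wide search: matches the root <employee> as well.
    static int globalCount(Document doc) {
        return doc.getElementsByTagName("employee").getLength();
    }

    // Scoped search: only <employee> elements below <employee_list>.
    static int scopedCount(Document doc) {
        Element list = (Element) doc.getElementsByTagName("employee_list").item(0);
        return list.getElementsByTagName("employee").getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println("global: " + globalCount(doc)); // prints "global: 2"
        System.out.println("scoped: " + scopedCount(doc)); // prints "scoped: 1"

        // The first global match is the root, which has no ID attribute.
        Element first = (Element) doc.getElementsByTagName("employee").item(0);
        System.out.println("first ID: '" + first.getAttribute("ID") + "'"); // prints "first ID: ''"
    }
}
```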
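Solution: limit the search context. First obtain the parent element of the section you want (for example employee_list), then search inside it for the required elements (for example employee). This keeps the search inside the target section.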
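One caveat worth keeping in mind: Element.getElementsByTagName also searches all descendants of the element, not just its direct children. For the sample file this makes no difference, but if an element of the same name were ever nested deeper inside a section, the scoped search would return it too. A minimal sketch of a direct-children-only lookup follows; the class DirectChildren, the helper childElements, and the nested <manager> structure in main are all hypothetical and not part of the original file.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class DirectChildren {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns only the direct child elements of parent that carry the given tag name.
    static List<Element> childElements(Element parent, String tag) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Made-up structure: an <employee> nested inside another <employee>.
        String xml = "<employee_list>"
                + "<employee ID=\"1\"><manager><employee ID=\"9\"/></manager></employee>"
                + "</employee_list>";
        Element list = parse(xml).getDocumentElement();

        // Descendant search sees both elements; the direct-child lookup sees one.
        System.out.println(list.getElementsByTagName("employee").getLength()); // prints 2
        System.out.println(childElements(list, "employee").size());            // prints 1
    }
}
```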
The corrected parsing code for the three main sections of the XML follows:
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class XmlParserExample {

    // Data model classes
    static class Employee {
        String id;
        String firstname;
        String lastname;
        int age;
        String positionSkillRef;
        String detailRef;

        // Constructor and getters (setters omitted)
        public Employee(String id, String firstname, String lastname, int age, String positionSkillRef, String detailRef) {
            this.id = id;
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
            this.positionSkillRef = positionSkillRef;
            this.detailRef = detailRef;
        }

        public String getId() { return id; }
        public String getFirstname() { return firstname; }
        public String getLastname() { return lastname; }
        public int getAge() { return age; }
        public String getPositionSkillRef() { return positionSkillRef; }
        public String getDetailRef() { return detailRef; }

        @Override
        public String toString() {
            return "Employee{" +
                    "id='" + id + '\'' +
                    ", firstname='" + firstname + '\'' +
                    ", lastname='" + lastname + '\'' +
                    ", age=" + age +
                    ", positionSkillRef='" + positionSkillRef + '\'' +
                    ", detailRef='" + detailRef + '\'' +
                    '}';
        }
    }
    static class PositionDetail {
        String id;
        String role;
        String skillName;
        int experience;

        // Constructor and getters (setters omitted)
        public PositionDetail(String id, String role, String skillName, int experience) {
            this.id = id;
            this.role = role;
            this.skillName = skillName;
            this.experience = experience;
        }

        public String getId() { return id; }
        public String getRole() { return role; }
        public String getSkillName() { return skillName; }
        public int getExperience() { return experience; }

        @Override
        public String toString() {
            return "PositionDetail{" +
                    "id='" + id + '\'' +
                    ", role='" + role + '\'' +
                    ", skillName='" + skillName + '\'' +
                    ", experience=" + experience +
                    '}';
        }
    }
    static class EmployeeInfo {
        String id;
        String username;
        String residence;
        int yearOfBirth;
        String phone;

        // Constructor and getters (setters omitted)
        public EmployeeInfo(String id, String username, String residence, int yearOfBirth, String phone) {
            this.id = id;
            this.username = username;
            this.residence = residence;
            this.yearOfBirth = yearOfBirth;
            this.phone = phone;
        }

        public String getId() { return id; }
        public String getUsername() { return username; }
        public String getResidence() { return residence; }
        public int getYearOfBirth() { return yearOfBirth; }
        public String getPhone() { return phone; }

        @Override
        public String toString() {
            return "EmployeeInfo{" +
                    "id='" + id + '\'' +
                    ", username='" + username + '\'' +
                    ", residence='" + residence + '\'' +
                    ", yearOfBirth=" + yearOfBirth +
                    ", phone='" + phone + '\'' +
                    '}';
        }
    }
    // Aggregate class that brings all related information together
    static class FullEmployeeRecord {
        Employee employee;
        PositionDetail positionDetail;
        EmployeeInfo employeeInfo;

        public FullEmployeeRecord(Employee employee, PositionDetail positionDetail, EmployeeInfo employeeInfo) {
            this.employee = employee;
            this.positionDetail = positionDetail;
            this.employeeInfo = employeeInfo;
        }

        // Getters
        public Employee getEmployee() { return employee; }
        public PositionDetail getPositionDetail() { return positionDetail; }
        public EmployeeInfo getEmployeeInfo() { return employeeInfo; }

        // Prints one employee's data as a group; unresolved references print as N/A
        public void printGroupedInfo() {
            System.out.println("PersonId: " + employee.getId());
            System.out.println("Firstname: " + employee.getFirstname());
            System.out.println("Lastname: " + employee.getLastname());
            System.out.println("Age: " + employee.getAge());
            if (positionDetail != null) {
                System.out.println("Role: " + positionDetail.getRole());
                System.out.println("Skill Name: " + positionDetail.getSkillName());
                System.out.println("Experience: " + positionDetail.getExperience());
            } else {
                System.out.println("Role: N/A");
                System.out.println("Skill Name: N/A");
                System.out.println("Experience: N/A");
            }
            if (employeeInfo != null) {
                System.out.println("Username: " + employeeInfo.getUsername());
                System.out.println("Residence: " + employeeInfo.getResidence());
                System.out.println("Year Of Birth: " + employeeInfo.getYearOfBirth());
                System.out.println("Phone: " + employeeInfo.getPhone());
            } else {
                System.out.println("Username: N/A");
                System.out.println("Residence: N/A");
                System.out.println("Year Of Birth: N/A");
                System.out.println("Phone: N/A");
            }
            System.out.println("--------------------------------------------------------------------------");
        }
    }
    public static void main(String[] args) {
        try {
            File xmlDoc = new File("employees.xml"); // Make sure the XML file exists in the project root or at the given path
            DocumentBuilderFactory dbFact = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuild = dbFact.newDocumentBuilder();
            Document doc = dBuild.parse(xmlDoc);
            // Normalize the tree: merge adjacent text nodes and drop empty ones
            // (whitespace-only text nodes are kept)
            doc.getDocumentElement().normalize();
            System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
            System.out.println("-----------------------------------------------------------------------------");

            // 1. Parse employee_list
            List<Employee> employees = new ArrayList<>();
            NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
            if (employeeListNodes.getLength() > 0) {
                Element employeeListElement = (Element) employeeListNodes.item(0);
                NodeList employeeNodes = employeeListElement.getElementsByTagName("employee");
                for (int i = 0; i < employeeNodes.getLength(); i++) {
                    Node node = employeeNodes.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element eElement = (Element) node;
                        String id = eElement.getAttribute("ID");
                        String firstname = getTagValue("firstname", eElement);
                        String lastname = getTagValue("lastname", eElement);
                        int age = Integer.parseInt(getTagValue("age", eElement));
                        String positionSkillRef = ((Element) eElement.getElementsByTagName("position-skill").item(0)).getAttribute("ref");
                        String detailRef = ((Element) eElement.getElementsByTagName("detail-ref").item(0)).getAttribute("ref");
                        employees.add(new Employee(id, firstname, lastname, age, positionSkillRef, detailRef));
                    }
                }
            }
            System.out.println("Parsed " + employees.size() + " employees.");

            // 2. Parse position_details and store the results in a Map for fast lookup
            Map<String, PositionDetail> positionDetailsMap = new HashMap<>();
            NodeList positionDetailsNodes = doc.getElementsByTagName("position_details");
            if (positionDetailsNodes.getLength() > 0) {
                Element positionDetailsElement = (Element) positionDetailsNodes.item(0);
                NodeList positionNodes = positionDetailsElement.getElementsByTagName("position");
                for (int i = 0; i < positionNodes.getLength(); i++) {
                    Node node = positionNodes.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element eElement = (Element) node;
                        String id = eElement.getAttribute("ID");
                        String role = getTagValue("role", eElement);
                        String skillName = getTagValue("skill_name", eElement);
                        int experience = Integer.parseInt(getTagValue("experience", eElement));
                        positionDetailsMap.put(id, new PositionDetail(id, role, skillName, experience));
                    }
                }
            }
            System.out.println("Parsed " + positionDetailsMap.size() + " position details.");

            // 3. Parse employee_info and store the results in a Map for fast lookup
            Map<String, EmployeeInfo> employeeInfoMap = new HashMap<>();
            NodeList employeeInfoNodes = doc.getElementsByTagName("employee_info");
            if (employeeInfoNodes.getLength() > 0) {
                Element employeeInfoElement = (Element) employeeInfoNodes.item(0);
                NodeList detailNodes = employeeInfoElement.getElementsByTagName("detail");
                for (int i = 0; i < detailNodes.getLength(); i++) {
                    Node node = detailNodes.item(i);
                    if (node.getNodeType() == Node.ELEMENT_NODE) {
                        Element eElement = (Element) node;
                        String id = eElement.getAttribute("ID");
                        String username = getTagValue("username", eElement);
                        String residence = getTagValue("residence", eElement);
                        int yearOfBirth = Integer.parseInt(getTagValue("yearOfBirth", eElement));
                        String phone = getTagValue("phone", eElement);
                        employeeInfoMap.put(id, new EmployeeInfo(id, username, residence, yearOfBirth, phone));
                    }
                }
            }
            System.out.println("Parsed " + employeeInfoMap.size() + " employee info records.");

            System.out.println("\n=============================================================================================");
            System.out.println("Grouped Employee Information:");
            System.out.println("=============================================================================================");

            // 4. Link the data and print it
            for (Employee emp : employees) {
                PositionDetail posDetail = positionDetailsMap.get(emp.getPositionSkillRef());
                EmployeeInfo empInfo = employeeInfoMap.get(emp.getDetailRef());
                FullEmployeeRecord record = new FullEmployeeRecord(emp, posDetail, empInfo);
                record.printGroupedInfo();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
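    // Helper that safely reads the text content of the first matching element
    private static String getTagValue(String tag, Element element) {
        NodeList nodeList = element.getElementsByTagName(tag);
        if (nodeList != null && nodeList.getLength() > 0) {
            Node node = nodeList.item(0);
            if (node != null) {
                return node.getTextContent();
            }
        }
        // Return an empty string rather than null so callers do not hit a NullPointerException
        // (note: Integer.parseInt("") still throws NumberFormatException)
        return "";
    }
}
Code notes: the key step is the scoped lookup, which fetches the employee_list element first and then searches inside it:
NodeList employeeListNodes = doc.getElementsByTagName("employee_list");
Element employeeListElement = (Element) employeeListNodes.item(0);
NodeList employeeNodes = employeeListElement.getElementsByTagName("employee"); // search for employee inside employee_list
The original code printed its output directly while parsing, which left the data scattered and made complex linking or later processing difficult. For more flexible, better-structured output, store the parsed data in custom Java objects instead.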
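Pain point: scattered output and data linking. Data in an XML file is usually interrelated. Here, for example, an <employee> refers to a <position> and a <detail> entry through the ref attributes of its position-skill and detail-ref child elements. If each section is parsed and printed on its own, the related information is never brought together.
Solution: define data models (POJOs). Create a Java class (Plain Old Java Object, POJO) for each main entity in the XML (such as Employee, PositionDetail, EmployeeInfo). Each class holds fields that match the XML elements, a constructor, and accessor methods.
Solution: build a linked data model. To produce the per-person grouped output, create an aggregate class such as FullEmployeeRecord that holds one Employee, one PositionDetail, and one EmployeeInfo. Once every section has been parsed on its own, build these aggregates by following the references between them (the ref attributes of each employee's position-skill and detail-ref elements).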
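The linking step boils down to a keyed lookup with a fallback for references that point nowhere. The sketch below isolates that idea; JoinByRefDemo, Emp, and describe are hypothetical, trimmed-down stand-ins for the tutorial's classes, and the second employee with its "Missing" reference is made-up data used only to show the fallback.

```java
import java.util.HashMap;
import java.util.Map;

class JoinByRefDemo {

    // Simplified stand-in for the Employee model: a name plus a position reference.
    static class Emp {
        final String name;
        final String positionRef;

        Emp(String name, String positionRef) {
            this.name = name;
            this.positionRef = positionRef;
        }
    }

    // Resolves the employee's reference against the map; unknown references yield "N/A".
    static String describe(Emp emp, Map<String, String> rolesById) {
        String role = rolesById.getOrDefault(emp.positionRef, "N/A");
        return emp.name + ": " + role;
    }

    public static void main(String[] args) {
        Map<String, String> rolesById = new HashMap<>();
        rolesById.put("Java", "Junior Developer");

        System.out.println(describe(new Emp("Andrei", "Java"), rolesById));    // prints "Andrei: Junior Developer"
        System.out.println(describe(new Emp("Sample", "Missing"), rolesById)); // prints "Sample: N/A"
    }
}
```

Keying the lookup tables by ID keeps each resolution a single map access, so the join stays linear in the number of employees.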
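In the example code above, the employees are collected in a List, while the position and detail records go into HashMaps keyed by ID. The final loop looks up each employee's two references in those maps, wraps the results in a FullEmployeeRecord, and calls printGroupedInfo(), which prints N/A for any reference that has no match.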
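The numeric fields in the example code are read with Integer.parseInt(getTagValue(...)). Because getTagValue returns an empty string for a missing element and Integer.parseInt("") throws NumberFormatException, a single missing <age> would end the whole run in the catch block. A minimal sketch of a more tolerant helper follows; the names SafeInt and parseIntOr are hypothetical and not part of the original code, and falling back to a default value is an assumption about the desired behaviour.

```java
class SafeInt {

    // Parses text as an int; returns fallback when text is null, blank, or not a number.
    static int parseIntOr(String text, int fallback) {
        if (text == null) {
            return fallback;
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(trimmed);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseIntOr("23", -1));    // prints 23
        System.out.println(parseIntOr(" 23\n", -1)); // prints 23
        System.out.println(parseIntOr("", -1));      // prints -1
        System.out.println(parseIntOr("abc", -1));   // prints -1
    }
}
```

Whether a silent default is acceptable depends on the data; for fields that must be present, rethrowing with the element name in the message may be the better choice.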