I'm populating a corpus from XML files that all reference the same DTD through an HTTP URL. This is very slow because the XML parser seems to download the DTD again for each file. If possible, the parser should cache the DTD locally.
It might be a good idea to add support for catalogs. Then you could just declare your favorite DTDs to be available locally. That should be much simpler to implement than a cache.
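To illustrate the catalog idea, here is a minimal sketch in Python using the standard library's SAX `EntityResolver` interface. It is not this project's code: the class name `CatalogResolver`, the example URL, and the local path are all made up for illustration. The resolver just looks up the DTD's system identifier in a table and returns a local path when one is declared.

```python
from xml.sax.handler import EntityResolver


class CatalogResolver(EntityResolver):
    """Map remote DTD system identifiers to local copies (hypothetical helper)."""

    def __init__(self, catalog):
        # catalog: dict mapping a system identifier (URL) to a local file path
        self._catalog = dict(catalog)

    def resolveEntity(self, publicId, systemId):
        # Return the local path when the identifier is listed in the catalog;
        # otherwise fall back to the original identifier unchanged.
        return self._catalog.get(systemId, systemId)


# Hypothetical URL and path, for illustration only.
CATALOG = {
    "http://example.org/dtd/corpus.dtd": "/usr/local/share/dtd/corpus.dtd",
}
resolver = CatalogResolver(CATALOG)
```

With a SAX parser, such a resolver would be registered through `setEntityResolver(resolver)`. Whether the parser actually consults it for the external DTD depends on the parser and on how external entity loading is configured, so treat this as a starting point only.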