Activity for JSpider

  • EJP EJP modified ticket #14

    Logging revisions

  • EJP EJP modified ticket #8

    Sitemap plugin improvements

  • EJP EJP modified ticket #15

    Rationalize Rules subsystem

  • EJP EJP modified ticket #17

    Sitemap parser

  • EJP EJP modified ticket #9

    Set If-Modified-Since header and observe 304 status

  • EJP EJP modified ticket #10

    Use HEAD request when spidering: only fetch content if actually required

  • EJP EJP modified ticket #11

    Rewrite cookie handling to use JDK facilities

  • EJP EJP modified ticket #12

    Rewrite authentication to use standard JDK facilities

  • EJP EJP modified ticket #13

    Rewrite proxying to use standard JDK facilities

  • EJP EJP modified ticket #14

    Logging revisions

  • EJP EJP modified ticket #16

    Use 'default' configuration as the default for all other general configurations

  • EJP EJP modified ticket #18

    Accept Content-Encoding: gzip, deflate

  • EJP EJP created ticket #18

    Accept Content-Encoding: gzip, deflate

  • EJP EJP modified ticket #1

    Lucene Plug-In

  • EJP EJP modified ticket #9

    Set If-Modified-Since header and observe 304 status

  • EJP EJP modified ticket #10

    Use HEAD request when spidering: only fetch content if actually required

  • EJP EJP modified ticket #11

    Rewrite cookie handling to use JDK facilities

  • EJP EJP modified ticket #11

    Rewrite cookie handling to use JDK facilities

  • EJP EJP modified ticket #12

    Rewrite authentication to use standard JDK facilities

  • EJP EJP modified ticket #13

    Rewrite proxying to use standard JDK facilities

  • EJP EJP modified ticket #14

    Logging revisions

  • EJP EJP modified ticket #15

    Rationalize Rules subsystem

  • EJP EJP modified ticket #16

    Use 'default' configuration as the default for all other general configurations

  • EJP EJP created ticket #17

    Sitemap parser

  • EJP EJP modified ticket #8

    Sitemap plugin improvements

  • EJP EJP created ticket #16

    Use 'default' configuration as the default for all other general configurations

  • EJP EJP created ticket #15

    Rationalize Rules subsystem

  • EJP EJP created ticket #14

    Logging revisions

  • EJP EJP created ticket #13

    Rewrite proxying to use standard JDK facilities

  • EJP EJP created ticket #12

    Rewrite authentication to use standard JDK facilities

  • EJP EJP created ticket #11

    Rewrite cookie handling to use JDK facilities

  • EJP EJP created ticket #10

    Use HEAD request when spidering: only fetch content if actually required

  • EJP EJP created ticket #9

    Set If-Modified-Since header and observe 304 status

  • EJP EJP created ticket #8

    Sitemap plugin improvements

  • EJP EJP posted a comment on ticket #7

    Deleted this code again in favour of using a java.net.Authenticator as God intended.
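
    A minimal sketch of the java.net.Authenticator approach mentioned in this comment. The class name SpiderAuthenticator and its constructor parameters are hypothetical and are not taken from the JSpider source; only Authenticator and PasswordAuthentication are JDK classes.

```java
import java.net.Authenticator;
import java.net.PasswordAuthentication;

// Hypothetical example class: supplies one fixed set of credentials
// whenever the JDK's HTTP client is challenged for authentication.
class SpiderAuthenticator extends Authenticator {
    private final String user;
    private final char[] password;

    SpiderAuthenticator(String user, char[] password) {
        this.user = user;
        this.password = password.clone();
    }

    @Override
    protected PasswordAuthentication getPasswordAuthentication() {
        // getRequestingHost(), getRequestingScheme() etc. are available here
        // if credentials should depend on the site being challenged.
        return new PasswordAuthentication(user, password.clone());
    }
}
```

    Installing it is a single call, Authenticator.setDefault(new SpiderAuthenticator("user", "secret".toCharArray())), after which java.net.HttpURLConnection consults it when a server or proxy requests credentials.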

  • EJP EJP modified ticket #1

    Lucene Plug-In

  • EJP EJP posted a comment on ticket #1

    Accepted for upcoming 1.0 release. Configurability remains an issue, mainly of Analyzers. I presently have it that you can name your own class if it just needs no-args construction, default StandardAnalyzer, and if you need to use a more complex one you override a getAnalyzer() method. Not sure what else can be done. It's a long time since I used Lucene and they keep changing the API, which doesn't help. Also it introduces a dependency on SLF4J grrr.

  • EJP EJP posted a comment on discussion Help

    All this is being addressed in the upcoming 1.0 release.

  • EJP EJP posted a comment on discussion Open Discussion

    Anthony, I am preparing a 1.0 release to appear in the coming weeks. Many improvements. I plan to support this as long as I can, and to leave it in a clean state for when I can't. EJP

  • EJP EJP modified ticket #1

    Lucene Plug-In

  • EJP EJP posted a comment on ticket #5

    Fixed in upcoming 1.0 release.

  • EJP EJP modified ticket #5

    jSpider reusability.

  • EJP EJP created ticket #28

    Events not visited

  • EJP EJP modified ticket #24

    no favicon.ico support

  • EJP EJP modified ticket #4

    String index out of range: -1

  • EJP EJP modified ticket #6

    Spider CSS and JS files

  • EJP EJP posted a comment on ticket #6

    Good idea. A CSS parser and a more flexible mime-type system have been added to the upcoming 1.0 release.

  • EJP EJP modified ticket #3

    Make Caching site specific property

  • EJP EJP posted a comment on ticket #3

    This patch also requires adding the following to SpiderContextImpl.registerNewSite(), after all the other 'sitei' configurations: sitei.setCacheControl(siteProps.getString(ConfigConstants.SITE_CACHE_CONTROL, Constants.CACHE_CONTROL)); sitei.setUseCache(siteProps.getBoolean(ConfigConstants.SITE_CACHE_USE, false));

  • EJP EJP modified ticket #4

    OnDiskStorage implementation

  • EJP EJP modified ticket #7

    Add basic http auth (with patch)

  • EJP EJP modified ticket #4

    OnDiskStorage implementation

  • EJP EJP modified ticket #7

    Add basic http auth (with patch)

  • EJP EJP modified ticket #3

    Make Caching site specific property

  • EJP EJP posted a comment on ticket #3

    This patch also requires adding the following to SpiderContextImpl.registerNewSite(), after all the other 'sitei' configurations: sitei.setCacheControl(siteProps.getString(ConfigConstants.SITE_CACHE_CONTROL, Constants.CACHE_CONTROL)); sitei.setUseCache(siteProps.getBoolean(ConfigConstants.SITE_CACHE_USE, false));

  • EJP EJP modified ticket #2

    Added functionality to Storage

  • EJP EJP modified ticket #4

    OnDiskStorage implementation

  • EJP EJP posted a comment on ticket #4

    Incorporated in upcoming 1.0 release.

  • EJP EJP modified ticket #7

    Add basic http auth (with patch)

  • EJP EJP posted a comment on ticket #7

    Incorporated in upcoming 1.0 release.

  • EJP EJP modified ticket #1

    new to search engine

  • EJP EJP posted a comment on ticket #1

    Certainly.

  • EJP EJP modified ticket #1

    fixes problem with redirects not specifying full url

  • EJP EJP modified ticket #20

    Java 1.5.0 API Changes Thread.getState()

  • EJP EJP modified ticket #23

    WorkerThread int getState method clashes with Java 1.5

  • EJP EJP modified ticket #25

    Jspider crash on Web site

  • EJP EJP posted a comment on ticket #25

    The StringIndexOutOfBoundsException is a duplicate of #5. The HTTP 500 status isn't a problem with JSpider. The notfound.htm error should have been fixed.

  • EJP EJP modified ticket #5

    PANIC! Task net.javacoding.jspider.core.task.work.FetchRobot

  • EJP EJP posted a comment on ticket #5

    Fixed in upcoming 1.0 release.

  • EJP EJP modified ticket #8

    No activity besides start page when robotstxt.fetch is false

  • EJP EJP modified ticket #7

    TextHtmlMimeTypeOnlyRule: Exception not caught

  • EJP EJP posted a comment on ticket #7

    The only exception that can be thrown is InvalidStateForActionException, which indicates a coding bug. Unclear why this should be fixed.

  • EJP EJP modified a comment on ticket #6

    Fixed in upcoming 1.0 release. Plugins can now have any of four constructors: () (PropertySet config) (String name) (String name, PropertySet config)
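
    One way such a constructor lookup could work is sketched below with reflection. This is an assumption-laden illustration, not the JSpider implementation: PropertySet here is an empty stand-in for the real interface, PluginInstantiator and NamedPlugin are hypothetical names, and the order in which the four signatures are tried is a guess.

```java
import java.lang.reflect.Constructor;

// Stand-in for JSpider's PropertySet; the real interface is not shown in this feed.
interface PropertySet {}

// Hypothetical sample plugin that only offers the (String name) constructor.
class NamedPlugin {
    final String name;

    NamedPlugin(String name) {
        this.name = name;
    }
}

// Hypothetical helper: tries the four constructor shapes, most specific first.
final class PluginInstantiator {
    static Object instantiate(Class<?> type, String name, PropertySet config) {
        Class<?>[][] signatures = {
            {String.class, PropertySet.class},
            {PropertySet.class},
            {String.class},
            {}
        };
        Object[][] arguments = {
            {name, config},
            {config},
            {name},
            {}
        };
        for (int i = 0; i < signatures.length; i++) {
            try {
                Constructor<?> c = type.getDeclaredConstructor(signatures[i]);
                return c.newInstance(arguments[i]);
            } catch (NoSuchMethodException e) {
                // This shape is not declared; fall through to the next one.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot construct " + type.getName(), e);
            }
        }
        throw new IllegalArgumentException("No supported constructor on " + type.getName());
    }
}
```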

  • EJP EJP modified ticket #6

    DiskWriterPlugin constructor missing parameter

  • EJP EJP posted a comment on ticket #6

    Fixed in upcoming 1.0 release. Plugins can now have any of three constructors: () (PropertySet config) (String name, PropertySet config)

  • EJP EJP modified ticket #9

    Port is not included in the DiskWriterPlugin folder.

  • EJP EJP posted a comment on ticket #9

    After consideration it seems to me that the present behaviour is correct. A site is identified by a hostname. A different protocol (http/https) or port doesn't make it a different site.

  • EJP EJP modified ticket #11

    Two sites on same physical machine considered equal

  • EJP EJP posted a comment on ticket #11

    Fixed in upcoming 1.0 release. SiteDAOImpl now keys from the hostname, not the URL, which solves other problems as well, e.g. different protocols or ports.
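
    The effect of keying on the hostname rather than the full URL can be illustrated as follows. This is only a sketch: SiteKeys and siteKey are hypothetical names, and lower-casing the host is an added assumption rather than something stated in the comment.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: derives a site key from a URL using only its hostname,
// so protocol and port differences do not produce distinct sites.
final class SiteKeys {
    static String siteKey(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // Assumption: hostnames are compared case-insensitively.
        return host.toLowerCase(Locale.ROOT);
    }
}
```

    With this key, http://localhost:80/a and https://localhost:443/b map to the same site, while URLs on different hostnames stay separate.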

  • EJP EJP modified ticket #12

    Improper resolving of relative URLs

  • EJP EJP posted a comment on ticket #12

    Fixed in upcoming 1.0 release.

  • EJP EJP modified ticket #13

    I solve a spider exit jdbc bug

  • EJP EJP posted a comment on ticket #13

    Fixed in both setError() methods in upcoming 1.0 release.

  • EJP EJP modified ticket #14

    Spider follows commented out links

  • EJP EJP posted a comment on ticket #14

    Fixed in upcoming 1.0 release. A proper HTML parser is now used and links are identified via XPath expressions. Comments are therefore ignored, as are element/attribute pairs that can't be links, even if they are.
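
    Why XPath-based extraction skips commented-out links can be seen in a small sketch. The comment does not say which HTML parser or which XPath expressions are used, so this example assumes well-formed XHTML input, the JDK's built-in XML parser, and an illustrative expression covering only a/@href and img/@src. LinkExtractor is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects link targets from a parsed document tree.
final class LinkExtractor {
    static List<String> links(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            // Only attributes on real elements are selected; markup inside
            // a comment is a comment node, so it never matches.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//a/@href | //img/@src", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
    }
}
```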

  • EJP EJP modified ticket #15

    Query part is trimmed from URLs

  • EJP EJP posted a comment on ticket #15

    Query removal has been made optional in the upcoming 1.0 release.

  • EJP EJP modified ticket #26

    CookieDAOImpl uses incorrect SQL

  • EJP EJP modified ticket #16

    Apparent URLs in HTML PRE elements are checked

  • EJP EJP modified ticket #18

    Plugin compatibility with libgcj

  • EJP EJP modified a comment on ticket #18

    The GNU CLASSPATH project to which libgcj belongs is defunct. It never really progressed beyond Java 1.2: it hasn't been updated since 2012; and the companion GCJ Java compiler project was terminated some years ago. Closing as out of date.

  • EJP EJP posted a comment on ticket #18

    The GNU CLASSPATH project is defunct. It never really progressed beyond Java 1.2: it hasn't been updated since 2012; and the companion GCJ Java compiler project was terminated some years ago. Closing as out of date.

  • EJP EJP modified ticket #26

    CookieDAOImpl uses incorrect SQL

  • EJP EJP posted a comment on ticket #26

    Fixed in upcoming 1.0 release.

  • EJP EJP posted a comment on discussion Open Discussion

    You can get it by listening to the SpideringStartedEvent event, i.e. by providing a visit(SpideringStartedEvent event) method.

  • EJP EJP modified a comment on discussion Help

    MySQL Blobs have 2^16-1 bytes. Change the column to a MEDIUMBLOB (2^24-1 bytes).

  • EJP EJP modified a comment on ticket #9

    It could be argued that the current behaviour is correct, given that, say, you don't want to treat http://localhost:80 and https://localhost:443 as different sites. Maybe it should distinguish different ports within the same protocol. I will ponder on this for the upcoming release.

  • EJP EJP posted a comment on ticket #9

    It could be argued that this is correct, given that, say, you don't want to treat http://localhost:80 and https://localhost:443 as different sites. Maybe it should distinguish different ports within the same protocol. I will ponder on this for the upcoming release.
