Logging revisions
Sitemap plugin improvements
Rationalize Rules subsystem
Sitemap parser
Set If-Modified-Since header and observe 304 status
Use HEAD request when spidering: only fetch content if actually required
Rewrite cookie handling to use JDK facilities
Rewrite authentication to use standard JDK facilities
Rewrite proxying to use standard JDK facilities
Use 'default' configuration as the default for all other general configurations
Accept Content-Encoding: gzip, deflate
Lucene Plug-In
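The "rewrite cookie handling to use JDK facilities" item above presumably refers to java.net.CookieManager. A minimal sketch, assuming nothing about JSpider's own code (the host and cookie values are made up; the class name is illustrative):

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;

public class JdkCookieSketch {
    // Build a CookieManager of the kind HttpURLConnection would consult
    // after CookieHandler.setDefault(cm); here the store is exercised
    // directly to show a cookie surviving a round trip.
    public static List<HttpCookie> roundTrip() throws Exception {
        CookieManager cm = new CookieManager();
        cm.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
        URI uri = new URI("http://example.com/");
        cm.getCookieStore().add(uri, new HttpCookie("session", "abc123"));
        return cm.getCookieStore().get(uri);
    }
}
```

In a spider, calling CookieHandler.setDefault(cm) once would make every subsequent HttpURLConnection send and record cookies automatically, which is the point of delegating to the JDK.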
Deleted this code again in favour of using a java.net.Authenticator as God intended.
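The java.net.Authenticator approach mentioned above can be sketched as follows; the credentials, host, and class name are placeholders, not JSpider code:

```java
import java.net.Authenticator;
import java.net.InetAddress;
import java.net.PasswordAuthentication;

public class JdkAuthSketch {
    // Install a process-wide authenticator; HttpURLConnection consults it
    // automatically when a server answers 401 (or a proxy answers 407).
    public static void install(final String user, final String password) {
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password.toCharArray());
            }
        });
    }

    // Demonstration helper: ask the installed authenticator directly,
    // the way the JDK's HTTP client would on a challenge.
    public static PasswordAuthentication probe() {
        return Authenticator.requestPasswordAuthentication(
                "example.com", (InetAddress) null, 80, "http", "realm", "basic");
    }
}
```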
Lucene Plug-In
Accepted for the upcoming 1.0 release. Configurability remains an issue, mainly of Analyzers. At present you can name your own Analyzer class if it only needs no-args construction, with StandardAnalyzer as the default; if you need a more complex one, you override a getAnalyzer() method. Not sure what else can be done. It's a long time since I used Lucene, and they keep changing the API, which doesn't help. Also, it introduces a dependency on SLF4J, grrr.
All this is being addressed in the upcoming 1.0 release.
Anthony, I am preparing a 1.0 release to appear in the coming weeks. Many improvements. I plan to support this as long as I can, and to leave it in a clean state for when I can't. EJP
Lucene Plug-In
Fixed in upcoming 1.0 release.
JSpider reusability
Events not visited
no favicon.ico support
String index out of range: -1
Spider CSS and JS files
Good idea. A CSS parser and a more flexible MIME-type system have been added to the upcoming 1.0 release.
Make Caching site specific property
This patch also requires adding the following to SpiderContextImpl.registerNewSite(), after all the other 'sitei' configurations:

    sitei.setCacheControl(siteProps.getString(ConfigConstants.SITE_CACHE_CONTROL, Constants.CACHE_CONTROL));
    sitei.setUseCache(siteProps.getBoolean(ConfigConstants.SITE_CACHE_USE, false));
Added functionality to Storage
OnDiskStorage implementation
Incorporated in upcoming 1.0 release.
Add basic http auth (with patch)
Incorporated in upcoming 1.0 release.
new to search engine
Certainly.
Fixes problem with redirects not specifying a full URL
Java 1.5.0 API Changes Thread.getState()
WorkerThread int getState method clashes with Java 1.5
JSpider crash on Web site
The StringIndexOutOfBoundsException is a duplicate of #5. The HTTP 500 status isn't a problem with JSpider. The notfound.htm error should have been fixed.
PANIC! Task net.javacoding.jspider.core.task.work.FetchRobot
Fixed in upcoming 1.0 release.
No activity besides start page when robotstxt.fetch is false
TextHtmlMimeTypeOnlyRule: Exception not caught
The only exception that can be thrown is InvalidStateForActionException, which indicates a coding bug. Unclear why this should be fixed.
Fixed in upcoming 1.0 release. Plugins can now have any of four constructors: (), (PropertySet config), (String name), (String name, PropertySet config).
DiskWriterPlugin constructor missing parameter
Fixed in upcoming 1.0 release. Plugins can now have any of three constructors: (), (PropertySet config), (String name, PropertySet config).
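Supporting several constructor forms implies reflective constructor selection. A minimal self-contained sketch of that idea (PropertySet is omitted so the example compiles on its own; all class and method names here are illustrative, not JSpider's actual API):

```java
import java.lang.reflect.Constructor;

public class PluginInstantiator {
    // Prefer a (String name) constructor and fall back to the no-arg form;
    // the real mechanism would also try the PropertySet variants.
    public static Object instantiate(Class<?> cls, String name) throws Exception {
        try {
            Constructor<?> c = cls.getConstructor(String.class);
            return c.newInstance(name);
        } catch (NoSuchMethodException e) {
            return cls.getConstructor().newInstance();
        }
    }

    // Sample "plugins" for demonstration only.
    public static class NamedPlugin {
        public final String name;
        public NamedPlugin(String name) { this.name = name; }
    }

    public static class PlainPlugin {
        public PlainPlugin() {}
    }
}
```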
Port is not included in the DiskWriterPlugin folder.
After consideration it seems to me that the present behaviour is correct. A site is identified by a hostname. A different protocol (http/https) or port doesn't make it a different site.
Two sites on same physical machine considered equal
Fixed in upcoming 1.0 release. SiteDAOImpl now keys from the hostname, not the URL, which solves other problems as well, e.g. different protocols or ports.
Improper resolving of relative URLs
Fixed in upcoming 1.0 release.
I solve a spider exit jdbc bug
Fixed in both setError() methods in upcoming 1.0 release.
Spider follows commented out links
Fixed in upcoming 1.0 release. A proper HTML parser is now used and links are identified via XPath expressions. Comments are therefore ignored, as are element/attribute pairs that cannot contain links, even if their values look like URLs.
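The XPath-based extraction can be sketched with the JDK's own XML facilities. This sketch requires well-formed XHTML input (a real HTML parser, as the fix describes, would tolerate tag soup); the class and method names are illustrative. Note how content inside a comment never becomes an element, so commented-out links drop out for free:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class LinkExtractSketch {
    // Pull href values from <a> elements only; markup inside comments is
    // plain comment text to the DOM, so it can never match //a/@href.
    public static List<String> extractLinks(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//a/@href", doc, XPathConstants.NODESET);
        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            links.add(nodes.item(i).getNodeValue());
        }
        return links;
    }
}
```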
Query part is trimmed from URLs
Query removal has been made optional in the upcoming 1.0 release.
CookieDAOImpl uses incorrect SQL
Apparent URLs in HTML PRE elements are checked
Plugin compatibility with libgcj
The GNU CLASSPATH project to which libgcj belongs is defunct. It never really progressed beyond Java 1.2: it hasn't been updated since 2012, and the companion GCJ Java compiler project was terminated some years ago. Closing as out of date.
CookieDAOImpl uses incorrect SQL
Fixed in upcoming 1.0 release.
You can get it by listening to the SpideringStartedEvent event, i.e. by providing a visit(SpideringStartedEvent event) method.
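The visit-method dispatch described above can be modelled in a few lines. These are not JSpider's real classes, just a self-contained illustration of a listener providing an overload for the event it cares about:

```java
public class EventVisitSketch {
    // Stand-in for the real SpideringStartedEvent; field and accessor
    // names are made up for the illustration.
    public static class SpideringStartedEvent {
        private final String baseUrl;
        public SpideringStartedEvent(String baseUrl) { this.baseUrl = baseUrl; }
        public String getBaseUrl() { return baseUrl; }
    }

    public static class MyPlugin {
        String observedBase;
        // Overload invoked when spidering starts; record the base URL here.
        public void visit(SpideringStartedEvent event) {
            observedBase = event.getBaseUrl();
        }
    }
}
```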
MySQL BLOB columns hold at most 2^16-1 bytes. Change the column to a MEDIUMBLOB (up to 2^24-1 bytes).
It could be argued that the current behaviour is correct, given that, say, you don't want to treat http://localhost:80 and https://localhost:443 as different sites. Maybe it should distinguish different ports within the same protocol. I will ponder on this for the upcoming release.