From: Rob V. <rv...@do...> - 2014-06-16 16:02:50
|
Then please provide a minimal reproducible test case that shows the problem so I can look into it.

Thanks,

Rob |
From: Tomasz P. <tom...@gm...> - 2014-06-16 15:56:39
|
On Mon, Jun 16, 2014 at 4:55 PM, Rob Vesse <rv...@do...> wrote:
> Which version is this? I'm assuming pre 1.0.4 or lower?

No, I got that behaviour in 1.0.5.x as well.

> Since you are talking about Store Manager then this sounds like TOOLS-409
> which was already reported and fixed for the 1.0.5 release. It was a bug
> in how Store Manager passed the data to the underlying writers.

This isn't only the Store Manager but also instances of TripleStore. When LoadFromFile() or LOAD <x> is used it's fine. But after updating a fresh store with INSERT DATA, the wrong dataset is serialized. |
From: Rob V. <rv...@do...> - 2014-06-16 14:56:20
|
Which version is this? I'm assuming pre 1.0.4 or lower?

Since you are talking about Store Manager then this sounds like TOOLS-409, which was already reported and fixed for the 1.0.5 release. It was a bug in how Store Manager passed the data to the underlying writers.

Note that serializing an in-memory store directly in code was not affected in any way.

The creation of the empty default graph is an implementation detail: any SPARQL Update which inserts data potentially causes the default graph to be created, because the SPARQL specification states that a dataset always contains an unnamed default graph, and some parts of the implementation assume graphs will already exist, so it is safer and faster to pre-create any graphs that will potentially be affected.

LOAD is a special case because it is really just a shim to the parser sub-system, and the way the parser sub-system generates data means that only specifically mentioned graphs are ever created.

Rob |
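The difference Rob describes between the update path and the LOAD path can be pictured with a toy model. This is only a sketch in Python, not dotNetRDF code: `ToyDataset`, `insert_data` and `load` are hypothetical names, and the pre-creation step is an assumption modelled on the explanation above.

```python
class ToyDataset:
    """Hypothetical in-memory dataset: graph name -> set of triples.

    The key None stands for the unnamed default graph.
    """

    def __init__(self):
        self.graphs = {}

    def insert_data(self, quads):
        # Assumed behaviour, following the explanation in this thread: the
        # update path pre-creates every graph it might touch, including the
        # default graph, before adding any data.
        self.graphs.setdefault(None, set())
        for s, p, o, g in quads:
            self.graphs.setdefault(g, set()).add((s, p, o))

    def load(self, quads):
        # The parser-style path creates a graph only when the data names it.
        for s, p, o, g in quads:
            self.graphs.setdefault(g, set()).add((s, p, o))


quads = [("ex:user", "rdf:type", "schema:Person", "http://test.org/user")]

updated = ToyDataset()
updated.insert_data(quads)

loaded = ToyDataset()
loaded.load(quads)

# The updated store holds one extra, empty graph (the default graph).
print(len(updated.graphs), len(loaded.graphs))
```

Both stores hold the same triples; they differ only in whether an empty default graph exists, which matches the graph counts reported in this thread.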
From: Tomasz P. <tom...@gm...> - 2014-06-16 14:15:14
|
Hi

We've noticed weird behaviour with the in-memory triple store when serializing to NQuads. Here's what happens:

1. Create an empty TripleStore
2. Run UPDATE

   INSERT DATA {
     GRAPH <http://test.org/user> {
       <http://test.org/user> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
       <http://test.org/user> <http://some/ontology/favorite> <http://test.org/product/name> .
     }
     GRAPH <http://test.org/prodList/> {
       <http://test.org/user> <http://xmlns.com/foaf/0.1/primaryTopic> <http://test.org/user> .
     }
   }

3. Serialize to NQuads. Store Manager correctly reports that 3 triples were serialized in 3 graphs (including the empty default graph).

The output file contains all triples, but without graph names, so they are all serialized in the default graph. It's not a problem with the in-memory store: the insert creates the correct graphs with data. I've confirmed this occurs in all versions since 1.0.0.

Curiously, when data is loaded with a LOAD <x> INTO GRAPH <y> command or with the dotNetRDF API, the store is serialized correctly. Only INSERT DATA causes the problem.

Is this a known problem?

And by the way, why does INSERT DATA create an empty default graph in the store, while loading or LOAD <x> only creates those graphs actually included in the source files?

Greets,
Tom |
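To make the reported symptom concrete, here is a small Python sketch of what N-Quads output looks like with and without graph labels. It is not the dotNetRDF writer: `to_nquads` is a hypothetical helper that handles IRI terms only, and it uses two of the quads from the report above.

```python
def to_nquads(quads):
    """Serialize (subject, predicate, object, graph) IRI tuples as N-Quads.

    A graph of None means the default graph, which N-Quads writes as a
    plain three-term statement.
    """
    lines = []
    for s, p, o, g in quads:
        terms = [f"<{s}>", f"<{p}>", f"<{o}>"]
        if g is not None:
            terms.append(f"<{g}>")  # the optional fourth term is the graph label
        lines.append(" ".join(terms) + " .")
    return "\n".join(lines)


quads = [
    ("http://test.org/user",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://schema.org/Person",
     "http://test.org/user"),
    ("http://test.org/user",
     "http://xmlns.com/foaf/0.1/primaryTopic",
     "http://test.org/user",
     "http://test.org/prodList/"),
]

# What a correct serialization should contain: four terms per line.
expected = to_nquads(quads)

# Dropping the graph label reproduces the symptom: every statement
# ends up in the default graph.
buggy = to_nquads([(s, p, o, None) for s, p, o, _ in quads])

print(expected)
print(buggy)
```

Counting the terms on each output line is a quick way to check whether a written file has kept its graph names.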
From: Rob V. <rv...@do...> - 2014-06-03 08:22:44
|
Hey All

As some of you will have already noticed, I put out the 1.0.5 release on Friday. See the blog post at http://www.dotnetrdf.org/blogitem.asp?blogID=82 for a brief overview of the release.

The major change in this release is to our dependencies: we have upgraded to the 6.x line of Json.Net releases and as part of this have revised our PCL offering to be Profile 136 and removed our standalone Silverlight 4 and Windows Phone 7 builds.

The release also includes a variety of fixes for bug reports around SPARQL corner cases and improves compatibility with some third-party triple stores. Thanks as always to everyone in the community for reporting and contributing to these improvements.

Regards,

Rob |
From: Rob V. <rv...@do...> - 2014-05-27 08:44:30
|
No, I didn't look at that at all. The other bugs with GRAPH and sub-query execution meant the results were incorrect anyway, so I didn't attempt to look into what effect the ORDER BY has as well, since ORDER BY wasn't necessary to reproduce the poor performance.

I cut a 1.0.5 release on Friday, so if you still experience issues with ORDER BY please file a new bug that describes that issue.

Note that 1.0.5 will cause results for some queries to change because of the fixes to execution of GRAPH clauses and sub-queries.

Rob |
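On the ORDER BY question raised in this thread: SPARQL leaves the order of solutions unspecified unless ORDER BY is given, so LIMIT on its own may pick a different subset from one run to the next. The Python sketch below only illustrates that general point and is not a diagnosis of dotNetRDF's behaviour. The `limit` helper, the product names and the shuffled "runs" are all hypothetical.

```python
import random


def limit(solutions, n, order_by=None):
    """Return the first n solutions, sorting first when order_by is given.

    Without order_by the result depends on whatever order the solutions
    happen to arrive in, which is the situation LIMIT alone leaves you in.
    """
    rows = list(solutions)
    if order_by is not None:
        rows = sorted(rows, key=order_by)
    return rows[:n]


products = ["ex:Wrench1", "ex:Wrench2", "ex:Wrench3"]

# Simulate two runs of an engine whose evaluation order is not stable.
run_a = random.Random(1).sample(products, len(products))
run_b = random.Random(2).sample(products, len(products))

# LIMIT 2 alone: the chosen subset follows the arrival order of each run.
unordered_a = limit(run_a, 2)
unordered_b = limit(run_b, 2)

# ORDER BY before LIMIT 2: both runs agree regardless of arrival order.
ordered_a = limit(run_a, 2, order_by=str)
ordered_b = limit(run_b, 2, order_by=str)

print(unordered_a, unordered_b)
print(ordered_a, ordered_b)
```

Sorting before truncating makes the subset predictable, which would be consistent with ORDER BY stabilising the result count in the query discussed here, although the bugs Rob mentions may also have played a part.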
From: Tomek P. <to...@pl...> - 2014-05-23 12:50:09
|
Thanks. I'm always equally impressed with the speed and efficiency! Any idea though why the ORDER BY is required for the query to return correct results reliably? We're good with 1.0.3 for now so you need not rush. Cheers, Tom On May 22, 2014 5:53 PM, "Rob Vesse" <rv...@do...> wrote: > > Ah, I think I see what the problem is (well there's two in fact) > > One is that the sub-query is getting scheduled too early in the query which I have fixed > > The other I have just found was likely introduced by a commit that went into 1.0.4 hence why I was asking if this was a regression from 1.0.3. It relates to algebra generation and means we're potentially executing the graph clause too many times. This is probably gonna be a little tricker to fix but I will aim to have it fixed for 1.0.5 and try and get you a pre-release build with a fix as soon as I can > > Rob > > From: Tomek Pluskiewicz <to...@pl...> > Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> > Date: Thursday, 22 May 2014 16:18 > To: dotNetRDF Bug Report tracking and resolution <dot...@li...> > Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries > > I tried 1.0.4 and 1.0.5-pre2 and both are equally slow. > > Tom > > On May 22, 2014 4:40 PM, "Rob Vesse" <rv...@do...> wrote: >> >> Tom >> >> Are you saying that performance is substantially worse with 1.0.4 versus 1.0.3 or the performance is just as bad across all recent releases? >> >> Rob >> >> From: Tomasz Pluskiewicz <tom...@gm...> >> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> >> Date: Thursday, 22 May 2014 14:48 >> To: dotNetRDF Bug Report tracking and resolution <dot...@li...> >> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries >> >> Rob, thanks for responding. >> >> Always +1 for additional diagnostic tools (I mean the ExplainProcessor enhancement). >> >> I've been fiddling with our query and the ?s ?p ?o pattern seems to have little but noticeable impact on the synthetic dataset. 
But indeed moving the subquery as-is outside the first GRAPH ?var boosts the query by an order of magnitude. I've also tried to remove the duplicate triple patterns on both GRAPH ?v patterns but it doesn't help much either. Interestingly a query which combines subquery moved, ?s ?p ?o extracted and duplicate triple patters removed is significantly slower then the one with just subquery moved outside the GRAPH ?var. >> >> I've ran all kinds of queries against our real-life data (20k quads in over 900 graphs) and the conclusions are the same. Moving subquery and ?s ?p ?o graph pattern gives best results. >> >> Regarding the ORDER BY it still seems like a bug. I wanted to blame inconsistent results on the fact that the subquery is nested inside the GRAPH ?var but with the subquery moved I observe the same bahaviour. >> >> All the above is true for 1.0.3. Now regarding 1.0.4+ there are additional problems as I wrote yesterday. With the real-life data the original query takes over 2.5 minutes to complete, while in previous version only about a quarter of a second is needed! The optimized queries actually took so long that I never had them finished. 
>> >> Tom >> >> >> On Wed, May 21, 2014 at 3:47 PM, Rob Vesse <rv...@do...> wrote: >>> >>> Tom >>> >>> Thanks for the report, I haven't done any debugging yet but I have a few thoughts based on what you've described >>> >>> ORDER BY causing indeterminate results could be a bug but it also could just be an artefact of two things: >>> >>> SPARQL only defines a partial ordering so there are some combinations of terms for which ordering is left to the implementation though since we're just talking about dotNetRDF such indeterminate orderings should be defined consistently >>> That you have multiple terms in the data that compare to be equivalent, in this case we're at the mercy of .Net's sort implementation for which items float to the top and so are returned each time >>> >>> GRAPH ?var can be quite expensive because what it does is evaluate the inner operations over each individual named graph in the dataset in turn. Where ?var is already bound this might be a small subset but given the structure of your query I suspect there are at least some places where this is happening. So with two points in your query where you have GRAPH ?var being potentially unbound (or bound to a large number of possible values) you would get the O(n2) exponential scaling behaviour you describe >>> >>> Also the ?s ?p ?o in the start of your first GRAPH clause may be causing a substantial increase in intermediate results early on in the query. It might be better to have a separate GRAPH clause after the first GRAPH clause to pull out all the triples once you've determined the graphs you actually care about. >>> >>> There is of course a possibility that dotNetRDF is optimising the query badly but that will require some debugging to figure out if this is the case. 
>>>
>>> Using the ExplainQueryProcessor (http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProcessor) with the ExplanationLevel turned up to Full as described at https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries.wiki#!debugging-sparql-queries might be enlightening since it'll include things like intermediate result counts. It doesn't currently analyse how many graphs a given GRAPH clause has to consider, though, which will make it hard to spot that repeated looping on GRAPH ?var if that is the culprit. That would certainly be interesting information so I may try and add it in the future.
>>>
>>> Let me know if you guys figure anything more out. I'll aim to take a proper look and debug this later in the week.
>>>
>>> Cheers,
>>>
>>> Rob
>>>
>>> From: Tomek Pluskiewicz <to...@pl...>
>>> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
>>> Date: Wednesday, 21 May 2014 13:46
>>> To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
>>> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries
>>>
>>> Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test
>>>
>>>
>>> On Wed, May 21, 2014 at 2:18 PM, Tomek Pluskiewicz <to...@pl...> wrote:
>>>>
>>>> Hi Rob
>>>>
>>>> We've been developing an ORM solution complete with Linq for some time now. It will be open-sourced at some point. Currently we've been experiencing problems with query speed and reliability. Let me acquaint you with how things work.
>>>>
>>>> Each resource is contained within its own named graph and additionally there is a meta-graph, which connects graphs and the described entities (there could be many graphs for one resource). For example
>>>>
>>>> # meta graph
>>>> <http://foo.com/productList/>
>>>> {
>>>> ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
>>>> }
>>>>
>>>> # wrench
>>>> ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" .
}
>>>>
>>>> The problem is with a query
>>>>
>>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>>> PREFIX schema: <http://schema.org/>
>>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/>
>>>>
>>>> SELECT ?s ?p ?o ?Gp0 ?p0
>>>> WHERE
>>>> {
>>>>   GRAPH ?Gp0
>>>>   {
>>>>     ?s ?p ?o .
>>>>     ?p0_sub schema:name ?name0_sub .
>>>>     FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>>     ?p0 rdf:type schema:Product .
>>>>     {
>>>>       SELECT DISTINCT ?p0_sub
>>>>       WHERE
>>>>       {
>>>>         GRAPH ?Gp0_sub
>>>>         {
>>>>           ?p0_sub rdf:type schema:Product .
>>>>           ?p0_sub schema:name ?name0_sub .
>>>>           FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
>>>>         }
>>>>         GRAPH <http://foo.com/productList/>
>>>>         {
>>>>           ?Gp0_sub foaf:primaryTopic ?p0_sub .
>>>>         }
>>>>       }
>>>>       #ORDER BY ?p0_sub
>>>>       LIMIT 2
>>>>     }
>>>>     FILTER(?p0_sub=?p0)
>>>>   }
>>>>
>>>>   GRAPH <http://foo.com/productList/>
>>>>   {
>>>>     ?Gp0 foaf:primaryTopic ?p0 .
>>>>   }
>>>> }
>>>>
>>>> transformed from the following Linq
>>>>
>>>> Query<IProduct>().Where(p => p.Name.ToUpper().Contains(name.ToUpper())).Take(2)
>>>>
>>>> There are two problems here. The query returns different results on subsequent runs against the same dataset and it runs very slowly. Uncommenting the ORDER BY helps with the varying result count, though I'm not exactly sure why it should be necessary. However, I'm not sure what's wrong with performance. Obviously it has something to do with the subquery, but I was unable to alter this SELECT so that it executed quickly. Even a dataset as small as 9 quads (3 resources * (2 triples + 1 meta-triple)) takes 1 second to complete and the time seems to increase exponentially. At 90 quads/30 graphs it is already taking close to 3 minutes.
>>>>
>>>> We first observed the performance problems with version 1.0.4 but with a synthetic dataset the same issues arise in previous releases and 1.0.5+.
>>>>
>>>> Hope you can help. Would you like any additional info?
>>>>
>>>> Regards,
>>>> Tom |
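The varying results seen with ORDER BY commented out are consistent with how SPARQL treats LIMIT: without ORDER BY the order of solutions is not defined, so LIMIT 2 may keep any two of the matching products. The toy Python model below is only a sketch under stated assumptions, not dotNetRDF code. It enumerates every order in which three matching products could arrive (ex:Wrench2 and ex:Wrench3 are hypothetical names; only ex:Wrench1 appears in the thread) and collects the results LIMIT 2 could return with and without sorting first.

```python
from itertools import permutations

# Products matching the subquery. Only ex:Wrench1 comes from the thread;
# the other two names are hypothetical, added for illustration.
MATCHES = ["ex:Wrench1", "ex:Wrench2", "ex:Wrench3"]


def possible_results(rows, limit, order_by):
    """Return every distinct result LIMIT could produce over all arrival orders."""
    results = set()
    for arrival_order in permutations(rows):
        # ORDER BY sorts the solutions before LIMIT slices them;
        # without it the slice depends on the arrival order.
        solutions = sorted(arrival_order) if order_by else list(arrival_order)
        results.add(tuple(solutions[:limit]))
    return results


without_order = possible_results(MATCHES, limit=2, order_by=False)
with_order = possible_results(MATCHES, limit=2, order_by=True)

print(len(without_order))  # 6 different ordered pairs are possible
print(with_order)          # a single result once the solutions are sorted
```

This does not rule out a bug in the engine. It only shows that an unordered LIMIT is allowed to vary, so ORDER BY is needed before the results can be expected to be stable.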
From: Rob V. <rv...@do...> - 2014-05-22 15:53:50
|
Ah, I think I see what the problem is (well, there's two in fact).

One is that the sub-query is getting scheduled too early in the query, which I have fixed.

The other I have just found was likely introduced by a commit that went into 1.0.4, hence why I was asking if this was a regression from 1.0.3. It relates to algebra generation and means we're potentially executing the graph clause too many times. This is probably gonna be a little trickier to fix but I will aim to have it fixed for 1.0.5 and try and get you a pre-release build with a fix as soon as I can.

Rob |
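To make the cost of executing a GRAPH clause too many times concrete, here is a toy Python counting model, assuming the one-graph-per-resource layout from the original report (one named graph per resource plus the meta-graph). It is a sketch only: build_dataset, quad_count, nested_evaluations and hoisted_evaluations are hypothetical helpers, not dotNetRDF APIs, and the real engine may behave differently. It counts how often an inner GRAPH pattern would run if it were re-evaluated for every graph visited by an outer GRAPH ?var, compared with evaluating it once up front.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
META_GRAPH = "http://foo.com/productList/"


def build_dataset(resource_count: int) -> Dict[str, List[Triple]]:
    """One named graph per resource plus a meta-graph linking graph to resource."""
    dataset: Dict[str, List[Triple]] = {META_GRAPH: []}
    for i in range(1, resource_count + 1):
        resource = f"ex:Wrench{i}"  # hypothetical names beyond ex:Wrench1
        dataset[resource] = [
            (resource, "rdf:type", "schema:Product"),
            (resource, "schema:name", f"Wrench {i}"),
        ]
        dataset[META_GRAPH].append((resource, "foaf:primaryTopic", resource))
    return dataset


def quad_count(dataset: Dict[str, List[Triple]]) -> int:
    """Total number of quads across all graphs."""
    return sum(len(triples) for triples in dataset.values())


def nested_evaluations(dataset: Dict[str, List[Triple]]) -> int:
    """Inner GRAPH pattern re-run once per graph visited by the outer GRAPH ?var."""
    graphs = list(dataset)
    return sum(1 for _outer in graphs for _inner in graphs)


def hoisted_evaluations(dataset: Dict[str, List[Triple]]) -> int:
    """Inner pattern run once over all graphs, then the outer pattern once."""
    return 2 * len(dataset)


for n in (3, 30):
    ds = build_dataset(n)
    print(n, quad_count(ds), nested_evaluations(ds), hoisted_evaluations(ds))
```

For 30 resources (90 quads, 31 graphs) the nested count is 31 * 31 = 961 against 62 for the hoisted form, which is quadratic rather than exponential growth. The timings reported in the thread grew faster than that, so this model is at best part of the picture.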
From: Tomek P. <to...@pl...> - 2014-05-22 15:18:54
|
I tried 1.0.4 and 1.0.5-pre2 and both are equally slow. Tom |
From: Rob V. <rv...@do...> - 2014-05-22 14:40:30
|
Tom

Are you saying that performance is substantially worse with 1.0.4 versus 1.0.3, or is the performance just as bad across all recent releases?

Rob

From: Tomasz Pluskiewicz <tom...@gm...>
Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
Date: Thursday, 22 May 2014 14:48
To: dotNetRDF Bug Report tracking and resolution <dot...@li...>
Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries

> Rob, thanks for responding.
>
> Always +1 for additional diagnostic tools (I mean the ExplainProcessor enhancement).
>
> I've been fiddling with our query and the ?s ?p ?o pattern seems to have a small but noticeable impact on the synthetic dataset. But indeed moving the subquery as-is outside the first GRAPH ?var speeds up the query by an order of magnitude. I've also tried removing the duplicate triple patterns from both GRAPH ?v patterns but it doesn't help much either. Interestingly, a query which combines all three changes (subquery moved, ?s ?p ?o extracted and duplicate triple patterns removed) is significantly slower than the one with just the subquery moved outside the GRAPH ?var.
>
> I've run all kinds of queries against our real-life data (20k quads in over 900 graphs) and the conclusions are the same. Moving the subquery and the ?s ?p ?o graph pattern gives the best results.
>
> Regarding the ORDER BY, it still seems like a bug. I wanted to blame the inconsistent results on the fact that the subquery is nested inside the GRAPH ?var, but with the subquery moved I observe the same behaviour.
>
> All the above is true for 1.0.3. Now regarding 1.0.4+, there are additional problems, as I wrote yesterday. With the real-life data the original query takes over 2.5 minutes to complete, while in the previous version only about a quarter of a second is needed! The optimized queries actually took so long that I never saw them finish.
>
> Tom
>
>
> On Wed, May 21, 2014 at 3:47 PM, Rob Vesse <rv...@do...> wrote:
>> Tom
>>
>> Thanks for the report. I haven't done any debugging yet but I have a few thoughts based on what you've described.
>>
>> ORDER BY causing indeterminate results could be a bug but it also could just be an artefact of two things:
>> 1. SPARQL only defines a partial ordering, so there are some combinations of terms for which ordering is left to the implementation, though since we're just talking about dotNetRDF such indeterminate orderings should be defined consistently.
>> 2. You have multiple terms in the data that compare as equivalent, in which case we're at the mercy of .Net's sort implementation for which items float to the top and so are returned each time.
>>
>> GRAPH ?var can be quite expensive because what it does is evaluate the inner operations over each individual named graph in the dataset in turn. Where ?var is already bound this might be a small subset, but given the structure of your query I suspect there are at least some places where ?var is not yet bound. So with two points in your query where you have GRAPH ?var being potentially unbound (or bound to a large number of possible values) you would get the O(n^2) (quadratic) scaling behaviour you describe.
>>
>> Also the ?s ?p ?o at the start of your first GRAPH clause may be causing a substantial increase in intermediate results early on in the query. It might be better to have a separate GRAPH clause after the first GRAPH clause to pull out all the triples once you've determined the graphs you actually care about.
>>
>> There is of course a possibility that dotNetRDF is optimising the query badly, but that will require some debugging to figure out if this is the case.
>> >> Using the ExplainQueryProcessor >> (http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProce >> ssor) with the ExplanationLevel turned up to Full as described at >> https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries >> .wiki#!debugging-sparql-queries might be enlightening since it'll include >> things like intermediate result count. Though it doesn't currently analyse >> how many graphs a given GRAPH clause has to consider which it'll make it hard >> to spot that exponential looping on GRAPH ?var if that is the culprit, that >> would certainly be interesting information so I may try and add that in the >> future. >> >> Let me know if you guys figure anything more out, I'll aim to take a proper >> look and debug this later in the week >> >> Cheers, >> >> Rob >> >> From: Tomek Pluskiewicz <to...@pl...> >> Reply-To: dotNetRDF Bug Report tracking and resolution >> <dot...@li...> >> Date: Wednesday, 21 May 2014 13:46 >> To: dotNetRDF Bug Report tracking and resolution >> <dot...@li...> >> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries >> >>> Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test >>> >>> >>> On Wed, May 21, 2014 at 2:18 PM, Tomek Pluskiewicz <to...@pl...> >>> wrote: >>>> Hi Rob >>>> >>>> We've developing a ORM solution complete with Linq for some time now. Will >>>> be open source'd at some point. Currently we've been experiencing problems >>>> with query speed and reliability. Let me acquaint you with how things work. >>>> >>>> Each resource is contained within its own named graph and additionally >>>> there is a meta-graph, which connects graphs and the described entities >>>> (there could be many graphs for one resource). For example >>>> >>>> # meta graph >>>> <http://foo.com/productList/> >>>> { >>>> ex:Wrench1 foaf:primaryTopic ex:Wrench1 . >>>> } >>>> >>>> # wrench >>>> ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" . 
} >>>> >>>> The problem is with a query >>>> >>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> >>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> >>>> PREFIX schema: <http://schema.org/> >>>> PREFIX foaf: <http://xmlns.com/foaf/0.1/> >>>> >>>> SELECT ?s ?p ?o ?Gp0 ?p0 >>>> WHERE >>>> { >>>> GRAPH ?Gp0 >>>> { >>>> ?s ?p ?o . >>>> ?p0_sub schema:name ?name0_sub . >>>> FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string)) >>>> ?p0 rdf:type schema:Product . >>>> { >>>> SELECT DISTINCT ?p0_sub >>>> WHERE >>>> { >>>> GRAPH ?Gp0_sub >>>> { >>>> ?p0_sub rdf:type schema:Product . >>>> ?p0_sub schema:name ?name0_sub . >>>> FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string)) >>>> } >>>> GRAPH <http://foo.com/productList/> >>>> { >>>> ?Gp0_sub foaf:primaryTopic ?p0_sub . >>>> } >>>> } >>>> #ORDER BY ?p0_sub >>>> LIMIT 2 >>>> } >>>> FILTER(?p0_sub=?p0) >>>> } >>>> >>>> GRAPH <http://foo.com/productList/> >>>> { >>>> ?Gp0 foaf:primaryTopic ?p0 . >>>> } >>>> } >>>> >>>> transformed from the following Linq >>>> >>>> Query<IProduct>().Where(p => >>>> p.Name.ToUpper().Contains(name.ToUpper())).Take(2) >>>> >>>> There are two problems here. The query returns different results on >>>> subsequent runs against the same dataset and it runs very slow. >>>> Uncommenting the ORDER BY helps with the varying result count though I'm >>>> not exactly sure why it should be necessary. However I'm not sure what's >>>> with performance. Obviously it has something to do with the subquery but I >>>> was unable to alter this SELECT so that it executed quickly. Even as small >>>> a dataset as 9 quads (3 resources * (2 triples + 1 meta-triple)) takes 1 >>>> second to complete and the time seems to increase exponentially. At 90 >>>> quads/30 graphs it is already taking close to 3 minutes. >>>> >>>> We've first observed the performance problems with version 1.0.4 but with a >>>> synthetic dataset the same issues arise in previous releases and 1.0.5+. >>>> >>>> Hope you can help. 
Would you like any additional info? >>>> >>>> Regards, >>>> Tom >>> >>> ---------------------------------------------------------------------------- >>> -- "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE >>> Instantly run your Selenium tests across 300+ browser/OS combos. Get >>> unparalleled scalability from the best Selenium testing platform available >>> Simple to use. Nothing to install. Get started now for free." >>> http://p.sf.net/sfu/SauceLabs_______________________________________________ >>> dotNetRDF-bugs mailing list >>> dot...@li...https://lists.sourceforge.net/lists/list >>> info/dotnetrdf-bugs >> >> ----------------------------------------------------------------------------->> - >> "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE >> Instantly run your Selenium tests across 300+ browser/OS combos. >> Get unparalleled scalability from the best Selenium testing platform >> available >> Simple to use. Nothing to install. Get started now for free." >> http://p.sf.net/sfu/SauceLabs >> _______________________________________________ >> dotNetRDF-bugs mailing list >> dot...@li... >> https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs >> > > ------------------------------------------------------------------------------ > "Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE > Instantly run your Selenium tests across 300+ browser/OS combos. Get > unparalleled scalability from the best Selenium testing platform available > Simple to use. Nothing to install. Get started now for free." > http://p.sf.net/sfu/SauceLabs_______________________________________________ > dotNetRDF-bugs mailing list dot...@li... > https://lists.sourceforge.net/lists/listinfo/dotnetrdf-bugs |
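On the ORDER BY / LIMIT 2 question raised in this thread: SPARQL gives no guarantee about solution order unless ORDER BY is present, so LIMIT 2 alone may legitimately pick different rows on different runs. The same effect can be sketched with the Linq shape from the original report. This is a minimal illustration only; the product names and the FirstTwoOrdered helper are hypothetical, and it does not settle whether dotNetRDF also has a bug here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitOrdering
{
    // Take(n) on its own returns whatever the source enumerates first;
    // sorting first pins down which items come back.
    public static List<string> FirstTwoOrdered(IEnumerable<string> products)
    {
        return products
            .OrderBy(p => p, StringComparer.Ordinal) // plays the role of ORDER BY ?p0_sub
            .Take(2)                                   // plays the role of LIMIT 2
            .ToList();
    }

    public static void Main()
    {
        // Two enumeration orders of the same three hypothetical products.
        var runA = new[] { "ex:Wrench3", "ex:Wrench1", "ex:Wrench2" };
        var runB = new[] { "ex:Wrench2", "ex:Wrench3", "ex:Wrench1" };

        Console.WriteLine(string.Join(", ", runA.Take(2)));           // ex:Wrench3, ex:Wrench1
        Console.WriteLine(string.Join(", ", runB.Take(2)));           // ex:Wrench2, ex:Wrench3
        Console.WriteLine(string.Join(", ", FirstTwoOrdered(runA)));  // ex:Wrench1, ex:Wrench2
        Console.WriteLine(string.Join(", ", FirstTwoOrdered(runB)));  // ex:Wrench1, ex:Wrench2
    }
}
```

If the Linq provider emitted an ORDER BY whenever Take() is used, the subquery would at least select the same two products each time.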
From: Rob V. <rv...@do...> - 2014-05-22 14:34:49
|
Yes. The newer specifications use IRIs, in which spaces are forbidden. Adding the location information to the error from the NTriples tokeniser is a trivial fix and I'll get that in for the next release. Rob From: Tomek Pluskiewicz <to...@pl...> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Date: Thursday, 22 May 2014 13:46 To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Subject: [dotNetRDF-bugs] Spaces in URIs |
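Since literal spaces are rejected, one way to keep such data loadable is to percent-encode the spaces before the IRIs are written out. The sketch below assumes that is acceptable for the data in question (the encoded IRI is a different identifier from the one containing a space); IriSanitizer and EncodeSpaces are hypothetical names, not part of dotNetRDF.

```csharp
using System;

public static class IriSanitizer
{
    // Hypothetical helper (not a dotNetRDF API): percent-encodes literal
    // spaces so the value can be written between < and > in NQuads output.
    public static string EncodeSpaces(string iri)
    {
        if (iri == null) throw new ArgumentNullException(nameof(iri));
        return iri.Replace(" ", "%20");
    }

    public static void Main()
    {
        string raw = "http://example.org/my resource";
        Console.WriteLine("<" + EncodeSpaces(raw) + ">"); // <http://example.org/my%20resource>
    }
}
```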
From: Tomasz P. <tom...@gm...> - 2014-05-22 13:48:51
|
Rob, thanks for responding.

Always +1 for additional diagnostic tools (I mean the ExplainQueryProcessor enhancement).

I've been fiddling with our query and the ?s ?p ?o pattern seems to have a small but noticeable impact on the synthetic dataset. But indeed, moving the subquery as-is outside the first GRAPH ?var speeds up the query by an order of magnitude. I've also tried removing the duplicate triple patterns from both GRAPH ?v patterns, but that doesn't help much either. Interestingly, a query which combines all three changes (subquery moved, ?s ?p ?o extracted and duplicate triple patterns removed) is significantly slower than the one with just the subquery moved outside the GRAPH ?var.

I've run all kinds of queries against our real-life data (20k quads in over 900 graphs) and the conclusions are the same. Moving the subquery and the ?s ?p ?o graph pattern gives the best results.

Regarding the ORDER BY, it still seems like a bug. I wanted to blame the inconsistent results on the fact that the subquery is nested inside the GRAPH ?var, but with the subquery moved I observe the same behaviour.

All the above is true for 1.0.3. Now regarding 1.0.4+, there are additional problems, as I wrote yesterday. With the real-life data the original query takes over 2.5 minutes to complete, while in the previous version only about a quarter of a second is needed! The optimized queries actually took so long that I never saw them finish.

Tom |
From: Tomek P. <to...@pl...> - 2014-05-22 12:47:11
|
Hi Rob I see that with dotNetRDF 1.0.4 spaces are not allowed in URIs in NQuads data. Seeing that Turtle has the same limitation, I assume that it's according to the specification. It would be nice, though, if the error pointed to the faulty line in the file. Regards, Tom |
From: Rob V. <rv...@do...> - 2014-05-21 14:04:38
|
Tom

Thanks for the report. I haven't done any debugging yet, but I have a few thoughts based on what you've described.

ORDER BY causing indeterminate results could be a bug, but it could also just be an artefact of two things:

1. SPARQL only defines a partial ordering, so there are some combinations of terms for which ordering is left to the implementation, though since we're just talking about dotNetRDF such indeterminate orderings should be defined consistently
2. You have multiple terms in the data that compare as equivalent, in which case we're at the mercy of .Net's sort implementation as to which items float to the top and so are returned each time

GRAPH ?var can be quite expensive because what it does is evaluate the inner operations over each individual named graph in the dataset in turn. Where ?var is already bound this might be a small subset, but given the structure of your query I suspect there are at least some places where it is iterating over every graph. So with two points in your query where you have GRAPH ?var being potentially unbound (or bound to a large number of possible values) you would get the O(n^2) scaling behaviour you describe.

Also, the ?s ?p ?o at the start of your first GRAPH clause may be causing a substantial increase in intermediate results early on in the query. It might be better to have a separate GRAPH clause after the first GRAPH clause to pull out all the triples once you've determined the graphs you actually care about.

There is of course a possibility that dotNetRDF is optimising the query badly, but that will require some debugging to figure out if this is the case.

Using the ExplainQueryProcessor (http://www.dotnetrdf.org/api/index.asp?Topic=VDS.RDF.Query.ExplainQueryProcessor) with the ExplanationLevel turned up to Full as described at https://bitbucket.org/dotnetrdf/dotnetrdf/wiki/HowTo/Debug%20SPARQL%20Queries.wiki#!debugging-sparql-queries might be enlightening since it'll include things like intermediate result count. It doesn't currently analyse how many graphs a given GRAPH clause has to consider, though, which will make it hard to spot that repeated looping on GRAPH ?var if that is the culprit. That would certainly be interesting information, so I may try and add that in the future.

Let me know if you guys figure anything more out. I'll aim to take a proper look and debug this later in the week.

Cheers,

Rob

From: Tomek Pluskiewicz <to...@pl...> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Date: Wednesday, 21 May 2014 13:46 To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Subject: Re: [dotNetRDF-bugs] Problems with SPARQL queries |
From: Tomek P. <to...@pl...> - 2014-05-21 12:47:09
|
Also, here's a test repo https://bitbucket.org/tpluscode/sparql-test |
From: Tomek P. <to...@pl...> - 2014-05-21 12:18:59
|
Hi Rob

We've been developing an ORM solution complete with Linq for some time now. It will be open-sourced at some point. Currently we've been experiencing problems with query speed and reliability. Let me acquaint you with how things work.

Each resource is contained within its own named graph and additionally there is a meta-graph, which connects graphs and the described entities (there could be many graphs for one resource). For example

    # meta graph
    <http://foo.com/productList/>
    {
      ex:Wrench1 foaf:primaryTopic ex:Wrench1 .
    }

    # wrench
    ex:Wrench1 { ex:Wrench1 a sch:Product ; sch:name "Wrench" . }

The problem is with a query

    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX schema: <http://schema.org/>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?s ?p ?o ?Gp0 ?p0
    WHERE
    {
      GRAPH ?Gp0
      {
        ?s ?p ?o .
        ?p0_sub schema:name ?name0_sub .
        FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
        ?p0 rdf:type schema:Product .
        {
          SELECT DISTINCT ?p0_sub
          WHERE
          {
            GRAPH ?Gp0_sub
            {
              ?p0_sub rdf:type schema:Product .
              ?p0_sub schema:name ?name0_sub .
              FILTER (CONTAINS(UCASE(?name0_sub),"W"^^xsd:string))
            }
            GRAPH <http://foo.com/productList/>
            {
              ?Gp0_sub foaf:primaryTopic ?p0_sub .
            }
          }
          #ORDER BY ?p0_sub
          LIMIT 2
        }
        FILTER(?p0_sub=?p0)
      }

      GRAPH <http://foo.com/productList/>
      {
        ?Gp0 foaf:primaryTopic ?p0 .
      }
    }

transformed from the following Linq

    Query<IProduct>().Where(p => p.Name.ToUpper().Contains(name.ToUpper())).Take(2)

There are two problems here. The query returns different results on subsequent runs against the same dataset, and it runs very slowly. Uncommenting the ORDER BY helps with the varying result count, though I'm not exactly sure why it should be necessary. However, I'm not sure what's wrong with performance. Obviously it has something to do with the subquery, but I was unable to alter this SELECT so that it executed quickly. Even as small a dataset as 9 quads (3 resources * (2 triples + 1 meta-triple)) takes 1 second to complete and the time seems to increase exponentially. At 90 quads/30 graphs it is already taking close to 3 minutes.

We first observed the performance problems with version 1.0.4, but with a synthetic dataset the same issues arise in previous releases and 1.0.5+.

Hope you can help. Would you like any additional info?

Regards,
Tom |
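The timings above (about 1 second at 3 graphs, close to 3 minutes at 30) are roughly what a quadratic cost would predict, rather than a strictly exponential one. The toy model below is not dotNetRDF code and makes a simplifying assumption: an unbound GRAPH ?var visits every named graph, and the nested subquery is re-evaluated for each outer graph. GraphCostModel and its method names are hypothetical.

```csharp
using System;

public static class GraphCostModel
{
    // Counts evaluations of the innermost pattern when an unbound
    // GRAPH ?Gp0_sub is nested inside an unbound GRAPH ?Gp0.
    public static long NestedEvaluations(int graphCount)
    {
        long evaluations = 0;
        for (int outer = 0; outer < graphCount; outer++)      // outer GRAPH ?Gp0
        {
            for (int inner = 0; inner < graphCount; inner++)  // inner GRAPH ?Gp0_sub
            {
                evaluations++;
            }
        }
        return evaluations;
    }

    // Same model with the subquery hoisted out of the outer GRAPH clause:
    // each clause walks the graphs once.
    public static long HoistedEvaluations(int graphCount)
    {
        return 2L * graphCount;
    }

    public static void Main()
    {
        foreach (int n in new[] { 3, 30 })
        {
            Console.WriteLine("{0} graphs: nested={1}, hoisted={2}",
                n, NestedEvaluations(n), HoistedEvaluations(n));
        }
        // 3 graphs: nested=9, hoisted=6
        // 30 graphs: nested=900, hoisted=60
    }
}
```

Going from 3 to 30 graphs multiplies the nested count by 100, which is the same order of magnitude as the reported slowdown, though real timings will also include costs this model ignores.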
From: Rob V. <rv...@do...> - 2014-05-16 12:13:37
|
Thanks for reporting this. This has been tracked as CORE-412 (http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=412), is fixed in default and will be included in our next release. Thanks, Rob From: "Rassokhin, Dmitrii [JRDUS]" <DRA...@it...> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Date: Saturday, 10 May 2014 02:30 To: "dot...@li..." <dot...@li...> Subject: [dotNetRDF-bugs] incorrect usage of Single.NaN and Double.NaN constants in numeric comparisons |
From: Rassokhin, D. [JRDUS] <DRA...@it...> - 2014-05-10 01:31:00
|
For example, this type of error is encountered at line 102 in NumericNode.cs (version 1.0.4 stable): return this.AsFloat() != 0.0f && this.AsFloat() != Single.NaN; The second part of the expression, this.AsFloat() != Single.NaN, will always evaluate to true, even when this.AsFloat() returns Single.NaN. NaN does not compare equal to any floating-point number, including itself. Use !Single.IsNaN(this.AsFloat()) instead. I have found errors of this type in several other places throughout the source code. Please search for all usages of double.NaN, float.NaN, Double.NaN and Single.NaN. Regards, Dmitrii Rassokhin. |
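The behaviour described above can be reproduced in isolation. The sketch below mirrors only the shape of the quoted expression; NaNChecks, BuggyIsTruthy and FixedIsTruthy are hypothetical names and are not taken from the dotNetRDF source.

```csharp
using System;

public static class NaNChecks
{
    // Same shape as the quoted expression: the != comparison against
    // Single.NaN is always true, so NaN is never filtered out.
    public static bool BuggyIsTruthy(float value)
    {
        return value != 0.0f && value != Single.NaN;
    }

    // Uses Single.IsNaN, which is the framework's test for NaN.
    public static bool FixedIsTruthy(float value)
    {
        return value != 0.0f && !Single.IsNaN(value);
    }

    public static void Main()
    {
        Console.WriteLine(Single.NaN == Single.NaN);   // False
        Console.WriteLine(BuggyIsTruthy(Single.NaN));  // True (the bug)
        Console.WriteLine(FixedIsTruthy(Single.NaN));  // False
        Console.WriteLine(FixedIsTruthy(1.5f));        // True
    }
}
```

The same pattern applies to double values via Double.IsNaN.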
From: Rob V. <rv...@do...> - 2014-03-14 11:52:26
|
Hi All We're pleased to announce the release of dotNetRDF 1.0.4 (http://www.dotnetrdf.org/blogitem.asp?blogID=80). This is a minor feature and bug fix release which adds support for RDF 1.1 NTriples and NQuads and fixes various reported bugs. Thanks as always to everyone who reported bugs/provided patches and helped produce this release. Cheers, Rob |
From: Rob V. <rv...@do...> - 2014-03-07 16:24:14
|
This is now fixed in default and will be included in the next release Rob From: Rob Vesse <rv...@do...> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Date: Friday, 7 March 2014 15:37 To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Subject: Re: [dotNetRDF-bugs] Store Manager Remote Query/Update - Sparql Parse always enabled |
From: Rob V. <rv...@do...> - 2014-03-07 15:37:59
|
It is a bug in the formatting of UNION graph patterns, filed as CORE-402 (http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=402) and I'm working on a fix Rob From: Eugen F <feu...@ya...> Reply-To: Eugen F <feu...@ya...> Date: Saturday, 1 March 2014 11:26 To: "dot...@li..." <dot...@li...>, Rob Vesse <rv...@do...> Subject: Fw: Store Manager Remote Query/Update - Sparql Parse always enabled > I think there is a parsing problem when using store manager from tools_103 > (and previous versions) with UNION. > The following query in store manager (using sparql query/update connection): > ---------------------------------------- > SELECT ?b ?c > WHERE > { > { > GRAPH <http://AliceIRI> > { > <http://local.virt/foo> ?b ?c} > } > UNION > { > GRAPH <http://BobIRI> { > <http://local.virt/foo> ?b ?c} > } > > } > -------------------------------------------- > > is sent to the sparql endpoint as (notice removal of "{}" before UNION): > > -------------------------------------------------------- > SELECT ?b ?c > WHERE > { > GRAPH <http://aliceiri/> { <http://local.virt/foo> ?b ?c . } > UNION > { > GRAPH <http://bobiri/> { <http://local.virt/foo> ?b ?c . } > } > } > -------------------------------------------- > and of course it fails on the server. > I forwarded to prev message because this could be fixed by fixing the parser > or by allowing to skip local parsing. > > If this isn't fixed already is a dev branch, I could fix the "skip local > parsing" in a dnr fork, since I have to fix this anyway because at this point > I can't write the query. > > > > > > On Sunday, December 22, 2013 3:26 PM, Eugen F <feu...@ya...> wrote: > > > > When using store manager(query/update endpoint) with custom sparql queries it > always performs parsing because SparqlConnector _skipLocalParsing is always > false(UI code skips parsing, but the connector enforces it). > > Maybe it's better for the connector to catch parsing error and default to no > parsing (same as UI/manager code). > > > > > > |
From: Eugen F <feu...@ya...> - 2014-03-01 11:26:39
|
I think there is a parsing problem when using Store Manager from tools_103 (and previous versions) with UNION. The following query in Store Manager (using a SPARQL query/update connection): ---------------------------------------- SELECT ?b ?c WHERE { { GRAPH <http://AliceIRI> { <http://local.virt/foo> ?b ?c} } UNION { GRAPH <http://BobIRI> { <http://local.virt/foo> ?b ?c} } } -------------------------------------------- is sent to the SPARQL endpoint as (notice the removal of "{}" before UNION): -------------------------------------------------------- SELECT ?b ?c WHERE { GRAPH <http://aliceiri/> { <http://local.virt/foo> ?b ?c . } UNION { GRAPH <http://bobiri/> { <http://local.virt/foo> ?b ?c . } } } -------------------------------------------- and of course it fails on the server. I forwarded the previous message because this could be fixed either by fixing the parser or by allowing local parsing to be skipped. If this isn't already fixed in a dev branch, I could fix the "skip local parsing" in a dnr fork, since I have to fix this anyway because at this point I can't write the query. On Sunday, December 22, 2013 3:26 PM, Eugen F <feu...@ya...> wrote: When using Store Manager (query/update endpoint) with custom SPARQL queries it always performs parsing, because SparqlConnector _skipLocalParsing is always false (the UI code skips parsing, but the connector enforces it). Maybe it's better for the connector to catch the parsing error and default to no parsing (same as the UI/manager code). |
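The two options raised in this message (an explicit skip flag, or falling back to the raw text when local parsing throws) can be sketched roughly as below. This is only an illustration and not dotNetRDF code: `QueryPreparation`, `Prepare` and the `parseAndFormat` delegate are hypothetical names, and `FormatException` merely stands in for whatever exception a real SPARQL parser would raise.

```csharp
using System;

public static class QueryPreparation
{
    // Hypothetical helper: decides which query text is sent to the endpoint.
    // parseAndFormat represents "parse locally, then write the query back out".
    public static string Prepare(string rawQuery, bool skipLocalParsing,
                                 Func<string, string> parseAndFormat)
    {
        // Option 1: the caller asked to bypass local parsing entirely,
        // so the text is forwarded exactly as typed.
        if (skipLocalParsing)
        {
            return rawQuery;
        }

        try
        {
            return parseAndFormat(rawQuery);
        }
        catch (FormatException)
        {
            // Option 2: the local parser rejected the query, so fall back
            // to the raw text and let the server decide.
            return rawQuery;
        }
    }
}
```

One design point worth noting: in the UNION case reported here the query appears to parse without error and is only altered when it is written back out, so an exception-based fallback alone would not catch it. Only the skip flag leaves the text untouched in that situation.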
From: Rob V. <rv...@do...> - 2014-01-30 06:25:37
|
Christopher, This is unintuitive but by design: the Nodes collection only returns nodes that appear in the Subject or Object position of a triple, not nodes that appear only as predicates. This question and my answer at http://stackoverflow.com/questions/17967686/retrieving-specific-rdf-graph-triples-based-on-predicate-nodes discuss the thinking behind this and the eventual planned changes around this feature. Rob From: "Penny, Christopher" <Chr...@ds...> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Date: Wednesday, 29 January 2014 20:28 To: "dot...@li..." <dot...@li...> Subject: [dotNetRDF-bugs] UriNodes in predicate position only don't appear in Graph.Nodes [SEC=UNCLASSIFIED] > UNCLASSIFIED > > Hello again all, > > I believe I've stumbled across another (possible) bug. UriNodes which only > appear in the predicate position do not seem to find their way into > Graph.Nodes.UriNodes(), Graph.Nodes or become returnable via getUriNode. The > code below demonstrates the problem I'm having. 
> > private void getUriNodeProblemExample() > { > //Create graph and add namespace + nodes > Graph graph = new Graph(); > graph.NamespaceMap.AddNamespace("test", new Uri("http://test.com/test#")); > IUriNode subjectNode = graph.CreateUriNode("test:subject"); > IUriNode predicateNode = graph.CreateUriNode("test:predicate"); > IUriNode objectNode = graph.CreateUriNode("test:object"); > > //The UriNode is null > IUriNode uriNode = graph.GetUriNode("test:subject"); > if(uriNode == null) > { > System.Console.WriteLine("Urinode is null"); > } > else > { > System.Console.WriteLine("Returned IUriNode is: " + uriNode); > } > > //Source indicates that GetUriNode iterates over graph.Nodes.UriNodes() > System.Console.WriteLine("Printing URINodes before first triple:"); > foreach (IUriNode node in graph.Nodes.UriNodes()) > { > System.Console.WriteLine(">Node in graph.Nodes.UriNodes() is: " +node.ToString()); > } > > //Iterate over all nodes > System.Console.WriteLine("Printing ALL Nodes before first triple:"); > foreach (INode node in graph.Nodes) > { > System.Console.WriteLine(">Node: "+node.ToString()+" has type: "+node.NodeType); > } > > //Adding a triple seems necessary to get nodes to appear in the graph. However, only the subject and object nodes appear after the triple is created. > graph.Assert(new Triple(subjectNode,predicateNode,objectNode)); > > //Source indicates that GetUriNode iterates over graph.Nodes.UriNodes() > System.Console.WriteLine("Printing URINodes after first triple:"); > foreach (IUriNode node in graph.Nodes.UriNodes()) > { > System.Console.WriteLine(">Node in graph.Nodes.UriNodes() is: " + node.ToString()); > } > > //Iterate over all nodes > System.Console.WriteLine("Printing All nodes after first triple:"); > foreach (INode node in graph.Nodes) > { > System.Console.WriteLine(">Node: " + node.ToString() + " has type: " + node.NodeType); > } 
> > //Only the subject and object nodes were found. Lets switch the order. All nodes get printed this time. > graph.Assert(new Triple(predicateNode, subjectNode, objectNode)); > > //Source indicates that GetUriNode iterates over graph.Nodes.UriNodes() > System.Console.WriteLine("Printing URINodes after second triple:"); > foreach (IUriNode node in graph.Nodes.UriNodes()) > { > System.Console.WriteLine(">Node in graph.Nodes.UriNodes() is: " + node.ToString()); > } > > //Iterate over all nodes > System.Console.WriteLine("Printing All nodes after second triple:"); > foreach (INode node in graph.Nodes) > { > System.Console.WriteLine(">Node: " + node.ToString() + " has type: " + node.NodeType); > } > > System.Console.WriteLine("End of function"); > } > > I get the following output for the above code: > Urinode is null > Printing URINodes before first triple: > Printing ALL Nodes before first triple: > Printing URINodes after first triple: > >Node in graph.Nodes.UriNodes() is: http://test.com/test#subject > >Node in graph.Nodes.UriNodes() is: http://test.com/test#object > Printing All nodes after first triple: > >Node: http://test.com/test#subject has type: Uri > >Node: http://test.com/test#object has type: Uri > Printing URINodes after second triple: > >Node in graph.Nodes.UriNodes() is: http://test.com/test#subject > >Node in graph.Nodes.UriNodes() is: http://test.com/test#predicate > >Node in graph.Nodes.UriNodes() is: http://test.com/test#object > Printing All nodes after second triple: > >Node: http://test.com/test#subject has type: Uri > >Node: http://test.com/test#predicate has type: Uri > >Node: http://test.com/test#object has type: Uri > End of function > > I poked around a bit more in the source. Might the problem be in BaseGraph.cs: > public virtual IEnumerable<INode> Nodes > { > get > { > return (from t in this._triples select t.Subject).Concat(from t in this._triples select t.Object).Distinct(); > } > } > Needing to also .Concat t.Predicate? 
> > Unless the reasoning is that a graph is composed of subject/object nodes with > predicate 'links'. However, it would be nice to retrieve an IUriNode for a > predicate from a graph for use with > Graph.GetTriplesWithSubjectPredicate(INode, INode). To get the rdf:type of the > subject, for example. > > Let me know if more information is required. > > Regards, Chris. > IMPORTANT: This email remains the property of the Department of Defence and is > subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have > received this email in error, you are requested to contact the sender and > delete the email. |
From: Penny, C. <Chr...@ds...> - 2014-01-30 05:30:45
|
UNCLASSIFIED Hello again all, I believe I've stumbled across another (possible) bug. UriNodes which only appear in the predicate position do not seem to find their way into Graph.Nodes.UriNodes(), Graph.Nodes or become returnable via getUriNode. The code below demonstrates the problem I'm having. private void getUriNodeProblemExample() { //Create graph and add namespace + nodes Graph graph = new Graph(); graph.NamespaceMap.AddNamespace("test", new Uri("http://test.com/test#")); IUriNode subjectNode = graph.CreateUriNode("test:subject"); IUriNode predicateNode = graph.CreateUriNode("test:predicate"); IUriNode objectNode = graph.CreateUriNode("test:object"); //The UriNode is null IUriNode uriNode = graph.GetUriNode("test:subject"); if(uriNode == null) { System.Console.WriteLine("Urinode is null"); } else { System.Console.WriteLine("Returned IUriNode is: " + uriNode); } //Source indicates that GetUriNode iterates over graph.Nodes.UriNodes() System.Console.WriteLine("Printing URINodes before first triple:"); foreach (IUriNode node in graph.Nodes.UriNodes()) { System.Console.WriteLine(">Node in graph.Nodes.UriNodes() is: " +node.ToString()); } //Iterate over all nodes System.Console.WriteLine("Printing ALL Nodes before first triple:"); foreach (INode node in graph.Nodes) { System.Console.WriteLine(">Node: "+node.ToString()+" has type: "+node.NodeType); } //Adding a triple seems necessary to get nodes to appear in the graph. However, only the subject and object nodes appear after the triple is created. 
graph.Assert(new Triple(subjectNode,predicateNode,objectNode)); //Source indicates that GetUriNode iterates over graph.Nodes.UriNodes() System.Console.WriteLine("Printing URINodes after first triple:"); foreach (IUriNode node in graph.Nodes.UriNodes()) { System.Console.WriteLine(">Node in graph.Nodes.UriNodes() is: " + node.ToString()); } //Iterate over all nodes System.Console.WriteLine("Printing All nodes after first triple:"); foreach (INode node in graph.Nodes) { System.Console.WriteLine(">Node: " + node.ToString() + " has type: " + node.NodeType); } //Only the subject and object nodes were found. Lets switch the order. All nodes get printed this time. graph.Assert(new Triple(predicateNode, subjectNode, objectNode)); //Source indicates that GetUriNode iterates over graph.Nodes.UriNodes() System.Console.WriteLine("Printing URINodes after second triple:"); foreach (IUriNode node in graph.Nodes.UriNodes()) { System.Console.WriteLine(">Node in graph.Nodes.UriNodes() is: " + node.ToString()); } //Iterate over all nodes System.Console.WriteLine("Printing All nodes after second triple:"); foreach (INode node in graph.Nodes) { System.Console.WriteLine(">Node: " + node.ToString() + " has type: " + node.NodeType); } System.Console.WriteLine("End of function"); } I get the following output for the above code: Urinode is null Printing URINodes before first triple: Printing ALL Nodes before first triple: Printing URINodes after first triple: >Node in graph.Nodes.UriNodes() is: http://test.com/test#subject >Node in graph.Nodes.UriNodes() is: http://test.com/test#object Printing All nodes after first triple: >Node: http://test.com/test#subject has type: Uri >Node: http://test.com/test#object has type: Uri Printing URINodes after second triple: >Node in graph.Nodes.UriNodes() is: http://test.com/test#subject >Node in graph.Nodes.UriNodes() is: http://test.com/test#predicate >Node in graph.Nodes.UriNodes() is: http://test.com/test#object Printing All nodes after second triple: 
>Node: http://test.com/test#subject has type: Uri >Node: http://test.com/test#predicate has type: Uri >Node: http://test.com/test#object has type: Uri End of function I poked around a bit more in the source. Might the problem be in BaseGraph.cs: public virtual IEnumerable<INode> Nodes { get { return (from t in this._triples select t.Subject).Concat(from t in this._triples select t.Object).Distinct(); } } Needing to also .Concat t.Predicate? Unless the reasoning is that a graph is composed of subject/object nodes with predicate 'links'. However, it would be nice to retrieve an IUriNode for a predicate from a graph for use with Graph.GetTriplesWithSubjectPredicate(INode, INode). To get the rdf:type of the subject, for example. Let me know if more information is required. Regards, Chris. IMPORTANT: This email remains the property of the Department of Defence and is subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in error, you are requested to contact the sender and delete the email. |
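The difference between the quoted `Nodes` getter and the variant asked about in this message can be seen in a small self-contained sketch. It deliberately avoids the dotNetRDF types: `SimpleTriple` and `NodeViews` are hypothetical stand-ins that use plain strings for nodes, so it only illustrates the LINQ logic and says nothing about how the library itself is or will be implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a triple of node identifiers; not a dotNetRDF type.
public sealed class SimpleTriple
{
    public string Subject { get; }
    public string Predicate { get; }
    public string Object { get; }

    public SimpleTriple(string subject, string predicate, string obj)
    {
        Subject = subject;
        Predicate = predicate;
        Object = obj;
    }
}

public static class NodeViews
{
    // Same shape as the quoted getter: subjects and objects only.
    public static IEnumerable<string> SubjectObjectNodes(IEnumerable<SimpleTriple> triples)
    {
        return triples.Select(t => t.Subject)
                      .Concat(triples.Select(t => t.Object))
                      .Distinct();
    }

    // The variant suggested in the message: predicates are concatenated too.
    public static IEnumerable<string> AllNodes(IEnumerable<SimpleTriple> triples)
    {
        return SubjectObjectNodes(triples)
                      .Concat(triples.Select(t => t.Predicate))
                      .Distinct();
    }
}
```

With a single triple (`test:subject`, `test:predicate`, `test:object`), `SubjectObjectNodes` yields two entries and `AllNodes` yields three, which matches the "after first triple" output reported above.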
From: Rob V. <rv...@do...> - 2014-01-13 16:45:02
|
Christopher, Thanks for the report. This was indeed a bug, which has been filed as CORE-394 (http://dotnetrdf.org/tracker/Issues/IssueDetail.aspx?&id=394). It has been fixed on our default branch and will be included in the next release. Regards, Rob Vesse From: "Penny, Christopher" <Chr...@ds...> Reply-To: dotNetRDF Bug Report tracking and resolution <dot...@li...> Date: Monday, 13 January 2014 05:32 To: "dot...@li..." <dot...@li...> Subject: [dotNetRDF-bugs] Storing Named Graphs in Fuseki Omit URI Fragment [SEC=UNCLASSIFIED] > UNCLASSIFIED > > Hi All, > > It appears as though the fragment part (# onwards) of the graph's BaseUri is > not being saved to the Fuseki server. Below is a quick example to demonstrate > what I mean. > > private void graphURIFragmentProblemExample(string fusekiTripleStore) > { > //Setup graph > IGraph graph = new Graph(); > graph.BaseUri = new Uri("http://test.com/test#graph1"); > graph.NamespaceMap.AddNamespace("data", UriFactory.Create("http://test.com/test#")); > > //Create nodes and a tuple > IUriNode subject1Node = graph.CreateUriNode("data:subject1"); > IUriNode object1Node = graph.CreateUriNode("data:object1"); > IUriNode subject2Node = graph.CreateUriNode("data:subject2"); > IUriNode object2Node = graph.CreateUriNode("data:object2"); > IUriNode predicateNode = graph.CreateUriNode("data:predicate"); > graph.Assert(new Triple(subject1Node, object1Node, predicateNode)); > graph.Assert(new Triple(subject2Node, object2Node, predicateNode)); > > //Print out local graph > System.Console.WriteLine("Printing local graph"); > CompressingTurtleWriter writer = new CompressingTurtleWriter(); > var sw = new System.IO.StringWriter(); > writer.Save(graph, sw); > Console.WriteLine(sw.ToString()); > > //Save graph to triplestore > FusekiConnector store = new FusekiConnector(new Uri(fusekiTripleStore)); > store.SaveGraph(graph); > > //Query store for all data, including the graph it belongs to > StringBuilder sb = new StringBuilder(); 
> sb.Append("select ?graph ?subject ?predicate ?object"+Environment.NewLine); > sb.Append("where" + Environment.NewLine); > sb.Append("{" + Environment.NewLine); > sb.Append("GRAPH ?graph" + Environment.NewLine); > sb.Append("{" + Environment.NewLine); > sb.Append("?subject ?predicate ?object" + Environment.NewLine); > sb.Append("}" + Environment.NewLine); > sb.Append("}" + Environment.NewLine); > > Object results = store.Query(sb.ToString()); > SparqlResultSet rset = (SparqlResultSet)results; > > //Print results > System.Console.WriteLine("Printing query results"); > foreach (SparqlResult r in rset) > { > System.Console.WriteLine(r.ToString()); > } > } > > Output: > Printing local graph > @base <http://test.com/test#graph1>. > > @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>. > @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>. > @prefix xsd: <http://www.w3.org/2001/XMLSchema#>. > @prefix data: <http://test.com/test#>. > > <http://test.com/test#subject1> <http://test.com/test#object1> <http://test.com/test#predicate>. > <http://test.com/test#subject2> <http://test.com/test#object2> <http://test.com/test#predicate>. > > Printing query results > ?graph = http://test.com/test , ?subject = http://test.com/test#subject1 , ?predicate = http://test.com/test#object1 , ?object = http://test.com/test#predicate > ?graph = http://test.com/test , ?subject = http://test.com/test#subject2 , ?predicate = http://test.com/test#object2 , ?object = http://test.com/test#predicate > > As can be seen, the #graph1 component of the graph's URI hasn't been bound to > the ?graph variable of the query (although it is present in the local graph > object). Additionally, I tried doing the above with Apache-Jena components. 
> Using s-query to execute the following query confirms the missing fragment from > the graph URI: > > Command line: > ${JENA_HOME}/s-query --service ${FUSEKI_SERVER}/query --query ${TURTLE_HOME}/query.arq > > Where query.arq contains the query: > > select ?g ?s ?p ?o > where > { > GRAPH ?g > { > ?s ?p ?o . > } > } > > The results are as follows (note the missing #graph1): > > { > "head": { > "vars": [ "g" , "s" , "p" , "o" ] > } , > "results": { > "bindings": [ > { > "g": { "type": "uri" , "value": "http://test.com/test" } , > "s": { "type": "uri" , "value": "http://test.com/test#subject1" } , > "p": { "type": "uri" , "value": "http://test.com/test#object1" } , > "o": { "type": "uri" , "value": "http://test.com/test#predicate" } > } , > { > "g": { "type": "uri" , "value": "http://test.com/test" } , > "s": { "type": "uri" , "value": "http://test.com/test#subject2" } , > "p": { "type": "uri" , "value": "http://test.com/test#object2" } , > "o": { "type": "uri" , "value": "http://test.com/test#predicate" } > } > ] > } > } > > In an attempt to eliminate Fuseki as a source of the problem I inserted the > same triples into the same graph using Apache-Jena: > > Command line: > ${JENA_HOME}/s-put ${FUSEKI_SERVER}/data http://test.com/test#graph1 named_graph_bug_example.ttl > > Where "named_graph_bug_example.ttl" contains the following content: > > PREFIX data: <http://test.com/test#> > > data:subject1 data:predicate data:object1 . > data:subject2 data:predicate data:object2 . 
> > Rerunning the original command now shows the new tuples in a graph with the > correct URI (alongside the old tuples): > > { > "head": { > "vars": [ "g" , "s" , "p" , "o" ] > } , > "results": { > "bindings": [ > { > "g": { "type": "uri" , "value": "http://test.com/test" } , > "s": { "type": "uri" , "value": "http://test.com/test#subject1" } , > "p": { "type": "uri" , "value": "http://test.com/test#object1" } , > "o": { "type": "uri" , "value": "http://test.com/test#predicate" } > } , > { > "g": { "type": "uri" , "value": "http://test.com/test#graph1" } , > "s": { "type": "uri" , "value": "http://test.com/test#subject1" } , > "p": { "type": "uri" , "value": "http://test.com/test#predicate" } , > "o": { "type": "uri" , "value": "http://test.com/test#object1" } > } , > { > "g": { "type": "uri" , "value": "http://test.com/test#graph1" } , > "s": { "type": "uri" , "value": "http://test.com/test#subject2" } , > "p": { "type": "uri" , "value": "http://test.com/test#predicate" } , > "o": { "type": "uri" , "value": "http://test.com/test#object2" } > } , > { > "g": { "type": "uri" , "value": "http://test.com/test" } , > "s": { "type": "uri" , "value": "http://test.com/test#subject2" } , > "p": { "type": "uri" , "value": "http://test.com/test#object2" } , > "o": { "type": "uri" , "value": "http://test.com/test#predicate" } > } > ] > } > } > > It's entirely possible that I've misunderstood how to set a graph's URI in > dotnetRDF, but it looks like there may be another problem here. Let me know if > you need any more information. > > Regards, Chris. > IMPORTANT: This email remains the property of the Department of Defence and is > subject to the jurisdiction of section 70 of the Crimes Act 1914. If you have > received this email in error, you are requested to contact the sender and > delete the email. |
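The thread does not say what actually caused CORE-394, so the following is only a sketch of one general way a fragment can go missing when a graph URI is passed as the `graph` query parameter of a SPARQL Graph Store request: if the `#` is left unescaped, everything after it becomes the fragment of the request URI, and fragments are not sent to the server. `GraphUriDemo`, `BuildRequestUri` and the service address used in the usage note are hypothetical and are not taken from dotNetRDF or Fuseki.

```csharp
using System;

public static class GraphUriDemo
{
    // Hypothetical helper: appends the graph URI as the "graph" query parameter.
    // With escape == false the '#' in the graph URI is left as-is.
    public static Uri BuildRequestUri(string serviceUri, string graphUri, bool escape)
    {
        string value = escape ? Uri.EscapeDataString(graphUri) : graphUri;
        return new Uri(serviceUri + "?graph=" + value);
    }
}
```

For example, calling `BuildRequestUri("http://localhost:3030/ds/data", "http://test.com/test#graph1", false)` gives a URI whose `Query` stops at `http://test.com/test` and whose `Fragment` is `#graph1`. With `escape` set to `true` the `#` is encoded as `%23`, so the whole graph URI stays inside the query string.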