make yoga threadsafe #852

Closed
tcdat96 wants to merge 2 commits from yoga_threadsafe into master
tcdat96 commented 2019-01-08 02:46:39 -08:00 (Migrated from github.com)

Continuing https://github.com/facebook/yoga/pull/791
@nokia6686 is a former member of our team, so we are trying to pick up what he left and carry out the pull request.

Solution

Improved from previous solution with @jpap's suggestions.

  1. Using atomic for gNodeInstanceCount and gConfigInstanceCount
std::atomic<int32_t> gNodeInstanceCount(0);
std::atomic<int32_t> gConfigInstanceCount(0);
  1. Passing gDepth and gCurrentGenerationCount (renamed to depth and generationCount respectively) between function calls that stem from YGNodeCalculateLayout.
    In YGNodeCalculateLayout, pass depth as value 0, to indicate the root depth.
  2. Using #define with gPrintChanges and gPrintSkips to optimized out unnecessary logging and incrementing depth state.
#define PRINT_CHANGES false
#define PRINT_SKIPS false
  1. Prevent memory leak by removing static keyword in YGConfigGetDefault
Continuing https://github.com/facebook/yoga/pull/791 @nokia6686 is a former member of our team, so we are trying to pick up what he left and carry out the pull request. # Solution Improved from previous solution with @jpap's suggestions. 1. Using **atomic** for ```gNodeInstanceCount``` and ```gConfigInstanceCount``` ``` std::atomic<int32_t> gNodeInstanceCount(0); std::atomic<int32_t> gConfigInstanceCount(0); ``` 2. Passing ```gDepth``` and ```gCurrentGenerationCount``` (renamed to **_depth_** and **_generationCount_** respectively) between function calls that stem from ```YGNodeCalculateLayout```. In ```YGNodeCalculateLayout```, pass ```depth``` as value 0, to indicate the root depth. 3. Using ```#define``` with ```gPrintChanges``` and ```gPrintSkips``` to optimized out unnecessary logging and incrementing depth state. ``` #define PRINT_CHANGES false #define PRINT_SKIPS false ``` 4. Prevent memory leak by removing **static** keyword in ```YGConfigGetDefault```
facebook-github-bot commented 2019-01-08 02:46:48 -08:00 (Migrated from github.com)

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. **If you are contributing on behalf of someone else (eg your employer)**, the individual CLA may not be sufficient and your employer may need the corporate CLA signed. If you have received this in error or have any questions, please contact us at [cla@fb.com](mailto:cla@fb.com?subject=CLA%20for%20facebook%2Fyoga%20%23852). Thanks!
facebook-github-bot commented 2019-01-08 02:49:18 -08:00 (Migrated from github.com)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
tcdat96 commented 2019-01-08 03:25:54 -08:00 (Migrated from github.com)

The check also failed with master branch with the same error (https://travis-ci.org/facebook/yoga/builds/475317585), I think it should be irrelevant to this pull request.

The check also failed with master branch with the same error (https://travis-ci.org/facebook/yoga/builds/475317585), I think it should be irrelevant to this pull request.
jpap (Migrated from github.com) requested changes 2019-01-11 13:57:38 -08:00
jpap (Migrated from github.com) left a comment

Thanks for taking the time to refactor the PR, @tcdat96. Some further comments given.

Thanks for taking the time to refactor the PR, @tcdat96. Some further comments given.
jpap (Migrated from github.com) commented 2019-01-11 13:35:51 -08:00

The {computedFlexBasisGeneration, generationCount} members of the YGLayout type should be updated to 16-bit unsigned.

The {`computedFlexBasisGeneration`, `generationCount`} members of the `YGLayout` type should be updated to 16-bit unsigned.
jpap (Migrated from github.com) commented 2019-01-11 13:48:48 -08:00

This is probably OK, but it is safer to use #define XXX 0 here for safer compatibility against various pre-processors, that look for a "C expression of integer type" (gcc) or "must have integral type and can include only integer constants, character constants, and the defined operator" (msvc).

Alternatively, you could use #define XXX FALSE where FALSE is suitably defined after an #if !defined(FALSE) check in a central header.

This is probably OK, but it is safer to use `#define XXX 0` here for safer compatibility against various pre-processors, that look for a "C expression of integer type" (gcc) or "must have integral type and can include only integer constants, character constants, and the defined operator" (msvc). Alternatively, you could use `#define XXX FALSE` where `FALSE` is suitably defined after an `#if !defined(FALSE)` check in a central header.
jpap (Migrated from github.com) commented 2019-01-11 13:25:18 -08:00

I suspect the generationCount should remain as a member of the YGNodeRef, and incremented on each invocation of YGNodeCalculateLayout(...) so that subsequent calls can reuse computed layout when things don't change (given by an equal generation). Subsequently passing the new generation counter around by function call as you've done is a good idea, to avoid the pointer-to-root situation of the last PR.

I suspect the `generationCount` should remain as a member of the `YGNodeRef`, and incremented on each invocation of `YGNodeCalculateLayout(...)` so that subsequent calls can reuse computed layout when things don't change (given by an equal generation). Subsequently passing the new generation counter around by function call as you've done is a good idea, to avoid the pointer-to-root situation of the last PR.
jpap (Migrated from github.com) commented 2019-01-11 13:52:48 -08:00

It would be better to use int for the depth type, so as not to limit to a tree height of 65,535. (The generation count, a separate thing, is likely OK to wrap at 16-bits.)

To simplify this further you might consider removing the variable declaration, and writing it directly as an argument when invoking YGLayoutNodeInternal below. For example:

  if (YGLayoutNodeInternal(
          node,
          width,
          height,
          ownerDirection,
          widthMeasureMode,
          heightMeasureMode,
          ownerWidth,
          ownerHeight,
	  true,
	  "initial",
	  node->getConfig(),
	  0, // tree root
	  generationCount)) {
It would be better to use `int` for the depth type, so as not to limit to a tree height of 65,535. (The generation count, a separate thing, is likely OK to wrap at 16-bits.) To simplify this further you might consider removing the variable declaration, and writing it directly as an argument when invoking `YGLayoutNodeInternal` below. For example: ``` if (YGLayoutNodeInternal( node, width, height, ownerDirection, widthMeasureMode, heightMeasureMode, ownerWidth, ownerHeight, true, "initial", node->getConfig(), 0, // tree root generationCount)) { ```
tcdat96 (Migrated from github.com) reviewed 2019-01-14 20:22:10 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-14 20:22:10 -08:00

I've refactor other points according to your advice, but I find this comment a little confusing.

As far as I understand, generationCount should remain as a member of the YGNodeRef, then in YGNodeCalculateLayout(), it should be passed to subsequent calls as a pointer and incremented as needed. But with this approach, we will have to store the root reference in every node, and perform a full subtree traversal with setChildRoot, the same as the last PR, which would be quite expensive as you said. How exactly can we handle this issue, @jpap?

I've refactor other points according to your advice, but I find this comment a little confusing. As far as I understand, ```generationCount``` should remain as a member of the ```YGNodeRef```, then in ```YGNodeCalculateLayout()```, it should be passed to subsequent calls as a pointer and incremented as needed. But with this approach, we will have to store the root reference in every node, and perform a full subtree traversal with ```setChildRoot```, the same as the last PR, which would be quite expensive as you said. How exactly can we handle this issue, @jpap?
jpap (Migrated from github.com) reviewed 2019-01-14 23:00:18 -08:00
jpap (Migrated from github.com) commented 2019-01-14 23:00:17 -08:00

My apologies; what I meant there was:

  • Store the generation in the YGNodeRef, with a per-node uint16_t two-byte overhead: let's refer to it as generationCount_ below.
  • When we call YGNodeCalculateLayout, we're passing the root node for the subtree to be laid out, so we can immediately increment (update) generationCount_ of that root node, and then assign the newly incremented value to the generationCount local variable you have defined, instead of always starting from the zero value. (There is a second increment in the same function, where we would do the same as above.)
    • For a single tree, when we go back to re-compute a subsequent layout, we end up using the equivalent value of generationCount as the existing global-variable based implementation. That's nice.
    • For a second (and subsequent) tree, it gets its own independent generationCount, which is also nice because each tree is now separate. (Thread-safety also follows.)
    • When we share (clone) nodes, and use an arbitrary descendent of the original tree root to lay out a now-independent subtree, we will use a new version of the generationCount, likely zero. This may result in some nodes being recomputed as the generation now differs. If this is an issue, we could update YGNodeClone so as to follow the YGNode.owner_ path up to the root, to copy the root generationCount when a node is cloned. Either way, such overhead (re-computation or traversal) is limited to cloning. (I did not consider this earlier; your thoughts here are appreciated.)
  • Once we have seeded the generationCount local variable in YGNodeCalculateLayout, we can pass it around on the stack, as the layout proceeds down the tree, thus avoiding the pointer-to-the-root issue.
My apologies; what I meant there was: * Store the generation in the `YGNodeRef`, with a per-node `uint16_t` two-byte overhead: let's refer to it as `generationCount_` below. * When we call `YGNodeCalculateLayout`, we're passing the root node for the subtree to be laid out, so we can immediately increment (update) `generationCount_` of that root node, and then assign the newly incremented value to the `generationCount` local variable you have defined, instead of always starting from the zero value. (There is a second increment in the same function, where we would do the same as above.) * For a single tree, when we go back to re-compute a subsequent layout, we end up using the equivalent value of `generationCount` as the existing global-variable based implementation. That's nice. * For a second (and subsequent) tree, it gets its own independent `generationCount`, which is also nice because each tree is now separate. (Thread-safety also follows.) * When we share (clone) nodes, and use an arbitrary descendent of the original tree root to lay out a now-independent subtree, we will use a new version of the `generationCount`, likely zero. This may result in some nodes being recomputed as the generation now differs. If this is an issue, we could update `YGNodeClone` so as to follow the `YGNode.owner_` path up to the root, to copy the root `generationCount` when a node is cloned. Either way, such overhead (re-computation or traversal) is limited to cloning. (I did not consider this earlier; your thoughts here are appreciated.) * Once we have seeded the `generationCount` local variable in `YGNodeCalculateLayout`, we can pass it around on the stack, as the layout proceeds down the tree, thus avoiding the pointer-to-the-root issue.
jpap (Migrated from github.com) reviewed 2019-01-14 23:01:30 -08:00
jpap (Migrated from github.com) commented 2019-01-14 23:01:29 -08:00

What you need here is simply a non-zero integer value.

#define TRUE 1
What you need here is simply a non-zero integer value. ```suggestion #define TRUE 1 ```
tcdat96 (Migrated from github.com) reviewed 2019-01-15 02:46:18 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-15 02:46:18 -08:00

Please take a look at the latest commit, @jpap. When cloning nodes, I went with the traversal approach, thought that would be much cheaper than re-computing the differing nodes.
If there's still any issue, please let me know.

Please take a look at the latest commit, @jpap. When cloning nodes, I went with the traversal approach, thought that would be much cheaper than re-computing the differing nodes. If there's still any issue, please let me know.
jpap (Migrated from github.com) reviewed 2019-01-15 02:56:34 -08:00
jpap (Migrated from github.com) commented 2019-01-15 02:56:34 -08:00

Looks like this loop always terminates early: the parent is set to nullptr above the loop on line 242.

Might be a good idea to add a test? ;-)

Looks like this loop always terminates early: the parent is set to `nullptr` above the loop on line 242. Might be a good idea to add a test? ;-)
tcdat96 (Migrated from github.com) reviewed 2019-01-15 03:11:30 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-15 03:11:30 -08:00

Oh my apologies, thought it just a simple loop, so didn't test it thoroughly :-(
Okay, let me add a test 👍

Oh my apologies, thought it just a simple loop, so didn't test it thoroughly :-( Okay, let me add a test :+1:
jpap (Migrated from github.com) reviewed 2019-01-15 03:37:51 -08:00
jpap (Migrated from github.com) commented 2019-01-15 03:37:51 -08:00

No worries.

Thinking about this a bit more, I'm not sure we need to do this extra work on clone: the generation is incremented each time we perform a YGNodeCalculateLayout, so even if we preserved the root generation counter for a cloned subtree, we're still incrementing it on a call to YGNodeCalculateLayout for that subtree and before we proceed with any work. Any comparison to the old generation count will be false.

Is the counter used to help avoid node recalculations during multiple passes of a given layout (generation) calculation? I'd need to spend some time to really understand how the generation counter is used for this; and unfortunately the code isn't terribly straightforward or self-documenting. If you're on the FB team, can you ask the original author for a short-cut?

It might be that all is needed is for the generation counter to be monotonically increase on each call to YGNodeCalculateLayout, which means we'll still have to stash it in the root node (if so, the clone stuff should remain). What happens when it wraps to zero?

No worries. Thinking about this a bit more, I'm not sure we need to do this extra work on clone: the generation is incremented each time we perform a `YGNodeCalculateLayout`, so even if we preserved the root generation counter for a cloned subtree, we're still incrementing it on a call to `YGNodeCalculateLayout` for that subtree and before we proceed with any work. Any comparison to the old generation count will be false. Is the counter used to help avoid node recalculations **during** multiple passes of a given layout (generation) calculation? I'd need to spend some time to really understand how the generation counter is used for this; and unfortunately the code isn't terribly straightforward or self-documenting. If you're on the FB team, can you ask the original author for a short-cut? It might be that all is needed is for the generation counter to be monotonically increase on each call to `YGNodeCalculateLayout`, which means we'll still have to stash it in the root node (if so, the clone stuff should remain). What happens when it wraps to zero?
tcdat96 (Migrated from github.com) reviewed 2019-01-15 04:11:08 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-15 04:11:08 -08:00

Unfortunately we aren't from the FB team and neither do we have a full understanding how generetionCount works. My team encounter the race condition problem several months ago, so we opened the previous pull request to try to fix that.
I actually thought you are the one on the FB team ;-)

Unfortunately we aren't from the FB team and neither do we have a full understanding how generetionCount works. My team encounter the race condition problem several months ago, so we opened the previous pull request to try to fix that. I actually thought you are the one on the FB team ;-)
rockerhieu (Migrated from github.com) reviewed 2019-01-15 19:45:26 -08:00
rockerhieu (Migrated from github.com) commented 2019-01-15 19:45:26 -08:00

I would suggest us to stick to the original idea of #791 because it focus on solving the data race condition issue without worrying about optimizing the generationCount itself. This can be done in a separated PR and leave this PR with minimal changes.

I would suggest us to stick to the original idea of #791 because it focus on solving the data race condition issue without worrying about optimizing the `generationCount` itself. This can be done in a separated PR and leave this PR with minimal changes.
jpap (Migrated from github.com) reviewed 2019-01-15 20:44:12 -08:00
jpap (Migrated from github.com) commented 2019-01-15 20:44:12 -08:00

I would suggest us to stick to the original idea of #791 because it focus on solving the data race condition issue without worrying about optimizing the generationCount itself. This can be done in a separated PR and leave this PR with minimal changes.

That would be unwise because #791:

  1. uses more heap than necessary;
  2. introduces an expensive subtree traversal to set up the pointer-to-the-root, which is performed often; and
  3. does not address node cloning.

I am reading through the code to understand how the generation is treated and will report back shortly.

> I would suggest us to stick to the original idea of #791 because it focus on solving the data race condition issue without worrying about optimizing the `generationCount` itself. This can be done in a separated PR and leave this PR with minimal changes. That would be unwise because #791: 1. uses more heap than necessary; 1. introduces an expensive subtree traversal to set up the pointer-to-the-root, which is performed often; and 1. does not address node cloning. I am reading through the code to understand how the generation is treated and will report back shortly.
jpap (Migrated from github.com) reviewed 2019-01-15 22:45:36 -08:00
jpap (Migrated from github.com) commented 2019-01-15 22:45:36 -08:00

Here's what I've learned from reading the code...

There are two generation counters: generationCount and computedFlexBasisGeneration stored in YGLayout, which are compared to the global gCurrentGenerationCount. An overview of each is given below.

At the end, I'll provide a few options on how to move forward with this PR. Apologies for the length in advance.

YGLayout::generationCount

  1. This counter is used to invalidate the layout cache in YGLayoutNodeInternal when a node is dirty, and the node has not been visited thus far for a given YGNodeCalculateLayout call.
  2. It is updated on exit of the function whenever layout is performed.
  3. In YGNodeCalculateLayout, the gCurrentGenerationCount is incremented a second time before performing the optional a legacy "stretch behavior" comparison on a cloned tree that has all nodes marked dirty. The new gCurrentGenerationCount here is to ensure the layout cache is cleared as above.
  4. It looks like the purpose of this counter is to ensure the layout cache is invalidated whenever a node is dirty once per YGNodeCalculateLayout call.

YGLayout::computedFlexBasisGeneration

  1. This counter is used exclusively in YGNodeComputeFlexBasisForChild.
  2. It gates computation of a child's computed flex basis when the YGExperimentalFeatureWebFlexBasis feature is in use.
  3. It is updated on exit of the function every time.
  4. It looks like the purpose of this counter is to limit the above computation once per YGNodeCalculateLayout call.

gCurrentGenerationCount

  1. This counter is incremented in YGNodeCalculateLayout before performing a full-tree layout.
  2. It looks like the purpose of this counter is to uniquely identify a given layout run, which makes sense when you consider its name: "generation" implying that each layout gives birth to a new tree.
  3. Because of the way the counter is used (compare with equality only; node update on exit of relevant function), I don't think it matters when it wraps. This means we could probably get away with an even smaller storage type (e.g. uint8_t) for the global counter and associated class member storage. For cloning, what is important is that we use a value different to the "last time".
  4. With the above in mind, the comments in the source now make more sense (to me):
    // Instead of recomputing the entire layout every single time, we
    // cache some information to break early when nothing changed
    
    and
    // Increment the generation count. This will force the recursive routine to
    // visit
    // all dirty nodes at least once. Subsequent visits will be skipped if the
    // input
    // parameters don't change.
    

Moving forward

Some options:

  1. Minimize changes:
    • Drop the YGNodeClone change.
    • Remove the generationCount_ member from YGNode and bring back the global gCurrentGenerationCount.
    • Increment and retrieve gCurrentGenerationCount atomically in YGNodeCalculateLayout and use it to seed the generationCount stack variable that is passed around by function call.
    • Keep the other changes in the current PR, relating to depth and PRINT_XXX.
  2. Minimize changes with some further heap memory savings:
    • As above, but using the uint8_t type for the generation counters.
  3. Minimize heap memory use:
    • Drop the gCurrentGenerationCount global.
    • Drop the computedFlexBasisGeneration and generationCount members of YGLayout.
    • Declare a new std::map<YGNode,uint8_t> stack variable in YGNodeCalculateLayout that is passed around like depth is in the current PR. (Or to minimize function call overhead, the depth and a pointer to the map can be placed in a struct, and a pointer to that struct is passed around instead.)
    • The uint8_t value field of the map is a bitfield:
      • define bit 0 as "I have visited the node" in lieu of the existing layout->generationCount != gCurrentGenerationCount check in YGLayoutNodeInternal.
      • define bit 1 as "I have computed the flex basis for this child" in lieu of the existing computedFlexBasisGeneration != gCurrentGenerationCount check in YGNodeComputeFlexBasisForChild.
    • Throw out the map on return of YGNodeCalculateLayout.

What do you think?

Here's what I've learned from reading the code... There are two generation counters: `generationCount` and `computedFlexBasisGeneration` stored in `YGLayout`, which are compared to the global `gCurrentGenerationCount`. An overview of each is given below. At the end, I'll provide a few options on how to move forward with this PR. Apologies for the length in advance. #### YGLayout::generationCount 1. This counter is used to invalidate the layout cache in `YGLayoutNodeInternal` when a node is dirty, and the node has not been visited thus far for a given `YGNodeCalculateLayout` call. 1. It is updated on exit of the function whenever layout is performed. 1. In `YGNodeCalculateLayout`, the `gCurrentGenerationCount` is incremented a second time before performing the optional a legacy "stretch behavior" comparison on a cloned tree that has all nodes marked dirty. The new `gCurrentGenerationCount` here is to ensure the layout cache is cleared as above. 1. It looks like the purpose of this counter is to ensure the layout cache is invalidated whenever a node is dirty once per `YGNodeCalculateLayout` call. #### YGLayout::computedFlexBasisGeneration 1. This counter is used exclusively in `YGNodeComputeFlexBasisForChild`. 1. It gates computation of a child's computed flex basis when the `YGExperimentalFeatureWebFlexBasis` feature is in use. 1. It is updated on exit of the function every time. 1. It looks like the purpose of this counter is to limit the above computation once per `YGNodeCalculateLayout` call. #### gCurrentGenerationCount 1. This counter is incremented in `YGNodeCalculateLayout` before performing a full-tree layout. 1. It looks like the purpose of this counter is to uniquely identify a given layout run, which makes sense when you consider its name: "generation" implying that each layout gives birth to a new tree. 1. Because of the way the counter is used (compare with equality only; node update on exit of relevant function), I don't think it matters when it wraps. This means we could probably get away with an even smaller storage type (e.g. `uint8_t`) for the global counter and associated class member storage. For cloning, what is important is that we use a value different to the "last time". 1. With the above in mind, the comments in the source now make more sense (to me): ``` // Instead of recomputing the entire layout every single time, we // cache some information to break early when nothing changed ``` and ``` // Increment the generation count. This will force the recursive routine to // visit // all dirty nodes at least once. Subsequent visits will be skipped if the // input // parameters don't change. ``` ### Moving forward Some options: 1. Minimize changes: * Drop the `YGNodeClone` change. * Remove the `generationCount_` member from `YGNode` and bring back the global `gCurrentGenerationCount`. * Increment and retrieve `gCurrentGenerationCount` atomically in `YGNodeCalculateLayout` and use it to seed the `generationCount` stack variable that is passed around by function call. * Keep the other changes in the current PR, relating to `depth` and `PRINT_XXX`. 1. Minimize changes with some further heap memory savings: * As above, but using the `uint8_t` type for the generation counters. 1. Minimize heap memory use: * Drop the `gCurrentGenerationCount` global. * Drop the `computedFlexBasisGeneration` and `generationCount` members of `YGLayout`. * Declare a new `std::map<YGNode,uint8_t>` stack variable in `YGNodeCalculateLayout` that is passed around like `depth` is in the current PR. (Or to minimize function call overhead, the `depth` and a pointer to the map can be placed in a struct, and a pointer to that struct is passed around instead.) * The `uint8_t` value field of the map is a bitfield: * define bit 0 as "I have visited the node" in lieu of the existing `layout->generationCount != gCurrentGenerationCount` check in `YGLayoutNodeInternal`. * define bit 1 as "I have computed the flex basis for this child" in lieu of the existing `computedFlexBasisGeneration != gCurrentGenerationCount` check in `YGNodeComputeFlexBasisForChild`. * Throw out the map on return of `YGNodeCalculateLayout`. What do you think?
tcdat96 (Migrated from github.com) reviewed 2019-01-16 02:59:39 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-16 02:59:39 -08:00

The implementation for the first and second options would be quite straightforward, but I'm having a hard time understanding the third optimization. How exactly can we substitute those 3 variables with the map approach? Personally, I find it really risky to make such a big change, and we don't really have the confidence in our understanding of Yoga to carry out the implementation by ourselves neither.

The implementation for the first and second options would be quite straightforward, but I'm having a hard time understanding the third optimization. How exactly can we substitute those 3 variables with the map approach? Personally, I find it really risky to make such a big change, and we don't really have the confidence in our understanding of Yoga to carry out the implementation by ourselves neither.
jpap (Migrated from github.com) reviewed 2019-01-16 18:42:20 -08:00
jpap (Migrated from github.com) commented 2019-01-16 18:42:20 -08:00

I've submitted PR #855 that takes the third approach: it appears to work well!

So from here, let's see if @davidaurelio or others on the FB team can offer their input. @tcdat96, in the meantime if you have time to update your PR with option 2, it can give others an opportunity to review both PRs and weigh in.

Hopefully we can land this thread-safety fix soon. I'm happy to amend my PR if there are preferences on style, etc.

I've submitted PR #855 that takes the third approach: it appears to work well! So from here, let's see if @davidaurelio or others on the FB team can offer their input. @tcdat96, in the meantime if you have time to update your PR with option 2, it can give others an opportunity to review both PRs and weigh in. Hopefully we can land this thread-safety fix soon. I'm happy to amend my PR if there are preferences on style, etc.
tcdat96 (Migrated from github.com) reviewed 2019-01-17 01:51:20 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-17 01:51:20 -08:00

I updated the PR with the first 2 options, but ran into error undefined reference to `pthread_getspecific' when running gtest. Have you ever encountered a similar problem, @jpap?

I updated the PR with the first 2 options, but ran into error ```undefined reference to `pthread_getspecific'``` when running gtest. Have you ever encountered a similar problem, @jpap?
jpap (Migrated from github.com) reviewed 2019-01-17 16:54:57 -08:00
jpap (Migrated from github.com) commented 2019-01-17 16:54:57 -08:00

error undefined reference to `pthread_getspecific' when running gtest.

Sadly I've never experienced that; sounds like a linker configuration problem? Perhaps you've inadvertently modified some of the build configuration someplace. I would try:

  • Clean and rebuild?
  • Create a new local branch based off master, then cherry-pick only your changes then rebuild/test.
> error `` undefined reference to `pthread_getspecific' `` when running gtest. Sadly I've never experienced that; sounds like a linker configuration problem? Perhaps you've inadvertently modified some of the build configuration someplace. I would try: * Clean and rebuild? * Create a new local branch based off master, then cherry-pick **only** your changes then rebuild/test.
tcdat96 (Migrated from github.com) reviewed 2019-01-20 18:50:29 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-20 18:50:29 -08:00

I've already tried both but unfortunately nothing works. The error appeared even with the master commit so I think it should be my configuration problem, but sadly I don't really have any experience in this type of problem :-(

I've already tried both but unfortunately nothing works. The error appeared even with the master commit so I think it should be my configuration problem, but sadly I don't really have any experience in this type of problem :-(
tcdat96 (Migrated from github.com) reviewed 2019-01-28 00:06:43 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-28 00:06:43 -08:00

It seems that the error only emerges in our ubuntu machines, the same source code runs without any issue in MacOS.
I modified the BUCK file in lib/gtest with the workaround described in https://github.com/facebook/buck/issues/1443 and it worked. Now this PR runs successfully with all tests passed.

If you have the time, please take a look at both PRs, @davidaurelio :-)

It seems that the error only emerges in our ubuntu machines, the same source code runs without any issue in MacOS. I modified the BUCK file in lib/gtest with the workaround described in https://github.com/facebook/buck/issues/1443 and it worked. Now this PR runs successfully with all tests passed. If you have the time, please take a look at both PRs, @davidaurelio :-)
davidaurelio (Migrated from github.com) requested changes 2019-01-28 12:34:15 -08:00
davidaurelio (Migrated from github.com) left a comment

Thank you for your contribution. I see good things in here, e.g. passing generation count and depth along as function parameters.

  1. However, I don’t think we really need atomics here. Not using them wouldn’t affect correctness, and avoid the potential bus overhead.

  2. is awesome

  3. please use constexpr – check my inline comments. And please put it in a separate pull request. I will happily merge it.

  4. see inline comment – I don’t agree with the notion of a leak, and would like to retain the behaviour of having a single default config. If you feel strongly about running a destructor when terminating the process, it could be wrapped into a static unique_ptr. But I don’t think we have to do this at all.

Thank you for your contribution. I see good things in here, e.g. passing generation count and depth along as function parameters. 1. However, I don’t think we really need atomics here. Not using them wouldn’t affect correctness, and avoid the potential bus overhead. 2. is awesome 3. please use `constexpr` – check my inline comments. And please put it in a separate pull request. I will happily merge it. 4. see inline comment – I don’t agree with the notion of a leak, and would like to retain the behaviour of having a single default config. If you feel strongly about running a destructor when terminating the process, it could be wrapped into a static `unique_ptr`. But I don’t think we have to do this at all.
@@ -24,3 +24,3 @@
uint32_t computedFlexBasisGeneration = 0;
uint8_t computedFlexBasisGeneration = 0;
YGFloatOptional computedFlexBasis = {};
davidaurelio (Migrated from github.com) commented 2019-01-28 11:43:15 -08:00

What was the motivation for this change? Due to alignment requirements, this won’t even save memory without reordering data members.

What was the motivation for this change? Due to alignment requirements, this won’t even save memory without reordering data members.
davidaurelio (Migrated from github.com) commented 2019-01-28 11:44:04 -08:00

This would get exposed to every importer of Yoga.h. Let’s not do this here, please.

This would get exposed to every importer of `Yoga.h`. Let’s not do this here, please.
davidaurelio (Migrated from github.com) commented 2019-01-28 11:54:22 -08:00

While this serialises the access to these variables, it causes additional synchronisation needs between cores when creating nodes on different threads in parallel.

At this point, I’d rather let the count go out of sync, instead of adding any bus overhead. Its accuracy isn’t a correctness problem, is it?

While this serialises the access to these variables, it causes additional synchronisation needs between cores when creating nodes on different threads in parallel. At this point, I’d rather let the count go out of sync, instead of adding any bus overhead. Its accuracy isn’t a correctness problem, is it?
davidaurelio (Migrated from github.com) commented 2019-01-28 11:55:09 -08:00

Is this to avoid the “leak”? This is a statically constructed instance that will live until the process exits, which seems perfectly fine. Could you explain which problem you are trying to solve?

Is this to avoid the “leak”? This is a statically constructed instance that will live until the process exits, which seems perfectly fine. Could you explain which problem you are trying to solve?
davidaurelio (Migrated from github.com) commented 2019-01-28 12:20:38 -08:00

Again, this causes additional synchronisation, and I am not sure it’s necessary:

this will usually be loaded into a register, incremented, and written back to memory.

When two threads do this in parallel, they might both load the old value, increment by one, and write back – with the global value being incremented by one, not two, as a result.

That will still increment the count, and cause layouts to be run – there isn’t really a need to synchronise it.

Am I missing something?

Minor nitpick, please use uniform initialization (std::atomic<uint8_t> gCurrentGenerationCount{0};)

Again, this causes additional synchronisation, and I am not sure it’s necessary: this will usually be loaded into a register, incremented, and written back to memory. When two threads do this in parallel, they might both load the old value, increment by one, and write back – with the global value being incremented by one, not two, as a result. That will still increment the count, and cause layouts to be run – there isn’t really a need to synchronise it. Am I missing something? Minor nitpick, please use uniform initialization (`std::atomic<uint8_t> gCurrentGenerationCount{0};`)
davidaurelio (Migrated from github.com) commented 2019-01-28 11:59:12 -08:00

constexpr, please. E.g. constexpr bool kPrintChanges = false;

`constexpr`, please. E.g. `constexpr bool kPrintChanges = false;`
davidaurelio (Migrated from github.com) commented 2019-01-28 12:09:06 -08:00

regular if (kPrintChanges)

regular `if (kPrintChanges)`
davidaurelio (Migrated from github.com) commented 2019-01-28 12:01:26 -08:00

move to Yoga.cpp

a) it’s not used here
b) this header needs to be valid C

move to `Yoga.cpp` a) it’s not used here b) this header needs to be valid C
tcdat96 (Migrated from github.com) reviewed 2019-01-28 20:35:38 -08:00
tcdat96 (Migrated from github.com) commented 2019-01-28 20:35:38 -08:00

Honestly, I don't have the full knowledge about how these variables could affect the correctness of Yoga. These variables were flagged by ThreadSanitizer to be a possible data race problem, so I only tried to fix it at the language level.
But if its accuracy is not important to the program's flow, then what's the point of having these variables?

Honestly, I don't have the full knowledge about how these variables could affect the correctness of Yoga. These variables were flagged by ThreadSanitizer to be a possible data race problem, so I only tried to fix it at the language level. But if its accuracy is not important to the program's flow, then what's the point of having these variables?
jpap (Migrated from github.com) reviewed 2019-01-29 18:02:32 -08:00
jpap (Migrated from github.com) commented 2019-01-29 18:02:32 -08:00

These instance tracking variables are only used for tests. You could conditionally compile in this functionality for tests by adding a new preprocessor token in the BUCK file, under TEST_COMPILER_FLAGS.

These instance tracking variables are only used for tests. You could conditionally compile in this functionality for tests by adding a new preprocessor token in the `BUCK` file, under `TEST_COMPILER_FLAGS`.
jpap (Migrated from github.com) reviewed 2019-01-29 18:03:22 -08:00
jpap (Migrated from github.com) commented 2019-01-29 18:03:22 -08:00

These instance tracking variables are only used for tests. You could conditionally compile in this functionality for tests by adding a new preprocessor token in the BUCK file, under TEST_COMPILER_FLAGS.

These instance tracking variables are only used for tests. You could conditionally compile in this functionality for tests by adding a new preprocessor token in the `BUCK` file, under `TEST_COMPILER_FLAGS`.
jpap (Migrated from github.com) reviewed 2019-01-29 18:11:49 -08:00
@@ -24,3 +24,3 @@
uint32_t computedFlexBasisGeneration = 0;
uint8_t computedFlexBasisGeneration = 0;
YGFloatOptional computedFlexBasis = {};
jpap (Migrated from github.com) commented 2019-01-29 18:11:49 -08:00

The motivation is to save heap: there are many bits left over from alignment of the bitfields above it, so you could combine computedFlexBasisGeneration with it. I suspect you can get away with just one bit, so long as the generation counter is the same width (or is truncated with a logical AND mask when the generation is incremented).

The motivation is to save heap: there are many bits left over from alignment of the bitfields above it, so you could combine `computedFlexBasisGeneration` with it. I suspect you can get away with just one bit, so long as the generation counter is the same width (or is truncated with a logical AND mask when the generation is incremented).
tcdat96 (Migrated from github.com) reviewed 2019-03-04 19:50:24 -08:00
tcdat96 (Migrated from github.com) commented 2019-03-04 19:50:24 -08:00

I'll switch to constexpr in another pull request, so we won't need this anymore

I'll switch to constexpr in another pull request, so we won't need this anymore
tcdat96 (Migrated from github.com) reviewed 2019-03-04 20:31:54 -08:00
tcdat96 (Migrated from github.com) commented 2019-03-04 20:31:54 -08:00

This is actually my misunderstanding about how static variables work. This was marked by CppUTest as a memory leak but it appears to be only a false alarm. I'll revert this.

This is actually my misunderstanding about how static variables work. This was marked by CppUTest as a memory leak but it appears to be only a false alarm. I'll revert this.
tcdat96 commented 2019-03-05 00:25:01 -08:00 (Migrated from github.com)

@davidaurelio @jpap Sorry for delaying this for too long, I was a little too busy last month.

In the new update, I've:

  • reverted the atomics
  • reverted the macros. I'll switch to constexpr in another pull request
  • reverted the static mistake

So this pull request will only have 2 changes now:

  1. using uint8_t for generationCount
  2. passing depth and generationCount via function parameters, instead of using global variables.
@davidaurelio @jpap Sorry for delaying this for too long, I was a little too busy last month. In the new update, I've: * reverted the atomics * reverted the macros. I'll switch to ```constexpr``` in another pull request * reverted the static mistake So this pull request will only have 2 changes now: 1. using **uint8_t** for ```generationCount``` 2. passing ```depth``` and ```generationCount``` via function parameters, instead of using global variables.
tcdat96 commented 2019-03-15 03:13:43 -07:00 (Migrated from github.com)

any update, @davidaurelio ?

any update, @davidaurelio ?
rockerhieu commented 2019-04-22 00:46:55 -07:00 (Migrated from github.com)

@davidaurelio @jpap hello?

@davidaurelio @jpap hello?
appleguy commented 2019-05-06 19:35:51 -07:00 (Migrated from github.com)

Hey all - this is still flagging our thread-sanitizer runs, which makes it impossible to find real issues since it happens so frequency. It looks bad for Yoga's usage at a certain big company :).

Could the team find some way to fix or silence the warning with annotations? cc @davidaurelio

Our team had also previously opened this PR (sat for ~2 months): https://github.com/facebook/yoga/pull/770

Hey all - this is still flagging our thread-sanitizer runs, which makes it impossible to find real issues since it happens so frequency. It looks bad for Yoga's usage at a certain big company :). Could the team find some way to fix or silence the warning with annotations? cc @davidaurelio Our team had also previously opened this PR (sat for ~2 months): https://github.com/facebook/yoga/pull/770
rockerhieu commented 2019-05-29 01:28:48 -07:00 (Migrated from github.com)

@davidaurelio it's been several months. Could you take a look at this?

@davidaurelio it's been several months. Could you take a look at this?
davidaurelio commented 2019-05-29 09:03:14 -07:00 (Migrated from github.com)

Hi folks, sorry for being unresponsive.

The aspect of this creating too much noise in sanitizers is new to me, sorry about that. I thought this was mainly due to concerns about the data races themselves.

Let me see how well this merges …

Hi folks, sorry for being unresponsive. The aspect of this creating too much noise in sanitizers is new to me, sorry about that. I thought this was mainly due to concerns about the data races themselves. Let me see how well this merges …
facebook-github-bot (Migrated from github.com) reviewed 2019-05-29 09:03:24 -07:00
facebook-github-bot (Migrated from github.com) left a comment

@davidaurelio has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@davidaurelio has imported this pull request. If you are a Facebook employee, you can view this diff [on Phabricator](https://phabricator.internmc.facebook.com/D15537450).

Pull request closed

Sign in to join this conversation.
No description provided.